Switching Our Corporate Version Control System to GitHub

Moving our VCS to GitHub is the best decision we’ve made in the last year. The benefits are huge and the drawbacks are minor. We now have 16 closed-source repositories hosted on our GitHub organization, plus a few open-source ones that don’t count toward billing. Read on for more details about our GitHub switch.

Context

At eTF1, we’ve been using Subversion and Redmine for a long time to track our source code and configuration files. The collaboration tools are important because more than 40 people interact with the code at any given time. Project Managers create Redmine tickets for issues and tasks, architects comment on a ticket to discuss the best implementation, developers systematically refer to a ticket when committing, and lead developers check the code quality every day by randomly inspecting a few commits from each developer they follow.

About a year ago, we switched to Git to ease branching and merging for all new projects. Subversion is clearly weak in this area, and we wanted to enforce the "one feature, one branch" rule in our Software Configuration Management system. Redmine started showing its limits for code reviews with Git, and most of the code modifications on a branch remained hidden until the moment the branch was merged.

So we decided to switch to GitHub, the hosted version (not the GitHub Enterprise version, which is too expensive for us), in January. We foresaw possible issues with storing our code outside of our company network, with the need to manage a list of users in yet another place, with the difficulty of managing branches, forks, and organizations, and with the ability to reach GitHub servers from behind a firewall. We also had great expectations regarding the collaboration workflow and the quality of code.

Pull Requests are GitHub’s Killer Feature

GitHub’s pull request workflow is awesome. It invites developers to discuss the implementation of a feature, improve this implementation, and notify other developers about architecture concerns in the context of the feature. It’s the perfect tool for lead developers to control the quality of the code, not by reviewing a subset of the committed code (commit by commit), but by reviewing the entire implementation of a feature (in a feature branch).

We kept Redmine for non-technical ticketing (tasks, stories, bug reports). Managing two ticket lists sounded like a bad idea at first, but it turns out they are used by different people in our organization. Redmine is used by the customers and the project managers, GitHub PRs (we didn’t enable Issues) by the developers and the release manager (more on that role shortly). We still force every commit message to reference a Redmine ticket number, and it works fine.
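
A rule like "every commit message references a Redmine ticket" lends itself to an automated check. The following is only an illustrative Python sketch, not a description of our actual setup: it assumes tickets are written as "#" followed by digits (for example "refs #1234"), and the function names are made up for the example.

```python
import re

# Assumed convention: a ticket reference is "#" followed by digits, e.g. "refs #1234".
TICKET_PATTERN = re.compile(r"#(\d+)\b")


def referenced_tickets(message: str) -> list[int]:
    """Return the ticket numbers mentioned in a commit message, in order."""
    return [int(number) for number in TICKET_PATTERN.findall(message)]


def has_ticket_reference(message: str) -> bool:
    """Tell whether a commit message mentions at least one ticket."""
    return bool(referenced_tickets(message))


def check_message_file(path: str) -> int:
    """Return 0 if the message stored at `path` references a ticket, 1 otherwise."""
    with open(path, encoding="utf-8") as handle:
        message = handle.read()
    return 0 if has_ticket_reference(message) else 1
```

Git passes the path of the message file as the first argument to a commit-msg hook, so a hook script could call check_message_file with that path and exit with the returned code to reject commits that lack a reference.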

Thanks to GitHub’s Pull Requests, the overall code quality now improves, while all we could do with Subversion was to decrease the growth rate of the technical debt. That’s a big deal, and it's enough to motivate the switch. But that's not all.

Release Manager

GitHub has created a new role in our organization: the Release Manager. Each project has its own RM. This person reviews and merges the PRs, and prepares deployments to production. Each project typically has a master branch, which is equivalent to the production code. All feature branches are merged to a develop branch, which is merged to the master branch before pushing to production. Hotfix commits may happen in the master branch, too.

The release manager reviews every single line of code merged to the develop branch through PRs. In the past, only critical parts of the code and a subset of the rest passed through peer review. We had one lead developer for roughly 8-10 developers; now we estimate that we need one Release Manager for roughly 4-5 developers. Code quality comes with a price; the switch to GitHub made this price apparent and affordable. Also, since every line of code is reviewed, developers learn best practices more quickly.

We also use a continuous integration system (Jenkins) to monitor the develop branch (we’re currently looking into a way to run the test suite on PRs, similar to what travisbot does). With a detailed understanding of the code and the confidence coming from a comprehensive test suite, the Release Managers can deploy easily, even very large portions of code. The projects using GitHub at eTF1 deploy to production about twice a week.

Timeline and Graphs

Just like Redmine, GitHub offers a timeline for the whole organization – it’s called the News Feed. It’s great for the organization manager to have a bird’s eye view of the whole activity, but it’s not yet granular enough (GitHub, if you read this, please add News Feeds per repository).

We’ve chosen to avoid forks, because the activity in a developer’s fork doesn’t appear in the timeline, so we would only discover the changes when the developer creates a pull request. Also, access rights management with forks was a nightmare. Instead, all the developers work on the central repository, and create feature branches.

In addition to the timeline, we use the Network Graph a lot. It’s the tool that convinces every new developer of the value of feature branches, as it shows very clearly who did what for which feature. Now, when we draw the technical steps for a new release on a whiteboard, we tend to draw them just like a GitHub Network Graph – we think in terms of branches, and that makes releases very easy.

The other graphs GitHub offers are also a great help to measure the progress of projects and of developers. They are a welcome addition to our static code analysis tools.

Hosting Source Code Outside

Our security experts were reluctant to use the hosted GitHub service for our corporate code. What if the code that runs on our servers were to fall into the hands of our competitors, or of hackers with any reason to harm us?

Let’s face it: a developer can always copy the code from their own checkout to a USB stick and publish it on the Internet, or send an archive to a private email address to do all the bad things they could think of. Having the code on your premises doesn’t protect you from leaks or attacks. Trusting your teammates is the only path.

I estimate that the probability of a GitHub breach is roughly the same as the probability of a breach into our own network. These guys take security seriously (HTTPS everywhere, Responsible Disclosure of Security Vulnerabilities), and their servers are probably much more hammered by script kiddies than ours (although we have our share of that kind of attack).

Lastly, most of our code isn't so secret. We open-source libraries that are sufficiently documented and tested, and that may benefit others. The frontend part (HTML, JS and CSS) is already public. The rest is of little interest to anybody except eTF1. And the data is secured in a completely separate infrastructure. To sum it up: the key to a good plate is not the recipe, it's the cook.

Once we understood that, hosting our code beyond our corporate network wasn’t that big a deal. We had to open specific ports on the firewall to reach GitHub by SSH (of course, restricting the target IPs to GitHub itself). We had to tell every developer not to fork and open-source our repositories (GitHub, if you read this, please remove the ability to open-source a repository if it is a fork of a private repository). We had to set up security groups on GitHub to allow per-project read/write access. This can be a hassle to replicate if you already have an internal Active Directory, but we tend to be pretty liberal on developers’ access to code.

Storing Passwords in the VCS

Since we were storing our VCS inside our corporate network before switching to GitHub, we had the bad habit of leaving passwords in the configuration files. Moving to GitHub implied cleaning the passwords out of the configuration files, in their latest version as well as in the whole history.

We removed passwords from configuration files and put tokens instead. The actual passwords are stored in a private (unversioned) folder, and the tokens are replaced by the passwords during deployment (we use Capistrano for that).
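
The token substitution step can be sketched as follows. This is a hypothetical Python illustration, not our Capistrano task: the `@@NAME@@` token syntax, the `render_config` function, and the sample secret are all assumptions made for the example.

```python
import re

# Assumed token syntax: @@NAME@@ (the real syntax is not specified here).
TOKEN_PATTERN = re.compile(r"@@([A-Z0-9_]+)@@")


def render_config(template: str, secrets: dict[str, str]) -> str:
    """Replace every @@NAME@@ token with the matching secret.

    Raises KeyError when a token has no secret, so a deployment fails loudly
    instead of shipping a file that still contains a placeholder.
    """
    def substitute(match: re.Match) -> str:
        name = match.group(1)
        if name not in secrets:
            raise KeyError(f"no secret defined for token {name}")
        return secrets[name]

    return TOKEN_PATTERN.sub(substitute, template)


# The versioned file only contains the token...
versioned = "db_user = app\ndb_password = @@DB_PASSWORD@@\n"
# ...while the real value would be read from the private, unversioned folder.
private_secrets = {"DB_PASSWORD": "example-only"}

deployed = render_config(versioned, private_secrets)
```

Failing on an unknown token is a deliberate choice in this sketch: a missing secret is caught at deployment time rather than at runtime on the production server.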

We’re not the first ones to deal with the issue of rewriting the git history to remove sensitive data, and the GitHub blog published a tutorial on the subject. With a little help from git filter-branch and a custom script, we managed to reset the history of all sensitive files. There is a drawback: all the commit hashes changed, and all uncommitted work is lost. But that’s the price to pay when a project is not secure from day 1.

Billing and Customer Service

One disadvantage of GitHub private repositories is that you can only pay with a credit card. My company doesn’t supply corporate credit cards, so I paid GitHub with my own card and asked the company for a reimbursement afterwards. To avoid too much paperwork, I asked GitHub to charge me once per year instead of once per month. They diligently switched our organization to yearly billing.

We had a surprise when switching plans. We first bought 10 private repositories (Bronze Plan, $300/year), then upgraded to a Silver Plan ($600/year) for 10 more repositories a couple of months later. The new repositories became available instantly, but my credit card was never charged. The customer service explained to me that the Silver Plan would be billed at the end of the Bronze Plan, which means we could use the 10 new repositories for free for the next 10 months. This may sound great, but since I was paying with my personal credit card, I didn’t want it to be debited $600 in ten months’ time, when I wouldn’t expect it. So I asked the customer service to debit me immediately, which they did promptly.

This leads me to the Customer Service itself. The GitHub staff I’ve been in contact with is extremely helpful, efficient, and understanding of our specific requirements. Most important: you deal with real people with real names, not with an anonymous service. When a member of the support staff takes your case (which happens in minutes, even for us Europeans), you get in touch with this person each time you interact on the case. And their CRM tools seem good enough that your customer history is quickly available to all the support staff. I must say that I didn’t expect such good customer service from a company with a reputation for automating everything they can, but they really take customer relationships seriously.

Developer Love

GitHub hosts some of the most popular open-source projects (Symfony, Node.js, jQuery, and of course Propel, to name only my favorites). Developers already know GitHub from using these projects, and they tend to love its features. Forcing a developer to come back to Redmine after using GitHub is just like forcing a developer to use Subversion after using Git.

Some of our developers were not yet users of GitHub. Asking them to be part of our organization gave them the opportunity to discover and watch other repositories, and I’m convinced that seeing the code produced by others is one of the best ways to progress in programming. GitHub is also a social network for developers, where our staff can meet and exchange ideas with peers. And since GitHub is the best public code repository, it’s good assurance that we picked the best tool for the job.

Externals

We used to manage dependencies between libraries using svn:externals. When switching to Git, we first tried git submodule. It was painful. Then we switched to Packagist for our PHP projects, and npm for our Node.js project. These are the right tools for the job of managing dependencies – a VCS isn’t.

The same goes for deployment, but we switched to Capistrano for this task long ago, instead of using a VCS and a custom build script.

Conclusion

You have to switch to GitHub. If you’re not convinced yet, it’s probably because CVS already fits your needs (!). Or it may be because you’ve found a better (and probably more expensive) alternative to Redmine.

If you’re considering moving from Subversion to Git, take the bigger step and upgrade directly to GitHub. You won’t regret it, your developers will love you, and one of the most brilliant start-ups of the last 4 years will earn money thanks to you.

Published on 20 May 2012 with tags management not technical
