Overview
Developers use a “version control system” both to track their changes to code and to share those changes with other developers. Over my career, I’ve used many proprietary and free systems, but in the past decade or so, most of the industry has settled on the tool “git,” very often hosted on sites like GitHub.com.
If you’ve never used a version control system, you can think of it sort of like an extra-powerful shared document. Our projects have thousands of files in them, and multiple developers may be working on the same parts of a project at once, making many interdependent changes to multiple files simultaneously. To facilitate this, they “branch” the code, make changes, and then merge them back into the main line.
Over the years, various systems I’ve used have had different names for this main branch - “trunk,” “base,” “main.” Unfortunately, git’s default name was “master,” which is not inclusive. As bad as this usage is, it gets even worse with “master/slave” for a running system and its backup - thankfully the industry is moving to replace those with “primary/secondary.” Google has put together a useful resource for developers about inclusion in technical naming.
In fall of 2020, GitHub added support for renaming the base branch on an existing repository, and made “main” the default for new repositories. I’d been idly meaning to look into this functionality, but one of our developers, Isaiah, asked me about our plans for it. I looked into the status at GitHub and discovered that, while the rename was possible, it was still difficult - you needed to do a bunch of manual steps on the repository first. However, they promised they’d have a one-step tool released “by the end of the year.” We have approximately 250 repositories, so I decided we’d wait for the tool and get it done more easily.
2020 came and went with no tool. I don’t know exactly what date they released it, but by the time I thought to look towards the end of the first quarter, we were in our last push to get our quarterly commits completed, and I decided to put it off until the second quarter. Once we got through planning for the quarter, though, I got to work trying to figure out exactly what needed to happen.
At a high level, it’s reasonably simple. You navigate to your project’s “Settings” page, choose “Branches” on the left, then click the “Default branch” box and type the new branch name. Since the default branch of new repositories had been named “main” since the fall, it was an easy choice to use that name for everything. That way everyone’s muscle memory would be the same.
As an aside, GitHub has done a good job of making “master” be a synonym for “main” on changed repositories. So if you have scripts (or muscle memory) that refer to “master,” you don’t have to go through the difficult process of changing everything simultaneously - just change the repository default branch, and clean up the rest of your code after.
The Fix
With so many repositories, it was hard to know where to start. The first thing was to get all of them locally so we could use Unix command-line tools on them. A little searching turned up a Python library to check out all repositories in a single organization:
pip3 install gitim
python3 -m gitim -o <organization>
This then enabled us to discover something of the scope of the problem:
Initially for us, this number was about 240.
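One way to get such a count is sketched below. It assumes every repository is checked out as a direct subdirectory of one working directory, and that each checkout's `.git/HEAD` file holds a line like `ref: refs/heads/master`. The function name `count_master` is made up for illustration.

```shell
#!/bin/sh
# Count checkouts whose local HEAD still points at "master".
# Assumption: each repository is a direct subdirectory of the directory
# given as $1 (defaults to the current directory).
count_master() {
    dir=${1:-.}
    count=0
    for head_file in "$dir"/*/.git/HEAD; do
        # Skip the unexpanded pattern when nothing matches.
        [ -f "$head_file" ] || continue
        if grep -q 'refs/heads/master$' "$head_file"; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}
```

Calling `count_master .` from the directory that holds all the checkouts prints a single number.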
Unfortunately for a lot of our projects, it was more complicated. We use CircleCI for our integration and deployment, and a common pattern in our configuration files was to say “you can’t deploy to production except from master.” The first step, before we did anything else, was to change this to be “you can’t deploy to production except from master OR main.”
Next step was to find these particular issues:
grep master */.circleci/config.yml
This listed about 40 projects which included “master” in such a file. Unfortunately, the only solution was to have a developer look at these and fix them. In most cases, we were able to allow deploys from “master or main,” again, so that we wouldn’t have to carefully coordinate all the changes simultaneously.
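To hand that list to volunteers, it can help to reduce the grep output to bare repository names. The following is a small sketch under the same layout assumption (each repository is a direct subdirectory of one directory); `circleci_master_repos` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# Print the name of each checkout whose CircleCI config mentions "master".
# Assumption: repositories are direct subdirectories of $1 (default ".").
circleci_master_repos() {
    dir=${1:-.}
    for cfg in "$dir"/*/.circleci/config.yml; do
        # Skip the unexpanded pattern when no config files exist.
        [ -f "$cfg" ] || continue
        if grep -q master "$cfg"; then
            # Drop the config path suffix, then any leading directories.
            repo=${cfg%/.circleci/config.yml}
            echo "${repo##*/}"
        fi
    done
}
```

The output is one repository name per line, which is easy to paste into a channel as a checklist. A match only means the word “master” appears somewhere in the file, so a developer still has to read each one.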
We made a special Slack channel for volunteers to come together and crowd-source this. We made, reviewed, and deployed all these changes, and, in a couple of days, we were able to get the hardest of it done. As they were completed, I went into the repositories and renamed the default branches.
Finally, this left about 200 repositories that didn’t need any changes except to have their default branch renamed on GitHub. Unfortunately, you need to be a project owner to do this - and there are very few people at OneSignal who own every project.
So, at our quarterly Hackathon on June 2, I resolved to wrap this up. The workflow was as follows.
First, make a list of the configuration pages of all the projects you need to fix:
ORG=OneSignal
grep master */.git/HEAD | perl -pe "s/^(.*?)\/\.git.*$/https:\/\/github.com\/$ORG\/\$1\/settings\/branches/" > to_fix
Then, edit “to_fix”. At the first line, hit <enter> to make a gap above the list. Copy-and-paste each link into your browser, and change the default branch to “main.” As you complete each project, cut-and-paste its link to above the gap, so you know where you are in the process.
Once that’s complete, execute the following shell script (I called it “remain”):
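#!/bin/sh
# Usage: remain [repository-directory]
# Switches a local checkout from "master" to "main" after the GitHub rename.
if [ $# -ne 0 ]
then
    # Stop here if the directory does not exist, rather than renaming
    # branches in whatever repository we happen to be in.
    cd "$1" || exit 1
fi
# Rename the local branch, then point it at the renamed remote branch.
git branch -m master main
git fetch origin
git branch -u origin/main main
# Let origin/HEAD follow the remote's new default branch.
git remote set-head origin -a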
So, for example:
remain OneSignal
Execute it from the directory that all your repositories reside in. This renames your local “master” branch to “main” and points it at the renamed branch on GitHub. If everything goes well, congratulations, that one is done! Move on to the next one. If it emits errors, you’ll need to figure out what the issue was and fix it. Across 200 repositories, maybe five times I had failed to actually rename the main branch on GitHub before I ran it. And once, I renamed the main branch to “remain.” Oops!
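If you would rather not rely on the absence of error messages, a quick check along these lines may help. It is only a sketch: `check_converted` is a hypothetical helper, and it assumes the remote is named `origin`, as in the script above.

```shell
#!/bin/sh
# Report whether a checkout looks converted: the local HEAD should be on
# "main", and origin/HEAD should point at origin/main.
# $1 = path to the repository checkout
check_converted() {
    local_head=$(git -C "$1" symbolic-ref HEAD 2>/dev/null)
    origin_head=$(git -C "$1" symbolic-ref refs/remotes/origin/HEAD 2>/dev/null)
    if [ "$local_head" = "refs/heads/main" ] &&
       [ "$origin_head" = "refs/remotes/origin/main" ]; then
        echo "ok: $1"
    else
        echo "check: $1 (HEAD=$local_head origin/HEAD=$origin_head)"
        return 1
    fi
}
```

It does not confirm that the branch on GitHub was renamed; it only reflects the state of the local checkout after the script has run.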
As you move along, you can check your progress with this script (which I called “remaining”):
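#!/usr/bin/perl -w
# Count checkouts by the branch their local HEAD points at.
# Run from the directory that contains all the repositories.
$master = `grep master */.git/HEAD | wc -l`;
chomp($master);
$main = `grep main */.git/HEAD | wc -l`;
chomp($main);
# "Not main" also counts the master checkouts, so subtract those to get
# the repositories that use some other branch name.
$other = `grep -v main */.git/HEAD | wc -l`;
chomp($other);
$other -= $master;
$total = $master + $main + $other;
# Repositories on "main" or another name count as done; only "master" remains.
printf("%i / $total = %0.2f%% complete; %i left to convert\n", ($main + $other), (($main + $other) / $total) * 100, $master);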
Issues
- Often you’ll see a brief message from GitHub that “main already exists” as you make the change. This seems to be a simple race condition, and is benign.
- Sometimes you’ll get the error “Could not rename branch ‘master’ at this time: delete the branch protection rule for ‘main’ and try again.” This is because you have a branch protection rule for master, which GitHub has for some reason already duplicated to main. The tool wants to rename the “master” rule to “main,” but can’t, because such a rule already exists. If you simply delete the “main” rule, and then rename the default branch, GitHub will rename the “master” rule to be a “main” rule for you, and you’ll be done.
- This is idiosyncratic to us, but on the very last repository we changed, it broke deployments. That was when we realized we had some Ansible rules that also explicitly referenced “master” in a way that wasn’t compatible with GitHub’s magical synonyms. So that might be another place to look for possible problems before you roll things out.
While I wish there had been an obvious command-line way to iterate over these repositories, the change seemed to depend on actually using the GitHub web interface. It might’ve been possible to write a command-line client that would “hit the button” automatically, but I was a little worried about that not working perfectly, and getting repositories into an inconsistent state. And, even for this large group of projects, it was only a few hours of tedious work, improved by good music in the background.
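For anyone attempting this now, GitHub’s REST API documents a branch rename endpoint (`POST /repos/{owner}/{repo}/branches/{branch}/rename`), which may be the scripted route I was missing. The following is an untested sketch, not what we ran: the helper names and the `GITHUB_TOKEN` variable are assumptions, and the token would need admin rights on each repository. I would try it on a throwaway repository first.

```shell
#!/bin/sh
# Build the REST API URL for renaming a branch (hypothetical helper).
# $1 = organization, $2 = repository, $3 = current branch name
rename_url() {
    echo "https://api.github.com/repos/$1/$2/branches/$3/rename"
}

# Ask GitHub to rename a branch. Assumes a token with admin rights on the
# repository is available in the GITHUB_TOKEN environment variable.
# $1 = organization, $2 = repository, $3 = old name, $4 = new name
rename_branch() {
    curl --fail --silent --show-error -X POST \
        -H "Accept: application/vnd.github+json" \
        -H "Authorization: Bearer $GITHUB_TOKEN" \
        -d "{\"new_name\": \"$4\"}" \
        "$(rename_url "$1" "$2" "$3")"
}
```

A call would look like `rename_branch OneSignal some-repo master main`, where `some-repo` is a placeholder. Because `--fail` makes curl exit non-zero on an HTTP error, a loop over repositories could stop at the first failure instead of carrying on into an inconsistent state.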
Summary
I’m very glad we were finally able to exorcise this demon from our codebase. It was a little tedious in places, but it was certainly well worth the effort, and I’m very glad our industry has grown to the point of realizing the importance of this sort of work. Also very proud of all the folks on all the engineering groups that rolled up their sleeves and got this done!