The decision between a monorepo and multiple small repositories is complex and involves a lot of variables. Ultimately it boils down to the development practices your team follows and what works best for the services you are trying to build. At Bolt, we had multiple small repositories, and after a lot of deliberation decided to move towards a monorepo, starting by merging our two biggest repositories into one. These repositories, named storm and hail, contained our frontend and backend code respectively and accounted for more than 75% of the code changes at Bolt in any given week.
This article goes over how we combined the two repositories while maintaining our git history and keeping the downtime, during which engineers could not commit code or send out reviews, to a minimum.
Identifying the MVP
When we decided to merge these repositories, engineers were pretty excited. There would be no more copying GraphQL schemas from one repo to another, we could share quite a few things, and there was an immediate laundry list of problems this would solve.
Being a startup, we couldn’t afford the luxury of a dedicated team working on this merge. If we were going to make this merge successful, we had to identify the quickest, cheapest way of getting there. So we started off by defining what would qualify as a successful merge, and by understanding whether this was even a viable project.
We came up with the following criteria:
- Maintain git commit history across both repositories
- Merge both repositories with storm code under a storm directory and hail code under a hail directory.
- CI pipeline works from the new repo
- Deploys work from the new repo
We set aside follow-up problems, such as merging schemas, for later. Once we had a “monorepo”, these would be simple enough to accomplish.
Prepping for the merge
Each of the criteria we came up with needed some work, and we spent ~1 month making sure we could meet all of them. The first thing we did was schedule the merge day. We put aside 2 days: day 1 for the merge, and day 2 to re-deploy the current commit and make sure everything worked as expected.
Building the CI pipeline
We use CircleCI for our CI at Bolt, and the first thing we had to do was design a CI workflow that worked for the two repositories put together. A very simple version could run two CircleCI workflows, one for the frontend (storm) and one for the backend (hail), but there were a lot of optimizations to take into consideration.
We didn’t want to run backend tests if only frontend changes were in the PR. We also wanted to build binaries / assets only if both frontend and backend tests passed on master.
We designed the monorepo workflow and iterated until we had a working version in a test repo containing a snapshot of both codebases. We backported as many of the workflow changes as we could to the existing repositories and hid all monorepo-specific logic behind an environment variable.
We also extracted all non-Bolt-specific CircleCI logic into an orb. We started this before we moved to a monorepo, and it helped with our integration testing. Read more about that here. An example of the CircleCI build pipeline we use for our monorepo, built with Swissknife, can be seen in this repo. The idea was to run only the parts of the pipeline affected by a change in order to keep runs short: if only server-side code was modified, only server code was tested, and client code was skipped.
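A minimal sketch of that path-based selection, written as a POSIX shell function. This is not Bolt's actual CircleCI configuration or the Swissknife orb: the function name `suites_for_changes` and the suite names `frontend` and `backend` are hypothetical, and treating any path outside `storm/` and `hail/` as affecting both suites is an assumption.

```shell
# Hypothetical helper: reads changed file paths on stdin (one per line)
# and prints the names of the test suites that should run.
suites_for_changes() {
  run_frontend=no
  run_backend=no
  while IFS= read -r path; do
    case "$path" in
      storm/*) run_frontend=yes ;;
      hail/*)  run_backend=yes ;;
      # Assumption: anything else (for example shared CI config) affects both.
      ?*)      run_frontend=yes; run_backend=yes ;;
    esac
  done
  if [ "$run_frontend" = yes ]; then echo frontend; fi
  if [ "$run_backend" = yes ]; then echo backend; fi
  return 0
}
```

In CI, the list of changed files could come from something like `git diff --name-only origin/master...HEAD | suites_for_changes`, with each printed suite name triggering the matching job.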
A few times before the actual merge day, we made a test monorepo and copied the source code into the structure we planned for the real repo. We skipped the git history, as the main goal here was to test the build pipelines.
Another thing we had to do was write a custom git hook management scheme for the monorepo. It had to account for changes that touched only one of the two codebases as well as changes that touched both. We checked this in ahead of time too, hidden behind an environment variable.
We manage all our deploys from Jenkins, and accounting for a new repository wasn’t too hard. In our case, we stored our deploy scripts in their own repo. We needed our current deploys to keep working while also making sure we were ready for the monorepo merge day. Similar to what we did in our two source code repositories, we flagged monorepo-related deploy changes behind an environment variable.
A few times before D-Day we tried deploying from a fake monorepo containing the source code to make sure the deploys worked correctly.
Once we were done with this we had ensured the build pipelines worked and the deploy scripts were also in place.
D-Day: Merging two git repositories
The following are the steps we followed:
- We first stopped all commits to our two main repositories.
- We then made a monorepo branch in each of the repositories, storm and hail, where we moved all contents of the storm repo into a storm folder and all contents of the hail repo into a hail folder.
- We made an empty repo which would be our monorepo
- We ran the following commands on our monorepo which merged git histories and brought both repository contents into our main repo.
# Add hail remote
git remote add hail email@example.com:BoltApp/hail.git
# Add storm remote
git remote add storm firstname.lastname@example.org:BoltApp/storm.git
# Fetch both remotes so their monorepo branches are available to merge
git fetch hail
git fetch storm
git merge --allow-unrelated-histories hail/monorepo
git merge --allow-unrelated-histories storm/monorepo
- At this point, our directory structure looked as follows and we had the contents of hail and storm within one repo and our git histories intact!
/
- storm/...
- hail/...
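The second step above, moving each repository's contents into its own folder on a monorepo branch, could be sketched as follows. This is an illustration rather than the script Bolt ran: the function name `move_into_subdir` and the commit message are hypothetical, and it assumes ordinary file names and that the repository has no top-level entry already named after the target folder.

```shell
# Hypothetical helper: run inside a clone of one of the old repositories.
# Creates a "monorepo" branch and moves every tracked top-level entry
# (including dotfiles) into the given subdirectory with git mv.
move_into_subdir() {
  subdir=$1
  git checkout -b monorepo || return 1
  mkdir -p "$subdir" || return 1
  # git ls-tree reads the committed tree, so the new empty folder is not listed.
  git ls-tree --name-only HEAD |
    ( while IFS= read -r entry; do
        git mv "$entry" "$subdir/" || exit 1
      done ) || return 1
  git commit -q -m "Move all files under $subdir/"
}
```

For example, `move_into_subdir storm` would be run in the storm clone and `move_into_subdir hail` in the hail clone. Because the move is an ordinary commit, the history of a file from before the move is still reachable with `git log --follow`.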
While we could have simply called it quits here, we were at a point where we could alter history! Let’s face it, there’s a point in time where we all have wanted to alter history and haven’t had the opportunity, but this time we did!
One of the things we had wanted to do was clean up large files which had somehow made their way into the git repo in the earlier days. This is where BFG comes in. We used the BFG repo cleaner and removed large files from our repo history. Turns out we had checked in binaries way back in the day when Bolt was still a few people.
We ran BFG on the repo and removed 93 MB worth of large files (all binaries). Changing history is usually a bad idea, but this was one of those times when all engineers were starting from a new repo, so if there was ever a time to alter history, this was it.
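Before rewriting history it helps to know which large files are hiding in it. The sketch below is one way to list them with plain git. It is not part of the original write-up: the function name `large_blobs` and the byte threshold argument are hypothetical.

```shell
# Hypothetical helper: run inside a git repository. Prints every blob reachable
# from any ref whose size is at least the given number of bytes, largest first,
# as "<size> blob <path>". Deleted files still show up if they are in history.
large_blobs() {
  min_bytes=$1
  git rev-list --objects --all |
    git cat-file --batch-check='%(objectsize) %(objecttype) %(rest)' |
    awk -v min="$min_bytes" '$2 == "blob" && $1 >= min' |
    sort -rn
}
```

Once the offenders are known, BFG can remove them with an option such as `--strip-blobs-bigger-than 1M`, followed by the `git reflog expire` and `git gc` cleanup that BFG's documentation recommends. The 1M threshold is only an example; the article does not say which threshold was used.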
Once this was done, we set MONOREPO_MODE=true in our build systems and deploy tooling and went ahead and deployed to our staging environment. After running regression tests to make sure everything was in shape, we moved on to the sandbox and production environments.
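To illustrate how a flag like MONOREPO_MODE can let one set of scripts work in both the old and the new layout, here is a minimal sketch. Only the variable name MONOREPO_MODE comes from the article. The function name `service_source_dir` and the `CHECKOUT_ROOT` variable are hypothetical, and Bolt's real build and deploy scripts may use the flag quite differently.

```shell
# Hypothetical helper for build or deploy scripts: prints the directory that
# holds a service's code, depending on whether monorepo mode is switched on.
service_source_dir() {
  service=$1                 # "storm" or "hail"
  root=${CHECKOUT_ROOT:-.}   # assumption: where the repository is checked out
  if [ "${MONOREPO_MODE:-false}" = "true" ]; then
    # Monorepo layout: one checkout with storm/ and hail/ subdirectories.
    printf '%s/%s\n' "$root" "$service"
  else
    # Old layout: each service is the root of its own checkout.
    printf '%s\n' "$root"
  fi
}
```

With the flag unset or false, existing pipelines keep behaving as before, so the same scripts can be checked in ahead of time and only change behavior once the flag is flipped on merge day.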
Finally after testing all parts of our workflows, a grueling 4 hours later, the monorepo was open for business. We marked our older repos as archived and let engineers know they could start opening pull requests.
As with all big launches, we had a rollback plan in place for the monorepo. Because code storage is asynchronous, rolling back was not as hard as it would be for a live service.
If by EOD on D-Day we weren’t able to deploy to all environments, we would simply have opened our two repos back up and let engineers continue there while we went back to the drawing board. The harder case would have been finding something wrong ~3 days after the merge. The plan there was to manually copy the commits over to the old repos and resume work in them.
Going back would only get harder as more time passed, so we decided in advance that any problem found on day 4 or later would mean we would stop and fix forward. Lucky for us the rollback plan was never activated, and things went smoothly.
Easy Developer transition
At the time of this transition, Bolt had 30+ developers, so we had to keep the interruption minimal. We spent some time thinking about how we could make the transition smooth.
The first step was to update all our documentation (onboarding etc) to account for setup in the monorepo world. Granted we only had one engineer onboarding that week, but it was one thing that would help.
The bigger, more important thing we did was write a script that migrated you from the old repos to the new repo. Our developer practice was to test and develop locally, which involved some local .env files with credentials and the like. So we wrote a script that asked where you had stored your old repos locally and copied over the git-ignored files needed for local development.
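A minimal sketch of what the copying part of such a migration script could look like. It is not the script Bolt used: the function name `migrate_local_files` is hypothetical, and the assumption that the files worth carrying over are exactly those named `.env*` is a simplification of "the git-ignored files needed for local development".

```shell
# Hypothetical helper: copies local .env* files from an old checkout into the
# matching service folder of the new monorepo, preserving relative paths.
migrate_local_files() {
  old_repo=$1   # path to the old checkout, e.g. the answer to a prompt
  new_dir=$2    # matching folder inside the monorepo checkout
  [ -d "$old_repo" ] || { echo "not a directory: $old_repo" >&2; return 1; }
  # Assumption: skip git internals and node_modules, keep everything named .env*
  ( cd "$old_repo" &&
    find . -type f -name '.env*' ! -path './.git/*' ! -path '*/node_modules/*' ) |
    while IFS= read -r rel; do
      mkdir -p "$new_dir/$(dirname "$rel")"
      cp -p "$old_repo/$rel" "$new_dir/$rel"
    done
}
```

A wrapper could prompt for the old location with `read -r old_storm` and then call `migrate_local_files "$old_storm" "$HOME/workspace/monorepo/storm"`, where the `monorepo` directory name is a placeholder.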
Finally, in order to make scripting easier in the future, all our guides for setting up the monorepo were very prescriptive. We asked all developers to clone the repo to ~/workspace. We haven’t yet reached a point where this setup has paid off, but if we ever want to run a script on our developers’ laptops, there are some assumptions we will be able to make.
We managed to move from two git repos to one, and this laid the foundation for a lot of development ease. Soon after we made the migration, developers spent time fixing and optimizing a lot of the pain points we had previously. Over time we saw our other repos fold into our monorepo. Stay tuned for our next blog post to see how we optimized our build pipelines as our monorepo grew even bigger and handled more complicated workflows.
At Bolt, checkout is the engine that drives ecommerce. While we are hyper-focused on perfecting checkout, this is just the beginning. Our vision is to make the best ecommerce technologies accessible to anyone who wants to sell online. If this resonates with you, come join us and help build the next generation of ecommerce products. Apply at https://bolt.com/jobs