11/24/2019
Almost every engineer who joins Google will want to talk about Google's monorepo. It is a unique platform on which Google builds its technology. Most of the technical details are already covered in Software Engineering at Google and in other technical blogs written by Googlers. In this post, I will share my own view of the monorepo.
I came from a startup where we kept our front-end and back-end code in the same repo. That was my first experience with a monorepo. However, Google's definition of a monorepo is much larger than that. The monorepo contains almost all the products Google works on: Search, YouTube, Maps, Gmail, you name it. All the code is visible to engineers. More often than not, if we need to look up how an API is used, we can just search the monorepo. This transparency is valuable, as it encourages engineers to learn about other areas and share knowledge. Google trusts engineers not to share in-progress products with the public.
The technical side of the monorepo also amazes me. Google's policy forbids storing code on engineers' local machines. It is also practically impossible to download all the source code and keep it in sync with the thousands of commits made every day. That's why Google's version control system (VCS) is centralized. We work on code on the server. Google has a plethora of tools to make the process feel more natural. For example, we can create a virtual mapping on our local system so that we can navigate the source directories just as if they resided locally. We can create a workspace that contains the current snapshot of the source and conveniently update and rebase it to the newest version with one command. We have a code search tool that searches the whole monorepo for a keyword in the blink of an eye. We have a web-based code editor, and all compilation and testing is done online. If we forget to bring our laptop, we can get a loaner and be up to speed very quickly, because there is no setup to do on the machine.
The infrastructure behind this level of engineering productivity is out of reach for smaller startups and would be hard to replicate in a short amount of time. However, some of the processes could be learned and adopted.
Google has a strong culture of code review. Every code change going into the repo needs to be reviewed by at least one reviewer other than yourself. Each directory in the codebase has an owner file containing the IDs of the Google engineers who are appointed to review code changes in that directory. Before a change can be checked in, it needs to be approved by the owners of each directory it touches. The owner files are, as you might guess, checked in to the monorepo. This prevents you from approving your own code or getting a friend to approve it. Ownership is strictly maintained: a modification to an owner file must itself be reviewed by the current owners or by owners of a parent directory. Engineers are encouraged to break changes into small, digestible chunks, which makes the reviewer's job more manageable. When faced with a change spanning hundreds of files, very few people will carefully examine each one.
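To make the ownership rule concrete, here is a minimal sketch of how such a check could work. It is not Google's actual tooling. The `OWNERS` dictionary, the engineer IDs, and the function names are all made up for illustration, and I am assuming that owners of a parent directory can approve changes in its subdirectories.

```python
from pathlib import PurePosixPath

# Hypothetical in-memory stand-in for owner files: directory -> owner IDs.
OWNERS = {
    "search": {"alice"},
    "search/ranking": {"bob"},
    "mail": {"carol"},
}


def owners_for(path, owners=OWNERS):
    """Collect owners of the file's directory and of every parent directory."""
    found = set()
    for parent in PurePosixPath(path).parents:
        found |= owners.get(str(parent), set())
    return found


def can_submit(author, changed_files, approvers, owners=OWNERS):
    """True if every changed file is approved by an owner other than the author."""
    # Self-approval never counts, even if the author is an owner.
    reviewers = set(approvers) - {author}
    if not reviewers:
        return False
    return all(owners_for(f, owners) & reviewers for f in changed_files)
```

With this toy data, `can_submit("dave", ["search/ranking/score.py"], {"bob"})` passes, while the same change authored and approved only by `bob` is rejected. A change touching both `search/` and `mail/` needs an approval covering each directory.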
Code style checking is also built into the process. At Google, if we modify files written in one of the major programming languages, we need approval from engineers who are very skilled in that language. The process is called readability review. It's understood that engineers might not be familiar with a particular language and probably followed some code samples when they implemented the change. Readability review makes sure that the change meets the style guidelines of the language. Readability is not about formatting, tabs vs. spaces, etc. All formatting issues are handled by auto formatters before the change is sent out for review.
Google values continuous integration. Ideally, a code change should not break existing code (unfortunately, this does happen sometimes). In practice, this means running pre-submit tests. Git supports something similar through pre-commit hooks. How many tests to run before check-in is an open question. I was surprised to see that it sometimes takes an hour to run all the tests before a change is merged into the codebase. If a conflicting change is checked in while your change is still being tested, the process halts and you need to rebase to pull in the new change. The added friction of submitting code might be hard for newcomers to adjust to, but it also reduces the chances that a check-in needs to be reverted.
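The test-then-submit flow, including the halt when a conflicting change lands first, can be sketched roughly as follows. This is a toy model with made-up names (`Repo`, `Change`, `presubmit`), not how Google's system is implemented. I am also assuming that "conflicting" simply means another change touched one of the same files after the point we last synced to.

```python
from dataclasses import dataclass, field


@dataclass
class Repo:
    """Toy central repo: one set of touched files per submitted change."""
    history: list = field(default_factory=list)

    @property
    def head(self):
        return len(self.history)


@dataclass
class Change:
    files: set                                  # files this change modifies
    base: int                                   # repo head we last synced to
    tests: list = field(default_factory=list)   # callables returning True on pass


def presubmit(repo, change):
    """Run the tests, then submit only if nothing conflicting landed since base."""
    if not all(test() for test in change.tests):
        return "tests failed"
    # Anything after our base may have been checked in while tests were running.
    for landed in repo.history[change.base:]:
        if landed & change.files:
            return "rebase needed"
    repo.history.append(set(change.files))
    return "submitted"
```

In this model, rebasing amounts to setting `change.base = repo.head` and running `presubmit` again, which is why a long test run on a busy directory can mean several attempts before a change finally goes in.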
It is true that all these processes and infrastructure can make it slower to land a change. But as Google scales to tens of thousands of engineers, they become far more valuable for keeping the health of the codebase at a high standard.