Git is now the default version control system for most modern software projects. How you see it depends on what you used before: if you started out on worse version control systems, Git looks very different. For the new generation of programmers, though, Git may be the first, and therefore the ‘worst’, source control system they have known. For them, and for the many more to come, it is worth starting with a little background on Git.
Most software developers who have worked for a decade or more will be familiar with one or another of the centralised version control systems that were prevalent at the time. I started out building a Windows client for CVS as part of an industry project. After that I spent quite a few years using the monster called ClearCase. Then I moved to a more modern project based on Java and Agile practices, where I was introduced to Subversion (SVN), and things started looking good. Everything looks good after ClearCase.
SVN looked good, and it seemed there was nothing better to wish for. But then came Git, and it suddenly felt empowering, which is a strange thing to say about a version control system. Git was built by a developer for developers. Very few people would write a poem about SVN, ClearCase or the like, but Git is different.
Till then I dreaded to branch fearing the cost,
Or more the dread to think of the merge,
But then came Git,
And lo! I am in branching heaven!
There was no longer any need to depend heavily on the source control operations team. Since Git is a distributed version control system, everyone has a copy of the repository on their own laptop and can work with it offline: commit incrementally, branch out and so on, as with any SCM (source code management) system. Linus Torvalds designed and implemented Git as an SCM replacement for Linux kernel development after the free-of-charge licence for BitKeeper, the tool the kernel developers had been using, was withdrawn. The key phrase here is ‘distributed version control system’. Linus Torvalds' presentation on why he developed Git, delivered in his inimitable style, is a must-watch before you do anything with Git.
So what is lost compared with a centralised version control system? Nothing much. You can always designate one repository on a server as the central repository, ‘origin’, to which everyone else raises a ‘merge request’ or ‘pushes’ their local commits.
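Here is a minimal sketch of that arrangement, runnable on a single machine. The bare repository in a temporary directory only stands in for a real server, and the names central.git, laptop and notes.txt are made up for illustration.

```shell
# A bare repository stands in for the server-side 'central' repository.
work="$(mktemp -d)"
git init -q --bare "$work/central.git"

# A developer's local repository, as it would live on a laptop.
git init -q "$work/laptop"
cd "$work/laptop"
git symbolic-ref HEAD refs/heads/main    # name the first branch 'main' whatever the local default is
git config user.name "Example Dev"       # placeholder identity
git config user.email "dev@example.com"

# Designate the bare repository as 'origin'.
git remote add origin "$work/central.git"

# Commit locally; no network or server is needed for this step.
echo "notes" > notes.txt
git add notes.txt
git commit -q -m "local commit made offline"

# Publish the local commits to the central repository.
git push -q origin main

local_head="$(git rev-parse HEAD)"
pushed="$(git --git-dir="$work/central.git" rev-parse refs/heads/main)"
echo "local:  $local_head"
echo "origin: $pushed"
```

After the push, both hashes are the same: the ‘central’ repository is just another copy that everyone has agreed to treat as the reference.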
Branching and Merging in Git, perfectly implemented:
Git makes it very easy to branch and merge. All SCMs support branching and merging, but the implementations were not as good, to say the least. In SVN a branch is a copy of the folder on the server and all its contents. In Git a branch is nothing but a pointer to a commit.
This ability to easily branch and merge means that there are quite a lot of ways to structure the flow of code from development to release. We will come to this part in a short while. Before that, it is worth dwelling a little on some characteristics of Git that some may not realise at first.
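To see the ‘pointer to a commit’ idea for yourself, here is a small sketch in a throwaway repository. The identity and file names are placeholders, and it assumes Git's default on-disk ref storage, where a freshly created branch is a small file under .git/refs/heads.

```shell
# Throwaway repository so the sketch is safe to run anywhere.
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch 'main' whatever the local default is
git config user.name "Example Dev"       # placeholder identity
git config user.email "dev@example.com"

echo "hello" > README.txt
git add README.txt
git commit -q -m "first commit"

# Creating a branch only records a new ref; no files are copied.
git branch feature

main_commit="$(git rev-parse main)"
feature_commit="$(git rev-parse feature)"
echo "main    -> $main_commit"
echo "feature -> $feature_commit"

# With the default ref storage, the new branch is a file holding the commit hash.
cat .git/refs/heads/feature
```

Both branch names resolve to the same commit, and the branch itself is one line of text. That is why branching in Git costs next to nothing.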
Since Git is a DVCS (distributed version control system), if someone commits a large file or set of files, even on a different branch, and pushes it to the ‘origin’ repo, it will be downloaded to everyone’s machine the next time they ‘pull’ or ‘fetch’ from ‘origin’, that is, when they sync their repository with the designated ‘central’ repository. This is not a typical use case, but sometimes binaries or generated files get ‘added’ and ‘pushed’ inadvertently. Even if the developer realises it, immediately deletes the file, commits and pushes again, the Git history still has the file in the earlier commit. History is usually kept immutable, unless of course you do a ‘hard reset’ followed by a forced push. That is not recommended, as it may cause inconsistency if somebody has already ‘pulled’ the history containing the commit into his or her local repository.
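A small sketch of why deleting the file does not help, again in a throwaway repository. The file name build.bin is hypothetical and the ‘binary’ is just a line of text.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git config user.name "Example Dev"       # placeholder identity
git config user.email "dev@example.com"

# A generated file gets committed by mistake.
printf 'pretend this is a large binary\n' > build.bin
git add build.bin
git commit -q -m "add build output by mistake"

# The developer notices and removes it in a follow-up commit.
git rm -q build.bin
git commit -q -m "remove build output"

# The file is gone from the working tree...
[ -e build.bin ] || echo "build.bin is gone from the working tree"

# ...but the first commit, and the blob it points to, are still in the history.
git log --oneline -- build.bin
in_history="$(git rev-list --objects --all | grep -c 'build.bin')"
echo "objects in history named build.bin: $in_history"
```

Every clone of this repository still receives the blob, because it is reachable from the first commit.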
Because every clone carries the full history, if you are on a slow network, which strangely most enterprise IT systems still are, cloning a big repository takes some time. You can use tricks such as a shallow clone, but Git was not meant for huge (>1 GB) repositories. Of course, if you store only source code, you will not have this problem. Note that you could use something like Git Large File Storage, Git LFS (https://www.atlassian.com/git/tutorials/git-lfs), or a similar tool on your server to take care of it.
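The shallow clone trick can be tried locally with the sketch below. The repository with three commits is only a stand-in for a large remote one, and a file:// URL is used because Git ignores --depth when cloning from a plain local path.

```shell
work="$(mktemp -d)"
git init -q "$work/big"
cd "$work/big"
git config user.name "Example Dev"       # placeholder identity
git config user.email "dev@example.com"

# Build up a little history: three commits.
for i in 1 2 3; do
  echo "change $i" > file.txt
  git add file.txt
  git commit -q -m "commit $i"
done

# Fetch only the most recent commit instead of the whole history.
git clone -q --depth 1 "file://$work/big" "$work/shallow"

full_count="$(git rev-list --count HEAD)"
shallow_count="$(git -C "$work/shallow" rev-list --count HEAD)"
echo "commits in the full repository: $full_count"
echo "commits in the shallow clone:   $shallow_count"
```

The shallow clone has the latest files but only the tip of the history, which is usually enough for a build machine and rarely enough for day-to-day development.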
The other problem, if you can really call it that, is that Git is slightly complex to use, partly because it is distributed in nature and partly because it was written by a developer (see the video if you have not). In the years since, many people have contributed, and it is now as usable as any other tool. It may look and feel similar to SVN, but you will make mistakes if you do not take a day or two to learn the basics. I am no expert, but I have used it for some years now, reverted merge commits across parent branches (a bit chilly), and prayed while doing hard resets of the history more than once.
Tip: I have survived all this while with just a few Git commands:
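For the curious, this is roughly what reverting a merge commit looks like. It is a sketch in a throwaway repository with made-up branch and file names, not a recipe to paste into a real project.

```shell
repo="$(mktemp -d)"
cd "$repo"
git init -q
git symbolic-ref HEAD refs/heads/main    # name the first branch 'main' whatever the local default is
git config user.name "Example Dev"       # placeholder identity
git config user.email "dev@example.com"

echo "base" > app.txt
git add app.txt
git commit -q -m "base commit"

# Work happens on a feature branch.
git checkout -q -b feature
echo "feature work" > feature.txt
git add feature.txt
git commit -q -m "add feature"

# The feature is merged into main with an explicit merge commit.
git checkout -q main
git merge -q --no-ff -m "merge feature" feature

# Undo the merge with a new commit. '-m 1' tells Git to treat the first
# parent (main before the merge) as the mainline to keep.
git revert --no-edit -m 1 HEAD

git log --oneline
```

After the revert, feature.txt is gone from main again, but the merge is still recorded in the history. That is the chilly part: merging the same branch again later will not bring the changes back unless the revert itself is reverted first.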
git clone, git checkout (-b for new)
There are a ton of other commands out there. Be wary: understand a command before you use it, or better, do not use it at all if you do not fully understand the implications. One example is git rebase: https://medium.com/@fredrikmorken/why-you-should-stop-using-git-rebase-5552bee4fed1
Mono Repo or Multi Repo?
There are two ways to host a project: as a single repository (a mono repository) or as a set of multiple repositories. Before we go any further, let us ask this question first.
What constitutes a micro-service project?
If we define a project as a set of micro-services that collaborate with each other using some sort of typed interfaces (like protobuf), then there is a clear advantage in sharing those interfaces unambiguously and from a single source. For this there should be only one copy of the interfaces, and I feel the best way to achieve that is a single repository, that is, a mono repository.
Is there a way to do single-source versioning of interfaces across multiple repositories? Not in an elegant, non-manual way.
The very nature of multiple repositories means that you need to designate one repository as the source of truth for interfaces, and then either have some inelegant working agreement to copy files manually between that repository and your own, or use submodules. Both are prone to mistakes. Referencing an external repository via a submodule or subtree seems like a good fit, but it is quite a leaky abstraction and needs to be used very carefully.
The better way to share interfaces is by a shared branch, not by a shared repository.
The caveat here is that if your project is composed of tens of thousands of components and gigabytes of source files, all sharing interfaces, Git will not handle that well, and you will need to break it up into separate repositories.