Have you had your “Oh git” moment yet?
Have you ever had that “Oh git” moment? You know, that time in a project when a deliverable is due to a client and something unexpected happens to the output after a weekend data refresh? It is usually followed by a sigh of relief, because git is managing the project and saves the day with version control features that seem to work like magic. Hopefully you have had an “Oh git” moment and not its sinister opposite, but just in case, here’s a quick git overview. Git is a free, open source version control system that provides Source Code Management (SCM). SCM is often used synonymously with version control: it tracks changes made to source code while keeping a running history of those changes. The days of manually maintaining backup copies and appending version numbers to file names ad nauseam are a thing of the past. Because every change is tracked, it is easy to revert to a previous version and, even better, to test changes to the source code before adding them to the main code base. Git’s awesome branching model is what makes this possible. So, where do we start?
For command line gurus, git is very easy to install on Linux, while Git for Windows makes installation on Windows easy and provides a git-supported shell (command line) called Git Bash. Both are easy to follow and set up, and there is a wide range of tutorials and “how to’s,” some of which we’ve provided below. For those who gravitate toward a Graphical User Interface (GUI) to do the heavy lifting, GitHub Desktop answers the call, with official versions available for Windows and macOS. There is also plenty of reference material explaining GitHub Desktop’s functionality.
What makes git easy to understand is its branching design. Just as a tree has branches that grow away from its trunk, a project has branches that grow from its base. The difference in git is that a branch typically merges back into its base once some change has been made on it. The terminology commonly used to describe these actions in git is: “create a branch,” “commit a change,” and “merge a branch.”
Let’s take a step back and look at the concepts of git which allow it to “save our bacon” when things go wrong with our edits.
Every time you change a file tracked by git, it’s up to you to log that change into git’s database of changes. You can also log changes to more than one file at a time when those changes are closely related. Git knows that changes across several files “go together” because you tell it so, by grouping them into a git structure called a commit.
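To make the idea of a commit more concrete, here is a toy model in Python. It is a sketch, not how git actually stores data: the `Commit` and `Repo` classes, their methods, and the shortened hash-based IDs are all hypothetical. The `add` and `commit` methods loosely mirror `git add` (select a change for the next commit) and `git commit` (record all selected changes together), so two related file edits land in one commit, while an edit made after staging waits for a later commit.

```python
import hashlib


class Commit:
    """Toy commit: a snapshot of every tracked file, a message, and a parent link."""

    def __init__(self, snapshot, message, parent=None):
        self.snapshot = dict(snapshot)  # path -> file content at this point in time
        self.message = message
        self.parent = parent            # previous commit, or None for the first one
        # Hypothetical ID scheme: hash the parent's ID, the message, and the files.
        # (Real git also identifies commits by hashes, but its object format differs.)
        h = hashlib.sha1((parent.id if parent else "").encode())
        h.update(message.encode())
        for path in sorted(self.snapshot):
            h.update(path.encode() + b"\0" + self.snapshot[path].encode())
        self.id = h.hexdigest()[:7]


class Repo:
    """Toy repository with a working directory and a staging area."""

    def __init__(self):
        self.working = {}  # files as they currently sit on disk
        self.staged = {}   # changes selected for the next commit
        self.head = None   # most recent commit

    def add(self, path):
        # Loosely like `git add`: record the file's *current* content for the next commit.
        self.staged[path] = self.working[path]

    def commit(self, message):
        # Loosely like `git commit`: all staged changes are recorded together.
        base = self.head.snapshot if self.head else {}
        self.head = Commit({**base, **self.staged}, message, self.head)
        self.staged = {}
        return self.head


# Two closely related edits, grouped into a single commit.
repo = Repo()
repo.working["model.py"] = "rate = 0.05"
repo.working["report.md"] = "Rate: 5%"
repo.add("model.py")
repo.add("report.md")
repo.working["report.md"] = "Rate: five percent"  # edited after staging: not in this commit
first = repo.commit("Set interest rate to 5%")

print(first.snapshot)  # {'model.py': 'rate = 0.05', 'report.md': 'Rate: 5%'}
```

Each toy commit stores a full snapshot plus a link to its parent, which is what lets the history be walked backward later.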
Any work on a project that changes git-tracked files over time shows up in git’s change history as a series, or “chain,” of commits. This chain is called a branch. Branches have a name, or label, so we can tell them apart. There is a special initial branch, usually called “master” or “main,” that git creates for you when you tell it to start tracking changes in a directory.
New branches are easy to create, starting from any commit on another branch (e.g., the main trunk), to explore a variant of the files from that point in time: a feature change, a bug fix, a what-if experiment, and so on. When the changes on the variant branch pan out, they can be incorporated into the main trunk; otherwise the branch can be kept for later (“this might be a good idea for the future!”) or easily discarded. Branching becomes all the more crucial when other people, not just you, are changing the project files.
So branching sounds like a great idea, but what if you want to switch from a feature branch back to the main (trunk) branch? What happens to the current state of your project files? The nice thing about git is that when you switch branches (via the checkout command), it restores your files to the state in which they were last committed on the target branch. And it happens very fast! Since a branch label is just a pointer to the latest commit on that branch, you can also create a branch label on any commit in the “past”; when you check out that branch, your files revert to their content at that point in time, turning git into a time machine!
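The idea that a branch label is “just a pointer” can be illustrated with another toy Python model, again a hypothetical sketch rather than git’s real internals. Here the `branches` dictionary maps each label to the commit at its tip, committing moves only the current label, and `checkout` rewrites the working files from the target commit’s snapshot. Labeling an older commit and checking it out gives the “time machine” effect; in real git, the closest commands are `git branch <name> <commit>` followed by `git checkout <name>`.

```python
class Commit:
    """Toy commit: a full snapshot of the project's files plus a link to its parent."""

    def __init__(self, snapshot, parent=None):
        self.snapshot = dict(snapshot)
        self.parent = parent


class Repo:
    def __init__(self):
        self.branches = {}     # branch label -> commit it points at (the branch tip)
        self.current = "main"  # label of the checked-out branch
        self.working = {}      # files as they sit in the working directory

    def commit(self):
        tip = self.branches.get(self.current)
        new = Commit(self.working, tip)
        self.branches[self.current] = new  # only the label moves; old commits are untouched
        return new

    def branch(self, name, at=None):
        # A new branch is just a new label, on the current tip or on any past commit.
        self.branches[name] = at if at is not None else self.branches[self.current]

    def checkout(self, name):
        # Rewrite the working files to match the target branch's tip.
        # (This toy skips git's safety checks for uncommitted changes.)
        self.current = name
        self.working = dict(self.branches[name].snapshot)


repo = Repo()
repo.working = {"forecast.py": "v1"}
v1 = repo.commit()
repo.working = {"forecast.py": "v2"}
repo.commit()

repo.branch("before-refresh", at=v1)  # put a label on a commit in the "past"
repo.checkout("before-refresh")
print(repo.working)  # {'forecast.py': 'v1'}
repo.checkout("main")
print(repo.working)  # {'forecast.py': 'v2'}
```

Because a branch is only a label, creating or discarding one is cheap in this model: it adds or removes a dictionary entry and never copies or deletes any commits.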
Another way to recover the state of a file at some point in the past is to ask git to show you (via the show command) the file with such-and-such name as it was at such-and-such commit.
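As a sketch of what `git show <commit>:<path>` gives you, the toy function below walks back through a branch’s parent links until it finds the requested commit, then reads one file from that commit’s snapshot. It leaves the working files alone. The `Commit` class, the `show` function, and the commit IDs are hypothetical. Real git looks the commit up directly by its ID rather than walking a branch.

```python
class Commit:
    def __init__(self, cid, snapshot, parent=None):
        self.id = cid  # hypothetical short IDs; real git derives IDs from hashes
        self.snapshot = dict(snapshot)
        self.parent = parent


def show(tip, commit_id, path):
    """Toy version of `git show <commit>:<path>`: find the commit in the history
    behind `tip` and return the file's content as it was at that commit."""
    commit = tip
    while commit is not None:
        if commit.id == commit_id:
            return commit.snapshot[path]  # KeyError if the file didn't exist then
        commit = commit.parent
    raise KeyError(f"no commit {commit_id!r} in this history")


c1 = Commit("a1b2c3d", {"query.sql": "SELECT * FROM sales"})
c2 = Commit("e4f5a6b", {"query.sql": "SELECT * FROM sales WHERE year = 2023"}, c1)
print(show(c2, "a1b2c3d", "query.sql"))  # SELECT * FROM sales
```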
So far we’ve been talking about how git manages your project’s file changes as if you were the only one working on the project. In that case, git’s history of changes is stored in a special directory on your local machine’s drive. To start collaborating with other people, git has the clone command, which makes a copy of your project’s change history (called a repository, or repo for short). You could then hand this copy to another person, but that manual process is cumbersome. Instead, git lets you post a copy of your repository to a server that your teammates can access. This is called the remote repository, or remote for short. Your teammates can then get their own copy of the repository by running the clone command against the remote on their local machines. After cloning, changes made by teammates working independently (by logging commits in their respective local repos) can be coordinated by pushing to, and pulling from, the remote repo.
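The push/pull coordination can be sketched with a deliberately simplified model in which a repository’s history is just a list of commit messages. The `Repo` class and the `clone`, `push`, and `pull` functions are hypothetical stand-ins for `git clone`, `git push`, and `git pull`. The sketch does capture one real behavior: by default, git rejects a push that would discard commits already on the remote (a non-fast-forward push), so the second teammate has to pull and reconcile first. It handles only the fast-forward case of pulling. A real `git pull` fetches and then merges (or rebases) diverged work.

```python
class Repo:
    """Toy repository whose history is a list of commit messages, oldest first."""

    def __init__(self, history=None):
        self.history = list(history or [])

    def commit(self, message):
        self.history.append(message)


def clone(remote):
    # Like `git clone`: a new local repo holding a full copy of the remote's history.
    return Repo(remote.history)


def push(local, remote):
    # Accept only a "fast-forward": the remote's history must be a prefix of ours,
    # so a push never throws away commits that someone else already pushed.
    if local.history[: len(remote.history)] != remote.history:
        raise RuntimeError("push rejected: the remote has commits you don't; pull first")
    remote.history = list(local.history)


def pull(local, remote):
    # Fast-forward-only pull; real `git pull` would merge diverged histories instead.
    if remote.history[: len(local.history)] != local.history:
        raise RuntimeError("histories have diverged: a merge is needed")
    local.history = list(remote.history)


origin = Repo(["initial import"])
teammate_a, teammate_b = clone(origin), clone(origin)

teammate_a.commit("add data refresh script")
push(teammate_a, origin)  # accepted: origin had nothing teammate_a lacked

teammate_b.commit("fix chart labels")
try:
    push(teammate_b, origin)  # rejected: origin now has teammate_a's commit
except RuntimeError as err:
    print(err)
```

Resolving the rejected push is exactly the merging step described next: teammate_b’s history has diverged from the remote’s, so the two lines of work must be reconciled before pushing again.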
You may already know some of the well-known internet-based services that store remote repositories for your team and let it interact with them. Among them are GitHub, GitLab and Bitbucket, but there are several more, including self-hosted options like Gitea and gitolite. These services usually offer more than mere storage for your remote repositories, such as forking (which is like cloning but happens on the server, from one user account to another), issue tracking, discussion threads, wikis, project web pages, access to continuous integration testing, etc.
With one remote and multiple local repos, git lets team members make changes simultaneously, along parallel lines of effort. When more than one person’s changes need to be incorporated into the main trunk (base), they have to be serialized on the trunk, and any conflicting changes need to be resolved. This process is called merging.
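Merging usually starts from the two branches’ most recent common ancestor (the merge base). The sketch below is a simplified, whole-file version of that three-way comparison. A file changed on only one side takes that side’s version, identical changes are accepted, and a file changed differently on both sides is reported as a conflict for a person to resolve. The `three_way_merge` function is hypothetical. Real git compares changes line by line within each file, so two edits to different parts of the same file usually merge cleanly.

```python
def three_way_merge(base, ours, theirs):
    """Toy whole-file three-way merge. Each argument maps path -> content;
    a path missing from a side means that side doesn't have (or deleted) the file."""
    merged, conflicts = {}, []
    for path in sorted(base.keys() | ours.keys() | theirs.keys()):
        b, o, t = base.get(path), ours.get(path), theirs.get(path)
        if o == t:      # both sides agree (including "neither changed it")
            result = o
        elif o == b:    # only "theirs" changed this file
            result = t
        elif t == b:    # only "ours" changed this file
            result = o
        else:           # both changed it differently: a person must resolve it
            conflicts.append(path)
            continue
        if result is not None:  # None means the file was deleted
            merged[path] = result
    return merged, conflicts


base = {"model.py": "rate = 0.05", "README": "v1"}
ours = {"model.py": "rate = 0.06", "README": "v1"}
theirs = {"model.py": "rate = 0.07", "README": "v2"}
merged, conflicts = three_way_merge(base, ours, theirs)
print(merged)     # {'README': 'v2'}
print(conflicts)  # ['model.py']
```

Comparing against the common ancestor is what distinguishes “only one side changed this” from “both sides changed this,” which a plain two-way comparison of the branch tips cannot tell apart.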
A great feature of the most commonly used web-based git collaboration services, like GitLab and GitHub, is a built-in review process in which a collaborator requests permission to merge the changes in a branch into the main trunk (base). GitLab calls this a merge request and GitHub calls it a pull request; both let you assign reviewers to those changes. There is nothing like an extra set of eyes on your work!
The main “trunk” is usually where all these changes end up, once the merge process has reconciled them with one another, but your team can use whatever git workflow makes sense for it. Git doesn’t enforce a workflow; it just gives you the low-level tools to manage changes quickly and effectively.
May your project experience be filled with plenty of “Oh git” moments.
Here are a few resources we recommend. Please keep a lookout for additional blog posts discussing more project-related concepts:
- “Introduction to Git: A Talk by Scott Chacon,” https://www.youtube.com/watch?v=xbLVvrb2-fY
Roque is a Senior Operations Research Analyst.
Rick is a Senior Operations Research Analyst.