What is version control and how do you benefit from it?

This is the second blog in the series Simple steps towards FAIR code and software. In this blog, Stefano Rapisarda tells you why version control is so important and how you benefit from it. Stefano is one of the consultants at RDM Support.

Computers have been at home in my house for as long as I can remember. This did not make me a Bill Gates, but it definitely populated my childhood with all kinds of electronic beasts. I remember the colourful lights and the low-frequency electrical purring of a big aluminium box, an uninterruptible power supply unit that cost my father one month’s salary. The job of that box was one and one only: to prevent my father from cursing like only an Italian father can curse every time my mother was drying her hair with the laundry running and something cooking in the oven, overloading our electrical network. At that point the big red light of the box would switch on, the buzzing would increase, and its internal batteries would start providing juice, keeping my father’s computer on long enough to save from cybernetic oblivion all the progress he had made in the previous hours.

Another thing I clearly remember is the mountains of floppy disks spread out everywhere in his study, sometimes organised in containers but most of the time just piled up in unstable Jenga-like towers. "proj1", "ver1", "ver1b", "ver1_1992", "ver1_ultimate", "ver1_ultimate_better", etc.: names tidily written on paper labels or brutally scribbled on the raw plastic. Those disks were continuously picked up, fed to the computer, then pushed out ("ejected") and put back on top of a random pile.

All this gave my father a warm sense of security, the awareness that, regardless of any sort of natural disaster or new kitchen appliance connected to the net, he would not lose his work in progress.

At that time, all those efforts made little sense to me, but about thirty years later, while writing software for my research, I also ended up looking for that same sense of security. If all this makes little sense to you as well, wait just a few more lines, because today we talk about version control for research software. What is it? How does it work? And why is it beneficial both to the occasional developer and to the trained scientist?

What is version control?

In computer science, version control is any kind of system that allows you to track and document changes to files, software, a website, or any other digital object containing information. When dealing with software, you have probably come across labels such as "V1.0", "beta" or just "2.3". Those numbers and letters specify the software version, i.e., they identify a specific release of the software. What is a software release? Basically, the point where software developers say "good enough" and decide that their software is ready to be shared with the public. It all sounds very technical, but as software is becoming more and more integrated into the published output of research projects, version control is something researchers need to take care of as well.

Why do I need version control?

You may not have realised it, but most of the editors you use on a daily basis already perform a sort of version control automatically: every time you push the "undo" button you are accessing a previously saved version of your code or document, i.e., you are going back to a previous version of it. More explicit version control becomes almost a "must" when you are planning to work on a big project, when you want to make your software public, or when you are developing the code with other people.

At the most basic level, version control systems help you keep track of changes in your code (the development history). These changes are usually documented, so that you can always figure out when, where, and why you changed your software. Modern version control systems can also work in tandem with cloud services, so that your progress is saved both on your computer and in the cloud.

It is also essential to know the version history of software when it is released to the public. Computer programs are indeed dynamic animals: they change continuously, adapting to the needs of users and to new hardware. Software that works smoothly on a five-year-old computer may not work on a more recent machine (and vice versa). It is therefore essential to know which version of the software you are going to use for your work and how it differs from previous versions.

Version control becomes a real lifesaver when you collaborate with other people. Imagine a group of researchers or developers working simultaneously on the same code, each of them fixing, adding, and modifying the same files. Even if one of them deletes everything or completely messes up the code, with version control you can always go back to the point where your software was perfectly functional and start again from there.

How does version control work?

There are several version control systems out there, but the most widely used is Git. Git allows you to take "snapshots" of your in-development software on command. Every time you think you have made significant changes to a file, you take a snapshot of it. If you are really, really sure about the changes, you "commit" them (a Git-specific action that can be thought of as "confirming changes"), officially storing that snapshot in the change history of that file. From then on, whatever happens to that file, you can always retrieve a past snapshot of it, compare it with your current version, and, if needed, start working again from there. Git is often used together with a cloud hosting service such as GitHub, which allows you to share files (documents, software, websites, etc.) with other users and to keep a copy of them in the cloud. Check out the GitHub environment of Utrecht University.
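To make the snapshot-and-commit idea a bit more concrete, here is a toy sketch in Python. It is not how Git is implemented: the class ToyHistory and its methods snapshot, commit, retrieve, and compare are hypothetical names invented for this illustration, and it tracks a single file kept as a string.

```python
import difflib
import hashlib


class ToyHistory:
    """Toy change history for one file (illustration only, not Git's real design)."""

    def __init__(self):
        self.staged = None   # snapshot taken but not yet committed
        self.commits = []    # list of (commit_id, message, content)

    def snapshot(self, content):
        """Take a snapshot of the file content, ready to be committed."""
        self.staged = content

    def commit(self, message):
        """Confirm the snapshot and store it in the history with a message."""
        if self.staged is None:
            raise ValueError("nothing to commit: take a snapshot first")
        raw = f"{len(self.commits)}:{message}:{self.staged}".encode()
        commit_id = hashlib.sha1(raw).hexdigest()[:7]
        self.commits.append((commit_id, message, self.staged))
        self.staged = None
        return commit_id

    def retrieve(self, commit_id):
        """Return the file content stored under a past commit."""
        for stored_id, _message, content in self.commits:
            if stored_id == commit_id:
                return content
        raise KeyError(commit_id)

    def compare(self, commit_id, current):
        """List the line-by-line differences between a past commit and the current text."""
        old_lines = self.retrieve(commit_id).splitlines()
        new_lines = current.splitlines()
        return list(difflib.unified_diff(
            old_lines, new_lines, fromfile="snapshot", tofile="current", lineterm=""))


history = ToyHistory()
history.snapshot("mean = total / n\n")
first = history.commit("Compute the mean")
history.snapshot("mean = total / max(n, 1)\n")
second = history.commit("Avoid division by zero")

restored = history.retrieve(first)  # the file as it was at the first commit
changes = history.compare(first, "mean = total / max(n, 1)\n")
print("\n".join(changes))
```

The commit message is what later tells you why a change was made, and the stored content is what lets you go back. Git does the same job for whole folders of files, far more efficiently.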

As mentioned before, working with Git becomes indispensable when many people work simultaneously on the same software. Git allows each person to work on a different "branch" of the code, i.e., a copy of it. You make snapshots and commits as you would if you were working alone, but when you think it is time to share your progress, you submit a request for "merging". At that point, your colleagues inspect the differences between your version and the original one, possibly proposing changes. If approved, your modified code is merged with the main one, officially becoming part of its history.
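The branch, review, and merge cycle can be sketched in the same toy style. Again, this is only an illustration under simplifying assumptions: ToyRepo and its methods are hypothetical names, the "repository" holds one file, and the merge only succeeds when the main branch has not changed in the meantime. A real Git merge can also combine changes made on both sides.

```python
import difflib


class ToyRepo:
    """Toy repository with branches (illustration only, not Git's real design)."""

    def __init__(self, content):
        # Each branch is a list of committed versions of a single file.
        self.branches = {"main": [content]}

    def branch(self, name, source="main"):
        """Start a new branch as a copy of an existing one."""
        self.branches[name] = list(self.branches[source])

    def commit(self, branch, content):
        """Store a new version of the file on the given branch."""
        self.branches[branch].append(content)

    def diff(self, branch, target="main"):
        """Show what reviewers would inspect: branch version versus target version."""
        old_lines = self.branches[target][-1].splitlines()
        new_lines = self.branches[branch][-1].splitlines()
        return list(difflib.unified_diff(
            old_lines, new_lines, fromfile=target, tofile=branch, lineterm=""))

    def merge(self, branch, target="main", approved=False):
        """Merge an approved branch into the target if the target has not moved on."""
        if not approved:
            raise PermissionError("the merge request has not been approved yet")
        target_history = self.branches[target]
        if self.branches[branch][:len(target_history)] != target_history:
            raise RuntimeError("target changed in the meantime; a real merge is needed")
        self.branches[target] = list(self.branches[branch])


repo = ToyRepo("total = a - b\n")
repo.branch("fix-sum")                       # work on a copy, not on main
repo.commit("fix-sum", "total = a + b\n")    # commit as if working alone
review = repo.diff("fix-sum")                # colleagues inspect the difference
repo.merge("fix-sum", approved=True)         # approved: becomes part of main's history
print("\n".join(review))
```

Until the merge is approved, the main branch stays exactly as it was, which is what makes it safe for several people to experiment at the same time.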

I am a researcher, not a developer, why should I care about version control?

In the past, the only thing researchers had to take care of when sharing scientific results was a paper. We are now approaching the age where all the possible research outputs of a project are going to be shared, including data and software. As a scientist, you must follow the principle that scientific results need to be shared so that other scientists can reproduce them. In this context, sharing data and software in a public repository is often not enough to guarantee the reproducibility of scientific results. Data and software can have a very high level of complexity, but they are often messy, following organisation criteria that are clear only (in the best case!) to members of your research group. Usually, data analysis is performed using a specific software version so that, as a matter of fact, analysis results may also be software version-dependent (especially if the software has a bug!). Sharing a well-documented software history allows your colleagues to figure out where your code comes from and which version was used to perform the data analysis. Furthermore, uploading your software to platforms like GitHub will make it easily findable and downloadable in the first place. In other words, using version control contributes to making your research products more FAIR.
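One simple habit that follows from this is to store the software version next to every result you produce. The sketch below shows one possible way to do that in Python. The version label ANALYSIS_VERSION, the function names, and the tiny "analysis" (a mean) are all assumptions made up for this example.

```python
import json
import platform

# Hypothetical version label of the analysis code; in practice this would
# match a release or commit recorded in your version control system.
ANALYSIS_VERSION = "1.2.0"


def run_analysis(values):
    """Stand-in for a real analysis: the mean of the input values."""
    return sum(values) / len(values)


def result_with_provenance(values):
    """Return the result together with the versions that produced it."""
    return {
        "result": run_analysis(values),
        "analysis_version": ANALYSIS_VERSION,
        "python_version": platform.python_version(),
    }


record = result_with_provenance([1.0, 2.0, 3.0])
print(json.dumps(record, indent=2))
```

Anyone reading the saved record later, including you, can then tell which version of the code to check out in order to reproduce the number.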

Using Git and GitHub is not mandatory (although some funders may explicitly ask for public repositories). However, more than 150 Git and GitHub users are currently affiliated with Utrecht University, providing more than 800 research scripts and software packages. These numbers only grow when looking at the national and international level. This also means that Git and GitHub can count on a very large base of (research) users and, as a matter of fact, they are becoming the dominant tools for sharing software with the scientific community.

My future self

On a more individual level, when I write software, I always think about my main collaborator: my future self. My future self is an older, slower, slightly chubbier version of me who has forgotten everything about the project I am currently working on. Well, that person in the future is usually moved to tears when he sees that his past self (i.e., my present me) did an excellent job of properly documenting the whole life cycle of the software.

Finally, sharing your software on GitHub (or any other public platform) is an opportunity to share your work and to show its quality. Showing what you worked on and how you worked on it is fundamental, regardless of the career path you decide to pursue.

Ok, I got it! Where should I start?

Watch this short animation, have a look at the Git documentation, and start using Git and GitHub today. Do you feel intimidated by the topic or need some human help? Join our Programming Café: it is full of Git users ready to share their enthusiasm for version control. We’ll share new dates in the events overview after the summer break.

RDM Support is always happy to answer all your questions about version control. Just reach us by email or subscribe to our newsletter; you will find both on our contact page. We are here to help you. No matter where you start, just start with version control: your future self will be forever grateful.