Taco Steemers

A personal blog.

Some files and information should not be in source control


Some files and information should not be stored in version control systems such as Git. Which are they, what should we do with them instead, and how can we avoid mistakes?

Secrets

Examples of files that do not belong in a version control system are (unencrypted) files containing API credentials, keys, and anything else that is supposed to stay secret.

Over the course of a project's lifetime many people might get access to the source code. This makes the source code an unsafe place to keep secrets.

Another consideration is that some secrets, such as credentials, change over time. Changing them is easier when they are stored separately from the source code. If they were kept in a distributed source control system, we would have to change them in every active version of the software, and every change would require a new release and deployment.

Individual secrets, and files that contain secrets, are usually made available to applications through environment variables or through a shared data source such as a secret store. The environment variables can be set by the software responsible for deploying, starting and stopping the applications. An example of a secret store is HashiCorp Vault.
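As a minimal sketch of the environment-variable approach, assuming a Python application and deployment software that sets the variables, the application could read a secret like this. The `require_secret` helper and `MissingSecretError` are hypothetical names; a secret store such as Vault would instead be queried through its own client, which is not shown here.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a secret the application needs was not provided."""


def require_secret(name: str) -> str:
    """Return the secret stored in the environment variable `name`.

    The deployment software is assumed to set the variable. Only the
    variable's name appears in the repository, never its value.
    """
    value = os.environ.get(name)
    if not value:
        # Fail at startup instead of sending an empty credential later on.
        raise MissingSecretError(f"required environment variable {name!r} is not set")
    return value
```

At startup the application would call something like `require_secret("PAYMENT_API_KEY")`, where `PAYMENT_API_KEY` is again a hypothetical name. Failing early makes a missing secret obvious right after a deploy.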

Secrets, in practice

In practice, we are likely to find both methods in use. The URI of the secret store could be stored in the application database, but to read it we first have to connect to that database, and we cannot fetch the database password from the very database we are trying to connect to. The database connection details may therefore be passed through environment variables, or the deployment software may write them to a file on the application server.
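The bootstrap order described above can be sketched as follows. This is a minimal sketch under several assumptions: SQLite stands in for the application database (so a file path plays the role of the full connection details), and the `APP_DB_PATH` variable, the JSON file written by the deployment software, the `settings` table and its `secret_store_uri` key are all hypothetical.

```python
import json
import os
import sqlite3
from contextlib import closing
from pathlib import Path
from typing import Optional


def load_db_settings(deploy_file: Optional[Path] = None) -> dict:
    """Return database connection details without touching the database.

    An environment variable (hypothetical name APP_DB_PATH) takes precedence;
    otherwise fall back to a JSON file assumed to be written by the
    deployment software on the application server.
    """
    path = os.environ.get("APP_DB_PATH")
    if path:
        return {"path": path}
    if deploy_file is not None and deploy_file.is_file():
        return json.loads(deploy_file.read_text())
    raise RuntimeError("no database connection details available")


def secret_store_uri(db_settings: dict) -> str:
    """Once connected, look up the secret store URI in the application database."""
    with closing(sqlite3.connect(db_settings["path"])) as conn:
        row = conn.execute(
            "SELECT value FROM settings WHERE key = ?", ("secret_store_uri",)
        ).fetchone()
    if row is None:
        raise RuntimeError("secret store URI is not configured")
    return row[0]
```

Only the first step depends on information from outside the application; everything after it can come from the database or the secret store.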

Generated files

Files that are the result of build steps, such as the output of generators and compilers, should not be added to a source control management system. They are not source files, and any changes made to them will be overwritten the next time the build runs. Files created by runtime environments fall into the same category, for example the __pycache__ directory in which Python caches the compiled bytecode of imported modules.
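To see that the contents of `__pycache__` are derived entirely from source files, one can byte-compile a module by hand with the standard library's `py_compile` module. It writes the result where the import system would cache it, normally a `__pycache__` directory next to the source file. A small sketch; `compile_and_locate` is a hypothetical helper name.

```python
import py_compile
from pathlib import Path


def compile_and_locate(source: Path) -> Path:
    """Byte-compile `source` and return the path of the generated .pyc file."""
    # py_compile.compile returns the path it wrote to; by default this is the
    # same cache location the import system uses (usually __pycache__/).
    # doraise=True turns syntax errors into a PyCompileError exception.
    return Path(py_compile.compile(str(source), doraise=True))
```

Deleting the returned file is harmless: Python regenerates it the next time the module is imported (unless bytecode writing is disabled), which is exactly why it does not belong in version control.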

An exception, generated interfaces

As far as I know there is only one kind of exception to this rule. Generating a SOAP interface from a local WSDL file during every build is a waste of resources, so it can be acceptable to generate it once and add the output to the project's source files. An alternative to adding the output to version control is to package it as an artifact (dependency) and publish it to the organization's private artifact repository.

Other files

A more mundane example is the .DS_Store file. macOS creates it to store how the Finder should display a directory, such as icon positions and view options. It is unrelated to the project.

IDE files, such as IntelliJ IDEA's .iml files and .idea directories, should not be added either. They contain the developer's personal settings and preferences. We may occasionally share them to help a new colleague get up and running, but they are not part of the project's source code.

What about backups?

There is no need to keep several versions of a file next to each other in the project directory. Keeping track of versions is the job of the version control system: the previous version of the file, in an earlier commit, is the backup.

Database backups don't belong in the source control system. They belong on a properly secured storage server.

What about documentation?

Personally I feel that some level of documentation can be good to add. This includes instructions about development dependencies, local development setup, and documents concerning integration with external APIs. Having this type of information close at hand can be very helpful to developers.

How to avoid adding secrets to version control

This is a problem that probably does not have a full technical solution. Awareness is key.

There are projects such as git-secrets that try to solve this. I have not used git-secrets or similar tools myself. Detecting secrets is tricky to automate and won't be foolproof. I expect such tools to detect secrets whose format is already known, such as common AWS credentials. Secrets specific to your own software, on the other hand, are likely to be hard to detect, because the creators of the tool cannot know what they look like.
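A rough sketch of how pattern-based detection works, and where it stops working. The two built-in regular expressions are simplified assumptions for illustration, not the rules git-secrets actually ships with, and `find_secrets` is a hypothetical helper. `AKIAIOSFODNN7EXAMPLE` is the example access key ID used in AWS's documentation.

```python
import re
from typing import Dict, List, Optional, Tuple

# Simplified, illustrative patterns for secrets with a well-known shape.
KNOWN_PATTERNS: Dict[str, re.Pattern] = {
    # AWS access key IDs: "AKIA" followed by 16 upper-case letters or digits.
    "AWS access key ID": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # PEM-encoded private keys, e.g. "-----BEGIN RSA PRIVATE KEY-----".
    "private key block": re.compile(r"-----BEGIN [A-Z ]*PRIVATE KEY-----"),
}


def find_secrets(
    text: str, extra_patterns: Optional[Dict[str, re.Pattern]] = None
) -> List[Tuple[int, str]]:
    """Return (line number, pattern name) for every line that looks like a secret."""
    patterns = dict(KNOWN_PATTERNS)
    if extra_patterns:
        # Application-specific secrets are only found if someone describes them.
        patterns.update(extra_patterns)
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in patterns.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings
```

Given the two lines `aws_key = "AKIAIOSFODNN7EXAMPLE"` and `db_password = "correct-horse"`, `find_secrets` reports only line 1. The application-specific password on line 2 is found only after someone passes a pattern for it in `extra_patterns`, which is where awareness comes back in.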

How to avoid adding unwanted files to Git

Git has a special file for this purpose: .gitignore. It lists the files and directories that Git should not add to the repository. Note that it does not affect files that are already tracked. The .gitignore file itself is normally committed, so that every developer benefits from it.

Creating this file is easy. Here is an example .gitignore file for a website project generated with Pelican. The developers use macOS and the IntelliJ IDEA development environment.

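```gitignore
# macOS Finder metadata
.DS_Store
# IntelliJ IDEA module files and project settings
*.iml
.idea
# Output directories
generated/
pelican/output/
# Python bytecode cache
pelican/__pycache__/
```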

As we can see, we ask Git to ignore the macOS-specific file, the IntelliJ IDEA-specific files and directories, the output directories, and the Python bytecode cache.
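To get a feel for how these patterns apply to paths in the project, here is a rough approximation in Python. It is a simplified model, not Git's actual matcher: it handles comments, a trailing `/` (directories only) and patterns containing `/` (anchored to the repository root), but not negation, `**` or escaping, and `fnmatch`'s `*` also matches `/`. The `is_ignored` helper is hypothetical; for an authoritative answer, `git check-ignore -v <path>` reports which pattern, if any, ignores a path.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath
from typing import List


def is_ignored(path: str, patterns: List[str]) -> bool:
    """Approximate whether a repository-relative file path matches any pattern."""
    parts = PurePosixPath(path).parts
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # blank lines and comments
        dir_only = pattern.endswith("/")
        pattern = pattern.rstrip("/")
        anchored = "/" in pattern  # a slash inside ties the pattern to the root
        pattern = pattern.lstrip("/")
        # A directory-only pattern can match only a parent directory of the
        # file; other patterns may also match the file name itself.
        last = len(parts) - 1 if dir_only else len(parts)
        for i in range(1, last + 1):
            prefix = parts[:i]
            candidate = "/".join(prefix) if anchored else prefix[-1]
            if fnmatchcase(candidate, pattern):
                return True
    return False
```

With the example .gitignore above, `pelican/output/index.html` and `.idea/workspace.xml` are reported as ignored, while a source file such as `pelican/pelicanconf.py` is not.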