Some files and information should not be stored in version control systems such as Git. Which are they, what should we do with them instead, and how can we avoid mistakes?
Secrets
Examples of files that do not belong in a version control system are (unencrypted) files containing API credentials, keys, and anything else that is supposed to stay secret.
Over the course of a project's lifetime many people might get access to the source code. This makes the source code an unsafe place to keep secrets.
Another aspect is that some secrets, such as credentials, change over time. Changing them is easier if they are stored separately from the source code. If we had them in a distributed source control system, it would be more work to change them for all active versions of the software. Every change would also require a new release and deployment.
Loose secrets and files that contain secrets are usually made available to the applications through environment variables, or a shared data source such as a secret store. The environment variables can be set by the software responsible for deploying, starting and stopping the applications. An example of a secret store is Vault.
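As an illustration of the environment variable approach, here is a minimal Python sketch. The variable name EXAMPLE_API_KEY and the helper read_secret are made up for this example; a real application would use whatever names its deployment software sets.

```python
import os


def read_secret(name: str) -> str:
    """Return the secret stored in the environment variable `name`.

    Raises RuntimeError when the variable is missing or empty, so the
    application fails at startup instead of running without credentials.
    """
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Required environment variable {name} is not set")
    return value


if __name__ == "__main__":
    # EXAMPLE_API_KEY is a hypothetical name. Normally the deployment
    # software sets it; the dummy default below only keeps the demo runnable.
    os.environ.setdefault("EXAMPLE_API_KEY", "dummy-value-for-demo")
    api_key = read_secret("EXAMPLE_API_KEY")
    # Print only the length, never the secret itself.
    print(f"Loaded an API key of length {len(api_key)}")
```

The secret never appears in the source code, so it can be changed by updating the environment and restarting the application.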
Secrets, in practice
In practice, we are likely to find that both methods are used to make secrets available. The URI of the secret store could be stored in the application database. To get that information we need to connect to the application database first. We can't connect to a database to get the database password from the database. Thus, the database connection details may be passed through environment variables, or the deployment software may write them to a file on the application server.
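The bootstrap step described above could look roughly like the following Python sketch. It assumes the deployment software either writes a JSON file and passes its path in APP_DB_CONFIG_FILE, or sets APP_DB_HOST, APP_DB_USER and APP_DB_PASSWORD directly. All of these names, and the JSON layout, are hypothetical.

```python
import json
import os
from pathlib import Path


def load_db_settings(env=None):
    """Return database connection details as a dict.

    Assumption for this sketch: a JSON file named by APP_DB_CONFIG_FILE
    takes precedence; otherwise the individual variables are used.
    """
    env = os.environ if env is None else env
    config_file = env.get("APP_DB_CONFIG_FILE")
    if config_file:
        # File written to the application server by the deployment software.
        return json.loads(Path(config_file).read_text(encoding="utf-8"))
    # A KeyError here means the deployment did not provide the details.
    return {
        "host": env["APP_DB_HOST"],
        "user": env["APP_DB_USER"],
        "password": env["APP_DB_PASSWORD"],
    }


if __name__ == "__main__":
    demo_env = {
        "APP_DB_HOST": "db.example.internal",
        "APP_DB_USER": "app",
        "APP_DB_PASSWORD": "dummy",
    }
    settings = load_db_settings(demo_env)
    print(f"Would connect to {settings['host']} as {settings['user']}")
```

Once the database connection exists, the application can read the location of the secret store from it and fetch the remaining secrets there.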
Generated files
Files that are the result of build steps, such as output from generators and compilers, should not be added to a source control management system. These are not source files.
Any changes made to them will be overwritten the next time the build is run.
Another example is files created by runtime environments, such as the __pycache__ directory, which Python creates to cache compiled modules when a program is run.
An exception, generated interfaces
As far as I know there is only one type of exception to the rule. Generating a SOAP interface from a local WSDL file during every build is a waste of resources. It can be an acceptable solution to generate it once and add the output to the project source files. An alternative to adding the output to version control is to package it as an artifact (dependency) and add it to the organization's private artifact repository.
Other files
An example of other, more mundane files is the .DS_Store file. macOS creates this file to store details of how a directory should be displayed in Finder. It is unrelated to the project.
IDE files such as IML files and .idea directories should not be added either. These contain the developer's personal settings and preferences. Occasionally we may share them to help a new colleague get up and running, but they are not part of the project source code.
What about backups?
There is no need to store several versions of a file next to each other in the project directory. The version control system controls file versioning. The previous version of the file is the backup.
Database backups don't belong in the source control system. They belong on a properly secured storage server.
What about documentation?
Personally I feel that some level of documentation can be good to add. This includes instructions about development dependencies, local development setup, and documents concerning integration with external APIs. Having this type of information close at hand can be very helpful to developers.
How to avoid adding secrets to version control
This is a problem that probably does not have a full technical solution. Awareness is key.
There are projects such as git-secrets that try to solve this. Personally I have not used git-secrets or similar tools. Secrets detection is tricky to automate and won't be foolproof. I imagine that such tools can detect secrets with a known format, for example common secrets such as AWS-related credentials. Secrets specific to your software, on the other hand, I expect to be difficult to detect, because the creators of the tool are not familiar with them.
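To illustrate why secrets with a known format are the easier case, here is a small Python sketch. It is not how git-secrets works. It only assumes that an AWS access key ID looks like the prefix AKIA followed by 16 uppercase letters or digits, which is a simplification, and the helper name find_suspect_lines is made up.

```python
import re

# Simplified assumption: an AWS access key ID is "AKIA" plus 16 uppercase
# letters or digits. Real scanners use broader and more careful patterns.
AWS_ACCESS_KEY_ID = re.compile(r"\bAKIA[0-9A-Z]{16}\b")


def find_suspect_lines(text: str):
    """Return (line_number, line) pairs that contain something shaped
    like an AWS access key ID."""
    return [
        (number, line)
        for number, line in enumerate(text.splitlines(), start=1)
        if AWS_ACCESS_KEY_ID.search(line)
    ]


if __name__ == "__main__":
    sample = (
        "aws_key = AKIAIOSFODNN7EXAMPLE\n"      # well-known documentation example key
        "app_token = s3cr3t-internal-token\n"   # project-specific secret, no known format
    )
    for number, line in find_suspect_lines(sample):
        print(f"line {number}: {line}")
```

Only the first line of the sample is reported. The project-specific token on the second line passes unnoticed, which is exactly the limitation described above.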
How to avoid adding unwanted files to Git
Git has a special file for this, the .gitignore file. It lists the files, directories, and patterns that should not be added to the source control system. The file itself is normally added to the source control system, so that every developer can benefit from it.
This file is easy to create.
Here is an example .gitignore file for a website project generated with Pelican. The developers are using macOS computers and the IntelliJ IDEA development environment.
.DS_Store
*.iml
.idea
generated/
pelican/output/
pelican/__pycache__/
As we can see, we ask Git to ignore the macOS-specific file, the IDEA-specific files and directories, and the generated output and cache directories.
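To verify that a path is really covered by the ignore rules, we can ask Git itself with the git check-ignore command. The Python wrapper below is a minimal sketch: the function name is_ignored is made up, and it assumes that git is installed and that repo_dir is inside a Git repository.

```python
import subprocess


def is_ignored(path: str, repo_dir: str = ".") -> bool:
    """Ask Git whether `path` is excluded by the ignore rules that apply
    in the repository at `repo_dir`."""
    result = subprocess.run(
        ["git", "check-ignore", "--quiet", path],
        cwd=repo_dir,
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    # git check-ignore exits with 0 when the path is ignored and 1 when it
    # is not. Any other exit code signals an error, for example when
    # repo_dir is not inside a repository.
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(
        f"git check-ignore failed with exit code {result.returncode}"
    )
```

With the example .gitignore above, calling is_ignored("website.iml") from the project root should return True, while a regular source file should return False.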