Taco Steemers

A personal blog.

Notes on comparing text files

Category: Notes

Making a diff

Meld is a great graphical tool for comparing text files.

Meld can show you the changes between text files side by side.

We are not limited to comparing two files: Meld can also compare three files at once, and it can compare directories.

This kind of change overview is called a "diff", because it shows the difference between two text files.

There are also command-line tools to get a diff. One example is the diff command in git, the version control system. See the manual entries for git diff and git difftool.
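As a minimal sketch of what such a command-line diff looks like, the commands below create two small sample files and compare them. The file names and contents are made up for illustration, and the example uses the standard diff tool. When git is installed, git diff --no-index old.txt new.txt should give a similar overview.

```shell
# Create a temporary directory with two small sample files (made-up content).
workdir=$(mktemp -d)
printf 'alpha\nbeta\ngamma\n' > "$workdir/old.txt"
printf 'alpha\nBETA\ngamma\n' > "$workdir/new.txt"

# diff exits with status 1 when the files differ, so "|| true" keeps
# that from being treated as a failure.
diff_output=$(diff -u "$workdir/old.txt" "$workdir/new.txt" || true)

# Lines starting with "-" were removed, lines starting with "+" were added.
printf '%s\n' "$diff_output"
```

The -u option asks for the "unified" format, which is the same style of output that git diff uses.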

Sometimes text data needs to be formatted before it can be compared

Sometimes a file has all of its contents on a single line. That doesn't compare well, because a line-based diff then marks the whole file as changed. We first need to format the file in a human-readable way, with logical groups of content on separate lines and with added whitespace. Tools that do this are called formatters. Programmers also use a related type of tool called a linter, and the act of using one is called linting. We can do a web search for formatters or linters to find tools for the type of file we want to compare.
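To illustrate the problem, here is a small sketch that uses JSON as the example format. It assumes Python 3 is installed, because its json.tool module can be used as a command-line formatter. The sample file and its contents are invented for this example.

```shell
# Write a sample JSON document with all content on one line (made-up data).
workdir=$(mktemp -d)
printf '{"name":"example","tags":["a","b"],"active":true}\n' > "$workdir/input.json"

# Format it: indented, spread over multiple lines, and with keys sorted
# so that two files with the same data line up when compared.
python3 -m json.tool --sort-keys "$workdir/input.json" > "$workdir/formatted.json"

# Show the line count before and after formatting, then the result.
wc -l < "$workdir/input.json"
wc -l < "$workdir/formatted.json"
cat "$workdir/formatted.json"
```

After formatting, a change to a single value shows up as a change to a single line, instead of a change to the whole file.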

The basic idea

Let's say we want to compare XML files. On Linux and macOS we can use a command-line tool called xmllint. (On Windows we could use Notepad++, see the notes below.) On your system, xmllint and meld may need to be installed first.

For example:

$ xmllint --format ~/Downloads/first_input_file.xml > /tmp/xml1
$ xmllint --format ~/Downloads/second_input_file.xml > /tmp/xml2
$ meld /tmp/xml1 /tmp/xml2
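If we do this often, the steps could be wrapped in a small shell function. This is only a sketch under a few assumptions: the function name xml_diff is made up, xmllint must be installed, and it uses diff -u instead of meld so that it also works in a terminal without a graphical environment.

```shell
# Hypothetical helper: format two XML files with xmllint, then compare them.
xml_diff() {
    xd_tmp=$(mktemp -d) || return 2
    xd_status=0
    if xmllint --format "$1" > "$xd_tmp/xml1" &&
       xmllint --format "$2" > "$xd_tmp/xml2"; then
        # diff exits with 0 when the files are equal and 1 when they differ.
        diff -u "$xd_tmp/xml1" "$xd_tmp/xml2" || xd_status=$?
    else
        # One of the files could not be formatted, for example invalid XML.
        xd_status=2
    fi
    rm -r "$xd_tmp"
    return "$xd_status"
}

# Usage, with the file names from the example above:
# xml_diff ~/Downloads/first_input_file.xml ~/Downloads/second_input_file.xml
```

Using a fresh temporary directory for each call avoids overwriting files that another comparison left in /tmp.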

Using IDEs to get access to formatters for most types of data and code

Programmers would probably use their IDEs to format data files, but IDEs may be handy for others too.

One option is IntelliJ IDEA. It has an Ultimate Edition and a Community Edition. The Community Edition does not cost money to use.

Another option is Visual Studio Code.

Search online to find how code and data formatting works in these applications, and to find out if they support the kind of file you want to format.

Online formatters

There are many online formatters. For SQL there is SQLinForm, which provides an online SQL formatter. Search online for formatters for your specific text format.

XML, JSON and SQL formatting on Windows

On Windows we can try Notepad++ to format JSON, XML and SQL. To get this type of functionality we will need to install plugins. Notepad++ lets us search for and install plugins from inside the application. Some useful ones might be JSON Tools, JSON Viewer, JSTool, XML Tools and SQLinForm.

In this example I will show how to use Notepad++ to format JSON files. To keep this example easy I am assuming that the Notepad++ on your computer doesn't have any plugins installed yet.

The first step is to go to the "Plugins" section of the menu bar, and click "Plugins Admin".

In the "Plugins Admin" we make sure we are on the "Available" tab. We can use the search bar to find plugins for the type of text file that we need a tool for. In this case that is a JSON file, so I search for JSON.

I find two plugins that promise to format JSON files. You can pick one, but I decided to try both. After selecting the plugins that we want and clicking install, Notepad++ will install them and ask to restart. Make sure your files have been saved and allow Notepad++ to restart.

After restarting we find that the new plugins are available in the "Plugins" section of the menu bar. We can now easily format JSON files.