Welcome to Mozilla Science Lab's Open Data Primers!

  Edit

Converting File Formats

As was mentioned in Primer 2, the data you'd like to use can come in any number of formats. The format in which you receive it will not necessarily be in a format you can use. Don't despair! Thankfully, many file formats can be converted into other formats that can (hopefully) work with the platforms you use.

First, be sure to fully understand the file format in which the data was produced or stored. Even common proprietary formats can cause interpretation issues. For example, Microsoft Excel is notorious in the data science community for how it handles date data. If the data is preserved in a proprietary or uncommon format the best thing to do is to preserve a copy of the original. This will avoid data loss and give you something to go back to in case the file conversion gets tricky.

Second, look for a tool to handle conversion of the file format to that being used by the platforms you are using for data analysis, such as rio for R. Sometimes the software or platform you are using will have an conversion utility built in, usually found in the "File" menu under "Import" or "Export".

Finally, save your newly converted file in a simple, non-proprietary format, such as .csv for tabular data. For more information on non-proprietary formats, see Primer 2: File Formats, FTW!