Preferred File Formats
The file formats you use have a direct impact on the ability of other people to open those files and access the data at a later date. The ideal format is one that supports interoperability and reusability (the I and R in FAIR).
When selecting file formats for archiving or sharing, the formats should ideally be:
- Non-proprietary
- Unencrypted
- Uncompressed
- In common usage by the research community
- Adherent to an open, documented standard, i.e. one that is:
  - Interoperable among diverse platforms and applications
  - Fully published and available royalty-free
  - Fully and independently implementable by multiple software providers on multiple platforms, without any intellectual property restrictions on the necessary technology
  - Developed and maintained by an open standards organization with a well-defined, inclusive process for evolving the standard
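Before depositing, you can roughly check the "Uncompressed" criterion above by inspecting a file's leading "magic" bytes. The following is a minimal Python sketch, not a BODC tool. The function names and the small signature table are illustrative assumptions. A clean result does not prove that a file is unencrypted. ZIP-based office formats (.xlsx, .docx, .ods) will also be reported as "zip".

```python
from typing import Optional

# Leading "magic" bytes of some common compressed or archive formats.
# ZIP-based containers such as .xlsx, .docx and .ods also begin with b"PK\x03\x04".
MAGIC_SIGNATURES = {
    b"\x1f\x8b": "gzip",
    b"PK\x03\x04": "zip",
    b"BZh": "bzip2",
    b"\xfd7zXZ\x00": "xz",
    b"7z\xbc\xaf\x27\x1c": "7z",
}


def detect_compression(head: bytes) -> Optional[str]:
    """Return the name of the first signature that `head` starts with, or None."""
    for magic, name in MAGIC_SIGNATURES.items():
        if head.startswith(magic):
            return name
    return None


def detect_file_compression(path: str) -> Optional[str]:
    """Read the first few bytes of a file and check them against MAGIC_SIGNATURES."""
    with open(path, "rb") as fh:
        return detect_compression(fh.read(8))
```

A depositor could run `detect_file_compression` over every file in a submission folder. Anything it flags would then need unpacking, or a note explaining why the compressed form is kept.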
Additional note:
Some formats, such as MS Excel (.xls), have an open specification, but data can be held in them in ways that require detailed knowledge to understand and interact with. Complex storage structures should be avoided or supported by appropriate user documentation.
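One simple way to see whether a spreadsheet export has kept a plain structure is to check that every row has the same number of fields as the header row. The sketch below assumes the sheet has already been saved as CSV text. `ragged_rows` is a hypothetical helper built on Python's standard `csv` module. Passing this check does not rule out other problems, such as notes typed into data cells.

```python
import csv
import io


def ragged_rows(text: str, delimiter: str = ","):
    """Return (line_number, field_count) for each row whose width differs from the header.

    Line numbers are those reported by csv.reader (1 = header line).
    """
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:  # empty input: nothing to check
        return []
    expected = len(header)
    problems = []
    for row in reader:
        if len(row) != expected:
            problems.append((reader.line_num, len(row)))
    return problems
```

For example, a time series whose third line is missing its salinity value would be reported as `[(3, 2)]`.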
The table below contains some generic guidance on file formats recommended and accepted by the BODC for data sharing, reuse and preservation. You may need to convert your data files to a preservation file format. We welcome queries from researchers about appropriate file formats for working and preservation, particularly early in the research process. If you are unsure whether your file formats are suitable for the data you want to deposit with the BODC, please get in touch. Data-type-specific guidance is also available.
Type of data | Recommended formats (3 Star Open) | Acceptable formats (2 Star) | Conditional (1 Star) |
---|---|---|---|
Simple tabular text data with minimal metadata: simple "1-D" data, e.g. instrument time-series data with column headings and variable names | Comma-separated values (.csv); Tab-delimited file (.tab); Delimited text with SQL data definition statements; NetCDF (.nc) | Delimited text (.txt) whose delimiter characters do not occur in the data; Widely-used formats: dBase (.dbf), OpenDocument Spreadsheet (.ods) | MS Excel (.xls/.xlsx); MS Access (.mdb/.accdb) |
Complex tabular data with minimal metadata: model data and observational data with more than one dimension (e.g. time-height data) | NetCDF (.nc) | Matlab (.mat) | |
Tabular data with extensive metadata: variable labels, code labels and defined missing values | SPSS portable format (.por); Delimited text and command ('setup') file (SPSS, Stata, SAS, etc.); Structured text or markup file of metadata information, e.g. DDI XML file | Proprietary formats of statistical packages: SPSS (.sav), Stata (.dta) | MS Access (.mdb/.accdb) |
Geospatial data: vector and raster data | Shapefile (.shp, .shx and .dbf are mandatory; .prj, .sbx and .sbn are optional); Geo-referenced TIFF (.tif, .tfw); CAD data (.dwg); Tabular GIS attribute data; Geography Markup Language (.gml) | ESRI Geodatabase format (.mdb); MapInfo Interchange Format (.mif) for vector data; Keyhole Markup Language (.kml); Adobe Illustrator (.ai); CAD data (.dxf or .svg); Binary formats of GIS and CAD packages | |
Textual data | Rich Text Format (.rtf); Plain text, ASCII (.txt); Extensible Markup Language (.xml) text according to an appropriate Document Type Definition (DTD) or schema | Hypertext Markup Language (.html); Widely-used formats: MS Word (.doc/.docx); Some software-specific formats: NUD*IST, NVivo and ATLAS.ti; JSON | |
Image data | TIFF 6.0 uncompressed (.tif) | JPEG (.jpeg, .jpg, .jp2) if the original was created in this format; GIF (.gif); Other TIFF versions (.tif, .tiff); RAW image format (.raw); Photoshop files (.psd); BMP (.bmp); PNG (.png); Adobe Portable Document Format (PDF/A, PDF) (.pdf) | |
Audio data | Free Lossless Audio Codec (FLAC) (.flac) | MPEG-1 Audio Layer 3 (.mp3) if the original was created in this format; Audio Interchange File Format (.aif); Waveform Audio Format (.wav) | |
Video data | MPEG-4 (.mp4); Ogg video (.ogv, .ogg) | AVCHD video (.avchd); Motion JPEG 2000 (.mj2) | |
Documentation and scripts | Rich Text Format (.rtf); PDF/UA, PDF/A or PDF (.pdf); XHTML or HTML (.xhtml, .htm); OpenDocument Text (.odt) | Plain text (.txt); Widely-used formats: MS Word (.doc/.docx), MS Excel (.xls/.xlsx); XML marked-up text (.xml) according to an appropriate DTD or schema, e.g. XHTML 1.0 | |
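The 2-star delimited text (.txt) option only works if the delimiter never appears in the data itself. Below is a minimal Python sketch using only the standard `csv` module. The candidate delimiter list and the function names are assumptions for illustration, not a BODC requirement. If no safe delimiter exists, a properly quoted .csv file (the 3-star option) is the better choice.

```python
import csv
import io

# Hypothetical preference order for delimiters; adjust to suit your data.
CANDIDATE_DELIMITERS = (",", "\t", ";", "|")


def choose_delimiter(rows, candidates=CANDIDATE_DELIMITERS):
    """Return the first candidate that does not occur in any value, or None."""
    for delim in candidates:
        if not any(delim in str(value) for row in rows for value in row):
            return delim
    return None


def to_delimited_text(rows):
    """Serialise rows as unquoted delimited text, using a delimiter absent from the data."""
    delim = choose_delimiter(rows)
    if delim is None:
        raise ValueError("every candidate delimiter occurs in the data")
    buf = io.StringIO()
    # With QUOTE_NONE and no escapechar, csv.writer raises csv.Error instead of
    # quoting any value that would need escaping (e.g. one containing a double quote).
    writer = csv.writer(buf, delimiter=delim, quoting=csv.QUOTE_NONE, lineterminator="\n")
    writer.writerows(rows)
    return delim, buf.getvalue()
```

For rows such as `[["station", "note"], ["A1", "calm, clear"]]`, the comma is rejected because it appears in a note, and the function falls back to a tab delimiter.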