Formatting files means generating ASCII text files in MEDATLAS or ODV (Ocean Data View) format, both of which are formats defined for the SeaDataNet project. The SeaDataNet project aims to develop a pan-European infrastructure for ocean and marine data management, in order to standardise these data, preserve them permanently and facilitate integrated access to them via a single portal.
The NEMO software is used for formatting. Before using NEMO, however, the input files must be checked in depth to make sure they are uniform.
Stage 1 of formatting: file standardisation
If the input files are not uniform, the NEMO software will not be able to format the data. The format of the input data must be flawless.
The data provided
Our data providers send us data files in a variety of formats (dat, cnv, txt, xls, rtf, etc.).
When we open the files, we are sometimes surprised by the lack of consistency of their content, their complexity, the lack of rigour in the creation of the files and the lack of information associated with the data. In some cases, it can take several hours to several days to make the files consistent and ensure they meet our strict requirements.
On the other hand, when the files are consistent (or nearly consistent) this saves us precious time.
The first main stage in the quality control process involves standardising the input files. This can be a tedious task as it depends on the state in which the files are provided and on the number of files.
In the set of input files, the information relating to the stations and measurements must:
- Always be in the same position: same row in the file and same position in the row, or same column if a separator is used
- Always be in the same format
- For instance: for all stations, the latitude is
  - On line 3 of the station header
  - From character 21 to character 27
  - format +DD.ddd
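The latitude example can be turned into an automatic check. The following is a minimal sketch, not part of NEMO: it assumes each station header is available as a list of text lines, and the function names (read_latitude, find_inconsistent_stations) are hypothetical. Only the layout (line 3, characters 21 to 27, format +DD.ddd) comes from the example above.

```python
import re

# Layout taken from the example: latitude on header line 3,
# characters 21 to 27 (1-based, inclusive), written as +DD.ddd.
LAT_LINE = 3
LAT_START, LAT_END = 21, 27
LAT_PATTERN = re.compile(r"[+-]\d{2}\.\d{3}")


def read_latitude(header_lines):
    """Return the latitude as a float, or None if the field is malformed."""
    if len(header_lines) < LAT_LINE:
        return None
    line = header_lines[LAT_LINE - 1]
    field = line[LAT_START - 1:LAT_END]
    if not LAT_PATTERN.fullmatch(field):
        return None
    value = float(field)
    return value if -90.0 <= value <= 90.0 else None


def find_inconsistent_stations(stations):
    """stations maps a station name to its list of header lines (assumed)."""
    return [name for name, header in stations.items()
            if read_latitude(header) is None]
```

Running find_inconsistent_stations over a whole set of files gives the list of stations whose headers need to be corrected before formatting.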
A few cases which complicate our work:
- When the station headers are not in the data files
  - In this case we must correlate each station header with the data
- When we do not know which parameters are present in the files
  - Archiving cannot be performed
- When we have the parameters but not their units
  - It is difficult to choose the correct parameter
- When the metadata (dates, times, positions, etc.) are missing
  - A station without a date or without a position cannot be archived
- When the metadata or data are not in the same format for all the stations
  - The metadata or data must be made consistent either manually, using a programme or in Excel
- When the number of decimals in a column or in a given position is not identical
  - The number of decimals must be made consistent so that all the data are aligned
- When the data are not always in the same place in the file(s)
  - It can sometimes be necessary to reconstruct the file(s)
- When we do not know if the times indicated are in UT (Universal Time) or not
  - We need to know the time difference to be entered in NEMO
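As an illustration of the "number of decimals" case, the sketch below detects a column with a varying number of decimals and rewrites it with a fixed number. It is only a sketch under assumptions: rows are taken to be already split into lists of strings, and the function names are hypothetical. Padding a value with zeros changes its apparent precision, and reducing the number of decimals rounds the value, so the target should be chosen with the data provider's precision in mind.

```python
def decimal_places(token):
    """Number of digits after the decimal point in a numeric token."""
    _, _, fraction = token.partition(".")
    return len(fraction)


def check_decimals(rows, column):
    """Return the set of decimal counts found in one column of split rows."""
    return {decimal_places(row[column]) for row in rows}


def align_column(rows, column, decimals):
    """Rewrite one column so every value has the same number of decimals."""
    aligned = []
    for row in rows:
        new_row = list(row)
        new_row[column] = f"{float(row[column]):.{decimals}f}"
        aligned.append(new_row)
    return aligned
```

A column is consistent when check_decimals returns a set with a single element.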
This data homogenisation effort is a very meticulous and time-consuming task.
Once we are sure that the files are consistent, they can then be injected into the NEMO software.
Stage 2 of formatting: the NEMO software
NEMO is a custom software programme designed to format ASCII files. It is used to generate ODV and/or MEDATLAS files from homogeneous ASCII files consisting of vertical profiles (measurements at a fixed point from the surface down to the seabed; reference parameter: pressure or depth), time series (measurements at a fixed point and a given depth over a specific time period; reference parameters: date and time) and trajectories (measurements along a ship's route; reference parameters: latitude and longitude).
ASCII files can be:
- One file per station for vertical profiles and time series
- One file per cruise (or mooring) for vertical profiles, time series and trajectories.
NEMO's underlying principles
NEMO was designed to be capable of reading as many ASCII formats as possible to convert them to MEDATLAS and/or ODV formats. It cannot process Excel, Word or Open Office input files. Only text files can be processed.
NEMO users must describe the input files so that NEMO can find the necessary information in these formats.
The format of the metadata and data should not differ from one station to another. The description must be completely homogeneous.
Several steps must be carried out to convert input files. The user must describe:
- the file type
- the cruise to which the data relate
- station information
- the parameters measured
and, most importantly, exactly where in the file each item is located and in what format it can be read.
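To make the idea of a file description more concrete, here is a minimal sketch of how such a description could be represented. It does not reflect NEMO's internal model: the class names FieldSpec and InputDescription and all their attributes are hypothetical, chosen only to mirror the items listed above.

```python
from dataclasses import dataclass, field


@dataclass
class FieldSpec:
    """Where a value sits in the file and how it is written (hypothetical)."""
    name: str
    line: int    # 1-based line number within the station header
    start: int   # 1-based first character
    end: int     # 1-based last character, inclusive
    fmt: str     # free-text format note, e.g. "+DD.ddd"

    def extract(self, lines):
        """Return the raw text of this field from a list of header lines."""
        return lines[self.line - 1][self.start - 1:self.end]


@dataclass
class InputDescription:
    """One description shared by every station in the set (hypothetical)."""
    file_type: str   # e.g. "vertical profile"
    cruise: str
    station_fields: list = field(default_factory=list)
    parameters: list = field(default_factory=list)
```

Because a single description is applied to all stations, any station whose layout differs from it will be read incorrectly, which is why the input files must be made homogeneous first.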
Input file description
The user must be able to answer the following questions:
- Where are the input files to be read?
- Are they related to a cruise?
- Is there a single file for the cruise or a single file for each station?
- Is this a collection of files organised by cruise or not?
- Are there data separators (tabulations, semi-colons, commas, spaces)?
- Are they vertical profiles, time series or trajectories?
- What format are they to be converted to?
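The question about separators can often be answered by inspecting a few data lines. The sketch below is an assumption-laden helper, not a NEMO feature: guess_separator is a hypothetical name, and it simply looks for a candidate character that occurs the same number of times on every line. Files that use a comma as the decimal mark can fool it, so the result should be checked by eye.

```python
CANDIDATES = {"tab": "\t", "semicolon": ";", "comma": ","}


def guess_separator(lines):
    """Return the name of a separator that occurs the same non-zero number
    of times on every line, "whitespace" as a fallback, or None."""
    lines = [ln for ln in lines if ln.strip()]
    if not lines:
        return None
    for name, sep in CANDIDATES.items():
        counts = {ln.count(sep) for ln in lines}
        if len(counts) == 1 and counts != {0}:
            return name
    # Fall back to runs of blanks if every line has the same number of fields.
    widths = {len(ln.split()) for ln in lines}
    if len(widths) == 1 and widths != {1}:
        return "whitespace"
    return None
```

A result of None suggests either fixed-position fields or an inconsistent file that still needs to be standardised.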
When describing the station, the user gives the position and format of the fields containing the date, time, latitude, longitude and depth of the station.
Parameters are described in the same way as stations: the user indicates the position of the parameter to be read, the default value, and the output format for each parameter.
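The three items given for each parameter (position, default value, output format) can be illustrated with a short sketch. It is not NEMO code: format_parameter is a hypothetical function, the output format is expressed as a Python format string, and the "default value" is interpreted here as the value written when the field is blank or unreadable, which is an assumption.

```python
def format_parameter(line, start, end, default, out_fmt):
    """Read characters start..end (1-based, inclusive) and format the value.

    Blank or non-numeric fields are replaced by the default value
    (assumed meaning of "default value").
    """
    raw = line[start - 1:end].strip()
    try:
        value = float(raw)
    except ValueError:
        value = default
    return out_fmt.format(value)
```

For example, format_parameter(line, 10, 15, 99.999, "{:7.3f}") reads characters 10 to 15 of a data line and writes the value on 7 characters with 3 decimals.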
All the parameter codes can be accessed via a list of parameters according to their units.
To check that the parameters have been described correctly, the values read from the file are displayed in a test column, in the output format chosen by the user.
Once the description has been entered, the files can then be converted to MEDATLAS or ODV format.