What is long-term data archiving?
Ifremer defines archiving as follows:
- Long-term retention of data selected by users: making sure that a file is always present on the storage media and that it retains its integrity.
- Indexing, so that a file can easily be found, opened and read.
- Intelligibility of data: making sure that they remain understandable by their potential users over time.
- Storage on magnetic tapes.
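As an illustration of the integrity requirement above, here is a minimal sketch of a checksum comparison. It assumes that a SHA-256 digest was recorded when the file was archived; the function names `sha256_of` and `is_intact` are hypothetical and do not describe the tools actually used at Ifremer.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        # Read fixed-size chunks so that large files never sit fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def is_intact(path: Path, expected_digest: str) -> bool:
    """True if the file is present and its digest matches the recorded one."""
    return path.is_file() and sha256_of(path) == expected_digest
```

A file that is missing or whose content has changed since the digest was recorded makes `is_intact` return `False`, which covers both halves of the requirement: presence and integrity.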
Securing the archiving system:
- Against the loss of a tape: each tape is written in duplicate.
- Against a major disaster (such as a fire): the libraries are in two different buildings belonging to Ifremer.
- The archiving server is backed up in another building.
How is the long-term sustainability of such an archiving system determined?
- By its ability to keep up with technological developments. Indeed, the risks that threaten a file are the obsolescence of the hardware, software or file format, and the loss of meaning of its content.
- By how easily data and metadata can be extracted when the software is changed.
What enables this archiving?
The Information Systems Engineering (RIC) service of the "Information Technology and Marine Data" research unit is responsible for developing and managing Ifremer's common IT infrastructures.
Major technical resources
As one may suspect, long-term data archiving requires major technical resources.
How is this system managed on a day-to-day basis?
On a day-to-day basis, automatic checks run and alerts are sent to operators to detect and report any backup problems. Before being moved to HSM (hierarchical storage management), the data are first stored on an intermediate disk (cache disk), which is backed up every day. This cache-disk backup procedure, which knows about every file held on HSM, ensures that the essential information for each file is present and backed up without errors (each file has its own number and is backed up on a specific tape).
These daily checks verify that the system is consistent and working correctly. An operator who receives an alert examines the problem and resolves it as soon as possible.
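The kind of daily check described above could be sketched as follows. This is only an assumption about how such a check might look: the `CatalogueEntry` record, its fields and the `find_catalogue_problems` function are hypothetical, and the real catalogue format is not described here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogueEntry:
    """Hypothetical catalogue record: one archived file and its tape."""
    file_number: int
    path: str
    tape_label: str


def find_catalogue_problems(entries, cache_paths):
    """Return human-readable problems that an operator should be alerted to."""
    problems = []
    seen_numbers = set()
    catalogued_paths = set()
    for entry in entries:
        # Each file must have its own number.
        if entry.file_number in seen_numbers:
            problems.append(f"duplicate file number {entry.file_number}")
        seen_numbers.add(entry.file_number)
        # Each file must be assigned to a tape.
        if not entry.tape_label:
            problems.append(f"{entry.path}: no tape assigned")
        catalogued_paths.add(entry.path)
    # Files present on the cache disk but unknown to the catalogue.
    for path in sorted(set(cache_paths) - catalogued_paths):
        problems.append(f"{path}: on cache disk but missing from catalogue")
    return problems
```

An empty list means that nothing needs attention; any other result would be turned into an alert for the operators.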
Once a week, the operators check that enough tapes are available and that the robot is not nearing saturation, by looking at its usage and remaining space.
Forward planning is also a major job for the IT teams in charge of long-term data archiving: they must anticipate the growth in volume of certain data sets and make sure that the robot never reaches saturation.
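A very simple way to reason about saturation is a linear projection of the remaining capacity. The sketch below assumes a constant weekly growth rate, which is a simplification; the function name and the figures used in the example are illustrative and are not Ifremer's.

```python
import math


def weeks_until_saturation(capacity_tb, used_tb, weekly_growth_tb):
    """Estimate the number of full weeks left before the robot is full.

    Returns None when usage is not growing, and 0 when it is already full.
    """
    if weekly_growth_tb <= 0:
        return None
    remaining_tb = capacity_tb - used_tb
    if remaining_tb <= 0:
        return 0
    return math.floor(remaining_tb / weekly_growth_tb)


# Illustrative figures: 1000 TB capacity, 800 TB used, 10 TB added per week.
print(weeks_until_saturation(1000, 800, 10))  # prints 20
```

In practice growth is rarely constant, so such an estimate is best recomputed at each weekly check with recent figures.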
About every 4 years, the technology must be replaced. The challenge is sizeable: everything already archived must be archived again on new hardware without losing anything. The entire history must be recovered and rewritten in full in the new archiving system.
This technology renewal means that the IMN department and the RIC service must plan ahead for the budgets these changes require, as well as for the acquisition and installation of the new technologies.
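One way to check that nothing was lost in such a migration is to compare a manifest of the old system with a manifest of the new one. The sketch below assumes that each manifest maps a file path to a checksum; both the manifest format and the `compare_manifests` function are hypothetical.

```python
def compare_manifests(old, new):
    """Compare two {path: checksum} manifests after a migration.

    Returns (missing, altered): paths absent from the new system, and
    paths whose checksum differs between the old and new systems.
    """
    missing = sorted(set(old) - set(new))
    altered = sorted(path for path in old.keys() & new.keys()
                     if old[path] != new[path])
    return missing, altered


# Illustrative manifests with made-up paths and checksums.
old_manifest = {"cruise_a.nc": "aa11", "cruise_b.nc": "bb22", "cruise_c.nc": "cc33"}
new_manifest = {"cruise_a.nc": "aa11", "cruise_b.nc": "ffff"}
print(compare_manifests(old_manifest, new_manifest))
# prints (['cruise_c.nc'], ['cruise_b.nc'])
```

The migration can be considered complete only when both lists are empty.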
It is important to note that simply changing the technology requires all tapes already archived to be completely reread.
A long-term archiving system is therefore one that can read old media, provided that they have been kept in good condition over time (humidity level checked in the media storage areas, no dust and no damage). A 4-year renewal period is not particularly long for a magnetic tape, so we can be sure that the media can still easily be read.
To sum up, the data must have a longer life than that of the technologies and software that manage the archiving.