Using PDF for long-term document preservation

The Digital Preservation Coalition (DPC) has just published a report on digital preservation which states that “PDF should be used to preserve information for the future”. This is an important step for the future of documents: whether for records management, long-term archiving, or other forms of preservation, it is important to choose a format that will keep today’s archived documents readable and accessible a few decades from now (or even later).

 The Digital Preservation Coalition (DPC) was established in 2001 to foster joint action to address the urgent challenges of securing the preservation of digital resources in the UK and to work with others internationally to secure our global digital memory and knowledge base.

If you remember my blog post on XML or PDF/A for archiving, this is fully in line with what I have been advocating: you need to make sure that whatever digital format you choose will still be accessible, or even queryable, in a few decades, and nothing beats a stable, standard format backed by a long-term support commitment.

It is true that PDF is just a generic container and will not carry the “semantic” information that a specialized XML format would, information that could make your document “queryable” in the future (be aware, though, that general-purpose XML “standards” such as OOXML or ODF probably won’t carry much more semantic information than PDF). On the other hand, whatever schema is used to query documents decades from now will be vastly different from today’s, if today’s is supported at all. So, for the time being, I agree that PDF/A is your safest bet.

I highly recommend reading the actual DPC report (“Preserving the Data Explosion: Using PDF”), which provides a detailed history, concrete tips, and useful resources and links (including for specific verticals). My only additions are common sense: embed as much information as possible in your PDF while staying compliant with the standard (for example, the original high-resolution image plus its recognized text for a paper scan, the full text for natively electronic documents, and as much metadata as possible), and put a plan in place for preserving the files themselves.
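One quick sanity check when building such an archive is whether each file at least *declares* PDF/A conformance. A PDF/A file identifies its part and conformance level through the `pdfaid:part` and `pdfaid:conformance` properties in its XMP metadata. The sketch below is a heuristic that uses only the Python standard library. It assumes the XMP packet is stored uncompressed, which, as far as I know, PDF/A requires for the document metadata stream. The function name `declared_pdfa_level` is hypothetical. A declaration is not proof of conformance; to actually validate files you need a proper validator such as veraPDF.

```python
import re
from pathlib import Path
from typing import Optional

# Heuristic: match the PDF/A identification properties written either as XML
# attributes (pdfaid:part="1") or as elements (<pdfaid:part>1</pdfaid:part>).
# Assumes the XMP packet appears uncompressed in the raw file bytes.
_PART = re.compile(rb'pdfaid:part\s*=\s*["\'](\d)["\']|<pdfaid:part>\s*(\d)\s*</pdfaid:part>')
_CONF = re.compile(
    rb'pdfaid:conformance\s*=\s*["\']([A-Za-z])["\']'
    rb'|<pdfaid:conformance>\s*([A-Za-z])\s*</pdfaid:conformance>'
)


def declared_pdfa_level(data: bytes) -> Optional[str]:
    """Return a label such as 'PDF/A-1b' if the bytes declare PDF/A, else None.

    This only reports what the file *claims*; it does not validate it.
    """
    part = _PART.search(data)
    if part is None:
        return None
    number = (part.group(1) or part.group(2)).decode("ascii")
    conf = _CONF.search(data)
    # Conformance levels are conventionally written in lower case in names
    # like "PDF/A-1b", even though XMP stores them as "A", "B", "U", ...
    level = (conf.group(1) or conf.group(2)).decode("ascii").lower() if conf else ""
    return f"PDF/A-{number}{level}"


def declared_pdfa_level_of_file(path: str) -> Optional[str]:
    """Convenience wrapper (hypothetical helper) that reads a file from disk."""
    return declared_pdfa_level(Path(path).read_bytes())
```

Running `declared_pdfa_level_of_file` over an archive folder and listing the files that return `None` gives a rough list of documents that still need converting, which can then be passed to a real validator.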
