PDF/A growing acceptance for archiving


This press release from the PDF/A Competence Centre confirms that PDF/A is gaining acceptance for records management and long-term archiving. Half of the organizations surveyed had plans to use PDF/A in the next 12 months. At the same time, use of older archiving formats such as TIFF, JPEG or plain PDF decreased by about 5 percent. “Nearly all archiving projects use PDF/A” is quite a misleading title though: the current penetration of PDF/A is still small, as only 16% of those surveyed use it actively (although 75% plan to use it actively).

PDF/A is based on PDF 1.4, and became a published standard on October 1st, 2005. It is a stripped-down version of PDF, intended for long-term compatibility. It actually offers two levels of compliance: PDF/A-1b is the predominant one, while PDF/A-1a conserves reading order and adds “searchability” (e.g. OCR for paper documents). A new version, PDF/A-2, is in the works and will add selected features from later PDF versions (1.5 to 1.7).
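As a rough illustration of the two compliance levels, here is a minimal Python sketch that reports which level a file claims. It assumes the file declares its level in its XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties, and that this metadata is stored uncompressed so a plain byte scan can find it. The function name `claimed_pdfa_level` is made up for this example. Note that it only reads the claim; it does not validate that the file actually complies.

```python
import re
from typing import Optional

# XMP can store a property as an element (<pdfaid:part>1</pdfaid:part>)
# or as an attribute (pdfaid:part="1"); both forms are accepted here.
# \x27 is the single-quote character.
_PART = re.compile(rb'pdfaid:part(?:>|=["\x27])\s*(\d)')
_CONF = re.compile(rb'pdfaid:conformance(?:>|=["\x27])\s*([A-Za-z])')


def claimed_pdfa_level(data: bytes) -> Optional[str]:
    """Return the declared level, e.g. 'PDF/A-1b', or None if no claim is found.

    Assumption: the XMP packet is uncompressed, so the markers are
    visible in the raw bytes. This does NOT check actual compliance.
    """
    part = _PART.search(data)
    if part is None:
        return None
    level = part.group(1).decode("ascii")
    conf = _CONF.search(data)
    if conf is not None:
        level += conf.group(1).decode("ascii").lower()
    return "PDF/A-" + level
```

To try it on a real file, pass in its raw bytes, for instance `claimed_pdfa_level(open("scan.pdf", "rb").read())` (the file name is only a placeholder). A file that merely claims a level should still be run through a dedicated validator before you rely on it for archiving.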

So is PDF/A a good long-term storage format? Yes, I still think so. The files are relatively large, but they are totally self-contained, which is vital for very long-term conservation, and you have the assurance that software will exist to read them twenty years from now. “Searchability” is very important in the short term. “Reading order” can be very important if you want to apply Natural Language Processing to analyze this data, but these technologies are only getting started.

Both features are “standard” (and easy) for native electronic documents, but for paper documents they are not that easy. In a few decades, OCR and Intelligent Document Recognition will have improved so vastly that the image embedded in the PDF/A will be the best source from which to extract both reading order and text, so saving that information today does not make much sense. In the meantime, however, both features allow good indexing and metadata for your documents.

As always though, file format is just part of the story: you also need to make sure you’ll be able to read your archival medium a few years from now… Who still owns a Zip or floppy drive?

One Response to “PDF/A growing acceptance for archiving”

  1. Deepak Seth on Jun 18, 2009

    We’ve sure come a long way from the era when archiving was hieroglyphics etched into stone or Emperor Ashok’s edicts carved on rock faces.

    But the beauty of those, and of the papyrus, copper-plate and paper-based archiving that evolved later, is that you can use your eyesight and fingers to browse through them. You may not understand the content (like the Harappan seals from the Indus Valley civilization), but you can still make out what it is and, with sufficient effort (and some luck, aka the Rosetta Stone), decipher it. Very “visual”, as some may say.

    On the other hand, a century or a millennium from now, some of the current archival media may become totally unrecognizable. “Is it a floppy disk or a place mat for coffee cups?” is what a generation far removed from us might wonder when they dig one out during some archaeological excavation!
