XML or PDF/A for archiving?
The “Future of Documents” is not only about what formats our documents will be in 20 years from now; it’s also about making sure today’s documents will still be usable in 20 years.
I often hear customers invoke XML as the magic wand for solving all of their long-term archiving needs, but I usually caution them against such a generalization.
In specific verticals with well-defined schemas, it is true that XML is an excellent way to store your documents so that they (hopefully) retain both their visual aspect and their content. But I would not bet that readers for these schemas (or even for XML!) will still be around 20 years from now.
If your focus is to preserve your documents so that they will still be readable and “visualisable”, I would strongly recommend PDF/A. This stripped-down version of PDF does not have all the bells and whistles of the latest versions of PDF, but it does the job, and Adobe commits to still having a reader 20 years from now.
If your focus is more on making the content of your documents “queryable” and reusable 20 years from now, then XML might be a worthwhile alternative (maybe). But make sure the XML schema you apply is specific to a vertical; don’t go XML just for the sake of XML. Avoid “generic” formats such as ODF or OOXML (even though they are standards), as they won’t bring any value to your documents and will evolve over time.
Instead, go for specialized vertical schemas such as HR-XML for human resources or HL7 for healthcare. With a little bit of luck, your XML-based documents will not only still be readable twenty years from now, but even reusable. Still, I would not be surprised if converting from PDF/A with images to whatever formats are in use by then turns out to be much easier than finding the right converter for your XML formats.
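To make “queryable” a bit more concrete, here is a minimal Python sketch of what reusing archived XML content could look like. The element names (`invoices`, `invoice`, `customer`, `total`) and the helper `totals_by_customer` are invented for illustration and do not follow HR-XML, HL7, or any other real schema; the point is only that structured content can be queried directly, whereas the same data in a page-oriented format would first have to be extracted.

```python
import xml.etree.ElementTree as ET

# Hypothetical archived records: the element and attribute names are made up
# for this example and are not taken from any real vertical schema.
ARCHIVED_XML = """
<invoices>
  <invoice id="A-001"><customer>Acme</customer><total>1200.5</total></invoice>
  <invoice id="A-002"><customer>Globex</customer><total>300.0</total></invoice>
  <invoice id="A-003"><customer>Acme</customer><total>299.5</total></invoice>
</invoices>
"""


def totals_by_customer(xml_text):
    """Sum the invoice totals per customer from the archived XML text."""
    root = ET.fromstring(xml_text.strip())
    totals = {}
    for invoice in root.findall("invoice"):
        customer = invoice.findtext("customer")
        amount = float(invoice.findtext("total"))
        totals[customer] = totals.get(customer, 0.0) + amount
    return totals


if __name__ == "__main__":
    print(totals_by_customer(ARCHIVED_XML))
```

Of course, this only works as long as someone still knows what `total` and `customer` mean in the schema, which is exactly the risk with formats nobody maintains anymore.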
To read more on the topic, check out this excellent blog on Document Archiving using PDF/A; although some of its points are debatable, it lays out really well the different needs for archiving and the benefits of PDF/A versus the XML-based OOXML and ODF.
One last comment: not only does the format matter, but so does the medium on which it is stored. Can you still read a floppy disk from 10 years ago, even if it only holds a plain text file? Haven’t you burnt CDs of JPEG pictures a few years ago that were supposed to last over 100 years, but are unreadable today? Paper is universal; let’s hope we find such a medium for electronic documents…

[...] you remember my blog on XML or PDF-A for archiving, this is fully in line with what I have been advocating for – you need to make sure whatever [...]
If PDF/A is ‘semantic enough’, then XHTML should work as well?
Is Xerox committed to producing PDF/A from their PDF scanners?
Sometimes I think the mission of PDF is to provide an envelope to make TIFF files easier to access…