Friday, July 24, 2009

Info Dictionary vs XMP Metadata

The PDF/A-1 specification goes to great lengths to describe a mapping between the Entries in the legacy PDF Document Information Dictionary and their corresponding values in the more modern Document XMP Metadata. Section 3.4 in TechNote 0003 describes how the values in the Document Information Dictionary must be mirrored in XMP. Section 3.3 describes the requirements for Document Information Entries.

However, these requirements are not symmetric. What I mean by this is that it is perfectly legal for a PDF/A document to contain Document XMP Metadata and not to include a Document Information Dictionary.

Keeping the entries of the legacy and the more modern structures in sync is a headache for the software developer and this pursuit is littered with ambiguous scenarios. For example, many of the XMP Metadata fields can have multiple values. For example, multiple dc:title values for multiple languages or a seq of multiple authors for dc:creator rather than a single author. Each Entry in the legacy Document Information Dictionary is a simple string. There are no conventions on how to order or delimit these strings when mapping multiple fields from XMP to these single string values.

The solution is simple: don't use Document Information Dictionaries! Accept that Document XMP Metadata is the way forward and move on.

We'll be adding this to PDF/D as a constraint: the Info dictionary will be illegal in PDF/D - legacy software be damned.

No comments:

Post a Comment