Tuesday, April 21, 2009

Ambiguities in PDF/A Extension Schemas

The PDF/A XMP Technotes are not clear on which properties of the extension schemas are optional and which are required.


Discussions with engineers from other PDF/A companies have resulted in the "if it doesn't say 'Optional' then it must be 'Required'" assumption, which most of us are trying to abide by. The only properties in any of the extension schemas marked as Optional are:
  • pdfaSchema:schema - Optional description of schema
  • pdfaType:field - Optional description of the structured fields
That's it!  All the rest must thus be 'Required'.  Not so fast!

If this were true, then both pdfaSchema:property and pdfaSchema:valueType would always be required, which means that every extension schema would have to include both properties and custom value types. We noticed this issue when we were creating RDF definitions for all the pre-defined PDF/A schemas, because it made it impossible to correctly define the "Dimensions" valueType schema: this schema has one custom value type and no properties.

Exception #1: at least one of pdfaSchema:property and pdfaSchema:valueType should be present.
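
As a rough illustration of Exception #1, here is a minimal sketch of the relaxed check. It assumes the schema description has already been parsed into a plain Python dict keyed by the qualified property names. That representation and the function name has_property_or_value_type are hypothetical and are not taken from the Technotes or from our validator.

```python
def has_property_or_value_type(schema: dict) -> bool:
    """Return True if the schema declares at least one property or value type.

    Assumption: `schema` maps qualified names such as "pdfaSchema:property"
    to lists of already-parsed entries. A missing key and an empty list are
    both treated as "absent".
    """
    properties = schema.get("pdfaSchema:property") or []
    value_types = schema.get("pdfaSchema:valueType") or []
    return bool(properties) or bool(value_types)
```

Under this check, a schema shaped like "Dimensions" (one custom value type, no properties) passes, while a schema that declares neither is still rejected.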

We've noticed, in the vast test set we have accumulated through our free online services like www.freepdftoword.org and www.validatepdfa.com, that several Adobe products create schemas which omit one or more of pdfaProperty:description, pdfaType:description and pdfaField:description. All three of these properties are purely descriptive in the same sense as the two properties mentioned above as 'Optional'.  We believe that these fields should also be optional but, for now, our validator still flags their absence as an Error (not a fatalError, though, since we can add these fields to the schemas, containing filler content, to "fix" the issue).

Proposed Exception #2: pdfaProperty:description, pdfaType:description and pdfaField:description should be 'Optional' properties. Existing PDF/A creators already omit them, and treating purely descriptive fields as optional makes sense.
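
To make the difference between the current behaviour and Proposed Exception #2 concrete, here is a small sketch. The dict representation, the function name check_description and the lenient flag are all hypothetical. It only mirrors the behaviour described above (a missing description is reported as an Error, or accepted if the exception is adopted) and is not our validator's actual code.

```python
# Which description property belongs to which kind of schema entry.
DESCRIPTION_KEYS = {
    "property": "pdfaProperty:description",
    "valueType": "pdfaType:description",
    "field": "pdfaField:description",
}


def check_description(kind: str, entry: dict, lenient: bool = False) -> list:
    """Return a list of findings for a missing or empty description.

    Assumption: `entry` is a dict keyed by qualified property names.
    With lenient=True (Proposed Exception #2) a missing description is
    accepted; otherwise it is reported as a repairable Error.
    """
    key = DESCRIPTION_KEYS[kind]
    if entry.get(key) or lenient:
        return []
    return ["Error: missing " + key]
```

For example, check_description("field", {}) reports one Error, while the same call with lenient=True reports nothing.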

A value type containing fields is required to have a pdfaType:namespaceURI property. We've noticed customer samples created by reputable products which omit this field. When it is omitted, the assumed namespace for the value type is simply the namespace of the schema with the name of the type and a slash appended to it.  Our validator marks this issue as an Error (and not a fatalError) since it can easily be repaired by explicitly inserting the assumed namespace.

Example:
Schema namespace:
  http://www.acme.com/ns/email/1/
Value type name:
  mailaddress
Assumed namespace of value type if pdfaType:namespaceURI is absent:
  http://www.acme.com/ns/email/1/mailaddress/

Proposed Exception #3: if pdfaType:namespaceURI is absent, construct a default namespace for the value type as described above.
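
The construction in the example above can be sketched as follows. The function name default_value_type_namespace is hypothetical, and the handling of a schema namespace that does not already end in a slash is an assumption on our part, since the example only covers a namespace that does.

```python
def default_value_type_namespace(schema_namespace: str, type_name: str) -> str:
    """Build the assumed namespace for a value type lacking pdfaType:namespaceURI.

    Follows the example above: the schema namespace, then the type name,
    then a trailing slash. Assumption: if the schema namespace does not end
    in "/", one is added before the type name.
    """
    if schema_namespace.endswith("/"):
        base = schema_namespace
    else:
        base = schema_namespace + "/"
    return base + type_name + "/"
```

Calling default_value_type_namespace("http://www.acme.com/ns/email/1/", "mailaddress") gives the assumed namespace shown in the example.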
 
