Sunday, February 15, 2009

XMP Validator

I've been working on building a better XMP validator. My idea is to define all the pre-defined schemas as pdfaExtension schemas and pre-load them into my validator. With this approach, I only need one validator (one that validates pdfaExtension schemas) to validate all the pre-defined schemas as well as any user-defined schemas.


Pulling this off requires RDF schemas for all the common pre-defined schemas. I thought I'd start with the PDF/A identification schema since it appeared almost trivial. It didn't take long before I ran into "undefined" ground. I thought I could use "Closed Choice of Integer" to define the part property (only one choice: 1) and "Closed Choice of Text" to define the conformance property (two choices: A or B). So, using the samples I found in the tech notes on the PDF/A Competence Center site, I set out to create my first pdfaExtension schema.
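
To make the intent concrete, here is a minimal Python sketch of the check I want the two Closed Choice definitions to express. It only covers the allowed values mentioned above (part: 1, conformance: A or B). The names `PDFAID_CHOICES` and `check_pdfaid` are made up for illustration and do not come from any XMP library or from the tech notes.

```python
from typing import List

# Allowed values as described in the post: part is a Closed Choice of Integer
# with a single choice, conformance is a Closed Choice of Text with two.
PDFAID_CHOICES = {
    "part": {1},
    "conformance": {"A", "B"},
}


def check_pdfaid(part: str, conformance: str) -> List[str]:
    """Return a list of error messages; an empty list means both values are allowed."""
    errors = []

    # The part value arrives as text in the metadata, so parse it first.
    try:
        part_value = int(part)
    except ValueError:
        errors.append(f"part {part!r} is not an integer")
    else:
        if part_value not in PDFAID_CHOICES["part"]:
            errors.append(f"part {part_value} is not one of the allowed choices")

    if conformance not in PDFAID_CHOICES["conformance"]:
        errors.append(f"conformance {conformance!r} is not one of the allowed choices")

    return errors
```

Calling `check_pdfaid("1", "A")` returns an empty list, while a value outside either closed set produces one message per offending property.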

I soon discovered that these tech notes do not explain how to define a "Choice". The next step was to wade through the XMP documentation at Adobe. That didn't help much because, being new to this domain, I can't easily tell when something is specific to XMP, RDF or pdfaExtension. Page 62 of the XMP Specification describes a Closed Choice and mentions a vocabulary and lists. I can only assume this means defining a list of values using Bag, Alt or Seq. An example would really help to clarify.

I'm all ears ..

(Here is my work in progress: sample.rdf)

Next Idea: pdfaValidate Schema
There are not a lot of examples out there. Simple examples showing how to define a Closed Choice field would be great. The same goes for defining "Property Qualifiers". From what I read in the XMP specification they would be an ideal solution for me:
"Property qualifiers allow values to be extended without breaking existing usage."
The specification has pretty block diagrams but no sample code.

In the absence of decent implementation documentation, I decided to just take a swing at it and came up with something that I think is probably what the XMP Specification describes as "Property Qualifiers". I created an RDF schema with two properties for validation:
  1. status: Closed Choice of Text - required|prohibited|restricted|recommended|ignored
  2. constraint: Text - regular expression for constraining simple literal fields for PDF/A compliance.
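
As a rough illustration of how a validator might apply these two properties to a single field, here is a minimal Python sketch. The meaning given to each status value is my assumption (the list above only names them), and `check_property` is a hypothetical helper, not part of any XMP library. In this sketch the constraint is applied whenever one is given and a value is present, and it must match the whole value.

```python
import re
from typing import List, Optional

STATUSES = {"required", "prohibited", "restricted", "recommended", "ignored"}


def check_property(value: Optional[str], status: str,
                   constraint: Optional[str] = None) -> List[str]:
    """Return a list of problems for one property; an empty list means it passes.

    value is None when the property is absent from the metadata.
    """
    if status not in STATUSES:
        raise ValueError(f"unknown status: {status!r}")

    problems = []

    # Assumption: "ignored" properties are never checked.
    if status == "ignored":
        return problems

    if value is None:
        # Assumption: only "required" is an error when missing;
        # "recommended" produces a warning.
        if status == "required":
            problems.append("error: required property is missing")
        elif status == "recommended":
            problems.append("warning: recommended property is missing")
        return problems

    # Assumption: "prohibited" properties must not appear at all.
    if status == "prohibited":
        problems.append("error: prohibited property is present")
        return problems

    # The regular expression must match the entire literal value.
    if constraint is not None and re.fullmatch(constraint, value) is None:
        problems.append(f"error: value {value!r} does not match {constraint!r}")

    return problems
```

For example, `check_property("B", "required", "[AB]")` returns an empty list, while `check_property("C", "required", "[AB]")` reports a constraint mismatch.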

Here is the schema as RDF. The file contains the definition of the pdfaValidate schema, plus a "constrained" version of my pdfaid RDF schema definition as an example: pdfaValidate.rdf

Now I have what I need to make simple "constrained" RDF definitions for all the pre-defined schemas that we need to validate for PDF/A compliance. Moving right along ..
