Wednesday, March 4, 2009

Flatness: Ambiguity in ISO 32000-1

From the ISO 32000-1 specification:

Table 53 (8.4.1), describing the initialization of the graphics state at the start of each page:

The precision with which curves shall be rendered on the output device (see 10.6.2, "Flatness Tolerance"). The value of this parameter (positive number) gives the maximum error tolerance, measured in output device pixels; smaller numbers give smoother curves at the expense of more computation and memory use. Initial value: 1.0. 

Table 57, 8.4.4 describing the "i" operator:

Set the flatness tolerance in the graphics state (see 10.6.2, "Flatness Tolerance"). flatness is a number in the range 0 to 100; a value of 0 shall specify the output device's default flatness tolerance.

Table 58 (8.4.5), describing the graphics state parameter dictionary entry

FL:

Number, (Optional; PDF 1.3) The flatness tolerance (see 10.6.2, "Flatness Tolerance").

10.6.2 Flatness Tolerance

The flatness tolerance controls the maximum permitted distance in device pixels between the mathematically correct path and an approximation constructed from straight line segments, as shown in Figure 54. Flatness may be specified as the operand of the i operator (see Table 57) or as the value of the FL entry in a graphics state parameter dictionary (see Table 58). It shall be a positive number.

Observation:

It appears to me that the above clauses all refer to exactly the same parameter. If that is correct, then the range and default value for flatness tolerance are ambiguous:

Either the default is 1.0 or it is 0: pick one.

Either the range is 0 to 100, or it is any positive number (any value > 0): pick one.
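
To make the conflict concrete, here is a minimal sketch that encodes the two readings as separate checks and reports where they disagree. The function names are my own and purely illustrative; neither check is claimed to be the correct interpretation of the specification.

```python
def valid_per_table_57(flatness: float) -> bool:
    """Table 57 reading: a number in the range 0 to 100 (0 = device default)."""
    return 0 <= flatness <= 100


def valid_per_clause_10_6_2(flatness: float) -> bool:
    """Clause 10.6.2 reading: flatness shall be a positive number."""
    return flatness > 0


def readings_disagree(flatness: float) -> bool:
    """True when one reading accepts the value and the other rejects it."""
    return valid_per_table_57(flatness) != valid_per_clause_10_6_2(flatness)


for value in (0, 1.0, 100, 150):
    print(value, valid_per_table_57(value), valid_per_clause_10_6_2(value))
```

The two readings disagree at exactly the points listed above: a value of 0, and any value greater than 100.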

Comments?

Thursday, February 26, 2009

Anomalous Situations - Best Practices

ISO 32000-1 has a note in clause 12.6.2 that is just dying to get the PDF/D Best Practices treatment:


"Conforming readers should attempt to provide reasonable behavior in anomalous situations. For example, self-referential actions should not be executed more than once, and actions that close the document or otherwise render the next action impossible should terminate the execution sequence."

How about insisting that the Next entries in action dictionaries shall only form acyclic graphs of actions? When would an endless loop of actions ever be a good thing?
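
As a rough sketch of how a validator could enforce such a rule, the following walks the Next entries and reports whether any action can reach itself again. It assumes actions have already been parsed into Python dicts whose "Next" value is either a single action dict or a list of them; that in-memory model and the function name are assumptions for illustration, not part of any specification or existing tool.

```python
def has_action_cycle(action: dict) -> bool:
    """Return True if following Next entries from `action` can loop back."""
    visiting = set()  # ids of actions on the current path
    done = set()      # ids of actions already proven loop-free

    def next_actions(a: dict) -> list:
        nxt = a.get("Next")
        if nxt is None:
            return []
        return nxt if isinstance(nxt, list) else [nxt]

    def visit(a: dict) -> bool:
        key = id(a)  # identity, since two distinct actions may look equal
        if key in visiting:
            return True   # reached an action that is still on the path
        if key in done:
            return False  # shared sub-sequence, already checked
        visiting.add(key)
        for child in next_actions(a):
            if visit(child):
                return True
        visiting.discard(key)
        done.add(key)
        return False

    return visit(action)


# A self-referential action, as in the note quoted above.
looping = {"S": "GoTo"}
looping["Next"] = looping
print(has_action_cycle(looping))
```

Note that an action shared by two sequences is not a cycle; only a path that returns to an action still being executed counts.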

Preferred prefix for Colorant Basic Value Type

xapG vs xmpG

I Googled Adobe's site for clarification on this change, hoping to find a note on the subject: nada.
For the purposes of our XMP validator we're obviously going to assume that the most recent version is correct. The reason I made this blog post is so that it will pop up in Google when the next person stumbles into this question, wondering if it is a typo or a deliberate change.

Wednesday, February 25, 2009

Open Source PDF/A RDF Schemas

Inspired by the Isartor test set for validating PDF/A compliance, we are working on a similarly styled set of negative tests for basic XMP compliance (PDF/A XMP TechNotes).


While it is clear that this work needs to be done, nobody appears to be tackling it. PDF/A 19005-1 is now heading into its 3rd year so we're attempting to fill this gap.

While each vendor will obviously implement their own XMP validator for PDF/A validation and conversion, there are some areas where we can easily collaborate. We believe that it is in all our interests to openly share an RDF and PDF/A compliant XMP implementation of the pre-defined schemas required to validate PDF/A files.

Today we released our first version of the PDF/A pre-defined schemas in RDF form. You can find these resources at the PDF/D website.

Monday, February 23, 2009

Isartor Truth

As promised, we've posted more tools for standardized compliance testing.


Today we added:
- Isartor Truth: an XML file with the expected results of the Isartor PDF/A tests
- CompareReports.exe: a tool to compare the above truth file to output from a validator

For more on our efforts to improve mechanical comparison of compliance testing reports, please visit the PDF/D site.

Friday, February 20, 2009

XMP: bag vs Bag, seq vs Seq

The RDF specification clearly uses "Bag", "Alt" and "Seq" as the names of its container elements, so the array container elements must be named:

rdf:Bag, rdf:Alt and rdf:Seq

Starting with the XMP Specification Part 1, the lowercase "bag" (as in "bag Text") was introduced as a notation to describe array types in schemas. That document consistently uses the lowercase variant, and only for type descriptions.

I believe that the titlecase variant of this notation, first seen in XMP Specification Part 2, was introduced in error (example: the XMP Media Management property definition for xmpMM:Ingredients is "Bag ResourceRef").

This inconsistency really didn't matter while it was limited to being used as a notation format only in documentation. The arrival of PDF/A extension schemas changed all that. Specifically, as mentioned in TechNote 0009 clause 4.5, this notation is now used in the PDF/A extension schemas for the pdfaProperty:valueType and the pdfaField:valueType properties.

Our validator will support both variants but will generate warnings for the titlecase version. In other words, we are recommending the use of the lowercase variants as a best practice for PDF/D.
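
The policy could be sketched roughly as follows. This is an illustration only, not our validator's actual code: the function name and the three result strings are made up here, and treating "alt" the same way as "bag" and "seq" is an assumption.

```python
LOWERCASE = ("bag", "seq", "alt")   # recommended notation
TITLECASE = ("Bag", "Seq", "Alt")   # accepted, but flagged


def check_value_type(value_type: str) -> str:
    """Classify a valueType string as 'ok', 'warning' or 'not-array'."""
    words = value_type.split()
    if len(words) < 2:
        return "not-array"      # e.g. plain "Text"
    prefix = words[0]
    if prefix in LOWERCASE:
        return "ok"
    if prefix in TITLECASE:
        return "warning"        # titlecase variant: warn, do not fail
    return "not-array"


print(check_value_type("bag Text"))
print(check_value_type("Bag ResourceRef"))
```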

XMP pdfaValidate Schema

In building our new and improved validator we decided to use the pdfaExtension schema (and friends) to define all the schemas we are validating including all the pre-defined schemas. This process of eating our own dogfood has exposed numerous holes in both the XMP Specification and the PDF/A Specification.


The most obvious hole, which has already been discussed within the PDF/A Competence Center Working Group (TWG), is the loose nature of the definition of basic types in XMP. As mentioned earlier in my blog, one example is "Choice of " and "Open Choice of ". Another issue raised in TWG discussions is the ambiguous use of case (seq vs Seq, bag vs Bag, etc).

The XMP Specification makes provision for extending existing Properties with Qualifier Properties that are ignored by applications that are not aware of them. We used this feature and the pdfaValidate schema to extend pdfaProperty and add validation information. When defining the schemas we wish to validate, we now add the following attributes:

status
Description: used by validator to flag errors of omission, inclusion or raise warnings.
Type: Closed Choice of Text
Values: required|prohibited|deprecated|restricted|recommended|ignored
'deprecated' is similar to 'prohibited', except that validators flag it as a warning rather than an error.

constraint
Description: Regular expression used to constrain "Closed Choice of " values. We still need a way to flag Open vs Closed.
Regular expressions must always match the entire input (start with '^' and end with '$'). Other valid constraint values include:
'base64': used to validate Thumbnail xapGImg:image property for example.
Numeric ranges like: '[0,255]',  '(0,)', '[-128,127]', etc.
Type: Text

standard
Description: This value determines which specification is violated when constraints are not met.
Type: Closed Choice of Text
Values: pdf|pdfa|pdfd|xmp

clause
Description: This is the clause in the specification which is violated when constraints are not met.
Type: Text
Value: string, typically dot delimited integers
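
To illustrate how the three kinds of constraint values described above might be evaluated, here is a minimal sketch. It is not our validator's implementation: the function name, the dispatch order, the interval reading of the numeric ranges (square bracket inclusive, round bracket exclusive, missing bound unbounded) and the decision to ignore whitespace inside base64 data are all assumptions.

```python
import base64
import binascii
import re

# Numeric range such as '[0,255]', '(0,)' or '[-128,127]'.
_RANGE = re.compile(
    r"^([\[(])\s*(-?\d+(?:\.\d+)?)?\s*,\s*(-?\d+(?:\.\d+)?)?\s*([\])])$"
)


def satisfies(constraint: str, value: str) -> bool:
    """Return True if `value` meets the given constraint string."""
    if constraint == "base64":
        try:
            # Assumption: line breaks inside the encoded data are allowed.
            base64.b64decode("".join(value.split()), validate=True)
            return True
        except (binascii.Error, ValueError):
            return False

    if constraint.startswith("^") and constraint.endswith("$"):
        return re.fullmatch(constraint, value) is not None

    m = _RANGE.match(constraint)
    if m is None:
        raise ValueError(f"unrecognised constraint: {constraint!r}")
    open_b, low, high, close_b = m.groups()
    try:
        x = float(value)
    except ValueError:
        return False
    if low is not None and (x < float(low) or (open_b == "(" and x == float(low))):
        return False
    if high is not None and (x > float(high) or (close_b == ")" and x == float(high))):
        return False
    return True


print(satisfies("[0,255]", "255"))
print(satisfies("(0,)", "0"))
print(satisfies("^(pdf|pdfa|pdfd|xmp)$", "pdfa"))
```

Requiring the '^' and '$' anchors makes it easy to tell a regular expression from the other constraint forms without an extra attribute.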

We are continuing to work on our full set of these schemas for validation of PDF/A. These will then be available to PDF/D Consortium members. During this process, we may add more features to the pdfaValidate schema.