Monday, February 9, 2009

More on Numbers

Earlier I discussed Numbers in a general post about improving PDF for easier parsing.


I have two more notes to add on the subject of numbers.

"." is not a number

The PDF specification, ISO 32000-1:2008, states that:

A real value shall be written as one or more decimal digits with an optional sign and a leading, trailing, or embedded PERIOD (2Eh) (decimal point).

Adobe Acrobat Reader 9 ignores this and accepts a single period as zero. In this example (7-3-3-t01-fail-b.pdf) from our PDF 1.4 test set, the colors red (on the RGB page) and black (on the CMYK page) are parsed without complaint, with each lone period read as 0.

1 0 . rg 72 72 72 72 re f
0 1 0 rg 72 216 72 72 re f
0 0 1 rg 72 360 72 72 re f

..

0 1 1 rg 72 72 72 72 re f
1 0 1 rg 72 216 72 72 re f
1 1 0 rg 72 360 72 72 re f
. 0 0 rg 72 504 72 72 re f

Numbers in PDF/D

In addition to earlier notes on parsing numbers, the above behavior will be considered an error in PDF/D. 
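To make the rule concrete, here is a minimal Python sketch of a strict check for a numeric token, based only on the ISO wording quoted above (at least one decimal digit, an optional sign, at most one period). The name is_pdf_number is hypothetical, and the sketch does not cover any further restrictions from the earlier PDF/D notes.

```python
import re

# Optional sign, then either digits with an optional period and more digits,
# or a leading period followed by digits. At least one digit is required,
# so a lone "." (or "..") never matches.
_NUMBER = re.compile(rb"[+-]?(?:\d+\.?\d*|\.\d+)")


def is_pdf_number(token: bytes) -> bool:
    """Return True if the whole token is a well-formed integer or real."""
    return _NUMBER.fullmatch(token) is not None


if __name__ == "__main__":
    for tok in (b"1", b"-0.5", b"4.", b".25", b".", b".."):
        print(tok, is_pdf_number(tok))
```

With a check like this, both "1 0 . rg" and ". 0 0 rg" from the example would be rejected, because the lone period contains no digits.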

Also, in our tens of thousands of test files we have often seen numeric operands in content streams terminated directly by the operator, with no separator in between, like this:

... 2 0 0 2 0 0cm ...

Acrobat does not tolerate this, but we have seen other PDF software (including our own) look past the error. PDF/D will require a delimiter or whitespace to terminate a number token.
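The termination rule could be enforced in a tokenizer along these lines. This is a sketch under assumptions, not the PDF/D implementation: the function name read_number is hypothetical, and the whitespace and delimiter sets are the ones defined for PDF in ISO 32000-1.

```python
import re

# PDF white-space characters: NUL, TAB, LF, FF, CR, SPACE.
WHITESPACE = b"\x00\t\n\x0c\r "
# PDF delimiter characters.
DELIMITERS = b"()<>[]{}/%"

_NUMBER = re.compile(rb"[+-]?(?:\d+\.?\d*|\.\d+)")


def read_number(data: bytes, pos: int):
    """Read a number token starting at pos.

    Returns (token, end_offset). Raises ValueError if there is no number
    at pos, or if the number is immediately followed by a byte that is
    neither white space nor a delimiter (end of data is accepted).
    """
    m = _NUMBER.match(data, pos)
    if m is None:
        raise ValueError(f"no number at offset {pos}")
    end = m.end()
    if end < len(data) and data[end] not in WHITESPACE + DELIMITERS:
        raise ValueError(f"number at offset {pos} is not terminated")
    return m.group(), end


if __name__ == "__main__":
    stream = b"2 0 0 2 0 0cm"
    print(read_number(stream, 0))   # first operand, followed by a space
    try:
        read_number(stream, 10)     # last operand runs straight into "cm"
    except ValueError as err:
        print("rejected:", err)
```

Under this rule the final 0 in "0cm" is rejected, while the same operand in "0 cm" or before a closing "]" is accepted.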

2 comments:

  1. You also need to apply this same test/validation to generic Cos objects. For example, consider the following array

    [. . 612 792]

    Some parsers also consider that valid and treat the '.'s as 0s...

    Leonard

  2. Thanks. I just tested with Acrobat Reader 9:

    /MediaBox [. . 612 792] works
    /MediaBox [.. 612 792] does not

    The 2nd variant crashed Solid PDF Tools. ;-)
