Earlier I discussed Numbers in a general post about improving PDF for easier parsing.
I have two more notes to add on the subject of numbers.
"." is not a number
PDF ISO-32000-1:2008 states that:
A real value shall be written as one or more decimal digits with an optional sign and a leading, trailing, or embedded PERIOD (2Eh) (decimal point).
Adobe Acrobat Reader 9 ignores this and accepts a single period as zero. This example (7-3-3-t01-fail-b.pdf) from our PDF 1.4 test set shows that the colors red (on the RGB page) and black (on the CMYK page) were parsed with no problem.
1 0 . rg 72 72 72 72 re f
0 1 0 rg 72 216 72 72 re f
0 0 1 rg 72 360 72 72 re f
..
0 1 1 rg 72 72 72 72 re f
1 0 1 rg 72 216 72 72 re f
1 1 0 rg 72 360 72 72 re f
. 0 0 rg 72 504 72 72 re f
Numbers in PDF/D
In addition to earlier notes on parsing numbers, the above behavior will be considered an error in PDF/D.
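To make the rule concrete, here is a minimal sketch in Python of a strict check for a numeric token. It follows the sentence quoted from ISO 32000-1 above and also accepts plain integers. The name `is_pdf_number` and the regular expression are my own illustration, not part of any PDF/D text.

```python
import re

# Optional sign, then digits with an optional leading, trailing, or
# embedded period. At least one digit is required, so "." and ".."
# do not match. (Hypothetical helper, for illustration only.)
_PDF_NUMBER = re.compile(rb"[+-]?(?:\d+\.?\d*|\.\d+)")


def is_pdf_number(token: bytes) -> bool:
    """Return True if the whole token is a well-formed PDF number."""
    return _PDF_NUMBER.fullmatch(token) is not None
```

With this check, `is_pdf_number(b".")` is false while `is_pdf_number(b"4.")`, `is_pdf_number(b"-.5")` and `is_pdf_number(b"0")` are true.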
Also, in our tens of thousands of test files, we have often seen number arguments in content streams terminated directly by the operator, like this:
... 2 0 0 2 0 0cm ...
Acrobat does not tolerate this, but we have seen other PDF software (including our own) look past this error. PDF/D will require delimiters or whitespace to terminate number tokens.
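A strict tokenizer could enforce the termination rule roughly as follows. This is a sketch under my own assumptions: `read_number` is a hypothetical helper, and the whitespace and delimiter sets are the ones listed in ISO 32000-1. Reaching the end of the data is also treated as a valid terminator here, which is my choice rather than anything PDF/D states.

```python
import re

# White-space and delimiter characters as listed in ISO 32000-1.
WHITESPACE = b"\x00\t\n\x0c\r "
DELIMITERS = b"()<>[]{}/%"

# Same strict number pattern: at least one digit is required.
_NUMBER = re.compile(rb"[+-]?(?:\d+\.?\d*|\.\d+)")


def read_number(data: bytes, pos: int):
    """Read a number token at pos and return (token, end_offset).

    Raises ValueError if there is no number at pos, or if the number
    is followed by anything other than whitespace, a delimiter, or
    the end of the data.
    """
    m = _NUMBER.match(data, pos)
    if m is None:
        raise ValueError(f"no number at offset {pos}")
    end = m.end()
    if end < len(data) and data[end] not in WHITESPACE + DELIMITERS:
        raise ValueError(
            f"number at offset {pos} is not terminated by "
            "whitespace or a delimiter"
        )
    return m.group(), end
```

On the fragment `2 0 0cm`, the first two numbers are read normally, but reading the `0` that runs into `cm` raises `ValueError` instead of being silently accepted.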
You also need to apply this same test/validation to generic Cos objects. For example, consider the following array:
[. . 612 792]
Some parsers also consider that valid and treat the '.'s as 0s...
Leonard
Thanks. I just tested with Acrobat Reader 9:
/MediaBox [. . 612 792] works
/MediaBox [.. 612 792] does not
The 2nd variant crashed Solid PDF Tools. ;-)