Tuesday, February 3, 2009

XRef stream vs xref

That didn't take long! I've been urged to compromise on legacy features already.


Members of the PDF/A camp, including one of my software engineers (Sergey), are concerned that dropping old-style xref tables means that no PDF/A-1 file can possibly be PDF/D compliant, since PDF/A-1 is based on PDF 1.4, which predates xref streams. I had hoped that PDF/A-2 support would be good enough, but they disagree.

So I've decided to take a step back and allow old-style xref tables in PDF/D, but with plenty of constraints:
  • only a single xref table (no Prev field in Trailer)
  • no hybrid files (no XRefStm in trailer)
  • no deleted objects (no f type in the xref table except for the first entry)
  • generation numbers always zero
  • only one subsection (implies consecutive object numbers starting at 1, after the mandatory free entry for object 0)
These simplifications mean that the end of the PDF file will always look like some variant of this:

xref
0 4
0000000000 65535 f 
0000000009 00000 n 
0000000122 00000 n 
0000000175 00000 n 
trailer
<<
  /Size 4
  /Root 2 0 R
>>
startxref
226
%%EOF
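
As a rough illustration of how little is left to write under these constraints, here is a minimal Python sketch that emits the tail shown above. The function name `build_xref_section` and its parameters are hypothetical, not part of any PDF/D text; it assumes the caller already knows the byte offset of each object (numbered 1 upward), the object number of the catalog, and the byte offset of the `xref` keyword itself.

```python
def build_xref_section(offsets, root_obj, startxref):
    """Return the xref table, trailer and startxref for a single-section file.

    offsets   -- byte offsets of objects 1..N, in object-number order
    root_obj  -- object number of the document catalog (assumed generation 0)
    startxref -- byte offset of the 'xref' keyword in the finished file
    """
    size = len(offsets) + 1  # +1 for the mandatory free entry for object 0
    lines = ["xref", f"0 {size}", "0000000000 65535 f "]
    for offset in offsets:
        # 10-digit offset, 5-digit generation (always zero here), 'n' = in use.
        # The trailing space plus the newline makes each entry 20 bytes long.
        lines.append(f"{offset:010d} 00000 n ")
    lines += [
        "trailer",
        "<<",
        f"  /Size {size}",
        f"  /Root {root_obj} 0 R",
        ">>",
        "startxref",
        str(startxref),
        "%%EOF",
    ]
    return "\n".join(lines) + "\n"
```

Calling `build_xref_section([9, 122, 175], 2, 226)` should reproduce the listing above line for line.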

The other valid PDF/D entries in the trailer are ID, Info and Encrypt.
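
The constraints listed earlier are simple enough to check mechanically. The following Python sketch is one possible way to do that on the last bytes of a file; `check_pdfd_xref` is a hypothetical helper, and it is a simplification that only looks for the forbidden trailer keys (Prev, XRefStm) by name rather than parsing the trailer dictionary properly.

```python
import re

# One 20-byte entry: offset, generation, type, then a two-character end of line.
ENTRY = re.compile(rb"(\d{10}) (\d{5}) ([nf])(?: \r| \n|\r\n)")
FORBIDDEN_TRAILER_KEYS = (b"Prev", b"XRefStm")


def check_pdfd_xref(tail):
    """Return a list of constraint violations found in the tail of a PDF file."""
    problems = []
    # Skip the 'xref' inside 'startxref'; we want the table keyword itself.
    head = re.search(rb"(?<!start)xref\r?\n(\d+) (\d+)\r?\n", tail)
    if head is None:
        return ["no old-style xref table found"]
    first, count = int(head.group(1)), int(head.group(2))
    if first != 0:
        problems.append("subsection must start at object 0")
    pos = head.end()
    for i in range(count):
        entry = ENTRY.match(tail, pos)
        if entry is None:
            problems.append(f"entry {i} is malformed")
            break
        _offset, gen, kind = entry.groups()
        if i == 0:
            if kind != b"f" or gen != b"65535":
                problems.append("entry 0 must be the free entry (65535 f)")
        elif kind != b"n":
            problems.append(f"object {i} is marked free (deleted)")
        elif gen != b"00000":
            problems.append(f"object {i} has a nonzero generation number")
        pos = entry.end()
    else:
        # All entries read: the trailer must follow, not a second subsection.
        if not tail.startswith(b"trailer", pos):
            problems.append("expected trailer right after the single subsection")
    trailer_at = tail.find(b"trailer", pos)
    trailer = tail[trailer_at:] if trailer_at != -1 else b""
    for key in FORBIDDEN_TRAILER_KEYS:
        if re.search(rb"/" + key + rb"\b", trailer):
            problems.append(f"trailer contains /{key.decode()}")
    return problems
```

An empty list means the tail passed these particular checks; it says nothing about the rest of the file, such as whether the offsets actually point at the right objects.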

As with my earlier PDF/D constraints, incremental updates and the dead objects that come with them are eliminated. So is linearization.

There you have it: the minimum functionality of old-style xref tables needed to make it possible for PDF/A-1 files to be PDF/D compliant.

2 comments:

  1. Disallowing linearization is a BAD IDEA! Linearization, while complex, is a VERY USEFUL (and underused) feature of PDF, especially when serving large documents on the web.

    I'd rather see you find an alternative than kill it entirely.

  2. I've tried using FPDF/FPDI to text-stamp PDF docs. To my surprise, the FPDI library cannot handle any PDF doc that uses an xref stream. PDF docs with an xref table are OK. Does anyone have any idea how to modify the FPDI functions to read and parse the binary xref stream? Thanks a bunch.
