To change an existing PDF, the file has to be read with:
aFile := PDF.File read: <aFilename>
The file can then be converted to a Document, an object which can write itself as a PDF file by:
aDocument := aFile asDocument
After changing things in aDocument, the PDF file is written out with:
<aDocument> saveAs: <aFilename>
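Put together, the three steps above can be sketched as one workspace snippet. The file names are hypothetical, and it is an assumption that #read: and #saveAs: accept Filename objects created with #asFilename; substitute whatever filename form your installation expects.

```smalltalk
| sourceFilename targetFilename aFile aDocument |
"Hypothetical file names; assumed to be acceptable arguments for #read: and #saveAs:"
sourceFilename := 'original.pdf' asFilename.
targetFilename := 'changed.pdf' asFilename.

"Read the existing PDF; objects are only loaded when they are needed"
aFile := PDF.File read: sourceFilename.

"Convert the file into a Document which can write itself as a PDF"
aDocument := aFile asDocument.

"... change things in aDocument here ..."

"Write the changed document to a new file"
aDocument saveAs: targetFilename
```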
The class PDF.File is for reading PDFs from files. It does so incrementally, reading objects from disk only when they are needed. One can see that in the PDFExplorer, where 729 of 125179 objects have been read.
The initial object read from a PDF is the /Trailer. Apart from some internal bookkeeping attributes, a trailer contains the /Root with a reference to the Catalog (the contents of the PDF), the /Info and the /ID. This trailer is held by a File in the #trailer instance variable.
The cloning of the PDF is done in the File»asDocument method:
asDocument
	"<Document> a new document with the same contents as the receiver for writing out the PDF later"

	| newDocument info |
	newDocument := Document new.
	newDocument root: self trailer Root.
	info := self trailer Info.
	info at: #ModDate put: Timestamp now.
	info at: #Producer put: PDF producerText.
	newDocument info: info.
	newDocument previousId: self trailer ID.
	^newDocument
For the new document, we just take the /Root, /Info and /ID attributes from the file just read. The /Info is modified by setting the modification time and overwriting /Producer with the name and version of the library.
The /ID needs special treatment. It is an array with two hash values: the first identifies the original PDF (in which both hashes were the same), while the second changes with every change of the document. Some workflows identify different versions of a document by their first ID value. Therefore, it should be preserved by the new document, which is why we store the old /ID as #previousId in the new document.
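The idea of preserving the first /ID value can be illustrated with plain Arrays. This is only a sketch of the principle, not the library's actual code: the hash strings are made up, and how the library computes the new second value is not shown here.

```smalltalk
| previousId newHash newId |
"Hypothetical /ID of the file just read: first value = original PDF, second value = last change"
previousId := Array with: 'a1b2c3' with: 'd4e5f6'.

"Hypothetical hash standing in for whatever is computed for the changed document"
newHash := '778899'.

"The new /ID keeps the first value and replaces only the second"
newId := Array with: previousId first with: newHash
```

A workflow that compares the first values of previousId and newId would still recognize both files as versions of the same document.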
Finally, when writing the new document, all references from /Root are followed, possibly read in on the fly by the File object, and then written to the new file. Therefore, the original file should not be closed before the new document has been written out.
In demos 12 and 13 (package “PDF Development”, class Document #demo12_copyPagesToNewPDF and #demo13_splitPDF), only selected objects, namely pages, are copied to new PDFs. With #asDocument, all other /Catalog attributes such as /Outlines, /Metadata and other document-related information are copied over to the new PDF as well.