
Changing existing PDFs

To change an existing PDF, the file has to be read with:

aFile := PDF.File read: <aFilename>

The file can then be converted into a Document, an object that can write itself out as a PDF file:

aDocument := aFile asDocument

After aDocument has been changed, it is written out as a new PDF file with:

aDocument saveAs: <aFilename>

Details

The class PDF.File is for reading PDFs from files. It reads them incrementally, loading an object from disk only when it is needed. One can see this in the PDFExplorer, whose status line reports how many objects have been loaded so far:

729 of 125179 objects have been read

In this example, fewer than 1% of the objects in the file have actually been read from disk.

The initial object read from a PDF is the /Trailer. Apart from some internal bookkeeping entries (such as /Size), a trailer contains /Root, a reference to the Catalog (the contents of the PDF), /Info (the document information, such as title, author and dates) and /ID. The File holds this trailer in its #trailer instance variable.
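The incremental reading and the role of the trailer can be illustrated with a small, language-neutral sketch in Python. It is not PDFtalk's implementation: the class LazyPdfFile, the Ref type and the table of loader functions standing in for the bytes on disk are all hypothetical. The trailer is available up front; every other object is read only when a reference to it is resolved, and the reader keeps a count in the style of the PDFExplorer's status line.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Ref:
    """Hypothetical model of an indirect reference such as '1 0 R'."""
    number: int
    generation: int = 0


class LazyPdfFile:
    """Hypothetical stand-in for an incrementally read PDF file.

    `loaders` maps object numbers to functions that would read and parse the
    object from disk; in this sketch they just return prepared Python values.
    """

    def __init__(self, trailer: Dict[str, Any], loaders: Dict[int, Callable[[], Any]]):
        self.trailer = trailer            # available up front, like the real trailer
        self._loaders = loaders
        self._cache: Dict[int, Any] = {}  # objects read so far

    def resolve(self, value: Any) -> Any:
        """Return the object behind a Ref, reading it on first access only."""
        if not isinstance(value, Ref):
            return value                  # direct values need no reading
        if value.number not in self._cache:
            self._cache[value.number] = self._loaders[value.number]()
        return self._cache[value.number]

    def status(self) -> str:
        """Progress line in the style of the PDFExplorer."""
        return f"{len(self._cache)} of {len(self._loaders)} objects have been read"


loaders = {
    1: lambda: {"Type": "Catalog", "Pages": Ref(2)},
    2: lambda: {"Type": "Pages", "Kids": [Ref(3)], "Count": 1},
    3: lambda: {"Type": "Page", "Parent": Ref(2)},
    4: lambda: {"Title": "Example"},
}
pdf = LazyPdfFile({"Size": 5, "Root": Ref(1), "Info": Ref(4)}, loaders)
print(pdf.status())                       # 0 of 4 objects have been read
catalog = pdf.resolve(pdf.trailer["Root"])
print(pdf.status())                       # 1 of 4 objects have been read
```

Only the catalog was touched here; the page tree and /Info stay on disk until something follows their references.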

The cloning of the PDF is done in the File»asDocument method:

asDocument
	"<Document>
	a new document with the same contents as the receiver for writing out the PDF later"

	| newDocument info |
	newDocument := Document new.
	newDocument root: self trailer Root.	"take over the Catalog with all contents"
	info := self trailer Info.
	info at: #ModDate put: Timestamp now.	"record when the PDF was changed"
	info at: #Producer put: PDF producerText.	"name and version of the library"
	newDocument info: info.
	newDocument previousId: self trailer ID.	"keep the old /ID so that its first value can be preserved"
	^newDocument

For the new document, we simply take the /Root, /Info and /ID entries of the file just read. /Info is modified: the modification date /ModDate is set, and /Producer is overwritten with the name and version of the library.
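As an illustration only, the same steps can be mirrored in Python on plain dictionaries. The names as_document and PRODUCER and the dictionary layout are hypothetical, not PDFtalk API. The sketch shares /Root, sets /ModDate in the PDF date syntax D:YYYYMMDDHHmmSSOHH'mm', overwrites /Producer and keeps the old /ID aside. Unlike the Smalltalk method, which changes the /Info object in place, this sketch works on a copy so that the source trailer stays untouched.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

PRODUCER = "ExampleLib 1.0"  # hypothetical stand-in for PDF producerText


def pdf_date(ts: datetime) -> str:
    """Format a datetime in the PDF date syntax D:YYYYMMDDHHmmSSOHH'mm'."""
    offset = ts.utcoffset() or timedelta(0)  # naive datetimes are treated as UTC
    sign = "-" if offset < timedelta(0) else "+"
    minutes = abs(int(offset.total_seconds())) // 60
    return f"D:{ts:%Y%m%d%H%M%S}{sign}{minutes // 60:02d}'{minutes % 60:02d}'"


def as_document(trailer: Dict[str, Any], now: Optional[datetime] = None) -> Dict[str, Any]:
    """Describe a new document built from a source trailer (hypothetical layout)."""
    now = now or datetime.now(timezone.utc)
    info = dict(trailer.get("Info") or {})   # copy, so the source stays untouched
    info["ModDate"] = pdf_date(now)          # modification time
    info["Producer"] = PRODUCER              # this library's name and version
    return {
        "Root": trailer["Root"],             # the same catalog object, not a copy
        "Info": info,
        "previousId": trailer.get("ID"),     # needed later for the new /ID
    }


source_trailer = {"Root": {"Type": "Catalog"}, "Info": {"Title": "Report"}, "ID": [b"\x01" * 16, b"\x01" * 16]}
doc = as_document(source_trailer, datetime(2016, 6, 2, 16, 39, tzinfo=timezone.utc))
print(doc["Info"]["ModDate"])  # D:20160602163900+00'00'
```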

The /ID needs special treatment. It is an array of two hash values: the first identifies the original PDF (when that PDF was first written, both values were the same), while the second changes whenever the document is changed. Some workflows identify different versions of a document by the first ID value. It should therefore be preserved in the new document, which is why we store the old /ID as #previousId in the new document.
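A minimal Python sketch of this rule, assuming MD5 digests (the hash the PDF specification suggests for file identifiers) computed, as a simplification, over the new file's bytes. The helper name next_file_id is hypothetical. The first element is inherited from the previous /ID; the second is always recomputed. Without a previous /ID, both elements are equal, as for a newly created PDF.

```python
import hashlib
from typing import List, Optional


def next_file_id(previous_id: Optional[List[bytes]], content: bytes) -> List[bytes]:
    """Return the /ID array for a newly written file (hypothetical helper).

    The first element (permanent identifier) is inherited from the previous
    /ID if there is one; the second (changing identifier) is always new.
    """
    changing = hashlib.md5(content).digest()
    if not previous_id:
        return [changing, changing]        # brand-new file: both values identical
    return [previous_id[0], changing]      # keep the original identity


original = next_file_id(None, b"%PDF-1.7 version 1")
edited = next_file_id(original, b"%PDF-1.7 version 2")
print(original[0] == original[1])  # True
print(edited[0] == original[0])    # True
print(edited[1] == original[1])    # False
```

A workflow that groups versions by the first value would therefore still recognize the edited file as a version of the original.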

Finally, when the new document is written, all references reachable from /Root are followed. Objects that have not been loaded yet are read in on the fly by the File object, and all of them are then written to the new file. Therefore, the original file should not be closed before the new document has been written out.
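The following Python sketch shows why the order matters. It is hypothetical and simplified: ClosableSource stands in for the PDF.File, objects are plain dicts and lists, and Ref marks an indirect reference. collect_reachable walks everything reachable from /Root, reading objects on demand, so it fails as soon as it needs an object from a source that has already been closed.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List


@dataclass(frozen=True)
class Ref:
    """Hypothetical model of an indirect reference."""
    number: int


class ClosableSource:
    """Hypothetical lazily read source file that can be closed."""

    def __init__(self, loaders: Dict[int, Callable[[], Any]]):
        self._loaders = loaders
        self._cache: Dict[int, Any] = {}
        self.closed = False

    def read(self, ref: Ref) -> Any:
        if ref.number not in self._cache:
            if self.closed:
                raise OSError(f"object {ref.number} not loaded and source is closed")
            self._cache[ref.number] = self._loaders[ref.number]()
        return self._cache[ref.number]


def collect_reachable(source: ClosableSource, root: Ref) -> Dict[int, Any]:
    """Follow all references from root; return the objects to write, by number."""
    found: Dict[int, Any] = {}
    stack: List[Any] = [root]
    while stack:
        item = stack.pop()
        if isinstance(item, Ref):
            if item.number in found:
                continue                  # already collected; also stops cycles like /Parent
            obj = source.read(item)       # may read from disk on the fly
            found[item.number] = obj
            stack.append(obj)
        elif isinstance(item, dict):
            stack.extend(item.values())
        elif isinstance(item, list):
            stack.extend(item)
    return found


loaders = {
    1: lambda: {"Type": "Catalog", "Pages": Ref(2), "Outlines": Ref(4)},
    2: lambda: {"Type": "Pages", "Kids": [Ref(3)], "Count": 1},
    3: lambda: {"Type": "Page", "Parent": Ref(2)},
    4: lambda: {"Type": "Outlines", "Count": 0},
    5: lambda: {"Unused": True},          # not reachable from /Root
}
print(sorted(collect_reachable(ClosableSource(loaders), Ref(1))))  # [1, 2, 3, 4]

closed_too_early = ClosableSource(loaders)
closed_too_early.closed = True
try:
    collect_reachable(closed_too_early, Ref(1))
except OSError as err:
    print(err)  # object 1 not loaded and source is closed
```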

Demos 12 and 13 (package “PDF Development”, class Document, methods #demo12_copyPagesToNewPDF and #demo13_splitPDF) copy selected objects, namely pages, to new PDFs. In contrast, #asDocument also carries over all other /Catalog entries, such as /Outlines, /Metadata and other document-related information, to the new PDF.

Last modified: 2016/06/02 16:39 by christian