Adding a new object type

The PDF specification defines many object types, and only some of them are defined in the library so far. This page explains in detail how to add a new object type to the library and why you should do so.

Why should an object type be defined?

Why bother to define a type? The usual objects are dictionaries which can be processed and viewed as they are.

PDFExplorer with an untyped object

The attribute /BS of a /FreeText annotation holds an untyped dictionary.

The answer: to make it look nice in the PDFExplorer!

PDFExplorer with a BorderStyle object

Adding an object type allows for a customized presentation with a printString, an icon, attribute documentation and order etc. (see below for all the details).

Attribute /W in the BorderStyle object

The attribute /W of a /BorderStyle object shows its documentation.

How to define a new type

Usually, you inspect a PDF with the PDFExplorer and find some object which is not documented. To define an object type it is important to have an example open in the PDFExplorer so that you can see the changes. In our example this is the object in the attribute /BS in a /FreeText annotation object.

In order to find out what the object is about, the relevant piece of documentation should be found in the PDF Specification. In our case this is a border style dictionary described in chapter “12.5.4 Border Styles” on page 386.

Add a new Smalltalk class

The new class can be defined with the ClassCreationDialog:

 The ClassCreationDialog for BorderStyle

Choose the package

The package is [PDF Interactive Features], because it is related to /Annot which is defined there.

Choose the namespace

This should be Graphics.PDF, since this is the only namespace for the runtime code of the library.

Choose a class name

As name for this example, I use BorderStyle. Ideally the name should be the same as used in the PDF specification.

Choose the superclass

Most often, this will be Dictionary, or TypedDictionary if the object has the common attribute /Type. It can also be something more exotic, such as a PDFArray, a Name or something else (see later).
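For readers who prefer code to the dialog, the choices made so far (namespace, class name, superclass) correspond roughly to a class definition message like the one below. This is only a sketch, assuming a VisualWorks image: the superclass TypedDictionary and the indexedType: value are assumptions and are not taken from the library, and the package is still chosen in the browser.

```smalltalk
"Sketch only: superclass and indexedType: are assumptions.
 Check them against an existing class in the library before accepting."
Graphics.PDF defineClass: #BorderStyle
	superclass: #{Graphics.PDF.TypedDictionary}
	indexedType: #none
	private: false
	instanceVariableNames: ''
	classInstanceVariableNames: ''
	imports: ''
	category: ''
```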

Add a class comment

The first line should give the reference to the PDF specification, followed by the first paragraph of the description in the specification. I usually edit this text to add line breaks after sentences and remove any cross references to other parts of the specification:

PDF border style dictionary as defined in PDF 32000_2008.pdf, section 12.5.4, p. 386.

An annotation may optionally be surrounded by a border when displayed or printed.
If present, the border shall be drawn completely inside the annotation rectangle.
In PDF 1.1, the characteristics of the border shall be specified by the Border entry in the annotation dictionary.
Beginning with PDF 1.2, the border characteristics for some types of annotations may instead be specified in a border style dictionary designated by the annotation’s BS entry.
Such dictionaries may also be used to specify the width and dash pattern for the lines drawn by line, square, circle, and ink annotations.
If neither the Border nor the BS entry is present, the border shall be drawn as a solid line with a width of 1 point.

Add class methods

Two more bits of information should be added as methods on the class side.

The documentationPlace method defines the section in the PDF specification. This is a more recent addition, intended to make it possible to jump directly to the corresponding place in the specification PDF from the code browser or the PDFExplorer. This has not been implemented yet and most objects don't have this method, but I add it for new objects. Eventually, I will add it to all objects.

documentationPlace
	^#(12 5 4)

If the object type was not part of the original PDF specification 1.0, the version should be added. The version method answers the minor part of the PDF version in which this feature first occurred (2 for PDF 1.2), which allows the minimal version of a PDF to be computed.

version
	^2

The version is usually mentioned in the specification of the object. After I add this method, I remove the corresponding text from the class comment.

Reset the object types

Since a new type is defined, the object types have to be reset with

PDF resetObjecttypes.

This clears the cache for all the object types (Smalltalk classes, 137 at the time of writing). On the next access, the cache is filled with all known types, including the newly defined ones, so that the new type can be found.

This has to be done only when a new class is defined.

Use the new type for the containing attributes

The new type can now be used. To do so, the type of the attribute that contains the object should be set to the new type. In the example, in the method BS of class FreeText, the type: pragma and the default value should be changed from

BS
	<type: #Dictionary>
	<version: 6>
	<attribute: 9 documentation: 'A border style dictionary specifying the line width and dash pattern that shall be used in drawing the annotation’s border.
The annotation dictionary’s AP entry, if present, takes precedence over the BS entry'>
	^self objectAt: #BS ifAbsent: [Dictionary empty]

to

BS
	<type: #BorderStyle>
	<version: 6>
	<attribute: 9 documentation: 'A border style dictionary specifying the line width and dash pattern that shall be used in drawing the annotation’s border.
The annotation dictionary’s AP entry, if present, takes precedence over the BS entry'>
	^self objectAt: #BS ifAbsent: [BorderStyle empty]

The result looks like this in the PDFExplorer (after hitting F5 for refresh):

new object type /BorderStyle

The style is recognized as BorderStyle and it shows the right version (PDF-1.2), but the required field Type is red (error) and the W field is pink (not known).

Add attributes

Attributes are added as methods in the protocol "accessing entries". Each method is named exactly like the key in the definition, capital letter included, although this is not common Smalltalk style.

The first two attributes (of 4) look like this in the PDF specification:

Part of the BorderStyle attribute definition

The corresponding methods look like this:

Type
	<type: #Name>
	<attribute: 1 documentation: 'The type of PDF object that this dictionary describes.'>
	^self objectAt: #Type ifAbsent: [#Border asPDF]

W
	<type: #Number>
	<attribute: 2 documentation: 'The border width in points. If this value is 0, no border shall be drawn.'>
	^self objectAt: #W ifAbsent: [1 asPDF]

An attribute method consists of a number of describing pragmas and the code for access.

The type: pragma

The <type: aSymbol> pragma is mandatory. It takes, as a symbol, the name of the Smalltalk class implementing the PDF type. This is derived from the “Type” column of the definition table. For more information about typing and the possible type pragmas, see Typing.

The documentation pragma

The documentation is specified in the <attribute: anInteger documentation: aString> pragma. The first parameter defines the order of the attribute, so that they can be displayed in the same order as they are defined by the PDF specification. The first attribute shall be 1 and the next ones are numbered consecutively.

The documentation is taken directly from the specification and edited to remove all information that is already expressed in the method itself. In our example, the “(Optional)” is removed, because this is implied. If the attribute is required, the <required> pragma is used to express this fact.

The description of the default value is also removed, because this is evident from the access code.

References to other parts of the specification are also removed (the example contains none).

The access code

The access code can be either

	^self objectAt: #Type ifAbsent: [#Border asPDF]

for optional attributes with a default value, or

	^self objectAt: #Type

for a required attribute. This will raise an error if the attribute is not present in the object.
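Putting the <required> pragma and the required-style access code together, a required attribute might look like the sketch below. The key /Example and its documentation string are hypothetical and only serve as illustration; the border style dictionary has no such entry, and the position of <required> among the other pragmas is an assumption.

```smalltalk
Example
	"Hypothetical required attribute, for illustration only."
	<type: #Name>
	<required>
	<attribute: 3 documentation: 'A made-up required entry.'>
	^self objectAt: #Example
```

Because there is no ifAbsent: block, there is no default value to fall back on when the key is missing.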

The method returns the object that is the value of the attribute. The attribute may store the object directly or hold a reference to it; in either case, the object itself is returned. To access the raw value instead (object or reference), use at: in place of objectAt:

	^self at: #Type ifAbsent: [#Border asPDF]
	^self at: #Type
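The difference between the two kinds of access can be tried out in a workspace. The following is a minimal sketch that assumes the BorderStyle class and its W method from this page are already defined; the variable name style is arbitrary.

```smalltalk
"Workspace sketch; assumes BorderStyle and its W method exist as described on this page."
| style |
style := Graphics.PDF.BorderStyle empty.

"Typed accessor: /W is absent here, so the default from the W method (1 asPDF) is answered."
style W.

"Raw access: answers whatever is stored under /W (object or reference),
 or the value of the block if the key is absent."
style at: #W ifAbsent: [nil].
```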

Customize an object type

docs, icon, string, attributes (docs, type, order, version, required)

Last modified: 2016/03/24 11:34 by christian