Images

Bitmapped images are described in chapter 8.9 "Images", starting on page 203 of the PDF specification, on 14 quite readable pages.

The implementation is in package PDF Images.

Usage

Sampled or bitmapped images are represented by PDF.ImageXObject objects.

An ImageXObject is created by sending asPDF to a Smalltalk image object.

The ImageXObject is rendered in the rendering block with paintXObject:.

PDF images are always rendered into a unit square (1 by 1 point by default). Therefore the scaling should be adjusted by changing the current transformation matrix. Images, as any other graphics in PDF, are projected using the current transformation matrix. This means that images can be freely rotated, reflected and skewed.

So that the changed transformation matrix does not affect other graphics, the code should be wrapped in isolatedDo:.

The complete basic code to put anImage onto a PDF page is:

page := Page newInBounds: (0 @ 0 corner: 100 @ 100) colorspace: DeviceRGB new render: [:renderer |
  renderer isolatedDo: [
    "scale by the extent for pixels to have unit size"
    renderer concat: ((Matrix scaling: anImage extent) translatedBy: 10 @ 10).
    renderer paintXObject: anImage asPDF]].

demo20_imagesusage.pdf shows the result with an example image.

demo21_images.pdf shows some PDF features of images (masking, rotation, interpolation, inverting and alpha blended images).

Object Models

Conceptually, an image is a rectangular array of colored pixels. The array is defined by width and height. Pixels are accessed with zero-based coordinates with the first pixel (0 @ 0) at the top left corner and the last (w-1 @ h-1) at the bottom right corner. The pixels are organized by rows.

Pixels consist of bits representing a color. The color of a pixel can be read by

aColorValue := anImage valueAtPoint: columnIndex @ rowIndex.

and set with

anImage valueAtPoint: columnIndex @ rowIndex put: aColorValue.

Technically, images in Smalltalk and PDF differ.

Images in Smalltalk

Image
  bits: aByteArray
  width: anInteger
  height: anInteger
  bitsPerPixel: anInteger  "1, 2, 4, 8, 16, 24 or 32"
  depth: anInteger
  palette: aPalette

A Smalltalk image stores the pixels in the variable bits as a ByteArray. bitsPerPixel defines how many bits one pixel occupies. This variable is redundant because there is one subclass for each of the permitted values (1, 2, 4, 8, 16, 24 and 32 bits). The rows of the image are stored one after the other in the bits byte array. The number of bytes in a row is always a multiple of 4 bytes (32 bits). This means that a row with a single 1 bit pixel still occupies 4 bytes.
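The row padding can be checked with a little arithmetic. The following workspace snippet is only a sketch of the rule described above; the block name rowBytes is made up for illustration and is not part of the Image classes.

```smalltalk
| rowBytes |
"Bytes per row when every row is padded to a multiple of 32 bits.
 rowBytes is a hypothetical helper, not a method of Image."
rowBytes := [:width :bitsPerPixel |
    (width * bitsPerPixel + 31) // 32 * 4].

rowBytes value: 1 value: 1.     "4: one 1 bit pixel still occupies 4 bytes"
rowBytes value: 33 value: 1.    "8: 33 bits spill into a second 32 bit word"
rowBytes value: 10 value: 24.   "32: 240 bits are padded up to 256 bits"
```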

The depth specifies how many of the bitsPerPixel are actually used for encoding one color. Take, for example, a 16 bits per pixel image where the red, green and blue components are each encoded with 5 bits. This image has a depth of 15, and 1 bit per pixel is unused.
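As an illustration of such a 15 bit pixel, the components can be pulled out of the 16 bit value with shifts and masks. The bit layout used here (unused bit on top, then red, green and blue) is an assumption for the sake of the example; the actual layout is defined by the palette of the image.

```smalltalk
| pixel red green blue |
"Assumed layout: 1 unused bit, 5 bits red, 5 bits green, 5 bits blue."
pixel := 2r0111110000011111.
red := (pixel bitShift: -10) bitAnd: 2r11111.    "31"
green := (pixel bitShift: -5) bitAnd: 2r11111.   "0"
blue := pixel bitAnd: 2r11111.                   "31"
```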

The palette translates the bits of a pixel to a color. Either the bits are interpreted directly as an RGB value, as in the above example, or the bits are used as an index into a list of colors.

Colors in Smalltalk are always RGB colors, even for gray colors, or coverage values for masks.

Images in PDF

<<  /Type /XObject
    /Subtype /Image
    /Width anInteger
    /Height anInteger
    /BitsPerComponent anInteger  % 1, 2, 4, 8 or 16
    /ColorSpace aColourSpace  >>
stream
aByteString
endstream

In PDF, images are defined by the number of bits per color component (1, 2, 4, 8 or 16 bits). The colorspace defines the number of components and their layout for a pixel. Any colorspace permitted in PDF can be used (see Colors). For images converted from Smalltalk, only /DeviceRGB (3 components), /DeviceGray (1 component) and /Indexed (1 component) are relevant. The /Indexed colorspace in PDF can only hold up to 256 colors (the highest permitted index is 255).

A PDF.ImageXObject is a stream whose content is a byte string with the pixel bits. Each row ends at a byte boundary. A row with one pixel of 3 components with 1 bit each (= 3 bits) uses 1 byte (8 bits) for the row. Like any stream, ImageXObjects can be compressed using filters. By default, /FlateDecode (zip) is used.
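The byte alignment of PDF rows can be sketched the same way as for Smalltalk images, only with padding to 8 instead of 32 bits. The block name pdfRowBytes is hypothetical and only serves this example.

```smalltalk
| pdfRowBytes |
"Bytes per row when every row is padded to the next byte boundary.
 pdfRowBytes is a hypothetical helper, not a method of ImageXObject."
pdfRowBytes := [:width :components :bitsPerComponent |
    (width * components * bitsPerComponent + 7) // 8].

pdfRowBytes value: 1 value: 3 value: 1.    "1: 3 bits are padded to one byte"
pdfRowBytes value: 3 value: 3 value: 1.    "2: 9 bits need two bytes"
pdfRowBytes value: 10 value: 3 value: 8.   "30: 8 bit components need no padding"
```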

The optional attribute /Decode, defining min and max for each component, is used to map the pixel bits linearly onto the value range of the component. An RGB pixel with 5 bits per component would have a /Decode array to limit the range. Still, 8 bits are used per component so that 3×3=9 bits are unused per pixel. /Decode can also be used to invert the interpretation of the bits.
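The linear mapping done by /Decode can be written as a small formula: a sample of n bits is mapped to min + sample × (max − min) / (2^n − 1). The snippet below is a sketch of that formula only; the block name decode is made up, and the array [0 255/31] for 5 bit values stored in 8 bit components is one plausible choice, not necessarily the one written by the library.

```smalltalk
| decode |
"Map a raw sample to a component value.
 decode is a hypothetical helper for illustration."
decode := [:sample :bits :dMin :dMax |
    dMin + (sample * (dMax - dMin) / ((2 raisedTo: bits) - 1))].

decode value: 255 value: 8 value: 0 value: 1.         "1: default [0 1], full intensity"
decode value: 0 value: 1 value: 1 value: 0.           "1: [1 0] inverts a 1 bit sample"
decode value: 1 value: 1 value: 1 value: 0.           "0"
decode value: 31 value: 8 value: 0 value: 255 / 31.   "1: largest 5 bit value reaches full intensity"
```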

Masked Images

Useful images often have a mask, allowing only the masked pixels to be drawn. Smalltalk has the class Graphics.OpaqueImage for this. An OpaqueImage contains a figure for the image and a shape for the mask. The mask is a 1 bit image with the same dimensions as the figure and a CoveragePalette which interprets 0 as transparent (the background is drawn) and 1 as opaque (the pixel of the figure is drawn).

In PDF, an ImageXObject has an optional attribute /Mask which can hold a mask. The mask is a 1 bit /DeviceGray ImageXObject with the optional attribute /ImageMask set to true and without a /ColorSpace attribute. Interestingly, the resolution of the mask need not be the same as the resolution of the containing image. Since PDF interprets the bits in the opposite way (“0 shall mark the page with the current colour, and a 1 shall leave the previous contents unchanged”), converted masks have a /Decode array of [1 0] instead of the default [0 1].

When converting an OpaqueImage to PDF with asPDF, an ImageXObject with a mask is created. Conversely, converting an ImageXObject with a mask with asSmalltalkValue will produce an OpaqueImage. There are also UI.Icon objects with a similar layout as OpaqueImage. Icons do understand asPDF, but the reverse conversion will result in an OpaqueImage (from which an icon can be created easily).

Alpha Blended Images

Images with gradual transparency are Graphics.AlphaCompositedImages in Smalltalk. They use a Depth32Image with 1 byte for alpha and 3 bytes for RGB. PDF.ImageXObjects use the optional attribute /SMask to hold an 8 bit /DeviceGray /SoftMaskImage for the alpha information.
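Conceptually, converting such an image means splitting each 32 bit pixel into an alpha byte for the /SMask and three color bytes for the image itself. The following sketch assumes that alpha is stored in the most significant byte; the actual byte order depends on the image.

```smalltalk
| pixel alpha rgb |
"Assumed layout: alpha in the top byte, followed by 3 bytes of color."
pixel := 16r80FF0040.
alpha := (pixel bitShift: -24) bitAnd: 16rFF.   "16r80 = 128, goes to the soft mask"
rgb := pixel bitAnd: 16rFFFFFF.                 "16rFF0040, goes to the image"
```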

Implementation

The conversion methods are implemented in the Graphics.Image hierarchy. To convert a Smalltalk image to PDF with asPDF, the method writePixelsTo: anImageXObject transfers the actual pixels from the Smalltalk to the PDF image. The other direction uses the method readPixelsFrom: anImageXObject which transfers the PDF pixels to the Smalltalk image.

Figure: Work sketch for bit fiddling

The default behavior is to transfer the pixels one by one. For each pixel, the bits are read from the specified location in the source image bytes and interpreted as color (valueAtPoint:). This color is then converted to the target bits which are written to the specified location in the target image bytes (valueAtPoint:put:). While these two pixel accessors are correct and well tested for any kind of image, they are very slow.
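In its simplest form, such a pixel by pixel transfer is a double loop over rows and columns. This is only a sketch of the idea, not the code of the library: the block copyPixels is hypothetical, and it assumes that both images answer width and height and understand the two accessors named above.

```smalltalk
| copyPixels |
"Hypothetical helper: copy every pixel as a color value.
 Assumes source and target have the same width and height."
copyPixels := [:source :target |
    0 to: source height - 1 do: [:rowIndex |
        0 to: source width - 1 do: [:columnIndex |
            | point |
            point := columnIndex @ rowIndex.
            target valueAtPoint: point put: (source valueAtPoint: point)]]].
```

Every pixel goes through two color conversions here (bits to color and color to bits), which is what makes this approach slow.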

This default implementation (readPixelsByPixelFrom: and writePixelsByPixelTo:) is the reference for testing and the baseline for performance benchmarks (see the class ImageConversionBenchmarks in the package PDF Development).

Some conversions can be greatly sped up (by one or two orders of magnitude) by exploiting the internal byte organization of the image bits and transferring them directly. While this is possible for many useful forms, it is not possible in general (a Smalltalk image with a palette of more than 256 colors, for example).

The following conversions are currently optimized:

  • Depth1Image for black and white images and masks
  • Depth24Image with 8 bit RGB
  • Depth32Image for 8 bit RGB and BGR images taken from the screen (the first byte is always zero)
  • Depth32Image for 8 bit ARGB and ABGR
  • Depth2Image, Depth4Image and Depth8Image with a MappedPalette.

The direct conversion of an image with a mapped palette is special. Since RGB color components are represented with 13 bits in Smalltalk, but with 8 bits in PDF, a Smalltalk palette may have more than one entry for one 8 bit RGB color. This is correctly handled when converting the image pixel by pixel, because each color is stored as an 8 bit color in the PDF /Indexed colorspace, thereby mapping different 3×13 bit colors to the same 3×8 bit color.

The optimized conversion converts the palette entry by entry and keeps the pixel indexes, which allows the image bytes to be reused directly. The resulting /Indexed colorspace may then contain several entries for the same color. Converting such an ImageXObject back to Smalltalk will not recreate the least significant 5 bits of each component, so the colors can differ slightly from the original. For 8 bit RGB usage, this makes no difference. Although this does not feel proper, the speedup of the optimization is worth it.
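The loss of the lower 5 bits can be illustrated with plain integer arithmetic. The snippet assumes that a 13 bit component is reduced to 8 bits by dropping the least significant bits; the real conversion may round differently, and the block names are made up for this example.

```smalltalk
| to8Bits to13Bits |
"Hypothetical helpers: reduce a 13 bit component to 8 bits and back."
to8Bits := [:value13 | value13 bitShift: -5].
to13Bits := [:value8 | value8 bitShift: 5].

to8Bits value: 4096.                      "128"
to8Bits value: 4100.                      "128: a different 13 bit value, same 8 bit value"
to8Bits value: 8191.                      "255"
to13Bits value: (to8Bits value: 4100).    "4096: the lower 5 bits are not recreated"
```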

To be done

Filter

Although all Smalltalk images can be used for PDF, not all PDF images can be transformed to Smalltalk images. For one, several filters specific to images are not implemented:

  • RunLengthDecode: 8 bit monochrome images
  • CCITTFaxDecode: CCITT encoded 1 bit monochrome images
  • JBIG2Decode: JBIG2 encoded 1 bit monochrome images
  • DCTDecode: JPEG encoded 8 bit grayscale or color images
  • JPXDecode: JPEG2000 encoded grayscale or color images.

This means that it is not possible to extract such images from PDF. Nor is it possible to store images in the most efficient way in a PDF. This feature is valuable and I hope to implement some of the filters in the not too distant future.

Secondly, PDF can have images in colorspaces other than RGB or grayscale; most notable is /DeviceCMYK for print. For correctly extracting such images, proper color conversions to RGB need to be implemented. This feature is not interesting to me at the moment.

Inlined Images

Images in PDF can be inlined in the /Contents stream instead of storing them in the /Resources as /XObject. Only a subset of legal PDF images can be inlined, and inlining is discouraged for large images. Even though I have not seen such an image in a real-world PDF, this feature should be implemented for completeness.

Last modified: 2016/03/02 15:19 by christian