Optimize PDF files - Part II
Your applications can write incredibly small PDF files if you know what you're doing. This article is intended for programmers who generate PDF files with their own custom routines. If you are a user who saves PDFs from existing applications, read Part I instead; Part I is also a good general introduction to PDF optimization.
Part I | Part II
Amaze your users by saving small, high-quality PDF files. Users who expect multimegabyte PDFs will be pleased to find that your application needs only tens of kilobytes, or even just a few kilobytes, for a simple PDF report.
This article assumes you are writing PDF files programmatically. It further assumes you are doing this with a PDF writer module or class whose source code you have, so that you can actually fine-tune the PDF output. You need to know quite a bit about the PDF file format to take advantage of these techniques.
The article focuses on PDF version 1.3, but the optimizations should be just as useful with other versions. To follow the tricks, get a copy of PDF Reference, Adobe Portable Document Format Version 1.3.
Let's start with the obvious optimization: fonts.
Optimization #1: Don't embed fonts
Did you know fonts don't need to be embedded in PDF? Font embedding is optional. The PDF standard allows you to use any font, whether or not it exists on the reader's machine. If a required font is not found, PDF reader applications use the font metrics (stored in the font dictionary and its font descriptor) to substitute a similar-looking font.
Indeed, in many cases embedding fonts only bloats the file. Consider an application that creates reports that are mostly viewed by the same user on the same PC. The reports will display perfectly as long as they stay on that PC (unless the user happens to uninstall the required fonts). When you use common fonts, chances are good that the fonts will show up correctly on other machines too.
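To make "not embedded" concrete, here is a minimal sketch of the two objects a non-embedded TrueType font needs: a font dictionary and a font descriptor. The helper name `font_objects`, its parameters and the metrics dictionary layout are hypothetical; the point is that the descriptor simply has no /FontFile2 entry, so no font program goes into the file.

```python
def font_objects(base_font, first_char, widths, descriptor_ref, metrics):
    """Build a TrueType font dictionary and its font descriptor as PDF source.

    Hypothetical helper. The font is referenced by name only: the descriptor
    deliberately has no /FontFile2 entry, so no font program is embedded.
    """
    last_char = first_char + len(widths) - 1
    width_list = " ".join(str(w) for w in widths)
    font_dict = (
        f"<< /Type /Font /Subtype /TrueType /BaseFont /{base_font}"
        f" /FirstChar {first_char} /LastChar {last_char} /Widths [{width_list}]"
        f" /Encoding /WinAnsiEncoding /FontDescriptor {descriptor_ref} 0 R >>"
    )
    bbox = " ".join(str(v) for v in metrics["bbox"])
    descriptor = (
        f"<< /Type /FontDescriptor /FontName /{base_font}"
        f" /Flags {metrics['flags']} /FontBBox [{bbox}]"
        f" /ItalicAngle {metrics['italic_angle']}"
        f" /Ascent {metrics['ascent']} /Descent {metrics['descent']}"
        f" /CapHeight {metrics['cap_height']} /StemV {metrics['stem_v']} >>"
    )
    return font_dict, descriptor
```

The widths (in thousandths of a text space unit) and the metrics must come from the real font, for example read from the installed font file when the PDF is generated. These few numbers are what a reader relies on to lay out text and to choose a substitute when the font is missing.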
Optimization #2: Use standard fonts
PDF comes with 5 standard font families: Times, Helvetica, Courier, Symbol and ZapfDingbats, 14 fonts in all when the bold and italic variants are counted. All PDF readers support these standard fonts. Except for ZapfDingbats, they resemble standard Windows fonts: Times is similar to Times New Roman, Helvetica to Arial and Courier to Courier New.
The standard fonts never need to be embedded. That's why you can use them safely without adding any font data to the file.
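A standard font costs only a one-line font dictionary, because PDF 1.3 makes /Widths and /FontDescriptor optional for these 14 fonts. A small sketch, where the function name and the choice of WinAnsiEncoding are assumptions about your writer:

```python
# The 14 standard fonts of PDF 1.3, grouped into the 5 families.
STANDARD_FONTS = {
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
}


def standard_font_dict(name: str) -> str:
    """Return a complete font dictionary for one of the 14 standard fonts."""
    if name not in STANDARD_FONTS:
        raise ValueError(f"{name} is not a standard PDF font")
    # Symbol and ZapfDingbats use their own built-in encodings, so no /Encoding.
    if name in ("Symbol", "ZapfDingbats"):
        encoding = ""
    else:
        encoding = " /Encoding /WinAnsiEncoding"
    return f"<< /Type /Font /Subtype /Type1 /BaseFont /{name}{encoding} >>"
```

Rejecting other names early keeps a writer from silently producing a font reference that would need embedding or full metrics.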
Optimized representation of text and numeric values
Don't bloat PDFs by representing text and numeric values with too many bytes. You can often say the same thing with fewer.
Optimization #3: Use PDFDocEncoding
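This is a relatively small optimization, but simple enough. Text strings, such as the document information entries (Title, Author, Subject) and bookmark titles, can be written either in PDFDocEncoding, which takes one byte per character, or in Unicode (UTF-16BE preceded by the byte order mark FE FF), which takes at least two bytes per character plus the byte order mark. Use PDFDocEncoding whenever every character of the string is available in it.
Note that PDFDocEncoding contains a wider range of characters than WinAnsiEncoding or MacRomanEncoding. That's good news for optimizers: more of your strings can stay in the compact one-byte form.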
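A minimal sketch of the decision, assuming a writer that emits text strings as PDF source. To stay short and safe it treats only printable ASCII as PDFDocEncoding (the two agree in that range); a full implementation would map the rest of the PDFDocEncoding table before falling back to Unicode. The function name is hypothetical.

```python
def pdf_text_string(text: str) -> str:
    """Encode a PDF text string (e.g. for /Title) as compactly as this sketch allows.

    Simplification: only printable ASCII is treated as PDFDocEncoding-safe,
    although PDFDocEncoding covers many more characters.
    """
    if all(0x20 <= ord(ch) <= 0x7E for ch in text):
        # One byte per character; escape the literal-string special characters.
        escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
        return f"({escaped})"
    # Fallback: byte order mark FE FF followed by UTF-16BE, written as a hex string.
    data = b"\xfe\xff" + text.encode("utf-16-be")
    return "<" + data.hex().upper() + ">"
```

For example, `pdf_text_string("Sales report")` yields the 14-byte `(Sales report)`, while a Unicode version of the same title would take 26 bytes of data before hex encoding doubles it.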
Optimization #4: Optimize number of decimal digits
This optimization applies to all numeric values in a PDF file. Use only as many decimals as you actually need. Extra digits simply bloat the file.
Write a utility function that rounds values for you. Supposing you need 2 decimals of precision, the function should round like this:
1.2345 → 1.23
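One possible utility, as a sketch: it rounds to a fixed number of decimals and then goes one step further by dropping trailing zeros and the leading zero before the decimal point, both of which PDF's number syntax allows (`.5` is a valid real). Fixed-point formatting also guarantees no exponent notation, which PDF does not support. The name `pdf_num` is hypothetical.

```python
def pdf_num(value: float, decimals: int = 2) -> str:
    """Format a number for a PDF file using as few characters as possible."""
    s = f"{value:.{decimals}f}"          # rounded fixed-point, never 1e-05 style
    if "." in s:
        s = s.rstrip("0").rstrip(".")    # 1.20 -> 1.2, 100.00 -> 100
    if s == "-0":
        return "0"                       # tiny negatives round to plain 0
    if s.startswith("0."):
        s = s[1:]                        # 0.5 -> .5
    elif s.startswith("-0."):
        s = "-" + s[2:]                  # -0.5 -> -.5
    return s
```

With this, 1.2345 becomes `1.23`, 2.0 becomes `2` and 0.5 becomes `.5`.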
Now we get to optimizing content streams, which hold the actual page content.
Optimization #5: Clip to viewable area
This rule is especially important if you're drawing part of a larger graphic into a PDF. When drawing graphics objects or text, it's a good idea to check against the page boundaries. If no part of an object will be visible, there's no point adding its drawing operations to the PDF file, since the result would be invisible anyway. Besides, hidden data in a PDF is a security concern: it is invisible on the page but can still be extracted from the file.
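The check itself can be a simple rectangle overlap test between an object's bounding box and the page's media box. A sketch, assuming your writer can compute a bounding box per object (for stroked paths, widen it by half the line width):

```python
from typing import Tuple

Box = Tuple[float, float, float, float]  # (x_min, y_min, x_max, y_max) in points


def is_visible(bbox: Box, page: Box) -> bool:
    """True if the object's bounding box overlaps the page at all."""
    x0, y0, x1, y1 = bbox
    px0, py0, px1, py1 = page
    # Touching the edge counts as visible, to err on the safe side.
    return x1 >= px0 and x0 <= px1 and y1 >= py0 and y0 <= py1
```

Objects that fail the test are skipped entirely. Objects that are only partly on the page are kept whole in this sketch; trimming their geometry is a separate, harder step.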
Optimization #6: Don't repeat operators unnecessarily
PDF keeps track of the currently selected color, line width, font and so on. You don't need to select the color each time you draw a line. Only set the color when it needs to change. The same goes for line width, line cap style and other drawing attributes. Keep track of the current attributes in your writer and emit an operator only when a value actually changes.
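A minimal sketch of such tracking, using a hypothetical `ContentWriter` class that handles only gray stroking color and line width. It starts from the PDF initial graphics state (black stroking color, line width 1) and compares values as they would be written, so values that round to the same text are not emitted twice.

```python
def _num(v: float) -> str:
    # Up to 2 decimals, never exponent notation, trailing zeros dropped.
    s = f"{v:.2f}".rstrip("0").rstrip(".")
    return "0" if s == "-0" else s


class ContentWriter:
    """Hypothetical content-stream writer that skips redundant state changes."""

    def __init__(self):
        self.ops = []
        self.stroke_gray = 0.0   # initial stroking color: black
        self.line_width = 1.0    # initial line width: 1 unit

    def set_stroke_gray(self, gray: float) -> None:
        if _num(gray) != _num(self.stroke_gray):
            self.ops.append(f"{_num(gray)} G")
            self.stroke_gray = gray

    def set_line_width(self, width: float) -> None:
        if _num(width) != _num(self.line_width):
            self.ops.append(f"{_num(width)} w")
            self.line_width = width

    def line(self, x0, y0, x1, y1) -> None:
        self.ops.append(f"{_num(x0)} {_num(y0)} m {_num(x1)} {_num(y1)} l S")

    def stream(self) -> str:
        return "\n".join(self.ops)
```

Each page's content stream starts from the initial graphics state, so reset the tracker per page. If your writer uses the q and Q operators, save and restore the tracked values along with them, or the tracker will drift out of sync with the reader.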
Optimization #7: Close polygons
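When drawing a polygon, there is no need to draw the last edge (with the l operator). Use the h operator to close the path instead.
Even better optimizations are available. Instead of h S, use s, which closes and strokes the path in one operator. Instead of h B, use b, which closes, fills and strokes it. For a polygon that is only filled, the f operator closes open subpaths implicitly, so h is not needed at all.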
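As a sketch, a polygon writer can emit one m, one l per remaining vertex, and a single closing paint operator, so the last edge is never written explicitly. The function and mode names are hypothetical.

```python
def _num(v: float) -> str:
    # Up to 2 decimals, never exponent notation, trailing zeros dropped.
    s = f"{v:.2f}".rstrip("0").rstrip(".")
    return "0" if s == "-0" else s


CLOSING_OPERATOR = {
    "stroke": "s",  # close and stroke, replaces "h S"
    "fill": "f",    # fill; open subpaths are closed implicitly, no "h" needed
    "both": "b",    # close, fill and stroke, replaces "h B"
}


def polygon(points, mode: str = "stroke") -> str:
    """Path operators for a closed polygon; the closing edge is implied."""
    (x0, y0), *rest = points
    ops = [f"{_num(x0)} {_num(y0)} m"]
    ops += [f"{_num(x)} {_num(y)} l" for x, y in rest]
    ops.append(CLOSING_OPERATOR[mode])
    return " ".join(ops)
```

A triangle then becomes `10 10 m 100 10 l 100 100 l s`, with no fourth point and no separate h.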
Optimization #8: Use shortcuts for splines
The default way to draw a spline curve (a cubic Bézier curve) from the current point to (x3,y3), with (x1,y1) and (x2,y2) as the control points, is this:
x1 y1 x2 y2 x3 y3 c
If the current point and (x1,y1) are the same, there is a shorter form:
x2 y2 x3 y3 v
If points (x2,y2) and (x3,y3) are the same, use this shorter form:
x1 y1 x3 y3 y
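A sketch of choosing among c, v and y automatically. It compares points as they will be written to the file (after rounding), which catches coincidences that exact floating-point comparison would miss. The helper names are hypothetical.

```python
def _num(v: float) -> str:
    # Up to 2 decimals, never exponent notation, trailing zeros dropped.
    s = f"{v:.2f}".rstrip("0").rstrip(".")
    return "0" if s == "-0" else s


def _pt(p) -> str:
    return f"{_num(p[0])} {_num(p[1])}"


def curve_to(current, p1, p2, p3) -> str:
    """Emit the shortest PDF operator for a cubic Bezier segment."""
    c0, c1, c2, c3 = _pt(current), _pt(p1), _pt(p2), _pt(p3)
    if c1 == c0:
        return f"{c2} {c3} v"       # first control point = current point
    if c2 == c3:
        return f"{c1} {c3} y"       # second control point = end point
    return f"{c1} {c2} {c3} c"      # general case
```

Each shortcut saves two numbers per curve segment, which adds up quickly in graphics made of many small curves.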
Optimization #9: Use color shortcuts
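The standard operators to set color are:
r g b RG (stroking color, for lines and outlines)
r g b rg (nonstroking color, for fills and text)
These operators take 3 values: Red, Green and Blue, each in the range 0 to 1.
For black, gray or white colors you don't need the full RGB color space. Grayscale is enough. To select black, use one of these operators:
0 G
0 g
For white, use these:
1 G
1 g
You can do the same for any shade of gray. To select 0.123 gray, use one of the following:
0.123 G
0.123 g
In each pair, the uppercase operator sets the stroking color and the lowercase operator the nonstroking color.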
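A writer can apply this shortcut automatically: whenever the red, green and blue components are equal, the color is a gray level and the one-operand gray operator (g or G) gives the same result as the three-operand RGB operator (rg or RG). A sketch with a hypothetical `set_color` function, rounding to 3 decimals:

```python
def _num(v: float) -> str:
    # Up to 3 decimals, never exponent notation, trailing zeros dropped.
    s = f"{v:.3f}".rstrip("0").rstrip(".")
    return "0" if s == "-0" else s


def set_color(r: float, g: float, b: float, stroking: bool = False) -> str:
    """Return the shortest operator for an RGB color with components 0..1."""
    rs, gs, bs = _num(r), _num(g), _num(b)
    if rs == gs == bs:
        # Equal components are a gray level: one operand instead of three.
        return f"{rs} {'G' if stroking else 'g'}"
    return f"{rs} {gs} {bs} {'RG' if stroking else 'rg'}"
```

Black, for instance, shrinks from `0 0 0 rg` (8 bytes) to `0 g` (3 bytes).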
Optimization #10: Compress
Compress streams to get the size down. PDF's FlateDecode filter uses the zlib/deflate format, so you can get a copy of the zlib library to do the compression for you. zlib is relatively straightforward to use.
Visual Basic note: VB6 cannot call the regular zlib.dll, but you can use zlibwapi.dll instead.
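As an illustration, Python's built-in zlib module produces exactly the zlib-format data that /FlateDecode expects. This sketch (the function name is hypothetical) compresses a content stream and keeps the uncompressed version when compression would not make it smaller, which can happen with very short streams.

```python
import zlib


def stream_object(content: bytes) -> bytes:
    """Wrap page content in a PDF stream, Flate-compressing it when that helps."""
    packed = zlib.compress(content, 9)  # zlib format, as /FlateDecode expects
    if len(packed) < len(content):
        header = f"<< /Length {len(packed)} /Filter /FlateDecode >>"
        body = packed
    else:
        header = f"<< /Length {len(content)} >>"
        body = content
    # /Length counts only the data; the end-of-line before endstream is excluded.
    return header.encode("ascii") + b"\nstream\n" + body + b"\nendstream"
```

The returned bytes still need the usual `n 0 obj` ... `endobj` wrapper and a cross-reference entry, which the rest of your writer presumably handles already.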
Small PDF samples
Here are small PDF samples with vector graphics and text. The graphic was originally created with Visustin.