Optimize PDF files - Part I
Large PDF files are slow to download and consume too much bandwidth. Create smaller PDF files by following a few easy rules. You can often decrease the size of a PDF file by several hundred kilobytes. This article also goes into the details of using the Save As PDF command in Word 2007 and 2010.
Part I | Part II
PDF files are often unnecessarily large. It can take hundreds of kilobytes or even megabytes to represent a small amount of actual content. Downloads run at a snail's pace. Web servers run hot. Bandwidth quotas are exceeded and web sites become slow to respond.
What's causing the PDF bloat? The bulk of a regular PDF file consists primarily of text, images and fonts.
Obviously, you could optimize by writing less text! This isn't what we're after. Effective optimization means cutting down the font and image parts. It is possible to eliminate the font part entirely and shrink the image part while keeping the useful textual data intact (or almost intact). This article shows you how.
There are several ways to produce PDF files. One can use a PDF printer driver, for example. This article focuses on the Save As PDF command in Microsoft Word 2007 and 2010. Many of the tricks are also applicable to other PDF writers. Because PDF writers differ in the details, you need to experiment to find out how the rules work with your PDF writer.
Saving as PDF is a built-in feature in Word 2010. To enable it with Word 2007, you may need a free add-in from Microsoft. The add-in is titled 2007 Microsoft Office Add-in: Microsoft Save as PDF or XPS. You can download it on Microsoft's web site.
Font issues are crucial to PDF optimization. A simple PDF may easily store around 200 kB of font data. It is possible to go without storing any font data at all. By planning your font use in advance you get stylish and smaller files.
Rule #1: Use standard fonts
PDF comes with 5 standard font families. The families are Times, Helvetica, Courier, Symbol and ZapfDingbats. All PDF readers support these standard fonts.
For all other fonts, PDF writers normally embed the font data in the PDF file. Embedding means copying. The file includes a copy of the entire font, or a part of it. When a Garamond font is used, for example, the font glyphs get copied in the PDF. This consumes a lot of space.
To tell which fonts exist in a PDF file, select Properties in the File menu of Adobe Reader. Open the Fonts tab. Here you see the fonts used in the currently open PDF. Fonts marked as (Embedded) or (Embedded Subset) have been embedded in the file. Other fonts were not embedded. As a rule of thumb, the 5 standard fonts are not usually embedded, while all others are. It is possible, however, to embed the standard fonts, or to leave the other fonts unembedded. This depends on the capabilities of the PDF writer application. With Word 2007/2010, you have the option to embed all fonts, or to embed everything except 2 standard fonts. We will go into the details soon.
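The same check can be roughly approximated without Adobe Reader. The sketch below is a heuristic, not a real PDF parser: it scans the raw bytes for /BaseFont entries and reports whether each name carries a subset tag (six uppercase letters and a plus sign). It assumes the font dictionaries are stored uncompressed, which is not true of every PDF, and it cannot tell whether an untagged font is fully embedded or not embedded at all. The function name list_base_fonts and the sample bytes are made up for illustration.

```python
import re

# Matches "/BaseFont /Name" entries in uncompressed PDF font dictionaries.
BASE_FONT = re.compile(rb"/BaseFont\s*/([^\s/\[\]<>()]+)")
# Subset fonts carry a tag of six uppercase letters followed by a plus sign.
SUBSET_TAG = re.compile(r"^[A-Z]{6}\+(.+)$")


def list_base_fonts(pdf_bytes):
    """Return {font name: True if the name carries a subset tag} (heuristic)."""
    fonts = {}
    for match in BASE_FONT.finditer(pdf_bytes):
        name = match.group(1).decode("latin-1")
        tagged = SUBSET_TAG.match(name)
        if tagged:
            # Strip the tag and record the font as an embedded subset.
            fonts[tagged.group(1)] = True
        else:
            fonts.setdefault(name, False)
    return fonts


# Hypothetical fragment standing in for the contents of a real file.
sample = (b"<< /Type /Font /BaseFont /Helvetica >>\n"
          b"<< /Type /Font /BaseFont /ABCDEF+Garamond >>")
print(list_base_fonts(sample))  # {'Helvetica': False, 'Garamond': True}
```

For a real file, pass the result of reading it in binary mode, for example open("report.pdf", "rb").read(), where report.pdf is a placeholder name. Treat the Fonts tab in Adobe Reader as the authoritative answer.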
To save space, use the PDF standard fonts. As it happens, they are not installed on Windows (other than Symbol). Fortunately, similar fonts do exist and PDF writer applications are aware of the similarities. You can use Times New Roman in place of Times and Arial in place of Helvetica. The standard fonts and their Windows replacements are listed in the following table.
In Word you can safely use Times New Roman and Arial. Your PDF will use Times and Helvetica, the standard fonts, consuming as few bytes as possible.
Unfortunately this is not true for Courier, Symbol or ZapfDingbats. Word will always embed Courier, Courier New, Symbol and ZapfDingbats. It wouldn't be necessary, really, but Word does that. Too bad!
PDF writers other than Word may well support all the standard fonts, including Courier, Symbol and ZapfDingbats. By creating the PDF with a PDF printer driver you may be able to avoid embedding Courier, Symbol or ZapfDingbats.
Rule #2: Use fewer fonts
When Times New Roman (Times) and Arial (Helvetica) are not enough, you will end up embedding font data in the PDF. That is perfectly fine, but it adds a minimum of tens of kilobytes for each font used.
It pays off to use as few fonts as possible. Using just a few fonts will produce visually appealing output too. A good number of font families (font names) is usually 1 to 3 per document. Use one font family for body text, maybe another for headings. A third font family may be appropriate for image captions or special effects. It is perfectly OK to use just one font family for everything. Overuse of different fonts makes a document look inconsistent. What is more, it bloats the file.
Rule #3: Use fewer font styles
It is important to notice that Regular, Italic, Bold and Bold Italic are different fonts to PDF. Each of them will need to be embedded separately. If you use all the 4 styles, you end up embedding the font data 4 times: the Regular, Italic, Bold and Bold Italic font data.
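To make the counting concrete, here is a minimal sketch that treats every distinct combination of family, bold and italic as a separate font, as described above. The run list is a hypothetical document, and count_font_faces is an illustrative name, not part of any PDF tool.

```python
def count_font_faces(runs):
    """Count distinct (family, bold, italic) combinations used in a document."""
    return len({(family, bold, italic) for family, bold, italic in runs})


# Hypothetical text runs: (font family, is bold, is italic).
runs = [
    ("Garamond", False, False),  # Regular
    ("Garamond", False, True),   # Italic
    ("Garamond", True, False),   # Bold
    ("Garamond", True, True),    # Bold Italic
    ("Garamond", False, False),  # Regular again, adds nothing new
]
print(count_font_faces(runs))  # 4
```

A single family used in all four styles counts as four fonts to embed, while reusing a style that already appears costs nothing extra.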
Use as few styles as possible to keep the file size down. To emphasize text, use either Italic or Bold. Don't mix both. Pick your preference and be consistent. You don't want your documents to look like mixed character soup anyway. Readers like a consistent style with few but carefully chosen effects.
Italic, Bold and Bold Italic are expensive ways to emphasize text. Fortunately, there are some free styles too. It doesn't add many bytes to change the font size, write in a different color or add an underline. You can use Small Caps or adjust the letter spacing. To emphasize a block of text, indentation can be used.
When appropriate, use these effects in place of Italic or Bold. They may not always be the style you want, though. It's a size vs. style trade-off.
As a practical example, consider heading styles. Many documents have 3 or 4 levels of headings in a specific heading font. Using different combinations of Italic and Bold for the various levels not only makes the document look inconsistent, but also adds to the file size. Try varying the font sizes and colors instead. Perhaps you can use a horizontal rule too. Your document will become stylish and optimized at the same time.
One way to avoid embedding italic or bold fonts is to use an italic or bold version of either Times New Roman (Times) or Arial (Helvetica).
If the body text is in Garamond, you can emphasize with Times New Roman Italic, for example. You could even use Arial Italic, depending on your taste. This saves you from embedding Garamond Italic.
Alternatively, use the heading font to emphasize within body text. Reusing the same font doesn't add anything to the file size.
Rule #4: Use smaller fonts
Some fonts consume more bytes than others. As an example, embedded Consolas produces a smaller file than embedded Courier New.
Switch fonts to find one that creates a small file. You need to experiment to find a small and stylish font.
Rule #5: Avoid special characters
When writing text using the standard fonts Times New Roman or Arial, it pays off to use PDF standard characters. As long as you use these "safe" characters, you avoid font embedding.
The opposite happens with special, non-standard characters. They will force font embedding. This happens even if the font is Times New Roman or Arial. What exactly counts as a standard or a special character depends on the PDF writer application. Next we will consider the way Word behaves.
Standard characters. With Word 2007/2010, the standard or "safe" characters consist of the ASCII characters and the Unicode block Latin-1 Supplement. These characters are enough to write English and many European languages—mostly. Here are the safe characters:
Special characters. All other characters are special, or "unsafe". They require font embedding.
The above safe character list is quite close to the Windows Western character set (Windows-1252 codepage), but not identical. Some of the Windows-1252 characters are not safe. Unfortunately, this group includes some common and useful characters. The following "unsafe" characters will cause font embedding:
This is not a complete list of unsafe characters. All others are unsafe as well.
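A quick way to check a text before pasting it into Word is to look for characters outside the safe range. The sketch below assumes, following the description above, that "safe" means code points up to U+00FF (ASCII plus the Latin-1 Supplement block). That threshold reflects the Word 2007/2010 behavior described here and may differ for other PDF writers. The function name is illustrative.

```python
def find_unsafe_characters(text):
    """Return characters above U+00FF, in order of first appearance."""
    unsafe = []
    for ch in text:
        # Assumption: ASCII and Latin-1 Supplement (up to U+00FF) are safe.
        if ord(ch) > 0xFF and ch not in unsafe:
            unsafe.append(ch)
    return unsafe


# The accented letter is safe; the en dash and curly quotes are not.
text = "caf\u00e9 \u2013 \u201cok\u201d \u2013"
print([f"U+{ord(ch):04X}" for ch in find_unsafe_characters(text)])
# ['U+2013', 'U+201C', 'U+201D']
```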
We will now have a closer look at some of these common but unsafe punctuation and typographic characters. Word embeds them even though they are technically PDF standard characters. Use of the following characters will cause font embedding. Fortunately, there are replacements available.
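Such replacements can be applied mechanically. The mapping below is an illustrative assumption, not the article's own replacement list: it swaps a few common typographic characters for plain ASCII look-alikes. Adjust it to your own taste, since each substitution trades some typographic quality for a smaller file.

```python
# Illustrative replacements (assumed, not an official list).
REPLACEMENTS = {
    "\u2018": "'",    # left single quotation mark
    "\u2019": "'",    # right single quotation mark / apostrophe
    "\u201c": '"',    # left double quotation mark
    "\u201d": '"',    # right double quotation mark
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # horizontal ellipsis
}

TABLE = str.maketrans(REPLACEMENTS)


def replace_unsafe(text):
    """Replace selected typographic characters with ASCII equivalents."""
    return text.translate(TABLE)


print(replace_unsafe("\u201cWait\u2026\u201d \u2013 it\u2019s fine"))
# "Wait..." - it's fine
```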
When you do want to use some of the special characters, try using them in a single font only. This removes the need to embed them several times. As an example, the typographic dashes are identical in a regular and an italic font. Using them in one style only will save a little disk space.
As one might guess, the fewer images in a document, the smaller the file. Use as few as required.
Vector images work much better than bitmap images. A vector image takes less disk space and produces better quality output, both on the screen and on paper. Vector images draw in the maximum available resolution, while bitmaps come with a preset resolution. Since the resolutions of the display and the printer differ, bitmaps are not an ideal choice for PDF.
When you have to use bitmaps, try keeping them as small as possible. Try monochrome bitmaps instead of color bitmaps.
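A back-of-the-envelope calculation shows why both pieces of advice help. The sketch below computes the uncompressed size of a bitmap from its dimensions and color depth. PDF writers normally compress image data, so the sizes in a real file will be smaller and the ratio will vary. The 600 x 400 image is a made-up example.

```python
def raw_bitmap_bytes(width, height, bits_per_pixel):
    """Uncompressed size in bytes, rounded up to a whole byte."""
    return (width * height * bits_per_pixel + 7) // 8


color = raw_bitmap_bytes(600, 400, 24)  # 24-bit color
mono = raw_bitmap_bytes(600, 400, 1)    # 1-bit monochrome
half = raw_bitmap_bytes(300, 200, 24)   # half the width and height

print(color, mono, half)  # 720000 30000 180000
```

Before compression, the monochrome version is 24 times smaller than the color version, and halving both dimensions cuts the size to a quarter.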
Considering file size, what are the best options for saving as PDF with Word 2007/2010?
PDF optimizer utilities
There are some utilities for PDF optimization, even free ones. Such utilities often apply compression to the file. This can be a useful additional step. It doesn't do away with the need to deal with the font and image data, though. Therefore, to get the smallest file, follow these optimization rules and then use a PDF optimizer.
Sample PDF files
To prove the point, here are two PDF samples. Both files were produced with Word 2007 with the same settings. No additional applications were used.
The difference is 267 kB in such a short document! Careless use of fonts and styles resulted in a 4500% increase in the file size. Considering large documents, think how much you can save!
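The two figures also let you estimate the size of each sample. The sketch below assumes that "4500% increase" means the difference is 45 times the size of the smaller file, and that both figures are rounded, so the results are approximate. The function name is illustrative.

```python
def sizes_from_difference(difference_kb, increase_percent):
    """Return (small, large) sizes implied by a difference and a percent increase."""
    # increase_percent = 100 * (large - small) / small
    small = difference_kb / (increase_percent / 100)
    large = small + difference_kb
    return small, large


small, large = sizes_from_difference(267, 4500)
print(f"{small:.1f} kB -> {large:.1f} kB")  # 5.9 kB -> 272.9 kB
```

Under these assumptions the optimized sample is roughly 6 kB and the careless one roughly 273 kB.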
In Part II we go deeper into PDF optimization at a detailed technical level. Read on if you are a developer creating PDF files programmatically.