Optimize PDF files - Part II

Your applications can write incredibly small PDF files if you know what you're doing. This article is intended for programmers who create PDF files programmatically using custom routines. Read Part I if you are a user saving PDFs, or if you want a general understanding of PDF optimization.

Amaze your users by saving small, high-quality PDF files. Users expecting multimegabyte PDFs will be pleased to find that your application needs only tens of kilobytes, or even just a few kilobytes, for a simple PDF report.

This article assumes you are writing PDF files programmatically. It further assumes you are doing this with a PDF writer module or class for which you have the source code, so that you can actually fine-tune the PDF output. You need to know quite a bit about the PDF file format to take advantage of these techniques.

The article focuses on PDF v1.3. The optimizations are potentially just as useful with other versions. Get the PDF Reference (Adobe Portable Document Format, Version 1.3) to follow the tricks.

Font optimization

Let's start with the obvious optimization: fonts.

Optimization #1: Don't embed fonts

Did you know fonts don't need to be embedded in PDF? Font embedding is optional. The PDF standard allows you to use any font, whether or not it exists on the reader's machine. If a required font is not found, PDF reader applications use font metrics (in /FontDescriptor) to find a reasonable replacement font.

Indeed, in many cases font embedding will unnecessarily bloat the file. Consider an application that creates reports that are mainly used by the same user on the same PC. The reports will display perfectly well as long as they are viewed on that PC (unless the user happens to uninstall the required fonts). When common fonts are used, chances are good that the fonts will always show up correctly.

Optimization #2: Use standard fonts

PDF comes with 5 standard font families.
The families are Times, Helvetica, Courier, Symbol and ZapfDingbats. All PDF readers support these standard fonts. Apart from ZapfDingbats, they resemble standard Windows fonts. Standard fonts never need embedding, so you can use them safely.
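
As an illustration, a font object for a standard font can be very short because it needs no /FontDescriptor and no embedded font file. The following Python sketch is not taken from any particular PDF writer. The function name `standard_font_object` and the choice of /WinAnsiEncoding are assumptions made for the example.

```python
# Hypothetical helper, for illustration only: builds the PDF object text
# for one of the 14 standard fonts (the 5 families and their variants).
STANDARD_FONTS = {
    "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
    "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
    "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
    "Symbol", "ZapfDingbats",
}


def standard_font_object(obj_num: int, base_font: str) -> str:
    """Return a font dictionary with no descriptor, widths or embedded data."""
    if base_font not in STANDARD_FONTS:
        raise ValueError(f"{base_font} is not a standard font")
    lines = [
        f"{obj_num} 0 obj",
        "<< /Type /Font",
        "   /Subtype /Type1",
        f"   /BaseFont /{base_font}",
    ]
    # Assumption: text fonts use WinAnsiEncoding. Symbol and ZapfDingbats
    # keep their built-in encodings, so no /Encoding entry is written.
    if base_font not in ("Symbol", "ZapfDingbats"):
        lines.append("   /Encoding /WinAnsiEncoding")
    lines += [">>", "endobj"]
    return "\n".join(lines) + "\n"


print(standard_font_object(5, "Helvetica"))
```

The whole font definition takes well under 100 bytes, compared to the tens of kilobytes an embedded font file can add.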

Optimized representation of text and numeric values

Don't bloat PDFs by representing text and numeric values with too many bytes. You can often say the same with less.

Optimization #3: Use PDFDocEncoding

This is a relatively small optimization, but simple enough. Text strings, such as /Subject in file info or /Title in /Outlines, can be in either Unicode or PDFDocEncoding. Unicode takes twice the space: two bytes per character compared to one byte with PDFDocEncoding. Use Unicode only when the content cannot be represented in PDFDocEncoding. Note that PDFDocEncoding contains a wider range of characters than WinAnsiEncoding or MacRomanEncoding. That is good news for optimizers.

Optimization #4: Optimize number of decimal digits

This optimization applies to all numeric values in PDF. Use only as many decimals as required. It's unnecessary to bloat the file with useless decimals. Write a utility function that rounds values for you. Supposing you need 2 decimals of precision, the function should round each value to 2 decimals before it is written.

Stream optimization

Now we get to optimizing streams, the actual page content.

Optimization #5: Clip to viewable area

This rule is especially important if you're drawing a part of a larger graphic into PDF. When drawing graphics objects or text, it's a good idea to check for page boundaries. If no part of the object will be visible, there's no point in adding the respective drawing operations to the PDF file. The result will be invisible anyway. Besides, hidden data in a PDF is a security concern.

Optimization #6: Don't repeat operators unnecessarily

PDF keeps track of the currently selected color, line width, font and so on. You don't need to select the color each time you draw a line. Only set the color when it needs to change. The same goes for line width, line cap style and other drawing attributes. Keep track of the current attributes and change them only when you need to.
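
Optimizations #4 and #6 can be sketched together in Python. This is a minimal illustration under stated assumptions, not a complete content stream writer. The names `fmt` and `ContentStream` are hypothetical, and dropping trailing zeros after rounding is an extra step chosen for this example.

```python
def fmt(value: float, decimals: int = 2) -> str:
    """Round to `decimals` places, then drop trailing zeros (assumption)."""
    text = f"{value:.{decimals}f}"
    if "." in text:
        text = text.rstrip("0").rstrip(".")
    # Tiny negative values can round to "-0"; write them as plain zero.
    return "0" if text == "-0" else text


class ContentStream:
    """Collects page operators and skips attribute changes that change nothing."""

    def __init__(self):
        self.ops = []
        self._state = {}  # last operator written for each attribute

    def _set(self, key: str, op_text: str) -> None:
        # Emit the operator only if it differs from the current setting.
        if self._state.get(key) != op_text:
            self._state[key] = op_text
            self.ops.append(op_text)

    def set_line_width(self, width: float) -> None:
        self._set("line_width", f"{fmt(width)} w")

    def set_stroke_rgb(self, r: float, g: float, b: float) -> None:
        self._set("stroke", f"{fmt(r)} {fmt(g)} {fmt(b)} RG")

    def line(self, x1: float, y1: float, x2: float, y2: float) -> None:
        self.ops.append(f"{fmt(x1)} {fmt(y1)} m {fmt(x2)} {fmt(y2)} l S")

    def getvalue(self) -> str:
        return "\n".join(self.ops)


cs = ContentStream()
for end_x in (10.0, 20.004):
    cs.set_stroke_rgb(1, 0, 0)   # written only on the first pass
    cs.set_line_width(0.5)       # written only on the first pass
    cs.line(0, 0, end_x, 10)
print(cs.getvalue())
```

A real writer that uses the q and Q operators to save and restore the graphics state would also need to save and restore this tracked state, which the sketch leaves out.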

Optimization #7: Close polygons

When drawing a polygon, there is no need to draw the last edge explicitly: closing the path adds it. Even better optimizations are available, because PDF provides combined operators that close a path and paint it in one step.

Optimization #8: Use shortcuts for splines

The default way to draw a spline curve from the current point to (x3,y3) with (x1,y1) and (x2,y2) as the control points is this:

    x1 y1 x2 y2 x3 y3 c

If the current point and (x1,y1) are the same, there is a shorter form:

    x2 y2 x3 y3 v

If points (x2,y2) and (x3,y3) are the same, use this shorter form:

    x1 y1 x3 y3 y

Optimization #9: Use color shortcuts

The standard operators to set color are:

    0.123 0.123 0.123 rg
    0.123 0.123 0.123 RG

These operators take 3 values: Red, Green and Blue. The lowercase rg sets the fill color and the uppercase RG sets the stroke color. For black, gray or white you don't need the full RGB color space. Grayscale is enough. To select black, use one of these operators:

    0 g
    0 G

For white, use these:

    1 g
    1 G

You can do the same for any shade of gray. To select 0.123 gray, use one of the following:

    0.123 g
    0.123 G

Optimization #10: Compress

Compress streams to get the size down. Get a copy of the zlib library.

Visual Basic note: VB6 cannot call the regular zlib.dll, but you can use zlibwapi.dll instead.

Small PDF samples

Here are small PDF samples with vector graphics and text. The graphic was originally created with Visustin.
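
To make Optimizations #8 to #10 concrete, here is a minimal Python sketch that picks the shortest curve and fill color operators and compresses a content stream with zlib. The function names (`num`, `curve`, `fill_color`, `flate_stream`) and the 3-decimal precision are assumptions made for the example. The /FlateDecode filter name is the one the PDF format uses for zlib-compressed streams.

```python
import zlib


def num(value: float) -> str:
    """Write a number with at most 3 decimals and no trailing zeros."""
    text = f"{value:.3f}".rstrip("0").rstrip(".")
    return "0" if text == "-0" else text


def curve(current, p1, p2, p3) -> str:
    """Return the shortest Bezier operator for control points p1, p2 and end p3."""
    if p1 == current:
        return f"{num(p2[0])} {num(p2[1])} {num(p3[0])} {num(p3[1])} v"
    if p2 == p3:
        return f"{num(p1[0])} {num(p1[1])} {num(p3[0])} {num(p3[1])} y"
    coords = " ".join(num(c) for point in (p1, p2, p3) for c in point)
    return f"{coords} c"


def fill_color(r: float, g: float, b: float) -> str:
    """Use the one-value gray operator when all three components are equal."""
    if r == g == b:
        return f"{num(r)} g"
    return f"{num(r)} {num(g)} {num(b)} rg"


def flate_stream(content: bytes) -> bytes:
    """Wrap zlib-compressed content in a stream with a /FlateDecode filter."""
    data = zlib.compress(content, 9)
    header = f"<< /Length {len(data)} /Filter /FlateDecode >>\nstream\n"
    return header.encode("ascii") + data + b"\nendstream\n"


ops = [fill_color(0.123, 0.123, 0.123), "0 0 m",
       curve((0, 0), (0, 0), (5, 5), (10, 0)), "f"]
page = ("\n".join(ops) + "\n").encode("ascii") * 40
print(len(page), "bytes raw,", len(flate_stream(page)), "bytes as a compressed stream")
```

Compression pays off most on long, repetitive content streams. For a very short stream the dictionary overhead can outweigh the saving, so a writer may want to compare both sizes before choosing.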

©Aivosto Oy