Single and multi-page PDF files from one or more TIFF files with free open-source software

Robin Whittle  12 August 2008


Here is my cheatsheet on using Ghostscript commands to convert TIFF files into PDFs, on Debian 4.0.  

You might find this page better: http://www.troubleshooters.com/linux/gs.htm

You may be able to do the same under Mac OSX and Cygwin under Microsoft Windows.

Using the Synaptic package manager, I installed:

(libtiff4 was already installed.  So was gs-esp 8.15, which would have caused trouble if I had removed it.  At one point I installed gs-gpl 8.54.dfsg.1-5etch1 but this duplicates gs-esp, so I removed it.)

This uses the Ghostscript software, including the gs program itself, as called by the ps2pdfwr or ps2pdf13 command. It also uses the tiff2ps command from libtiff-tools.  

A Ghostscript installation has its own HTML documentation.  Running the command:

gs -h

should show the location of that doco on your system.  For my Debian system, it was at: /usr/share/doc/gs-esp/Use.htm and, for the gs-gpl I installed briefly, at: /usr/share/doc/gs-gpl/Use.htm.

This doco seems to be pretty rare on the Net, but here is a reasonably up-to-date version:

http://www.cs.wisc.edu/~ghost/doc/AFPL/8.00/Use.htm
More information is available at: http://www.cs.wisc.edu/~ghost/ .

General Plan

The general plan is to put the TIFF files in a directory, named so that sorting them by name gives the correct page order.  Then I run two commands.

The first command, tiff2ps, converts the TIFFs into one long, large, PostScript file. 

The second, ps2pdf13 or ps2pdfwr (which are actually wrappers for the gs Ghostscript program), converts the PostScript file into a PDF file. 

In practice, I pipe the output of the first command directly to the second, to avoid writing and reading a large PostScript file, which would then need to be deleted.
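For comparison, the un-piped, two-step form might look like the sketch below.  The function name and the name of the intermediate PostScript file are only illustrative, and the sketch assumes tiff2ps and ps2pdf13 are on the PATH.

```shell
#!/bin/sh
# Two-step conversion via a temporary PostScript file (illustrative sketch).
# Usage: tiffs_to_pdf_two_step output.pdf page01.tif page02.tif ...
tiffs_to_pdf_two_step() {
    out=$1
    shift
    ps_file="${out%.pdf}.ps"      # hypothetical name for the intermediate file
    # Step 1: all TIFFs into one large PostScript file.
    tiff2ps "$@" > "$ps_file" || return 1
    # Step 2: PostScript file into a PDF file.
    ps2pdf13 "$ps_file" "$out"
    status=$?
    rm -f "$ps_file"              # the large intermediate file is no longer needed
    return $status
}
```

Piping the first command into the second gives the same result without the intermediate file ever touching the disk.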

To find information on these programs, and to create text files of this for future reference:

info tiff2ps  > tiff2ps.txt
info ps2pdfwr > ps2pdfwr.txt    Creates PDF without specifying a version number.
info ps2pdf12 > ps2pdf12.txt    Creates PDF version 1.2 - for Acrobat 3 and later.
info ps2pdf13 > ps2pdf13.txt    Creates PDF version 1.3 - for Acrobat 4 and later. Higher resolution!
info ps2pdf   > ps2pdf.txt      Creates PDF version 1.2, but this may change in future versions of Ghostscript.
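The same set of text files could be produced with a small loop.  This is only a sketch: the function name save_docs is made up for illustration, and it assumes the info command is installed.

```shell
#!/bin/sh
# Save the documentation for each named program into <program>.txt
# in the current directory.
save_docs() {
    for prog in "$@"; do
        info "$prog" > "$prog.txt"
    done
}
# Example: save_docs tiff2ps ps2pdfwr ps2pdf12 ps2pdf13 ps2pdf
```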

My notes are for ESP Ghostscript 8.15.3 (I have not tried to figure out the various strands of Ghostscript development.)

I create the TIFF files, with appropriate names ending in ".tif", with the resolution I want in the final PDF.  See the section below on TIFF Resolution and Size.

The files will be read, one per PDF page, in alphanumeric order, so it is good to name them blah01.tif, blah02.tif etc.  (Note that the full directory path will be included in the PDF file, so anyone reading the PDF can see this via the File > Properties command.)  (This does not seem to be true now, but it was in the past: the name of the first TIFF file will appear in the PDF file's "title", which is also visible with File > Properties.)
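The leading zero matters because, in a plain name sort, blah10.tif comes before blah2.tif.  A minimal sketch of generating zero-padded names follows; the function name padded_name is hypothetical, and two digits is an assumption that suits documents of up to 99 pages.

```shell
#!/bin/sh
# Build a zero-padded page file name so that name order matches page order.
# $1 = name prefix, $2 = page number (plain decimal, without leading zeros)
padded_name() {
    printf '%s%02d.tif\n' "$1" "$2"
}
# Example: padded_name blah 2    gives blah02.tif
#          padded_name blah 10   gives blah10.tif
```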

Then I give the following incantation (but read on for a more complete version, with paper size selection):
tiff2ps *.tif | ps2pdf13 - > test.pdf

This runs tiff2ps, telling it to find all the ".tif" files in the current directory.  The '|' vertical line pipes its PostScript output to the stdin of the next command, ps2pdf13, which has a '-' to tell it to accept its input on stdin, rather than reading a file.  The stdout of ps2pdf13 is redirected with '>' to the output file test.pdf.

See also: the man page: http://linux.die.net/man/1/tiff2ps

For me, this produced a PDF with A4 portrait pages, but that may be due to this Debian machine being localised to Australia. To change the default paper size, see: http://pages.cs.wisc.edu/~ghost/doc/AFPL/8.00/Use.htm#Change_default_size 


TIFF resolution and size

I generated TIFF files with both Adobe Photoshop 6.0 and Gimp 2.2 (Open Source image editor, comparable with Photoshop).  There were some warnings when reading the Photoshop TIFFs, which were monochrome, saved with LZW compression: "unknown field with tag 37724 (0x935c) encountered."

If the images are black and white, it is important to reduce them to monochrome, rather than leaving them as RGB or CMYK, in order to reduce the size of the final PDF.

It is fine for each page to have a different type of TIFF file - for instance one can be monochrome and the next colour.  I haven't tested this system with all possible formats, from 16 bit CMYK down to 1 bit monochrome bitmap.

It is vital to set the resolution (Dots per Inch) and the vertical and horizontal image sizes in Photoshop/Gimp to match the desired outcome on the PDF page.  

I had generated TIFF files by using Photoshop to edit some 800 dpi scanned files.  These files did not fill the A4 landscape page I created as described below.  This is because the images I scanned were smaller than A4 landscape, and the file contained information that each pixel was 1/800 inch. So I needed to edit the TIFF files in Photoshop again and change them with Image > Image Size > Document Size to make them at least large enough that they would fill the page size I specified.  This involved changing the resolution to something other than 800 DPI, without resampling the image or changing the number of pixels.  The way to do this is to turn off "Resample Image".  Then, horizontal and vertical linear dimensions can be changed, giving a new DPI, or the DPI can be changed, giving new linear dimensions.
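The arithmetic behind this is: printed width in inches = pixels / DPI, and there are 72 PostScript points per inch, so the DPI at which an image exactly spans a page width is pixels x 72 / width in points.  A small sketch of that calculation is below; the function name dpi_to_fill is made up for illustration, and awk is assumed to be available.

```shell
#!/bin/sh
# DPI at which an image of a given pixel width spans a given width in points.
# $1 = image width in pixels, $2 = target width in PostScript points (1/72 inch)
dpi_to_fill() {
    awk -v px="$1" -v pt="$2" 'BEGIN { printf "%.1f\n", px * 72 / pt }'
}
# Example: dpi_to_fill 1684 842   (1684 pixels across an 842 point wide page)
```

For instance, a hypothetical scan 6000 pixels wide would need to be set to about 513 DPI to span the 842 point width of A4 landscape; at 800 DPI it is only 7.5 inches wide and leaves part of the page empty.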

The resulting PDF file can have each page made from a TIFF file of a completely different resolution.

Where the image size is smaller than the page size, the image sits in the lower left corner.  Therefore, it is important to include in the TIFF file any white borders etc. which may be desired, and to make its size as close as possible to the page size.  When the image size is larger than the page size, the top and/or right parts of it are lost.


Paper size of final PDF file

To force the paper size, for instance to US "letter" size, here is my command line:

tiff2ps *.tif | ps2pdf13 -sPAPERSIZE=letter - > test.pdf

For A4:

tiff2ps *.tif | ps2pdf13 -sPAPERSIZE=a4 - > test.pdf
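The two commands above differ only in the paper size, so they could be wrapped in one small function.  This is a sketch rather than a tested tool: the function name tiffs_to_pdf and its argument order are my own invention, and it simply passes whatever paper size name it is given to -sPAPERSIZE.

```shell
#!/bin/sh
# Convert TIFF files to one PDF with an explicit paper size (illustrative sketch).
# Usage: tiffs_to_pdf papersize output.pdf page01.tif page02.tif ...
tiffs_to_pdf() {
    paper=$1
    out=$2
    shift 2
    # Refuse to run with no input files, rather than produce an empty PDF.
    if [ "$#" -eq 0 ]; then
        echo "tiffs_to_pdf: no TIFF files given" >&2
        return 1
    fi
    tiff2ps "$@" | ps2pdf13 -sPAPERSIZE="$paper" - > "$out"
}
# Example: tiffs_to_pdf a4 test.pdf *.tif
```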

The options for paper size are defined in the Ghostscript file gs_statd.ps, which on my system lives in the directory:

/usr/share/gs-gpl/8.54/lib/

From that file, I find the options listed below.  The dimensions are in PostScript points, of which there are 72 per inch, so one PostScript point is 0.3528mm. (Production First has an extensive printing dictionary which states that traditionally there are 72.29 points per inch, but that Adobe defined a PostScript point as 72 per inch.)

Information on paper sizes Ghostscript uses is here (or a more up-to-date version at a similar location):
http://www.cs.wisc.edu/~ghost/doc/AFPL/8.00/Use.htm#Known_paper_sizes
General information on paper sizes can be found at: http://en.wikipedia.org/wiki/Paper_size .
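Since paper sizes are usually quoted in millimetres and Ghostscript works in points, a pair of conversion helpers can be handy.  The sketch below uses only the two facts above (72 points per inch, 25.4mm per inch); the function names are made up for illustration and awk is assumed to be available.

```shell
#!/bin/sh
# Convert between PostScript points (72 per inch) and millimetres (25.4 per inch).
pt_to_mm() {
    awk -v pt="$1" 'BEGIN { printf "%.1f\n", pt * 25.4 / 72 }'
}
# Result is rounded to the nearest whole point.
mm_to_pt() {
    awk -v mm="$1" 'BEGIN { printf "%.0f\n", mm * 72 / 25.4 }'
}
# Example: mm_to_pt 210   and   mm_to_pt 297   for A4 (210mm x 297mm)
```

For A4 these give 595 and 842 points, which are the figures Ghostscript lists for a4.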

It is also possible to specify the size of the PDF pages, in PostScript points (1/72") explicitly.  I suspect that in order to have a document print under Adobe Reader without being reduced in size, the "paper size" of the PDF needs to fit inside the printer's margins.  These margins, in almost all circumstances, will be smaller than the physical paper.  So I think an A4 PDF will always print smaller than it should on A4 paper, unless the printer can print to the edges.

To make the PDF be in landscape format (I had the tif files in this layout too, for A4 paper) - I was tempted to specify the output size, with width larger than height (the following is all one line):

tiff2ps *.tif | ps2pdf13 -dDEVICEWIDTHPOINTS=842 -dDEVICEHEIGHTPOINTS=595 - > test.pdf

The numbers are in 1/72" units and are from the Points columns of: http://pages.cs.wisc.edu/~ghost/doc/AFPL/8.00/Use.htm#Known_paper_sizes .
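Landscape here is just the portrait size with width and height swapped.  A minimal sketch of generating the two options from the portrait figures follows; the function name landscape_opts is hypothetical.

```shell
#!/bin/sh
# Print ps2pdf13 page-size options for a landscape page.
# $1 = portrait width in points, $2 = portrait height in points
landscape_opts() {
    # Swap the two dimensions: the portrait height becomes the width.
    printf '%s\n' "-dDEVICEWIDTHPOINTS=$2 -dDEVICEHEIGHTPOINTS=$1"
}
# Example (all one line; the substitution is deliberately unquoted so that
# the two options are passed as separate arguments):
#   tiff2ps *.tif | ps2pdf13 $(landscape_opts 595 842) - > test.pdf
```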


Performance

(This is from before 2008-08-12 - I haven't checked all the links.)

Conversion of 12 monochrome letter-size 300DPI TIFF files into a PDF took 21 seconds on a Debian 3.1 Celeron (Pentium 4) 2.6GHz machine.

The file size was 6.9 Megabytes, which compares favourably with 9.3 Megs produced by fastio.com's tiff2pdf, linked to below.  The two files are at:
http://astroneu.com/refs/solar-redshift/Evershed-Royds-1916.pdf
http://astroneu.com/refs/solar-redshift/Evershed-Royds-1916-tiff2pdf.pdf

Related discussions and software

Here are some discussion forum pages I found, including some relating to automated scanning, processing and conversion of files on the Mac OSX, most or all of which would be applicable to Unix/Linux:

http://forums.macosxhints.com/showthread.php?s=&postid=35017
Using '-' to select stdin as the input for ps2pdfwr (actually gs, I think) so I can pipe to it from tiff2ps:
http://linux.derkeiler.com/Newsgroups/alt.linux/2005-04/0099.html
There is an open-source program to convert multiple TIFFs into a PDF:
http://c42pdf.ffii.org/
Windows binaries are available. However the TIFF file needs to be compressed with the G4 algorithm - and I can't figure out how to make Photoshop do this.

A discussion of the page size limits of various versions of PDF file:
http://c42pdf.ffii.org/docs/scanned.html
Image size (really page size?) in PDF 1.1 and 1.2 is limited to 3240x3240 units (PostScript points = 1/72"), in PDF 1.3 it is limited to 14,400x14,400 units.
A paper on a programming library for generating PDFs, by Joseph J. Pfeiffer Jr. http://www.cs.nmsu.edu/TechReports/2002/012.pdf

References for various versions of the PDF file format: http://partners.adobe.com/public/developer/pdf/index_reference.html .  An older version, for PDF 1.3, is PDFSPEC13.pdf, and an earlier version is at archive.org.

A free binary ("free-beer" software, not really free as in free open-source source code) for Windows is:
http://www.fastio.com/tiff2pdf.html
I converted the TIFFs into 300DPI (otherwise the PDF was going to be too big) and saved them without compression. They are greyscale images. I gave them sequential numbers and put them in a directory "infiles". Then from a DOS box I gave the command:
tiff2pdf114demo infiles test.pdf
It produced a well compressed, very clear PDF.

Twelve 7.8 Megabyte TIFF files (93 Megabytes in total) were crunched into a 9.3 Megabyte PDF. The only gotcha is an advertising link at the bottom of each page.

PDFCreator is a Windows GUI front end for Ghostscript: http://en.wikipedia.org/wiki/PDFCreator . I am not sure if it can accept raw TIFF files, or whether it only accepts PostScript files.  Its main purpose is to accept the print output of other programs.  This would be handy, at least, for converting academic papers which are only available as .ps into a .pdf format.