Friday, March 6, 2015

Compressing PDF Files at the Command Line

Sometimes you have to email someone a bunch of PDF files, but you can't fit them all in one email without exceeding Gmail's 25MB attachment limit (or the limit of whatever email service you use).

Rather than splitting up the email (which isn't ideal in many situations), you can compress the PDF files themselves. Here are a couple of strategies for doing that, both using the wonderful Ghostscript interpreter. You might remember Ghostscript from your college days. It's one of those old-school things that's still awesome.

In the end, I used ps2pdf to reduce 28MB of PDF files down to 12MB.

ps2pdf

The command-line tool ps2pdf, a wrapper script that ships with Ghostscript, converts PostScript (.ps) files to PDF. It also accepts a PDF file as input, which is what makes it useful here.

If you have Ghostscript installed, you can type this at the command line:

ps2pdf -dPDFSETTINGS=/ebook in.pdf out.pdf

The /ebook setting "selects medium-resolution output similar to the Acrobat Distiller 'eBook' setting," which sounds right for documents that only need to be readable on screen.
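
Since the whole point is to shrink a bunch of attachments, it helps to run the same command over every PDF in a folder. Here is a minimal sketch in POSIX shell, assuming ps2pdf is on your $PATH. The name compress_pdfs is a hypothetical helper, not part of Ghostscript, and it writes results to a separate directory so the originals are left alone.

```shell
#!/bin/sh
# compress_pdfs SRC_DIR OUT_DIR  (hypothetical helper, not part of Ghostscript)
# Runs ps2pdf with the /ebook preset on every *.pdf in SRC_DIR and writes
# the results, under the same file names, into OUT_DIR.
compress_pdfs() {
    src_dir=$1
    out_dir=$2
    mkdir -p "$out_dir" || return 1
    rc=0
    for f in "$src_dir"/*.pdf; do
        # With no matching files, the glob stays literal; skip it.
        [ -e "$f" ] || continue
        name=$(basename "$f")
        if ! ps2pdf -dPDFSETTINGS=/ebook "$f" "$out_dir/$name"; then
            echo "ps2pdf failed on $f" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For example, compress_pdfs . compressed would put compressed copies of every PDF in the current directory into compressed/. Use an output directory different from the source so ps2pdf never reads and writes the same file.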

Read more on Stack Overflow about the other values you can pass to -dPDFSETTINGS, such as /screen, /printer, /prepress, and /default.
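
To see which preset gives the best trade-off for a particular file, you can run ps2pdf once per preset and compare the resulting sizes. A minimal sketch, assuming ps2pdf is on your $PATH; try_presets and its NAME-PRESET.pdf output naming are hypothetical choices for illustration.

```shell
#!/bin/sh
# try_presets FILE.pdf  (hypothetical helper)
# Writes FILE-screen.pdf, FILE-ebook.pdf, FILE-printer.pdf and
# FILE-prepress.pdf next to the input and prints each size in bytes.
try_presets() {
    input=$1
    [ -f "$input" ] || { echo "no such file: $input" >&2; return 1; }
    base=${input%.pdf}
    printf '%-9s %s bytes\n' original "$(wc -c < "$input" | tr -d ' ')"
    for preset in screen ebook printer prepress; do
        output="$base-$preset.pdf"
        ps2pdf -dPDFSETTINGS=/"$preset" "$input" "$output" || return 1
        printf '%-9s %s bytes\n' "$preset" "$(wc -c < "$output" | tr -d ' ')"
    done
}
```

Open the smallest output and check that it is still readable before sending it; /screen in particular can make scanned pages hard to read.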

Color to Grayscale

Another easy way to shrink a PDF is to convert it from color to grayscale. Color images store three channels per pixel (RGB) instead of the single channel of grayscale, so they take up a lot of space, and many documents don't need color at all.

If you have Ghostscript installed, you can type this at the command line:

gs -sDEVICE=pdfwrite -sProcessColorModel=DeviceGray -sColorConversionStrategy=Gray -dOverrideICC -o out.pdf -f in.pdf

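Here, in.pdf is your input file and out.pdf is your output file. The -o option names the output file and also implies -dBATCH and -dNOPAUSE, so gs exits on its own when it's done.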
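
The two strategies can also be combined in a single Ghostscript pass. This is a sketch, not a tested recipe: it assumes that grayscale conversion plus the /ebook preset shrinks your files more than either one alone, which you should confirm by comparing sizes. On the assumption that later options take precedence, -dPDFSETTINGS is placed before the color options. The name to_gray_ebook is a hypothetical helper.

```shell
#!/bin/sh
# to_gray_ebook IN.pdf OUT.pdf  (hypothetical helper)
# One gs run that applies the /ebook preset and converts to grayscale.
to_gray_ebook() {
    input=$1
    output=$2
    [ -f "$input" ] || { echo "no such file: $input" >&2; return 1; }
    [ "$input" != "$output" ] || { echo "output must differ from input" >&2; return 1; }
    # Assumption: the preset comes first so the explicit color options win.
    gs -sDEVICE=pdfwrite \
       -dPDFSETTINGS=/ebook \
       -sProcessColorModel=DeviceGray \
       -sColorConversionStrategy=Gray \
       -dOverrideICC \
       -o "$output" -f "$input"
}
```

Usage: to_gray_ebook in.pdf out.pdf. The function returns gs's own exit status.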

Did It Work?

The easiest way to tell if it worked is to list the files in the current directory by size:

ls -sS

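The -s flag prints each file's allocated size in blocks, and -S sorts the listing largest first. Compare the input PDF with the output PDF: the output should be smaller, though Ghostscript can occasionally produce a larger file (for example, from a PDF that is already well compressed). In that case, just keep the original.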
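
Block counts from ls -s are coarse for small files. If you want exact byte counts and a percentage, a small shell function can do the arithmetic. A minimal sketch; report_savings is a hypothetical name, and it uses wc -c for the byte counts.

```shell
#!/bin/sh
# report_savings IN.pdf OUT.pdf  (hypothetical helper)
# Prints exact byte sizes and the integer percentage saved.
report_savings() {
    [ -f "$1" ] && [ -f "$2" ] || { echo "usage: report_savings IN OUT" >&2; return 1; }
    before=$(wc -c < "$1" | tr -d ' ')
    after=$(wc -c < "$2" | tr -d ' ')
    # Covers an empty input too, so the division below never sees zero.
    if [ "$after" -ge "$before" ]; then
        echo "$2 is not smaller than $1; keep the original"
        return 0
    fi
    saved=$(( (before - after) * 100 / before ))
    echo "$1: $before bytes -> $2: $after bytes, ${saved}% smaller"
}
```

For example, a 1000-byte input and a 400-byte output would be reported as 60% smaller.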

Other Tools

Here are some other tools to look into if you don't like either of the above approaches.

pdfsizeopt

I always start out by trying to find a Python solution to problems, because then whatever I find becomes a handy little building block for me to use in other Python projects.

Before trying either of the above, I attempted to compress the PDF files with the Python library pdfsizeopt. However, I ran into this error:

error: Multivalent.jar not found. Make sure it is on the $PATH, or it is one of the files on the $CLASSPATH.

I resolved that error by downloading the latest Multivalent jar file from the official project page and putting it in a directory on my $PATH, but then I got this error:

AssertionError: Multivalent failed (status)

At that point I moved on. That said, if anyone knows how to get around that second error, I'd love to know. It would be great to get pdfsizeopt working.

Alfred Klomp's Script

I also tried Alfred Klomp's nice script. If you use it, experiment with the resolution and the other parameters. (Playing with it is actually how I arrived at the color-to-grayscale command above.) Study it and adjust it until you're satisfied with the output.
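
In the same spirit of experimenting with resolution, here is a minimal sketch that re-renders one PDF at several image resolutions and prints each resulting size. It is not Alfred Klomp's script. It assumes the pdfwrite device honors Ghostscript's standard distiller parameters (-dDownsampleColorImages, -dColorImageResolution and their Gray and Mono counterparts), and try_resolutions is a hypothetical helper name.

```shell
#!/bin/sh
# try_resolutions FILE.pdf  (hypothetical helper)
# Writes FILE-72dpi.pdf, FILE-100dpi.pdf, ... and prints each size in bytes.
try_resolutions() {
    input=$1
    [ -f "$input" ] || { echo "no such file: $input" >&2; return 1; }
    base=${input%.pdf}
    for dpi in 72 100 150 200; do
        output="$base-${dpi}dpi.pdf"
        # Downsample color, gray and monochrome images to the target dpi.
        gs -q -sDEVICE=pdfwrite \
           -dDownsampleColorImages=true -dColorImageResolution="$dpi" \
           -dDownsampleGrayImages=true -dGrayImageResolution="$dpi" \
           -dDownsampleMonoImages=true -dMonoImageResolution="$dpi" \
           -o "$output" -f "$input" || return 1
        printf '%4s dpi: %s bytes\n' "$dpi" "$(wc -c < "$output" | tr -d ' ')"
    done
}
```

Lower resolutions give smaller files but blurrier images, so open each output before deciding which one to send.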
