Peter Cooper : UK Web 2.0 and Ruby on Rails consultant

How to stamp and customize text in PDF files for free


One of the common ideas in publishing these days is to sell your content as a PDF file. I've bought a few books this way, and it's been great. One of the tricks publishers use is to embed your name on every page so that you'll be dissuaded from sharing the file on the Internet; with your name in there, they know who to come after! I was interested in finding out how to produce PDFs that can be changed programmatically in this way, but had no luck with some experiments and free tools I tried several weeks ago. Never mind.

Today, Jason Fried posted a 30-day update on the Getting Real book, sold as a PDF only for $19 a pop. They've sold over 5000 copies and made almost $120,000 in profit over the last 30 days alone. It's not a get-rich-quick scheme, but if you've got the smarts and know what you're doing, it's nice money to make. As these guys are the sort who 'know what they're doing', I asked how they did the PDF stamping / watermarking. David Heinemeier Hansson replied quickly:

We bought an expensive license from pdf-tools.com. Their PDF Batch Stamp Tool.

I checked it out and for a single Windows license they wanted $450, but for a Linux or OS X license... $900! For multi-client/server installs it appeared to be a lot more, so I did some digging around. I eventually found similar tools in the $300-$2000 range, and a few Windows-only client apps under $100. I also found one free Windows app, but nothing that lent itself to automation. It was then that I had an epiphany.

I realized that to 'stamp' PDFs, all you need to do is change placeholders in the text. PDFs are typically compressed, but what if there was an easy way to decompress them? It turns out there is (as long as they're not encrypted, but as the content provider, you can make sure of that!) and I already had a command-line tool called pdftk installed which would do it. I copied my PDF copy of Rails Recipes (don't worry Prags, I won't be ripping you off!) to a.pdf and had a play:

pdftk a.pdf output a2.pdf uncompress
sed 's/Peter Cooper/Fred Bloggs/g' < a2.pdf > a3.pdf
pdftk a3.pdf output a4.pdf compress

Et voilà, I had a working copy of my PDF with "Prepared exclusively for Fred Bloggs" on every page instead of the Peter Cooper variant, and the end result was just 3K larger than the original PDF (the uncompressed intermediate version is double the size).
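For anyone who wants to script the three commands above, here's a minimal Python sketch of the same uncompress, replace, recompress round trip. It assumes pdftk is on your PATH and that the placeholder sits in the uncompressed file as one contiguous, Latin-1-compatible string, as it did in my test file. The function names (escape_pdf_literal, replace_placeholder, stamp_pdf) are made up for illustration. The one thing it adds over plain sed is escaping backslashes and parentheses in the customer's name, since parentheses delimit literal strings inside a PDF.

```python
import subprocess
import tempfile
from pathlib import Path


def escape_pdf_literal(text: str) -> bytes:
    """Escape backslashes and parentheses for use inside a PDF literal string.

    Assumption: the text is representable in Latin-1; other characters
    raise UnicodeEncodeError rather than being silently mangled.
    """
    escaped = text.replace("\\", "\\\\").replace("(", "\\(").replace(")", "\\)")
    return escaped.encode("latin-1")


def replace_placeholder(pdf_bytes: bytes, placeholder: str, name: str) -> bytes:
    """Swap every raw occurrence of the placeholder for the escaped name."""
    return pdf_bytes.replace(placeholder.encode("latin-1"), escape_pdf_literal(name))


def stamp_pdf(source: str, dest: str, placeholder: str, name: str) -> None:
    """Run the same three steps as the shell commands: uncompress, replace, compress."""
    with tempfile.TemporaryDirectory() as tmp:
        plain = Path(tmp) / "plain.pdf"
        stamped = Path(tmp) / "stamped.pdf"
        # Step 1: pdftk a.pdf output a2.pdf uncompress
        subprocess.run(["pdftk", source, "output", str(plain), "uncompress"], check=True)
        # Step 2: the sed substitution, done on raw bytes
        stamped.write_bytes(replace_placeholder(plain.read_bytes(), placeholder, name))
        # Step 3: pdftk a3.pdf output a4.pdf compress
        subprocess.run(["pdftk", str(stamped), "output", dest, "compress"], check=True)
```

Calling stamp_pdf("a.pdf", "a4.pdf", "Peter Cooper", "Fred Bloggs") would reproduce my experiment. If the placeholder isn't found, the output is simply an unstamped copy, so a real service would want to check for that.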

I haven't done much testing, and can't guarantee this will work with everything (that is, large files heavy on graphics, bookmarks, or hyperlinks), but it works on at least one professional example (a PDF with a unique layout, images and over 100 pages). pdftk runs on Windows, OS X, Linux, etc., so with a little work I could probably create a PDF vending service quite easily (if I had any content to sell, that is!)

(Update: This doesn't work on PDFs which use outlines for all their text, naturally.)

(Update 2: Even when uncompressed, some PDFs seem to render Unicode text using a format that looks like this: <004900050051>3<0059>-7<0050>3<0043>5<004D>2<0046>-6<004E00010042>-5<005500010049>-19<0042> .. I have been tweaking it and think I am finding a pattern: two 'bytes' (four hex digits) per character, and changing the least significant byte to 50 changes the character to the letter 'o'. So I think it might be possible to 'hack' these sorts of PDFs too!)

(Update 3: I think I'll stop updating this post now. I've gone deep into this and, wowzers, managed to decode the above numbers: you have to look them up against a "CMap", which is basically a set of tuples linking internal character codes with Unicode code points. Not impossible, but it means some PDFs aren't as easy to stamp as with my hack above.)
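To make the CMap lookup concrete, here's a rough Python sketch of decoding (and re-encoding) those hex runs. The sample CMap is invented for illustration: only the 0050 -> 'o' entry comes from my experiments above, and the other two entries are placeholders. It only reads simple beginbfchar/endbfchar sections with one four-digit code per entry, whereas real CMaps can also use ranges, so treat it as a starting point, not a full parser. The function names are hypothetical too.

```python
import re
from typing import Dict

# Hypothetical CMap fragment: only <0050> -> 'o' is taken from the post;
# the other two entries are invented so the example has something to decode.
SAMPLE_CMAP = """
3 beginbfchar
<0001> <0020>
<0049> <0068>
<0050> <006F>
endbfchar
"""

HEX_RUN = re.compile(r"<([0-9A-Fa-f]+)>")
BFCHAR_PAIR = re.compile(r"<([0-9A-Fa-f]{4})>\s*<([0-9A-Fa-f]{4})>")


def parse_bfchar(cmap_text: str) -> Dict[int, str]:
    """Collect internal code -> character pairs from bfchar sections."""
    mapping: Dict[int, str] = {}
    for section in re.findall(r"beginbfchar(.*?)endbfchar", cmap_text, re.S):
        for code, uni in BFCHAR_PAIR.findall(section):
            mapping[int(code, 16)] = chr(int(uni, 16))
    return mapping


def decode_tj(array_body: str, mapping: Dict[int, str]) -> str:
    """Decode the hex runs, skipping the spacing numbers between them."""
    chars = []
    for run in HEX_RUN.findall(array_body):
        # Two bytes (four hex digits) per character, as observed in Update 2.
        for i in range(0, len(run), 4):
            chars.append(mapping.get(int(run[i:i + 4], 16), "?"))
    return "".join(chars)


def encode_hex(text: str, mapping: Dict[int, str]) -> str:
    """Turn text back into one hex run; raises KeyError for unmapped characters."""
    reverse = {char: code for code, char in mapping.items()}
    return "<" + "".join(f"{reverse[char]:04X}" for char in text) + ">"


mapping = parse_bfchar(SAMPLE_CMAP)
print(decode_tj("<00490050>-7<0001>3<0050>", mapping))  # prints "ho o"
print(encode_hex("oh", mapping))  # prints "<00500049>"
```

The KeyError in encode_hex matters in practice: if the PDF only embeds the glyphs it actually uses, a customer's name may need characters the font doesn't contain, which is one more reason these PDFs are harder to stamp.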





April 03, 2006 | Posted by peter | Comments (1)
Comments

I have a possible solution, but it doesn't immediately fall into the "free" category. I assist in doing ASP/VBScript coding, and my primary client already owns a copy of AspPDF (www.asppdf), which is a great tool that we use for making very professional reports for the end user. If you have a license for this product, then with a quick loop and the correct parameters for placing the resulting text, you could do the special embedding you describe.

Granted, all of this is making assumptions that you have the product already, and the required know-how. But it might be a possibility for some other web programmers out there.

Thank you for the tip, and helping me to think of some nice possibilities for delivering documentation to my clients.

Posted by: Benjamin L. Jendrick at April 6, 2006 05:49 PM
