Grrr, what is the deal with PDFs and copying to the clipboard?
We all routinely copy and paste information from one document to another, right? Even the hipster non-readers on the iPhone design team have grudgingly conceded this point. PDFs are still the most common document format out there, having survived for 16 years in the tech world remarkably unchanged. Practically every gadget on the planet can display them, and an increasing number can generate PDF files too. PDF documents are reasonably small, cross-platform and, as of July 2008, an open standard.
So why is copying information out of PDFs so cumbersome? If I didn’t know better (and I don’t), I’d wonder if there is some kind of evil conspiracy amongst copyright attorneys, software makers and the Knights Templar to safeguard their contents from the clipboards of the masses.
As far as I know the only piece of software that can fully handle copy and paste from PDFs is the ridiculously expensive full version of Adobe Acrobat. Otherwise, copy support for PDFs ranges from the minimal (Adobe Reader and Foxit Reader, for example) to the absent (almost all smartphones, including the iPhone).
(Incidentally, one of the least mentioned improvements in the iPhone 3GS is that the faster CPU has made it a top-notch PDF reader. Using the highly recommended GoodReader app, I can load a graphics-laden PDF magazine and scroll and zoom through it with no lags or crashes. Given that no other pocket-sized device supports copy and paste from PDFs either, the iPhone is now my PDF reader of choice.)
I read a lot of technical stuff, almost all of which is released in PDF documents. Like any self-respecting geek, when I see something worth noting I don’t retype it, I don’t (shudder) write it down, I copy it.
It’s bad enough that PDF readers which support copy and paste insist on throwing line breaks willy-nilly into the text. They make a real mess of it when it comes to tables, though. Each cell of the table gets copied as a separate line, resulting in a fugly mishmash of text that in no way resembles the original table. Wasn’t one of the original ideas of the PDF that the document’s original format gets preserved across software platforms?
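For what it's worth, here is a minimal Python sketch of how that kind of pasted mess could be cleaned up after the fact. It rests on two assumptions that won't always hold: that blank lines in the pasted text mark real paragraph breaks, and that you already know how many columns the original table had. The function names `unwrap_lines` and `rebuild_table` are made up for illustration and don't come from any PDF library.

```python
import re


def unwrap_lines(text: str) -> str:
    """Join hard-wrapped lines back into paragraphs.

    Assumes a blank line marks a genuine paragraph break and that every
    other line break is an artifact of the copy.
    """
    paragraphs = re.split(r"\n\s*\n", text.strip())
    return "\n\n".join(
        " ".join(line.strip() for line in p.splitlines() if line.strip())
        for p in paragraphs
    )


def rebuild_table(text: str, columns: int) -> list[list[str]]:
    """Regroup one-cell-per-line text into rows of `columns` cells.

    Assumes the cells were copied in row order and none of them is empty.
    """
    cells = [line.strip() for line in text.splitlines() if line.strip()]
    return [cells[i:i + columns] for i in range(0, len(cells), columns)]


if __name__ == "__main__":
    pasted = "PDF readers throw\nline breaks into\nthe text.\n\nSecond\nparagraph."
    print(unwrap_lines(pasted))

    table = "Name\nSize\na.pdf\n10 KB\nb.pdf\n2 MB\n"
    for row in rebuild_table(table, columns=2):
        print("\t".join(row))
```

This only patches the symptoms. Empty cells, cells that wrap onto two lines, or words hyphenated across a line break will all throw it off, which is rather the point of the complaint.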
When Google Docs introduced a PDF reader I thought this problem had finally been cracked. Copy and paste support for web pages was perfected long ago, and tables copied from the web into Word or any other HTML-aware word processor are automatically formatted as tables. And Google is famously opposed to evil alliances with the likes of the Knights Templar. Right? Wrong! Line breaks and tables have the exact same copy and paste problem as in other PDF readers.
It’s a sad state of affairs when the only salvation comes from the world’s most reviled document format: Microsoft Word. Say what you will about fat, proprietary, insecure Word documents, but they are, at least, clipboard-friendly.
It seems that the creativity that should have been poured into PDF readers has, instead, been focused on support for Microsoft Word. There is a staggering number of Word readers and converters to be found on the Web, most of them free and some of them very, very good.
My current method of choice is a web site named PDF To Word. It does just this one thing, but it does it very well: it’s fast and it’s accurate. It has replicated into Word pretty much every PDF document I’ve thrown at it. The number of pages is the same, the page headers and footers are the same, and (hallelujah!) the tables are the same.
There are a few downsides to the PDF to Word approach. It often doesn’t exactly match the font type and size, resulting in odd-looking formatting in things like columnar text or artsy magazine-style layouts. Also, Word documents are generally much larger than the source PDF file; a document with illustrations might triple in size. This obviously makes large Word documents an impractical alternative to PDFs on memory-constrained devices like smartphones.
It’s nice to see Microsoft innovating again with Office 2010, but what I really, really wish they would provide is a Web-based, clipboard-friendly PDF reader. Of course, they won’t. I blame the Knights Templar.