ePub Crawl: Pulling a Book out of an App
A few months ago, technology publisher O’Reilly began selling some of their books as iPhone apps for a surprisingly low price, generally just $5. These are the full versions of the books, not just extracts. The apps come bundled with Lexcycle’s Stanza e-reader, which is feature-rich, fast, and stable. All things considered, these books are quite a bargain.
There is a catch, of course: for some books, and many humans, the iPhone isn’t the best reading platform. Books about software development and tools are generally most useful when you are working hands-on at your computer. Switching between the iPhone and the PC is rather awkward, and copying and pasting code fragments from the iPhone to your computer is pretty much impossible. (Stanza, unlike Kindle, does support copying and pasting text by way of its annotation feature, but getting that copied text onto your computer is a byzantine procedure.)
Fortunately, O’Reilly chose to package their e-books using the open ePub standard, without ePub’s optional DRM (Digital Rights Management) encryption. This means that it’s relatively easy to extract the ePub document from the iPhone app, at which point you can read it on whichever platform you choose. The number of software and hardware e-readers that support ePub is rapidly expanding (with one notable holdout), and it is widely expected that ePub will eventually replace today’s myriad incompatible formats.
The following method for extracting the ePub document from one of O’Reilly’s iPhone apps is based on an article on the excellent TeleRead site. The packaging of the apps has changed a little since that article was written, so a couple of extra steps are required. I use a Windows PC, but I’m sure a similar approach would work on a Mac since the only software tool required is one that can read and write .zip files.
- Locate the iPhone app file. The easiest way to do this is to right-click on the app in iTunes, then select “Open in Windows Explorer”. The example I’m working with is the wonderful Coding4Fun book (which costs $32 when bought as an eBook right now), and its app file is named Code4Fun 1.0.ipa. Copy the .ipa file to another folder so that you won’t confuse iTunes with the following steps.
- Extract the contents of the .ipa file. Despite the extension, this is a zip-compressed file. Most zip extraction tools (like 7-Zip in the following screenshot) are quite happy to take a whack at opening the file without knowing what an .ipa is, but if necessary you can rename the file to Code4Fun.zip first.
(Screenshot: A zip in app's clothing)
- The contents of the app should consist of a couple of files and a folder named “Payload”. If you open Payload you’ll find another folder named Code4Fun.app. Another level down is a folder named “book”, as shown in the following screenshot. (Incidentally, the parent folder of “book” also contains a file named default.epub. This is actually a bonus ePub book: The Time Machine by H.G. Wells. I don’t think you can get at this book from within the Code4Fun iPhone app; it is presumably there as part of the Stanza packaging.)
(Screenshot: In the book, is a book)
- Select the contents of the “book” folder (2 folders and a file) and add them to a new .zip file, as shown below.
(Screenshot: An ePub in zip's clothing)
- That .zip file is actually your ePub document, so rename it to something more suitable like Code4Fun.epub. At this point you should be able to open the .epub file in Adobe Digital Editions, Mobipocket Reader, or Stanza Reader. (Mobipocket and Stanza are generally used on mobile devices, such as BlackBerry or Windows Mobile smartphones, but both offer a desktop reader.) My own preference is to keep things simple and flexible by using the browser-based Bookworm reader.
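If you would rather script the steps above than click through them, here is a minimal Python sketch of the same procedure. It assumes the layout described above (Payload, then a single .app folder, then “book”). The function name `extract_epub` and its `book_dir` parameter are made up for illustration. One detail goes beyond the manual steps: the ePub container format expects the `mimetype` file to be the first entry in the archive and to be stored uncompressed, so the sketch writes it that way in case a stricter reader cares.

```python
import zipfile
from pathlib import PurePosixPath


def extract_epub(ipa_path, epub_path, book_dir="book"):
    """Repack Payload/<name>.app/<book_dir>/ from an .ipa as an ePub file.

    Assumes the app layout described in the article; other apps may differ.
    Returns the list of entry names written to the new archive.
    """
    with zipfile.ZipFile(ipa_path) as ipa:  # an .ipa is an ordinary zip file
        members = []
        for info in ipa.infolist():
            parts = PurePosixPath(info.filename).parts
            # Keep only files under Payload/<something>.app/<book_dir>/
            if (len(parts) > 3 and parts[0] == "Payload"
                    and parts[1].endswith(".app")
                    and parts[2] == book_dir
                    and not info.is_dir()):
                # Name inside the ePub is the path relative to the book folder
                members.append((info, "/".join(parts[3:])))

        if not members:
            raise ValueError(f"no '{book_dir}' folder found in {ipa_path}")

        # Put "mimetype" first; the sort is stable, so other entries keep their order
        members.sort(key=lambda m: m[1] != "mimetype")

        with zipfile.ZipFile(epub_path, "w") as epub:
            for info, arcname in members:
                # "mimetype" is stored uncompressed; everything else is deflated
                method = (zipfile.ZIP_STORED if arcname == "mimetype"
                          else zipfile.ZIP_DEFLATED)
                epub.writestr(arcname, ipa.read(info), compress_type=method)

    return [arcname for _, arcname in members]
```

With the example from this article, the call would look like `extract_epub("Code4Fun 1.0.ipa", "Code4Fun.epub")`, run against the copy of the .ipa file rather than the one in the iTunes folder.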
Enjoy, but please, please don’t pass along the .epub file to your friends (or, worse, a torrent site). O’Reilly is doing us a great favour by selling these ebooks at such a low price and supporting the open ePub standard.
I’m pretty sure that O’Reilly is OK with you extracting the .epub file for your own use; it was an article on an O’Reilly site where I first came across this procedure. Other companies would have you believe that DRM-encrusted proprietary standards are the only way to prevent the unwashed masses from pirating ebooks. Please don’t help them to prove their point.





