Wednesday, February 9, 2011

Anatomy of an E-book

I've gotten a few questions from the more tech-oriented among you, fair readers, as to what, exactly, an e-book file looks like. So! Allow me to illuminate... the EPUB format.

If you're looking for the short (and somewhat inaccurate) story: The EPUB format is the industry standard, and the file is sort of like a zipped-up website. The book itself is written in the same code used to write web pages, and fancier books have extra files zipped into the final package.

If you're not familiar with the idea of "zipping up" a file, just imagine it as packing up all the stuff in your room. Your unpacked room represents all the various files and formats you'd like in the finished product; the single box you end up with that contains everything from your room is the zipped-up file.
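For the tech-oriented among you, here's the packing-up idea as a rough sketch in Python. The file names and contents are made up for illustration, and a real EPUB has a few extra packaging rules this ignores; it only shows several files going into one "box" and coming back out unchanged.

```python
import io
import zipfile

# Hypothetical "room": file names and contents invented for this example.
room = {
    "chapter1.xhtml": "<html><body><p>It was a dark and stormy night.</p></body></html>",
    "style.css": "p { text-indent: 1em; }",
}


def pack(files):
    """Zip a dict of {name: text} into a single bytes object (the 'box')."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as box:
        for name, text in files.items():
            box.writestr(name, text)
    return buffer.getvalue()


def unpack(data):
    """Open the box again and return {name: text} for everything inside."""
    with zipfile.ZipFile(io.BytesIO(data)) as box:
        return {name: box.read(name).decode("utf-8") for name in box.namelist()}


box = pack(room)          # one file that contains everything
unpacked = unpack(box)    # the original files, restored
print(sorted(unpacked))
```

Nothing is lost in the round trip: what comes out of the box is exactly what went in.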

For the more involved (and more technically correct) story, a basic EPUB file consists of the following:

· A bunch of pages written in XHTML that contain the written content of the book;
· CSS (Cascading Style Sheets) to provide formatting;
· An XML file with the extension .opf that contains the book's metadata (title, the language it's written in, &c);
· An XML file with the extension .ncx that contains the book's hierarchical table of contents.

These last two XML files are what really separate an e-book from a website: they give the book a linear structure that requires (for the most part) that it be read in a certain order. (Many books do contain hyperlinks, though, and let you skip from page to page that way.)
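To make that a bit more concrete, here's a small Python sketch that reads the metadata and the reading order out of an .opf file. The sample file below is invented for this example (the title, ids, and file names are all hypothetical), and it's stripped down to the bare minimum. The one detail not covered above is that the .opf lists every file in a "manifest" and then names the reading order in a "spine."

```python
import xml.etree.ElementTree as ET

# Namespaces used by OPF package files and their Dublin Core metadata.
NS = {
    "opf": "http://www.idpf.org/2007/opf",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# A made-up, minimal .opf file; every title, id, and file name is hypothetical.
SAMPLE_OPF = """
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>A Sample Book</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">sample-id-0001</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
    <item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>
"""


def read_opf(opf_text):
    """Return (title, language, reading_order) from the text of an .opf file."""
    root = ET.fromstring(opf_text)
    title = root.findtext("opf:metadata/dc:title", namespaces=NS)
    language = root.findtext("opf:metadata/dc:language", namespaces=NS)
    # The manifest maps ids to files; the spine lists ids in reading order.
    hrefs = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", NS)
    }
    reading_order = [
        hrefs[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", NS)
    ]
    return title, language, reading_order


title, language, reading_order = read_opf(SAMPLE_OPF)
print(title, language, reading_order)
```

Note that chapter 2 comes first in the manifest but second in the spine; it's the spine, not the order of the files in the package, that tells the reading device what comes next.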

Now, although EPUB is the standard by which the industry operates, not all e-book retailers use it (and those who do generally modify the files they receive from publishers or individuals to suit their particular standards). This is why e-books often look different from device to device.

The most visible example is that of Amazon's Kindle, which pretty much reads anything except EPUB (e.g. MOBI, PRC, AZW, PDF). Because Amazon needs to convert EPUB files before it can sell them to consumers, e-books may not always appear as publishers intended (due to the translation process in general, how the two coding systems handle different objects like tables and captions, and so on). What is possible via EPUB may not be possible in, say, MOBI, and vice versa.

While I think that formats and devices will consolidate over time, I very much doubt we're going to see a one-format, handful-of-devices scenario for a while. The good news is that there are ways to convert almost any file type to any other file type, and many devices can either cross-read or run apps that are capable of doing so, so your library hopefully won't be (too) fragmented for the time being.

That's it for today, amigos and -as. Friday: the pre-Valentine's Day round-up!


  1. Whenever I see this subject come up, I prime myself to correct what is usually a plethora of incorrect information and misconceptions. That is not the case here. This is a wonderfully accurate explanation.

    Thank you very much, Eric, and well done.

  2. I'm curious how much 'extra' material is currently in e-books and how popular this will become over time. Do you think it will become more common to see links, pictures, author info, etc included with the actual text of the book?

  3. Yes, many changes in the EPUB book world. Smashwords has very concise formatting guidelines to help make the e-book easy to read. It is work, and I spent hours on it, being a non-techie myself. Finally, I hired someone to sort out and add the chapter hyperlinks (a selling point). The book cover must also be within their specs, so I hired someone to make a new, improved and updated one. Thanks to you (I learned about Smashwords here), I entered a world never before ventured and am very much enjoying the possibilities.

  4. Very helpful post! As a web developer and aspirational author, it's great to see where these worlds overlap.

  5. Gee, and here I thought all I had to do was select EPUB as the output format in Calibre and that was it. I'm not sure I'm thanking you for giving me all this extra information!

    But it does explain why things look different when I used Calibre to format my books for All Romance e-Books, since they don't have that nifty, "just tell us what you want, we'll take care of it" option like Smashwords. All the italicized text came out bold and larger.

    Terry's Place
    Romance with a Twist--of Mystery

  6. I greatly appreciate the dumbed-down explanations! If you want to publish something universally readable among e-readers and Kindle doesn’t accept epub, does that mean something else (.pdf?) would be more appropriate? What about a smaller document, like a short story? Anyone know if there’s a website plugin to offer a subscription straight from your website yet?

  7. Hi Sierra,

    It depends on the type of book and the publisher. Some books include audio and/or video files, reading guides, additional photographs that couldn't be included in the paper book due to page count/cost, &c. I do see these enhanced e-books becoming more popular for certain categories (think categories like cooking, craft) as time goes on.

  8. I sort of wonder if they will become like 'extras' in a movie. A fun aside at first but now, if I get a DVD that doesn't have at least a few deleted scenes I feel a little cheated. I guess as authors we should just remember not to throw anything away.

  9. Thanks for the explanation! What I don't understand is how there are different "versions" of EPUB. For example, with the Sony Ereader and the Nook, they both accept EPUB, but if you have the Sony, you can't purchase books from B&N and have them work on your device (but if you own a Nook, you can buy books from Sony and have them work).

    I've heard this has something to do with the Adobe DRM, but I'm afraid I don't really know what that means. Could you clarify?

  10. Tiana: it's not really different versions of EPUB, but rather that the contents of the EPUB have been encrypted with different kinds of DRM (Digital Rights Management, or copy protection).

    Sony, Borders, Kobo, Books-A-Million, Google, OverDrive (public library lending), and just about all of the independent e-book stores use Adobe's authorized-device DRM. That DRM was formerly known as Adept, and now Adobe calls it "identity-based". Most people just call it Adobe DRM or Adobe EPUB.

    B&N and Apple have chosen to use their own DRM systems. B&N shared their simple password-based DRM system with Adobe, so other e-book apps and readers *could* read B&N EPUBs if the developers wanted, but so far the Pandigital Novel and the Ectaco jetBook Lite are about the only ones that have taken advantage. Apple keeps its DRM so tightly controlled that you can't even read an Apple EPUB on a Mac.

    And, of course, there's EPUB without any DRM.


    Read-to-Me Feature
    With the new text-to-speech feature, Kindle can read every newspaper, magazine, blog and book out loud to you, unless the book's rights holder made the feature unavailable. You can switch back and forth between reading and listening, and your spot is automatically saved. Pages automatically turn while the content is being read, so you can listen hands-free. You can choose from both male and female voices which can be sped up or slowed down to suit your preference. In the middle of a great book or article but have to jump in the car? Simply turn on Text-to-Speech and listen on the go.

    Improved Newspaper Experience
    Using Kindle's new 5-way controller, you can quickly flip between articles, making it faster and easier to browse and read the morning paper. Want to remember the newspaper or magazine article you just read? Clip and save entire articles for later reading with a single click.

    Faster Page Turns
    Pages now turn 20% faster on average.

    Bookmarks and Annotations
    By using the QWERTY keyboard, you can add annotations to text, just like you might write in the margins of a book. And because it is digital, you can edit, delete, and export your notes. Using the new 5-way controller, you can highlight and clip key passages and bookmark pages for future use. You'll never need to bookmark your last place in the book, because Kindle remembers for you and always opens to the last page you read.

    Full Image Zoom
    Images and photos display crisply on Kindle and can be zoomed to the full size of the screen.

    Personal Document Service Via Whispernet
    Kindle makes it easy to take your personal documents with you, eliminating the need to print. Each Kindle has a unique and customizable e-mail address. You can set your unique email address on your Manage Your Kindle page. This allows you and your approved contacts to e-mail Word, PDF documents, and pictures wirelessly to your Kindle for a small fee--see details. Kindle supports wireless delivery of unprotected Microsoft Word, PDF, HTML, TXT, RTF, JPEG, GIF, PNG, BMP, PRC and MOBI files.

    Built-in Dictionary with Instant Lookup
    Never get caught without a dictionary. Kindle includes The New Oxford American Dictionary with over 250,000 entries and definitions, so you can seamlessly look up the definitions of words without interrupting your reading. Come across a word you don't know? Simply move the cursor to it and the definition will automatically display at the bottom of the screen. Never fear a sesquipedalian word again--simply look it up and keep reading.

    Wireless Access to Wikipedia
    Kindle also includes free built-in access to the world's most exhaustive and up-to-date encyclopedia, Wikipedia. With Kindle in hand, looking up people, places, events, and more has never been easier. It gives whole new meaning to the phrase walking encyclopedia.

    Kindle makes it easy to search within a book, across your library, in the Kindle Store, or even the Web. To use the Search feature, simply type in a word or phrase you're looking for, and Kindle finds every instance in your book or across your Kindle library. Looking for the first reference of a character in your book? Simply type in the name and search. You can extend your search to the Kindle Store to find related titles you may be interested in. Explore even further by searching Wikipedia and the Web.