Wednesday, February 9, 2011

Anatomy of an E-book

I've gotten a few questions from the more tech-oriented among you, fair readers, as to what, exactly, an e-book file looks like. So! Allow me to illuminate... the EPUB format.

If you're looking for the short (and somewhat inaccurate) story: The EPUB format is the industry standard, and the file is sort of like a zipped up website. The book itself is written in the same code used to write web pages, and fancier books have extra files zipped into the final package.

If you're not familiar with the idea of "zipping up" a file, just imagine it as packing up all the stuff in your room. Your unpacked room represents all the various files and formats you'd like in the finished product; the single box you end up with that contains everything from your room is the zipped-up file.

For the more involved (and more technically correct) story, a basic EPUB file consists of the following:

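· A file named mimetype and a META-INF/container.xml file, which identify the package as an EPUB and tell reading software where to find the .opf file;
· A bunch of pages written in XHTML that contain the written content of the book;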
· CSS (Cascading Style Sheets) to provide formatting;
· An XML file with the extension .opf that contains the book's metadata (title, the language it's written in, &c);
· An XML file with the extension .ncx that contains the book's hierarchical table of contents.
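For anyone who wants to peek under the hood, here's a rough sketch of how those pieces get packed into one zipped file, written in Python with only the standard library. It assumes the EPUB 2 layout, in which a `mimetype` entry comes first and is stored uncompressed and `META-INF/container.xml` points to the .opf file. The folder name `OEBPS`, the file names, the title, the placeholder identifier, and the `build_epub` function are all hypothetical choices for illustration, not a required recipe; a real book would need fuller metadata and should be checked with a validator such as EpubCheck.

```python
import io
import zipfile

# Hypothetical book details, for illustration only.
TITLE = "Example Book"
BOOK_ID = "urn:uuid:00000000-0000-0000-0000-000000000000"  # placeholder identifier

# Tells reading software where the .opf (package) file lives.
CONTAINER = """<?xml version="1.0"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>"""

# The .opf file: metadata, a manifest of every file, and the spine (reading order).
OPF = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{TITLE}</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">{BOOK_ID}</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="css" href="style.css" media-type="text/css"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="ch1"/>
  </spine>
</package>"""

# The .ncx file: the (possibly nested) table of contents.
NCX = f"""<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head><meta name="dtb:uid" content="{BOOK_ID}"/></head>
  <docTitle><text>{TITLE}</text></docTitle>
  <navMap>
    <navPoint id="np1" playOrder="1">
      <navLabel><text>Chapter 1</text></navLabel>
      <content src="chapter1.xhtml"/>
    </navPoint>
  </navMap>
</ncx>"""

# CSS for formatting (a plain string, since CSS braces would clash with f-strings).
CSS = "p { text-indent: 1.5em; margin: 0; }"

# One XHTML page holding the actual text.
CHAPTER = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <title>Chapter 1</title>
    <link rel="stylesheet" type="text/css" href="style.css"/>
  </head>
  <body>
    <h1>Chapter 1</h1>
    <p>The written content of the book goes here.</p>
  </body>
</html>"""


def build_epub() -> bytes:
    """Zip the pieces into an in-memory EPUB file and return its bytes."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # EPUB readers expect "mimetype" as the first entry, stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip", compress_type=zipfile.ZIP_STORED)
        # Everything else can be compressed like an ordinary zip file.
        files = {
            "META-INF/container.xml": CONTAINER,
            "OEBPS/content.opf": OPF,
            "OEBPS/toc.ncx": NCX,
            "OEBPS/style.css": CSS,
            "OEBPS/chapter1.xhtml": CHAPTER,
        }
        for name, text in files.items():
            zf.writestr(name, text, compress_type=zipfile.ZIP_DEFLATED)
    return buf.getvalue()
```

Writing the returned bytes to a file ending in .epub (for example, `open("example.epub", "wb").write(build_epub())`) should give you something most EPUB reading apps can open, though running it through a validator first is the safer bet.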

These last two XML files are what really separate an e-book from a website: they give the book a linear structure that requires (for the most part) that it be read in a certain order. (Many books do contain hyperlinks and allow you to skip from page to page this way.)
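Most of that linear structure lives in the .opf file's spine, which lists the book's pages (by their manifest IDs) in reading order; the .ncx file then supplies the chapter titles and nesting for the table of contents. Here's a minimal sketch, assuming an EPUB 2 package document like the hypothetical one below, of how a reading app might recover that order in Python. The `reading_order` function and the sample file names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace used by EPUB 2 package (.opf) documents.
OPF_NS = {"opf": "http://www.idpf.org/2007/opf"}


def reading_order(opf_xml: str) -> list[str]:
    """Return the spine's file paths in the order a reader should present them."""
    root = ET.fromstring(opf_xml)
    # The manifest maps each item's id to its file path (href).
    manifest = {
        item.get("id"): item.get("href")
        for item in root.findall("opf:manifest/opf:item", OPF_NS)
    }
    # The spine lists manifest ids in reading order.
    return [
        manifest[ref.get("idref")]
        for ref in root.findall("opf:spine/opf:itemref", OPF_NS)
    ]


# A hypothetical package document: the manifest order doesn't matter,
# only the spine order does.
SAMPLE_OPF = """<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>Example Book</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">urn:uuid:00000000-0000-0000-0000-000000000000</dc:identifier>
  </metadata>
  <manifest>
    <item id="ch2" href="chapter2.xhtml" media-type="application/xhtml+xml"/>
    <item id="ch1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="ch1"/>
    <itemref idref="ch2"/>
  </spine>
</package>"""

print(reading_order(SAMPLE_OPF))  # ['chapter1.xhtml', 'chapter2.xhtml']
```

Even though chapter 2 is listed first in the manifest, the spine puts chapter 1 first, and that's the order the reader sees.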

Now, although EPUB is the standard by which the industry operates, not all e-book retailers use it (and those who do generally modify the files they receive from publishers or individuals to suit their particular standards). This is why e-books often look different from device to device.

The most visible example is Amazon's Kindle, which reads pretty much anything except EPUB (e.g., MOBI, PRC, AZW, PDF). Because Amazon needs to convert EPUB files before it can sell them to consumers, e-books may not always appear as publishers intended (due to the conversion process in general, differences in how the two formats handle objects like tables and captions, and so on). What is possible in EPUB may not be possible in, say, MOBI, and vice versa.

While I think that formats and devices will consolidate over time, I very much doubt we're going to see a one-format, handful-of-devices scenario for a while. The good news is that there are ways to convert almost any file type to almost any other, and many devices can either cross-read formats or run apps that are capable of doing so, so your library hopefully won't be (too) fragmented for the time being.

That's it for today, amigos and -as. Friday: the pre-Valentine's Day round-up!

11 comments:

  1. Whenever I see this subject come up, I prime myself to correct what is usually a plethora of incorrect information and misconceptions. That is not the case here. This is a wonderfully accurate explanation.

    Thank you very much, Eric, and well done.

  2. I'm curious how much 'extra' material is currently in e-books and how popular this will become over time. Do you think it will become more common to see links, pictures, author info, etc included with the actual text of the book?

  3. Yes, there are many changes in the EPUB book world. smashwords.com has very concise formatting guidelines to help make the e-book easy to read. It is work, and I spent hours on it, being a non-techie myself. Finally, I hired someone to sort out and add the chapter hyperlinks (a selling point). The book cover must also be within their specs, so I hired someone to make a new, improved and updated one. Thanks to you (I learned about Smashwords here), I entered a world I had never ventured into before and am very much enjoying the possibilities.

  4. Very helpful post! As a web developer and aspiring author, it's great to see where these worlds overlap.

  5. Gee, and here I thought all I had to do was select EPUB as the output format in Calibre and that was it. I'm not sure I'm thanking you for giving me all this extra information!

    But it does explain why things looked different when I used Calibre to format my books for All Romance e-Books, since they don't have that nifty "just tell us what you want, we'll take care of it" option like Smashwords. All the italicized text came out bold and larger.

    Terry
    Terry's Place
    Romance with a Twist--of Mystery

  6. I greatly appreciate the dumbed-down explanations! If you want to publish something universally readable among e-readers, and the Kindle doesn't accept EPUB, does that mean something else (PDF?) would be more appropriate? What about a smaller document, like a short story? Does anyone know if there's a website plugin yet that lets you offer a subscription straight from your website?

  7. Hi Sierra,

    It depends on the type of book and the publisher. Some books include audio and/or video files, reading guides, additional photographs that couldn't be included in the paper book due to page count/cost, &c. I do see these enhanced e-books becoming more popular in certain categories (think cooking and crafts) as time goes on.

  8. I sort of wonder if they will become like the 'extras' on a movie DVD. They were a fun aside at first, but now if I get a DVD that doesn't have at least a few deleted scenes, I feel a little cheated. I guess as authors we should just remember not to throw anything away.

  9. Thanks for the explanation! What I don't understand is how there are different "versions" of EPUB. For example, the Sony Reader and the Nook both accept EPUB, but if you have the Sony, you can't buy books from B&N and have them work on your device (whereas if you own a Nook, you can buy books from Sony and have them work).

    I've heard this has something to do with the Adobe DRM, but I'm afraid I don't really know what that means. Could you clarify?

  10. Tiana: it's not really different versions of EPUB, but rather that the contents of the EPUB have been encrypted with different kinds of DRM (Digital Rights Management, or copy protection).

    Sony, Borders, Kobo, Books-A-Million, Google, OverDrive (public library lending), and just about all of the independent e-book stores use Adobe's authorized-device DRM. That DRM was formerly known as Adept, and now Adobe calls it "identity-based". Most people just call it Adobe DRM or Adobe EPUB.

    B&N and Apple have chosen to use their own DRM systems. B&N shared their simple password-based DRM system with Adobe, so other e-book apps and readers *could* read B&N EPUBs if the developers wanted, but so far the Pandigital Novel and the Ectaco jetBook Lite are about the only ones that have taken advantage. Apple keeps its DRM so tightly controlled that you can't even read an Apple EPUB on a Mac.

    And, of course, there's EPUB without any DRM.
