CCSP - Thinkubator Wiki

Joe Clark's Web Standards

Toronto-based writer, typographer, and long-time accessibility advocate Joe Clark has recently published an excellent, long (and well-linked) essay on Web Standards for E-books over on A List Apart. It’s been getting some well-deserved attention.

Clark’s post is true to his form: it’s opinionated, detailed, and thorough. He outlines an argument for why (and how) HTML and CSS should be at the center of many—if not most—publishing workflows, but he also goes into considerable detail on the finer points of typography and punctuation, not to mention spot-on commentary on the sad state of footnotes and tables.

He writes,

I am articulating an HTML-triumphalist view of E-book production. By backing what I feel is obviously the right horse, I am contributing to the strangulation of new or uninvented forms of the book.

What’s important about this stance, which I wholeheartedly share, is that it properly acknowledges the fact that the web, like the Internet on which it rests, is not just somebody’s hacked-up sketch of what a digital publishing system might look like. The web, 20 years old this year, is THE digital publishing system. It’s the one everyone actually uses, with hundreds of millions, if not billions, of users, and trillions of pages of content. The web was the right thing at the right time in the early 1990s, and since then it has evolved into what it is today. It is nonsense to think that some Godlike engineer (or God forbid, consortium) is going to simply create a better one. The digital publishing platform of the mid-21st century is going to be an evolution of the one we have now, not a replacement of it. That e-book standards should emerge as part of the larger ensemble of web standards is only natural.

Those who would replace the web with something ‘smarter’ inevitably bring up semantic markup, which HTML is reputedly poor at, but as Clark points out, “Novels and many nonfiction books are semantically simple.” Indeed. Simple HTML has enough semantic depth for probably 80% of the prose we actually consume (and fine, let’s spend the other 80% of our lives struggling with that last 20%). Plus, I’m wagering that we can actually add a fair bit of semantic richness to HTML simply by stuffing class attributes with metadata instead of CSS hooks.
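To make that wager a little more concrete, here is a minimal sketch of what reading metadata back out of class attributes might look like. The class names (`person`, `article-title`, `epigraph`) and the sample markup are purely illustrative assumptions, not any existing vocabulary or standard, and the parser assumes a well-formed fragment with no void elements such as `<br>`.

```python
from collections import defaultdict
from html.parser import HTMLParser

# Hypothetical manuscript fragment. The class names are made up for
# illustration; they describe what the text *is*, not how it should look.
SAMPLE = (
    '<p class="epigraph">Novels and many nonfiction books are '
    'semantically simple.</p>'
    '<p>As <span class="person">Joe Clark</span> argues in '
    '<cite class="article-title">Web Standards for E-books</cite>, '
    'plain HTML goes a long way.</p>'
)


class ClassHarvester(HTMLParser):
    """Collect the text inside each element, grouped by its class names."""

    def __init__(self):
        super().__init__()
        self.found = defaultdict(list)
        self._classes = []  # one list of class names per open element
        self._buffers = []  # one list of text chunks per open element

    def handle_starttag(self, tag, attrs):
        # A class attribute may hold several space-separated names.
        names = (dict(attrs).get("class") or "").split()
        self._classes.append(names)
        self._buffers.append([])

    def handle_data(self, data):
        # Text belongs to every element that is currently open.
        for buffer in self._buffers:
            buffer.append(data)

    def handle_endtag(self, tag):
        if not self._classes:
            return  # stray closing tag; ignore it
        names = self._classes.pop()
        text = "".join(self._buffers.pop()).strip()
        for name in names:
            self.found[name].append(text)


def harvest(html):
    """Return a dict mapping each class name to the texts tagged with it."""
    parser = ClassHarvester()
    parser.feed(html)
    parser.close()
    return dict(parser.found)


if __name__ == "__main__":
    for name, texts in harvest(SAMPLE).items():
        print(name, texts)
```

The point is only that the same attribute a stylesheet hooks into can be queried as data with nothing more than a standard-library parser; a real workflow would need an agreed vocabulary of class names.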

Clark also gives an account of how a good deal of the contemporary nightmares over e-book quality (the horrendous typos and production glitches we hear about almost daily) and the hand-wringing over how to properly do ‘formatting’ are a result of the publishing industry’s “race to the bottom.” In the quest for cheaper and cheaper outsourced conversion ‘solutions’, publishers have lost any grip on the issue they might possibly have had.

What’s the solution? The canonical format of a book should be HTML. Authors should write in HTML, making a manuscript immediately transformable to an E-book. A manuscript could then be imported into that fossil the publishing industry refuses to leave behind, Microsoft Word. (MS Word’s Track Changes feature has become a kind of methadone for an addicted publishing industry.)

Well said! Any time someone compares MS Word to drug abuse, I salute. But then Clark goes somewhere I don’t care to follow, and perpetuates the “Word is never going away” mentality anyway:

Authors are not going to start writing in HTML, let alone the full-on XML that Ben Hammersley has called for. Book copy will continue to be saved as MS Word, Xpress, and/or InDesign files.

But authors certainly are going to write in HTML. A hundred thousand bloggers do it every day. Some do it natively, some in “lightweight” (i.e., wiki-like) markup, but far more do it in the nice, friendly WYSIWYG editing interfaces that are by now ubiquitous in Web CMS and blogging tools, such as TinyMCE, which you’ve probably used hundreds of times on your favourite sites. We’ve been experimenting with it as part of an editorial environment and have been entirely encouraged by the results.

This is a huge opportunity to get out from underneath MS Word, which I think of more as the antichrist than merely a “fossil.” When Clark states, in the comment thread on Teleread.org discussing his article, that “People have been publishing utilities to clean up MS Word ‘HTML’ for ten years and nothing has worked. MS Word will not produce usable HTML,” [italics mine] remember that there’s a reason this is the case. It isn’t that Microsoft engineers are just too stupid to figure out how to do it properly.

Yes, people still use InDesign, and will for years to come. This is because it has a large installed base and an enormous investment of time and experience in how to use it well. It’s also because this tool is practically irreplaceable—there isn’t really anything else out there that will give a typographer or layout designer the kind of immediate control that InDesign provides. So we will continue to use InDesign (and its descendants) to produce print layouts for the foreseeable future. Hopefully we can stop using InDesign as an archival format, but for page makeup, it’s damn hard to beat.

But Word? Alternatives exist. Better, free, open, simpler alternatives exist. Let’s not get stuck here, people. Cut the cord.


comments:

Joe writes… --Fri, 12 Mar 2010 05:19:30 -0800

Well, yes, I understand the nit you are picking, but on the whole the authorial class does not wish to be retrained, nor is typing into a wiki textarea scalable to writing a whole book. The distinction is like the insistence that nobody will read long texts off a monitor. But we read a hundred small texts every day that add up to the same length. So, while some authors will edit wikis and publish on blogs, the minute they have to work on a manuscript they forget that ever happened and fire up MS Word. Better?
 

Comments and opinions expressed here belong to their respective authors, and do not represent the views of Simon Fraser University or the Canadian Centre for Studies in Publishing. Powered by Zope, and much more...