CCSP - Thinkubator Wiki

Who Killed Curly Quotes?

A good chunk of the research/development work around SFU MPub this past year has been making web-based tools into things capable of handling professional publishing workflows—as in, capable of producing books and magazines. The work we did this spring on Web-first XML workflows, and the minimal submissions management piece (OMMM project) both leverage free and ubiquitous web-based content management tools, but put them to work on more traditional print-oriented publications rather than just websites.

In general, I’ve been extremely pleased with what we’ve been able to achieve. We can take material authored either web-native or in the dreaded .doc format; we can collaboratively edit, and track revisions and changes from multiple authors; and we can even output to the Adobe Creative Suite for high-end print (or PDF, if you’re so inclined). Best of all, everything is web-native, so publishing online or to ebook is as easy as falling off a log.

One of the nicest discoveries I made this year is just how good TinyMCE has become. You’ve seen TinyMCE, though you may not know it by name: when you see an editing area on a website with a little row of formatting buttons along the top, like in a word processor, chances are it’s TinyMCE, a nearly ubiquitous piece of free software glued into most web tools these days.

I’ve started to really believe that people could use these web tools as a platform for editorial work, and get rid of the dreaded (did I say that already?) MS Word altogether. But, admittedly, people are used to using Word. Some people I’ve worked with are young enough to have never known a world without Word in it. Sad. But as I say, I’m really encouraged: there’s enough functionality and elegance in free web tools that we may finally be able to achieve escape velocity and leave the damn word processor behind.

Except for quotation marks.

Remember these things? These are quotation marks:

“ ”

Not these:

" "

Those are straight quotes, a typewriter compromise that also stands in for what typographers call the double prime. The double prime has a use too, in denoting measures: seconds of latitude and longitude, and inches. Neither is the same thing as a quotation mark. Quotation marks come in pairs: one marks the beginning of the quotation, and the other marks the end (that matters; try grepping text with only straight quotes, and you literally don’t know whether you’re coming or going).
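A quick way to see that these really are different characters is to ask Python’s standard `unicodedata` module for their code points and official names. (Unicode, having inherited ASCII, names the straight mark QUOTATION MARK, but keeps it distinct from both the curly pair and the true double prime, U+2033.) The sample sentence and the `describe` helper below are purely illustrative; the second half shows the grepping point: with curly quotes, every U+201C marks the start of a quotation.

```python
import re
import unicodedata


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in ['"', "\u201c", "\u201d", "\u2033"]:
    print(describe(ch))
# U+0022 QUOTATION MARK
# U+201C LEFT DOUBLE QUOTATION MARK
# U+201D RIGHT DOUBLE QUOTATION MARK
# U+2033 DOUBLE PRIME

# Curly quotes make quotation boundaries searchable: every U+201C opens one.
sample = "\u201cGo,\u201d she said. \u201cNow.\u201d"
openings = [m.start() for m in re.finditer("\u201c", sample)]
print(openings)  # [0, 16]
```

Run the same search on a straight-quoted version of the sentence and every match is ambiguous: you get four identical `"` characters and no way to tell which ones open.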

But the web seems to have banished proper quotation marks. For the most part, they don’t exist online (unless someone has gone to some trouble to put them in). The reason isn’t hard to figure out; it’s because the web, like much of modern computing, was originally built on an ASCII foundation.

ASCII is a computer representation of the alphabet that was defined in the early 1960s in America (that’s what the A stands for). ASCII is roughly the set of characters you can create using an American typewriter of that era: 128 in total, including 33 control codes for things like ringing a bell on your terminal.

ASCII, dating from the early 1960s (in America), has held its ground for an amazingly long time. Once computers came into regular use out there in the real world (and outside of America), people soon realized they needed more characters to represent the text they actually had (en français, par exemple). In the 1980s, different computing platforms (Mac, Windows, Unix, etc.) adopted different “extended character sets” that would allow users to write in languages other than English, and put in more sophisticated typography (such proper dashes—hyphens, en-dashes, and em-dashes—fractions, and whatnot). Unfortunately, with multiple systems in play, incompatibilities were a constant headache. Ever move an old file from Mac to PC? Or from WordPerfect to MS Word? (Actually, that’s still a problem in 2009, as we discovered this past week.)
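Here is a small sketch of the Mac-to-PC problem, using MacRoman and Windows-1252 (Python’s `mac_roman` and `cp1252` codecs) as representative examples of those 1980s-era extended character sets. Both are 8-bit encodings that agree with ASCII in the bottom half and disagree above it, so the same byte means different things on each platform.

```python
# A left double quote saved as a single MacRoman byte...
mac_bytes = "\u201c".encode("mac_roman")
print(mac_bytes)  # b'\xd2'

# ...is read back as a completely different letter by a Windows-1252 program.
print(mac_bytes.decode("cp1252"))  # Ò
```

Nothing in the file itself records which encoding was used, which is why the receiving program has no way to know it guessed wrong.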

One of the more robust ways of dealing with the problem (the way the early web dealt with it) was to encode characters beyond ASCII’s 128 as ISO standard character entities: &eacute; &mdash; &ccedil; and &rdquo; and so forth.

The idea behind the ISO character entities was that you had a neutral encoding, based in plain ASCII, that worked almost everywhere without problems. So it was a functional solution, but not exactly an elegant one. After a decade and a half of working with the web, the four entity codes I mentioned above are the only ones I know off by heart; for everything else I need to go consult a reference. And really, do you want to be typing all those ampersands (which quickly get into their own recursive tailspin of &amp;amp;amp; if you’re not careful)?
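To make the entity approach concrete, here is a sketch using Python’s standard `html` module. It shows the named entities decoding to real characters, the ampersand tailspin (escaping is not idempotent, so each extra pass re-escapes the `&`), and the reverse trick of pushing non-ASCII text through an ASCII-only channel as numeric character references. The sample strings are arbitrary.

```python
import html

# Named entities decode to the characters they stand for.
print(html.unescape("&eacute; &mdash; &ccedil; &rdquo;"))  # é — ç ”

# Escaping is not idempotent: each pass re-escapes the ampersand.
once = html.escape("Smith & Sons")
twice = html.escape(once)
print(once)   # Smith &amp; Sons
print(twice)  # Smith &amp;amp; Sons

# The other direction: keep a byte stream pure ASCII by turning every
# non-ASCII character into a numeric character reference.
ascii_safe = "caf\u00e9 \u201cchez moi\u201d".encode("ascii", "xmlcharrefreplace")
print(ascii_safe)  # b'caf&#233; &#8220;chez moi&#8221;'
```

The tailspin in the middle is the classic bug: once text has been escaped, something has to remember not to escape it again.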

The solution, which came of age through the 1990s and was built into XML (which was also going to save the world), was Unicode, an international standard for character representation. With room for over a million code points, and roughly 100,000 characters already assigned, Unicode could handle proper quotation marks. And French characters. And Chinese, for that matter.

Unicode would solve the problem. Early in this decade, all major operating systems offered support for Unicode, and major software platforms began to support it shortly after. Web standards were built on it too: HTML and XML define their document character set as Unicode, and XML assumes UTF-8 unless a document declares otherwise. This was supposed to be the solution to the whole character-set mess.

So why do we still encounter glitchy characters? Why are there still no proper quotation marks online?

The reason, I think, is twofold:

First, we have legacy systems. In particular, many popular free software tools, including the relational databases and web application platforms that power much of the known universe, have lagged behind and have really crappy support for Unicode. Some of these pieces of software still don’t support Unicode all the way through, even though the standard has been around for well over a decade.
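What do those “glitchy characters” look like in practice? One common pattern, sketched here under the assumption that one layer writes UTF-8 while another (a database connection, say, or an old template) reads the bytes as Windows-1252: a single curly quote becomes three bytes in UTF-8, and each byte is then misread as its own character.

```python
utf8_bytes = "\u201c".encode("utf-8")
print(utf8_bytes)  # b'\xe2\x80\x9c'

# A legacy layer that assumes Windows-1252 turns one character into three.
garbled = utf8_bytes.decode("cp1252")
print(garbled)  # â€œ

# If nothing else has been lost, reversing the mistake recovers the original.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired)  # “
```

Real-world failures vary (some layers substitute `?` or drop bytes outright, and those cannot be reversed), but this three-character signature is a good hint that a UTF-8 curly quote passed through a non-Unicode component somewhere.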

The second reason is that people don’t understand what Unicode is. I’ve become convinced of this since Googling the issue today. What I find online is thousands of pieces of advice that run like this:

Using Unicode, enter &#8220; for a left double-quote mark.

Excuse me, but in what universe is it better to have to type (let alone remember!) &#8220; instead of the old ISO &ldquo;? But much more importantly, this completely misses the point of Unicode!

Unicode is supposed to mean that if I want a “ I should be able to type “ and not have to remember some geeky code point.
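The point can be made in a few lines of Python: the numeric reference, the named entity, and the literal character all decode to exactly the same thing. The numbers only exist so that ASCII-bound systems have a way to spell the character; a Unicode-clean system can simply store the character itself (as three bytes, in UTF-8).

```python
import html

literal = "\u201c"    # the character itself: “
numeric = "&#8220;"   # decimal numeric character reference
named = "&ldquo;"     # HTML named entity

assert html.unescape(numeric) == html.unescape(named) == literal

print(f"U+{ord(literal):04X}", ord(literal))  # U+201C 8220
print(literal.encode("utf-8"))                # b'\xe2\x80\x9c'
```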

Now, the TKBR system is running on ZWiki, which is running on Zope, which is running on Python. This thing speaks Unicode UTF-8 all the way down. And so I can, in fact, type “ and see “ (do you see it?).

The trouble, I have found, is that I can’t actually type a proper quotation mark in Firefox, nor in Safari, nor in IE; you just can’t do it. You hit the quote key, and you get a double prime. (You can, thankfully, paste them in.)

Again, it’s not too hard to see how we got here: back in the day, there were umpteen different possible encodings, and no way for web software to predict which one was in use when (note the Character Encoding menu in your web browser). So the only safe—er, sane—thing for them to do was to leave it as plain ASCII straight quotes.

The result is that there is no convenient way to get proper quotation marks. (You actually can get them, if you know the secret: on my Mac, I can type option-[ to open a set of quotes, and shift-option-[ to close them. In Windows, you have to remember to hold Alt and type 0147 on the numeric keypad to open, and 0148 to close, or know that the Unicode number is 8220.) Like I say, there is no convenient way...
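Why are there two different numbers? A sketch: the Windows Alt+0nnn keypad codes index the legacy Windows-1252 code page, while 8220 is the character’s Unicode code point (U+201C in hexadecimal). The helper name `cp1252_alt_code` is made up for this illustration.

```python
def cp1252_alt_code(ch: str) -> int:
    """Return the Windows-1252 byte value for ch (the digits typed after Alt+0).

    Hypothetical helper for illustration; raises UnicodeEncodeError for
    characters that Windows-1252 cannot represent.
    """
    return ch.encode("cp1252")[0]


left, right = "\u201c", "\u201d"
print(cp1252_alt_code(left), cp1252_alt_code(right))  # 147 148
print(ord(left), ord(right))                          # 8220 8221
```

Two numbering systems for the same pair of marks is a nice summary of the problem: the keyboard shortcut still lives in the pre-Unicode world.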

What word processors and DTP tools do, of course, is to give you “smart quotes”—a typing aid that looks at the context, and when you hit the quotemark key, substitutes the right character.
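The context check a smart-quote routine performs is not complicated. Here is a minimal sketch, not how any particular word processor, browser, or TinyMCE actually does it: a quote counts as opening when it follows the start of the text, whitespace, an opening bracket, a dash, or another opening quote, and as closing otherwise. The `smarten` name and the `OPENERS` set are assumptions made for this example.

```python
# Characters after which a quote is treated as opening (an assumption for
# this sketch): opening brackets, en/em dashes, and opening curly quotes.
# Start of text and whitespace are checked separately below.
OPENERS = set("([{\u2013\u2014\u201c\u2018")


def smarten(text: str) -> str:
    """Replace straight quotes with curly ones, judged by the preceding character."""
    out: list[str] = []
    for ch in text:
        # Look at what has already been emitted, so a converted opening
        # quote counts as context for a nested quote right after it.
        prev = out[-1] if out else ""
        opening = prev == "" or prev.isspace() or prev in OPENERS
        if ch == '"':
            out.append("\u201c" if opening else "\u201d")
        elif ch == "'":
            # A closing single quote doubles as the apostrophe (U+2019).
            out.append("\u2018" if opening else "\u2019")
        else:
            out.append(ch)
    return "".join(out)


print(smarten('She said "don\'t" twice.'))  # She said “don’t” twice.
```

The rule gets most prose right, including apostrophes and nested quotes, but it misreads a leading apostrophe (as in ’90s), which comes out as an opening ‘. Desktop smart-quote features typically struggle with the same case.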

There is no reason why my web browser couldn’t do this. Or perhaps it is TinyMCE that should offer me this functionality. We have a standard representation for the marks; why can’t we type them?

But no, culturally, the web seems to be a massive edifice dedicated to the elimination of quotation marks. A little (OK, a lot) of Googling today showed me that for every tidbit telling you how to put curly quotes in (and those usually tell you to type &#8220;), there are about 1,000 pieces of advice on how to get rid of them (presumably when pasting in from Word, because the curly quotes screw up your lousy Unicode-incompatible web software).
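For the record, the “get rid of them” advice usually boils down to a one-for-one character mapping like the sketch below. The same mapping has a more constructive use: applied to both the text and the query, it gives a search that matches regardless of which kind of quotes either side used, so there is no need to flatten what is stored. The function names are hypothetical.

```python
# Map curly quotes back to typewriter equivalents, one character for one,
# so string positions are preserved.
FLATTEN = str.maketrans({
    "\u201c": '"', "\u201d": '"',
    "\u2018": "'", "\u2019": "'",
})


def flatten_quotes(text: str) -> str:
    return text.translate(FLATTEN)


def quote_insensitive_find(haystack: str, needle: str) -> int:
    """Like str.find, but treats curly and straight quotes as equal."""
    return flatten_quotes(haystack).find(flatten_quotes(needle))


print(quote_insensitive_find("He said \u201cdon\u2019t\u201d", "\"don't\""))  # 8
```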

Even fine typography guides, such as those on A List Apart, still talk about inserting &#8220;.

People, this is a silly situation.

Quotation marks come in pairs.

They are different from double primes/straight quotes.

Unicode support means quotation marks are possible.

They are important, and I want to type them.

I might know who it was... --Thu, 01 Oct 2009 22:25:39 -0700

Perhaps it's the same person that killed spelling, long attention spans and invented spam. If you find who it is, please tell them I'd like to have a serious talk.

Haig

... --Thu, 01 Oct 2009 22:30:38 -0700

you object to straight-quotes.

i object to sloppily-edited posts...

you said: > such proper dashes

i think you meant "such as proper dashes".

you said: > only ones I know off by heart

i think you meant "know by heart". or perhaps "know of by heart". or perhaps "know off the top of my head".

anyway, heck yeah, why doesn't the software make it easy for people to do the right thing?

the software should take care of the grunt work.

so aim your squirt-gun at the real culprits -- the developers who made such a mess of the unicode transition -- and leave the ordinary people out of the equation entirely.

thanks.

-bowerbird

p.s. tell me, how many real people out there do you think know how to enter curly quotes? so is it any surprise that you see none of 'em?

p.p.s. oh, and while you're at it, please fix all of the search routines so that they will work correctly on _any_ text for _any_ person no matter if the text _or_ the searcher uses straight-quotes or curly-quotes. thanks loads. because if you're not part of the solution, you're part of the problem.

Not nearly as scary as all that --jmax, Thu, 01 Oct 2009 22:52:10 -0700

Look, when word processors and DTP tools could assume a nice, closed world (i.e. no networks, no incompatibilities), the so-called "smart-quote" systems worked fine.

What broke it was the network: suddenly you had a bunch of incompatible systems trying to exchange material. So the response was to drop down to the lowest common denominator: the straight quote.

My point is that we now live in a world where Unicode is possible (TKBR is a case in point, Unicode top-to-bottom), so smart-quote routines for entering the marks are feasible again. It's not a matter of fixing everything; it's a matter of priorities.

ONIX headaches ... --juliafaye, Fri, 02 Oct 2009 09:56:08 -0700

My ONIX data feeds would be so. Much. Easier. To deal with if Unicode were universal. I've spent my entire four-week tenure at LitDistCo thus far fighting to make product information on Chapters/Indigo, Amazon.ca, etc., appear correctly. The publisher likes things like accented letters and em-dashes, and apparently ONIX doesn't. Bleh.

Agreed --agaumont, Mon, 05 Oct 2009 12:00:42 -0700

"Smart Quotes" (argh, there are those typewriter quotes) are a bandaid solution. See also my note on technological impediments to proper punctuation, inspired by this post: http://bit.ly/zrhjK

 

Comments and opinions expressed here belong to their respective authors, and do not represent the views of Simon Fraser University or the Canadian Centre for Studies in Publishing. Powered by Zope, and much more...