[WEB4LIB] non-SGML characters

Thomas Dowling tdowling at ohiolink.edu
Thu Jan 31 14:56:51 EST 2002


At 02:02 PM 1/31/2002, bob at esrl.lib.md.us wrote:
>Hello.
>
>I've been using the • character to separate our address information
>at the bottom of our pages. But now, as I'm moving to XHTML 1.0, I'm
>finding that when I validate these characters are returned with the error
>"reference to non-SGML character".
>
>Here's my line to define content-type:
>
><meta http-equiv="Content-Type" content="text/html;charset=UTF-8" />
>
>Is there a different charset I should be using that would allow these
>characters to validate properly?

I think you can solve the academic validation problem with 
"charset=Windows-1252".  However, that won't solve the fact that there are 
browsers that will either not display any character there, or display some 
other character.  I don't know of any browsers that actually change their 
character handling based on the charset in the content type.  There may be 
*indexers* - but they'd want an actual HTTP header, not a meta tag (if I'm 
wrong, someone stop me before I make a fool of myself).

You might have more success rewriting them as legal characters in 
UTF-8.  The bullet character (#149) = &bull; = &#8226;  (&#8226; will have 
wider support.)

>I could just take them out. They aren't critical. But I'd like to know
>why they don't work with XHTML.

Characters 128 through 159 have never been assigned printable characters 
in ISO Latin-1 or in Unicode (those positions are reserved for control 
codes), and the UTF-8 you're sending to the validator is an encoding of 
Unicode.
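
To see this for yourself, here is a small Python sketch (my own 
illustration; it assumes Python 3 and uses only the standard library). It 
decodes byte 149 as Windows-1252 to find where the bullet really lives in 
Unicode, then checks what Unicode says about positions 128 through 159.

```python
import unicodedata

# Byte 149 (0x95) is a bullet only when read as Windows-1252.
bullet = bytes([149]).decode("cp1252")
print(ord(bullet))  # the bullet's actual Unicode code point

# Collect the Unicode general category of every code point from 128 to 159.
# "Cc" means "control character", i.e. not a printable glyph.
c1_categories = {unicodedata.category(chr(cp)) for cp in range(128, 160)}
print(c1_categories)
```

The first value is the number to use in a numeric reference; the second 
shows that the whole 128-159 range is control codes as far as Unicode is 
concerned.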


Other common characters whose Windows-1252 positions are undefined in ISO 
Latin-1 or Unicode:

Ellipsis (#133) = &hellip; = &#8230;
En Dash (#150) = &ndash; = &#8211;
Em Dash (#151) = &mdash; = &#8212;
Curly single quotes (#145 and #146) = &lsquo; and &rsquo; = &#8216; and &#8217;
Curly double quotes (#147 and #148) = &ldquo; and &rdquo; = &#8220; and &#8221;
[Unregistered] Trademark (#153) = &trade; = &#8482;
Euro (#128) = &euro; = &#8364;
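
If you have many pages to fix, the lookups above can be automated. The 
following is only a sketch, assuming Python 3 and assuming the input 
really is Windows-1252; the function name cp1252_to_numeric_refs and the 
sample address line are made up for illustration.

```python
def cp1252_to_numeric_refs(raw: bytes) -> str:
    """Decode Windows-1252 bytes and replace every non-ASCII character
    with a decimal numeric character reference such as &#8226;.

    Assumption: the input is valid Windows-1252. Python's cp1252 codec
    raises UnicodeDecodeError on the few bytes that encoding leaves
    unassigned (0x81, 0x8D, 0x8F, 0x90, 0x9D).
    Existing markup characters such as & and < are passed through untouched.
    """
    text = raw.decode("cp1252")
    # "xmlcharrefreplace" turns each character ASCII cannot hold into &#NNNN;
    return text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")


# Hypothetical footer line using a raw bullet (0x95) and en dash (0x96).
sample = b"123 Main St \x95 Anytown \x96 MD"
print(cp1252_to_numeric_refs(sample))
```

Numeric references are used rather than named ones because, as noted for 
the bullet, they have the wider browser support.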

Netscape 4 (NS4) will only understand the numeric entities.  &#9785;


Thomas Dowling
OhioLINK - Ohio Library and Information Network
tdowling at ohiolink.edu


