[WEB4LIB] Re: non-SGML characters

Eric Hellman eric at openly.com
Thu Jan 31 23:51:51 EST 2002


If you use numeric entities in XML, it won't matter what encoding you set.

In other words, &#8226; means BULLET, BLACK SMALL CIRCLE
  whether your encoding is utf-8, shift-JIS, euc-kr or mac-symbol
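A minimal sketch of that point, assuming Python's standard-library ElementTree parser (not something from the original thread). The helper name parse_with_encoding is hypothetical, and the encodings listed are ones the bundled parser handles natively rather than the ones named above. The same reference &#8226; is parsed from documents serialized in different encodings:

```python
import xml.etree.ElementTree as ET


def parse_with_encoding(encoding):
    """Serialize a tiny document in `encoding` and return the parsed text."""
    # The markup itself is pure ASCII; only the byte serialization differs.
    doc = '<?xml version="1.0" encoding="%s"?><p>&#8226;</p>' % encoding
    return ET.fromstring(doc.encode(encoding)).text


# Encodings chosen for illustration only (an assumption, not from the thread).
results = {
    enc: parse_with_encoding(enc)
    for enc in ("utf-8", "utf-16", "iso-8859-1", "us-ascii")
}

for enc, text in results.items():
    # Every entry should be the single character U+2022.
    print(enc, "U+%04X" % ord(text))
```

The declared encoding changes the bytes on disk, but the numeric reference always resolves against Unicode, so each parse should yield the same code point.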

In XML, the encoding tells the parser how to read bytes. However, the 
character set is ALWAYS Unicode, excluding most control characters. 
Unicode character #149 is in a control character zone (C1): it is a 
control code, not a bullet, and it is not a legal character in HTML. 
XML 1.0 tolerates it, but it still does not mean BULLET there.
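To make the byte versus character distinction concrete, here is a small sketch in Python (my choice of language, not part of the original discussion). It assumes the stray bullet started life as the single byte 0x95 in a Windows-1252 file; the variable names are illustrative only:

```python
import unicodedata

raw = b"\x95"  # the byte a Windows-1252 editor writes for a bullet

# Read as Windows-1252, the byte maps to the real bullet, U+2022.
as_cp1252 = raw.decode("cp1252")

# Read as ISO-8859-1, the same byte maps straight to code point 149.
as_latin1 = raw.decode("latin-1")

print("cp1252 :", "U+%04X" % ord(as_cp1252), unicodedata.name(as_cp1252))
print("latin-1:", "U+%04X" % ord(as_latin1), unicodedata.category(as_latin1))

# Code point 149 is in general category Cc (control), whatever the encoding.
category_149 = unicodedata.category(chr(149))
```

So byte 149 can be a bullet under the right encoding declaration, but a numeric reference to character 149 always names the control code.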


Eric

At 1:00 PM -0800 1/31/02, Thomas Dowling wrote:
>>I think you can solve the academic validation problem with
>>"charset=Windows-1252".  However, that won't solve the fact that there are
>>browsers that will either not display any character there, or display some
>>other character.  I don't know of any browsers that actually change their
>>character handling based on the charset in the content type.  There may be
>>*indexers* - but they'd want an actual HTTP header, not a meta tag (if I'm
>>wrong, someone stop me before I make a fool of myself).
>
>
>Well, that ship has sailed.  Obviously, browsers respond to charsets in
>order to display pages in non-Roman scripts.  Also, while changing the
>charset might make the actual character #149 valid, the numeric character
>entity "&#149;" still represents Unicode and is still invalid (you see,
>there's SGML's and XML's "document character set" which isn't necessarily
>your *document's* character set...suddenly my brain hurts).
>
>So stick with UTF-8 and valid entities • or •.
>
>
>Thomas Dowling
>OhioLINK - Ohio Library and Information Network
>tdowling at ohiolink.edu

-- 

Eric Hellman, President                            Openly Informatics, Inc.
eric at openly.com                                    2 Broad St., 2nd Floor
tel 1-973-509-7800 fax 1-734-468-6216              Bloomfield, NJ 07003
http://www.openly.com/1cate/      1 Click Access To Everything
http://my.linkbaton.com/                Links that Learn

