[Web4lib] RSS and diacritics

Jonathan Gorman jtgorman at uiuc.edu
Tue Nov 27 15:56:45 EST 2007



Apologies. In rereading, I realized I misinterpreted what you were saying.  I thought you had two distinct problems: one with using HTML character entities and another with diacritics.

The answer as far as the entities go?  RSS can be a mess ;).  RSS feeds are XML.  Sadly, it has become widespread practice to put "escaped HTML" in the fields of RSS feeds.  There's no way to ensure that these escaping nightmares will be parsed correctly by every feed reader.

HTML defines many named character entities (&eacute;, for example), but XML, and therefore RSS, predefines only five: &amp;, &lt;, &gt;, &quot;, and &apos;.  You can attempt to make the others available by declaring them in a DOCTYPE declaration at the beginning of the feed.  This Wikipedia page looks like it has some examples of that: http://en.wikipedia.org/wiki/XML.
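
For what it's worth, here is a rough sketch of the DOCTYPE approach, checked with Python's standard xml.etree.ElementTree parser.  The feed content and the single eacute declaration are made up for illustration, and a conforming XML parser expanding the entity says nothing about whether any particular feed reader will.

```python
import xml.etree.ElementTree as ET

# Hypothetical feed: the internal DTD subset declares the HTML entity
# name "eacute" in terms of a numeric character reference.
feed = (
    '<?xml version="1.0"?>\n'
    '<!DOCTYPE rss [\n'
    '  <!ENTITY eacute "&#xE9;">\n'
    ']>\n'
    '<rss version="2.0">\n'
    '  <channel>\n'
    '    <title>Caf&eacute; acquisitions</title>\n'
    '  </channel>\n'
    '</rss>\n'
)

# The parser expands &eacute; using the declaration above.
root = ET.fromstring(feed)
title = root.find("channel/title").text
print(title)
```

Without the ENTITY line, the same parser rejects the feed because &eacute; is undefined in plain XML.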

The best solution?  I'm not really sure.  I'd lean towards not using "escaped HTML" in the RSS feed at all.  Instead, use plain RSS with numeric character references, which should display cleanly assuming that the feed reader isn't junk.

(And by character reference, I mean use &#x..; where .. is the hexadecimal code point of the character, e.g. &#xE9; for e with an acute accent.)
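
If the titles and descriptions are generated by a script, the conversion could look something like the sketch below.  It is in Python only as an assumption about tooling, and to_numeric_refs is a hypothetical helper name, not part of any RSS library.  It resolves HTML named entities, re-escapes the characters that are special in XML, and writes every non-ASCII character as a hexadecimal character reference.

```python
import html


def to_numeric_refs(text: str) -> str:
    """Return text that is safe to place inside an RSS element.

    Hypothetical helper, for illustration only.
    """
    # Turn HTML named entities such as &eacute; into real characters.
    decoded = html.unescape(text)
    # Re-escape &, < and > so the result is still well-formed XML.
    escaped = html.escape(decoded, quote=False)
    # Write each non-ASCII character as a hexadecimal reference.
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};" for ch in escaped
    )


print(to_numeric_refs("Caf&eacute; & Bront\u00eb"))
# prints: Caf&#xE9; &amp; Bront&#xEB;
```

The output is pure ASCII, so it does not depend on the feed's declared encoding being honored.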

See http://en.wikipedia.org/wiki/Character_entity_reference for a bit more information.

Jon Gorman

---- Original message ----
>Date: Tue, 27 Nov 2007 14:56:56 -0500
>From: Bob Duncan <duncanr at lafayette.edu>  
>Subject: [Web4lib] RSS and diacritics  
>To: web4lib at webjunction.org
>
>
>Greetings,
>
>I'm getting ready to offer RSS feeds for our library's recent 
>acquisitions lists and have run into a little snag:  characters with 
>diacritics.  I understand why I can't use HTML character entity 
>references and expect all feed readers to play nicely, so I tried 
>encoding the ampersand in the HTML entity reference (a suggested fix 
>that I can no longer document).  While this works great for some feed 
>readers, other readers and the two major browsers display the raw 
>code instead of the character with diacritical mark.
>
>Other than displaying plain letters without diacritics, is there a 
>way to code feeds so that all (or at least most) feed readers will 
>display the character with the mark?  (I'd like to be able to do this in 
>item titles and descriptions.)
>
>Thanks,
>
>Bob Duncan
>
>
>~!~!~!~!~!~!~!~!~!~!~!~!~
>Robert E. Duncan
>Systems Librarian
>Editor of IT Communications
>Lafayette College
>Easton, PA  18042
>duncanr at lafayette.edu
>http://www.library.lafayette.edu/ 
>
>
>_______________________________________________
>Web4lib mailing list
>Web4lib at webjunction.org
>http://lists.webjunction.org/web4lib/

