[WEB4LIB] Re: Ampersands in database URLs problem

Tue Jun 8 17:07:26 EDT 2004

Tom, you're my hero!

After reading through your message closely, I clued into what I've been 
doing wrong.  I haven't been uploading pages and testing the links.  I've 
been copying the code into my browser's address bar and trying it that 
way.  The plain text solution was it!

I guess I should be wondering about all of the databases that it DID work 
for.

As always, I knew the answer would come from Web4LIB.  Thanks!

-----------------------------------------------------------------
OSU is currently changing the campus e-mail system and my e-mail will be 
extremely unreliable until at least the end of the summer.  If you have 
sent a message which I did not respond to, please try to resend it.
-----------------------------------------------------------------
Beth Reiten, Librarian
Digital Library Services
Edmon Low Library
Oklahoma State University
Phone: 405-744-9109
Email: reitene at okstate.edu

Thomas Dowling <tdowling at ohiolink.edu>
Sent by: web4lib at webjunction.org
06/08/2004 03:57 PM
Please respond to tdowling

        To:     Multiple recipients of list <web4lib at webjunction.org>
        cc:     (bcc: Elizabeth A Reiten/lib/Okstate)
        Subject:        [WEB4LIB] Re: Ampersands in database URLs problem

Elizabeth A Reiten wrote:

>Now, everybody seems to be using ampersands in their URLs these days; I 
>understand that it often is a consequence of database-driven sites. 
> 
>

Ampersands in HTML should always be escaped as &amp; or an equivalent 
numeric character reference.  There are cases where an ampersand can 
unambiguously stand for itself, but why court danger?

Because some characters have special meaning in HTML--specifically the 
ampersand and angle brackets--any occurrence of those characters *when 
encoded in HTML* must be handled by special rules.  So, to be valid and 
to avoid any possible confusion, ampersands in URLs within HTML markup, 
including the href attribute on anchors, must be replaced with "&amp;", 
"&#38;" or "&#x26;".  I've never encountered a browser that failed to 
understand 
any of these in any HTML context, or that failed to convert these 
entities back to an ampersand when requesting a URL.
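
As a minimal sketch of the escaping rule above, here is how it might 
look in Python using the standard library's html module.  The URL and 
the link text are made up for illustration; they are not from any real 
database vendor.

```python
from html import escape, unescape

# Hypothetical database URL with two query-string ampersands.
raw_url = "http://db.example.org/search?db=eric&query=libraries&page=2"

# Escape the URL before placing it inside an HTML attribute.
# quote=True also escapes quotation marks, which matters inside href="...".
href_value = escape(raw_url, quote=True)
anchor = f'<a href="{href_value}">Search ERIC</a>'
print(anchor)

# Decoding the attribute value gives back the URL that would be requested.
assert unescape(href_value) == raw_url

# The numeric forms decode to the same character.
assert unescape("&#38;") == "&" and unescape("&#x26;") == "&"
```

The point of the round trip is that the escaped form lives only in the 
HTML source; the URL actually requested contains plain ampersands.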

But...when the URL is in a plain text context, an ampersand has to 
be just an ampersand.  Plain text contexts include plain text e-mail 
messages and also the text in your browser's address bar.  If you paste 
"...&amp;..." into a URL in the address bar, the browser is obligated to 
treat it as "[ampersand] [a] [m] [p] [semi-colon]".
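
To illustrate what goes wrong, here is a rough Python sketch that 
splits a pasted query string on literal ampersands, with no HTML 
decoding.  The query string is hypothetical, and real servers may parse 
parameters somewhat differently (some also split on semicolons), so 
treat this as an approximation only.

```python
# A query string copied straight out of HTML source, entity and all.
pasted = "db=eric&amp;query=libraries"

# In a plain text context nothing decodes "&amp;", so splitting on "&"
# leaves "amp;" stuck to the front of the second parameter name.
pairs = [part.split("=", 1) for part in pasted.split("&")]
names = [name for name, _value in pairs]
print(names)  # ['db', 'amp;query']
```

A server expecting a parameter called "query" would instead see one 
called "amp;query", which is a plausible reason for a link to fail when 
pasted from source but to work when clicked on an uploaded page.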

It's a source of chronic confusion and frustration that the CGI 
specification gave the ampersand the role it did when it was already a 
special character in HTML.  In practice, you can often get away with 
unescaped ampersands in href attributes; all browsers I know will try to 
handle that as part of their error correction.  But it can lead to 
confusion if the string that follows the ampersand happens to match an 
entity name.  For example, should "...&copy=1..." be interpreted as "and 
copy equals 1" or "[copyright symbol] equals 1"?
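
The ambiguity can be seen with Python's html.unescape, which follows 
the HTML5 rules for text and accepts some entity names without a 
trailing semicolon.  This shows only how one decoder resolves it; a 
given browser may treat the same string differently, particularly 
inside an attribute value.  The URL fragment is made up.

```python
from html import unescape

# Unescaped ampersand followed by a string that matches an entity name.
ambiguous = "search?db=eric&copy=1"
print(unescape(ambiguous))  # search?db=eric©=1

# With the ampersand escaped, the decoder has only one reading.
safe = "search?db=eric&amp;copy=1"
print(unescape(safe))       # search?db=eric&copy=1
```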

The problem you describe with some database vendors sounds either like 
you're getting HTML-encoded ampersands into a plain text setting 
somehow, or you're getting ampersands encoded twice (&amp;amp;), perhaps 
by an over-zealous HTML editor.
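
A short Python sketch of the double-encoding case, assuming an editor 
that escapes text which was already escaped.  The helper name 
looks_double_encoded is hypothetical and the check is deliberately 
crude.

```python
from html import escape, unescape

raw = "db=eric&query=libraries"   # hypothetical query string
once = escape(raw)                # correct form for HTML source
twice = escape(once)              # what a second pass of escaping produces

# A browser decodes once, so the requested URL still contains "&amp;".
requested = unescape(twice)
assert requested == once

def looks_double_encoded(html_source: str) -> bool:
    """Crude check: flag markup containing the doubly escaped ampersand."""
    return "&amp;amp;" in html_source

print(looks_double_encoded(twice))  # True
print(looks_double_encoded(once))   # False
```

Searching page source for "&amp;amp;" is a quick way to tell whether 
this is what happened to a link that refuses to work.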

-- 
Thomas Dowling
OhioLINK - Ohio Library and Information Network
tdowling at ohiolink.edu