[WEB4LIB] Ampersands in database URLs problem

Thomas Dowling tdowling at ohiolink.edu
Tue Jun 8 16:53:27 EDT 2004


Elizabeth A Reiten wrote:

>Now, everybody seems to be using ampersands in their URLs these days; I 
>understand that it often is a consequence of database-driven sites. 

Ampersands in HTML should always be escaped as &amp; or an equivalent 
numeric character reference.  There are cases where an ampersand can 
unambiguously stand for itself, but why court danger?

Because some characters have special meaning in HTML--specifically the 
ampersand and angle brackets--any occurrence of those characters *when 
encoded in HTML* must be handled by special rules.  So, to be valid and 
to avoid any possible confusion, ampersands in URLs within HTML markup, 
including the href attribute on anchors, must be replaced with "&amp;", 
"&#38;" or "&#x26;".  I've never encountered a browser that failed to 
understand any of these in any HTML context, or that failed to convert 
these entities back to an ampersand when requesting a URL.
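
As a minimal sketch of that round trip, assuming Python's standard html 
module and a made-up example.org URL (neither comes from the original 
thread), escaping for an href and decoding it again might look like this:

```python
import html

# Hypothetical database URL, used only for illustration.
url = "http://example.org/search?db=books&title=ulysses"

# Escape the ampersand before placing the URL in an href attribute.
href_value = html.escape(url, quote=True)
anchor = '<a href="' + href_value + '">Search</a>'

# Decoding the attribute value, roughly what a browser does before
# requesting the URL, restores the original string.
restored = html.unescape(href_value)

# The decimal and hexadecimal references decode to the same character.
numeric_forms = [html.unescape("&#38;"), html.unescape("&#x26;")]
```

Here href_value contains "&amp;" where the URL had a bare ampersand, and 
restored is identical to the original url.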

But...when the URL is in a plain text context rather than in HTML, an 
ampersand has to be just an ampersand.  Plain text contexts include 
plain text e-mail messages and also the text in your browser's address 
bar.  If you paste "...&amp;..." into a URL in the address bar, the 
browser is obligated to treat it as "[ampersand] [a] [m] [p] 
[semi-colon]".
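
A rough illustration of what goes wrong, assuming a hypothetical URL 
that was copied out of HTML source with its entity intact and then 
split on ampersands the way a server-side script typically would:

```python
from urllib.parse import urlsplit

# Hypothetical URL pasted into a plain text context with "&amp;" intact.
pasted = "http://example.org/search?db=books&amp;title=ulysses"

# Nothing decodes the entity here, so splitting the query string on "&"
# leaves a parameter named "amp;title" instead of "title".
params = urlsplit(pasted).query.split("&")
names = [p.split("=", 1)[0] for p in params]
```

The second parameter name comes out as "amp;title", which the receiving 
database script will almost certainly not recognize.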

It's a source of chronic confusion and frustration that the CGI 
specification gave the ampersand the role it did when it was already a 
special character in HTML.  In practice, you can often get away with 
unescaped ampersands in href attributes; all browsers I know will try to 
handle that as part of their error correction.  But it can lead to 
confusion if the string that follows the ampersand happens to match an 
entity name.  For example, should "...&copy=1..." be interpreted as "and 
copy equals 1" or "[copyright symbol] equals 1"?
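
The ambiguity can be seen with Python's html.unescape, which is assumed 
here as a stand-in for an HTML parser; real browsers apply their own 
rules inside attribute values and may decide differently.  The query 
string is invented for the example:

```python
import html

# Unescaped ampersand followed by a string that matches an entity name.
raw_href = "search?title=ulysses&copy=1"
decoded_raw = html.unescape(raw_href)

# The same URL with the ampersand properly escaped.
escaped_href = "search?title=ulysses&amp;copy=1"
decoded_escaped = html.unescape(escaped_href)
```

With this decoder, decoded_raw ends in a copyright symbol followed by 
"=1", while decoded_escaped comes back as the intended 
"search?title=ulysses&copy=1".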


The problem you describe with some database vendors sounds either like 
you're getting HTML-encoded ampersands into a plain text setting 
somehow, or you're getting ampersands encoded twice (&amp;amp;), perhaps 
by an over-zealous HTML editor.
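
A small sketch of the double-encoding case, again assuming Python's 
html module and an invented URL; the second escape call stands in for 
whatever tool re-encodes markup that was already correct:

```python
import html

# Hypothetical URL, for illustration only.
url = "http://example.org/search?db=books&title=ulysses"

once = html.escape(url)    # correct for an href: "&" becomes "&amp;"
twice = html.escape(once)  # over-encoded: "&amp;" becomes "&amp;amp;"

# A browser decodes only once, so it would request a URL that still
# contains the literal text "&amp;".
requested = html.unescape(twice)
```

Since requested equals once rather than url, the server sees 
"amp;title" where it expected "title".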


-- 
Thomas Dowling
OhioLINK - Ohio Library and Information Network
tdowling at ohiolink.edu



