[WEB4LIB] Ampersands in database URLs problem
Thomas Dowling
tdowling at ohiolink.edu
Tue Jun 8 16:53:27 EDT 2004
Elizabeth A Reiten wrote:
>Now, everybody seems to be using ampersands in their URLs these days; I
>understand that it often is a consequence of database-driven sites.
>
>
Ampersands in HTML should always be escaped as &amp; or an equivalent
numerical entity. There are cases where an ampersand can unambiguously
stand for itself, but why court danger?
Because some characters have special meaning in HTML--specifically the
ampersand and angle brackets--any occurrence of those characters *when
encoded in HTML* must be handled by special rules. So, to be valid and
to avoid any possible confusion, ampersands in URLs within HTML markup,
including the href attribute on anchors, must be replaced with "&amp;",
"&#38;" or "&#x26;". I've never encountered a browser that failed to understand
any of these in any HTML context, or that failed to convert these
entities back to an ampersand when requesting a URL.
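
As a rough illustration of that round trip, here is a minimal Python
sketch using the standard library's html module. The example.org URL and
the variable names are made up for the example, and unescape() only
approximates what a browser does when it reads an href:

```python
from html import escape, unescape

# Hypothetical database-driven URL, for illustration only.
url = "http://example.org/search?db=books&title=moby&page=2"

# Escape the ampersands (and quotes) before writing the URL into markup.
href = escape(url, quote=True)
anchor = '<a href="{}">Search</a>'.format(href)

# Decoding the attribute value gives back the URL that gets requested.
requested = unescape(href)

# The decimal and hexadecimal references decode to the same character.
decimal_form = unescape("db=books&#38;title=moby")
hex_form = unescape("db=books&#x26;title=moby")

print(anchor)
print(requested)
```

The escaped form is what belongs in the HTML source; the decoded form is
what should end up in the HTTP request.
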
But...when the URL is in a plain text context, an ampersand has to
be just an ampersand. Plain text contexts include plain text e-mail
messages and also the text in your browser's address bar. If you paste
"...&amp;..." into a URL in the address bar, the browser is obligated to
treat it as "[ampersand] [a] [m] [p] [semi-colon]".
It's a source of chronic confusion and frustration that the CGI
specification gave the ampersand the role it did when it was already a
special character in HTML. In practice, you can often get away with
unescaped ampersands in href attributes; all browsers I know will try to
handle that as part of their error correction. But it can lead to
confusion if the string that follows the ampersand happens to match an
entity name. For example, should "...&copy=1..." be interpreted as "and
copy equals 1" or "[copyright symbol] equals 1"?
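
The ambiguity is easy to reproduce. The sketch below uses Python's
html.unescape(), which decodes a handful of legacy entity names even
without the trailing semicolon. The query strings are invented for the
example, and browsers may apply somewhat different rules inside attribute
values, so take this as a demonstration of the risk rather than a model of
any particular browser:

```python
from html import unescape

# Hypothetical query string with an unescaped ampersand before "copy".
raw = "search?db=books&copy=1"

# The same query string with the ampersand properly escaped.
safe = "search?db=books&amp;copy=1"

# The decoder reads "&copy" as the copyright sign, so the parameter
# name is lost.
decoded_raw = unescape(raw)

# The escaped form decodes back to the intended query string.
decoded_safe = unescape(safe)

print(decoded_raw)
print(decoded_safe)
```

Escaping the ampersand removes the question entirely, since the decoder
never sees "&copy" as a candidate entity.
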
The problem you describe with some database vendors sounds either like
you're getting HTML-encoded ampersands into a plain text setting
somehow, or you're getting ampersands encoded twice (&amp;amp;), perhaps
by an over-zealous HTML editor.
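
For what it's worth, the double-encoding case looks like this in a small
Python sketch. The URL is hypothetical, and escaping twice merely stands in
for whatever tool in the chain is doing the extra pass:

```python
from html import escape, unescape

# Hypothetical vendor URL, for illustration only.
url = "http://example.org/search?db=books&title=moby"

# Correct: escaped once for use in HTML.
once = escape(url)

# Over-zealous: the already-escaped text gets escaped again.
twice = escape(once)

# A browser decodes entities only once, so the URL it requests from the
# doubly escaped markup still contains a literal "&amp;".
requested = unescape(twice)

print(twice)
print(requested)
```

If the URLs your users end up with contain a literal "amp;" after the
ampersand, an extra escaping pass like this is a likely suspect.
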
--
Thomas Dowling
OhioLINK - Ohio Library and Information Network
tdowling at ohiolink.edu