[WEB4LIB] Re: another tangent to Re: Inline forms in CSS
Thomas Dowling
tdowling at ohiolink.edu
Thu Feb 28 09:46:04 EST 2002
At 05:20 PM 2/27/2002, Vicki Falkland wrote:
> >
> >[Second things second: Who invented <NOINDEX>...</NOINDEX>
> >elements? Proprietary/made-up stuff like that gets more and more likely to
> >screw things up as browsers start expecting you to abide by your doctype
> >declaration.]
> >
>
>I am in the process of implementing a search feature on our site using
>Atomz (www.atomz.com).
>While testing, I noticed that if I searched on a word which happened to be
>used in various bits of navigation text, the search results listed every
>page in the site and the descriptive text was simply a rehash of the
>navigation text.
>Their help files suggest using <noindex></noindex> for any portions of text
>I may wish to exclude from being indexed (like navigation!) to correct this
>problem.
HTML is not just an arbitrary bunch of tags, to which vendors can add their
own creations willy-nilly. It is a standard derived by a consensus of the
W3C membership. Anyone writing software that does something with HTML can
look at the standard; they won't look at the Atomz help documents.
By adding a bogus NOINDEX element, you break any program that looks for
valid markup. That includes, obviously, validators, which will never pass
your pages--so you may not be able to use them to see what else is wrong
with them.
It also increases the risk that newer browsers, some of which take your
doctype declaration seriously, will choke to some extent on seeing this
unknown element. Will that cause problems with HTML and/or CSS
rendering? Answer: you can't know for sure, so stick to the spec.
And the most likely problem: an HTML editor, upon opening your page, may
discard the bogus elements it finds, perhaps without warning you, so that
when you save it again there will be changes you're not aware of.
It's a pity Atomz didn't take the obvious step of delimiting non-indexed
parts of the document with comments. It seems to me that an indexer could
look for
"<!-- atomz indexing off -->...<!-- atomz indexing on -->"
without affecting the validity of the document. If they have
responsive developers, you might suggest something like that.
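To make the comment idea concrete, here is a rough sketch of how an indexer might drop comment-delimited regions before indexing a page. The marker strings are only the hypothetical ones suggested above, and the function name strip_unindexed is made up for illustration. As far as I know Atomz does not recognize these markers, so treat this as an assumption about how such a feature could work, not a description of their product.

```python
import re

# Hypothetical marker comments, using the wording suggested above.
# These are an assumption for illustration, not a documented Atomz feature.
INDEX_OFF = "<!-- atomz indexing off -->"
INDEX_ON = "<!-- atomz indexing on -->"

# Non-greedy match so each off/on pair is removed separately;
# DOTALL lets an excluded region span multiple lines.
_EXCLUDED = re.compile(
    re.escape(INDEX_OFF) + r".*?" + re.escape(INDEX_ON),
    re.DOTALL,
)


def strip_unindexed(html: str) -> str:
    """Return html with every off/on delimited region removed."""
    return _EXCLUDED.sub("", html)


page = (
    "<body>\n"
    "<!-- atomz indexing off -->\n"
    '<p><a href="/">Home</a> | <a href="/catalog">Catalog</a></p>\n'
    "<!-- atomz indexing on -->\n"
    "<p>Library hours have changed.</p>\n"
    "</body>"
)

# Text the indexer would actually see: navigation gone, content kept.
indexable = strip_unindexed(page)
```

Because the markers are ordinary HTML comments, validators, browsers, and editors all ignore them, and the page still conforms to its doctype.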
If you were running a search engine on your own server that looked at the
source HTML files rather than getting them through your server, you could
use server-side includes for the navigation bars. Then they wouldn't even
be in the files that get indexed.
Thomas Dowling
OhioLINK - Ohio Library and Information Network
tdowling at ohiolink.edu