[WEB4LIB] Re: another tangent to Re: Inline forms in CSS

Thomas Dowling tdowling at ohiolink.edu
Thu Feb 28 09:46:04 EST 2002


At 05:20 PM 2/27/2002, Vicki Falkland wrote:
> >
> >[Second things second: Who invented <NOINDEX>...</NOINDEX>
> >elements?  Proprietary/made-up stuff like that gets more and more likely to
> >screw things up as browsers start expecting you to abide by your doctype
> >declaration.]
> >
>
>I am in the process of implementing a search feature on our site using
>Atomz (www.atomz.com)
>While testing, I noticed that if I searched on a word which happened to be
>used in various bits of navigation text, the search results listed every
>page in the site and the descriptive text was simply a rehash of the
>navigation text.
>Their help files suggest using <noindex></noindex> for any portions of text
>I may wish to exclude from being indexed (like navigation!) to correct this
>problem.

HTML is not just an arbitrary bunch of tags, to which vendors can add their 
own creations willy-nilly.  It is a standard derived by a consensus of the 
W3C membership.  Anyone writing software that does something with HTML can 
look at the standard; they won't look at the Atomz help documents.

By adding a bogus NOINDEX element, you break any program that looks for 
valid markup.  That includes, obviously, validators, which will never pass 
your pages--so you may not be able to use them to see what else is wrong 
with them.
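
To make that concrete, here is a rough Python sketch of the kind of check a
markup-aware program performs. It is much weaker than a real validator, which
parses against the DTD named in your doctype; this only flags element names
that are not in the HTML 4.01 element index. The names HTML401_ELEMENTS,
UnknownElementFinder, and unknown_elements are made up for illustration, and
the element list should be checked against the spec before relying on it.

```python
from html.parser import HTMLParser

# Element names from the HTML 4.01 element index (assumption: verify
# against the specification before relying on this list).
HTML401_ELEMENTS = frozenset("""
a abbr acronym address applet area b base basefont bdo big blockquote body
br button caption center cite code col colgroup dd del dfn dir div dl dt em
fieldset font form frame frameset h1 h2 h3 h4 h5 h6 head hr html i iframe
img input ins isindex kbd label legend li link map menu meta noframes
noscript object ol optgroup option p param pre q s samp script select small
span strike strong style sub sup table tbody td textarea tfoot th thead
title tr tt u ul var
""".split())


class UnknownElementFinder(HTMLParser):
    """Collects start tags whose names are not HTML 4.01 elements."""

    def __init__(self):
        super().__init__()
        self.unknown = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser hands us the tag name already lowercased.
        if tag not in HTML401_ELEMENTS and tag not in self.unknown:
            self.unknown.append(tag)


def unknown_elements(html):
    """Return the non-standard element names found in html, in order."""
    finder = UnknownElementFinder()
    finder.feed(html)
    finder.close()
    return finder.unknown


page = "<p>Hours</p><NOINDEX><ul><li>Home</li></ul></NOINDEX>"
print(unknown_elements(page))
```

Any tool built along these lines will report NOINDEX as a problem, whatever
the Atomz help files say about it.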

It also increases the risk that newer browsers, some of which take your 
doctype declaration seriously, will choke to some extent on seeing this 
unknown element.  Will that cause problems with HTML and/or CSS 
rendering?  Answer: you can't know for sure, so stick to the spec.

And the most likely problem: an HTML editor, upon opening your page, may 
discard the bogus elements it finds, perhaps without warning you, so that 
when you save it again there will be changes you're not aware of.

It's a pity Atomz didn't take the obvious step of delimiting non-indexed
parts of the document with comments.  An indexer could just as easily look
for "<!-- atomz indexing off -->...<!-- atomz indexing on -->", and the
document would stay valid.  If they have responsive developers, you might
suggest something like that.
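
For what it's worth, the indexer side of that is not much work. A minimal
Python sketch, assuming the two comment strings suggested above (they are my
invention, not a documented Atomz feature) and a made-up function name
strip_unindexed:

```python
import re

# Hypothetical markers from the suggestion above; not a real Atomz feature.
OFF = "<!-- atomz indexing off -->"
ON = "<!-- atomz indexing on -->"

# Non-greedy match so each off/on pair is removed separately;
# DOTALL lets a skipped region span several lines.
_SKIP = re.compile(re.escape(OFF) + r".*?" + re.escape(ON), re.DOTALL)


def strip_unindexed(html):
    """Return html with every off/on delimited region removed."""
    return _SKIP.sub("", html)


page = (
    "<p>Library hours</p>"
    "<!-- atomz indexing off --><ul><li>Home</li><li>Catalog</li></ul>"
    "<!-- atomz indexing on -->"
    "<p>Open 9 to 5</p>"
)
print(strip_unindexed(page))
```

The page itself contains nothing but standard elements and comments, so
validators, browsers, and editors have nothing to object to.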


If you were running a search engine on your own server that looked at the 
source HTML files rather than getting them through your server, you could 
use server-side includes for the navigation bars.  Then they wouldn't even 
be in the files that get indexed.
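
A small Python sketch of why that works, assuming Apache-style include
directives and a made-up include path (/includes/nav.html). In the source
file the navigation is only a comment-like directive, so a text extractor
reading the file from disk never sees the navigation text. TextExtractor and
indexable_text are illustrative names, not part of any real search product.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects character data only; comments and tags are ignored."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def indexable_text(source_html):
    """Return the text an indexer would see in an unprocessed source file."""
    parser = TextExtractor()
    parser.feed(source_html)
    parser.close()
    return " ".join(parser.chunks)


# Source file as stored on disk, before the server expands the include.
source = (
    '<html><body>\n'
    '<!--#include virtual="/includes/nav.html" -->\n'
    '<h1>Interlibrary Loan</h1>\n'
    '<p>Request forms are available at the desk.</p>\n'
    '</body></html>'
)
print(indexable_text(source))
```

This only helps when the indexer reads the files directly. A hosted service
that fetches pages over HTTP gets them after the server has expanded the
includes, navigation and all.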


Thomas Dowling
OhioLINK - Ohio Library and Information Network
tdowling at ohiolink.edu



