[WEB4LIB] Re: another tangent to Re: Inline forms in CSS

Charlie Irwin cirwin at world.std.com
Sun Mar 3 18:44:42 EST 2002


      I had not really been following this thread BUT...

      The <noindex> tag is used to keep search engines' spiders and 'bots
from indexing a page. It is specified by the Robots Exclusion Protocol,
according to Chris Sherman and Gary Price's book "The Invisible Web".
Therefore it can't really be classed as "proprietary/made up stuff" and is
certainly not something invented by Atomz.

Charlie Irwin

-----Original Message-----
From: Vicki Falkland <library at cryptic.rch.unimelb.edu.au>
To: Multiple recipients of list <web4lib at webjunction.org>
Date: Sunday, March 03, 2002 5:36 PM
Subject: [WEB4LIB] Re: another tangent to Re: Inline forms in CSS


>thanks for all the responses to this.
>i really thought Atomz was the way to go until i saw that comment about the
><noindex> thing. obviously i cannot use it now. i'll be taking a look at
>swish-e instead.
>when my boss asks me "why the change?" i'll tell her "cos the WEB4LIBbers
>told me its a bad bad thing" :)
>
>many thanks,
>Vicki
>
>
>
>At 06:54 AM 28/02/02 -0800, you wrote:
>>At 05:20 PM 2/27/2002, Vicki Falkland wrote:
>>> >
>>> >[Second things second: Who invented <NOINDEX>...</NOINDEX>
>>> >elements?  Proprietary/made-up stuff like that gets more and more likely to
>>> >screw things up as browsers start expecting you to abide by your doctype
>>> >declaration.]
>>> >
>>>
>>>I am in the process of implementing a search feature on our site using
>>>Atomz (www.atomz.com)
>>>While testing, I noticed that if I searched on a word which happened to be
>>>used in various bits of navigation text, the search results listed every
>>>page in the site and the descriptive text was simply a rehash of the
>>>navigation text.
>>>Their help files suggest using <noindex></noindex> for any portions of text
>>>I may wish to exclude from being indexed (like navigation!) to correct this
>>>problem.
>>
>>HTML is not just an arbitrary bunch of tags, to which vendors can add their
>>own creations willy-nilly.  It is a standard derived by a consensus of the
>>W3C membership.  Anyone writing software that does something with HTML can
>>look at the standard; they won't look at the Atomz help documents.
>>
>>By adding a bogus NOINDEX element, you break any program that looks for
>>valid markup.  That includes, obviously, validators, which will never pass
>>your pages--so you may not be able to use them to see what else is wrong
>>with them.
>>
>>It also increases the risk that newer browsers, some of which take your
>>doctype declaration seriously, will choke to some extent on seeing this
>>unknown element.  Will that cause problems with HTML and/or CSS
>>rendering?  Answer: you can't know for sure, so stick to the spec.
>>
>>And the most likely problem: an HTML editor, upon opening your page, may
>>discard the bogus elements it finds, perhaps without warning you, so that
>>when you save it again there will be changes you're not aware of.
>>
>>It's a pity Atomz didn't take the obvious step of delimiting
>>non-indexed parts of the document with comments.  It seems obvious to me
>>that you could look for "<!-- atomz indexing off -->...<!-- atomz indexing
>>on -->" without affecting the validity of the document.  If they have
>>responsive developers, you might suggest something like that.
>>
>>
>>If you were running a search engine on your own server that looked at the
>>source HTML files rather than getting them through your server, you could
>>use server-side includes for the navigation bars.  Then they wouldn't even
>>be in the files that get indexed.
>>
>>
>>Thomas Dowling
>>OhioLINK - Ohio Library and Information Network
>>tdowling at ohiolink.edu
>>
>>
>>
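A minimal Python sketch of the comment-delimiter idea Thomas Dowling proposes in the quoted message: the indexer strips everything between the two marker comments before indexing, while browsers and validators see only ordinary HTML comments, so the page stays valid. The marker strings are his hypothetical suggestion, not a documented Atomz feature, and the function name `indexable_text` is likewise just illustrative.

```python
import re

# Hypothetical markers taken from the suggestion in the quoted message;
# this sketch does NOT assume Atomz (or any real indexer) recognizes them.
OFF = "<!-- atomz indexing off -->"
ON = "<!-- atomz indexing on -->"

# Non-greedy so each "off" pairs with the nearest following "on";
# DOTALL lets an excluded region span several lines (e.g. a nav bar).
_EXCLUDED = re.compile(re.escape(OFF) + r".*?" + re.escape(ON), re.DOTALL)


def indexable_text(html: str) -> str:
    """Return the page source with every off/on-delimited region removed.

    An "off" marker with no later "on" marker is left untouched, so a
    forgotten closing comment does not silently hide the rest of the page.
    """
    return _EXCLUDED.sub(" ", html)


if __name__ == "__main__":
    page = (
        "<body>\n"
        f"{OFF}\n<p>Home | Catalog | Hours</p>\n{ON}\n"
        "<p>Interlibrary loan policy</p>\n"
        "</body>"
    )
    # The navigation line is dropped; the real content remains.
    print(indexable_text(page))
```

Because the markers are plain comments, nothing here depends on browsers or HTML editors tolerating an unknown element, which is the core of the objection to <NOINDEX>.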



