[WEB4LIB] Re: another tangent ..<noindex>

Sun Mar 3 19:06:06 EST 2002

>      The <noindex> tag is used to prevent search engines' spiders and 'bots
>from crawling a page. It is specified by the Robots Exclusion Protocol,
>according to Chris Sherman's and Gary Price's book "The Invisible Web".
>Therefore it can't really be classed as "proprietary/made up stuff" and is
>certainly not something invented by Atomz.
>
>

uggghhh ... so what do i do now?? is it a "bad bad" thing or not?? my
validator certainly doesn't like it, as suggested earlier.
V.

>-----Original Message-----
>From: Vicki Falkland <library at cryptic.rch.unimelb.edu.au>
>To: Multiple recipients of list <web4lib at webjunction.org>
>Date: Sunday, March 03, 2002 5:36 PM
>Subject: [WEB4LIB] Re: another tangent to Re: Inline forms in CSS
>
>
>>thanks for all the responses to this.
>>i really thought Atomz was the way to go until i saw that comment about the
>><noindex> thing. obviously i cannot use it now. i'll be taking a look at
>>swish-e instead.
>>when my boss asks me "why the change?" i'll tell her "cos the WEB4LIBbers
>>told me it's a bad bad thing" :)
>>
>>many thanks,
>>Vicki
>>
>>
>>
>>At 06:54 AM 28/02/02 -0800, you wrote:
>>>At 05:20 PM 2/27/2002, Vicki Falkland wrote:
>>>> >
>>>> >[Second things second: Who invented <NOINDEX>...</NOINDEX>
>>>> >elements?  Proprietary/made-up stuff like that gets more and more
>>>> >likely to screw things up as browsers start expecting you to abide
>>>> >by your doctype declaration.]
>>>> >
>>>>
>>>>I am in the process of implementing a search feature on our site using
>>>>Atomz (www.atomz.com)
>>>>While testing, I noticed that if I searched on a word which happened to
>>>>be used in various bits of navigation text, the search results listed
>>>>every page in the site and the descriptive text was simply a rehash of
>>>>the navigation text.
>>>>Their help files suggest using <noindex></noindex> for any portions of
>>>>text I may wish to exclude from being indexed (like navigation!) to
>>>>correct this problem.
>>>
>>>HTML is not just an arbitrary bunch of tags, to which vendors can add
>>>their own creations willy-nilly.  It is a standard derived by a consensus
>>>of the W3C membership.  Anyone writing software that does something with
>>>HTML can look at the standard; they won't look at the Atomz help documents.
>>>
>>>By adding a bogus NOINDEX element, you break any program that looks for
>>>valid markup.  That includes, obviously, validators, which will never pass
>>>your pages--so you may not be able to use them to see what else is wrong
>>>with them.
>>>
>>>It also increases the risk that newer browsers, some of which take your
>>>doctype declaration seriously, will choke to some extent on seeing this
>>>unknown element.  Will that cause problems with HTML and/or CSS
>>>rendering?  Answer: you can't know for sure, so stick to the spec.
>>>
>>>And the most likely problem: an HTML editor, upon opening your page, may
>>>discard the bogus elements it finds, perhaps without warning you, so that
>>>when you save it again there will be changes you're not aware of.
>>>
>>>It's a pity that Atomz didn't take the obvious step of delimiting
>>>non-indexed parts of the document with comments.  It seems obvious to me
>>>that you could look for "<!-- atomz indexing off -->...<!-- atomz indexing
>>>on -->" without affecting the validity of the document.  If they have
>>>responsive developers, you might suggest something like that.
>>>
>>>
>>>If you were running a search engine on your own server that looked at the
>>>source HTML files rather than getting them through your server, you could
>>>use server-side includes for the navigation bars.  Then they wouldn't even
>>>be in the files that get indexed.
>>>
>>>
>>>Thomas Dowling
>>>OhioLINK - Ohio Library and Information Network
>>>tdowling at ohiolink.edu
>>>
>>>
>>>
>
>
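
For anyone wondering what the comment-delimiter idea in Thomas Dowling's
quoted message might look like on the indexer's side, here is a minimal
sketch in Python. The two marker strings are taken from his suggestion; they
are hypothetical and not a documented Atomz feature, and the function name
indexable_text is made up for illustration. Because the markers are ordinary
HTML comments, the page itself stays valid.

```python
import re

# Hypothetical markers, copied from the suggestion in the quoted message.
# They are NOT a documented Atomz feature.
OFF = "<!-- atomz indexing off -->"
ON = "<!-- atomz indexing on -->"

# Non-greedy match so each "off" pairs with the nearest following "on";
# DOTALL lets a skipped region span several lines.
_SKIP = re.compile(re.escape(OFF) + r".*?" + re.escape(ON), re.DOTALL)


def indexable_text(html: str) -> str:
    """Return html with every off/on-delimited region removed."""
    return _SKIP.sub("", html)


page = (
    "<body>"
    + OFF
    + "<ul><li>Home</li><li>Catalogue</li></ul>"
    + ON
    + "<p>Opening hours</p>"
    + "</body>"
)

print(indexable_text(page))
```

An "off" marker with no matching "on" is left alone in this sketch, so a
forgotten closing comment does not silently drop the rest of the page from
the index.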