Sex & the Search Engine

Mon Jan 26 19:16:26 EST 1998

Nick posted a reply about my article on RDF, saying it had erroneous 
statements. My defense is below. I'd encourage you to read the 
article, of course, so that you get both viewpoints in context. Plus, 
there are all sorts of links about RDF and other metadata proposals on 
the page. The article is:

The New Meta Tags Are Coming - Or Are They?
http://searchenginewatch.com/sereport/9712-metatags.htm

FYI, I think the key issue is that RDF is promoted as a system that
will improve web searching, among other things. Perhaps it will, but
there are some definite questions on whether it will take hold, IMHO.
The search engines will be a powerful force, and a central element
of the article was that they haven't been much involved.

That doesn't mean it's doomed. It doesn't even mean the evolving 
proposal is bad. But there are questions, and the article was meant 
to educate webmasters about some of the issues involved.

And how about that defense:

> * Netscape didn't back MCF, which was created at Apple.  Netscape
> hired its inventor and he is spearheading their RDF developments.

Netscape did back MCF, when it submitted its MCF for XML proposal to
the W3C on June 13, 1997. On Sept 8, 1997, they announced support
for RDF, instead. The only reason I mentioned Netscape with MCF in
the article is because some developers still remember that earlier
move and wonder if they should care about MCF (answer, probably
not).

> * Digital Equipment, which operates Altavista, is participating, so there
> is at least one major search service vendor.

Just because Digital is involved does not mean those at AltaVista
are. I asked people at AltaVista about RDF, and it was not something
in their current planning. They answered like all the other search
engines: we'll watch and see.

> * Verity is participating, so there is one major search software developer.
>  I am a member of the RDF Schema Working Group.

Yes, but Verity does not run one of the major web-wide search
engines. That's no slight to Verity, because your products are
excellent. But RDF has been posed as a solution to both Intranet and
Internet searching. 

It can be very powerful for Intranets. However, if RDF is going to
take hold as a web-wide searching assistance standard, it's almost
certainly going to take the major search engines to get behind it. 
That's opinion, of course, but I think it makes sense.

> * Dublin Core was not developed by the Web community; RDF is a
> project of the W3C and is complementary to Dublin Core, not
> competitive.  It should help spur implementation of DC.

Perhaps the W3C backing will help, but I think Dublin Core has been
a pretty well known standard that the search engines have ignored.
If they get behind RDF, great. The main thing is that they have a
general mistrust of metadata, because people purposely
mislead with it.

> * RDF's design goals include that it can be encoded as easily as HTML (I
> don't agree, but the committee feels otherwise).  It is unfair to criticize
> it as more complicated when it isn't even proposed yet.  Just as browsers
> may be quite complex, the applications that use and produce RDF may be
> complex.

Tim Bray and Ora Lassila, both on the committee, readily admit that 
RDF will be more complicated than existing meta tags. But I did also 
note that they both expect this complexity will be diminished by 
authoring tools. In either case, the key will be if people make the 
extra effort to define resources -- and if the information they 
provide is trustworthy.

> It is difficult to imagine a future in which large search services are
> useful unless they begin to incorporate meta-data.  I am quite sure that
> you will see serious competitors to them emerge if they fail to do so.
> You'll see large Internet infrastructure companies who serve as trusted
> meta-data exchanges.

Actually, I think one of the saving graces of web relevancy is the 
fact that full-text indexing continues to occur. With so many people 
trying to trick search engines, the actual content is sometimes 
helpful.

Every search engine describes spamming as an arms race. To them,
user-defined meta data is probably not going to be the saving grace.
However, trustworthy third-party meta data may be.

> New search features show up initially in software tools such as
> ours, then they migrate to the large services as the technology
> becomes less expensive and higher in performance.  I can't identify
> one advanced feature of the large search services that wasn't
> shipped first by a software tools vendor. 

Link relevancy as a factor in ranking pages? Spam
detection, from invisible text to word stuffing? Detection of 
submission page stuffing attempts? These aren't advanced features 
needed on an Intranet, but they are some of the essential features 
that keep the major search engines still going. Again, that's not to 
downplay what they get from software vendors. It's mainly to stress 
that on the web, they have to adapt to indexing untrustworthy data. 
If they decide RDF can help them in the battle, I'm sure they'll 
sign up.

cheers,
danny

-----------------------------------
Danny Sullivan 
Editor, Search Engine Watch
http://searchenginewatch.com