[Web4lib] Server logs as tag clouds

Tom Keays tomkeays at gmail.com
Wed May 9 10:51:05 EDT 2007


O'Reilly has a nifty feature that displays the top 20 search terms on
their various sites using "terms that someone typed into a search
engine (e.g., Google) and then followed a resulting link". (They're
also distributing these tags as JSON, which is a nice idea.)

http://www.oreillynet.com/feeds/widgets/organic_search_tagcloud/

Presumably they are doing server log analysis to extract and rank
search terms as tags (although there is no way to tell for certain,
since the code is not GPL). This kind of referrer analysis seems like
a good complement to analyzing your own site's search logs, to see how
people are finding and using your site.
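
As a rough illustration of the kind of log analysis I am guessing at
(this is not O'Reilly's code), here is a minimal Python sketch. It
assumes Apache "combined" format logs, and the SEARCH_ENGINES table and
both function names are made up for the example. It pulls the search
phrase out of the Referer field and counts the individual words.

```python
import re
from collections import Counter
from urllib.parse import urlsplit, parse_qs

# Combined Log Format ends with: "request" status bytes "referer" "user-agent"
REFERER_RE = re.compile(r'"[^"]*" \d{3} \S+ "([^"]*)" "[^"]*"$')

# Assumption for this sketch: hostname label -> query parameter holding the phrase.
SEARCH_ENGINES = {"google": "q", "yahoo": "p"}


def search_terms(log_line):
    """Return the lowercased search words found in one log line's referrer."""
    match = REFERER_RE.search(log_line.strip())
    if not match:
        return []
    parts = urlsplit(match.group(1))
    labels = (parts.hostname or "").split(".")
    for engine, param in SEARCH_ENGINES.items():
        if engine in labels:
            values = parse_qs(parts.query).get(param, [])
            return values[0].lower().split() if values else []
    return []  # referrer is not a known search engine


def top_terms(lines, n=20):
    """Count search words across all lines and return the n most frequent."""
    counts = Counter()
    for line in lines:
        counts.update(search_terms(line))
    return counts.most_common(n)
```

Calling top_terms(open("access.log")) would give a list of (term,
count) pairs, where "access.log" stands in for wherever your logs live.
Referrers from hosts not listed in SEARCH_ENGINES are ignored.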

O'Reilly has addressed the potential issues of privacy and
appropriateness of the displayed tags by matching search terms back to
an index of their site. "While the keyword frequency does give some
idea of what people are looking for, keep in mind that the word had to
already be on our site in order for it to appear, and it had to be
ranked highly enough for someone to find it."
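
The filtering step could be sketched along these lines in Python. This
is only my reading of their description, not their implementation; the
function name and the idea of a flat set of indexed words are
assumptions. Any search term that does not already appear in the site's
own vocabulary is dropped before display.

```python
from collections import Counter


def filter_to_site_vocabulary(term_counts, site_vocabulary, n=20):
    """Keep only terms that already occur in the site's index.

    term_counts: mapping of search term -> count (e.g. from log analysis)
    site_vocabulary: iterable of words known to appear on the site
    """
    allowed = {word.lower() for word in site_vocabulary}
    kept = Counter(
        {term: count for term, count in term_counts.items()
         if term.lower() in allowed}
    )
    return kept.most_common(n)
```

With counts of {"python": 5, "jsmith@example.com": 3, "perl": 2} and a
site vocabulary of Python, Perl and Ruby, only "python" and "perl"
survive, so a stray e-mail address typed into a search engine never
shows up in the cloud.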

It also greatly helps that their site has a highly structured search
engine, which allows results to be limited by content type and by
site. This approach is probably only practical on sites that use a
structured CMS.

Still, it is worth asking: Has anyone made a stab at this -- i.e.,
publicly exposing server logs? Are there code examples? (Any
real-world, generalizable examples would be welcome.) Sorry for
cross-posting this.

-- 
Tom

