[Web4lib] Dublin Core: An idea and thoughts

Wed Nov 8 19:42:09 EST 2006

Hi David,

You raised a good point - Why bother?  To paraphrase a popular saying,
"Publish 'em all, and let Google sort them out."

I can see a point coming in the future when keyword searching won't be
enough, especially with the addition of maps and geospatial metadata,
but also with images, media, etc.  Things that can't carry a
description themselves rely on metadata to make themselves known.
Currently, the method of finding them is serendipity: it usually relies
on the HTML around the link being in proximity to the media and
therefore providing the keywords that Google picks up.

People are also starting to look for context to their metadata, as
well as new methods of accessing, using and analysing their data.
We're currently stuck with the browser-webpage model, but imagine
being able to navigate a website in much the same way you can navigate
the filesystem of an OS X system in column-view mode.  You can
abstract the presentation from the content.

I made the suggestion precisely because nobody can be bothered to make
the extra effort.  Or, more precisely, only the people that care can
be bothered to make the extra effort.  The thing is, it is easier for
the metadata-maniac to get changes made to one file than to go around
and either a) convince people to do it from the get-go or b) add the
metadata after they have published it.

I know this isn't a fully robust solution.  It breaks down when you
get to the page level, and it requires a significant change to the
entire web ecosystem. (Servers have to implement it, but browsers also
have to support it.)  I don't think it would be a big hit with everyone
overnight - it takes time to build this stuff, and even longer for
people to see how useful it is.  However, you have to start somewhere,
and I figured this was as good a place to start as any.

Cheers,
Andrew

On 11/8/06, David Kane <DKANE at wit.ie> wrote:
> Hi Andrew,
>
> This is the classic problem with metadata.  Nobody can be bothered to
> make the extra effort to add it.  The solution you suggest to the
> problem is a good one, I think.  But is the problem itself worth
> solving? To make this happen, you would have to make a very convincing
> argument to the web development public, which you do not make here.
>
> All the best,
>
>
> David Kane
> WIT Libraries
> http://library.wit.ie/
> ++353.51302838
>
>
> >>> "Andrew Hankinson" <andrew.hankinson at gmail.com> 31/10/2006 01:38:40
> >>>
> Hi folks,
>
> I've been mulling an idea over in my head for a while now, and am just
> getting to the point where I think I can sufficiently explain it.  I
> have not really floated this idea to many people yet, so I am
> interested in hearing feedback from everyone out there.
>
> First, some preamble:
> As with most good ideas that have not gone anywhere, the problem lies
> in the execution, and not the idea itself.  I think that,
> unfortunately, such is the case with Dublin Core.  It was meant to
> help organize the web - to provide metadata and machine-understandable
> context that the average web developer could deploy without having a
> degree in information retrieval.
>
> Unfortunately, after almost 10 years of having this standard, we're
> not really much further ahead.
>
> I know many software projects use Dublin Core as a foundation for
> their metadata - projects such as Dspace, Fedora, Greenstone, and
> countless others.  Ironically, however, these are projects that are
> used mostly by information retrieval professionals, and DC has made
> few inroads to being adopted by the general web development public.
>
> I think one of the reasons for this discrepancy is the lack of useful
> and popular tools for this standard.  It all starts with the people
> producing the content. Presently, for a website to have a Dublin Core
> record it must be included in the metadata section of the header of
> each and every page you create - a task that gets increasingly
> cumbersome when it comes to maintaining hundreds or thousands of web
> pages.  This metadata must often be hand-crafted - yes, there are some
> tools that assist in the construction to some extent, but there is no
> mechanism for creating the metadata automatically.
>
> The second part of the puzzle is the people organizing the content.
> They depend on the people producing the content, but with so few sites
> using Dublin Core, they have no impetus to build it into their
> software products.
>
> The third part is, of course, the people consuming the content.
> However, since the people producing the content are not supplying the
> people organizing it with any useful information, they cannot pass
> this along to the consumers.  'Consumers' take what they are given,
> which in the current information environment means they rely on
> Google's ranking algorithm.
>
> Now, the idea:
> Since every web site must be served from a web server, what if we
> could take the metadata cataloguing and management from the page level
> and place it with the server itself?  In thinking about this, I
> specifically had in mind Apache as a platform, but other platforms
> would function similarly.
>
> By installing a module for Apache (say, 'mod_dc' or similar), it would
> extend the functionality of the server to understand and serve
> requests for Dublin Core metadata.  By keeping the configuration of
> this as simple as possible I believe that we can lower the barriers of
> implementation.
>
> Consider: every subdirectory in a website can contain a .htaccess
> file.  This file provides local configuration options for all files in
> that directory and below.  What if, in this file, we could write
> (example given in pseudo-code):
>
> <dc.title>Title of site or sub-site</dc.title>
> <dc.date.modified>2006-10-30</dc.date.modified>
> <dc.creator>Joe Q. Developer</dc.creator>
> etc. etc. for all the DC elements
>
> Since .htaccess files can get parsed hierarchically, you could inherit
> common properties across all the pages in your site. (sort of like a
> cascading style sheet.)  So, if you had one publishing organization
> for the entire site, you would not have to maintain that in each
> subfolder - you could place that tag in your site root or even in your
> main configuration file, and all pages and subsites below that would
> inherit that information.  To override that for, say, an independent
> sub-site, simply put a <dc.publisher> tag in the .htaccess file in
> that sub-directory.
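
A minimal sketch of the inheritance described above, in Python rather
than as an Apache module.  Everything here is an assumption made for
illustration: the SITE_METADATA table stands in for whatever a
'mod_dc'-style module would read from each .htaccess file, and
resolve_dc is a hypothetical helper, not part of any existing software.

```python
from pathlib import PurePosixPath

# Hypothetical per-directory Dublin Core settings, standing in for what a
# 'mod_dc'-style module might read from each directory's .htaccess file.
SITE_METADATA = {
    "/": {"dc.publisher": "Example Org", "dc.creator": "Joe Q. Developer"},
    "/subsite": {"dc.publisher": "Independent Sub-site", "dc.title": "Sub-site"},
}


def resolve_dc(path, metadata=SITE_METADATA):
    """Merge DC elements from the site root down to `path`; deeper directories win."""
    target = PurePosixPath(path)
    # `parents` lists the nearest ancestor first, so reverse it to start at the root.
    chain = list(target.parents)[::-1] + [target]
    record = {}
    for directory in chain:
        # Later (deeper) directories override values inherited from above.
        record.update(metadata.get(str(directory), {}))
    return record


subsite_record = resolve_dc("/subsite/docs")
about_record = resolve_dc("/about")
```

Here subsite_record picks up the overriding publisher from /subsite
while still inheriting the creator from the root, and about_record
carries only the root values.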
>
> This would relieve developers of most of the work, and allow
> for a centralized record that could be maintained by people other than
> the 'code monkeys' without having to re-write every page on the
> website.  (librarians and taxonomists, here's your chance!)
>
> Organizers and consumers, then, would use tools that would
> pass a query to the webserver to see what it has to offer for Dublin
> Core.  Something like:
> http://www.mysite.com/?dc or http://www.mysite.com/subsite/?dc.
> The server would then respond with properly formatted XML of the
> complete Dublin Core record that could then be parsed by a web
> browser, but could also be parsed with a myriad of other tools to
> provide further indexing, navigation, structural and semantic
> metadata.
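
As a rough illustration of what such a response could contain, the
sketch below serializes a record as XML using the standard Dublin Core
element namespace.  The <metadata> wrapper element and the
dc_record_to_xml function are assumptions for the example; the proposal
does not define a response format.

```python
import xml.etree.ElementTree as ET

# Namespace of the Dublin Core Metadata Element Set, version 1.1.
DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_record_to_xml(record):
    """Serialize a {element_name: value} dict as a small Dublin Core XML document."""
    ET.register_namespace("dc", DC_NS)
    root = ET.Element("metadata")  # hypothetical wrapper element
    for name, value in record.items():
        # Qualify each element name with the DC namespace, e.g. dc:title.
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return ET.tostring(root, encoding="unicode")


xml_text = dc_record_to_xml({
    "title": "Title of site or sub-site",
    "creator": "Joe Q. Developer",
    "date": "2006-10-30",
})
print(xml_text)
```

Any XML-aware tool could then read the elements back by namespace,
which is what would let indexers and browsers treat the same response
differently.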
>
> What's out there now:
> I have had a look at a number of other projects that seem to offer
> similar services, but so far I have been underwhelmed.  There is a
> mod_dublincore project out there, but it seemed to be focussed on
> providing RSS functionality, and in any case has not been updated
> since 2000. (http://web.resource.org/rss/1.0/modules/dc/).  There is
> also mod_oai, which looks extremely powerful but is also fairly dense
> to implement.
>
> I would envision something that was, above all, simple to implement,
> and that offered real, tangible benefits for putting it in place.  As
> I mentioned before, the key sticking point is getting the content
> developers to do it.  Once they start delivering metadata, the
> organizers and consumers will start using it. "If you build it, they
> will come."  The barriers to implementation and ongoing maintenance
> just need to be much, much lower.
>
> That said, I'm looking for some feedback on this idea.  I know there
> are some drawbacks to this implementation, and it might have already
> been tried and failed - I don't know.  Any and all opinions are
> welcome.
>
> Andrew
> _______________________________________________
> Web4lib mailing list
> Web4lib at webjunction.org
> http://lists.webjunction.org/web4lib/