[Web4lib] cms and metadata

Thu Oct 11 17:42:57 EDT 2007

On 10/11/07, Crystal Knapp <crystal.knapp at state.or.us> wrote:
> Alnisa,
>
> By "web publisher," I'm referring to the person entering the document into
> the cms.  In our case, this is usually the same person that publishes the
> document and enters the metadata.

That's what I was thinking. Yes, this is how our various clients enter
their metadata. This might be a long response, so be forewarned.

> It's interesting to hear that you found that improving the metadata also
> improved search results in Google.  While I might anticipate this within an
> internal search engine, it was my impression that most commercial search
> engines don't rely on keyword metadata much anymore. That is promising to
> hear.  Also, it sounds like you don't use title or description metadata?

I had heard similar rumblings, but it doesn't seem to be totally true.
I believe less reliance is placed on it, since a number of sites stuff
their keywords, but if the metadata is clean, consistent, and supports
the page contents, combining it with other efforts can really make a
difference.

I do use title, but just as part of the <title></title> tag. We also
make sure that it reflects the document being viewed, instead of the
generic 'My Site'. For some clients we do use the 'My
Site: Article Title' format.
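
As a rough illustration outside of EE, the two title styles could be sketched in Python like this. The function name, the `with_site` flag, and the fallback to the bare site name when an article has no title are all assumptions made for the example, not something the CMS provides.

```python
def page_title(article_title: str, site_name: str = "My Site",
               with_site: bool = False) -> str:
    """Return the text for the <title> tag of an article page."""
    title = article_title.strip()
    if not title:
        # Assumption: with no article title, fall back to the generic site name.
        return site_name
    # Either 'Article Title' alone or the 'My Site: Article Title' format.
    return f"{site_name}: {title}" if with_site else title


print(page_title("Credit Card Fraud"))
print(page_title("Credit Card Fraud", with_site=True))
```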

> I'd be interested in hearing more about how you set up the default data for
> pages where the user didn't contribute any.  What kind of criteria/logic did
> you use?  Is this a feature of "Expression Engine?"  We aren't in a position
> to change our CMS (Teamsite), but I'm still interested to hear more about how
> you did this.

Part of it is direct 'logic', but some of it is also handled by
information architecture. For example, we try to make section names
informative and/or reflective, such as
http://www.mysite.org/legislation/ or
http://www.mysite.org/publications/. So based on site
architecture, some details are always known:

• items must be published to a single section
• items can be published in multiple categories
• items can have multiple keywords
• all items have a publish date and time (this can be adjusted by the user)
• all items have an author (this can also be adjusted by the user)
• some sites have 'publisher' and 'author' data, and we code the difference
• individual articles can be published in different languages
• language identification can be reflected by modifying the character set appropriately
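
To make those guarantees concrete, here is a minimal Python sketch of an item as described in the list above. It is not how Expression Engine stores entries; the class name, field names, and the default language code are assumptions for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class Item:
    """Hypothetical model of the details the site architecture guarantees."""
    title: str
    section: str          # exactly one section, e.g. "legislation"
    author: str           # supplied by the system, user may adjust it
    published: datetime   # supplied by the system, user may adjust it
    categories: list[str] = field(default_factory=list)  # zero or more
    keywords: list[str] = field(default_factory=list)    # zero or more
    publisher: str = ""   # only some sites distinguish publisher from author
    language: str = "en"  # assumed default; articles can differ per language


item = Item(
    title="Credit card fraud",
    section="publications",
    author="Staff",
    published=datetime(2007, 10, 11, 17, 42),
    keywords=["credit", "fraud"],
)
print(item.section, item.keywords)
```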

We model our metadata structure after the International Herald
Tribune, but not as complicated. We do <title></title> tags,
obviously, and recommend everyone always have reflective and
appropriate titles (this is very important). Then we also do:
content-type, content-style, content-language, description, keywords,
author, date, copyright, owner, summary, and section. Some clients
have others. IHT uses about 20 different tags; none of our clients need
that many.
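
For readers who want to see the shape of the output, the following Python sketch renders a set of name/value pairs as meta tags. It only illustrates the idea and is not part of any CMS; the `meta_tags` helper and the sample values are assumptions. Values are escaped so that characters such as `&` stay valid inside the `content` attribute.

```python
from html import escape


def meta_tags(values: dict[str, str]) -> str:
    """Render one <meta> tag per name/value pair, escaping attribute text."""
    return "\n".join(
        f'<meta name="{escape(name)}" content="{escape(content)}" />'
        for name, content in values.items()
    )


print(meta_tags({
    "author": "Staff",
    "section": "publications",
    "owner": "Wide-Eyed & Laughing",
}))
```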

Expression Engine allows for fairly robust {if} {if:elseif} {if:else}
logic, so we do a variety of situational checks. For example,
{if keywords ==""}consumer, advocacy, education{if:else}{keywords}{/if}
basically says: if the field 'keywords' has no data, use the generic
data; otherwise, use the specific keywords. We do this for all fields:
we check whether each one is empty, and if so we use defaults; if not,
we use the field or a combination of defaults and the field.
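
The same fallback, written as a small Python sketch rather than EE template code. The default string comes from the example above; the function names and the dictionary representation of an entry are assumptions for illustration.

```python
DEFAULT_KEYWORDS = "consumer, advocacy, education"


def keywords_for(entry: dict) -> str:
    """Use the entry's own keywords, or the defaults when the field is empty."""
    return entry.get("keywords") or DEFAULT_KEYWORDS


def combined_keywords_for(entry: dict) -> str:
    """Variant that keeps the defaults and appends the entry's keywords."""
    keywords = entry.get("keywords") or ""
    return f"{DEFAULT_KEYWORDS}, {keywords}" if keywords else DEFAULT_KEYWORDS


print(keywords_for({"keywords": ""}))
print(keywords_for({"keywords": "home loans, mortgages"}))
print(combined_keywords_for({"keywords": "home loans"}))
```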

We also do checks on other fields. For example, one client publishes
headline news from other sources. We don't want to claim ownership of
that, so we check whether publisher_name is not empty and, if so, use
the publisher as owner.
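
Sketched in Python, the ownership rule might look like the following. The `publisher_name` field comes from the description above; the `SITE_OWNER` value, the function name, and the sample publisher are hypothetical.

```python
SITE_OWNER = "My Site"  # hypothetical default owner for original content


def owner_for(entry: dict) -> str:
    """Credit the outside publisher when one is named, else the site itself."""
    publisher = entry.get("publisher_name") or ""
    return publisher if publisher else SITE_OWNER


print(owner_for({"publisher_name": "Example Wire Service"}))
print(owner_for({"publisher_name": ""}))
```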

I'm going to include some sample code from a fairly simple set-up, my
personal website. It doesn't have a lot of if/then statements, just
some straightforward uses.

--------------CODE---------------------------
{exp:weblog:entries
weblog="main|colophon|books|playlists|videos|maps|collections|movies|music|television|websites"
limit="1" disable="member_data|trackbacks"}

    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    <meta name="author" content="{author}" />
    <meta name="description" content="Wide-Eyed &amp; Laughing: A Digital Life Collection from Alnisa Allgood" />
    <meta name="keywords" content="life, media, {tv_keywords}{bk_keywords}{music_keywords}{website_keywords}{keywords}{movie_keywords}" />
    <meta name="copyright" content="Copyright {entry_date format=' %Y '}" />
    <meta name="owner" content="Wide-Eyed &amp; Laughing" />
    <meta name="pub-date" content="{entry_date format='%Y-%m-%d'}" />
    <meta name="section" content="{weblog}" />
    <meta name="summary" content="{summary}{tv_summary}{bk_summary}{music_summary}{website_summary}{description}{movie_summary}" />

{/exp:weblog:entries}

------------ END CODE -----------------------------------

Basically, since the header is used for every section of the site, I
list all the areas where data can be pulled from, then add any
exclusions that I want, then limit the number of records.

From there, I start pulling data. Anything that doesn't change is just
written in; anything that does change gets a {field}
identification. Expression Engine is nice in that if you are pulling
data from multiple areas, but only displaying data from a single area,
you can just stack fields. So
{summary}{tv_summary}{bk_summary}{music_summary}{website_summary}{description}{movie_summary}
will effectively only print a single summary, and the other fields
will render as empty.

Some systems would treat those unused fields as errors, as will EE if
you only pull data from one section. But since I listed every section
that those fields come from, EE properly handles the unused fields as
empty.
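
The stacking trick can be imitated in Python as a rough sketch. This is not EE's implementation; the field names are taken from the template above, while the `stacked` helper and the sample entry are assumptions. Fields that an entry doesn't have are simply treated as empty strings, so only the populated one shows up.

```python
SUMMARY_FIELDS = [
    "summary", "tv_summary", "bk_summary", "music_summary",
    "website_summary", "description", "movie_summary",
]


def stacked(entry: dict, fields: list[str]) -> str:
    """Concatenate the fields in order; missing or empty ones add nothing."""
    return "".join(entry.get(name) or "" for name in fields)


# A hypothetical entry from the books section only fills in bk_summary.
book_entry = {"title": "Some Book", "bk_summary": "Notes on a book I read."}
print(stacked(book_entry, SUMMARY_FIELDS))
```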

Then you close the EE code, and whenever the page loads, data is pulled
from the article, from defaults, or from what was written directly on
the page.

That's it. As I said, it's more complicated for some clients, with lots
of {if} {if:elseif} {if:else} statements. But the basic structure is the
same.

Alnisa

> > -----Original Message-----
> > From: allgood2 at gmail.com [mailto:allgood2 at gmail.com] On Behalf Of Alnisa
> > Allgood
> > Sent: Thursday, October 11, 2007 12:02 PM
> > To: web4lib at webjunction.org;KNAPP Crystal
> > Subject: Re: [Web4lib] cms and metadata
> >
> > Hi Crystal-
> >
> > I'm not certain: when you say web publisher, does that mean the CMS or
> > web developer, or the person entering the document into the web
> > system?
> >
> > It's not a library system, but we've had great success using
> > Expression Engine (a CMS), and creating page specific metadata for
> > clients.  The original project had two questions: (1) technically, was
> > it possible to adjust metadata on a per-article basis; and (2) did
> > adapting this data improve search performance both onsite and offsite
> > (page ranking)?
> >
> > Both answers were a resounding yes. Google page rank increased, which
> > also seemed to influence rank at other search engines. Internal search
> > results seemed more accurate, and technically the process was very,
> > very easy.  Basically we decided which metadata would be served with
> > the web page, then just set-up calculations so if no user contributed
> > data was provided, default data would be used, otherwise, it used
> > either pure user data or a combination of user data with defaults.
> >
> > The big factor is of course the users, and garbage in means garbage out,
> > but we made it so as much data as possible could be collected from the
> > system itself: things like 'author', 'publisher', publish date,
> > section, etc.  That way, the user really only had to complete two
> > fields, keywords and language, plus an optional third, summary. The
> > other fields could be modified by the user, but didn't have to be.
> >
> > Metadata for clients transformed from being just the client name with
> > the same five organizational keywords and a manually updated
> > copyright date, to being far more robust, with obvious metadata
> > distinction between, say, an article on credit card fraud and an
> > article on home loans.
> >
> > Alnisa
> >
> >
> > On 10/11/07, Crystal Knapp <crystal.knapp at state.or.us> wrote:
> > > Earlier this week, I posted a link to a survey the State Library of
> > > Oregon is conducting on metadata and taxonomies.  This survey is still up
> > > through Monday at
> > > http://library.state.or.us/services/surveys/survey.php?sid=170. I can post
> > > the results to the list if there is interest.
> > >
> > > I also wanted to pose a few questions to the list.  For those of you who
> > > use content management systems and maintain a search feature for your
> > > website, do you use metadata in conjunction with your pages?  Does anyone
> > > have a success story of having good quality metadata that is created by
> > > the web publisher (and not edited by library staff later)?
> > >
> > > I am asking because at the Oregon State Library, we are responsible for
> > > the metadata schema (and search feature) used by all Oregon government
> > > agencies, aka all Oregon.gov websites.  We currently have over 2,000 web
> > > publishers within Interwoven's Teamsite who all create their own metadata
> > > as they publish pages, and, as you can probably imagine, the metadata
> > > ranges in quality from good to useless.  We're looking for other
> > > solutions.  I'm curious to hear about other successful metadata
> > > contribution models for content management systems.  It'd also be
> > > interesting to hear if you just aren't using metadata at all within your
> > > cms.  I'm specifically referring to metadata used to assist with
> > > find-ability, not for preservation or document management.
> > >
> > > Thanks in advance,
> > >
> > > Crystal Knapp
> > > E-Government Librarian
> > > Oregon State Library
> > > crystal.knapp at state.or.us
> > > 503-378-5009
> > >
> > >
> > > _______________________________________________
> > > Web4lib mailing list
> > > Web4lib at webjunction.org
> > > http://lists.webjunction.org/web4lib/
> > >
> > >