Stats o' the Day revisited

Roy Tennant rtennant at library.berkeley.edu
Tue Apr 29 15:15:34 EDT 1997


Pointers to both of these messages are now on the Library Web Manager's
Reference Center, in a section of their own called "The State of Library Webs".
Roy

On Tue, 29 Apr 1997, Thomas Dowling wrote:

> Web4Lib--
> 
> About three months ago, I posted some statistics regarding the validation
> of HTML on library home pages.  By the standard calculation, that was two
> Web Years ago, so I thought I'd do it again and also touch on a couple of
> other web authoring topics.
> 
> Let me try to recap some of the discussion points that arose from my
> earlier post.  First and foremost, it is not a crime to write invalid HTML;
> the most you can say is that invalid HTML may display in unpredicted ways
> on some browsers.  Unfortunately, the browser in question may be the next
> version of one that your readers commonly use, so the next big upgrade might
> do something nasty to your pages.  Current example: some HTML editors
> (notably FrontPage) let you include numeric entities for characters such as
> smart quotes or em- and en-dashes at the positions (128-159) the Windows
> character set assigns them, positions that ISO Latin 1 and Unicode reserve
> for control characters; Netscape 4.0b has taken to displaying characters in
> that range as question marks:
> 
>     She looked at me and said, ?Hey?Didn?t you go to the 
>     University of Wisconsin?Madison??
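If you are stuck with pages that already contain these references, one way to repair them is to map each reference in the 128-159 range through the Windows-1252 code page to the Unicode character it was meant to be. The Python sketch below assumes decimal references of the form `&#147;` (hexadecimal ones are left alone); the function name `fix_cp1252_refs` is hypothetical.

```python
import re

# Decimal numeric character references such as &#147;
NUMERIC_REF = re.compile(r"&#([0-9]+);")


def fix_cp1252_refs(html: str) -> str:
    """Rewrite references in the 128-159 range (smart quotes, dashes, etc.
    in Windows-1252, but control characters in Latin 1 and Unicode) as
    references to the Unicode code points the author intended."""

    def replace(match: re.Match) -> str:
        code = int(match.group(1))
        if not 128 <= code <= 159:
            return match.group(0)  # already a proper Latin 1/Unicode reference
        try:
            char = bytes([code]).decode("cp1252")
        except UnicodeDecodeError:
            # 129, 141, 143, 144 and 157 are unassigned even in Windows-1252
            return match.group(0)
        return f"&#{ord(char)};"

    return NUMERIC_REF.sub(replace, html)


print(fix_cp1252_refs("said, &#147;Hey&#151;Didn&#146;t you"))
# said, &#8220;Hey&#8212;Didn&#8217;t you
```

Writing the corrected references numerically (rather than as raw characters) keeps the page independent of whatever character set the server declares.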
> 
> Second, using a strict SGML validator to compare documents to HTML DTDs has
> a number of problems.  One important one is that HTML and URLs are written
> to different specifications, and an SGML validator will list as errors
> anchor tags with certain characters (specifically ampersands).  
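The usual way to keep a validator quiet here is to write each ampersand in an HREF as `&amp;`; browsers decode the entity before following the link, so the URL itself is unchanged. A minimal Python sketch, assuming that any `&` not already followed by something shaped like an entity or character reference needs escaping (`escape_bare_ampersands` and the example.edu URL are hypothetical):

```python
import re

# "&" that does not already begin a reference like &amp; &#38; or &#x26;
BARE_AMPERSAND = re.compile(r"&(?![A-Za-z][A-Za-z0-9]*;|#[0-9]+;|#[xX][0-9A-Fa-f]+;)")


def escape_bare_ampersands(url: str) -> str:
    """Return the URL with bare ampersands written as &amp; for use in HTML."""
    return BARE_AMPERSAND.sub("&amp;", url)


href = escape_bare_ampersands("http://example.edu/cgi-bin/search?db=1&term=web")
print(f'<A HREF="{href}">Search</A>')
# <A HREF="http://example.edu/cgi-bin/search?db=1&amp;term=web">Search</A>
```

A query string that happens to look like a reference (say `?x&copy;`) is left untouched by this sketch, so treat it as a first pass rather than a complete escaper.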
> 
> Third, there is no HTML DTD which is really current with accepted practice.
> The current standard--to the extent the term "standard" applies--is HTML
> 3.2, which was specifically written to describe popular browser behavior as
> of early- to mid-1996.  The experimental Cougar DTD has only recently been
> updated, after nine months without a revision; I confess I haven't had a
> chance to look at it closely, although I do notice that it adds the FRAMESET
> element for the first time.  Validating a Frames-based document against
> anything other than the newest Cougar draft will therefore produce errors.
> 
> Because of shortcomings with the available DTDs, there are times when it's
> perfectly defensible to write invalid HTML.  However, as with improvised
> jazz and abstract art, you should really know the rules before you set out
> to break them.  Also, if the rules you play by aren't written in a DTD,
> they aren't formally spelled out anywhere.
> 
> Conversely, validating HTML against a DTD can only turn up syntax
> errors.  You could write semantic gibberish (or at least reverse the WIDTH
> and HEIGHT attributes in an IMG tag): "<p>Colorless green ideas slept
> furiously.</p>" is perfectly valid.
> 
> So here's the state of validation on our home pages.  I expanded the number
> of pages checked over last time, and was also able to correctly validate
> against DTDs other than HTML 3.2 if specified in a DOCTYPE declaration.  A
> word to the wise: if your DOCTYPE is specifying HTML 2.0 or 3.0--or 2.1,
> whatever that is--you may want to check if that's really what you mean.  I
> took these documents at their word.
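Taking a document at its word amounts to reading the public identifier from its DOCTYPE declaration and validating against the matching DTD, presumably falling back to HTML 3.2 when there is none. A Python sketch of that selection step; the regular expression only handles the common `<!DOCTYPE HTML PUBLIC "...">` form, and `doctype_public_id` and `choose_dtd` are hypothetical helpers, not part of any validator:

```python
import re
from typing import Optional

# Matches the usual form: <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
DOCTYPE = re.compile(r'<!DOCTYPE\s+HTML\s+PUBLIC\s+"([^"]+)"', re.IGNORECASE)

# Assumed default when a page declares nothing
DEFAULT_PUBLIC_ID = "-//W3C//DTD HTML 3.2 Final//EN"


def doctype_public_id(html: str) -> Optional[str]:
    """Return the public identifier from the page's DOCTYPE, or None."""
    match = DOCTYPE.search(html)
    return match.group(1) if match else None


def choose_dtd(html: str) -> str:
    """Take the page at its word; fall back to HTML 3.2 if it says nothing."""
    return doctype_public_id(html) or DEFAULT_PUBLIC_ID


page = '<!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML 2.0//EN">\n<HTML>...</HTML>'
print(choose_dtd(page))             # -//IETF//DTD HTML 2.0//EN
print(choose_dtd("<HTML></HTML>"))  # -//W3C//DTD HTML 3.2 Final//EN
```

This is why a stale DOCTYPE matters: a page that names HTML 2.0 gets held to 2.0, and every 3.2 feature it uses shows up as an error.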
> 
> 
> VALIDATION STATS
> 
> Pages checked: 1114 (Libweb's listings for U.S. and Canadian libraries as of 4/24)
> Average/Median number of errors: 24 / 13 (compared to 20 / 13 in February)
> Number of pages with zero errors: 77, or 7% (4% in February)
> Number of pages with three or fewer errors: 236, or 21% (16% in February)
> Number of pages with 40 or more errors: 186, or 17% (14% in February)
> Number of pages with 80 or more errors: 57, or 5% (2% in February)
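The figures above are simple summaries over one error count per page. Assuming the validator's output has already been reduced to such a list (the sample counts below are made up, not Libweb data), a Python sketch of the arithmetic, with hypothetical helper names `share` and `summarize`:

```python
from statistics import mean, median
from typing import Dict, Sequence, Tuple


def share(counts: Sequence[int], keep) -> Tuple[int, int]:
    """Number of pages satisfying `keep`, and that number as a whole percent."""
    hits = sum(1 for c in counts if keep(c))
    return hits, round(100 * hits / len(counts))


def summarize(counts: Sequence[int]) -> Dict[str, object]:
    return {
        "pages": len(counts),
        "average": round(mean(counts)),
        "median": median(counts),
        "zero errors": share(counts, lambda c: c == 0),
        "three or fewer": share(counts, lambda c: c <= 3),
        "40 or more": share(counts, lambda c: c >= 40),
        "80 or more": share(counts, lambda c: c >= 80),
    }


sample = [0, 2, 13, 45, 90]  # hypothetical per-page error counts
for label, value in summarize(sample).items():
    print(f"{label}: {value}")
```

With the real totals, 77 of 1114 pages is 6.9%, which rounds to the 7% reported above; the gap between the average (24) and the median (13) reflects a long tail of pages with many errors.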
> 
> Number of pages that specified a DOCTYPE: 243, or 22%
> Complete list is at <URL:http://gold.ohiolink.edu/tdowling/libpages/doctypes.html>
> 
> 
> HTML EDITOR STATS
> 
> Since I was looking at people's pages, I took the opportunity to see what
> HTML editors were identifying themselves in their pages.  Out of 1114
> pages, I could find 164 that identified an HTML editor.  None of these
> showed HoTMetaL or Hot Dog; do these programs identify themselves in the
> HTML source in any way?
> 
>   Netscape Gold       72
>   MS FrontPage        51
>   Adobe PageMill      16
>   MS Word 97/
>     Publisher 97/
>     I'net Assistant    9
>   Claris Home Page     5
> 
> Complete list is at <URL:http://gold.ohiolink.edu/tdowling/libpages/generators.html>
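As for how such identification might be found: editors such as FrontPage and Netscape Gold typically write a META element like `<META NAME="GENERATOR" CONTENT="...">` into the document head, and it seems likely (though the post doesn't say so) that this is what was counted. Assuming that, Python's standard `html.parser` module can pull the value out; `GeneratorFinder` and `find_generator` are hypothetical names, and the sample page is invented:

```python
from html.parser import HTMLParser
from typing import Optional


class GeneratorFinder(HTMLParser):
    """Records the CONTENT of the first <META NAME="GENERATOR"> element."""

    def __init__(self) -> None:
        super().__init__()
        self.generator: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag and attribute names, but not values
        if tag != "meta" or self.generator is not None:
            return
        fields = dict(attrs)
        if (fields.get("name") or "").lower() == "generator":
            self.generator = fields.get("content")


def find_generator(html: str) -> Optional[str]:
    finder = GeneratorFinder()
    finder.feed(html)
    finder.close()
    return finder.generator


page = '<HTML><HEAD><META NAME="GENERATOR" CONTENT="Microsoft FrontPage 2.0"></HEAD></HTML>'
print(find_generator(page))  # Microsoft FrontPage 2.0
```

A page whose editor writes no such tag, or whose tag was removed by hand, simply returns None, so counts gathered this way are a lower bound.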
> 
> 
> SERVER STATS
> 
> We're hearing from at least one database vendor that we should change our
> Web server from NCSA to either Apache or Netscape Enterprise.  That
> naturally made me curious about what other people were using:
> 
>   NCSA        25%
>   Apache      22%
>   Netscape    21%
>   CERN         6%
>   MS IIS       5%
>   WebStar      4%
>   WebSite      3%
>   OSU          3%
> 
> The complete list is at <URL:http://gold.ohiolink.edu/tdowling/libpages/servers.html>
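Server software can usually be identified from the `Server` header of an HTTP response, which is presumably how figures like these are gathered. A Python sketch under that assumption: `fetch_server_header` sends a HEAD request with the standard `urllib.request` module, and `server_family` buckets the header into the categories above. All three function names are hypothetical, and the prefix table is an assumption about how each product names itself, not a verified list.

```python
from collections import Counter
from typing import Dict, Iterable, Optional
from urllib.request import Request, urlopen

# Assumed product-token prefixes for the categories in the table above
FAMILIES = {
    "ncsa": "NCSA",
    "apache": "Apache",
    "netscape": "Netscape",
    "cern": "CERN",
    "microsoft-iis": "MS IIS",
    "webstar": "WebStar",
    "website": "WebSite",
    "osu": "OSU",
}


def fetch_server_header(url: str) -> Optional[str]:
    """Send a HEAD request and return the Server header, if any."""
    with urlopen(Request(url, method="HEAD"), timeout=10) as response:
        return response.headers.get("Server")


def server_family(header: Optional[str]) -> str:
    """Map e.g. 'Netscape-Enterprise/2.01' to 'Netscape'."""
    tokens = (header or "").split("/")[0].split()
    if not tokens:
        return "Unknown"
    product = tokens[0].lower()
    for prefix, label in FAMILIES.items():
        if product.startswith(prefix):
            return label
    return "Other"


def shares(headers: Iterable[Optional[str]]) -> Dict[str, int]:
    """Whole-number percentage of sites in each family, largest first."""
    counts = Counter(server_family(h) for h in headers)
    total = sum(counts.values())
    return {label: round(100 * n / total) for label, n in counts.most_common()}


print(shares(["Apache/1.2b8", "NCSA/1.5.2", "Netscape-Enterprise/2.01", "Apache/1.1.3"]))
```

Some servers answer HEAD badly or not at all, so a real survey would want a GET fallback and error handling; the sketch leaves that out.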
> 
> Note that this is *very* different from the stats reported by Netcraft at
> <URL:http://www.netcraft.co.uk/Survey/> for the net as a whole, which shows
> Apache in the mid-40% range and IIS next at around 15%.
>
> BTW, does anyone know why Netcraft no longer provides subtotals for the
> .edu domain?
> 
> 
> Thomas Dowling
> OhioLINK - Ohio Library and Information Network
> tdowling at ohiolink.edu
> 
> 

