[Web4lib] the efficacy of filtering software

Lori Ayre loriayre at gmail.com
Wed Jul 5 11:35:44 EDT 2006


Hi John,

I've done a lot of research on filters and written a report for Library
Technology Reports on my findings.  Your description of how they work is
accurate for the most part, although the filters are getting a tad more
sophisticated than simply seeking out keywords.  They now have algorithms
that evaluate context, look at the ratio of certain words to other words,
and examine the layout of the pages (in terms of images and other page
components).
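As a rough illustration of the difference between plain keyword matching and the ratio- and layout-based analysis described above, here is a minimal Python sketch. The word lists, the weights and the 0.5 threshold are all hypothetical. Commercial products use proprietary and far more elaborate methods, so this shows the idea only, not how any particular product works.

```python
import re

# Hypothetical word lists; real products use proprietary, much larger lists.
FLAGGED_WORDS = {"breast", "nude", "sex", "porn"}
CONTEXT_WORDS = {"cancer", "health", "medical", "screening"}


def page_score(text: str, image_count: int) -> float:
    """Return a rough 0..1 score for one page (illustrative only)."""
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        # No text at all: rely on the layout signal alone.
        return 1.0 if image_count > 0 else 0.0
    flagged = sum(w in FLAGGED_WORDS for w in words)
    context = sum(w in CONTEXT_WORDS for w in words)
    # Word-ratio signal: flagged words relative to all words, reduced by
    # context words (so a breast cancer page is not treated like porn).
    ratio = max(0.0, (flagged - context) / len(words))
    word_signal = min(1.0, ratio * 10)
    # Layout signal: many images relative to the amount of text.
    layout_signal = min(1.0, image_count * 10 / len(words))
    return 0.7 * word_signal + 0.3 * layout_signal


def should_block(text: str, image_count: int, threshold: float = 0.5) -> bool:
    return page_score(text, image_count) >= threshold


print(should_block("Breast cancer health screening information", 0))  # False
print(should_block("nude nude sex pics", 20))  # True
```

A filter that blocked on "any flagged keyword present" would block the first page, because it contains "breast". Weighing the word against its context is what lets the scoring approach avoid that kind of mistake.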

Products like NetNanny are among the most simplistic of all products and
work just as you describe.  The higher-end products don't rely solely on
the so-called black list (I call it a block list) but also do their analysis
on the fly, so that each site a person accesses is actually evaluated.  This
makes it more likely that categories of sites you wish to block will in fact
be blocked (and of course increases the likelihood that something you don't
mean to block will be blocked).
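The two approaches can be contrasted in a short Python sketch. A list-only filter checks the requested host (and its parent domains) against a downloaded block list. A "dynamic" filter does the same, then falls back to evaluating the page itself when the host is not on the list. The block list entries and the `analyze` callback are hypothetical placeholders. No vendor's actual list format or API is implied.

```python
from typing import Callable
from urllib.parse import urlparse

# Hypothetical block list as it might be downloaded from a vendor.
BLOCK_LIST = {"adult.example", "badsite.example"}


def on_block_list(url: str) -> bool:
    """List-only check: is the host, or any parent domain, on the list?"""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # For "www.adult.example" this checks "www.adult.example",
    # "adult.example" and "example".
    return any(".".join(parts[i:]) in BLOCK_LIST for i in range(len(parts)))


def dynamic_filter(url: str, page_text: str, analyze: Callable[[str], bool]) -> bool:
    """Consult the list first, then evaluate the page content on the fly."""
    if on_block_list(url):
        return True
    return analyze(page_text)


# A brand-new site is not on the list yet, so a list-only filter lets it through...
print(on_block_list("http://new-site.example/"))  # False
# ...while on-the-fly analysis (here a trivial stand-in) can still catch it.
print(dynamic_filter("http://new-site.example/", "xxx pics", lambda t: "xxx" in t))  # True
```

The on-the-fly step is also a source of overblocking. Whatever `analyze` gets wrong is blocked, even though no one ever put that site on a list.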

In my initial analysis of error rates for software, I came up with an error
rate of 15%.  My study was small and unscientific, but the figure did seem to
hold across the board, even with what I considered the higher-end products.
What I mean by 15% is that the filters will be 85% accurate at blocking the
sites you want blocked, so 15% of those sites will get through.  Conversely,
15% of the items blocked are items that you did not intend to block; they
get blocked by the filter because they are coded incorrectly by the company
(in my opinion).
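One way to keep the two 15% figures apart is to tally them separately. Underblocking covers sites you wanted blocked that get through. Overblocking covers items that were blocked even though you did not intend to block them. The counts in this Python sketch are illustrative numbers chosen to reproduce a 15% rate. They are not data from the study.

```python
from dataclasses import dataclass


@dataclass
class FilterTally:
    caught: int           # sites you wanted blocked that were blocked
    missed: int           # sites you wanted blocked that got through
    wrongly_blocked: int  # sites you did not want blocked that were blocked

    def underblocking_rate(self) -> float:
        """Share of the sites you wanted blocked that got through."""
        return self.missed / (self.caught + self.missed)

    def overblocking_rate(self) -> float:
        """Share of everything blocked that you did not intend to block."""
        return self.wrongly_blocked / (self.caught + self.wrongly_blocked)


# Illustrative counts only.
tally = FilterTally(caught=85, missed=15, wrongly_blocked=15)
print(tally.underblocking_rate())  # 0.15
print(tally.overblocking_rate())   # 0.15
```

With these counts both rates come out at 15%, but the two numbers are measured independently. Making a filter stricter will usually lower the first and raise the second.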

Different products have different agendas.  Some products are more
customizable by the user.  They can be useful in the right situation, but
they are far from perfect.  And by no means should you rely on the software
company's default settings.

For more on the filters I know something about, see
http://libraryfiltering.org.  In the Links menu, I've provided links to studies
and to other reviews of filters.  I apologize that I haven't updated the
Links page in a while, so it isn't as current as it should be.  For more
current info, check out my filtering category on Mentat (my blog):
http://www.galecia.com/weblog/mt/archives/cat_filtering.php.

Hope that helps!

Lori Ayre


On 7/3/06, John Fitzgibbon <jfitzgibbon at galwaylibrary.ie> wrote:
>
> Hi,
>
> I just wish to use the mailing list as a sounding board to test if my
> understanding of how filtering software works is correct.
>
> Filtering software companies use software to retrieve web pages as
> search engines do. The retrieved web page is scanned for blocked
> keywords. If the page contains any of the blocked keywords, the address
> of the web site is added to a list of blocked sites.
>
> This so-called black list is being updated all the time. The list is
> downloaded onto the user's PC from time to time and whenever an http
> request is made, this list is first checked to determine if the request
> will be permitted.
>
> Because the Web is vast and ever changing, it is not feasible to keep the
> black list up to date.
>
> Filtering software will fail to block a large number of pornographic
> sites.
>
> Filtering software may, therefore, give parents a false sense of
> security; it may give them a mistaken belief that all pornographic sites
> are blocked.
>
> Is my analysis correct? If a number of sexual swear words are entered
> into a search engine, what proportion of the sites returned is not
> blocked by the main filtering software? Would it be one site per ten,
> i.e. one site on every result page would be accessible, or would it be
> much less than this?
>
> Are there any studies on this?
>
> Is it the case that filtering software which is based on black lists can
> successfully prevent innocent children from stumbling across porn but does
> not stop people who are determinedly hunting for porn from finding it?
>
> I would welcome any feedback.
>
> Regards
> John
>
> John Fitzgibbon
>
> p: 00 353 91 562471
> f: 00 353 91 565039
> w: http://www.galwaylibrary.ie
>



-- 
============================
Lori Bowen Ayre (via gmail)

visit Mentat, my blog, at http://galecia.com/weblog

