Proxy Server vs. URL Rewriter for Authentication

Chris Zagar zagar at usefulutilities.com
Mon Jan 24 03:34:42 EST 2000


JQ Johnson raised good and insightful questions in his message regarding
proxy servers.  As the author of EZproxy, I've answered them from the
perspective of that program.

> My theoretical understanding is that the key advantage of such rewriting
> proxy servers is that they require no client configuration.  That means
> you can deploy them and count on them doing their job even when your
> patron is unprepared to set a proxy server configuration in her browser
> (for example, because he isn't technically competent, or because she is
> already behind a mandated corporate firewall proxy server).

No client configuration is indeed the main advantage.  Unfortunately, they
don't always work behind mandated corporate firewalls, since they may run
on alternative ports that the corporate firewall also blocks for outgoing
traffic.  This doesn't tend to be an issue with ISPs, but I have received
a report from one institution where this prevented access.

> On the other hand, rewriting proxy servers work by tricking the
> protocol, and so may not be successful in proxying complex web sites.  
> They almost certainly won't work if the site provides web pages that
> contain javascript-generated links, or links that appear in anything
> except HTML documents (PDF files can have hyperlinks from them too). 

I have to admit that I have learned a great deal about the true HTTP and
HTML specifications as a result of writing this program.  It's all there
in the spec, but how many of you really knew that an href value of
"//www.cnn.com/WORLD" without the http protocol header was a perfectly
valid way to say "transfer this file using the same protocol under which
the current document was loaded"?
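
To make the point concrete, here is a small Python sketch (mine, not
EZproxy code) showing how a rewriting proxy has to resolve such a
scheme-relative href against the page it appeared in before it can decide
how to rewrite it:

    from urllib.parse import urljoin

    def resolve_href(base_url, href):
        # Per the spec, an href beginning with "//" inherits the scheme
        # (http or https) of the document that contains it.
        return urljoin(base_url, href)

    print(resolve_href("http://www.cnn.com/index.html", "//www.cnn.com/WORLD"))
    # -> http://www.cnn.com/WORLD
    print(resolve_href("https://www.cnn.com/index.html", "//www.cnn.com/WORLD"))
    # -> https://www.cnn.com/WORLD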

JavaScript is indeed one thing that can throw these servers off.  The
Brown University method of using port numbers to represent database
servers is extremely clever in addressing the support of absolute URLs
such as href="/main/page.html".  EZproxy also has the option of allowing
anything that looks like a URL to be translated (e.g. something that
starts with http:// and is followed by a host name that is to be proxied),
which has been needed for a small number of database vendors that generate
absolute URLs.  It wouldn't work if someone pieced things together such as
"http://" + "www.site.com".  From a practical standpoint, it hasn't been a
problem for any institution using EZproxy, but that is the type of thing
that could throw off this type of proxy server.
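
Here is a rough Python sketch of that "rewrite anything that looks like a
URL" idea (my own illustration; the host names, proxy port, and regular
expression are invented, not EZproxy's actual behavior).  It also shows
why JavaScript that pieces a URL together defeats the rewrite:

    import re

    PROXIED_HOSTS = {"www.site.com"}          # vendor hosts we are allowed to rewrite
    PROXY_HOST = "ezproxy.example.edu:2048"   # hypothetical proxy host and port

    URL_RE = re.compile(r'http://([A-Za-z0-9.-]+)(/[^\s"\'<>]*)?')

    def rewrite(html):
        def repl(match):
            host, path = match.group(1), match.group(2) or "/"
            if host in PROXIED_HOSTS:
                # A real rewriter must also remember which vendor host the
                # path belongs to, e.g. by giving each host its own proxy
                # port as in the Brown approach mentioned above.
                return "http://%s%s" % (PROXY_HOST, path)
            return match.group(0)             # leave other hosts untouched
        return URL_RE.sub(repl, html)

    print(rewrite('<a href="http://www.site.com/search?q=x">go</a>'))
    # The pieced-together form never matches, so it slips through unchanged:
    print(rewrite('var u = "http://" + "www.site.com";'))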

URLs that are contained in other formats such as PDF are likely to be beyond
what one of these products can handle.  The only mitigating factor here
is that PDF files are unlikely to contain URLs back to protected content,
since they are most frequently static files.  Other content could pose a
problem in the future.
 
> Depending on the particular rewriting server, they may not even handle
> such simple constructs as URLs in (or to) cascading style sheets.  To
> handle domain-based cookies seems to require that the rewriting proxy
> server maintain a notion of a client session.

Both of these are definitely issues that must be factored into this type
of server, and both are handled by EZproxy.
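
For the cookie case, here is a minimal Python sketch (hypothetical, not
how EZproxy is written) of why domain-based cookies force the proxy to
track a client session: the vendor sets a cookie for its own domain, but
the browser only ever talks to the proxy's domain, so the proxy has to
hold the cookie itself and replay it on the user's later requests:

    cookie_jars = {}   # proxy session id -> {vendor cookie name: value}

    def store_vendor_cookie(session_id, set_cookie_header):
        # e.g. "SESS=abc123; Domain=.vendor.com; Path=/"
        name_value = set_cookie_header.split(";", 1)[0]
        name, value = name_value.split("=", 1)
        cookie_jars.setdefault(session_id, {})[name.strip()] = value.strip()

    def outgoing_cookie_header(session_id):
        jar = cookie_jars.get(session_id, {})
        return "; ".join("%s=%s" % (k, v) for k, v in jar.items())

    store_vendor_cookie("sess42", "SESS=abc123; Domain=.vendor.com; Path=/")
    print(outgoing_cookie_header("sess42"))   # -> SESS=abc123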

> Any rewriting of web
> pages invalidates byte-range headers in a client request; do the
> rewriting proxy servers handle this gracefully?  Etc.

EZproxy discards these byte-range requests, as does the Brown solution.
The HTTP spec then requires that the client be prepared to deal with the
possibility that it may receive back the entire content instead of just the
selection requested, which is why this works.  Admittedly, this doesn't
allow for the graceful resumption of long transfers that many browsers are
capable of performing.
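
In code terms, the handling amounts to something like this Python sketch
(again an illustration of the idea, not EZproxy's source): since rewriting
changes the length of the page, the byte offsets in a client's Range
header no longer line up, so the safest move is to drop the header and let
the full document come back, which HTTP permits:

    def filter_request_headers(headers):
        # Drop byte-range headers; the origin server will then return the
        # complete entity with a 200 response instead of a partial 206.
        return {k: v for k, v in headers.items()
                if k.lower() not in ("range", "if-range")}

    request = {"Host": "www.site.com", "Range": "bytes=5000-9999"}
    print(filter_request_headers(request))   # Range removed -> full content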

> The rewriting
> proxy servers that we might look at have generally been tested against
> the sites we care about (a few database providers); as a practical
> matter, do they handle all the special cases that are relevant to
> those sites, and can we count on those sites not using newfangled HTML
> constructs that the rewriting proxy doesn't understand?

In this area more than any other, the risks are clearly greater for URL
rewriting proxy servers than for traditional proxy servers.  Here, the only
things that are likely to disrupt a functioning traditional proxy server
are plugins that communicate directly with the database vendor's computer
instead of operating through the browser environment.  That possibility is
likely to break all proxy servers.

Now I must digress a bit.  There are those who like proxy solutions for
their convenience, and those who hate them vehemently since they cause all
off-site user traffic to pass through your local network at least once,
and often twice (once if you have a caching server that is able to serve
stored copies of the content from your database vendors' servers, but more
often twice since the content is dynamic).  In addition to proxy servers,
there are referer header parsing authentication schemes offered by some
vendors, as well as remote verification schemes in which the database
vendor collects authentication information and then queries one of your
servers to check that it is valid.
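
For what the referer-based schemes look like in practice, here is a tiny
Python sketch (my own hypothetical example of such a vendor-side check,
not any particular vendor's code); it also shows how a stripped referer
header locks out a legitimate user:

    AUTHORIZED_REFERERS = ("http://library.example.edu/databases/",)

    def referer_grants_access(headers):
        referer = headers.get("Referer", "")
        return referer.startswith(AUTHORIZED_REFERERS)

    print(referer_grants_access(
        {"Referer": "http://library.example.edu/databases/list.html"}))  # True
    print(referer_grants_access({}))  # False: header stripped, access denied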

Here's my version of what I would like to see evolve (even if it does rank
up there with world peace and a cure for cancer in "impossibility
factors"):

1. Ability to use the same web page (or at least what appears to be the
   same web page) for both on-site and remote users.

2. Authentication performed completely by institution-owned systems,
   not by having it collected by a remote database vendor and verified by
   that system checking back (it really bothers me to share patron secrets
   with remote sites, especially if the information collected is enough to
   compromise the patron's records).

3. Authentication occurs in a consistent manner for all database access,
   and occurs only once in a user session unless some type of per-search
   pricing strategy requires extra safeguards for cost reasons.

4. No dependence on referer headers, since they can be stripped out by
   user choice, proxy servers at the user's ISP, etc., making them
   problematic at times.

5. No proxy server required, so all traffic is directed as efficiently
   as possible directly between the user's computer and the vendor's
   database.

6. Usage reporting data from the vendor to help justify database cost,
   preferably with some means to break out remote access.

EZproxy does a reasonable job of the first 4, although I'd really like to
see a world where number 5 eliminated the need for these products.  Proxy
servers tend to help answer 6 when the vendor doesn't, so that would need
to be addressed if proxy servers were eliminated.  I know that MIT is
working towards such ends, using digital certificates as a means to
authenticate valid users to remote database providers.  I don't know how
enthusiastic database vendors would be about adopting such systems, since
doing so could affect their competitive edge.  If remote access were
actually easy, it might also drive them to further increase pricing,
although the difficulties they face in distinguishing their products from
the otherwise "free" content on the Internet might keep them from trying
this.

All remote access solutions involve authentication issues, local hardware
and software choices, database vendor issues, and remote user support, all
of which impact staffing and budget. There are a variety of solutions
available.  It takes some effort to analyze the issues and chart a course
of action based on your resources, but remote access is a cost-effective
way to extend your resources and further meet the needs of your very busy
users.

Chris

--------------------------------------------------------------------------
Chris Zagar, MALS                                         Useful Utilities
zagar at UsefulUtilities.com             PO Box 6371, Glendale, AZ 85312-6371
http://www.UsefulUtilities.com      FAX: (888) 282-9754 or +1.603.925.8961 


