Proxy server help

Jim Green jfgreen at pilot.msu.edu
Mon Jul 6 09:21:59 EDT 1998


I am forwarding this question from Jim LeBay, Project Leader of our
(university-wide) web group here in the Computer Center at Michigan State.
He has been assigned to set up a way of providing MSU-affiliated
users with remote access (i.e., access from IP addresses other than those
given to the vendors) to web-accessible resources (such as Lexis-Nexis,
ProQuest Direct, IAC SearchBank, etc.) licensed by the MSU Libraries, all of
which rely on IP authentication by the vendors.

Other institutions seem to be using "proxy servers" to accomplish this.  I
searched the Web4Lib archive and found some postings from mid-June pointing
to user documentation at Northwestern, Penn State, U. of California Irvine,
and U. of Wisconsin.  However, for the reasons stated below, Jim believes
that a "normal" proxy will not work for us.

Can anyone help Jim with the technical details of existing proxy
implementations, such as those listed above?  Thanks in advance for
your help.

Respond directly to both of us, or to the list if appropriate:

Jim Green, Project Leader, Library Support Services, jfgreen at pilot.msu.edu
Jim C. LeBay, Project Leader, Web Group, lebay at pilot.msu.edu

>I need to do a similar thing for a different reason.  I have been
>considering both Apache's mod_proxy and a squid accelerator, but so far
>neither seems quite up to the task at hand.
>
>Scenario:  Our university has purchased access to a particular web-based
>database.  The license allows access to all 55,000 of our faculty, staff
>and students, but the vendor controls access by IP address range only.  In
>order to give our people access from off-campus machines, I want to use an
>HTTP gateway that authenticates the user, and "proxies" all traffic
>between the user's browser and the vendor's HTTP server.
>
>A normal HTTP proxy might work fine, if we had a closed network and a
>firewall, but we don't.  Reconfiguring every user's browser to use a
>full-time proxy is out of the question.  Even using an auto proxy would
>be a major problem.
>
>I want to tell our users that database XYZ is available on my server at
>"http://xyz.msu.edu".  I want my gateway to handle all requests, and fetch
>the corresponding data from "http://dbserv.xyz.com".  I don't want the
>user to ever access the vendor's server directly.
>
>Squid in accelerator mode handles part of the problem, but it doesn't do
>user authentication (AFAIK).  I'd prefer to use Apache, with our local
>auth module, and "ProxyPass / http://dbserv.xyz.com" seems to do the same
>job as a squid accelerator.
>
>But the major loophole in any such solution is that I have no control over
>the documents generated by the database vendor.  Any HTML file with an
>absolute URL would cause the browser to bypass the gateway and get "access
>denied" by the vendor's server.  I think I need to parse every HTML file,
>and rewrite every tag that might contain a URL (e.g. "HREF=", "BACKGROUND=",
>"ACTION=", etc). This would be a slow and error-prone task.  And because a
>module cannot filter the output of a response handler (yet), I'll probably
>have to rewrite mod_proxy.
>
>Any suggestions?  Has anyone else dealt with a similar problem?  Can
>anyone think of a better way?
>
>Thanks.
>
>-----
>Jim C. LeBay  -  Michigan State University  -  Computer Laboratory
>   ------
>  "The day Microsoft makes something that doesn't suck is probably
>   the day they start making vacuum cleaners" - Ernst Jan Plugge
>
>
>
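
For anyone picturing the rewriting step Jim describes, here is a rough
sketch in Python, offered only as an illustration.  It is not Jim's
implementation (his message talks about modifying Apache's mod_proxy), and
it is not a complete solution.  The host names are the placeholders from
his message; the function name rewrite_urls and the attribute list are
assumptions made for the example.  A regular expression is a shortcut here
and will miss URLs inside scripts, META refresh tags, and HTTP redirect
headers.

```python
import re

# Placeholder hosts taken from the quoted message; substitute real ones.
VENDOR_BASE = "http://dbserv.xyz.com"
GATEWAY_BASE = "http://xyz.msu.edu"

# Attributes that commonly carry URLs. Illustrative only, not exhaustive.
URL_ATTRS = ("href", "src", "background", "action")

# Matches attr=value where the value is quoted or unquoted.
_ATTR_RE = re.compile(
    r"""\b(?P<attr>%s)(?P<eq>\s*=\s*)(?P<quote>["']?)(?P<url>[^"'\s>]+)(?P=quote)"""
    % "|".join(URL_ATTRS),
    re.IGNORECASE,
)


def rewrite_urls(html, vendor=VENDOR_BASE, gateway=GATEWAY_BASE):
    """Point absolute vendor URLs in HTML attributes at the gateway host."""

    def fix(match):
        url = match.group("url")
        if url.lower().startswith(vendor.lower()):
            rest = url[len(vendor):]
            # Only rewrite when the vendor host ends here, so that a host
            # such as dbserv.xyz.com.example.org is left untouched.
            if rest == "" or rest[0] in "/?#":
                url = gateway + rest
        quote = match.group("quote")
        return match.group("attr") + match.group("eq") + quote + url + quote

    return _ATTR_RE.sub(fix, html)


if __name__ == "__main__":
    page = '<A HREF="http://dbserv.xyz.com/search?q=1">Search</A>'
    print(rewrite_urls(page))
```

Relative URLs are left alone on purpose: the browser already resolves them
against the gateway host, so only absolute links to the vendor need to be
changed.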


