Fwd: [WEB4LIB] Huge file delivery

Jeremy Dunck jdunck at gmail.com
Thu Apr 28 19:55:30 EDT 2005


Forgot to reply-all.

---------- Forwarded message ----------
From: Jeremy Dunck <jdunck at gmail.com>
Date: Apr 28, 2005 4:01 PM
Subject: Re: [WEB4LIB] Huge file delivery
To: abullen at ameritech.net


On 4/28/05, A. Bullen <abullen at ameritech.net> wrote:
> All--
>
> Forgive a naive question, but I have never had to deal with the
> following situation and I don't know how to pull it off. We will be in
> receipt of a very large GPS data set consisting of files that total 1
> terabyte altogether; I think the individual data sets are 20-30 GB
> apiece.
>
> Does anyone have a suggestion how I can successfully distribute files
> this large on an on-demand basis? I can put them on servers that share a
> T-3, but I am not sure FTP can handle this size and scope of file
> transmission.

One of your problems will be getting files that size across intact.
For this, BitTorrent is a good option, because it checks each chunk
against a hash and re-fetches any chunk that fails.  It's also nice
because the more downloaders you have, the more bandwidth is
available to other downloaders.
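
To make the chunk verification idea concrete, here is a rough Python
sketch.  It is not BitTorrent's actual code: the piece size and the
function names (piece_hashes, bad_pieces) are my own assumptions for
illustration.  The only part taken from the protocol is that each
piece is checked against a SHA-1 hash published up front.

```python
import hashlib

# Assumed piece size for this sketch; real torrents pick their own.
PIECE_SIZE = 256 * 1024


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list[bytes]:
    """Split data into fixed-size pieces and return one SHA-1 digest per piece."""
    return [
        hashlib.sha1(data[i:i + piece_size]).digest()
        for i in range(0, len(data), piece_size)
    ]


def bad_pieces(data: bytes, expected: list[bytes],
               piece_size: int = PIECE_SIZE) -> list[int]:
    """Return the indexes of pieces whose hash differs from the published one.

    Pieces that are missing entirely (data shorter than expected) are
    reported as bad too.
    """
    actual = piece_hashes(data, piece_size)
    bad = []
    for index, want in enumerate(expected):
        have = actual[index] if index < len(actual) else None
        if have != want:
            bad.append(index)
    return bad
```

The point is that a damaged download costs you one re-fetched piece,
not the whole 20-30 GB file: only the indexes that bad_pieces()
returns need to be requested again.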

BitTorrent has been around a while and it's been somewhat difficult to
set up and maintain a tracker, but BlogTorrent is changing that:
http://www.blogtorrent.com/

If you want a simpler solution, you might want to use Coral CDN, which
is a transparent mirroring system.

If you want to mirror http://example.com/your/path/, you'd publish
your content as http://example.com.nyud.net:8090/your/path/.

That's all there is to it.  More info here:
http://www.coralcdn.org/
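
If you have a lot of links to convert, the rewrite above is easy to
script.  A minimal Python sketch, assuming the only change needed is
appending ".nyud.net:8090" to the hostname as in the example; the
function name coralize is made up, and this version drops any port
that was on the original URL:

```python
from urllib.parse import urlsplit, urlunsplit

# Suffix taken from the example URL above.
CORAL_SUFFIX = ".nyud.net:8090"


def coralize(url: str) -> str:
    """Rewrite a URL so the request goes through Coral instead of the origin."""
    parts = urlsplit(url)
    if parts.hostname is None:
        raise ValueError("URL has no hostname: %r" % url)
    # Assumption: any explicit port on the original URL is discarded.
    netloc = parts.hostname + CORAL_SUFFIX
    return urlunsplit(
        (parts.scheme, netloc, parts.path, parts.query, parts.fragment)
    )
```

For example, coralize("http://example.com/your/path/") gives back
"http://example.com.nyud.net:8090/your/path/".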

Coral does nothing to validate the transfer, though, so some of your
users will still end up with failed downloads, and you'll have higher
bills.  I think you're right that simple HTTP/FTP will have trouble.

As a data point, I am currently downloading (over HTTP) the data for
en.wikipedia.org, which is about 30 GB.  Lots of web utilities (wget,
for example) just don't cope with a file that size.

Yet another idea is Jigdo, which is how Debian distributes their
software over the web.
http://atterer.net/jigdo/
That takes care of the transmission problem, but it's a bit fiddly and techie.

That's what I can think of...



