[WEB4LIB] Huge file delivery

Alnisa Allgood alnisa at nonprofit-tech.org
Fri Apr 29 13:52:00 EDT 2005


At 1:39 PM -0700 4/28/05, A. Bullen wrote:
>All--
>
>Forgive a naive question, but I have never had to deal with the
>following situation and I don't know how to pull it off. We will be in
>receipt of a very large GPS data set consisting of files that total 1
>terabyte altogether; I think the individual data sets are 20-30 GB
>apiece.
>
>Does anyone have a suggestion how I can successfully distribute files
>this large on an on-demand basis? I can put them on servers that share a
>T-3, but I am not sure FTP can handle this size and scope of file
>transmission.

Just a question: will the users need access to the entire data 
set, or would it be reasonable to set up an interface that allows 
them to pull data from the data set based on a query? Also, I'm 
assuming that the data doesn't have any HIPAA limitations on it.

I ask because you could possibly do dual distribution. Those 
who actually need the full data set could download it via 
FTP, and hopefully they also have a T3 connection, or they will be 
downloading for days. You could place some basic access information 
next to the file links (assuming people will come in through the web) 
indicating approximate download times: roughly 1-2 hours per 20-30 GB 
file on a T3 connection, a couple of days on a basic DSL connection, etc.
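
Here's the back-of-the-envelope math behind those estimates, as a 
quick Python sketch. The link speeds are nominal figures, and real 
throughput will be lower once you account for protocol overhead and 
contention, so treat the output as a floor:

# Rough download-time estimates; nominal link speeds in megabits/sec.
SPEEDS_MBPS = {"T3": 45.0, "T1": 1.5, "DSL": 1.0}

def hours_to_download(size_gb, link):
    bits = size_gb * 8 * 1000**3              # file size in bits
    seconds = bits / (SPEEDS_MBPS[link] * 1e6)
    return seconds / 3600.0

for link in SPEEDS_MBPS:
    print("25 GB over %s: ~%.1f hours" % (link, hours_to_download(25, link)))

# 25 GB over T3:  ~1.2 hours
# 25 GB over T1:  ~37.0 hours
# 25 GB over DSL: ~55.6 hours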

But for those who just need to pull a subset of the data, you could 
set up a web interface for the data itself that queries the data 
set and either displays records on the web or saves them as a 
comma-separated text file for download. Of course this is partially 
dependent on the actual data format, but I know a number of formats, 
like SAS and SPSS files, can be converted to Access or another 
database for web use.
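
A minimal sketch of the query-to-CSV idea, assuming the GPS records 
have been loaded into a database first (sqlite3 here just for 
illustration; the table name "gps_points" and its columns are 
hypothetical and would depend on the actual data):

import csv
import sqlite3
import sys

def export_subset(db_path, min_lat, max_lat, min_lon, max_lon, out=sys.stdout):
    """Write only the records inside a bounding box, as CSV."""
    conn = sqlite3.connect(db_path)
    cur = conn.execute(
        "SELECT id, lat, lon, elevation, recorded_at FROM gps_points "
        "WHERE lat BETWEEN ? AND ? AND lon BETWEEN ? AND ?",
        (min_lat, max_lat, min_lon, max_lon))
    writer = csv.writer(out)
    writer.writerow([col[0] for col in cur.description])  # header row
    writer.writerows(cur)                                 # matching rows only
    conn.close()

# e.g. export_subset("gps.db", 43.0, 43.2, -89.5, -89.3)

Hook something like that up behind a web form and most users never 
touch the full files at all.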

If most people only need 1,000 records out of a million or a billion, 
then something like this might work.

Alnisa
