[Web4lib] http batch downloader that will submit cookies and post data

John Fitzgibbon jfitzgibbon at Galwaylibrary.ie
Tue Aug 25 06:50:56 EDT 2009


Hi,

In the past, I have used a batch downloader such as Download Accelerator Plus to download a number of web pages. Each web page has a URL with a different query string. For example, suppose I wish to download the files:

http://www.somesite.com/?age=1
http://www.somesite.com/?age=2
...
http://www.somesite.com/?age=100

I can easily create a text file of such URLs and point the downloader at this file. The downloader then downloads each page in turn into a folder. I copy the resulting HTML files into one file and convert it to XML to extract the information I need.
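To make the workflow concrete, here is a minimal sketch of the same batch download written with the Python standard library rather than a downloader program. The function names, the `age` parameter default, and the `pageNNN.html` file naming are illustrative assumptions, and www.somesite.com is only the placeholder site from the example above.

```python
from pathlib import Path
from urllib.parse import urlencode
from urllib.request import urlopen


def build_urls(base_url, first, last, param="age"):
    """Return one URL per value of the query-string parameter (hypothetical helper)."""
    return [f"{base_url}?{urlencode({param: n})}" for n in range(first, last + 1)]


def download_all(urls, folder):
    """Fetch each URL in turn and save it as a numbered HTML file in the folder."""
    out_dir = Path(folder)
    out_dir.mkdir(parents=True, exist_ok=True)
    for index, url in enumerate(urls, start=1):
        with urlopen(url) as response:
            (out_dir / f"page{index:03d}.html").write_bytes(response.read())
```

Calling `download_all(build_urls("http://www.somesite.com/", 1, 100), "pages")` would then write one file per page into a `pages` folder, assuming the site answers plain GET requests.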

This approach will not work if the site requires a cookie to be submitted with each request. None of the downloaders I have tried will submit a cookie. Is there a downloader that will do this?
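For illustration, one way a script could submit a cookie with every request is to set the `Cookie` header itself. This is only a sketch using Python's standard `urllib.request`; the helper names and the cookie names and values in the usage note are made up, and a real site may expect cookies it issued during an earlier login.

```python
from urllib.request import Request, urlopen


def cookie_request(url, cookies):
    """Build a GET request that carries the given cookies in a Cookie header."""
    header = "; ".join(f"{name}={value}" for name, value in cookies.items())
    return Request(url, headers={"Cookie": header})


def fetch(request):
    """Send the request and return the response body as bytes."""
    with urlopen(request) as response:
        return response.read()
```

For example, `fetch(cookie_request("http://www.somesite.com/?age=1", {"sessionid": "abc123"}))` would send the hypothetical `sessionid` cookie along with the page request.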

Secondly, if the page uses the POST method rather than the GET method to submit data to the server, specifying a file of URLs will not suffice. Are there downloaders that can POST data to a web server in batch download mode?
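As a rough sketch of what batch POSTing might look like, the Python standard library can send URL-encoded form data in the request body instead of the query string. The helper names and the `age` field are assumptions for illustration; the actual field names would have to be taken from the form on the site in question.

```python
from urllib.parse import urlencode
from urllib.request import Request, urlopen


def post_request(url, fields):
    """Build a POST request whose body is the URL-encoded form fields."""
    body = urlencode(fields).encode("ascii")
    return Request(url, data=body)  # supplying data makes urllib use POST


def post_all(url, values, param="age"):
    """POST each value in turn and return the response bodies in order."""
    pages = []
    for value in values:
        with urlopen(post_request(url, {param: value})) as response:
            pages.append(response.read())
    return pages
```

Something like `post_all("http://www.somesite.com/", range(1, 101))` would then submit the values 1 to 100 one after another and collect the returned pages.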

I would appreciate any suggestions.


Regards,

John



John Fitzgibbon



w: www.galwaylibrary.ie

e: info at galwaylibrary.ie

p: 00 353 91 562471

f: 00 353 91 565039



