[WEB4LIB] Re: Best/cheapest Tool for Counting URLs on a Web Page
Karen G. Schneider
kgs at bluehighways.com
Sat Oct 12 22:48:33 EDT 2002
:I'm probably missing something here, but assuming they're your pages, that
:you can get to the source files, that they live close together
:in the file system, and that they are formatted with some consistency, why
:won't almost any good text editor or grep-like text utility running under
:the appropriate OS do the job as cheaply and easily as anything else? I
:regularly extract, unique, count, and check hundreds of thousands of such
:elements by such means.
:
:If they were on a Wintel machine, for example, you could use an editor
:like TextPad to search in files "*.html" in binary form for strings
:matching a pattern like this "<a [^<]*href="[^#][^>]+>"
:Likewise, if all you want is http links, just search for the pattern
:="http:[^"]+"
What you're missing (which perhaps was not clear) is that the pages are
not local. I think the advice to use special lynx commands will work
best for us, though if I run them through Linkscan I get the added
benefit of finding out which URLs are dead as well.
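
Since the pages are remote, a lynx-based sketch along those lines might be
(the URL is just an example, and the -dump and -listonly options are assumed
to be available in the installed lynx):

    # dump the page's link list, pull out the URLs, and count the unique ones
    lynx -dump -listonly http://lii.org/ | grep -Eo 'https?://[^ ]+' | sort -u | wc -l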
----------------------------------------------
Karen G. Schneider kgs at lii.org http://lii.org
Director, Librarians' Index to the Internet
lii.org New This Week: http://lii.org/ntw
lii.org: Information You Can Trust!
----------------------------------------------