[WEB4LIB] Re: Best/cheapest Tool for Counting URLs on a Web Page

Karen G. Schneider kgs at bluehighways.com
Sat Oct 12 22:48:33 EDT 2002


:I'm probably missing something here, but assuming they're your pages,
:that you can get to the source files, that they live close together
:in the file system, and that they are formatted with some consistency,
:why won't almost any good text editor or grep-like text utility
:running under the appropriate OS do the job as cheaply and easily as
:anything else? I regularly extract, unique, count, and check hundreds
:of thousands of such elements by such means.
:
:If they were on a Wintel machine, for example, you could use an editor
:like TextPad to search in files "*.html" in binary form for strings
:matching a pattern like this  "<a [^<]*href="[^#][^>]+>"
:Likewise, if all you want is http links just search for the pattern
:="http:[^"]+"

What you're missing (which perhaps was not clear) is that the pages are
not local.  I think the advice to use special lynx commands will work
best for us, though if I run them through Linkscan I also get the added
benefit of finding out which URLs are dead.
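
For anyone trying the same thing, the lynx route amounts to something
like this (a rough sketch, assuming lynx's -dump/-listonly options and
GNU grep, with a placeholder URL):

  # Fetch the remote page, dump only its list of links, then pull out
  # the http URLs, de-duplicate them, and count what's left.
  lynx -dump -listonly http://www.example.org/somepage.html \
    | grep -o 'http://[^ ]*' \
    | sort -u | wc -l

Linkscan goes a step further and reports which of those URLs are dead,
which is the extra piece lynx alone won't give you.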

----------------------------------------------
Karen G. Schneider kgs at lii.org  http://lii.org 
Director,    Librarians' Index to the Internet
lii.org  New This Week:     http://lii.org/ntw 
      lii.org: Information You Can Trust!
----------------------------------------------