HTML Hacking Scripts
Here are a few useful web-related programs I've written lately.
You might also want to look at this
interesting program by Abigail.
Shaking Up the Web
- latro
- Latro finds idiotic PC sites open to perl.exe?FMH.pl
abuse and reports their little problem.
HTML Munging
- churl
- Extract URLs from A and IMG tags and verify their validity;
currently only the FTP:, HTTP:, and FILE: schemata are recognized.
- striphtml
- Strip out all the HTML bits from a document, leaving (unformatted)
plain text in its wake.
- htdecom
- Strips out comments from an HTML document.
- htitle
- Retrieve the title from a URL.
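The HTML Munging programs themselves aren't shown here. As a rough illustration of the kind of work they do, the following is a minimal Python sketch built on the standard `html.parser` module. It is not the original code, and the names `HtmlMunger`, `munge`, `strip_tags`, `page_title`, and `strip_comments` are hypothetical. It works on an HTML string (fetching a page for htitle is left out) and covers only churl's extraction step, not its validity check. The comment stripper is a deliberately naive regex.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical: the URL schemes churl is described as checking.
WANTED_SCHEMES = {"ftp", "http", "file"}

# Naive comment matcher (htdecom-style). It does not understand <script>
# bodies or attribute values that happen to contain "<!--".
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)


class HtmlMunger(HTMLParser):
    """One pass over a document: A/IMG link targets, plain text, and title.

    Comments go to handle_comment(), which is not overridden here, so
    they never reach the text or the link list.
    """

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.urls = []      # href/src values with a wanted scheme
        self.text = []      # character data between tags
        self.title = ""
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "a" and attrs.get("href"):
            self._add_url(attrs["href"])
        elif tag == "img" and attrs.get("src"):
            self._add_url(attrs["src"])
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        self.text.append(data)

    def _add_url(self, url):
        # urlparse() lowercases the scheme, so "HTTP:" matches too.
        if urlparse(url).scheme in WANTED_SCHEMES:
            self.urls.append(url)


def munge(html):
    """Parse html and return the populated HtmlMunger."""
    parser = HtmlMunger()
    parser.feed(html)
    parser.close()
    return parser


def strip_tags(html):
    """striphtml-style: unformatted plain text with all markup removed."""
    return "".join(munge(html).text)


def page_title(html):
    """htitle-style: the contents of <title>, whitespace-trimmed."""
    return munge(html).title.strip()


def strip_comments(html):
    """htdecom-style: the document with <!-- ... --> comments removed."""
    return COMMENT_RE.sub("", html)
```

A parser-based approach is used for the text, title, and links because tags can span lines and carry arbitrary attributes. The sketch makes no attempt to drop `<script>` or `<style>` contents from the plain text.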
URL Munging
- surl
- Given a list of URLs, sorts them by last-modified date.
- xurl
- Given one URL, extract all URLs it contains. Uses the LWP
library, and is pretty complete.
- qxurl
- Short for ``quick xurl''. Somewhat like xurl, but expects to
read from files, not URLs, and doesn't canonicalize relative links.
It also runs about 100x faster and doesn't require an external library.
- reltree
- Fix up a tree's URLs to make them all relative instead of absolute.
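The following is a rough Python sketch of the URL Munging ideas, using only the standard library. It is not the original programs, and every function name is hypothetical. `extract_urls` resolves each `href`/`src` against the page URL, honoring a `<base href>`, in the spirit of xurl. `last_modified` and `sort_by_last_modified` illustrate a surl-like sort, assuming the date comes from the `Last-Modified` header of an HTTP HEAD request and that missing or unparseable dates sort last. `make_relative` shows the core rewrite reltree would apply to each link, assuming only links on the same scheme and host should become relative. Walking a directory tree and rewriting files is left out.

```python
import posixpath
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit, urlunsplit
from urllib.request import Request, urlopen


class LinkExtractor(HTMLParser):
    """Collect every href/src attribute, resolved against the base URL."""

    def __init__(self, base):
        super().__init__()
        self.base = base
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if value is None:
                continue
            if tag == "base" and name == "href":
                # A <base href> overrides the document URL for later links.
                self.base = urljoin(self.base, value)
            elif name in ("href", "src"):
                self.links.append(urljoin(self.base, value))


def extract_urls(html, base):
    """xurl-style: all links in html, made absolute relative to base."""
    parser = LinkExtractor(base)
    parser.feed(html)
    parser.close()
    return parser.links


def last_modified(url, timeout=10):
    """Assumption: HEAD the URL and return its Last-Modified header, or None."""
    with urlopen(Request(url, method="HEAD"), timeout=timeout) as resp:
        return resp.headers.get("Last-Modified")


def sort_by_last_modified(stamps):
    """Sort (url, Last-Modified string) pairs oldest first; bad dates go last."""
    def key(item):
        stamp = item[1]
        if not stamp:
            return (1, 0.0)
        try:
            return (0, parsedate_to_datetime(stamp).timestamp())
        except (TypeError, ValueError):
            return (1, 0.0)
    return [url for url, _ in sorted(stamps, key=key)]


def make_relative(page_url, link_url):
    """reltree-style: rewrite link_url relative to page_url on the same host."""
    page, link = urlsplit(page_url), urlsplit(link_url)
    if (page.scheme, page.netloc) != (link.scheme, link.netloc):
        return link_url  # different site: leave it absolute
    start = posixpath.dirname(page.path) or "/"
    rel = posixpath.relpath(link.path or "/", start)
    if link.path.endswith("/") and not rel.endswith("/"):
        rel += "/"  # keep directory links pointing at directories
    return urlunsplit(("", "", rel, link.query, link.fragment))
```

With an empty base, `urljoin()` returns a relative link unchanged, so `extract_urls(html, "")` comes closer to qxurl's described behavior of leaving relative links alone. Stamps for the sort could be built as `[(u, last_modified(u)) for u in urls]`.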
Netscape Munging
- ggh
- Grovel global history. Search or dump out the Netscape global
history file.