NAME
XML::Driver::HTML - SAX driver for non-well-formed HTML.
SYNOPSIS
use XML::Driver::HTML;
$driver = new XML::Driver::HTML(
'Handler' => $some_sax_filter_or_handler,
'Source' => $some_PerlSAX_like_hash
);
$driver->parse();
or
use XML::Driver::HTML;
$driver = new XML::Driver::HTML();
$driver->parse(
'Handler' => $some_sax_filter_or_handler,
'Source' => $some_PerlSAX_like_hash
);
$driver->parse(
'Handler' => $some_other_sax_filter_or_handler,
'Source' => $some_other_source
);
DESCRIPTION
XML::Driver::HTML is a SAX driver for HTML. The HTML input
does not need to be well-formed, because XML::Driver::HTML
generates its SAX events by walking an HTML::TreeBuilder
object. The simplest use is a filter from HTML to XHTML,
with XML::Handler::YAWriter as the SAX handler:
use IO::File;
use XML::Handler::YAWriter;
use XML::Driver::HTML;
# Write XHTML to STDOUT.
my $ya = new XML::Handler::YAWriter(
'Output' => new IO::File ( ">-" ),
'Pretty' => {
'NoWhiteSpace'=>1,
'NoComments'=>1,
'AddHiddenNewline'=>1,
'AddHiddenAttrTab'=>1,
}
);
# Read (possibly non-well-formed) HTML from STDIN.
my $html = new XML::Driver::HTML(
'Handler' => $ya,
'Source' => { 'ByteStream' => new IO::File ( "<-" ) }
);
$html->parse();
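Any object that implements the SAX callback methods can serve as the `Handler', not only XML::Handler::YAWriter. The sketch below is a hypothetical handler class, ElementCounter, that counts start tags and collects character data. It assumes the PerlSAX 1 calling convention described in the XML::Parser::PerlSAX manpage: start_element receives a hash with `Name' and `Attributes', and characters receives a hash with `Data'. It has not been checked against the exact set of events XML::Driver::HTML emits.

```perl
use strict;
use warnings;

# Hypothetical PerlSAX handler (not part of XML::Driver::HTML):
# counts elements by name and concatenates all character data.
# Assumes PerlSAX 1 style events (hash refs with Name/Attributes/Data).
package ElementCounter;

sub new {
    my ($class) = @_;
    return bless { count => {}, text => '' }, $class;
}

sub start_document {
    my ($self) = @_;
    # Reset state so one handler object can be reused across parses.
    $self->{count} = {};
    $self->{text}  = '';
}

sub start_element {
    my ($self, $element) = @_;
    # Tally tags case-insensitively, since HTML tag case varies.
    $self->{count}{ lc $element->{Name} }++;
}

sub end_element { }    # nothing to do on close tags

sub characters {
    my ($self, $chars) = @_;
    $self->{text} .= $chars->{Data};
}

sub end_document { }   # results stay in $self->{count} and $self->{text}

package main;
```

To use it, pass `'Handler' => ElementCounter->new' in either form shown in the SYNOPSIS. A production handler may need further callbacks (for example for comments or processing instructions), depending on what the driver calls.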
METHODS
new Creates a new XML::Driver::HTML object. Default
options for parsing, described below, are passed as
key-value pairs or as a single hash. Options may be
changed directly in the object.
parse
Parses a document. Options, described below, are
passed as key-value pairs or as a single hash.
Options passed to parse() override the default options
in the parser object for the duration of the parse.
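The sketch below illustrates the option semantics described above. It uses a hypothetical class, OptionHolder, and is not the module's actual code. Defaults live in the object. Options passed to parse() shadow them in a merged copy, so the object itself is left unchanged. Reading "a single hash" as a single hash reference is an assumption.

```perl
use strict;
use warnings;

# Hypothetical sketch of the documented semantics, NOT XML::Driver::HTML's code.
package OptionHolder;

# Accept either key-value pairs or a single hash reference (assumption).
sub _opts { return (@_ == 1 && ref $_[0] eq 'HASH') ? %{ $_[0] } : @_ }

sub new {
    my $class = shift;
    my %defaults = _opts(@_);
    return bless \%defaults, $class;
}

sub parse {
    my $self = shift;
    my %call = _opts(@_);
    # Later keys win, so per-call options override the stored defaults
    # in this merged copy only; $self is not modified.
    my %effective = (%$self, %call);
    # A real driver would parse here using %effective.
    return \%effective;
}

package main;
```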
OPTIONS
The following options are supported by XML::Driver::HTML:
Handler
Default SAX Handler to receive events
Source
Hash containing the input source for parsing. The
`Source' hash may contain the following parameters:
ByteStream
The raw byte stream (file handle) containing the
document.
String
A string containing the document.
SystemId
The system identifier (URL) of the document.
Encoding
A string describing the character encoding.
If more than one of `ByteStream', `String', or
`SystemId' is supplied, preference goes first to
`ByteStream', then `String', then `SystemId'.
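The preference order above can be expressed as a small lookup. The function pick_source is hypothetical and not part of XML::Driver::HTML. It assumes that a key counts as supplied when its value is defined, which may differ from the module's own test.

```perl
use strict;
use warnings;

# Hypothetical helper showing the documented preference order:
# ByteStream first, then String, then SystemId.
# Returns the chosen key and its value; dies if none is supplied.
sub pick_source {
    my ($source) = @_;
    for my $key (qw(ByteStream String SystemId)) {
        return ($key, $source->{$key}) if defined $source->{$key};
    }
    die "Source hash has no ByteStream, String or SystemId\n";
}
```

Under this assumption, an empty `String' still counts as supplied and outranks `SystemId'. `Encoding' plays no part in choosing the input.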
NOTES
XML::Driver::HTML requires Perl 5.6 to convert from
ISO-8859-1 to UTF-8.
BUGS
not yet implemented:
Interpretation of SystemId as a URI
XHTML document type
other bugs:
HTML::Parser and HTML::TreeBuilder bugs concerning DOCTYPE and CSS.
Perl's handling of UTF-8 is incompatible between different versions, so
you need exactly Perl 5.6.0, no lower and no higher.
AUTHOR
Michael Koehne, Kraehe@Copyleft.De
(c) 2001 GNU General Public License
SEE ALSO
the XML::Parser::PerlSAX manpage and the HTML::TreeBuilder
manpage