NAME
HTML::Truncate - (beta software) truncate HTML by percentage or
character count while preserving well-formedness.
VERSION
0.20
ABSTRACT
When working with text it is common to want to truncate strings to make
them fit a desired context. For example, you might have a menu that is
only 100px wide and prefer that text not wrap, so you would truncate it
at around 15-30 characters, depending on preference and typeface size.
This is trivial with plain text using substr, but with HTML it is
harder: whitespace has fluid significance, and open tags that are not
properly closed destroy well-formedness and can wreck an entire layout.
HTML::Truncate attempts to account for those two problems by normalizing
whitespace and entities when counting characters, and by closing any
tags that remain open at the point of truncation.
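To make the substr problem concrete, here is a small core-Perl illustration (it does not use HTML::Truncate): cutting a string of markup at a fixed length counts tag characters as text and can leave elements open.

```perl
use strict;
use warnings;

# Naive character truncation with substr() counts tag characters as text
# and can cut inside an element, leaving <p> and <b> unclosed.
my $html  = '<p>Hello <b>world</b></p>';
my $naive = substr( $html, 0, 15 );

print "$naive\n";    # prints "<p>Hello <b>wor"
```

Only nine of the fifteen characters kept ("Hello wor") are visible text, and neither <p> nor <b> is closed.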
SYNOPSIS
use strict;
use warnings;
use HTML::Truncate;
my $html = '
We have to test something.
';
my $readmore = '... [readmore]';
my $html_truncate = HTML::Truncate->new();
$html_truncate->chars(20);              # truncate to 20 characters
$html_truncate->ellipsis($readmore);    # text added where the cut is made
print $html_truncate->truncate($html);
# or, setting options in the constructor:
use Encode;
my $ht = HTML::Truncate->new( utf8_mode => 1,
                              chars     => 1_000,
                            );
# encode the result before printing it
print Encode::encode_utf8( $ht->truncate($html) );
XHTML
This module is designed to work with XHTML-style nested tags. More
below.
WHITESPACE AND ENTITIES
Repeated natural whitespace (i.e., "\s+" and not "&nbsp;") in HTML is,
with rare exceptions (pre tags or user-defined styles), not meaningful,
so it is normalized when truncating. Entities are also normalized. The
following counts as only 14 characters.
\n
are not supported and may cause
a fatal error. See "repair" for help with badly formed HTML.
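As a rough sketch of what normalized counting means, the hypothetical visible_length function below strips tags, counts each entity as a single character, collapses runs of whitespace to one space, and, as an assumption, ignores leading and trailing whitespace. It is plain Perl with regular expressions, not HTML::Truncate's parser-based implementation, and the sample input is made up.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of HTML::Truncate): count characters
# after whitespace and entity normalization.
sub visible_length {
    my ($html) = @_;
    ( my $text = $html ) =~ s/<[^>]*>//g;            # drop tags entirely
    $text =~ s/&(?:#\d+|#x[0-9A-Fa-f]+|\w+);/X/g;    # any entity counts as one char
    $text =~ s/\s+/ /g;                              # collapse natural whitespace runs
    $text =~ s/\A //;                                # assumption: leading and
    $text =~ s/ \z//;                                # trailing space not counted
    return length $text;
}

my $count = visible_length("<p>\n  Fish   &amp;\n chips </p>");
print "$count\n";    # prints 12 ("Fish & chips")
```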
Certain tags are omitted by default from the truncated output.
* Skipped tags
These will not be included in truncated output by default.
...
...
* Tags allowed to self-close
See %emptyElement in HTML::Tagset.
add_skip_tags( qw( tag list ) )
Put one or more new tags into the list of those to be omitted from
truncated output. You might use this if you're thumbnailing articles
that start with "<h1>title</h1>" or similar before the article body.
That heading level would be absurd in a list of excerpts, so you could
drop it completely:
$ht->add_skip_tags( 'h1' );
dont_skip_tags( qw( tag list ) )
Takes tags out of the current list of those omitted from truncated
output.
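The skip list behaves like a set of tag names. The sketch below models it with a hash and hypothetical helpers add_to_skip and remove_from_skip (standing in for add_skip_tags and dont_skip_tags), then drops skipped elements, contents included, with regular expressions. The starting entry img is illustrative only and is not the module's default list.

```perl
use strict;
use warnings;

# Hypothetical model of the skip list as a set (hash keys).
# The starting entry is illustrative, not HTML::Truncate's default list.
my %skip = map { $_ => 1 } qw( img );

sub add_to_skip      { $skip{ lc $_ } = 1 for @_ }      # cf. add_skip_tags
sub remove_from_skip { delete @skip{ map { lc } @_ } }  # cf. dont_skip_tags

add_to_skip( 'h1', 'style' );
remove_from_skip('style');

# Drop each skipped element with its contents, then any leftover void or
# self-closing skipped tags such as <img/>. Regex-based, so only a sketch.
sub strip_skipped {
    my ($html) = @_;
    my $alt = join '|', map { quotemeta } sort keys %skip;
    $html =~ s{<($alt)\b[^>]*>.*?</\1\s*>}{}gis;
    $html =~ s{<(?:$alt)\b[^>]*>}{}gi;
    return $html;
}

my $out = strip_skipped('<h1>Title</h1><p>Body <img src="x.png"/> text</p>');
print "$out\n";    # prints "<p>Body  text</p>"
```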
repair
Set/get, true/false. If true, will attempt to repair unclosed HTML
tags by adding close-tags as late as possible (e.g.,
"<i><b>foobar</i>" becomes "<i><b>foobar</b></i>"). Unmatched close
tags are dropped ("foobar</b>" becomes "foobar").
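One way to get "as late as possible" repair is a stack of open tags: on a close tag, pop and close everything opened after it; drop a close tag that matches nothing; close whatever is still open at the end. The hypothetical repair_tags below is a regex-tokenized sketch of that idea under simplifying assumptions (simple tag syntax, a hand-written void-element list), not the module's own code.

```perl
use strict;
use warnings;

# Void elements listed by hand for this sketch; HTML::Tagset's
# %emptyElement would be the fuller source.
my %void = map { $_ => 1 } qw( br hr img input meta link );

sub repair_tags {
    my ($html) = @_;
    my @open;
    my $out = '';
    for my $tok ( split /(<[^>]+>)/, $html ) {
        if ( $tok =~ m{\A</\s*(\w+)\s*>\z} ) {          # close tag
            my $name = lc $1;
            next unless grep { $_ eq $name } @open;     # unmatched: drop it
            while ( my $top = pop @open ) {             # close tags opened inside it,
                $out .= "</$top>";                      # as late as possible
                last if $top eq $name;
            }
        }
        elsif ( $tok =~ m{\A<(\w+)[^>]*?(/?)>\z} ) {    # open or self-closing tag
            my ( $name, $self_close ) = ( lc $1, $2 );
            push @open, $name unless $self_close || $void{$name};
            $out .= $tok;
        }
        else {
            $out .= $tok;                               # text, comments, etc.
        }
    }
    $out .= "</$_>" for reverse @open;                  # close whatever is left
    return $out;
}

print repair_tags('<i><b>foobar</i>'), "\n";    # <i><b>foobar</b></i>
print repair_tags('foobar</b>'),       "\n";    # foobar
```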
on_space
This makes the truncation back up to the nearest preceding space so it
doesn't truncate in the middle of a word. "on_space" runs before
"cleanly" if both are set.
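A plain-text sketch of backing up to a space, using a hypothetical cut_on_space helper rather than the module's API: if the character at the cut position is not already a space, the cut moves back to the nearest preceding one, falling back to a hard cut when there is none.

```perl
use strict;
use warnings;

# Hypothetical illustration of on_space on plain text (not the module's code).
sub cut_on_space {
    my ( $text, $limit ) = @_;
    return $text if length $text <= $limit;
    my $pos = rindex $text, ' ', $limit;    # last space at or before index $limit
    return substr $text, 0, ( $pos > 0 ? $pos : $limit );   # no space: hard cut
}

my $short = cut_on_space( 'The quick brown fox jumps', 13 );
print "[$short]\n";    # prints "[The quick]"
```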
cleanly
Set/get, a regular expression. This is on by default, and the default
cleaning regular expression is "cleanly(qr/[\s[:punct:]]+\z/)". It
makes the truncation strip any trailing spacing and punctuation so you
don't get things like "The End...." or "What? ...". You can cancel it
with "$ht->cleanly(undef)" or provide your own regular expression.
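The default pattern can be tried on its own. This sketch assumes, as the examples above imply, that the ellipsis is appended after trailing spacing and punctuation are stripped; the sample strings are made up and the loop is not the module's code.

```perl
use strict;
use warnings;

# The documented default cleanly pattern, applied to plain-text cuts
# before appending an ellipsis.
my $cleanly = qr/[\s[:punct:]]+\z/;

my @cleaned;
for my $cut ( 'The End....', 'What? ', 'Done' ) {
    ( my $text = $cut ) =~ s/$cleanly//;
    push @cleaned, $text . '...';
}
print "$_\n" for @cleaned;    # The End... / What... / Done...
```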
COOKBOOK (well, a recipe)
Template Toolkit filter
For excerpting HTML in your templates. Note the "add_skip_tags" call,
which drops any images from the truncated output.
use strict;
use warnings;
use Template;
use HTML::Truncate;
my %config =
(
    FILTERS => {
        # the trailing 1 marks a dynamic filter: Template Toolkit calls the
        # factory with the context plus any arguments given in the template
        truncate_html => [ \&truncate_html_filter_factory, 1 ],
    },
);
my $tt = Template->new(\%config) or die $Template::ERROR;
# ... etc ...
sub truncate_html_filter_factory {
    my ( $context, $len, $ellipsis ) = @_;
    $len = 32 unless $len;                             # default length
    $ellipsis = chr(8230) unless defined $ellipsis;    # U+2026 horizontal ellipsis
    my $ht = HTML::Truncate->new();
    $ht->add_skip_tags(qw( img ));                     # drop images from excerpts
    return sub {
        my $html = shift || return '';
        return $ht->truncate( $html, $len, $ellipsis );
    };
}
Then in your templates you can do things like this:
[% FOR item IN search_results %]