object --+ | Transformer
Stream filter that can apply a variety of different transformations to a stream.
This is achieved by selecting the events to be transformed using XPath,
then applying the transformations to the events matched by the path
expression. Each marked event is in the form (mark, (kind, data, pos)),
where mark can be any of ENTER, INSIDE, EXIT, OUTSIDE, or None.
ENTER, INSIDE and EXIT mark, respectively, the START event, the contained events, and the END event of a selected XML/HTML element. A non-element match outside a START/END container (e.g. text()) will yield an OUTSIDE mark.
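The marking scheme can be illustrated with a toy model. This is a minimal sketch in Python 3 that does not use Genshi: the helper mark_element is hypothetical, events are simplified to (kind, data, pos) tuples whose START data is just a tag name, and selection is by tag name rather than XPath.

```python
# Toy model of a marked stream; NOT Genshi's implementation.
ENTER, INSIDE, EXIT, OUTSIDE = "ENTER", "INSIDE", "EXIT", "OUTSIDE"

def mark_element(events, name):
    """Yield (mark, event) pairs, marking every element called `name`."""
    depth = 0  # nesting depth inside a selected element, 0 when outside
    for kind, data, pos in events:
        event = (kind, data, pos)
        if depth == 0:
            if kind == "START" and data == name:
                depth = 1
                yield ENTER, event
            else:
                yield None, event  # unselected events carry a null mark
        elif kind == "START":
            depth += 1
            yield INSIDE, event
        elif kind == "END":
            depth -= 1
            yield (EXIT if depth == 0 else INSIDE), event
        else:
            yield INSIDE, event

events = [
    ("START", "body", None),
    ("TEXT", "Some ", None),
    ("START", "em", None),
    ("TEXT", "test", None),
    ("END", "em", None),
    ("TEXT", " text", None),
    ("END", "body", None),
]
marks = [mark for mark, _ in mark_element(events, "em")]
```

In this sketch the marks come out in the same order as in the trace() output shown for select('.//em') in the method details.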
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')

Transformations act on selected stream events matching an XPath expression. Here's an example of removing some markup (the title, in this case) selected by an expression:

>>> print html | Transformer('head/title').remove()
<html><head/><body>Some <em>body</em> text.</body></html>

Inserted content can be passed in the form of a string, or a markup event stream, which includes streams generated programmatically via the builder module:

>>> from genshi.builder import tag
>>> print html | Transformer('body').prepend(tag.h1('Document Title'))
<html><head><title>Some Title</title></head><body><h1>Document Title</h1>Some <em>body</em> text.</body></html>

Each XPath expression determines the set of tags that will be acted upon by subsequent transformations. In this example we select the <title> text, copy it into a buffer, then select the <body> element and paste the copied text into the body as <h1> enclosed text:

>>> buffer = StreamBuffer()
>>> print html | Transformer('head/title/text()').copy(buffer) \
...     .end().select('body').prepend(tag.h1(buffer))
<html><head><title>Some Title</title></head><body><h1>Some Title</h1>Some <em>body</em> text.</body></html>

Transformations can also be assigned and reused, although care must be taken when using buffers, to ensure that buffers are cleared between transforms:

>>> emphasis = Transformer('body//em').attr('class', 'emphasis')
>>> print html | emphasis
<html><head><title>Some Title</title></head><body>Some <em class="emphasis">body</em> text.</body></html>

Instance Methods

apply

Selection operations
    select, invert, end
Deletion operations
    empty, remove
Direct element operations
    unwrap, wrap
Content insertion operations
    replace, before, after, prepend, append
Attribute manipulation
    attr
Buffer operations
    copy, cut, buffer
Miscellaneous operations
    filter, map, substitute, trace

Properties
transforms
Inherited from

Method Details

apply(function)
Apply a transformation to the stream.

Transformations can be chained, similar to stream filters. Any callable accepting a marked stream can be used as a transform.

As an example, here is a simple TEXT event upper-casing transform:

>>> def upper(stream):
...     for mark, (kind, data, pos) in stream:
...         if mark and kind is TEXT:
...             yield mark, (kind, data.upper(), pos)
...         else:
...             yield mark, (kind, data, pos)
>>> short_stream = HTML('<body>Some <em>test</em> text</body>')
>>> print short_stream | Transformer('.//em/text()').apply(upper)
<body>Some <em>TEST</em> text</body>

select(path)
Mark events matching the given XPath expression, within the current selection.

>>> html = HTML('<body>Some <em>test</em> text</body>')
>>> print html | Transformer().select('.//em').trace()
(None, ('START', (QName(u'body'), Attrs()), (None, 1, 0)))
(None, ('TEXT', u'Some ', (None, 1, 6)))
('ENTER', ('START', (QName(u'em'), Attrs()), (None, 1, 11)))
('INSIDE', ('TEXT', u'test', (None, 1, 15)))
('EXIT', ('END', QName(u'em'), (None, 1, 19)))
(None, ('TEXT', u' text', (None, 1, 24)))
(None, ('END', QName(u'body'), (None, 1, 29)))
<body>Some <em>test</em> text</body>

invert()
Invert selection so that marked events become unmarked, and vice versa.

Specifically, all marks are converted to null marks, and all null marks are converted to OUTSIDE marks.

>>> html = HTML('<body>Some <em>test</em> text</body>')
>>> print html | Transformer('//em').invert().trace()
('OUTSIDE', ('START', (QName(u'body'), Attrs()), (None, 1, 0)))
('OUTSIDE', ('TEXT', u'Some ', (None, 1, 6)))
(None, ('START', (QName(u'em'), Attrs()), (None, 1, 11)))
(None, ('TEXT', u'test', (None, 1, 15)))
(None, ('END', QName(u'em'), (None, 1, 19)))
('OUTSIDE', ('TEXT', u' text', (None, 1, 24)))
('OUTSIDE', ('END', QName(u'body'), (None, 1, 29)))
<body>Some <em>test</em> text</body>

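The inversion rule stated above (every mark becomes a null mark, every null mark becomes OUTSIDE) fits in a few lines. The following is a sketch in Python 3 over plain (mark, event) tuples, not Genshi's own code; the function name invert_marks is an assumption made for illustration.

```python
OUTSIDE = "OUTSIDE"

def invert_marks(marked):
    """Swap marked and unmarked events in a stream of (mark, event) pairs."""
    for mark, event in marked:
        # Any existing mark is dropped; unmarked events become OUTSIDE.
        yield (None if mark else OUTSIDE), event

marked = [
    (None, ("START", "body", None)),
    ("ENTER", ("START", "em", None)),
    ("INSIDE", ("TEXT", "test", None)),
    ("EXIT", ("END", "em", None)),
    (None, ("END", "body", None)),
]
inverted_marks = [mark for mark, _ in invert_marks(marked)]
```

Note that the inversion is lossy: ENTER, INSIDE and EXIT all collapse to a null mark, so inverting twice yields OUTSIDE marks rather than the original ones.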
end()
End current selection, allowing all events to be selected.

Example:

>>> html = HTML('<body>Some <em>test</em> text</body>')
>>> print html | Transformer('//em').end().trace()
('OUTSIDE', ('START', (QName(u'body'), Attrs()), (None, 1, 0)))
('OUTSIDE', ('TEXT', u'Some ', (None, 1, 6)))
('OUTSIDE', ('START', (QName(u'em'), Attrs()), (None, 1, 11)))
('OUTSIDE', ('TEXT', u'test', (None, 1, 15)))
('OUTSIDE', ('END', QName(u'em'), (None, 1, 19)))
('OUTSIDE', ('TEXT', u' text', (None, 1, 24)))
('OUTSIDE', ('END', QName(u'body'), (None, 1, 29)))
<body>Some <em>test</em> text</body>

empty()
Empty selected elements of all content.

Example:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print html | Transformer('.//em').empty()
<html><head><title>Some Title</title></head><body>Some <em/> text.</body></html>

remove()
Remove selection from the stream.

Example:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print html | Transformer('.//em').remove()
<html><head><title>Some Title</title></head><body>Some text.</body></html>

unwrap()
Remove outermost enclosing elements from selection.

Example:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print html | Transformer('.//em').unwrap()
<html><head><title>Some Title</title></head><body>Some body text.</body></html>

wrap(element)
Wrap selection in an element.

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print html | Transformer('.//em').wrap('strong')
<html><head><title>Some Title</title></head><body>Some <strong><em>body</em></strong> text.</body></html>

replace(content)
Replace selection with content.

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print html | Transformer('.//title/text()').replace('New Title')
<html><head><title>New Title</title></head><body>Some <em>body</em> text.</body></html>

before(content)
Insert content before selection.

In this example we insert the word 'emphasised' before the <em> opening tag:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print html | Transformer('.//em').before('emphasised ')
<html><head><title>Some Title</title></head><body>Some emphasised <em>body</em> text.</body></html>

after(content)
Insert content after selection.

Here, we insert some text after the </em> closing tag:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print html | Transformer('.//em').after(' rock')
<html><head><title>Some Title</title></head><body>Some <em>body</em> rock text.</body></html>

prepend(content)
Insert content after the ENTER event of the selection.

Inserting some new text at the start of the <body>:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print html | Transformer('.//body').prepend('Some new body text. ')
<html><head><title>Some Title</title></head><body>Some new body text. Some <em>body</em> text.</body></html>

append(content)
Insert content before the END event of the selection.

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print html | Transformer('.//body').append(' Some new body text.')
<html><head><title>Some Title</title></head><body>Some <em>body</em> text. Some new body text.</body></html>

attr(name, value)
Add, replace or delete an attribute on selected elements.

If value evaluates to None, the attribute will be deleted from the element:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em class="before">body</em> <em>text</em>.</body>'
...             '</html>')
>>> print html | Transformer('body/em').attr('class', None)
<html><head><title>Some Title</title></head><body>Some <em>body</em> <em>text</em>.</body></html>

Otherwise the attribute will be set to value:

>>> print html | Transformer('body/em').attr('class', 'emphasis')
<html><head><title>Some Title</title></head><body>Some <em class="emphasis">body</em> <em class="emphasis">text</em>.</body></html>

If value is a callable, it will be called with the attribute name and the START event for the matching element. Its return value will then be used to set the attribute:

>>> def print_attr(name, event):
...     attrs = event[1][1]
...     print attrs
...     return attrs.get(name)
>>> print html | Transformer('body/em').attr('class', print_attr)
Attrs([(QName(u'class'), u'before')])
Attrs()
<html><head><title>Some Title</title></head><body>Some <em class="before">body</em> <em>text</em>.</body></html>

copy(buffer, accumulate=False)
Copy selection into buffer.

The buffer is replaced by each contiguous selection before being passed to the next transformation. If accumulate=True, further selections will be appended to the buffer rather than replacing it.

>>> from genshi.builder import tag
>>> buffer = StreamBuffer()
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print html | Transformer('title/text()').copy(buffer) \
...     .end().select('body').prepend(tag.h1(buffer))
<html><head><title>Some Title</title></head><body><h1>Some Title</h1>Some <em>body</em> text.</body></html>

This example illustrates that only a single contiguous selection will be buffered:

>>> print html | Transformer('head/title/text()').copy(buffer) \
...     .end().select('body/em').copy(buffer).end().select('body') \
...     .prepend(tag.h1(buffer))
<html><head><title>Some Title</title></head><body><h1>Some Title</h1>Some <em>body</em> text.</body></html>
>>> print buffer
<em>body</em>

Element attributes can also be copied for later use:

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body><em>Some</em> <em class="before">body</em>'
...             '<em>text</em>.</body></html>')
>>> buffer = StreamBuffer()
>>> def apply_attr(name, entry):
...     return list(buffer)[0][1][1].get('class')
>>> print html | Transformer('body/em[@class]/@class').copy(buffer) \
...     .end().buffer().select('body/em[not(@class)]') \
...     .attr('class', apply_attr)
<html><head><title>Some Title</title></head><body><em class="before">Some</em> <em class="before">body</em><em class="before">text</em>.</body></html>

cut(buffer, accumulate=False)
Copy selection into buffer and remove the selection from the stream.

>>> from genshi.builder import tag
>>> buffer = StreamBuffer()
>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print html | Transformer('.//em/text()').cut(buffer) \
...     .end().select('.//em').after(tag.h1(buffer))
<html><head><title>Some Title</title></head><body>Some <em/><h1>body</h1> text.</body></html>

Specifying accumulate=True appends all selected intervals onto the buffer. Combining this with the .buffer() operation allows us to operate on all copied events rather than per-segment. See the documentation on buffer() for more information.

Note: this transformation will buffer the entire input stream

buffer()
Buffer the entire stream (can consume a considerable amount of memory).

Useful in conjunction with copy(accumulate=True) and cut(accumulate=True) to ensure that all marked events in the entire stream are copied to the buffer before further transformations are applied.

For example, to move all <note> elements inside a <notes> tag at the top of the document:

>>> doc = HTML('<doc><notes></notes><body>Some <note>one</note> '
...            'text <note>two</note>.</body></doc>')
>>> buffer = StreamBuffer()
>>> print doc | Transformer('body/note').cut(buffer, accumulate=True) \
...     .end().buffer().select('notes').prepend(buffer)
<doc><notes><note>one</note><note>two</note></notes><body>Some text .</body></doc>

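Why buffering matters can be seen in a toy pipeline of Python 3 generators. This sketch does not use Genshi: events are plain strings, and cut_notes, insert_at_slot and buffer_all are hypothetical stand-ins for cut(accumulate=True), prepend(buffer) and buffer(). Because generators are lazy, the insertion point is reached before any later note has been cut unless the whole upstream is consumed first.

```python
def cut_notes(events, buffer):
    """Move events tagged 'note:' into buffer; pass everything else on."""
    for event in events:
        if event.startswith("note:"):
            buffer.append(event)
        else:
            yield event

def insert_at_slot(events, buffer):
    """Emit the buffer's current contents right after the 'notes-slot' event."""
    for event in events:
        yield event
        if event == "notes-slot":
            yield from list(buffer)

def buffer_all(events):
    """Consume the whole upstream before yielding anything."""
    yield from list(events)

doc = ["notes-slot", "Some", "note:one", "text", "note:two"]

# Without buffering: the slot is reached while the buffer is still empty.
buf = []
lazy = list(insert_at_slot(cut_notes(doc, buf), buf))

# With buffering: every note has been cut before the slot is processed.
buf = []
buffered = list(insert_at_slot(buffer_all(cut_notes(doc, buf)), buf))
```

The cost is the one named in the description above: the buffering stage holds the entire stream in memory at once.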
filter(filter)
Apply a normal stream filter to the selection. The filter is called once for each contiguous block of marked events.

>>> from genshi.filters.html import HTMLSanitizer
>>> html = HTML('<html><body>Some text<script>alert(document.cookie)'
...             '</script> and some more text</body></html>')
>>> print html | Transformer('body/*').filter(HTMLSanitizer())
<html><body>Some text and some more text</body></html>

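The phrase "once for each contiguous block of marked events" can be made concrete with itertools.groupby. This is a rough Python 3 sketch over (mark, event) tuples, not Genshi's implementation; filter_selection and drop_text are hypothetical names, and the sketch emits plain events rather than re-marking the filtered output.

```python
from itertools import groupby

def filter_selection(marked, stream_filter):
    """Run stream_filter once per contiguous run of marked events."""
    for is_marked, run in groupby(marked, key=lambda item: item[0] is not None):
        events = [event for _, event in run]
        if is_marked:
            # The filter sees plain events, without their marks.
            yield from stream_filter(events)
        else:
            yield from events

calls = []

def drop_text(events):
    """Hypothetical stream filter: record the call and drop TEXT events."""
    calls.append(len(events))
    return [event for event in events if event[0] != "TEXT"]

marked = [
    (None, ("TEXT", "Some text", None)),
    ("ENTER", ("START", "script", None)),
    ("INSIDE", ("TEXT", "alert(1)", None)),
    ("EXIT", ("END", "script", None)),
    (None, ("TEXT", " and more", None)),
    ("ENTER", ("START", "b", None)),
    ("EXIT", ("END", "b", None)),
]
result = list(filter_selection(marked, drop_text))
```

Here the filter is invoked twice, once per selected element, and never sees the unmarked text between them.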
map(function, kind)
Applies a function to the data element of events of kind in the selection.

>>> html = HTML('<html><head><title>Some Title</title></head>'
...             '<body>Some <em>body</em> text.</body></html>')
>>> print html | Transformer('head/title').map(unicode.upper, TEXT)
<html><head><title>SOME TITLE</title></head><body>Some <em>body</em> text.</body></html>

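As a rough illustration of that behaviour, the sketch below applies a function to the data of marked events of one kind and leaves everything else untouched. It is written in Python 3 against simplified (mark, (kind, data, pos)) tuples; map_selection is a hypothetical name, and this is not Genshi's code.

```python
def map_selection(marked, function, kind):
    """Apply function to the data of marked events whose kind matches."""
    for mark, (event_kind, data, pos) in marked:
        if mark and event_kind == kind:
            data = function(data)
        yield mark, (event_kind, data, pos)

marked = [
    ("ENTER", ("START", "title", None)),
    ("INSIDE", ("TEXT", "Some Title", None)),
    ("EXIT", ("END", "title", None)),
    (None, ("TEXT", "Some text", None)),  # unmarked, so left alone
]
result = list(map_selection(marked, str.upper, "TEXT"))
```

Only the TEXT event inside the selection is upper-cased; the START and END events and the unmarked text pass through unchanged.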
substitute(pattern, replace)
Replace text matching a regular expression.

Refer to the documentation for re.sub() for details.

>>> html = HTML('<html><body>Some text, some more text and '
...             '<b>some bold text</b>\n'
...             '<i>some italicised text</i></body></html>')
>>> print html | Transformer('body/b').substitute('(?i)some', 'SOME')
<html><body>Some text, some more text and <b>SOME bold text</b>
<i>some italicised text</i></body></html>
>>> tags = tag.html(tag.body('Some text, some more text and\n',
...                          Markup('<b>some bold text</b>')))
>>> print tags.generate() | Transformer('body').substitute(
...     '(?i)some', 'SOME')
<html><body>SOME text, some more text and
<b>SOME bold text</b></body></html>

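The effect on text events can be approximated with the standard re module. This is a sketch in Python 3 over simplified (mark, (kind, data, pos)) tuples, not Genshi's implementation; substitute_selection is a hypothetical name, and the default count=1 is an assumption inferred from the second example above, where only the first "some" in each text event is replaced.

```python
import re

def substitute_selection(marked, pattern, replace, count=1):
    """Apply a regex substitution to the data of marked TEXT events."""
    regex = re.compile(pattern)
    for mark, (kind, data, pos) in marked:
        if mark and kind == "TEXT":
            # count=1 (assumed default) replaces only the first match.
            data = regex.sub(replace, data, count=count)
        yield mark, (kind, data, pos)

marked = [
    ("INSIDE", ("TEXT", "Some text, some more text", None)),
    (None, ("TEXT", "some unselected text", None)),
]
first_only = list(substitute_selection(marked, "(?i)some", "SOME"))
all_matches = list(substitute_selection(marked, "(?i)some", "SOME", count=0))
```

Passing count=0 follows the re.sub() convention of replacing every match.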
trace()
Print events as they pass through the transform.

>>> html = HTML('<body>Some <em>test</em> text</body>')
>>> print html | Transformer('em').trace()
(None, ('START', (QName(u'body'), Attrs()), (None, 1, 0)))
(None, ('TEXT', u'Some ', (None, 1, 6)))
('ENTER', ('START', (QName(u'em'), Attrs()), (None, 1, 11)))
('INSIDE', ('TEXT', u'test', (None, 1, 15)))
('EXIT', ('END', QName(u'em'), (None, 1, 19)))
(None, ('TEXT', u' text', (None, 1, 24)))
(None, ('END', QName(u'body'), (None, 1, 29)))
<body>Some <em>test</em> text</body>

Generated by Epydoc 3.0.1 on Mon Jun 9 12:25:07 2008 | http://epydoc.sourceforge.net |