NAME

    Text::TokenStream - lexer to break text up into user-defined tokens

SYNOPSIS

        my $lexer = Text::TokenStream::Lexer->new(
            whitespace => [qr/\s+/],
            rules => [
                word => qr/\w+/,
                sym => qr/[^\w\s]+/,
            ],
        );
    
        my $stream = Text::TokenStream->new(
            lexer => $lexer,
            input => "foo *",
        );
    
        my $tok1 = $stream->next; # --> "word" token containing "foo"
        my $tok2 = $stream->next; # --> "sym" token containing "*"

DESCRIPTION

    This class is part of a collection of classes that act together to lex
    (aka scan) an input text into a stream of tokens.

    This token stream class provides the stream interface, along with a
    notion of the "current position" in the input text, and position-aware
    error reporting. It composes Text::TokenStream::Role::Stream; that role
    lists the methods this class provides (so that you can easily write a
    parser class that has a token stream which in turn handles the
    tokenizer methods).

    The basic lexer machinery is found in Text::TokenStream::Lexer; it is
    separated out from the token stream so that it can be reused across
    many inputs.

    Tokens are instances of a class, Text::TokenStream::Token by default.

CONSTRUCTOR

    This class uses Moo, and inherits the standard new constructor.

ATTRIBUTES

 lexer

    An instance of Text::TokenStream::Lexer; required; read-only. Will be
    used to find tokens in the input.

 input

    Str; required; read-only. The text that will be lexed into a stream of
    tokens.

 input_name

    A Maybe[Path]; read-only. Can be coerced from a string. If a defined
    value is present, it should contain the name of the file that the input
    was read from, and that name will be used in any error messages.

 token_class

    The name of a class that inherits from Text::TokenStream::Token;
    defaults to Text::TokenStream::Token itself; read-only. Tokens found in
    the input will be constructed as instances of this class.

OTHER METHODS

 collect_all

    Takes no arguments. Returns a list of all remaining tokens found in the
    input.

    In the current implementation, this method is provided by
    Text::TokenStream::Role::Stream.

 collect_upto

    Takes a single argument indicating a token to match, as with
    Text::TokenStream::Token#matches. Scans through the input until it
    finds a token that matches the argument, and returns a list of all
    tokens before the matching one. If no remaining token in the input
    matches the argument, behaves as "collect_all".

    In the current implementation, this method is provided by
    Text::TokenStream::Role::Stream.

 create_token

    Takes a listified hash of token attributes, and creates a token
    instance. The token object is created by calling:

        $self->token_class->new(%data);

    If you have particularly complex needs, you may wish to override this
    method in a subclass.

 current_position

    Takes no arguments. Returns the 0-based position of the first input
    character that hasn't yet been returned by "next".

 err

    Takes multiple arguments, that are concatenated into an error message.
    (If no arguments are supplied, acts as if you'd supplied the string
    "Something's wrong".) Throws an exception, reporting the locus of the
    error as the current input position (using 1-based line and column
    numbers).

 fill

    Takes a single positive-integer argument. Attempts to fill an internal
    buffer of already-lexed tokens so that it contains that many tokens.
    Returns a boolean that is true iff there were enough tokens to do that.

 looking_at

    Takes zero or more arguments, each of which indicates a token to match,
    as with Text::TokenStream::Token#matches. Returns a boolean that is
    true iff there's at least one more token in the input, and it matches
    the argument.

 next

    Takes no arguments. Returns the next token found in the input, and
    advances the current position past it; if no tokens remain, returns
    undef. The token instance is created by "create_token".

 next_of

    Takes a single argument indicating a token to match, as with
    Text::TokenStream::Token#matches, and an optional string argument
    describing the current position (for example, "in expression", or
    "after keyword"). If there are no more tokens in the input, reports an
    error at the current position, using "err". Otherwise, if the next
    token doesn't match the argument, reports an error at the position of
    that token, using "token_err". Otherwise, the next token matches what
    is being looked for, so that token is returned.

 peek

    Takes no arguments. Returns the next token that would be returned by
    "next", but doesn't advance the current input position, and a
    subsequent "next" call will return the same token.

    An internal buffer is used to ensure that every token is lexed only
    once.

 skip_optional

    Takes a single argument indicating a token to match, as with
    Text::TokenStream::Token#matches. If there are no more tokens in the
    input, or the next token doesn't match the argument, returns false;
    otherwise, advances past the next token, and returns true.

 token_err

    Takes a token as an argument, followed by multiple arguments that are
    concatenated into an error message. (If no non-token arguments are
    supplied, acts as if you'd supplied the string "Something's wrong".)
    Throws an exception, reporting the locus of the error as the position
    of the token (using 1-based line and column numbers).

AUTHOR

    Aaron Crane, <arc@cpan.org>

COPYRIGHT

    Copyright 2021 Aaron Crane.

LICENCE

    This library is free software and may be distributed under the same
    terms as perl itself. See http://dev.perl.org/licenses/.