NAME

    Catmandu::Importer::PDF - Catmandu importer to extract data from one
    pdf

SYNOPSIS

        #From the command line
    
        #Export pdf information, and text
    
        $ catmandu convert PDF --file input.pdf to YAML
    
        #In a script
    
        use Catmandu::Sane;
    
        use Catmandu::Importer::PDF;
    
        my $importer = Catmandu::Importer::PDF->new( file => "/tmp/input.pdf" );
    
        $importer->each(sub{
    
            my $pdf = $_[0];
            #..
    
        });

EXAMPLE OUTPUT IN YAML

        document:
          author: ~
          creation_date: 1207274644
          creator: PDFplus
          keywords: ~
          metadata: ~
          modification_date: 1421574847
          producer: "Nobody at all"
          subject: ~
          title: "Hello there"
          version: PDF-1.6
        pages:
        - label: Cover Page
          height: 878
          width: 595
          text: "Hello world"

INSTALLATION

    In order to install this package you need the following system packages
    installed

    Centos

      Requires Centos 7 at minimum. Centos 6 only has poppler-glib 0.12.

      * perl-devel

      * make

      * gcc

      * gcc-c++

      * libyaml-devel

      * libyaml

      * poppler-glib ( >= 0.16 )

      * poppler-glib-devel ( >= 0.16 )

      * gobject-introspection-devel

    Ubuntu

      Requires Ubuntu 14 at minimum.

      * libpoppler-glib8

      * libpoppler-glib-dev

      * gobject-introspection

      * libgirepository1.0-dev

NOTES

    * Catmandu::Importer::PDF returns one record, containing both document
    information, and page text

    * Catmandu::Importer::PDFPages returns multiple records, each for each
    page

    * Catmandu::Importer::PDFInfo returns one record, containing document
    information

KNOWN ISSUES

    * Due to a bug in older versions of Poppler (bug #94173), the
    creation_date and modification_date can be returned in local time,
    instead of utc.

    * Some versions of Poppler add form feeds and newlines to a text line,
    while others don't.

AUTHORS

    Nicolas Franck <nicolas.franck at ugent.be>

SEE ALSO

    Catmandu::Importer::PDFInfo, Catmandu::Importer::PDFPages, Catmandu,
    Catmandu::Importer , Poppler