=pod
=head1 NAME
Data::Walk::Extracted - An extracted dataref walker
=begin html
=end html
=head1 SYNOPSIS
This is a contrived example! For a more functional (complex/useful) example see the
roles in this package.
package Data::Walk::MyRole;
use Moose::Role;
requires '_process_the_data';
use MooseX::Types::Moose qw(
Str
ArrayRef
HashRef
);
my $mangle_keys = {
Hello_ref => 'primary_ref',
World_ref => 'secondary_ref',
};
#########1 Public Method 3#########4#########5#########6#########7#########8
sub mangle_data{
my ( $self, $passed_ref ) = @_;
@$passed_ref{ 'before_method', 'after_method' } =
( '_mangle_data_before_method', '_mangle_data_after_method' );
### Start recursive parsing
$passed_ref = $self->_process_the_data( $passed_ref, $mangle_keys );
### End recursive parsing with: $passed_ref
return $passed_ref->{Hello_ref};
}
#########1 Private Methods 3#########4#########5#########6#########7#########8
### If you are at the string level merge the two references
sub _mangle_data_before_method{
my ( $self, $passed_ref ) = @_;
if(
is_Str( $passed_ref->{primary_ref} ) and
is_Str( $passed_ref->{secondary_ref} ) ){
$passed_ref->{primary_ref} .= " " . $passed_ref->{secondary_ref};
}
return $passed_ref;
}
### Strip the reference layers on the way out
sub _mangle_data_after_method{
my ( $self, $passed_ref ) = @_;
if( is_ArrayRef( $passed_ref->{primary_ref} ) ){
$passed_ref->{primary_ref} = $passed_ref->{primary_ref}->[0];
}elsif( is_HashRef( $passed_ref->{primary_ref} ) ){
$passed_ref->{primary_ref} = $passed_ref->{primary_ref}->{level};
}
return $passed_ref;
}
package main;
use MooseX::ShortCut::BuildInstance qw( build_instance );
my $AT_ST = build_instance(
package => 'Greeting',
superclasses => [ 'Data::Walk::Extracted' ],
roles => [ 'Data::Walk::MyRole' ],
);
print $AT_ST->mangle_data( {
Hello_ref =>{ level =>[ { level =>[ 'Hello' ] } ] },
World_ref =>{ level =>[ { level =>[ 'World' ] } ] },
} ) . "\n";
#################################################################################
# Output of SYNOPSIS
# 01:Hello World
#################################################################################
=head1 DESCRIPTION
This module takes a data reference (or two) and
L
travels through it(them). Where the two references diverge the walker follows the
primary data reference. At the L
and L of each branch or L
in the data the code will attempt to call a L
on the remaining unparsed data.
=head2 Acknowledgement of MJD
This is an implementation of the concept of extracted data walking from
L Chapter 1 by
L. I With that said I diverged from MJD purity in two ways. This is object oriented
code not functional code. Second, when taking action the code will search for class
methods provided by (your) role rather than acting on passed closures. There is clearly
some overhead associated with both of these differences. I made those choices consciously
and if that upsets you L!
=head2 What is the unique value of this module?
With the recursive part of data walking extracted the various functionalities desired
when walking the data can be modularized without copying this code. The Moose
framework also allows diverse and targeted data parsing without dragging along a
L API
for every use of this class.
=head2 Extending Data::Walk::Extracted
B. It usually also makes sense to build an
initial action method as well. The initial action method can do any data-preprocessing
that is useful as well as providing the necessary set up for the generic walker. All
of these elements can be combined with this class using a L, by
L, or it can be
joined to the class at run time. See L. or L for more class building information. See the
L to understand the details of how the methods are
used. See L for the available
methods to implement the roles.
Then, L
=head1 Recursive Parsing Flow
=head2 Initial data input and scrubbing
The primary input method added to this class for external use is refered to as
the 'action' method (ex. 'mangle_data'). This action method needs to receive
data and organize it for sending to the L for the generic data walker.
I
=head2 Assess and implement the before_method
The class next checks for an available 'before_method'. Using the test;
exists $passed_ref->{before_method};
If the test passes then the next sequence is run.
$method = $passed_ref->{before_method};
$passed_ref = $self->$method( $passed_ref );
If the $passed_ref is modified by the 'before_method' then the recursive parser will
parse the new ref and not the old one. The before_method can set;
$passed_ref->{skip} = 'YES'
Then the flow checks for the need to investigate deeper.
=head2 Test for deeper investigation
The code now checks if deeper investigation is required checking both that the 'skip' key
= 'YES' in the $passed_ref or if the node is a L.
If either case is true the process jumps to the L otherwise it begins to investigate the next
level.
=head2 Identify node elements
If the next level in is not skipped then a list is generated for all L
in the node. For example a 'HASH' node would generate a list of hash keys for that node.
SCALAR nodes will generate a list with only one element containing the scalar contents.
UNDEF nodes will generate an empty list.
=head2 Sort the node as required
If the list L
then the list is sorted. B I
=head2 Process each element
For each identified element of the node a new $data_ref is generated containing data that
represents just that sub element. The secondary_ref is only constructed if it has a
matching type and element to the primary ref. Matching for hashrefs is done by key
matching only. Matching for arrayrefs is done by position exists testing only. I Scalars are matched on content. The list of items
generated for this element is as follows;
=over
B> --Ename of before method for this role hereE--
B> --Ename of after method for this role hereE--
B> the piece of the primary data ref below this element
B> the lower primary (walker)
L[
B> YES|NO (This indicates if the secondary ref meets matching critera)
B> YES|NO Checks L against
the lower primary_ref node. This can also be set in the 'before_method' upon arrival
at that node.
B> if match eq 'YES' then built like the primary ref
B> if match eq 'YES' then calculated like the primary type
B> L
=back
=head2 A position trace is generated
The current node list position is then documented and pushed onto the array at
$passed_ref->{branch_ref}. The array reference stored in branch_ref can be
thought of as the stack trace that documents the node elements directly between the
current position and the initial (or zeroth) level of the parsed primary data_ref.
Past completed branches and future pending branches are not maintained. Each element
of the branch_ref contains four positions used to describe the node and selections
used to traverse that node level. The values in each sub position are;
[
ref_type, #The node reference type
the list item value or '' for ARRAYs,
#key name for hashes, scalar value for scalars
element sequence position (from 0),
#For hashes this is only relevent if sort_HASH is called
level of the node (from 0),
`#The zeroth level is the initial data ref
]
=head2 Going deeper in the data
The down level ref is then passed as a new data set to be parsed and it starts
at the L again.
=head2 Actions on return from recursion
When the values are returned from the recursion call the last branch_ref element is
Led off and the returned data ref
is used to L the sub elements of the primary_ref and secondary_ref
associated with that list element in the current level of the $passed_ref. If there are
still pending items in the node element list then the program L
=head2 Assess and implement the after_method
After the node elements have all been processed the class checks for an available
'after_method' using the test;
exists $passed_ref->{after_method};
If the test passes then the following sequence is run.
$method = $passed_ref->{after_method};
$passed_ref = $self->$method( $passed_ref );
If the $passed_ref is modified by the 'after_method' then the recursive parser will
parse the new ref and not the old one.
=head2 Go up
The updated $passed_ref is passed back up to the L.
=head1 Attributes
Data passed to -Enew when creating an instance. For modification of these attributes
see L. The -Enew function will either accept fat
comma lists or a complete hash ref that has the possible attributes as the top keys.
Additionally some attributes that have the following prefixed methods; get_$name, set_$name,
clear_$name, and has_$name can be passed to L<_process_the_data
|/_process_the_data( $passed_ref, $conversion_ref )> and will be adjusted for just the
run of that method call. These are called L
attributes. Nested calls to _process_the_data will be tracked and the attribute will
remain in force until the parser returns to the calling 'one shot' level. Previous
attribute values are restored after the 'one shot' attribute value expires.
=head2 sorted_nodes
=over
B If the primary_type of the L<$element_ref|/Process each element>
is a key in this attribute hash ref then the node L] is
sorted. If the value of that key is a CODEREF then the sort L function will called as follows.
@node_list = sort $coderef @node_list
I
B {} #Nothing is sorted
B This accepts a HashRef.
B
sorted_nodes =>{
ARRAY => 1,#Will sort the primary_ref only
HASH => sub{ $b cmp $a }, #reverse sort the keys
}
=back
=head2 skipped_nodes
=over
B If the primary_type of the L<$element_ref|/Process each element>
is a key in this attribute hash ref then the 'before_method' and 'after_method' are
run at that node but no L is done.
B {} #Nothing is skipped
B This accepts a HashRef.
B
sorted_nodes =>{
OBJECT => 1,#skips all object nodes
}
=back
=head2 skip_level
=over
B This attribute is set to skip (or not) node parsing at the set level.
Because the process doesn't start checking until after it enters the data ref
it effectivly ignores a skip_level set to 0 (The base node level). I
array ref + 1>.
B undef = Nothing is skipped
B This accepts an integer
=back
=head2 skip_node_tests
=over
B This attribute contains a list of test conditions used to skip
certain targeted nodes. The test can target an array position, match a hash key, even
restrict the test to only one level. The test is run against the latest
L element so it skips the node below the
matching conditions not the node at the matching conditions. Matching is done with
'=~' and so will accept a regex or a string. The attribute contains an ArrayRef of
ArrayRefs. Each sub_ref contains the following;
=over
B<$type> - This is any of the L
reference node types
B<$key> - This is either a scalar or regex to use for matching a hash key
B<$position> - This is used to match an array position. It can be an integer or 'ANY'
B<$level> - This restricts the skipping test usage to a specific level only or 'ANY'
=back
B
[
[ 'HASH', 'KeyWord', 'ANY', 'ANY'],
# Skip the node below the value of any hash key eq 'Keyword'
[ 'ARRAY', 'ANY', '3', '4'], ],
# Skip the node stored in arrays at position three on level four
]
B An infinite number of skip tests added to an array
B [] = no nodes are skipped
=back
=head2 change_array_size
=over
B This attribute will not be used by this class directly. However
the L
role may share it with other roles in the future so it is placed here so there will be
no conflicts. This is usually used to define whether an array size shinks when an element
is removed.
B 1 (This probably means that the array will shrink when a position is removed)
B Boolean values.
=back
=head2 fixed_primary
=over
B This means that no changes made at lower levels will be passed
upwards into the final ref.
B 0 = The primary ref is not fixed (and can be changed) I<0 -E effectively
deep clones the portions of the primary ref that are traversed.>
B Boolean values.
=back
=head1 Methods
=head2 Methods used to write roles
These are methods that are not meant to be exposed to the final user of a composed role and
class but are used by the role to excersize the class.
=head3 _process_the_data( $passed_ref, $conversion_ref )
=over
B This method is the gate keeper to the recursive parsing of
Data::Walk::Extracted. This method ensures that the minimum requirements for the recursive
data parser are met. If needed it will use a conversion ref (also provided by the caller) to
change input hash keys to the generic hash keys used by this class. This function then
calls the actual recursive function. For an overview of the recursive steps see the
L.
B ( $passed_ref, $conversion_ref )
=over
B<$passed_ref> this ref contains key value pairs as follows;
=over
B - a dataref that the walker will walk - required
=over
review the $conversion_ref functionality in this function for renaming of this key.
=back
B - a dataref that is used for comparision while walking. - optional
=over
review the $conversion_ref functionality in this function for renaming of this key.
=back
B - a method name that will perform some action at the beginning
of each node - optional
B - a method name that will perform some action at the end
of each node - optional
B<[attribute name]> - L attribute names are
accepted with temporary attribute settings here. These settings are temporarily set for
a single "_process_the_data" call and then the original attribute values are restored.
=back
B<$conversion_ref> This allows a public method to accept different key names for the
various keys listed above and then convert them later to the generic terms used by this class.
- optional
B
$passed_ref ={
print_ref =>{
First_key => [
'first_value',
'second_value'
],
},
match_ref =>{
First_key => 'second_value',
},
before_method => '_print_before_method',
after_method => '_print_after_method',
sorted_nodes =>{ Array => 1 },#One shot attribute setter
}
$conversion_ref ={
primary_ref => 'print_ref',# generic_name => role_name,
secondary_ref => 'match_ref',
}
=back
B the $passed_ref (only) with the key names restored to the ones passed to this
method using the $conversion_ref.
=back
=head3 _build_branch( $seed_ref, @arg_list )
=over
B There are times when a role will wish to reconstruct the data branch
that lead from the 'zeroth' node to where the data walker is currently at. This private
method takes a seed reference and uses data found in the L to recursivly append to the front of the seed until a
complete branch to the zeroth node is generated. I
B a list of arguments starting with the $seed_ref to build from.
The remaining arguments are just the array elements of the 'branch ref'.
B
$ref = $self->_build_branch(
$seed_ref,
@{ $passed_ref->{branch_ref}},
);
B a data reference with the current path back to the start pre-pended
to the $seed_ref
=back
=head3 _extracted_ref_type( $test_ref )
=over
B In order to manage data types necessary for this class a data
walker compliant 'Type' tester is provided. This is necessary to support a few non
perl-standard types not generated in standard perl typing systems. First, 'undef'
is the UNDEF type. Second, strings and numbers both return as 'SCALAR' (not '' or undef).
B
B It receives a $test_ref that can be undef.
B a data walker type or it confesses.
=back
=head3 _get_had_secondary
=over
B during the initial processing of data in
L<_process_the_data|/_process_the_data( $passed_ref, $conversion_ref )> the existence
of a passed secondary ref is tested and stored in the attribute '_had_secondary'. On
occasion a role might need to know if a secondary ref existed at any level if it it is
not represented at the current level.
B nothing
B True|1 if the secondary ref ever existed
=back
=head3 _get_current_level
=over
B on occasion you may need for one of the methods to know what
level is currently being parsed. This will provide that information in integer
format.
B nothing
B the integer value for the level
=back
=head2 Public Methods
=head3 add_sorted_nodes( NODETYPE => 1, )
=over
B This method is used to add nodes to be sorted to the walker by
adjusting the attribute L.
B Node key => value pairs where the key is the Node name and the value is
1. This method can accept multiple key => value pairs.
B nothing
=back
=head3 has_sorted_nodes
=over
B This method checks if any sorting is turned on in the attribute
L.
B Nothing
B the count of sorted node types listed
=back
=head3 check_sorted_nodes( NODETYPE )
=over
B This method is used to see if a node type is sorted by testing the
attribute L.
B the name of one node type
B true if that node is sorted as determined by L
=back
=head3 clear_sorted_nodes
=over
B This method will clear all values in the attribute
L. I.
B nothing
B nothing
=back
=head3 remove_sorted_node( NODETYPE1, NODETYPE2, )
=over
B This method will clear the key / value pairs in L
for the listed items.
B a list of NODETYPES to delete
B In list context it returns a list of values in the hash for the deleted
keys. In scalar context it returns the value for the last key specified
=back
=head3 set_sorted_nodes( $hashref )
=over
B This method will completely reset the attribute L to
$hashref.
B a hashref of NODETYPE keys with the value of 1.
B nothing
=back
=head3 get_sorted_nodes
=over
B This method will return a hashref of the attribute L
B nothing
B a hashref
=back
=head3 add_skipped_nodes( NODETYPE1 => 1, NODETYPE2 => 1 )
=over
B This method adds additional skip definition(s) to the
L attribute.
B a list of key value pairs as used in 'skipped_nodes'
B nothing
=back
=head3 has_skipped_nodes
=over
B This method checks if any nodes are set to be skipped in the
attribute L.
B Nothing
B the count of skipped node types listed
=back
=head3 check_skipped_node( $string )
=over
B This method checks if a specific node type is set to be skipped in
the L attribute.
B a string
B Boolean value indicating if the specific $string is set
=back
=head3 remove_skipped_nodes( NODETYPE1, NODETYPE2 )
=over
B This method deletes specificily identified node skips from the
L attribute.
B a list of NODETYPES to delete
B In list context it returns a list of values in the hash for the deleted
keys. In scalar context it returns the value for the last key specified
=back
=head3 clear_skipped_nodes
=over
B This method clears all data in the L attribute.
B nothing
B nothing
=back
=head3 set_skipped_nodes( $hashref )
=over
B This method will completely reset the attribute L to
$hashref.
B a hashref of NODETYPE keys with the value of 1.
B nothing
=back
=head3 get_skipped_nodes
=over
B This method will return a hashref of the attribute L
B nothing
B a hashref
=back
=head3 set_skip_level( $int )
=over
B This method is used to reset the L
attribute after the instance is created.
B an integer (negative numbers and 0 will be ignored)
B nothing
=back
=head3 get_skip_level()
=over
B This method returns the current L
attribute.
B nothing
B an integer
=back
=head3 has_skip_level()
=over
B This method is used to test if the L attribute is set.
B nothing
B $Bool value indicating if the 'skip_level' attribute has been set
=back
=head3 clear_skip_level()
=over
B This method clears the L attribute.
B nothing
B nothing (always successful)
=back
=head3 set_skip_node_tests( ArrayRef[ArrayRef] )
=over
B This method is used to change (completly) the 'skip_node_tests'
attribute after the instance is created. See L for an example.
B an array ref of array refs
B nothing
=back
=head3 get_skip_node_tests()
=over
B This method returns the current master list from the
L attribute.
B nothing
B an array ref of array refs
=back
=head3 has_skip_node_tests()
=over
B This method is used to test if the L attribute
is set.
B nothing
B The number of sub array refs there are in the list
=back
=head3 clear_skip_node_tests()
=over
B This method clears the L attribute.
B nothing
B nothing (always successful)
=back
=head3 add_skip_node_tests( ArrayRef1, ArrayRef2 )
=over
B This method adds additional skip_node_test definition(s) to the the
L attribute list.
B a list of array refs as used in 'skip_node_tests'. These are 'pushed
onto the existing list.
B nothing
=back
=head3 set_change_array_size( $bool )
=over
B This method is used to (re)set the L attribute
after the instance is created.
B a Boolean value
B nothing
=back
=head3 get_change_array_size()
=over
B This method returns the current state of the L
attribute.
B nothing
B $Bool value representing the state of the 'change_array_size'
attribute
=back
=head3 has_change_array_size()
=over
B This method is used to test if the L
attribute is set.
B nothing
B $Bool value indicating if the 'change_array_size' attribute
has been set
=back
=head3 clear_change_array_size()
=over
B This method clears the L attribute.
B nothing
B nothing
=back
=head3 set_fixed_primary( $bool )
=over
B This method is used to change the L attribute
after the instance is created.
B a Boolean value
B nothing
=back
=head3 get_fixed_primary()
=over
B This method returns the current state of the L
attribute.
B nothing
B $Bool value representing the state of the 'fixed_primary' attribute
=back
=head3 has_fixed_primary()
=over
B This method is used to test if the L attribute is set.
B nothing
B $Bool value indicating if the 'fixed_primary' attribute has been set
=back
=head3 clear_fixed_primary()
=over
B This method clears the L attribute.
B nothing
B nothing
=back
=head1 Definitions
=head2 node
Each branch point of a data reference is considered a node. The possible paths
deeper into the data structure from the node are followed 'vertically first' in
recursive parsing. The original top level reference is considered the 'zeroth'
node.
=head2 base node type
Recursion 'base' node L are considered
to not have any possible deeper branches. Currently that list is SCALAR and UNDEF.
=head2 Supported node walking types
=over
=item ARRAY
=item HASH
=item SCALAR
=item UNDEF
I
Support for Objects is partially implemented and as a consequence '_process_the_data'
won't immediatly die when asked to parse an object. It will still die but on a
dispatch table call that indicates where there is missing object support, not at the
top of the node. This allows for some of the L to
use 'OBJECT' in their definitions.
=back
=head2 Supported one shot attributes
L
=over
=item sorted_nodes
=item skipped_nodes
=item skip_level
=item skip_node_tests
=item change_array_size
=item fixed_primary
=back
=head2 Dispatch Tables
This class uses the role L to implement dispatch
tables. When there is a decision point, that role is used to make the class
extensible.
=head1 Caveat utilitor
This is not an extention of L
The core class has no external effect. All output comes from
L.
This module uses the 'L'
( //= ) and so requires perl 5.010 or higher.
This is a L based data handling class.
Many coders will tell you Moose and data manipulation don't belong together. They are
most certainly right in speed intensive circumstances.
Recursive parsing is not a good fit for all data since very deep data structures will
fill up a fair amount of memory! Meaning that as the module recursively parses through
the levels it leaves behind snapshots of the previous level that allow it to keep
track of it's location.
The passed data references are effectivly deep cloned during this process. To leave
the primary_ref pointer intact see L
=head1 Build/Install from Source
B<1.> Download a compressed file with the code
B<2.> Extract the code from the compressed file. If you are using tar this should work:
tar -zxvf Data-Walk-Extracted-v0.xx.xx.tar.gz
B<3.> Change (cd) into the extracted directory
B<4.> Run the following commands
=over
(For Windows find what version of make was used to compile your perl)
perl -V:make
(then for Windows substitute the correct make function (ex. s/make/dmake/g))
=back
>perl Makefile.PL
>make
>make test
>make install # As sudo/root
>make clean
=head1 SUPPORT
=over
L
=back
=head1 TODO
=over
B<1.> provide full recursion through Objects
B<2.> Support recursion through CodeRefs (Closures)
B<3.> Add a Data::Walk::Diff Role to the package
B<4.> Add a Data::Walk::Top Role to the package
B<5.> Add a Data::Walk::Thin Role to the package
B<6.> Convert test suite to Test2 direct usage
=back
=head1 AUTHOR
=over
=item Jed Lund
=item jandrew@cpan.org
=back
=head1 COPYRIGHT
This program is free software; you can redistribute
it and/or modify it under the same terms as Perl itself.
The full text of the license can be found in the
LICENSE file included with this module.
This software is copyrighted (c) 2012, 2016 by Jed Lund.
=head1 Dependencies
=over
L
L<5.010|http://perldoc.perl.org/perl5100delta.html> (for use of
L //)
L
L
L
L - confess
L - 2.1803
L
L
L
L
L