# NAME Web::Query - Yet another scraping library like jQuery # VERSION version 1.01 # SYNOPSIS ```perl use Web::Query; wq('http://www.w3.org/TR/html401/') ->find('div.head dt') ->each(sub { my $i = shift; printf("%d %s\n", $i+1, $_->text); }); ``` # DESCRIPTION Web::Query is a yet another scraping framework, have a jQuery like interface. Yes, I know Ingy's [pQuery](https://metacpan.org/pod/pQuery). But it's just alpha quality. It doesn't work. Web::Query built at top of the CPAN modules, [HTML::TreeBuilder::XPath](https://metacpan.org/pod/HTML%3A%3ATreeBuilder%3A%3AXPath), [LWP::UserAgent](https://metacpan.org/pod/LWP%3A%3AUserAgent), and [HTML::Selector::XPath](https://metacpan.org/pod/HTML%3A%3ASelector%3A%3AXPath). So, this module uses [HTML::Selector::XPath](https://metacpan.org/pod/HTML%3A%3ASelector%3A%3AXPath) and only supports the CSS 3 selector supported by that module. Web::Query doesn't support jQuery's extended queries(yet?). If a selector is passed as a scalar ref, it'll be taken as a straight XPath expression. ``` $wq( '
hello
there
hello
there
foo
' ); $q = Web::Query->new( undef ); ``` This method throw the exception on unknown $stuff. This method returns undefined value on non-successful response with URL. Currently, the only two valid options are _indent_, which will be used as the indentation string if the object is printed, and _no\_space\_compacting_, which will prevent the compaction of whitespace characters in text blocks. - my $q = Web::Query->new\_from\_element($element: HTML::Element) Create new instance of Web::Query from instance of [HTML::Element](https://metacpan.org/pod/HTML%3A%3AElement). - `my $q = Web::Query->new_from_html($html: Str)` Create new instance of Web::Query from HTML. - my $q = Web::Query->new\_from\_url($url: Str) Create new instance of Web::Query from URL. If the response is not success(It means /^20\[0-9\]$/), this method returns undefined value. You can get a last result of response, use the `$Web::Query::RESPONSE`. Here is a best practical code: ```perl my $url = 'http://example.com/'; my $q = Web::Query->new_from_url($url) or die "Cannot get a resource from $url: " . Web::Query->last_response()->status_line; ``` - my $q = Web::Query->new\_from\_file($file\_name: Str) Create new instance of Web::Query from file name. ## TRAVERSING ### add Returns a new object augmented with the new element(s). - add($html) An HTML fragment to add to the set of matched elements. - add(@elements) One or more @elements to add to the set of matched elements. @elements that already are part of the set are not added a second time. ```perl my $group = $wq->find('#foo'); # collection has 1 element $group = $group->add( '#bar', $wq ); # 2 elements $group->add( '#foo', $wq ); # still 2 elements ``` - add($wq) An existing Web::Query object to add to the set of matched elements. - add($selector, $context) $selector is a string representing a selector expression to find additional elements to add to the set of matched elements. $context is the point in the document at which the selector should begin matching ### contents Get the immediate children of each element in the set of matched elements, including text and comment nodes. ### each Visit each nodes. `$i` is a counter value, 0 origin. `$elem` is iteration item. `$_` is localized by `$elem`. ```perl $q->each(sub { my ($i, $elem) = @_; ... }) ``` ### end Back to the before context like jQuery. ### filter Reduce the elements to those that pass the function's test. ```perl $q->filter(sub { my ($i, $elem) = @_; ... }) ``` ### find Get the descendants of each element in the current set of matched elements, filtered by a selector. ```perl my $q2 = $q->find($selector); # $selector is a CSS3 selector. ``` **NOTE** If you want to match the element itself, use ["filter"](#filter). **INCOMPATIBLE CHANGE** From v0.14 to v0.19 (inclusive) find() also matched the element itself, which is not jQuery compatible. You can achieve that result using `filter()`, `add()` and `find()`: ```perl my $wq = wq('bar
bar
bar
``` ### first Return the first matching element. This method constructs a new Web::Query object from the first matching element. ### last Return the last matching element. This method constructs a new Web::Query object from the last matching element. ### match($selector) Returns a boolean indicating if the elements match the `$selector`. In scalar context returns only the boolean for the first element. For the reverse of `not()`, see `filter()`. ### not($selector) Returns all the elements not matching the `$selector`. ```perl # $do_for_love will be every thing, except #that my $do_for_love = $wq->find('thing')->not('#that'); ``` ### and\_back Add the previous set of elements to the current one. ``` # get the h1 plus everything until the next h1 $wq->find('h1')->next_until('h1')->and_back; ``` ### map Creates a new array with the results of calling a provided function on every element. ```perl $q->map(sub { my ($i, $elem) = @_; ... }) ``` ### parent Get the parent of each element in the current set of matched elements. ### prev Get the previous node of each element in the current set of matched elements. ```perl my $prev = $q->prev; ``` ### next Get the next node of each element in the current set of matched elements. ```perl my $next = $q->next; ``` ### next\_until( $selector ) Get all subsequent siblings, up to (but not including) the next node matched `$selector`. ## MANIPULATION ### add\_class Adds the specified class(es) to each of the set of matched elements. ``` # add class 'foo' toelements wq('
foo
bar
foo
foo
barfoo
')->as_html; #foo
foo
bar
foo
!bar
``` ### ` attr ` Get/set attribute values. In getter mode, it'll return either the values of the attribute for all elements of the set, or only the first one depending of the calling context. ```perl my @values = $q->attr('style'); # style of all elements my $first_value = $q->attr('style'); # style of first element ``` In setter mode, it'll set attributes value for all elements, and return back the original object for easy chaining. ```perl $q->attr( 'alt' => 'a picture' )->find( ... ); # can pass more than 1 element too $q->attr( alt => 'a picture', src => 'file:///...' ); ``` The value passed for an attribute can be a code ref. In that case, the code will be called with `$_` set to the current attribute value. If the code modifies `$_`, the attribute will be updated with the new value. ```perl $q->attr( alt => sub { $_ ||= 'A picture' } ); ``` ### ` id ` Get/set the elements's id attribute. In getter mode, it behaves just like `attr()`. In setter mode, it behaves like `attr()`, but with the following exceptions. If the attribute value is a scalar, it'll be only assigned to the first element of the set (as ids are supposed to be unique), and the returned object will only contain that first element. ```perl my $first_element = $q->id('the_one'); ``` It's possible to set the ids of all the elements by passing a sub to `id()`. The sub is given the same arguments as for `each()`, and its return value is taken to be the new id of the elements. ```perl $q->id( sub { my $i = shift; 'foo_' . $i } ); ``` ### ` name ` Get/set the elements's 'name' attribute. ```perl my $name = $q->name; # equivalent to $q->attr( 'name' ); $q->name( 'foo' ); # equivalent to $q->attr( name => 'foo' ); ``` ### ` data ` Get/set the elements's 'data-\*name\*' attributes. ```perl my $data = $q->data('foo'); # equivalent to $q->attr( 'data-foo' ); $q->data( 'foo' => 'bar' ); # equivalent to $q->attr( 'data-foo' => 'bar' ); ``` ### tagname Get/Set the tag name of elements. ```perl my $name = $q->tagname; $q->tagname($new_name); ``` ### before Insert content, specified by the parameter, before each element in the set of matched elements. ``` wq('foo
foo
foo
'); ``` ### insert\_before Insert every element in the set of matched elements before the target. ### insert\_after Insert every element in the set of matched elements after the target. ### ` prepend ` Insert content, specified by the parameter, to the beginning of each element in the set of matched elements. ### remove Delete the elements associated with the object from the DOM. ``` # remove all