kosuha606 / html-uni-parser
Uni parser for sites
Installs: 50
Dependents: 0
Suggesters: 0
Security: 0
Stars: 0
Watchers: 0
Forks: 0
Open Issues: 2
Type: composer-pugin
pkg:composer/kosuha606/html-uni-parser
Requires
- php: >=7.0.0
- beberlei/assert: dev-master
- zendframework/zend-dom: ^2.7@dev
Requires (Dev)
- phpunit/phpunit: 6.5
- seregazhuk/php-watcher: dev-master
README
A universal HTML parser that can parse any kind of HTML page.
Installation
To install this plugin, use Composer:
$ composer require kosuha606/html-uni-parser
Usage
There are four ways to parse HTML, one per method: parseCatalog(), parseSearch(), parseCard() and parseGenerator().
Example:
$results = HtmlUniParser::create([ 'pageUrl' => 'http://example.com', 'xpathOnCard' => [ 'h1' => '//h1', 'description' => 'HTML//p' ] ])->parseCard();
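The xpathOnCard map in the example above pairs each result key with an XPath query. As a rough mental model, the following plain-PHP sketch (built on PHP's standard DOM extension, not on this package) shows how such a map can be turned into a result array with the same keys. The function name extractFields() and the sample HTML are hypothetical, and the $outerHtml flag only approximates what the forceOuterHtml property is described as doing; the library's actual return format may differ.

```php
<?php
// Illustration of the xpathOnCard idea using PHP's DOM extension.
// This is NOT the library's code; extractFields() is a hypothetical helper.

/**
 * Runs every XPath query in $xpaths against $html and returns an array
 * with the same keys. Each value is the trimmed text of the first match,
 * its outer HTML when $outerHtml is true, or null if nothing matched.
 */
function extractFields(string $html, array $xpaths, bool $outerHtml = false): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate imperfect real-world markup
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $result = [];
    foreach ($xpaths as $key => $query) {
        $nodes = $xpath->query($query);
        if ($nodes === false || $nodes->length === 0) {
            $result[$key] = null; // no match for this key
            continue;
        }
        $node = $nodes->item(0);
        $result[$key] = $outerHtml
            ? $doc->saveHTML($node)      // outer HTML of the matched node
            : trim($node->textContent);  // plain text of the matched node
    }
    return $result;
}

// Usage with made-up markup.
$html = '<html><body><h1>Title</h1><p>First paragraph</p></body></html>';
$fields = extractFields($html, ['h1' => '//h1', 'description' => '//p']);
// $fields is ['h1' => 'Title', 'description' => 'First paragraph']
print_r($fields);
```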
Examples
For more examples, see the examples/ directory.
Description of configurable properties
| Property | Description |
|---|---|
| catalogUrl | URL of the catalog page to parse with the catalog strategy, parseCatalog() |
| searchUrl | URL used to run a search on the target site, used by parseSearch() |
| pageUrl | URL of a single page to parse with parseCard() |
| urlGenerator | Callback that generates the links to parse with parseGenerator() |
| encoding | Encoding of the target site |
| siteBaseUrl | Base URL used to process links after parsing |
| resultLimit | Maximum number of results |
| sleepAfterRequest | Number of seconds to sleep after each request |
| goIntoCard | Whether to go into each card (item page) when parsing catalog links |
| xpathItem | XPath query used to select the items in a list |
| xpathLink | XPath query used to extract the link inside each parsed item |
| xpathOnCard | Array of XPath queries; each key becomes a key in the result array |
| typeMech | Type of parsing mechanism (how pages are fetched), for example: wget, curl, phantomjs, filegetcontents |
| forceOuterHtml | Forces the parser to use outer HTML for XPath results |
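The catalog properties work together: xpathItem selects the item nodes, xpathLink extracts a link from each item, siteBaseUrl is used to process those links, resultLimit caps the count, and goIntoCard decides whether each link is followed into its card. The plain-PHP sketch below models that flow on an in-memory HTML string so it runs without network access. It is an assumption-based illustration, not the library's code: parseCatalogSketch() and absolutize() are hypothetical names, the link-joining rule for siteBaseUrl is a guess, xpathLink is assumed to select the href attribute (for example .//a/@href), and the $openCard callback merely stands in for fetching and parsing a card page.

```php
<?php
// Hypothetical sketch of a catalog pass, NOT the library's implementation.
// It models xpathItem, xpathLink, siteBaseUrl, resultLimit and goIntoCard
// on an in-memory HTML string, so no network access is needed.

/** Joins a relative link onto $base; absolute http(s) URLs are returned unchanged. */
function absolutize(string $link, string $base): string
{
    if (preg_match('#^https?://#i', $link)) {
        return $link;
    }
    return rtrim($base, '/') . '/' . ltrim($link, '/');
}

/**
 * @param string        $catalogHtml HTML of an already downloaded catalog page
 * @param array         $config      xpathItem, xpathLink, siteBaseUrl, resultLimit, goIntoCard
 * @param callable|null $openCard    stands in for "go into the card"; receives each absolute link
 * @return array absolute links, or whatever $openCard returns for each link
 */
function parseCatalogSketch(string $catalogHtml, array $config, $openCard = null): array
{
    $doc = new DOMDocument();
    libxml_use_internal_errors(true); // tolerate sloppy real-world markup
    $doc->loadHTML($catalogHtml);
    libxml_clear_errors();
    $xpath = new DOMXPath($doc);

    $results = [];
    foreach ($xpath->query($config['xpathItem']) as $item) {
        if (count($results) >= $config['resultLimit']) {
            break; // stop once resultLimit items have been collected
        }
        // xpathLink is evaluated relative to the current item node.
        $hrefNode = $xpath->query($config['xpathLink'], $item)->item(0);
        if ($hrefNode === null) {
            continue; // item without a link: skip it
        }
        $link = absolutize($hrefNode->nodeValue, $config['siteBaseUrl']);
        $results[] = ($config['goIntoCard'] && $openCard !== null)
            ? $openCard($link)
            : $link;
    }
    return $results;
}

// Usage with made-up markup and a placeholder base URL.
$html = '<ul>'
    . '<li class="item"><a href="/p/1">One</a></li>'
    . '<li class="item"><a href="https://other.example/p/2">Two</a></li>'
    . '<li class="item"><a href="p/3">Three</a></li>'
    . '</ul>';

$links = parseCatalogSketch($html, [
    'xpathItem'   => '//li[@class="item"]',
    'xpathLink'   => './/a/@href',
    'siteBaseUrl' => 'http://example.com',
    'resultLimit' => 2,
    'goIntoCard'  => false,
]);
// $links is ['http://example.com/p/1', 'https://other.example/p/2']
print_r($links);
```

Passing the card step in as a callback keeps the sketch testable offline; in a real run that step would download each link and apply the xpathOnCard queries to it.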
Available methods
| Method | Description |
|---|---|
| parseCatalog | Parses the catalog links, then parses every link; returns the results as an array of parsed links |
| parseSearch | Takes a search query string as its argument, builds the search link from it, and then behaves like parseCatalog |
| parseCard | Parses a single page of the site |
| parseGenerator | Parses the links generated by the urlGenerator callback |
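parseGenerator() parses whatever links the urlGenerator callback produces. The sketch below shows one plausible callback that builds paginated catalog URLs. The zero-argument signature and the array return value are assumptions, as are the example.com URL and the page query parameter; check the examples/ directory for the contract the library actually expects. The library call is left commented out because it needs the installed package and network access.

```php
<?php
// Hypothetical urlGenerator-style callback that builds paginated catalog URLs.
// The signature (no arguments, returns an array of URLs) is an assumption.

$urlGenerator = function () {
    $urls = [];
    for ($page = 1; $page <= 3; $page++) {
        // example.com and the "page" query parameter are placeholders.
        $urls[] = 'http://example.com/catalog?' . http_build_query(['page' => $page]);
    }
    return $urls;
};

// Would be passed to the parser through the urlGenerator property, e.g.:
// HtmlUniParser::create(['urlGenerator' => $urlGenerator, 'xpathOnCard' => ['h1' => '//h1']])->parseGenerator();

print_r($urlGenerator());
```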
Run tests
To run the tests, use this command:
./vendor/bin/phpunit