lukaswhite / php-meta-tags-parser
A PHP package for parsing meta tags in HTML documents
Type: project
Requires
- voku/simple_html_dom: ^4.7
Requires (Dev)
- phpunit/php-code-coverage: ^9.2
- phpunit/phpunit: ^9.5
This package is auto-updated.
Last update: 2024-12-04 16:17:07 UTC
README
Extracts metadata (title, description, Open Graph, etc.) from the content of a web page.
Note that this library deals only with raw HTML, rather than tying you down to one particular method for retrieving the content of an external URL. (I usually use Guzzle, but making it a dependency might cause versioning difficulties.)
Installation
composer require lukaswhite/php-meta-tags-parser
Usage
use Lukaswhite\MetaTagsParser\Parser;

$html = '<html><head>...</head></html>';
$parser = new Parser();
$result = $parser->parse($html);
Using the result
The parse() method returns an object that encapsulates any page data it has extracted from the provided HTML.
$result->getTitle();
$result->getDescription();
$result->getKeywords();
$result->getUrl();
$result->getFacebookAppId();
$result->openGraph()->getSiteName();
$result->openGraph()->getType();
$result->openGraph()->getTitle();
$result->openGraph()->getDescription();
$result->openGraph()->getLocale();
$result->openGraph()->getImages(); // returns an array of URLs
$result->openGraph()->getLatitude();
$result->openGraph()->getLongitude();
$result->openGraph()->getAltitude();
$result->toArray(); // all of the extracted metadata
It will also extract RSS and/or Atom feeds; getFeeds() returns an array of instances of the Feed class:
$feed->getType(); // Feed::RSS or Feed::ATOM
$feed->isRSS();
$feed->isAtom();
$feed->getUri();
$feed->getTitle();
The getFeeds() method accepts an optional $type argument, to choose one or the other:
$result->getFeeds(Feed::RSS);
// or
$result->getFeeds(Feed::ATOM);
Cleansing the data
The package ships with a very simple string cleanser; essentially it just decodes any HTML entities. You're free to provide your own cleanser: implement the CleansesStrings interface and pass an instance to the parser's constructor. The interface only requires a run() method that accepts a string and returns the cleansed version.
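As a rough illustration of a custom cleanser, the sketch below decodes HTML entities and then collapses runs of whitespace. The class name WhitespaceCleanser is hypothetical, and the interface declared at the top is only a stand-in so the snippet runs on its own. In a real project you would import the package's CleansesStrings interface instead; its namespace and exact type hints are not stated in this README, so treat the signature shown here as an assumption.

```php
<?php

declare(strict_types=1);

// Stand-in for the package's CleansesStrings interface so this sketch is
// self-contained. ASSUMPTION: the real interface declares a compatible
// run() method; remove this declaration and import the real one in practice.
interface CleansesStrings
{
    public function run(string $value): string;
}

// Hypothetical custom cleanser: decodes entities, then tidies whitespace.
final class WhitespaceCleanser implements CleansesStrings
{
    public function run(string $value): string
    {
        // Decode entities first, as the bundled cleanser is described as doing.
        $decoded = html_entity_decode($value, ENT_QUOTES | ENT_HTML5, 'UTF-8');

        // Collapse runs of whitespace (including newlines) into single spaces.
        $collapsed = preg_replace('/\s+/', ' ', $decoded);

        // preg_replace() returns null on error; fall back to the decoded text.
        return trim($collapsed ?? $decoded);
    }
}

$cleanser = new WhitespaceCleanser();
echo $cleanser->run("Fish &amp; Chips \n  &quot;fresh&quot;"), PHP_EOL;
// prints: Fish & Chips "fresh"
```

An instance of such a class would then be passed to the Parser constructor. The position of that argument is not documented here, so check the constructor's signature before wiring it in.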
Sanitizing the data
The package ships with a very simple string sanitizer; under the hood it simply uses the strip_tags() function. If you wish to provide your own sanitizer, implement the SanitizesStrings interface and pass an instance to the parser's constructor. The interface only requires a run() method that accepts a string and returns the sanitized version.
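A custom sanitizer might look something like the sketch below. It removes script and style elements together with their contents before calling strip_tags(), since strip_tags() removes the tags themselves and can leave the text between them behind. The class name ScriptAwareSanitizer is hypothetical, and the interface declaration is a stand-in so the snippet runs in isolation. The real SanitizesStrings interface lives in the package, and its namespace and type hints are assumptions here.

```php
<?php

declare(strict_types=1);

// Stand-in for the package's SanitizesStrings interface so this sketch is
// self-contained. ASSUMPTION: the real interface declares a compatible
// run() method; remove this declaration and import the real one in practice.
interface SanitizesStrings
{
    public function run(string $value): string;
}

// Hypothetical custom sanitizer: drops script/style blocks, then all tags.
final class ScriptAwareSanitizer implements SanitizesStrings
{
    public function run(string $value): string
    {
        // Remove <script> and <style> elements along with their contents.
        $withoutBlocks = preg_replace(
            '#<(script|style)\b[^>]*>.*?</\1>#is',
            '',
            $value
        );

        // preg_replace() returns null on error; fall back to the raw input.
        $stripped = strip_tags($withoutBlocks ?? $value);

        return trim($stripped);
    }
}

$sanitizer = new ScriptAwareSanitizer();
echo $sanitizer->run('Hello <b>world</b><script>alert(1)</script>'), PHP_EOL;
// prints: Hello world
```

A regular expression is a blunt tool for HTML, so this is only a starting point; a sanitizer built on a dedicated HTML purification library would be a more robust choice for untrusted input.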