sweetrdf / quick-rdf-io
Collection of parsers and serializers compatible with sweetrdf/rdfInterface
pkg:composer/sweetrdf/quick-rdf-io
Requires
- php: >=8.0
- ext-mbstring: *
- ext-pcre: *
- ext-xmlreader: *
- ml/json-ld: ^1.2
- pietercolpaert/hardf: >=0.3.1 <1
- sweetrdf/rdf-helpers: ^2
- sweetrdf/rdf-interface: ^3.2
- zozlak/rdf-constants: ^1.1
Requires (Dev)
- php-coveralls/php-coveralls: ^2.4
- phpstan/phpstan: ^1
- phpunit/phpunit: ^10
- sweetrdf/quick-rdf: ^2
- sweetrdf/simple-rdf: ^2
- sweetrdf/term-templates: ^2.0.2
This package is auto-updated.
Last update: 2025-10-03 09:42:40 UTC
README
Collection of RDF parsers and serializers implementing the https://github.com/sweetrdf/rdfInterface interface.
Originally developed for the quickRdf library.
Supported formats
| format | read/write | class | implementation | streaming[1] |
|---|---|---|---|---|
| rdf-xml | rw | RdfXmlParser, RdfXmlSerializer | own | yes |
| n-triples | rw | NQuadsParser, NQuadsSerializer | own | yes |
| n-triples* | rw | NQuadsParser, NQuadsSerializer | own | yes |
| n-quads | rw | NQuadsParser, NQuadsSerializer | own | yes |
| n-quads* | rw | NQuadsParser, NQuadsSerializer | own | yes |
| turtle | rw | TriGParser, TrigSerializer | pietercolpaert/hardf | yes |
| trig | rw | TriGParser, TrigSerializer | pietercolpaert/hardf | yes |
| JsonLD | rw | JsonLdParser, JsonLdSerializer | ml/json-ld | no |
| JsonLD[2] | w | JsonLdStreamSerializer | own[3] | yes |
[1] A streaming parser/serializer doesn't materialize the whole dataset in memory, which ensures a constant (and low) memory footprint.
(this feature applies only to the parser/serializer - see the section on memory usage below)
[2] Pass the jsonld-stream value as the $format parameter of \quickRdfIo\Util::serialize() to use this serializer.
[3] Outputs data only as extremely flattened JSON-LD but works in streaming mode.
Installation
- Obtain Composer
- Run
composer require sweetrdf/quick-rdf-io
Automatically generated documentation
https://sweetrdf.github.io/quickRdfIo/namespaces/quickrdfio.html
It's very incomplete but better than nothing.
RdfInterface and ml/json-ld documentation is included.
Usage
Remark - the examples call two other libraries:
sweetrdf/quick-rdf and sweetrdf/term-templates.
You may install them with composer require sweetrdf/quick-rdf and composer require sweetrdf/term-templates.
Basic parsing
Just use \quickRdfIo\Util::parse($input, $dataFactory, $format, $baseUri), where:
- $input can be (almost) "anything containing RDF": an RDF string, a path to a file, a URL, an opened resource (the result of fopen()), a PSR-7 response or a PSR-7 StreamInterface object.
- $dataFactory is an object implementing the \rdfInterface\DataFactory interface, e.g. new \quickRdf\DataFactory().
- $format is an optional explicit RDF format indication for the rare situations when the format can't be autodetected. See the src/quickRdfIo/Util.php::getParser() source code for a list of all accepted $format values.
- $baseUri is an optional base URI value (for some kinds of $input values it can be autodetected).
include 'vendor/autoload.php';

// create a DataFactory - it's needed by all parsers
// (DataFactory implementation comes from other package, here sweetrdf/quick-rdf)
$dataFactory = new \quickRdf\DataFactory();

// parse a file
$iterator = \quickRdfIo\Util::parse('tests/files/quadsPositive.nq', $dataFactory);
foreach ($iterator as $i) echo "$i\n";

// parse a remote file (with format autodetection as github wrongly reports text/html)
$url = 'https://github.com/sweetrdf/quickRdfIo/raw/master/tests/files/spec2.10.rdf';
$iterator = \quickRdfIo\Util::parse($url, $dataFactory);
foreach ($iterator as $i) echo "$i\n";

// parse a PSR-7 response (format recognized from the response content-type header)
$url = 'https://www.w3.org/2000/10/rdf-tests/RDF-Model-Syntax_1.0/ms_7.2_1.rdf';
$client = new \GuzzleHttp\Client();
$request = new \GuzzleHttp\Psr7\Request('GET', $url);
$response = $client->send($request);
$iterator = \quickRdfIo\Util::parse($response, $dataFactory);
foreach ($iterator as $i) echo "$i\n";

// parse a string containing RDF with format autodetection
$rdf = file_get_contents('https://www.w3.org/2000/10/rdf-tests/RDF-Model-Syntax_1.0/ms_7.2_1.rdf');
$iterator = \quickRdfIo\Util::parse($rdf, $dataFactory);
foreach ($iterator as $i) echo "$i\n";

// parse a PHP stream
$stream = fopen('tests/files/quadsPositive.nq', 'r');
$iterator = \quickRdfIo\Util::parse($stream, $dataFactory);
// consume the iterator before closing the stream it reads from
foreach ($iterator as $i) echo "$i\n";
fclose($stream);

// in most cases you will populate a Dataset with parsed triples/quads
// (note that a Dataset implementation comes from other package, e.g. sweetrdf/quick-rdf)
$dataset = new \quickRdf\Dataset();
$url = 'https://github.com/sweetrdf/quickRdfIo/raw/master/tests/files/spec2.10.rdf';
$dataset->add(\quickRdfIo\Util::parse($url, $dataFactory));
echo $dataset;
Basic serialization
Just use \quickRdfIo\Util::serialize($data, $format, $output, $nmsp), where:
- $data is an object implementing the \rdfInterface\QuadIterator interface, e.g. a Dataset or an iterator returned by the parser.
- $format specifies an RDF serialization format, e.g. turtle or ntriples. See the src/quickRdfIo/Util.php::getSerializer() source code for a list of all accepted $format values.
- $output is an optional parameter describing where the output should be written. If it's missing or null, the output is returned as a string. If it's a string, it's treated as a path to open with fopen($output, 'wb'). If it's a stream resource or a PSR-7 StreamInterface instance, the output is just written into it.
- $nmsp is an optional parameter used to pass the desired RDF namespace aliases to the serializer. Note that some formats like n-triples and n-quads don't support namespace aliases, while in others (e.g. turtle) it's very common to use them.
include 'vendor/autoload.php';

// $iterator = ...some \rdfInterface\QuadIterator, e.g. one from parsing examples...

// serialize to file in text/turtle format
\quickRdfIo\Util::serialize($iterator, 'turtle', 'myFile.ttl');

// serialize to string
echo \quickRdfIo\Util::serialize($iterator, 'turtle');

// use given namespace aliases when serializing to turtle
$nmsp = new \quickRdf\RdfNamespace();
$nmsp->add('http://purl.org/dc/terms/', 'dc');
$nmsp->add('http://www.w3.org/1999/02/22-rdf-syntax-ns#', 'rdf');
echo \quickRdfIo\Util::serialize($iterator, 'turtle', null, $nmsp);
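The three documented cases of the $output parameter can be pictured with a small stand-alone sketch. This is not the library's code: writeOutput() is a hypothetical helper written only to illustrate the null / path / stream resource dispatch described above, and it leaves out the PSR-7 StreamInterface case.

```php
<?php
/**
 * Hypothetical helper (not part of quickRdfIo) mirroring the documented
 * handling of the $output parameter.
 *
 * @param resource|string|null $output
 */
function writeOutput(string $data, $output = null): ?string
{
    if ($output === null) {
        // no target given: return the serialized data as a string
        return $data;
    }
    if (is_string($output)) {
        // a string is treated as a path and opened in binary write mode
        $handle = fopen($output, 'wb');
        if ($handle === false) {
            throw new RuntimeException("Can't open $output for writing");
        }
        fwrite($handle, $data);
        fclose($handle);
        return null;
    }
    if (is_resource($output)) {
        // an already opened stream is written to and left open for the caller
        fwrite($output, $data);
        return null;
    }
    throw new InvalidArgumentException('Unsupported $output type');
}

$ttl = "<http://example.org/s> <http://example.org/p> \"o\" .\n";

// returned as a string
echo writeOutput($ttl);

// written into an opened stream
$stream = fopen('php://memory', 'w+b');
writeOutput($ttl, $stream);
rewind($stream);
echo stream_get_contents($stream);
fclose($stream);
```

The practical consequence is that a stream you open yourself also has to be closed by you, while a path is opened and closed for you.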
Basic conversion
include 'vendor/autoload.php';

// create a DataFactory - it's needed by all parsers
// (note that DataFactory implementation comes from other package, e.g. sweetrdf/quick-rdf)
$dataFactory = new \quickRdf\DataFactory();

// or any other example from the "Basic parsing" section above
$iterator = \quickRdfIo\Util::parse('tests/files/puzzle4d_100k.nt', $dataFactory);

// or any other example from the "Basic serialization" section above
\quickRdfIo\Util::serialize($iterator, 'rdf', 'output.rdf');
Basic filtering without a Dataset
It's worth noting that basic triple/quad filtering can be done in a memory-efficient way without using a Dataset implementation.
Let's say we want to copy all triples with the https://vocabs.acdh.oeaw.ac.at/schema#hasIdentifier predicate
from the tests/files/puzzle4d_100k.nt n-triples file into an ids.ttl turtle file.
A typical approach would be to load the data into a Dataset, filter it there and finally serialize the Dataset:
include 'vendor/autoload.php';

$dataFactory = new \quickRdf\DataFactory();
$t = microtime(true);

// parse input into a Dataset
$iterator = \quickRdfIo\Util::parse('tests/files/puzzle4d_100k.nt', $dataFactory);
$dataset = new \quickRdf\Dataset();
$dataset->add($iterator);

// filter out non-matching triples
$template = new \termTemplates\QuadTemplate(
    null,
    $dataFactory->namedNode('https://vocabs.acdh.oeaw.ac.at/schema#hasIdentifier'),
    null
);
$dataset->deleteExcept($template);

// serialize
\quickRdfIo\Util::serialize($dataset, 'turtle', 'ids.ttl');

print_r([
    'time [s]' => microtime(true) - $t,
    'memory [MB]' => (int) (memory_get_peak_usage(true) / 1024 / 1024),
]);
// 4.4s, 125 MB of RAM
The same can also be done with a "filtering generator" instead of the Dataset. This approach avoids materializing the whole dataset in memory, which should both reduce the memory footprint and speed things up a little:
include 'vendor/autoload.php';

$dataFactory = new \quickRdf\DataFactory();
$t = microtime(true);

// prepare input generator
$iterator = \quickRdfIo\Util::parse('tests/files/puzzle4d_100k.nt', $dataFactory);

// create a generator performing the filtering
$template = new \termTemplates\QuadTemplate(
    null,
    $dataFactory->namedNode('https://vocabs.acdh.oeaw.ac.at/schema#hasIdentifier'),
    null
);
$filter = function($iter, $tmpl) {
    foreach ($iter as $quad) {
        if ($tmpl->equals($quad)) {
            yield $quad;
        }
    }
};

// wrap it into something implementing \rdfInterface\QuadIterator for types compatibility
$wrapper = new \rdfHelpers\GenericQuadIterator($filter($iterator, $template));

// serialize our filtering generator
\quickRdfIo\Util::serialize($wrapper, 'turtle', 'ids.ttl');

print_r([
    'time [s]' => microtime(true) - $t,
    'memory [MB]' => (int) (memory_get_peak_usage(true) / 1024 / 1024),
]);
// 2.7s, 51 MB of RAM
The results are better but the memory footprint is still surprisingly high.
This is because of the DataFactory implementation we've used and the performance optimizations it applies
(which admittedly only slow things down in our scenario).
We can optimize further by using the simplest possible DataFactory implementation
(for that we need another package - sweetrdf/simple-rdf):
include 'vendor/autoload.php';

$dataFactory = new \simpleRdf\DataFactory();
$t = microtime(true);

// prepare input generator
$iterator = \quickRdfIo\Util::parse('tests/files/puzzle4d_100k.nt', $dataFactory);

// create a generator performing the filtering
$template = new \termTemplates\QuadTemplate(
    null,
    $dataFactory->namedNode('https://vocabs.acdh.oeaw.ac.at/schema#hasIdentifier'),
    null
);
$filter = function($iter, $tmpl) {
    foreach ($iter as $quad) {
        if ($tmpl->equals($quad)) {
            yield $quad;
        }
    }
};

// wrap it into something implementing \rdfInterface\QuadIterator for types compatibility
$wrapper = new \rdfHelpers\GenericQuadIterator($filter($iterator, $template));

// serialize our filtering generator
\quickRdfIo\Util::serialize($wrapper, 'turtle', 'ids.ttl');

print_r([
    'time [s]' => microtime(true) - $t,
    'memory [MB]' => (int) (memory_get_peak_usage(true) / 1024 / 1024),
]);
// 1.9s, 2 MB of RAM
As we can see, the optimized implementation is 2.3 times faster and has a 60 times lower memory footprint than the Dataset-based one.
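For reference, the ratios can be recomputed from the figures quoted in the comments of the three examples. The snippet below only redoes that arithmetic; the labels and the ratios() helper are made up for this illustration, and the numbers will differ on other machines.

```php
<?php
// Time [s] and peak memory [MB] as quoted in the comments of the examples above.
$runs = [
    'Dataset, quickRdf'    => ['time' => 4.4, 'memory' => 125],
    'generator, quickRdf'  => ['time' => 2.7, 'memory' => 51],
    'generator, simpleRdf' => ['time' => 1.9, 'memory' => 2],
];

// Hypothetical helper: how many times faster/smaller $run is compared to $base.
function ratios(array $base, array $run): array
{
    return [
        'time'   => $base['time'] / $run['time'],
        'memory' => $base['memory'] / $run['memory'],
    ];
}

$base = $runs['Dataset, quickRdf'];
foreach ($runs as $name => $run) {
    $r = ratios($base, $run);
    printf("%-22s %.1fx faster, %.1fx less memory\n", $name, $r['time'], $r['memory']);
}
```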
Notes:
- Check the sweetrdf/term-templates library for more classes that make it easy to match triples/quads fulfilling given conditions.
- This approach is not limited to filtering. Simple triple/quad modifications can be applied in a similar way
(just adjust the body of the "filtering generator" foreach loop).
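A minimal sketch of such a "modifying generator" is given below. To keep it runnable without any package it uses plain arrays as stand-ins for quads, and sourceTriples(), renamePredicate() and the example.org IRIs are all hypothetical. In real code the items would be \rdfInterface\Quad objects coming from \quickRdfIo\Util::parse() and the generator would be wrapped in \rdfHelpers\GenericQuadIterator exactly as in the filtering examples.

```php
<?php
// Stand-in input: yields triples one at a time, like a streaming parser would.
function sourceTriples(): Generator
{
    yield ['s' => 'http://example.org/a', 'p' => 'http://example.org/old#id', 'o' => '1'];
    yield ['s' => 'http://example.org/b', 'p' => 'http://example.org/title', 'o' => 'B'];
    yield ['s' => 'http://example.org/c', 'p' => 'http://example.org/old#id', 'o' => '3'];
}

// "Modifying generator": rewrites one predicate and passes every other
// triple through unchanged. Only one triple is held in memory at a time.
function renamePredicate(iterable $triples, string $from, string $to): Generator
{
    foreach ($triples as $triple) {
        if ($triple['p'] === $from) {
            $triple['p'] = $to;
        }
        yield $triple;
    }
}

$modified = renamePredicate(
    sourceTriples(),
    'http://example.org/old#id',
    'http://example.org/new#id'
);
foreach ($modified as $t) {
    echo "<{$t['s']}> <{$t['p']}> \"{$t['o']}\" .\n";
}
```

Because generators are lazy, filtering and modifying steps can be chained without ever materializing the whole dataset.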
Manual parser/serializer instantiation
It's of course possible to instantiate a particular parser/serializer explicitly.
This is the only way to fine-tune the parser/serializer configuration, e.g.:
- Create a strict n-triples parser
$parser = new \quickRdfIo\NQuadsParser($dataFactory, true, \quickRdfIo\NQuadsParser::MODE_TRIPLES);
- Create a JsonLD serializer applying compacting with a context read from a given file and producing pretty-printed JSON:
$serializer = new \quickRdfIo\JsonLdSerializer(
    'http://baseUri',
    \quickRdfIo\JsonLdSerializer::TRANSFORM_COMPACT,
    JSON_UNESCAPED_SLASHES | JSON_PRETTY_PRINT,
    'context.jsonld'
);
Be aware that parsing/serialization with a manually created parser/serializer instance requires a little more code.
Compare:
include 'vendor/autoload.php';

// $data = ...data read from somewhere...

// using \quickRdfIo\Util::serialize()
\quickRdfIo\Util::serialize($data, 'jsonld', 'output.jsonld');

// using manually instantiated serializer
$serializer = new \quickRdfIo\JsonLdSerializer();
$output = fopen('output.jsonld', 'w');
$serializer->serialize($data, $output);
fclose($output);