1tomany / pdf-pack-bundle
Symfony bundle for the 1tomany/pdf-pack library
Package info
github.com/1tomany/pdf-pack-bundle
Type: symfony-bundle
pkg:composer/1tomany/pdf-pack-bundle
Requires
- php: >=8.2
- 1tomany/pdf-pack: ^0.6.0
- symfony/config: ^7.2|^8.0
- symfony/dependency-injection: ^7.2|^8.0
- symfony/http-kernel: ^7.2|^8.0
Requires (Dev)
- friendsofphp/php-cs-fixer: ^3.93
- phpstan/phpstan: ^2.1
README
pdf-pack is a simple PHP library that makes it easy to rasterize pages and extract text from PDFs for use with large language models.
Install the bundle
```shell
composer require 1tomany/pdf-pack-bundle
```
Usage
Once the bundle is installed, Symfony autowires the necessary services. Type-hint a constructor argument with OneToMany\PdfPack\Contract\Action\ExtractActionInterface or OneToMany\PdfPack\Contract\Action\ReadActionInterface, then call its act() method to interact with the configured extractor client.
```php
<?php

namespace App\File\Action\Handler;

use OneToMany\PdfPack\Contract\Action\ExtractActionInterface;
use OneToMany\PdfPack\Contract\Action\ReadActionInterface;
use OneToMany\PdfPack\Request\ExtractRequest;
use OneToMany\PdfPack\Request\ReadRequest;

final readonly class UploadFileHandler
{
    public function __construct(
        private ReadActionInterface $readAction,
        private ExtractActionInterface $extractAction,
    ) {
    }

    public function handle(string $filePath): void
    {
        // Read PDF metadata like page count
        $metadata = $this->readAction->act(
            new ReadRequest($filePath)
        );

        // Rasterize all pages of a PDF
        // (wrapping `new` in parentheses keeps this valid on PHP 8.2 and 8.3)
        $request = (new ExtractRequest($filePath))
            ->fromPage(1) // First page to extract
            ->toPage(null) // Last page to extract, NULL for all pages
            ->asPngOutput() // Generate PNG images
            ->atResolution(150); // At 150 DPI

        // @see OneToMany\PdfPack\Response\ExtractResponse
        foreach ($this->extractAction->act($request) as $page) {
            // $page->getData() or $page->toDataUri()
        }

        // Extract text from pages 2 through 8
        $request = (new ExtractRequest($filePath, 2, 8))->asTextOutput();

        // @see OneToMany\PdfPack\Response\ExtractResponse
        foreach ($this->extractAction->act($request) as $page) {
            // $page->getData() or $page->toDataUri()
        }
    }
}
```
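Large PDFs can produce more pages than you want to send to a model in one go. Because ExtractRequest accepts an inclusive first and last page (as in the "pages 2 through 8" example), one option is to split the document into fixed-size page ranges and issue one request per range. The helper below is a hypothetical sketch, not part of pdf-pack; it assumes you already have the page count from the read metadata (the README does not name the accessor, so it is left out here).

```php
<?php

declare(strict_types=1);

/**
 * Hypothetical helper (not part of pdf-pack): split a document of $pageCount
 * pages into inclusive [first, last] page ranges of at most $batchSize pages.
 *
 * @return list<array{0: int, 1: int}>
 */
function pageBatches(int $pageCount, int $batchSize): array
{
    if ($batchSize < 1) {
        throw new InvalidArgumentException('The batch size must be at least 1.');
    }

    $batches = [];

    // Page numbers are 1-based, matching fromPage(1) in the example above
    for ($first = 1; $first <= $pageCount; $first += $batchSize) {
        // Clamp the last page of the final batch to the document's page count
        $last = min($first + $batchSize - 1, $pageCount);
        $batches[] = [$first, $last];
    }

    return $batches;
}

// Usage sketch: a 23 page PDF split into batches of at most 10 pages
foreach (pageBatches(23, 10) as [$first, $last]) {
    echo "pages {$first}-{$last}\n";
}
```

Each pair can then be passed to the constructor form shown above, for example (new ExtractRequest($filePath, $first, $last))->asTextOutput(), so every act() call covers a bounded number of pages.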
Testing
If you want to avoid invoking an external process in your test environment, use the MockClient instead: set the client option of the onetomany_pdfpack configuration to 'mock' for the test environment in your Symfony configuration.
```yaml
when@test:
    onetomany_pdfpack:
        client: 'mock'
```
Without changing any other code, Symfony will automatically inject the MockClient instead of the default PopplerClient for your tests.
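If your project keeps its configuration in PHP rather than YAML, the same test-only override could look roughly like the sketch below. It relies on Symfony's ContainerConfigurator; the file path config/packages/onetomany_pdfpack.php is only a suggested name, and the option mirrors the YAML above.

```php
<?php

// config/packages/onetomany_pdfpack.php (suggested file name)
use Symfony\Component\DependencyInjection\Loader\Configurator\ContainerConfigurator;

return static function (ContainerConfigurator $container): void {
    // Only swap in the mock client when the kernel runs in the "test" environment
    if ('test' === $container->env()) {
        $container->extension('onetomany_pdfpack', [
            'client' => 'mock',
        ]);
    }
};
```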
Creating your own client
Don't want to use Poppler? No problem! Create your own extractor class that implements the OneToMany\PdfPack\Contract\Client\ClientInterface interface, tag the service with onetomany.pdfpack.client, and give the tag a key that you then select with the client option.
```php
<?php

namespace App\PdfPack\Client\Magick;

use OneToMany\PdfPack\Contract\Client\ClientInterface;
use OneToMany\PdfPack\Contract\Request\ExtractRequest;
use OneToMany\PdfPack\Contract\Request\ReadRequest;
use OneToMany\PdfPack\Contract\Response\ReadResponse;

class MagickClient implements ClientInterface
{
    public function read(ReadRequest $request): ReadResponse
    {
        // Add your implementation here
    }

    public function extract(ExtractRequest $request): \Generator
    {
        // Add your implementation here
    }
}
```
```yaml
onetomany_pdfpack:
    client: 'magick'

services:
    App\PdfPack\Client\Magick\MagickClient:
        tags:
            - { name: onetomany.pdfpack.client, key: magick }
```
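The same client selection and service tag can be written as a PHP configuration file. This is a sketch that assumes the MagickClient class shown above; the file name is only a suggestion, and the tag name and key attribute are copied from the YAML.

```php
<?php

// config/packages/onetomany_pdfpack.php (suggested file name)
use App\PdfPack\Client\Magick\MagickClient;
use Symfony\Component\DependencyInjection\Loader\Configurator\ContainerConfigurator;

return static function (ContainerConfigurator $container): void {
    // Select the custom client by the key used in its service tag
    $container->extension('onetomany_pdfpack', [
        'client' => 'magick',
    ]);

    // Register the client and tag it so the bundle can find it under the "magick" key
    $container->services()
        ->set(MagickClient::class)
        ->tag('onetomany.pdfpack.client', ['key' => 'magick']);
};
```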
That's it! Again, without changing any code, Symfony will automatically inject the extractor client selected by the client option into the action interfaces outlined above.
Credits
License
The MIT License