1tomany / pdf-to-image-bundle
Symfony bundle for the 1tomany/pdf-ai library
Type: symfony-bundle
Requires
- php: >=8.2
- 1tomany/pdf-ai: ^0.5.1
- symfony/dependency-injection: ^7.2
- symfony/http-kernel: ^7.2
Requires (Dev)
- friendsofphp/php-cs-fixer: ^3.87.2
- phpstan/phpstan: ^2.1.23
README
PDFAI is a simple PHP library that makes extracting data from PDFs for large language models easy.
Install PDFAI
composer require 1tomany/pdf-ai-bundle
Usage
Symfony will autowire the necessary classes after the bundle is installed. Type a constructor argument with OneToMany\PDFAI\Contract\Action\ExtractDataActionInterface or OneToMany\PDFAI\Contract\Action\ReadMetadataActionInterface to interact with the concrete extractor client through the act() method.
<?php

namespace App\File\Action\Handler;

use OneToMany\PDFAI\Contract\Action\ExtractDataActionInterface;
use OneToMany\PDFAI\Contract\Action\ReadMetadataActionInterface;
use OneToMany\PDFAI\Request\ExtractDataRequest;
use OneToMany\PDFAI\Request\ExtractTextRequest;
use OneToMany\PDFAI\Request\ReadMetadataRequest;

// NOTE: OutputType is used below but this example does not import it.
// Add the matching `use` statement for the enum from the pdf-ai library.

final readonly class UploadFileHandler
{
    public function __construct(
        private ReadMetadataActionInterface $readMetadataAction,
        private ExtractDataActionInterface $extractDataAction,
    ) {
    }

    public function handle(string $filePath): void
    {
        // Read PDF metadata like page count
        $metadata = $this->readMetadataAction->act(
            new ReadMetadataRequest($filePath)
        );

        // Rasterize all pages of a PDF to a 150 DPI PNG
        $request = new ExtractDataRequest(
            $filePath, // Full path to PDF file
            1, // First page to extract
            null, // Last page to extract, NULL for all pages
            OutputType::Png, // Jpg and Txt are other options
            150, // Output resolution in dots per inch
        );

        // @see OneToMany\PDFAI\Contract\Response\ExtractedDataResponseInterface
        foreach ($this->extractDataAction->act($request) as $image) {
            // $image->getData() or $image->toDataUri()
        }

        // Extract text from pages 2 through 8
        $request = new ExtractTextRequest($filePath, 2, 8);

        // @see OneToMany\PDFAI\Contract\Response\ExtractedDataResponseInterface
        foreach ($this->extractDataAction->act($request) as $text) {
            // $text->getData() or $text->toDataUri()
        }
    }
}
Testing
If you wish to avoid interacting with an external process in your test environment, you can use the MockExtractorClient by setting the 1tomany.pdfai_extractor_client parameter to mock in your Symfony service configuration for the test environment.
when@test:
    parameters:
        1tomany.pdfai_extractor_client: 'mock'
Without changing any other code, Symfony will automatically inject the MockExtractorClient instead of the default PopplerExtractorClient for your tests.
Extending
Don't want to use Poppler? No problem! Create your own extractor class that implements the OneToMany\PDFAI\Contract\Client\ExtractorClientInterface interface and tag it accordingly.
<?php

namespace App\File\Service\PDFAI\Client\Magick;

use OneToMany\PDFAI\Contract\Client\ExtractorClientInterface;
use OneToMany\PDFAI\Contract\Request\ExtractDataRequestInterface;
use OneToMany\PDFAI\Contract\Request\ReadMetadataRequestInterface;
use OneToMany\PDFAI\Contract\Response\MetadataResponseInterface;

class MagickExtractorClient implements ExtractorClientInterface
{
    public function readMetadata(ReadMetadataRequestInterface $request): MetadataResponseInterface
    {
        // Add your implementation here
    }

    public function extractData(ExtractDataRequestInterface $request): \Generator
    {
        // Add your implementation here
    }
}
parameters:
    1tomany.pdfai_extractor_client: 'magick'

services:
    App\File\Service\PDFAI\Client\Magick\MagickExtractorClient:
        tags:
            - { name: 1tomany.pdfai_extractor_client, key: magick }
That's it! Again, without changing any code, Symfony will automatically inject the correct extractor client for the action interfaces outlined above.
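A custom client's extractData() method returns a \Generator, so a minimal sketch of what that can look like may help. The example below is self-contained and uses stand-in types on purpose: PageRequest, PageResult, and FakeExtractorClient are hypothetical names that are not part of 1tomany/pdf-ai, and the fixed page count is an assumption made only so the script runs without the library. A real client would implement ExtractorClientInterface and work with the library's own request and response objects instead.

```php
<?php

declare(strict_types=1);

// Hypothetical stand-in for the library's extract request object.
final readonly class PageRequest
{
    public function __construct(
        public string $filePath,
        public int $firstPage = 1,
        public ?int $lastPage = null, // NULL means "through the last page"
    ) {
    }
}

// Hypothetical stand-in for an extracted-data response.
final readonly class PageResult
{
    public function __construct(
        public int $page,
        public string $data,
    ) {
    }
}

// Hypothetical client; it does not read any file and only simulates pages.
final class FakeExtractorClient
{
    // Assumed fixed page count; a real client would read it from the PDF.
    public function __construct(private int $pageCount = 3)
    {
    }

    /** @return \Generator<int, PageResult> */
    public function extractData(PageRequest $request): \Generator
    {
        // Clamp the requested last page to the number of pages available.
        $lastPage = min($request->lastPage ?? $this->pageCount, $this->pageCount);

        // Yield one page at a time so the caller never holds every page in memory.
        for ($page = $request->firstPage; $page <= $lastPage; ++$page) {
            yield new PageResult($page, sprintf('contents of page %d', $page));
        }
    }
}

$client = new FakeExtractorClient();

// Iterate pages 2 through the end of the simulated three-page document.
foreach ($client->extractData(new PageRequest('/tmp/example.pdf', 2)) as $result) {
    echo $result->page, ': ', $result->data, PHP_EOL;
}
```

Because the method yields instead of returning an array, each page is produced only when the consuming foreach loop asks for it, which keeps memory use flat for large documents.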
Run Static Analysis
./vendor/bin/phpstan
Credits
License
The MIT License