survos/scraper-bundle

Scrape and cache web pages

Type: symfony-bundle

1.5.464 2024-11-29 15:36 UTC

This package is auto-updated.

Last update: 2025-01-09 00:39:58 UTC


README

A Symfony bundle that provides a disk-based cache for scraped web pages.

It also lets you fetch a URL directly from a Twig template. While this is not good practice in production, it can speed up prototyping and demos.

Eventually this will be a real cache adapter, but for the moment simply fetching web pages to local storage is sufficient.
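To make the idea of fetching web pages to local storage concrete, here is a minimal sketch of a disk-backed fetch in plain PHP. It is not the bundle's implementation: the function name cachedFetch, the cache directory, the .cache file suffix and the injectable $fetcher callable are all assumptions made for illustration.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: return the body for $url, reading from disk when possible.
 * $fetcher is any callable that takes a URL and returns the response body as a string.
 */
function cachedFetch(string $url, string $cacheDir, callable $fetcher): string
{
    // Create the cache directory on first use.
    if (!is_dir($cacheDir) && !mkdir($cacheDir, 0775, true) && !is_dir($cacheDir)) {
        throw new RuntimeException("Cannot create cache directory: $cacheDir");
    }

    // One file per URL; hashing the URL keeps the filename filesystem-safe.
    $path = $cacheDir . '/' . hash('sha256', $url) . '.cache';

    if (is_file($path)) {
        $cached = file_get_contents($path);
        if ($cached !== false) {
            return $cached; // cache hit: the fetcher is not called
        }
    }

    // Cache miss: fetch once, then store the body for later requests.
    $body = $fetcher($url);
    file_put_contents($path, $body, LOCK_EX);

    return $body;
}
```

A caller could pass something like `fn (string $url): string => (string) file_get_contents($url)` as the fetcher. Keeping the fetcher injectable means the caching logic can be exercised without network access, and the second call for the same URL never reaches the fetcher.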

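After installing the bundle, call request_data() from a Twig template or inject ScraperService into a controller or service, as the examples below show.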

Installation

composer req survos/scraper-bundle

If you're not using Flex, enable the bundle by adding the class to bundles.php

<?php
// config/bundles.php

return [
    //...
    Survos\Bundle\SurvosScraperBundle::class => ['all' => true],
    //...
];

Working Demo

Copy and paste the following into a terminal to see it in action.

symfony new --webapp scraper-bundle-demo && cd scraper-bundle-demo
composer req survos/scraper-bundle
symfony console make:controller AppController
sed -i "s|/app|/|" src/Controller/AppController.php 

cat <<'EOF' > templates/app/index.html.twig
{% extends 'base.html.twig' %}
{% block body %}
    {% set url = 'https://jsonplaceholder.typicode.com/users' %}
    {% set users = request_data(url) %}
    <ul>
        {% for row in users %}
            <li>{{ row.name }} / {{ row.website }}</li>
        {% endfor %}
    </ul>
{% endblock %}
EOF
symfony server:start -d
symfony open:local

When you refresh the page, it will use the cached data and be much faster. To see the fetch in the debug toolbar, clear the cache and reload.

bin/console cache:pool:clear --all
symfony open:local

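To use it in a service or controller, inject the ScraperService.

    public function index(ScraperService $scraper): Response
    {
        // Fetch the URL; repeat requests are served from the cache.
        $data = $scraper->fetchData('https://jsonplaceholder.typicode.com/albums', asData: 'object');

        // Pass the result to a template (the 'albums' variable name is illustrative).
        return $this->render('app/index.html.twig', ['albums' => $data]);
    }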