sukohi / search-bot
Laravel package to crawl websites.
Requires
- fabpot/goutte: ^3.2
- laravel/framework: ~5.0
- sukohi/laravel-absolute-url: 1.*
This package is not auto-updated.
Last update: 2025-10-26 03:00:38 UTC
README
Laravel package to crawl websites (Laravel 5+).
Requirements
- Laravel 5+
Installation
Run the following command:
composer require sukohi/search-bot:1.*
Register the service providers in config/app.php:
'providers' => [
    ...Others...,
    Sukohi\SearchBot\SearchBotServiceProvider::class,
    Sukohi\LaravelAbsoluteUrl\LaravelAbsoluteUrlServiceProvider::class, 
]
Also register the aliases:
'aliases' => [
    ...Others...,
    'LaravelAbsoluteUrl' => Sukohi\LaravelAbsoluteUrl\Facades\LaravelAbsoluteUrl::class,
    'SearchBot' => Sukohi\SearchBot\Facades\SearchBot::class,
]
Then run the following commands:
php artisan vendor:publish
php artisan migrate
Now you have config/search_bot.php, in which you can set domain restrictions.
Config
return [
    'main' => '*',
    'yahoo' => ['yahoo.com', 'www.yahoo.com'],
    'reddit' => ['www.reddit.com']
];
- Each key is a type (see Usage below) and its value is the list of domains that type is allowed to crawl.
- If you don't need any restriction, set '*'.
Usage
$starting_url = 'http://yahoo.com';
$options = [
    'type' => 'main',       // Optional (default: 'main')
    'url_deletion' => true  // Optional (default: true)
];
$result = \SearchBot::request($starting_url, $options);
if($result->exists()) {
    // Symfony\Component\BrowserKit\Response
    // See http://api.symfony.com/2.3/Symfony/Component/BrowserKit/Response.html
    $response = $result->response();
    // Symfony\Component\DomCrawler\Crawler
    // See http://api.symfony.com/2.3/Symfony/Component/DomCrawler/Crawler.html
    $crawler = $result->crawler();
    $result->links(function($url, $text){
        // Every link on the page (URL and link text) arrives here.
    });
    $result->queues(function($crawler_queue, $url, $text){
        // Links that do not yet exist in the DB arrive here.
        // $crawler_queue already has its type and url set.
        $crawler_queue->save();
    });
} else {
    $e = $result->exception();
    echo $e->getMessage();
    $type = $result->type();
    $url = $result->url();
}
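Since $result->crawler() returns a Symfony DomCrawler instance, its standard API can be used to extract data from the fetched page. A minimal sketch (the selectors and output below are illustrative, not part of this package):

```php
$crawler = $result->crawler();

// Text of the page's <title> element
$title = $crawler->filter('title')->text();

// Iterate over every anchor on the page
$crawler->filter('a')->each(function ($node) {
    echo $node->attr('href') . ' : ' . $node->text() . PHP_EOL;
});
```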
Options
- type: A string that you can choose freely. Default is main.
- url_deletion: If true, the accessed URL will be removed from the DB. Default is true.
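For example, to keep visited URLs in the DB while grouping them under a custom type, both options can be set explicitly. A hedged sketch based on the options above (the URL and type name are illustrative):

```php
$result = \SearchBot::request('http://example.com', [
    'type' => 'news',         // any string you choose; groups URLs in the DB
    'url_deletion' => false,  // visited URLs stay in the DB
]);
```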
License
This package is licensed under the MIT License.
Copyright 2017 Sukohi Kuhoh