piedweb / crawler
Web crawler to check a few SEO basics.
pkg:composer/piedweb/crawler
Requires
- php: >=8.3
- league/csv: ^9.8
- piedweb/curl: *
- piedweb/extractor: *
- piedweb/text-analyzer: *
- symfony/console: ^6.4|^7
- voku/stringy: dev-php84
This package is auto-updated.
Last update: 2025-10-07 08:05:01 UTC
README
CLI SEO Pocket Crawler
Web crawler to check a few SEO basics.
Use the collected data in your favorite spreadsheet software, or read it from your favorite programming language.
French documentation is available at https://piedweb.com/seo/crawler
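Because the collected data can be read from any language, here is a minimal PHP sketch that loads a CSV export (such as the data.csv produced by a crawl) into associative arrays keyed by the header row. The helper name `readCrawlCsv`, the comma separator and the column names are assumptions for illustration only; check your own export for its actual columns and separator.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical helper: reads a CSV export into a list of rows keyed by
 * the header line. The separator and column names are assumptions.
 *
 * @return list<array<string, string>>
 */
function readCrawlCsv(string $path, string $separator = ','): array
{
    $handle = fopen($path, 'r');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path");
    }

    $rows = [];
    // The first line is assumed to hold the column names.
    $header = fgetcsv($handle, null, $separator, '"', '\\');
    if ($header === false) {
        fclose($handle);
        return $rows;
    }

    while (($row = fgetcsv($handle, null, $separator, '"', '\\')) !== false) {
        // Skip blank lines (returned as [null]) and rows with a wrong column count.
        if ($row === [null] || count($row) !== count($header)) {
            continue;
        }
        $rows[] = array_combine($header, $row);
    }

    fclose($handle);
    return $rows;
}
```

The package itself depends on league/csv; plain `fgetcsv` is used here only to keep the sketch free of dependencies.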
Install
Via Packagist
$ composer create-project piedweb/crawler
Usage
Crawler CLI
$ bin/console crawler:go $start
Arguments:
  start                            Where the crawl starts, e.g. https://piedweb.com
                                   You can also pass the id of a previous crawl; the other options are then ignored.
                                   Use `last` to resume the last crawl after it was stopped.
Options:
  -l, --limit=LIMIT                Depth limit of the crawl. [default: 5]
  -i, --ignore=IGNORE              Virtual robots.txt to respect (a string or a URL).
  -u, --user-agent=USER-AGENT      User-agent used during the crawl. [default: "SEO Pocket Crawler - PiedWeb.com/seo/crawler"]
  -w, --wait=WAIT                  Time to wait between two requests, in microseconds (100000 µs = 0.1 s). [default: 100000]
  -c, --cache-method=CACHE-METHOD  Cache method. [default: 2]
  -r, --restart=RESTART            Restart a previous crawl: 1 = fresh restart, 2 = restart from cache.
  -h, --help                       Display this help message
  -q, --quiet                      Do not output any message
  -V, --version                    Display this application version
      --ansi                       Force ANSI output
      --no-ansi                    Disable ANSI output
  -n, --no-interaction             Do not ask any interactive question
  -v|vv|vvv, --verbose             Increase the verbosity of messages: 1 for normal output, 2 for more verbose output and 3 for debug
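To illustrate what `--limit` and `--wait` bound, here is a minimal sketch of a breadth-first crawl with a depth limit and a pause between requests. It is not the package's implementation: `crawl` and the `$fetchLinks` callback (which returns the links found on a page) are hypothetical, and whether the real crawler counts the start URL as depth 0 or 1 is an assumption (0 here).

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical breadth-first crawl with a depth limit and a wait between
 * requests. $fetchLinks returns the links found on a given URL.
 *
 * @param callable(string): list<string> $fetchLinks
 * @return array<string, int> URL => depth at which it was first reached
 */
function crawl(string $start, callable $fetchLinks, int $limit = 5, int $waitMicroseconds = 100000): array
{
    $depths = [$start => 0];
    $queue = [$start];

    while ($queue !== []) {
        $url = array_shift($queue);
        $depth = $depths[$url];
        // Pages at the depth limit are recorded, but their links are not followed.
        if ($depth >= $limit) {
            continue;
        }
        foreach ($fetchLinks($url) as $link) {
            // Each URL is queued only once, at the shallowest depth found.
            if (!isset($depths[$link])) {
                $depths[$link] = $depth + 1;
                $queue[] = $link;
            }
        }
        usleep($waitMicroseconds); // 100000 µs = 0.1 s
    }

    return $depths;
}
```

Breadth-first order guarantees that each page is assigned the shortest click depth from the start URL, which is the usual meaning of "depth" in SEO crawls.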
Extract All External Links from a Previous Crawl in 1s
$ bin/console crawler:external $id [--host]
    --id
        id from a previous crawl
        You can use `last` to show the external links from the last crawl.
    --host -ho
        Flag to output only the hosts.
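As an illustration of the idea behind this command, the sketch below keeps the links whose host differs from the crawled site's host, or only their distinct hosts when a host-only flag is set. The function `externalLinks` is hypothetical, and treating a link as external when its host does not match exactly (case-insensitively) is an assumption; the package may, for example, treat subdomains differently.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical filter: returns links pointing to another host than the
 * crawled site, or only the distinct external hosts if $hostOnly is true.
 *
 * @param list<string> $links
 * @return list<string>
 */
function externalLinks(string $siteUrl, array $links, bool $hostOnly = false): array
{
    $siteHost = strtolower((string) parse_url($siteUrl, PHP_URL_HOST));
    $result = [];

    foreach ($links as $link) {
        $host = parse_url($link, PHP_URL_HOST);
        // Relative links (no host) and same-host links are internal.
        if (!is_string($host) || strtolower($host) === $siteHost) {
            continue;
        }
        $result[] = $hostOnly ? strtolower($host) : $link;
    }

    // Drop duplicates and reindex the list.
    return array_values(array_unique($result));
}
```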
Calculate Page Rank
Updates the data.csv generated by a previous crawl. You can then explore your website with the proof-of-concept pagerank.html
(serve it locally, e.g. with `npx http-server -c-1 --port 3000`).
$ bin/console crawler:pagerank $id
    --id
        id from a previous crawl
        You can use `last` to calculate the page rank of the last crawl.
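For readers unfamiliar with the metric, here is a textbook iterative PageRank over an internal link graph. It is a generic sketch, not the package's formula: the function `pageRank`, the damping factor of 0.85, the fixed iteration count and spreading the score of pages without outgoing links evenly across all pages are all assumptions.

```php
<?php
declare(strict_types=1);

/**
 * Hypothetical textbook PageRank. $graph maps each page to the pages it
 * links to. Returned scores sum to 1.
 *
 * @param array<string, list<string>> $graph
 * @return array<string, float>
 */
function pageRank(array $graph, float $damping = 0.85, int $iterations = 50): array
{
    // Collect every page, including link targets without their own entry.
    $pages = array_keys($graph);
    foreach ($graph as $targets) {
        foreach ($targets as $target) {
            $pages[] = $target;
        }
    }
    $pages = array_values(array_unique($pages));
    $n = count($pages);
    if ($n === 0) {
        return [];
    }

    $rank = array_fill_keys($pages, 1.0 / $n);

    for ($i = 0; $i < $iterations; $i++) {
        // Every page receives the "random jump" share first.
        $next = array_fill_keys($pages, (1.0 - $damping) / $n);
        $danglingMass = 0.0;
        foreach ($pages as $page) {
            $targets = array_values(array_unique($graph[$page] ?? []));
            if ($targets === []) {
                $danglingMass += $rank[$page];
                continue;
            }
            // A page splits its score evenly among the pages it links to.
            $share = $rank[$page] / count($targets);
            foreach ($targets as $target) {
                $next[$target] += $damping * $share;
            }
        }
        // Pages without outgoing links redistribute their score evenly.
        foreach ($pages as $page) {
            $next[$page] += $damping * $danglingMass / $n;
        }
        $rank = $next;
    }

    return $rank;
}
```

With `['/' => ['/a', '/b'], '/a' => ['/'], '/b' => ['/']]`, the home page ends up with the highest score because both other pages link to it. Redirects, canonicals and nofollow (listed in the Todo below) are not modeled here.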
Testing
$ composer test
Todo
- Better link harvesting and recording (record the link context: list, nav, sentence...)
- Transform the PoC (Page Rank Visualizer)
- More complete Page Rank calculator (taking 301 redirects, canonical, nofollow, etc. into account)
Contributing
Please see CONTRIBUTING for details.
Credits
License
The MIT License (MIT). Please see License File for more information.
