library robotstxtparser

vipnytt/robotstxtparser

Robots.txt parsing library, with full support for every directive and specification.

  • Saturday, July 21, 2018
  • by JanPetterMG
  • Repository
  • 3 Watchers
  • 10 Stars
  • 4,657 Installations
  • PHP
  • 2 Dependents
  • 1 Suggester
  • 2 Forks
  • 1 Open issue
  • 11 Versions
  • 230% growth

The README.md


Robots.txt parser

An easy-to-use, extensible robots.txt parser library with full support for literally every directive and specification on the Internet.


Use cases:

  • Permission checks
  • Fetch crawler rules
  • Sitemap discovery
  • Host preference
  • Dynamic URL parameter discovery
  • robots.txt rendering

Advantages

Compared to most other robots.txt libraries:

  • Automatic robots.txt download (optional); see the sketch after this list
  • Integrated caching system (optional)
  • Crawl-delay handler
  • Documentation available
  • Support for literally every single directive, from every specification
  • HTTP status code handler, according to Google's spec
  • Dedicated User-Agent parser and group-determiner library, for maximum accuracy
  • Additional data such as preferred host, dynamic URL parameters and Sitemap locations
  • Protocols supported: HTTP, HTTPS, FTP, SFTP and FTP/S
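The download-versus-parse split maps onto the two client entry points used throughout this README. A minimal sketch contrasting them, using only the constructors and methods shown in the examples below; the inline robots.txt payload is illustrative:

<?php
require __DIR__ . '/vendor/autoload.php';

// UriClient fetches http://example.com/robots.txt over the network by itself
$online = new vipnytt\RobotsTxtParser\UriClient('http://example.com');

// TxtClient parses robots.txt contents you have already retrieved
$offline = new vipnytt\RobotsTxtParser\TxtClient(
    'http://example.com',
    200,
    "User-agent: *\nDisallow: /admin"
);

// Both expose the same query API
var_dump($offline->userAgent('MyBot')->isAllowed('http://example.com/somepage.html')); // bool(true)
var_dump($offline->userAgent('MyBot')->isDisallowed('http://example.com/admin'));      // bool(true)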

Requirements:

Installation

The recommended way to install the robots.txt parser is through Composer. Add this to your composer.json file:

{
  "require": {
    "vipnytt/robotstxtparser": "^2.1"
  }
}

Then run: composer update

Getting started

Basic usage example

<?php
require __DIR__ . '/vendor/autoload.php';

// UriClient fetches and parses http://example.com/robots.txt automatically
$client = new vipnytt\RobotsTxtParser\UriClient('http://example.com');

if ($client->userAgent('MyBot')->isAllowed('http://example.com/somepage.html')) {
    // Access is granted
}
if ($client->userAgent('MyBot')->isDisallowed('http://example.com/admin')) {
    // Access is denied
}

A small excerpt of basic methods

<?php
// Syntax: $baseUri, [$statusCode:int|null], [$robotsTxtContent:string], [$encoding:string], [$byteLimit:int|null]
// $robotsTxtContent is the raw robots.txt string you have already downloaded
$client = new vipnytt\RobotsTxtParser\TxtClient('http://example.com', 200, $robotsTxtContent);

// Permission checks
$allowed = $client->userAgent('MyBot')->isAllowed('http://example.com/somepage.html'); // bool
$denied = $client->userAgent('MyBot')->isDisallowed('http://example.com/admin'); // bool

// Crawl delay rules
$crawlDelay = $client->userAgent('MyBot')->crawlDelay()->getValue(); // float | int

// Dynamic URL parameters
$cleanParam = $client->cleanParam()->export(); // array

// Preferred host
$host = $client->host()->export(); // string | null
$host = $client->host()->getWithUriFallback(); // string
$host = $client->host()->isPreferred(); // bool

// XML Sitemap locations
$sitemaps = $client->sitemap()->export(); // array

The above is just a taste of the basics; a whole range of more advanced and/or specialized methods is available for almost any purpose. Visit the cheat sheet for the technical details.

Visit the Documentation for more information.
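As a closing example, here is a sketch of a polite fetch loop built only from the methods above. It is illustrative, not canonical: the user-agent string and URL list are placeholders, it assumes the userAgent() handle can be stored and reused, and error handling is omitted.

<?php
require __DIR__ . '/vendor/autoload.php';

$client = new vipnytt\RobotsTxtParser\UriClient('http://example.com');
$userAgent = $client->userAgent('MyBot');
$delay = $userAgent->crawlDelay()->getValue(); // float | int, in seconds

foreach (['http://example.com/', 'http://example.com/somepage.html'] as $uri) {
    if ($userAgent->isDisallowed($uri)) {
        continue; // skip anything the robots.txt rules deny
    }
    // fetch $uri with the HTTP client of your choice here
    usleep((int)($delay * 1000000)); // honor the Crawl-delay between requests
}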

Directives

Specifications

The Versions

Eleven versions have been released. All share the MIT license, the source repository at https://github.com/VIPnytt/RobotsTxtParser, and the authors Jan-Petter Gundersen and VIP nytt AS.

  Version          Released
  dev-master       21/07 2018
  v2.0.1           21/07 2018
  v2.0.0           15/02 2018
  v2.0.0-rc.3      15/02 2018
  v2.0.0-rc.2      23/08 2016
  v2.0.0-rc.1      10/08 2016
  v2.0.0-beta.2    02/08 2016
  v2.0.0-beta.1    11/07 2016
  v2.0.0-alpha.1   19/06 2016
  v1.0.1           24/04 2016
  v1.0.0           24/04 2016

Versions through v2.0.0-beta.2 were described as a robots.txt parser "according to Google, Yandex, W3C and The Web Robots Pages specifications"; from v2.0.0-rc.1 onward the description reads "Robots.txt parsing library, with full support for every directive and specification."

Keywords: parser, google, yandex, w3c, spider, robots, robot, robots.txt, the web robots pages, web-crawler, sitemaps.org, robot exclusion standard, sean conner, martijn koster, robots exclusion protocol, rep