library useragentparser

User-Agent parser for robot rule sets

vipnytt/useragentparser

  • Sunday, December 17, 2017
  • by JanPetterMG
  • Repository
  • 1 Watchers
  • 1 Stars
  • 15,184 Installations
  • PHP
  • 3 Dependents
  • 0 Suggesters
  • 1 Forks
  • 0 Open issues
  • 9 Versions
  • 39 % Grown

The README.md

User-Agent parser for robot rule sets

Parser and group determiner optimized for robots.txt, X-Robots-Tag and Robots meta tag use cases.

Requirements:

  • PHP 5.5+, 7.0+ or 8.0+

Installation

The library is available via Composer. Add this to your composer.json file:

{
    "require": {
        "vipnytt/useragentparser": "^1.0"
    }
}

Then run composer update (or php composer.phar update if Composer is installed as a local PHAR).

Features

  • Stripping of the version tag.
  • List any rule groups the User-Agent belongs to.
  • Determine the correct group of records by finding the group with the most specific User-agent that still matches.

When to use it?

  • When parsing robots.txt rule sets for web robots (crawlers).
  • When parsing the X-Robots-Tag HTTP header.
  • When parsing Robots meta tags in HTML / XHTML documents.
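
In the robots.txt case, the candidate group names have to be collected from the file before a match can be chosen. The following is a minimal sketch of that step in plain PHP; listRobotsTxtUserAgents() is a hypothetical helper written for illustration and is not part of this library.

```php
<?php
// Hypothetical helper (not part of the library): collect the User-agent
// names declared in a robots.txt body, lower-cased and de-duplicated.
function listRobotsTxtUserAgents($robotsTxt)
{
    $names = [];
    foreach (preg_split('/\R/', $robotsTxt) as $line) {
        // Drop trailing comments, then look for a "User-agent: name" record.
        $line = trim(preg_replace('/#.*/', '', $line));
        if (preg_match('/^user-agent\s*:\s*(.+)$/i', $line, $m)) {
            $names[] = strtolower(trim($m[1]));
        }
    }
    return array_values(array_unique($names));
}

$robotsTxt = "User-agent: Googlebot\nDisallow: /private\n\nUser-agent: *\nDisallow: /tmp\n";
$names = listRobotsTxtUserAgents($robotsTxt); // ['googlebot', '*']
```

A list like this is the kind of array that could then be passed to getMostSpecific().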

Note: Full User-agent strings, such as those sent by web browsers, are not supported; this is by design. The supported format is UserAgentName/version, where the version tag is optional, e.g. MyWebCrawler/2.0 or just MyWebCrawler.
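
The supported format can be illustrated with a short sketch that splits a string on the first slash. splitUserAgent() is a hypothetical helper, not the library's API; the library's own getProduct() and getVersion() may normalise their results differently.

```php
<?php
// Hypothetical helper (not part of the library): split "Name/version"
// into a product name and an optional version tag.
function splitUserAgent($userAgent)
{
    // Limit of 2 keeps any further slashes inside the version part.
    $parts = explode('/', trim($userAgent), 2);
    return [
        'product' => $parts[0],
        'version' => isset($parts[1]) ? $parts[1] : null,
    ];
}

$withVersion = splitUserAgent('MyWebCrawler/2.0'); // product 'MyWebCrawler', version '2.0'
$withoutVersion = splitUserAgent('MyWebCrawler');  // product 'MyWebCrawler', version null
```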

Getting Started

Strip the version tag.

use vipnytt\UserAgentParser;

$parser = new UserAgentParser('googlebot/2.1');
$product = $parser->getProduct(); // googlebot

List different groups the User-agent belongs to

use vipnytt\UserAgentParser;

$parser = new UserAgentParser('googlebot-news/2.1');
$userAgents = $parser->getUserAgents();

array(
    'googlebot-news/2.1',
    'googlebot-news/2',
    'googlebot-news',
    'googlebotnews',
    'googlebot'
);

Determine the correct group

Determine the correct group of records by finding the group with the most specific User-agent that still matches your rule sets.

use vipnytt\UserAgentParser;

$parser = new UserAgentParser('googlebot-news');
$match = $parser->getMostSpecific(['googlebot/2.1', 'googlebot-images', 'googlebot']); // googlebot
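
To make the idea of "most specific match" concrete, here is a rough sketch of a longest-prefix comparison in plain PHP. pickMostSpecific() is a hypothetical function written for this illustration; it is a simplification and should not be read as the library's actual algorithm, which also derives version and name variants.

```php
<?php
// Hypothetical illustration of longest-prefix matching; not the library's implementation.
function pickMostSpecific($userAgent, array $groups)
{
    $needle = strtolower($userAgent);
    $best = null;
    foreach ($groups as $group) {
        $candidate = strtolower($group);
        // A group matches when the user agent starts with the group name.
        if ($candidate !== '' && strpos($needle, $candidate) === 0) {
            // Prefer the longest (most specific) matching group.
            if ($best === null || strlen($candidate) > strlen($best)) {
                $best = $group;
            }
        }
    }
    return $best; // null when no group matches
}

$match = pickMostSpecific('googlebot-news', ['googlebot/2.1', 'googlebot-images', 'googlebot']); // 'googlebot'
```

A plain prefix test is naive: it would, for instance, treat 'googlebot/2.1' as matching 'googlebot/2.10', so a real implementation needs to compare version components rather than raw characters.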

Cheat sheet

$parser = new UserAgentParser('MyCustomCrawler/1.2');

// Determine the correct rule set (robots.txt / robots meta tag / x-robots-tag)
$parser->getMostSpecific($array); // string

// Parse
$parser->getUserAgent(); // string 'MyCustomCrawler/1.2'
$parser->getProduct(); // string 'MyCustomCrawler'
$parser->getVersion(); // string '1.2'

// Crunch the data into groups, from most to least specific
$parser->getUserAgents(); // array
$parser->getProducts(); // array
$parser->getVersions(); // array

Specifications

The Versions

17/12 2017

dev-master

9999999-dev https://github.com/VIPnytt/UserAgentParser

User-Agent parser for robot rule sets

MIT

The Requires

  • php ^5.5 || ^7.0

 

The Development Requires

by Jan-Petter Gundersen
by VIP nytt AS
by VIP nytt

crawler useragent spider user-agent robot robots.txt x-robots-tag robots exclusion protocol rep robots meta tag

17/12 2017

v1.0.4

1.0.4.0 https://github.com/VIPnytt/UserAgentParser

User-Agent parser for robot rule sets

MIT

The Requires

  • php ^5.5 || ^7.0

 

The Development Requires

by Jan-Petter Gundersen
by VIP nytt AS

crawler useragent spider user-agent robot robots.txt x-robots-tag robots exclusion protocol rep robots meta tag

10/08 2016

v1.0.3

1.0.3.0 https://github.com/VIPnytt/UserAgentParser

User-Agent parser for robot rule sets

MIT

The Requires

  • php >=5.5.0

 

The Development Requires

by Jan-Petter Gundersen
by VIP nytt

crawler useragent spider user-agent robot robots.txt x-robots-tag robots exclusion protocol rep robots meta tag

04/08 2016

v1.0.2

1.0.2.0 https://github.com/VIPnytt/UserAgentParser

User-Agent parser for robot rule sets

MIT

The Requires

  • php >=5.5.0

 

The Development Requires

by Jan-Petter Gundersen
by VIP nytt

crawler useragent spider user-agent robot robots.txt x-robots-tag robots exclusion protocol rep robots meta tag

31/07 2016

v1.0.1

1.0.1.0 https://github.com/VIPnytt/UserAgentParser

User-Agent parser for robot rule sets

MIT

The Requires

  • php >=5.5.0

 

The Development Requires

by Jan-Petter Gundersen
by VIP nytt

crawler useragent spider user-agent robot robots.txt x-robots-tag robots exclusion protocol rep robots meta tag

15/07 2016

v1.0.0

1.0.0.0 https://github.com/VIPnytt/UserAgentParser

User-Agent parser for robot rule sets

MIT

The Requires

  • php >=5.6.0

 

The Development Requires

by Jan-Petter Gundersen
by VIP nytt

crawler useragent spider user-agent robot robots.txt x-robots-tag robots meta tag

28/04 2016

v0.2.1

0.2.1.0 https://github.com/VIPnytt/UserAgentParser

User-Agent string parser

MIT

The Requires

  • php >=5.6.0
  • ext-mbstring *

 

The Development Requires

by Jan-Petter Gundersen
by VIP nytt

crawler useragent spider user-agent robot robots.txt x-robots-tag robots-meta-tag useragent-string

18/04 2016

v0.2.0

0.2.0.0 https://github.com/VIPnytt/UserAgentParser

User-Agent string parser class

MIT

The Requires

  • php >=5.4.0
  • ext-mbstring *

 

The Development Requires

by Jan-Petter Gundersen
by VIP nytt

useragent spider user-agent robot robots.txt x-robots-tag web-crawler robots-meta-tag useragent-string

08/04 2016

v0.1.0

0.1.0.0 https://github.com/VIPnytt/UserAgentParser

User-Agent string parser class

MIT

The Requires

  • php >=5.4.0

 

The Development Requires

by Jan-Petter Gundersen
by VIP nytt

useragent spider user-agent robot robots.txt x-robots-tag web-crawler robots-meta-tag useragent-string