library php-dom-parser

A simple tool to turn DOM into Algolia search friendly record objects.

algolia/php-dom-parser

  • Thursday, March 16, 2017
  • by RayRutjes
  • Repository
  • 43 Watchers
  • 13 Stars
  • 141 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 2 Forks
  • 1 Open issues
  • 11 Versions
  • 60 % Grown

The README.md

A simple tool to turn DOM into Algolia search friendly record objects.

It was built with WordPress article indexing in mind, but the tool is now abstract enough to be reused in other types of projects.

For now, the parsed DOM yields the minimum possible number of records: a node that has at least one child never gets a record of its own. If such behaviour is needed, it could easily be added.
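The "leaf only" rule described above can be illustrated with PHP's built-in DOM extension. This is a minimal sketch of the idea, not the library's implementation: the function name collectLeafTexts is hypothetical, "child" is assumed to mean a child element, and the real parser builds richer records than plain strings.

```php
<?php

/**
 * Collects the trimmed text of leaf elements only. An element that has
 * at least one child element is skipped; only its descendants can
 * contribute an entry. Hypothetical helper, for illustration only.
 */
function collectLeafTexts(DOMNode $node, array &$texts = array())
{
    $hasElementChild = false;

    foreach ($node->childNodes as $child) {
        if ($child instanceof DOMElement) {
            $hasElementChild = true;
            collectLeafTexts($child, $texts);
        }
    }

    // Only elements without child elements produce an entry.
    if (!$hasElementChild && $node instanceof DOMElement) {
        $text = trim($node->textContent);
        if ($text !== '') {
            $texts[] = $text;
        }
    }

    return $texts;
}

// Collect libxml parse warnings internally instead of printing them.
libxml_use_internal_errors(true);

$dom = new DOMDocument();
$dom->loadHTML('<div><h1>Title</h1><p>First</p><p>Second</p></div>');

// The <div> has children, so only <h1> and the two <p> nodes are collected.
print_r(collectLeafTexts($dom->documentElement));
```

In this sketch the wrapping div never appears in the result, which mirrors the statement that a node with children does not get a record of its own.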

Requirements

This library requires the mbstring PHP extension to be enabled. Also make sure mbregex is NOT disabled.
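A quick way to verify both requirements on a given machine is a small check script. This is a sketch: the function name checkRequirements is hypothetical, and it relies on the fact that mb_ereg() is only defined when mbstring was built with multibyte regex support.

```php
<?php

/**
 * Reports whether the current PHP runtime meets the requirements above.
 * Hypothetical helper, not part of the library.
 *
 * @return array Map of requirement name => bool.
 */
function checkRequirements()
{
    return array(
        // The mbstring extension itself.
        'mbstring' => extension_loaded('mbstring'),
        // mb_ereg() is missing when mbregex has been disabled.
        'mbregex'  => function_exists('mb_ereg'),
    );
}

foreach (checkRequirements() as $name => $ok) {
    echo $name . ': ' . ($ok ? 'ok' : 'MISSING') . PHP_EOL;
}
```

Run it with the same PHP binary that will run the parser, since CLI and web server configurations can differ.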

Installation

$ composer require algolia/php-dom-parser

Examples

Simple usage

Here is a simple example that fetches the content of an article from Algolia's blog and parses it to obtain the records.

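// Load Composer's autoloader (path assumes this script sits next to vendor/).
require __DIR__ . '/vendor/autoload.php';

$article = file_get_contents('https://blog.algolia.com/how-we-re-invented-our-office-space-in-paris/');
if ($article === false) {
    exit('Could not fetch the article.');
}

$parser = new \Algolia\DOMParser();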

// Exclude content by CSS selectors.
$parser->setExcludeSelectors(array(
    'pre',
    '.entry-meta',
    'div.rp4wp-related-posts'
));

// Only parse what is inside the given CSS selector.
// If there are multiple nodes matching, they will all be parsed.
$parser->setRootSelector('article.post');

// Map each record attribute name to the CSS selector(s) that feed it.
$parser->setAttributeSelectors(
    array(
        'title1'  => 'h1',
        'title2'  => 'h2',
        'title3'  => 'h3',
        'title4'  => 'h4',
        'title5'  => 'h5',
        'title6'  => 'h6',
        'content' => 'p, ul, ol, dl, table',
    )
);

// Add some attributes that will be part of every record.
$parser->setSharedAttributes(array(
    'url'    => 'http://www.example.com',
    'visits' => 1933,
));

// Turn the DOM into Algolia search friendly records.
$records = $parser->parse($article);

var_dump($records);

You will find some example scripts and their outputs in the examples folder, so you don't have to run anything to give feedback.

Little CLI

dynamic.php is a small CLI script for dynamically fetching the DOM of a given URL. You can optionally pass a root selector as the second argument.

This script was mainly built for testing in the early stages, and we have no intention of developing it further for now.

$ php examples/dynamic.php https://blog.algolia.com/inside-the-algolia-engine-part-2-the-indexing-challenge-of-instant-search/ article.post

Dev

Run the tests:

vendor/bin/phpunit

Contributing

Please do contribute:

  • if you have an idea, a question or just want to say hi: create an issue
  • if you have a bug fix: create a PR

Please ensure the tests pass, and run php-cs-fixer so that the code style remains consistent.

The Versions

Every version listed below is MIT licensed, requires php >=5.3, lists no development requirements, is authored by Raymond Rutjes, and points to https://github.com/algolia/php-dom-parser. The normalized version is shown in parentheses.

  • 16/03 2017: dev-master (9999999-dev)
  • 16/03 2017: 0.5.0 (0.5.0.0)
  • 16/03 2017: dev-chore/bump-0.5.0 (dev-chore/bump-0.5.0)
  • 30/01 2017: 0.4.0 (0.4.0.0)
  • 30/01 2017: dev-bump-0.4.0 (dev-bump-0.4.0)
  • 22/11 2016: 0.3.0 (0.3.0.0)
  • 29/09 2016: 0.2.0 (0.2.0.0)
  • 24/05 2016: 0.1.0 (0.1.0.0)
  • 22/05 2016: 0.0.3 (0.0.3.0)
  • 22/05 2016: 0.0.2 (0.0.2.0)
  • 21/05 2016: 0.0.1 (0.0.1.0)