silverstripe-module staticsiteconnector

An external-content connector that retrieves content by scraping a public website.

phptek/staticsiteconnector

  • Tuesday, February 28, 2017
  • by phptek
  • Repository
  • 1 Watchers
  • 1 Stars
  • 6 Installations
  • PHP
  • 0 Dependents
  • 0 Suggesters
  • 12 Forks
  • 1 Open issues
  • 5 Versions
  • 0 % Grown

The README.md

WARNING: Project Has Been Archived! Please use phptek/silverstripe-exodus which works with Silverstripe 4.

SilverStripe Static Site Connector

Introduction

This module extracts content from another website by crawling it and parsing its DOM structure. It transforms that content directly into native SilverStripe objects, then imports those objects into SilverStripe's database as though they had been created via the CMS.

Although this approach cannot extract any information or structure that isn't represented in the site's markup, it requires no special access to, or reliance on, particular back-end systems. This makes the module suited to legacy and experimental site imports, as well as connections to websites generated by obscure CMSs.

How it works

Importing a site is a two- or three-step process, depending on the options the user selects:

  1. Crawl
  2. Import
  3. Rewrite Links (Automatic, if selected in step 2.)
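The link-rewriting step is not described further here, so the following Python sketch only illustrates the general idea: once pages and assets have been imported, references to legacy URLs inside imported content are swapped for the locations of the new objects. The `rewrite_links` function, the `URL_MAP` dictionary, and the target paths are hypothetical and are not taken from the module, which is written in PHP.

```python
import re


def rewrite_links(html, url_map):
    """Replace href/src values found in url_map; leave all others alone."""
    pattern = re.compile(r'(?P<attr>\b(?:href|src))="(?P<url>[^"]*)"')

    def swap(match):
        new_url = url_map.get(match.group("url"))
        if new_url is None:
            # Unknown URL (e.g. an external link): keep the original text.
            return match.group(0)
        return '{}="{}"'.format(match.group("attr"), new_url)

    return pattern.sub(swap, html)


# Hypothetical mapping from legacy URLs to the links of imported objects.
URL_MAP = {
    "/old/about.html": "/about-us/",
    "/img/logo.png": "/assets/Import/logo.png",
}

content = (
    '<a href="/old/about.html">About</a> '
    '<img src="/img/logo.png"> '
    '<a href="/x">X</a>'
)
rewritten = rewrite_links(content, URL_MAP)
print(rewritten)
```

Links that have no entry in the mapping are left untouched, so a partial import does not corrupt references to pages that were never crawled.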

A list of URLs is fetched and extracted from the site via PHPCrawl and cached in a text file under the assets directory.
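As a rough illustration of the crawl step (not the module's PHPCrawl-based code), the Python sketch below walks a small in-memory site, follows same-host links, and writes the discovered URLs to a text cache. The `SITE` dictionary, the `crawl` and `write_cache` functions, and the cache file name are hypothetical stand-ins; a real crawl would fetch each page over HTTP.

```python
import os
import tempfile
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical stand-in for a live site: URL -> HTML body.
SITE = {
    "http://example.org/": '<a href="/about">About</a> <img src="/logo.png">',
    "http://example.org/about": '<a href="/">Home</a> <a href="http://other.test/">Ext</a>',
    "http://example.org/logo.png": "",
}


class LinkParser(HTMLParser):
    """Collects href/src attribute values from a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("href", "src") and value:
                self.links.append(value)


def crawl(start_url):
    """Breadth-first walk of SITE, restricted to the start URL's host."""
    host = urlparse(start_url).netloc
    seen, queue = [], [start_url]
    while queue:
        url = queue.pop(0)
        if url in seen or urlparse(url).netloc != host:
            continue
        seen.append(url)
        parser = LinkParser()
        parser.feed(SITE.get(url, ""))
        # Resolve relative links against the page they were found on.
        queue.extend(urljoin(url, link) for link in parser.links)
    return seen


def write_cache(urls, path):
    """Stores one URL per line, mirroring the idea of a text-file cache."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write("\n".join(urls) + "\n")


cache_path = os.path.join(tempfile.mkdtemp(), "urls.txt")
urls = crawl("http://example.org/")
write_cache(urls, cache_path)
print(urls)
```

Keeping the URL list in a plain text file means the later import step can run separately from the crawl and can be repeated without fetching the link structure again.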

Each cached URL corresponds to a page or an asset (CSS, image, PDF, etc.) that the module will attempt to import as a native SilverStripe object, e.g. SiteTree or File.
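One way to picture the page-versus-asset decision is the Python sketch below. It assumes the choice is made from a MIME type guessed from the URL's file extension, with extension-less URLs treated as HTML pages; that rule and the `target_class` helper are illustrative assumptions, not the module's documented behaviour.

```python
import mimetypes
from urllib.parse import urlparse


def target_class(url):
    """Guess whether a cached URL should become a page or a file.

    Assumption: URLs with no recognisable extension, or an HTML one,
    are pages; everything else is treated as a file asset.
    """
    path = urlparse(url).path
    mime, _ = mimetypes.guess_type(path)
    if mime is None or mime == "text/html":
        return "SiteTree"
    return "File"


for cached_url in (
    "http://example.org/about",
    "http://example.org/index.html",
    "http://example.org/logo.png",
    "http://example.org/docs/report.pdf",
):
    print(cached_url, "->", target_class(cached_url))
```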

Content is imported page by page using cURL, and the desired DOM elements are extracted with phpQuery using configurable CSS selectors.
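To make the selector-driven extraction concrete, here is a deliberately small Python sketch using only the standard library. It supports just `tag`, `#id`, `.class`, `tag.class` and `tag#id` selectors and returns plain text, whereas phpQuery handles full CSS selectors and markup. The `RULES` mapping, the field names, and the `extract` function are hypothetical, and the parser assumes well-formed, properly closed markup.

```python
from html.parser import HTMLParser

VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


def matches(selector, tag, attrs):
    """Supports only 'tag', '#id', '.class', 'tag.class' and 'tag#id'."""
    for marker, attr in (("#", "id"), (".", "class")):
        if marker in selector:
            wanted_tag, _, wanted_value = selector.partition(marker)
            values = (attrs.get(attr) or "").split()
            return wanted_value in values and wanted_tag in ("", tag)
    return selector == tag


class SelectorExtractor(HTMLParser):
    """Collects the text of the first element matching a selector."""

    def __init__(self, selector):
        super().__init__()
        self.selector = selector
        self.depth = 0  # > 0 while inside the matched element
        self.done = False
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS or self.done:
            return
        if self.depth:
            self.depth += 1
        elif matches(self.selector, tag, dict(attrs)):
            self.depth = 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self.depth:
            return
        self.depth -= 1
        if self.depth == 0:
            self.done = True

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract(html, selector):
    """Return the whitespace-normalised text of the first match."""
    parser = SelectorExtractor(selector)
    parser.feed(html)
    return " ".join("".join(parser.chunks).split())


PAGE = """
<html><head><title>Old site</title></head>
<body>
  <h1 class="page-title">About us</h1>
  <div id="content"><p>We make <b>things</b>.</p><br></div>
  <div id="footer">Copyright</div>
</body></html>
"""

# Hypothetical field-to-selector rules for one page type.
RULES = {"Title": "h1.page-title", "Content": "#content"}
fields = {name: extract(PAGE, selector) for name, selector in RULES.items()}
print(fields)
```

Because the selectors live in configuration rather than code, the same import logic can be pointed at differently structured sites by changing only the rules.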

Migration

See the included migration documentation for detailed instructions on migrating a legacy site into SilverStripe using the module.

Installation

This module requires the PHP Semaphore functions to work. These are installed by default on Debian and some OS X PHP distributions, but if you're using MacPorts you'll need to add the +ipc flag when installing php5.

If compiling PHP from source, you need to pass three additional flags to PHP's configure script:

./configure <usual flags> '--enable-sysvsem' '--enable-sysvshm' '--enable-sysvmsg'

Once that's done, you can use Composer to add the module to your SilverStripe project:

#> composer require phptek/staticsiteconnector

Please see the included Migration document, which describes exactly how to configure the tool to perform a site scrape or migration.

An example database dump (MySQL/MariaDB only) is also provided, which you can import into your database to get up and running quickly.

License

This code is available under the BSD license, with the exception of the PHPCrawl library bundled with this module, which is licensed under GPL version 2.

Authors

The Versions

28/02 2017

dev-master

9999999-dev

BSD-3-Clause

The Requires

 

by Mike Parkhill

html silverstripe scraper external-content staticsiteconnector

13/06 2014

dev-issue/MOSS-bugs

dev-issue/MOSS-bugs

BSD-3-Clause

The Requires

 

by Mike Parkhill

html silverstripe scraper external-content staticsiteconnector

14/05 2014

1.0-rc1

1.0.0.0-RC1

BSD-3-Clause

The Requires

 

by Mike Parkhill

html silverstripe scraper external-content staticsiteconnector

04/05 2014

dev-composer-test

dev-composer-test

BSD-3-Clause

The Requires

 

by Mike Parkhill

html silverstripe scraper external-content staticsiteconnector

04/05 2014

dev-refactor-transformer

dev-refactor-transformer

BSD-3-Clause

The Requires

 

by Mike Parkhill

html silverstripe scraper external-content staticsiteconnector