Phug Reader
What is Phug Reader?
The Reader class is a small utility that can parse and scan strings for specific entities.
It's mostly based on regular expressions, but it also brings tools to scan strings and expressions of any kind (e.g. string escaping, bracket counting, etc.).
The string passed to the reader is swallowed byte by byte through consume-mechanisms.
When the string is empty, the parsing is (usually) done.
This class is made specifically for lexical analysis and expression validation.
Installation
Install via Composer:
composer require phug/reader
Usage
Basics
The process of reading with the Phug\Reader involves peeking and consuming.
You peek, check whether it's what you wanted and, if it is, you consume.
read-methods on the Reader will peek and consume automatically until they find what you searched for.
match-methods work like peek, but work with regular expressions.
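The peek, consume, read and match cycle described above can be sketched in a few lines of plain PHP. The TinyReader class below is a hypothetical stand-in written only for this illustration; it is not Phug\Reader, and its method signatures are assumptions chosen to mirror the vocabulary used here.

```php
<?php
// Hypothetical stand-in for illustration only; this is NOT Phug\Reader.
final class TinyReader
{
    private string $input;
    private ?string $lastPeek = null;

    public function __construct(string $input)
    {
        $this->input = $input;
    }

    // Look at the next $length bytes without removing them.
    public function peek(int $length = 1): ?string
    {
        $this->lastPeek = $this->input === '' ? null : substr($this->input, 0, $length);

        return $this->lastPeek;
    }

    // Like peek, but the lookahead is described by a regular expression
    // anchored at the start of the remaining input.
    public function match(string $pattern): ?string
    {
        $found = preg_match('/\G(?:' . $pattern . ')/', $this->input, $m) === 1 && $m[0] !== '';
        $this->lastPeek = $found ? $m[0] : null;

        return $this->lastPeek;
    }

    // Remove the last peeked or matched text from the front of the input.
    public function consume(): void
    {
        if ($this->lastPeek !== null) {
            $this->input = substr($this->input, strlen($this->lastPeek));
            $this->lastPeek = null;
        }
    }

    // read = match + consume in one step.
    public function read(string $pattern): ?string
    {
        $text = $this->match($pattern);
        $this->consume();

        return $text;
    }

    // True while there is still input left to parse.
    public function hasLength(): bool
    {
        return $this->input !== '';
    }
}

$reader = new TinyReader('abc = 1');
$parts = [];
$parts[] = $reader->read('[a-z]+');   // "abc"
$reader->read('\s+');                 // skip spaces
if ($reader->peek() === '=') {        // peek, check ...
    $reader->consume();               // ... then consume
    $parts[] = '=';
}
$reader->read('\s+');
$parts[] = $reader->read('\d+');      // "1"

echo implode(' | ', $parts), "\n";    // abc | = | 1
var_dump($reader->hasLength());       // bool(false)
```

Once hasLength() returns false, the whole input has been swallowed, which matches the "empty string means done" rule.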
Let's create a small example code to parse:
$code = 'someString = "some string"';
Now we create a reader for that code:
use Phug\Reader;

$reader = new Reader($code);
If you want a fixed encoding, use the second parameter, $encoding.
Now we can start reading.
First we want to read our identifier. We can do that easily with readIdentifier(), which returns the identifier it found, or null if none was encountered.
It stops at anything that is not an identifier character (the space after the identifier, in this case):
$identifier = $reader->readIdentifier();
if ($identifier === null) {
    throw new Exception("Failed to read: Identifier expected");
}
var_dump($identifier); //`someString`
To get to our = character directly, we can just skip all spaces we encounter.
This also allows for any spacing you want (e.g. you can indent the above with tabs if you like):
$reader->readSpaces();
If we need the spaces, we can always capture the returned result.
If no spaces are encountered, it just returns null.
Now we want to parse the assignment operator (=), or rather, validate that it's there:
if (!$reader->peekChar('=')) {
    throw new Exception("Failed to read: Assignment expected");
}
//Consume the result, since we're `peek`ing, not `read`ing.
$reader->consume();
Skip spaces again:
$reader->readSpaces();
and read the string.
If no quote character (" or ') is encountered, it will return null.
Otherwise, it will return the (already parsed) string, without quotes.
Notice that you have to check for null explicitly, since we could also have an empty string (""), which evaluates to false in PHP:
$string = $reader->readString();
if ($string === null) {
    throw new Exception("Failed to read: Expected string");
}
var_dump($string); //`some string`
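The reason for the strict null check can be seen without the Reader at all. The sketch below uses a hypothetical helper, readQuotedOrNull, written only for this illustration (it is not part of Phug\Reader), to show how a loose check treats an empty string the same way as a missing one.

```php
<?php
// Hypothetical helper for illustration: returns the string content,
// or null when the input does not start with a double-quoted string.
function readQuotedOrNull(string $input): ?string
{
    if (preg_match('/^"([^"]*)"/', $input, $m) !== 1) {
        return null;
    }

    return $m[1];
}

$empty   = readQuotedOrNull('""');   // '' (a valid, empty string)
$missing = readQuotedOrNull('abc');  // null (no string at all)

var_dump(!$empty);            // bool(true)  -> a loose check would reject ""
var_dump(!$missing);          // bool(true)
var_dump($empty === null);    // bool(false) -> the strict check keeps ""
var_dump($missing === null);  // bool(true)
```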
The quote style encountered is escaped by default, so you can scan "some \" string" correctly.
If you want to add other escaping, use the first parameter of readString.
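To make the escaping rule concrete, here is a minimal sketch of escape-aware string scanning in plain PHP. The readQuoted function and its $escapeChar parameter are assumptions made for this illustration; they do not describe how readString is implemented or what its parameter accepts.

```php
<?php
// Hypothetical helper for illustration; not part of Phug\Reader.
// Reads a quoted string from the start of $input and returns its content
// with escaped quote characters resolved. Returns null if $input does not
// start with a quote or the string is never closed.
function readQuoted(string $input, string $escapeChar = '\\'): ?string
{
    if ($input === '' || ($input[0] !== '"' && $input[0] !== "'")) {
        return null;
    }

    $quote = $input[0];
    $result = '';
    $length = strlen($input);

    for ($i = 1; $i < $length; $i++) {
        $char = $input[$i];

        // An escaped quote becomes part of the content instead of ending it.
        if ($char === $escapeChar && $i + 1 < $length && $input[$i + 1] === $quote) {
            $result .= $quote;
            $i++;
            continue;
        }

        // An unescaped quote of the opening style closes the string.
        if ($char === $quote) {
            return $result;
        }

        $result .= $char;
    }

    return null; // ran out of input before the closing quote
}

echo readQuoted('"some \" string" rest'), "\n"; // some " string
var_dump(readQuoted('no quotes'));              // NULL
```

Only the quote style that opened the string is treated as escapable here, which mirrors the default behaviour described above.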
Now you have parsed all the parts needed for your actual action:
echo "Set `$identifier` to `$string`"; //Set `someString` to `some string`
Everything was validated along the way.
This was just a small example; Phug Reader is made for parsing in loops.
Build a small tokenizer
use Phug\Reader;

//Some C-style example code
$code = 'someVar = {a, "this is a string (really, it \"is\")", func(b, c), d}';

$reader = new Reader($code);

$tokens = [];
$blockLevel = 0;
$expressionLevel = 0;
while ($reader->hasLength()) {

    //Skip spaces of any kind.
    $reader->readSpaces();

    //Scan for identifiers
    if ($identifier = $reader->readIdentifier()) {
        $tokens[] = ['type' => 'identifier', 'name' => $identifier];
        continue;
    }

    //Scan for Assignments
    if ($reader->peekChar('=')) {
        $reader->consume();
        $tokens[] = ['type' => 'assignment'];
        continue;
    }

    //Scan for strings
    if (($string = $reader->readString()) !== null) {
        $tokens[] = ['type' => 'string', 'value' => $string];
        continue;
    }

    //Scan block start
    if ($reader->peekChar('{')) {
        $reader->consume();
        $blockLevel++;
        $tokens[] = ['type' => 'blockStart'];
        continue;
    }

    //Scan block end
    if ($reader->peekChar('}')) {
        $reader->consume();
        $blockLevel--;
        $tokens[] = ['type' => 'blockEnd'];
        continue;
    }

    //Scan parenthesis start
    if ($reader->peekChar('(')) {
        $reader->consume();
        $expressionLevel++;
        $tokens[] = ['type' => 'listStart'];
        continue;
    }

    //Scan parenthesis end
    if ($reader->peekChar(')')) {
        $reader->consume();
        $expressionLevel--;
        $tokens[] = ['type' => 'listEnd'];
        continue;
    }

    //Scan comma
    if ($reader->peekChar(',')) {
        $reader->consume();
        $tokens[] = ['type' => 'next'];
        continue;
    }

    //Nothing matched: report the next characters of the remaining input
    throw new \Exception(
        "Unexpected ".$reader->peek(10)
    );
}

//Every opened bracket must have been closed again
if ($blockLevel || $expressionLevel) {
    throw new \Exception("Unclosed bracket encountered");
}

var_dump($tokens);
/* Output:
[
    ['type' => 'identifier', 'name' => 'someVar'],
    ['type' => 'assignment'],
    ['type' => 'blockStart'],
    ['type' => 'identifier', 'name' => 'a'],
    ['type' => 'next'],
    ['type' => 'string', 'value' => 'this is a string (really, it "is")'],
    ['type' => 'next'],
    ['type' => 'identifier', 'name' => 'func'],
    ['type' => 'listStart'],
    ['type' => 'identifier', 'name' => 'b'],
    ['type' => 'next'],
    ['type' => 'identifier', 'name' => 'c'],
    ['type' => 'listEnd'],
    ['type' => 'next'],
    ['type' => 'identifier', 'name' => 'd'],
    ['type' => 'blockEnd']
]
*/
Keep expressions intact
Sometimes you want to keep expressions intact, e.g. when you allow the inclusion of third-party code that needs to be parsed separately.
The Reader brings a bracket-counting utility that does exactly that.
Let's take Jade as an example:
a(href=getUri('/abc', true), title=(title ? title : 'Sorry, no title.'))
To parse this, let's do the following:
use Phug\Reader;

//The Jade line from above ($jade is just an illustrative variable name)
$jade = "a(href=getUri('/abc', true), title=(title ? title : 'Sorry, no title.'))";
$reader = new Reader($jade);

//Scan Identifier ("a")
$identifier = $reader->readIdentifier();

$attributes = [];
//Enter an attribute block if available
if ($reader->peekChar('(')) {
    $reader->consume();

    while ($reader->hasLength()) {

        //Ignore spaces
        $reader->readSpaces();

        //Scan the attribute name
        if (!($name = $reader->readIdentifier())) {
            throw new \Exception("Attributes need a name!");
        }

        //Ignore spaces
        $reader->readSpaces();

        //Make sure there's a =-character
        if (!$reader->peekChar('=')) {
            throw new \Exception("Failed to read: Expected attribute value");
        }
        $reader->consume();

        //Ignore spaces
        $reader->readSpaces();

        //Read the expression until , or ) is encountered
        //It will ignore , and ) inside any kind of brackets and count brackets correctly until we actually
        //reached the end-bracket
        $value = $reader->readExpression([',', ')']);

        //Add the attribute to our attribute array
        $attributes[$name] = $value;

        //If we don't encounter a , to go on, we break the loop
        if (!$reader->peekChar(',')) {
            break;
        }

        //Else we consume the , and continue our attribute parsing
        $reader->consume();
    }

    //Now make sure we actually closed our attribute block correctly.
    if (!$reader->peekChar(')')) {
        throw new \Exception("Failed to read: Expected closing bracket");
    }
}

$element = ['identifier' => $identifier, 'attributes' => $attributes];
var_dump($element);
/* Output:
[
    'identifier' => 'a',
    'attributes' => [
        'href' => 'getUri(\'/abc\', true)',
        'title' => '(title ? title : \'Sorry, no title.\')'
    ]
]
*/
You now have a parser for (really, really basic) Jade elements!
It can handle as many attributes as you like, with any value you can think of, without ever breaking the list, regardless of contained commas and brackets.
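The idea behind bracket counting can be sketched independently of the library. The readBalanced function below is a hypothetical helper written for this illustration; it is not Phug\Reader's readExpression and makes no claim about how that method works internally. It stops at the first break character that sits outside all brackets and quoted strings.

```php
<?php
// Hypothetical helper for illustration; NOT Phug\Reader::readExpression.
// Returns the text from the start of $input up to (not including) the first
// break character found outside of all brackets and quoted strings.
function readBalanced(string $input, array $breaks): string
{
    $pairs = ['(' => ')', '[' => ']', '{' => '}'];
    $stack = [];    // closing brackets we still expect, innermost last
    $quote = null;  // currently open quote character, if any
    $length = strlen($input);

    for ($i = 0; $i < $length; $i++) {
        $char = $input[$i];

        // Inside a quoted string nothing counts except the closing quote.
        if ($quote !== null) {
            if ($char === '\\') {
                $i++;            // skip the escaped character
            } elseif ($char === $quote) {
                $quote = null;   // string closed
            }
            continue;
        }

        if ($char === '"' || $char === "'") {
            $quote = $char;
            continue;
        }

        // A break character only ends the expression at bracket depth zero.
        if (!$stack && in_array($char, $breaks, true)) {
            return substr($input, 0, $i);
        }

        if (isset($pairs[$char])) {
            $stack[] = $pairs[$char];   // opening bracket: expect its partner
        } elseif ($stack && $char === end($stack)) {
            array_pop($stack);          // matching closing bracket
        }
    }

    return $input; // no break character found at depth zero
}

$rest = "getUri('/abc', true), title=(title ? title : 'Sorry, no title.'))";
echo readBalanced($rest, [',', ')']), "\n"; // getUri('/abc', true)
```

The comma inside the quoted '/abc', the comma between the function arguments and the closing bracket of getUri(...) are all skipped, because each of them appears either inside a string or at a bracket depth greater than zero.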
Digging deeper, the Phug Reader is actually able to lex source code and text of any kind.