PHP-Yacc
This is a port of kmyacc into PHP. It is a parser-generator, meaning it takes a YACC grammar file and generates a parser file.
A Direct Port (For Now)
Right now, this is a direct port, meaning that it works exactly like kmyacc. As the examples show, this means that you must supply a "parser template" in addition to the grammar.
Longer term, we want to add simplifying functionality. We will always support providing a template, but we will also offer a series of default templates for common use-cases.
What can I do with this?
You can parse most structured and unstructured grammars. There are some gotchas to LALR(1) parsers that you need to be aware of (for example, Shift/Reduce conflicts and Reduce/Reduce conflicts), but those are beyond this simple intro.
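As a small illustration of the Shift/Reduce conflicts mentioned above, here is a minimal sketch in standard YACC syntax. It assumes PHP-Yacc accepts the usual `%token` and `%left` declarations the way kmyacc does. The `NUMBER` token name is made up for this example, and semantic actions are left out because their form depends on the parser template.

```yacc
/* Hypothetical token name, for illustration only. */
%token NUMBER

/*
 * Precedence declarations, lowest precedence first.
 * Without these two lines the grammar below is ambiguous:
 * after reading "expr '+' expr" with '*' as the next token,
 * the parser could either shift '*' or reduce the addition,
 * which a YACC-style generator reports as a Shift/Reduce conflict.
 */
%left '+'
%left '*'

%%

expr:
      expr '+' expr
    | expr '*' expr
    | NUMBER
    ;
```

With the `%left` lines in place, `1 + 2 * 3` groups as `1 + (2 * 3)` and `1 + 2 + 3` groups as `(1 + 2) + 3`.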
How does it work?
I don't know. I just ported the code until it worked correctly.
YACC Grammar
That's way beyond the scope of this documentation, but check out the YACC page for some info.
Over time we will document the grammar more...
How do I use it?
For now, check out the examples folder. The CLI tool will change from its current state, so if you use it today, please provide feedback and use-cases so that we can better design the tooling support.
Why did you do this?
Many projects have the need for parsers (and therefore parser-generators). Nikita's PHP-Parser is one tool that uses kmyacc to generate its parser. There are many other projects out there that either use hand-written parsers, or use kmyacc or another parser-generator.
Unfortunately, not many parser-generators exist for PHP, and those that do exist I have found to be rigid or not powerful enough to parse PHP itself.
This project aims to resolve that.
There are a TON of performance optimizations possible here. The original code was a direct port, so some structures are definitely sub-optimal. Over time we will improve the performance.
However, this will always be at least a slightly slow process. Generating a parser requires a lot of resources, so it should never happen inside of a web request.
Using the generated parser, however, should be quite fast (the generated parser is fairly well optimized already).
What's left to do?
A bunch of things. Here's the wishlist:
- Refactor to make conventions consistent (some parts currently use camelCase, some parts use snake_case, etc).
- Performance tuning
- Unit test as much as possible
- Document as much as possible (It's a complicated series of algorithms with no source documentation in either project).
- Redesign the CLI binary and how it operates
- Decide whether multi-language support is worthwhile, or if we should just move to only PHP codegen support.
- Add default templates and parser implementations
  - At least one of which generates an "AST" by default, similar to Ruby's Treetop library
- Build a reasonably performant lexer-generator (very likely as a separate project)
- A lot of debugging (though we don't know of any bugs, they are there)
- Building out features we didn't need for the initial go (for example, support for %union, etc).
And a lot more.
Contributing