Did you ever feel the need to learn lex and yacc? I did.
I recently found a Python module for parsing grammars: pyparsing. In contrast to traditional parser-generating approaches, this framework doesn’t require you to learn a specific toolchain. It also doesn’t generate any code. It’s a class library: you construct your grammar by connecting objects.
For very basic grammars, the resulting code looks very similar to the BNF. Thanks to Python’s operator overloading, it’s possible to compose parse nodes (non-terminals) using operators like + (concatenation), ^ (or, where the longest match wins) and | (match-first). Here’s what it looks like:
    from pyparsing import *
    IntLiteral = Regex('[\\+\\-]?\\d+').setParseAction(lambda s,l,t: int(t[0]))
    VariableName = Regex('\\w+')
    EqualSign = Regex('\\s*=\\s*').suppress()
    WS = White().suppress()
    KeyValue = Group(VariableName + EqualSign + IntLiteral)
Strings can now be parsed by calling parseString() on the grammar:
    self.assertEqual([['foo', 234]], KeyValue.parseString('foo=234').asList())
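To make the three operators mentioned above more concrete, here is a minimal sketch. It assumes pyparsing is installed and uses the camelCase method names from this post (newer pyparsing releases also offer snake_case equivalents). The expression names short, long_, pair, first_match and longest_match are made up for illustration.

```python
from pyparsing import Literal

# Two alternatives that both match at the start of "ab": one short, one long.
short = Literal("a")
long_ = Literal("ab")

# + builds a sequence: "a" followed by "b".
pair = Literal("a") + Literal("b")

# | builds a match-first choice: alternatives are tried left to right,
# and the first one that succeeds wins.
first_match = short | long_

# ^ builds an "or" choice: all alternatives are tried,
# and the longest match wins.
longest_match = short ^ long_

print(pair.parseString("ab").asList())           # ['a', 'b']
print(first_match.parseString("ab").asList())    # ['a']
print(longest_match.parseString("ab").asList())  # ['ab']
```

With |, the order of the alternatives matters: putting the shorter literal first hides the longer one. With ^, the order doesn't matter, at the cost of trying every alternative.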
For my requirements, this is a very usable approach to parsing. It may not be as fast as a generated parser in C, but it’s easy to learn and takes way less time to write.
Posted by Dominik on April 18, 2008 at 12:32 pm
“arbitrary grammars”? Certainly not! More like “arbitrary context-free grammars” or “arbitrary unambiguous context-free grammars”, eh? Still, cool software.
Posted by guenther on April 18, 2008 at 3:53 pm
Oh thanks, you’re right about that. Corrected.
Posted by Paul McGuire on April 21, 2008 at 10:20 am
Guenther -
Welcome to pyparsing! I’m glad you like the intuitive way to combine elements into more complex expressions.
Not sure whether this goes with or against your RE experience, but try to define your grammar just in terms of the non-whitespace characters. For example, you can parse “foo=42” or “foo = 42” with the same grammar Word(alphas) + '=' + Word(nums): pyparsing skips over whitespace by default. No need for that distracting '\s*' clutter!
If you have questions, post them on the Wiki home page Discussion tab, or on the pyparsing mailing list.
– Paul
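A minimal sketch of the suggestion in the comment above, assuming pyparsing is installed. The names key_value, integer and grouped are made up for illustration, and the second variant (suppressing the equal sign and converting to int) is an adaptation of the grammar from the post, not something the comment spells out.

```python
from pyparsing import Group, Suppress, Word, alphas, nums

# The grammar from the comment: no whitespace handling in the expressions.
# The bare string "=" is turned into a literal by operator overloading.
key_value = Word(alphas) + "=" + Word(nums)

# Both spellings parse to the same tokens, because whitespace
# between tokens is skipped by default.
print(key_value.parseString("foo=42").asList())    # ['foo', '=', '42']
print(key_value.parseString("foo = 42").asList())  # ['foo', '=', '42']

# Variant closer to the post's KeyValue: drop the "=" from the
# results and convert the digits to an int.
integer = Word(nums).setParseAction(lambda t: int(t[0]))
grouped = Group(Word(alphas) + Suppress("=") + integer)

print(grouped.parseString("foo = 42").asList())    # [['foo', 42]]
```

Note that Word(alphas) only accepts letters and Word(nums) only unsigned digits, so this sketch is narrower than the Regex-based grammar in the post, which also accepts signed numbers and \w+ names.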
Posted by My Software Development Blog » Blog Archive » Parser combinators on May 18, 2008 at 10:20 am
[...] especially as it gives a name to the parser construction technique used in the Pyparsing framework, which I wrote about a month ago. It’s called a “Parser [...]