Parsing in Python

2008 April 18
by guenthernoack

Did you ever feel the need to learn lex and yacc? I did.

I recently found a Python module for parsing grammars: pyparsing. In contrast to traditional parser-generator approaches, this framework doesn’t require you to learn a specific toolchain. It also doesn’t generate any code. It’s a class library: you construct your grammar by connecting objects.

When building very basic grammars, the result looks very similar to BNF. Thanks to Python’s operator overloading, it’s possible to compose parse nodes (non-terminals) using operators like + (concatenation), ^ (or: the longest matching alternative wins) and | (match-first: the first matching alternative wins). Here’s what it looks like:

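from pyparsing import Group, Regex, White

# Optionally signed integer; the parse action converts the matched text to int
IntLiteral = Regex(r'[+-]?\d+').setParseAction(lambda s, l, t: int(t[0]))
VariableName = Regex(r'\w+')
# The equal sign and any whitespace around it are dropped from the result
EqualSign = Regex(r'\s*=\s*').suppress()
WS = White().suppress()  # not used in this small example

KeyValue = Group(VariableName + EqualSign + IntLiteral)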

Strings can now be parsed by calling parseString() on the grammar:

self.assertEqual([['foo', 234]],
           KeyValue.parseString('foo=234').asList())
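
The difference between | and ^ is easiest to see when one alternative is a prefix of the other. Here is a minimal sketch; the names short and long_ are made up for illustration and are not part of the grammar above:

```python
from pyparsing import Literal

# Two alternatives where the first is a prefix of the second
# (hypothetical names, chosen only for this illustration).
short = Literal("foo")
long_ = Literal("foobar")

# | builds a MatchFirst: alternatives are tried left to right,
# and the first one that matches wins.
first_match = (short | long_).parseString("foobar").asList()

# ^ builds an Or: every alternative is tried, and the longest match wins.
longest_match = (short ^ long_).parseString("foobar").asList()

print(first_match)    # ['foo']
print(longest_match)  # ['foobar']
```

With |, the order of the alternatives matters, so the more specific alternative usually goes first.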

For my requirements, this is a very usable approach to parsing. It may not be as fast as a generated parser in C, but it’s easy to learn and takes way less time to write.
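
As suggested in the comments below, the same kind of grammar can probably be written without any explicit whitespace handling, because pyparsing skips whitespace between tokens by default. A rough sketch, with some assumptions of mine: the names name, value and key_value are made up, Word(nums) only accepts unsigned integers (unlike the IntLiteral above), and Suppress is used to keep the '=' out of the result:

```python
from pyparsing import Group, Suppress, Word, alphas, nums

# Key/value grammar built from Word tokens; whitespace between
# tokens is skipped by pyparsing's default behaviour.
name = Word(alphas)
value = Word(nums).setParseAction(lambda t: int(t[0]))  # unsigned only
key_value = Group(name + Suppress("=") + value)

compact = key_value.parseString("foo=42").asList()
spaced = key_value.parseString("foo = 42").asList()

print(compact)  # [['foo', 42]]
print(spaced)   # [['foo', 42]]
```

Both inputs give the same result, so the grammar itself only has to describe the non-whitespace tokens.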

4 Responses
  1. 2008 April 18

    “arbitrary grammars”? Certainly not! More like “arbitrary context-free grammars” or “arbitrary unambiguous context-free grammars”, eh? Still, cool software.

  2. 2008 April 18

    Oh thanks, you’re right with that. Corrected.

  3. 2008 April 21

    Guenther -

    Welcome to pyparsing! I’m glad you like the intuitive way to combine elements into more complex expressions.

    Not sure whether this goes with or against your RE experience, but try to define your grammar just in terms of the non-whitespace characters. For example, you can parse “foo=42” or “foo = 42” with the same grammar Word(alphas) + '=' + Word(nums) – pyparsing skips over whitespace by default. No need for that distracting '\s*' clutter!

    If you have questions, post them on the Wiki home page Discussion tab, or on the pyparsing mailing list.

    – Paul
