Login | Register

Nerd Paradise

We won't tell anyone you're here.
The Forum > Open Mic > Expression Parser
An expression parser. In Python. Because this hasn't been done before on this site.

So why would I reinvent the wheel? We already know that programmers are supposed to be lazy. There's a simple reason, really. I'm insane.

No, really. Ask anyone here. I'm eat flies and rave about my master crazy.



Okay...so as it turns out, there's another reason. When I was writing my raytracer in Python/OpenCL, I got tired of having to hard code scenes into the program. So I wanted a scene file parser. Problem was, I also wanted to be able to easily define where things were located logically: if things appeared in a circle, I'd need sine and cosine functions. Plus multiplication, division...all of the basic operations covered in Blake's article.

Blake's approach was one that I considered, but I also happened upon this parser tutorial. It follows an entirely different approach to parsing which proved to be far easier to extend. The reason for this is that the order of operations is controlled by a very small set of functions that rarely, if ever, need to be modified. I won't go into great detail about how the parser works as the article does a good enough job of that already.

The parser works, at least how I've written it, by building a parse tree of the input. If the parse tree itself is useful then one could easily just return the tree, but that was of no use to me. So each node in the parse tree has an eval function which defines exactly how that node operates. Of course, like Blake's parser, you aren't limited to just actually evaluating the expression. You could render the equation and return the image up the chain instead. But for my purposes, I simply stuck to mathematical evaluation. Here's some examples:

infix("-"110); prefix("-"130)
@method(symbol("-"))
def eval(self):
    # evaluate subtraction
    if self.second is None# Prefix
        return -self.first.eval()
    else# Infix
        return self.first.eval() - self.second.eval()


This code sample defines the '-' operator. infix and prefix are just helper functions to set up the "-" operator's node class. They're in the full code below. You may notice that they take numbers as a second parameter; these numbers actually define the order of operations. Again, just reference the article I linked, it's all explained there. @method is a decorator (which I didn't know was a thing until this parser) that takes the following function and adds it to that node's class. The method function is also in the code below. It's kind of scary, actually.

Then you see an eval function. This is the meat of the operator's definition. eval is the only function that you should need to ever write to define your operators or functions. The two variables that you see, self.first and self.second, are defined when the expression is parsed, using the rules set by infix and prefix. So all we have to do in eval is see how many are set. If self.second is None (null), then this was used in a prefix fashion and we just need to negate self.first.eval(). Otherwise, we just need to define subtraction: self.first.eval() - self.second.eval().

Simple, right? Makes you wonder why I typed so much.

So let's look at a more fun example: exponentiation.

@method(infix_r("**"140))
def eval(self):
    # evaluate exponentiation
    return self.first.eval() ** self.second.eval()


There's a new function here: infix_r. It works similarly to infix, only with right associativity. By that I mean that exponentiation is evaluated right to left. So 2**3**4 is equivalent to 2**(3**4). 2**3*4, however, is still evaluated as (2**3)*4.

After that, it's pretty much the same as before: define exponentiation as it is, self.first.eval() ** self.second.eval().

But Deckmaster, this is boooooringgggggg!

Okay, fine.

@method(function("sin"))
def eval(self):
    if 1 == len(self.first):
        return math.sin(self.first[0].eval())
    else:
        raise SyntaxError("The sine function requires 1 argument")


There is also a helper function called...well...function. I can take full credit for this one, at least!

It ensures that when the parse tree is being built, any given function is followed by parentheses, then a comma-separated list of arguments. Said list is put into a list in self.first. So to check the number of arguments each function receives, we just need to check len(self.first). After that, it's the same as before: return math.sin(self.first[0].eval()).

However, if the number of arguments is incorrect, it's probably good to throw an SyntaxError. The function...function... does not actually verify that the length of the argument list is correct. If you care enough you could probably modify the full code below to make that happen, I just didn't.

@method(function("atan"))
def eval(self):
    if 1 == len(self.first):
        return math.atan(self.first[0].eval())
    elif 2 == len(self.first):
        return math.atan2(self.first[0].eval(), self.first[1].eval())
    else:
        raise SyntaxError("The arc tangent function requires 1-2 arguments")


Being able to have a proper argument list makes letting the arctan function have 1 or 2 arguments pretty easy.

Lastly, I would like to remind you that the language that is defined needs not have such simple evaluation rules all the time. You could even, say...define Euler's totient function!

@method(function("phi"))
def eval(self):
    if 1 == len(self.first):
        num = self.first[0].eval()
        if isinstance(num, (int, long)):
            # Factor the number and compute phi
            phi = num
            if 0 == num % 2:
                phi = phi / 2
                num /= 2
                while 0 == num % 2:
                    num /= 2
            if 0 == num % 3:
                phi = phi / 3 * 2
                num /= 3
                while 0 == num % 3:
                    num /= 3
            i = 0
            while i <= num ** 0.5:
                i += 6
                if 0 == num % (i - 1):
                    phi = phi / (i - 1) * (i - 2)
                    num /= (i - 1)
                    while 0 == num % (i - 1):
                        num /= (i - 1)
                if 0 == num % (i + 1):
                    phi = phi / (i + 1) * i
                    num /= (i + 1)
                    while 0 == num % (i + 1):
                        num /= (i + 1)
            if 1 != num:
                phi = phi / num * (num - 1)
            return phi
        else:
            raise SyntaxError("The coprime counting function requires an integer argument")
    else:
        raise SyntaxError("The coprime counting function requires 1 argument")


Short version: I factor the given number (if it's an integer), then I use the factorization to compute the number of coprime numbers under n. Don't limit yourself to simple functions in your syntax!

There's a couple of other things that I should probably mention about my parser, though. For one thing, you can define constants, just like Blake's:

constant("pi", math.pi)
constant("e", math.e)


Pretty simple. But what good is an expression parser without the ability to store your own variables? Well, it can be a lot of good, but it's even better if you can store variables. So you can in mine, too!

The two functions to remember for that are:
set_variable(name, value)
del_variable(name)


I am confident that you can figure out what they do. If you're letting the user pick the names of what variables to set, you might not want to let them overwrite function names and the like. So you can always check exp.keywords, which is automatically populated with the keywords of your syntax when you define the syntax.

One last thing, you notice how I just used "exp.keywords"? My expression parser is a class, so you *might* want to make sure to instantiate it before you use it. (This also means that you can set variables in *one* expression parser without affecting others!)

So yeah, that's my expression parser! Well...no it isn't. This is:

class ExpressionParser:
    class symbol_base:
        id = None
        value = None
        first = second = None

        def nud(self):
            raise SyntaxError("Syntax error (%r)." % self.id)

        def led(self, left):
            raise SyntaxError("Unknown operator (%r)." % self.id)

        def eval(self):
            raise SyntaxError("Evaluation procedure unknown for (%r)." % self.id)

        def __repr__(self):
            if self.id == "(name)" or self.id == "(literal)":
                return "(%s %s)" % (self.id[1:-1], self.value)
            out = [self.id, self.first, self.second, self.third]
            out = map(str, filter(None, out))
            return "(" + " ".join(out) + ")"

    symbol_table = {}
    variables = {}
    functions = {}
    keywords = set()

    def __init__(self):
        import math
        
        me = self
        def symbol(id, bp=0):
            try:
                s = self.symbol_table[id]
            except KeyError:
                self.keywords.add(id) # Add this symbol as a keyword
                class s(ExpressionParser.symbol_base):
                    pass
                s.__name__ = "symbol-" + id # for debugging
                s.id = id
                s.value = None
                s.lbp = bp
                me.symbol_table[id] = s
            else:
                s.lbp = max(bp, s.lbp)
            return s

        # helpers

        def infix(id, bp):
            def led(self, left):
                self.first = left
                self.second = me.to_parse_tree(bp)
                return self
            s = symbol(id, bp)
            s.led = led
            return s

        def infix_r(id, bp):
            def led(self, left):
                self.first = left
                self.second = me.to_parse_tree(bp-1)
                return self
            s = symbol(id, bp)
            s.led = led
            return s

        def prefix(id, bp):
            def nud(self):
                self.first = me.to_parse_tree(bp)
                return self
            s = symbol(id)
            s.nud = nud
            return s

        def postfix(id, bp):
            def led(self, left):
                self.first = left
                return self
            s = symbol(id, bp)
            s.led = led
            return s

        def function(id):
            def nud(self):
                # Ensure that the next token is a (
                advance("(")
                self.first = []
                while me.token.id != ")":
                    self.first.append(me.to_parse_tree())
                    if "," != me.token.id:
                        break
                    advance(",")
                advance(")")
                return self
            try:
                # Use the functions table, not the symbol_table
                s = self.functions[id]
            except KeyError:
                self.keywords.add(id) # Add this function name as a keyword
                class s(ExpressionParser.symbol_base):
                    pass
                s.__name__ = "symbol-" + id # for debugging
                s.id = id
                s.value = None
                s.lbp = 0
                me.functions[id] = s
            s.nud = nud
            return s

        def constant(id, value):
            if id not in self.variables:
                self.keywords.add(id) # This constant is now a keyword
                self.variables[id] = value

        def advance(id=None):
            if id:
                if not me.token or me.token.id != id:
                    raise SyntaxError("Expected %r" % id)
            me.token = me.next()

        def method(s):
            # decorator
            assert issubclass(s, ExpressionParser.symbol_base)
            def bind(fn):
                setattr(s, fn.__name__, fn)
            return bind

        # Constants
        constant("pi", math.pi)
        constant("e", math.e)

        # Syntax

        # Addition
        infix("+"110); prefix("+"130)
        @method(symbol("+"))
        def eval(self):
            # evaluate addition
            if self.second is None# Prefix
                return self.first.eval()
            else# Infix
                return self.first.eval() + self.second.eval()

        # Subtraction and negation
        infix("-"110); prefix("-"130)
        @method(symbol("-"))
        def eval(self):
            # evaluate subtraction
            if self.second is None# Prefix
                return -self.first.eval()
            else# Infix
                return self.first.eval() - self.second.eval()

        # Multiplication
        @method(infix("*"120))
        def eval(self):
            # evaluate multiplication
            return self.first.eval() * self.second.eval()

        # Division
        @method(infix("/"120))
        def eval(self):
            # evaluate division
            a = self.first.eval()
            b = self.second.eval()
            if isinstance(a, (int, long)) and isinstance(a, (int, long)) and 0 != a % b:
                return float(a) / float(b)
            else:
                return a / b

        # Floor division
        @method(infix("//"120))
        def eval(self):
            # evaluate floor division
            return int(self.first.eval() // self.second.eval())

        # Exponentiation
        @method(infix_r("**"140))
        def eval(self):
            # evaluate exponentiation
            return self.first.eval() ** self.second.eval()

        # Factorial
        @method(postfix("!"140))
        def eval(self):
            # evaluate factorial
            val = self.first.eval()
            if isinstance(val, (int, long)) and val >= 0:
                return math.factorial(val)
            else:
                raise SyntaxError("Factorial is only defined for nonnegative integer values")

        # Absolute value function
        @method(function("abs"))
        def eval(self):
            if 1 == len(self.first):
                return abs(self.first[0].eval())
            else:
                raise SyntaxError("The absolute value function requires 1 argument")

        # Square root function
        @method(function("sqrt"))
        def eval(self):
            if 1 == len(self.first):
                return math.sqrt(self.first[0].eval())
            else:
                raise SyntaxError("The square root function requires 1 argument")

        # Logarithm (base 10) function
        @method(function("log"))
        def eval(self):
            if 1 == len(self.first):
                return math.log10(self.first[0].eval())
            else:
                raise SyntaxError("The logarithm function requires 1 argument")

        # Natural Logarithm function
        @method(function("ln"))
        def eval(self):
            if 1 == len(self.first):
                return math.log(self.first[0].eval())
            else:
                raise SyntaxError("The natural logarithm function requires 1 argument")

        # Logarithm (base 2) function
        @method(function("log2"))
        def eval(self):
            if 1 == len(self.first):
                return math.log(self.first[0].eval(), 2)
            else:
                raise SyntaxError("The logarithm (base 2) function requires 1 argument")

        # Sine function
        @method(function("sin"))
        def eval(self):
            if 1 == len(self.first):
                return math.sin(self.first[0].eval())
            else:
                raise SyntaxError("The sine function requires 1 argument")

        # Cosine function
        @method(function("cos"))
        def eval(self):
            if 1 == len(self.first):
                return math.cos(self.first[0].eval())
            else:
                raise SyntaxError("The cosine function requires 1 argument")

        # Tangent function
        @method(function("tan"))
        def eval(self):
            if 1 == len(self.first):
                return math.tan(self.first[0].eval())
            else:
                raise SyntaxError("The tangent function requires 1 argument")

        # Arc sine function
        @method(function("asin"))
        def eval(self):
            if 1 == len(self.first):
                return math.asin(self.first[0].eval())
            else:
                raise SyntaxError("The arc sine function requires 1 argument")

        # Arc sine function
        @method(function("arcsin"))
        def eval(self):
            if 1 == len(self.first):
                return math.asin(self.first[0].eval())
            else:
                raise SyntaxError("The arc sine function requires 1 argument")

        # Arc cosine function
        @method(function("acos"))
        def eval(self):
            if 1 == len(self.first):
                return math.acos(self.first[0].eval())
            else:
                raise SyntaxError("The arc cosine function requires 1 argument")

        # Arc cosine function
        @method(function("arccos"))
        def eval(self):
            if 1 == len(self.first):
                return math.acos(self.first[0].eval())
            else:
                raise SyntaxError("The arc cosine function requires 1 argument")

        # Arc tangent function
        @method(function("atan"))
        def eval(self):
            if 1 == len(self.first):
                return math.atan(self.first[0].eval())
            elif 2 == len(self.first):
                return math.atan2(self.first[0].eval(), self.first[1].eval())
            else:
                raise SyntaxError("The arc tangent function requires 1-2 arguments")

        # Arc tangent function
        @method(function("arctan"))
        def eval(self):
            if 1 == len(self.first):
                return math.atan(self.first[0].eval())
            elif 2 == len(self.first):
                return math.atan2(self.first[0].eval(), self.first[1].eval())
            else:
                raise SyntaxError("The arc tangent function requires 1-2 arguments")

        # Int function
        @method(function("int"))
        def eval(self):
            if 1 == len(self.first):
                return int(self.first[0].eval())
            else:
                raise SyntaxError("The int function requires 1 argument")

        # Floor function
        @method(function("floor"))
        def eval(self):
            if 1 == len(self.first):
                return int(self.first[0].eval()) # Int automatically floors
            else:
                raise SyntaxError("The floor function requires 1 argument")

        # Ceil function
        @method(function("ceil"))
        def eval(self):
            if 1 == len(self.first):
                return int(math.ceil(self.first[0].eval()))
            else:
                raise SyntaxError("The ceil function requires 1 argument")

        # Round function
        @method(function("round"))
        def eval(self):
            if 1 == len(self.first):
                return int(round(self.first[0].eval()))
            else:
                raise SyntaxError("The round function requires 1 argument")

        # Permutation function
        @method(function("nPr"))
        def eval(self):
            if 2 == len(self.first):
                n = self.first[0].eval()
               &
[Quote] [Link]
So, fyi, the actual post is...kind of a bit longer. Turns out there's a length limit on displaying threads? Mainly just some code got cut off, you can always hit quote if you want to see it.

(Yes, a double post. Only so that this message is actually visible, because otherwise it'd go at the end of the last post and, well...get cut off.)

Edit: You can also find the source code here.
[Quote] [Link]
The Forum > Open Mic > Expression Parser
Current Date: 14 Vigeo 5:5Current Time: 12.3.77Join us in IRC...
Server: irc.esper.net
Channel: #nerdparadise
Your IP: 54.204.94.228Browser: UnknownBrowser Version: 0