A very simple parsing library, based on the top-down algorithm.
Project description
Parser
This library aims to provide an efficient way to write simple lexers/parsers in Python, using a top-down parsing algorithm.
The code is maintained on GitHub, and the documentation is available on ReadTheDocs.
Other Python libraries provide parsing/lexing tools (see http://nedbatchelder.com/text/python-parsers.html for a few examples); tdparser's distinguishing features are:
Avoids docstring-based grammar definitions
Provides a generic parser structure, able to handle any grammar
Doesn't generate code
Lets the user decide the nature of parse results: abstract syntax tree, final expression, ...
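The technique underlying the library is top-down operator-precedence ("Pratt") parsing: each token carries a binding power, a "nud" (its value when it starts an expression) and a "led" (how it combines a left-hand side with what follows). A rough, self-contained sketch of that idea — all names here are illustrative, not tdparser's API:

```python
import re

def tokenize(text):
    """Split the input into integer and operator tokens."""
    return re.findall(r'\d+|[+*]', text)

def evaluate(tokens, pos=0, rbp=0):
    """Top-down precedence loop: consume a leaf, then greedily absorb
    operators whose binding power exceeds rbp."""
    lbp = {'+': 10, '*': 20}  # left binding powers, like Token.lbp
    left = int(tokens[pos])   # the leaf's "nud": its own value
    pos += 1
    while pos < len(tokens) and lbp[tokens[pos]] > rbp:
        op = tokens[pos]      # the operator's "led" combines both sides
        right, pos = evaluate(tokens, pos + 1, lbp[op])
        left = left + right if op == '+' else left * right
    return left, pos

value, _ = evaluate(tokenize("1 + 2 * 3"))
# value == 7: '*' binds tighter than '+'
```

tdparser packages exactly this loop behind its `Token` subclasses, as the example below shows.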
Example
Here is the definition of a simple arithmetic parser:
import re

from tdparser import Lexer, Token


class Integer(Token):
    def __init__(self, text):
        self.value = int(text)

    def nud(self, context):
        """What the token evaluates to"""
        return self.value


class Addition(Token):
    lbp = 10  # Precedence

    def led(self, left, context):
        """Compute the value of this token when between two expressions."""
        # Fetch the expression to the right, stopping at the next boundary
        # of same precedence
        right_side = context.expression(self.lbp)
        return left + right_side


class Substraction(Token):
    lbp = 10  # Same precedence as addition

    def led(self, left, context):
        return left - context.expression(self.lbp)

    def nud(self, context):
        """When a '-' is present on the left of an expression."""
        # This means that we are returning the opposite of the next expression
        return -context.expression(self.lbp)


class Multiplication(Token):
    lbp = 20  # Higher precedence than addition/substraction

    def led(self, left, context):
        return left * context.expression(self.lbp)


lexer = Lexer(with_parens=True)
lexer.register_token(Integer, re.compile(r'\d+'))
lexer.register_token(Addition, re.compile(r'\+'))
lexer.register_token(Substraction, re.compile(r'-'))
lexer.register_token(Multiplication, re.compile(r'\*'))


def parse(text):
    return lexer.parse(text)
Using it returns the expected values:
>>> parse("1+1")
2
>>> parse("1 + -2 * 3")
-5
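Nothing forces the result to be a number: since `nud` and `led` return plain Python values, returning tuples instead of integers yields an abstract syntax tree. A standalone sketch of the same precedence loop building a tree (illustrative names, not tdparser's API):

```python
import re

def tokenize(text):
    """Split the input into integer and operator tokens."""
    return re.findall(r'\d+|[+*]', text)

def parse_tree(tokens, pos=0, rbp=0):
    """Same precedence loop, but building nested tuples."""
    lbp = {'+': 10, '*': 20}
    left = ('int', int(tokens[pos]))  # "nud" now returns a leaf node
    pos += 1
    while pos < len(tokens) and lbp[tokens[pos]] > rbp:
        op = tokens[pos]              # "led" now returns a branch node
        right, pos = parse_tree(tokens, pos + 1, lbp[op])
        left = (op, left, right)
    return left, pos

tree, _ = parse_tree(tokenize("1 + 2 * 3"))
# tree == ('+', ('int', 1), ('*', ('int', 2), ('int', 3)))
```

With tdparser, the equivalent change would be to return node objects from the `nud`/`led` methods above instead of computed values.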
Adding new tokens is straightforward:
class Division(Token):
    lbp = 20  # Same precedence as Multiplication

    def led(self, left, context):
        return left // context.expression(self.lbp)


lexer.register_token(Division, re.compile(r'/'))
And using it:
>>> parse("3 + 12 / 3")
7
Let's add the power operator:
class Power(Token):
    lbp = 30  # Higher than mult

    def led(self, left, context):
        # We pick expressions with a lower precedence, so that
        # 2 ** 3 ** 2 computes as 2 ** (3 ** 2)
        return left ** context.expression(self.lbp - 1)


lexer.register_token(Power, re.compile(r'\*\*'))
And using it:
>>> parse("2 ** 3 ** 2")
512
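The `self.lbp - 1` trick is what makes `**` right-associative: by recursing with a binding power one below the operator's own, an equal-precedence `**` on the right is allowed to bind first. A minimal standalone illustration of just that trick (names are ours, not tdparser's):

```python
def power_chain(nums, pos=0, rbp=0):
    """Fold a list of exponent operands right-associatively."""
    lbp = 30                  # the '**' operator's binding power
    left = nums[pos]
    pos += 1
    while pos < len(nums) and lbp > rbp:
        # Recursing with lbp - 1 lets the next '**' (same lbp) win,
        # so the chain nests to the right.
        right, pos = power_chain(nums, pos, lbp - 1)
        left = left ** right
    return left, pos

value, _ = power_chain([2, 3, 2])
# value == 512, i.e. 2 ** (3 ** 2), not (2 ** 3) ** 2 == 64
```

Recursing with `lbp` itself (as `Addition` and `Multiplication` do) would instead stop at each operator and fold left-to-right.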