Some say that “everything boils down to ones and zeros.” But do we really understand how our programs get translated into those bits?
Compilers and interpreters both take a raw string representing a program, parse it, and make sense of it. Though interpreters are the simpler of the two, writing even a very simple interpreter (one that only does addition and multiplication) will be instructive. We’ll focus on what compilers and interpreters have in common: lexing and parsing the input.

The Dos and Don’ts of Writing Your Own Interpreter

Readers may wonder: What’s wrong with a regex? Regular expressions are powerful, but source code grammars aren’t simple enough to be parsed by them. Neither are domain-specific languages (DSLs), and a client might need a custom DSL for authorization expressions, for example. But even without applying this skill directly, writing an interpreter makes it much easier to appreciate the effort behind many programming languages, file formats, and DSLs.

Writing parsers correctly by hand can be challenging with all the edge cases involved. That’s why there are popular tools, like ANTLR, that can generate parsers for many popular programming languages. There are also libraries called parser combinators that enable developers to write parsers directly in their preferred programming languages. Examples include FastParse for Scala and Parsec for Python. We recommend that, in a professional context, readers use such tools and libraries to avoid reinventing the wheel. Still, understanding the challenges and possibilities of writing an interpreter from scratch will help developers leverage such solutions more effectively.

Overview of Interpreter Components

An interpreter is a complex program, so there are multiple stages to it: a lexer that turns the raw input into tokens, a parser that gives those tokens structure, and an evaluator that computes a result from that structure.
We won’t build a specific, integrated interpreter here. Instead, we’ll explore each of these parts and their common issues with separate examples. In the end, the user code will simply hand the interpreter an input string. Following the three stages, we’ll expect this code to tokenize and parse the input, calculate a final value, and print it.
Specifically, the code here is written in Scala 3’s optional-braces syntax (a Python-like, indentation-based syntax). But none of the approaches are Scala-specific, and Scala is similar to many other languages: Readers will find it straightforward to convert these code samples into other languages. Barring that, the examples can be run online using Scastie. Lastly, the Lexer, Parser, and Interpreter sections have different example grammars. As the corresponding GitHub repo shows, the dependencies in later examples change slightly to implement these grammars, but the overall concepts stay the same.

Interpreter Component 1: Writing a Lexer

Let’s say we want to lex an input string made up of identifiers, integer literals, keywords, and single-character symbols such as operators.
Whitespace between tokens will be skipped in this example. At this stage, expressions don’t have to make sense; the lexer simply converts the input string into a list of tokens. (The job of “making sense of tokens” is left to the parser.) We’ll use a small data structure to represent a token.
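The article’s original Scala definition is not preserved in this copy; as a minimal Python analogue (the type and field names here are assumptions, not the author’s), a token type might look like this:

```python
from dataclasses import dataclass
from enum import Enum, auto

class TokenType(Enum):
    NUMBER = auto()      # integer literals, e.g. 42
    IDENTIFIER = auto()  # names, e.g. foo
    KEYWORD = auto()     # reserved words / literals, e.g. true
    OPERATOR = auto()    # e.g. + or *
    EOF = auto()         # end-of-input marker

@dataclass(frozen=True)
class Token:
    type: TokenType      # what kind of token this is
    text: str            # the exact characters matched
    position: int        # index in the original input, useful for debugging
```

For example, the integer 42 at the start of the input would become `Token(TokenType.NUMBER, "42", 0)`.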
Every token has a type, a textual representation, and a position in the original input. The position can help end users of the lexer with debugging. The lexer’s output will be a flat list of such tokens.
Let’s examine the implementation.
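The original Scala listing is not reproduced here; the following is a rough Python sketch of the approach described below (the keyword set, token names, and tuple shape are assumptions for illustration):

```python
KEYWORDS = {"true", "false"}   # assumed "literals" checked by longest match
SINGLE_CHAR = set("+-*/()=")   # single-character tokens

def lex(source: str) -> list[tuple[str, str, int]]:
    """Convert an input string into (type, text, position) triples."""
    tokens = []
    i = 0
    while i < len(source):
        lookahead = source[i]          # decides the type of the next token
        if lookahead.isspace():        # whitespace: just skip it
            i += 1
        elif lookahead in SINGLE_CHAR: # single-character tokens are trivial
            tokens.append(("operator", lookahead, i))
            i += 1
        elif lookahead.isdigit():      # integers: take all consecutive digits
            start = i
            while i < len(source) and source[i].isdigit():
                i += 1
            tokens.append(("number", source[start:i], start))
        elif lookahead.isalpha():      # longest match, then keyword vs. identifier
            start = i
            while i < len(source) and source[i].isalnum():
                i += 1
            text = source[start:i]
            kind = "keyword" if text in KEYWORDS else "identifier"
            tokens.append((kind, text, start))
        else:
            raise ValueError(f"unexpected character {lookahead!r} at {i}")
    return tokens
```

For instance, `lex("x = 10 + 2")` yields an identifier, an operator, a number, another operator, and another number, each with its position.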
We start with an empty list of tokens, then we go through the string and add tokens as they come. We use the lookahead character to decide the type of the next token. (Note that the lookahead character is not always the furthest character ahead being examined.) Based on the lookahead, we know what the next token should look like and can consume its characters accordingly. If the lookahead is whitespace, we skip it. Single-character tokens are trivial; we add them and increment the index. For integers, we consume every consecutive digit, taking care to advance the index past all of them.

Now we get to something a bit complicated: identifiers versus literals. The rule is that we take the longest possible match and check whether it’s a literal; if it’s not, it’s an identifier. The same care is needed when handling operators that share a prefix, such as < and <=: again, the longest possible match wins. With that, our lexer has produced a list of tokens.

Interpreter Component 2: Writing a Parser

We have to give some structure to our tokens; we can’t do much with a flat list alone. For example, we need to know: Which expressions are nested? Which operators are applied in what order? Which scoping rules apply, if any?

A tree structure supports nesting and ordering. But first, we have to define some rules for constructing trees. We would like our parser to be unambiguous, meaning it always returns the same structure for a given input.

Note that the following parser doesn’t use the previous lexer example. This one is for adding numbers, so its grammar has only two token types, NUMBER and the “+” operator, and just these rules:

expression -> expression "+" expression
expression -> NUMBER
An equivalent formulation uses a pipe character (|) to separate alternatives:

expression -> expression "+" expression | NUMBER
Either way, we have two rules: one that says that we can sum two expressions, and one that says that an expression can be a number. Rules are usually specified with a formal grammar. A formal grammar consists of:

- The rules themselves, as shown above
- A starting rule (the first rule specified, per convention)
- Two types of symbols to define the rules with:
  - Terminals: the “letters” (and other characters) of our language, the irreducible symbols that make up tokens
  - Nonterminals: intermediate constructs used for parsing (i.e., symbols that can be replaced)

Only a nonterminal can be on the left side of a rule; the right side can have both terminals and nonterminals. In the example above, the terminals are NUMBER and “+”, and the only nonterminal is expression.

There are many ways we could implement this parser. Here, we’ll use a “recursive descent” parsing technique. It’s the most common type because it’s the simplest to understand and implement. A recursive descent parser uses one function for each nonterminal in the grammar. It starts from the starting rule and goes down from there (hence “descent”), figuring out which rule to apply in each function. The “recursive” part is vital because we can nest nonterminals recursively! Regexes can’t do that: They can’t even handle balanced parentheses. So we need a more powerful tool. A parser for the first rule would look something like this (the full code is in the repo).
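The original code is not preserved here, but a literal Python transcription of the rule expression -> expression "+" expression (a sketch; the function name and token shapes are ours) makes the coming problems concrete:

```python
def expression(tokens, pos=0):
    """Naive parser for: expression -> expression "+" expression | NUMBER.

    A literal transcription of the first rule is left-recursive: the
    function calls itself before consuming a single token, so it can
    never make progress and overflows the stack.
    """
    left, pos = expression(tokens, pos)   # infinite recursion!
    if pos < len(tokens) and tokens[pos] == "+":
        right, pos = expression(tokens, pos + 1)
        return left + right, pos
    return left, pos
```

Calling `expression(["1", "+", "2"])` raises a RecursionError in Python (the equivalent of a stack overflow), which is exactly the left-recursion problem discussed shortly.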
Grammar Ambiguity

The first issue is the ambiguity of our grammar, which may not be apparent at first glance. Given the input 1 + 2 + 3, the parser could group it either as (1 + 2) + 3 or as 1 + (2 + 3); both derivations follow our rules. That’s why we need to introduce some asymmetry:

expression -> expression "+" NUMBER | NUMBER
The set of expressions we can represent with this grammar hasn’t changed since its first version. Only now it is unambiguous: The parser always goes left, which makes our “+” operator left-associative. Just what we needed!

Left-recursive Rules

Unfortunately, the above fix doesn’t solve our other problem, left recursion: the expression rule still refers to itself as its own first symbol.
We have infinite recursion here: to parse an expression, the parser must first parse an expression, without consuming a single token. If we were to step into this function, we’d eventually get a stack-overflow error. But parsing theory can help! Suppose we have a grammar like this, where a and b stand for any sequences of terminals and nonterminals:

A -> A a | b

We can rewrite this grammar as:

A -> b A'
A' -> a A' | ε

There, A' is a new nonterminal and ε denotes the empty string. The rewritten grammar describes the same language, but no rule begins with a reference to itself. Let’s take the current revision of our grammar:

expression -> expression "+" NUMBER | NUMBER

Following the method of rewriting parsing rules detailed above, with a = "+" NUMBER and b = NUMBER, we get:

expression -> NUMBER expression'
expression' -> "+" NUMBER expression' | ε
Now the grammar is OK, and we can parse it with a recursive descent parser. Let’s see how such a parser would look for this latest iteration of our grammar.
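As a Python sketch of such a parser (function names, token shapes, and the end marker are our assumptions), with one function per nonterminal:

```python
END = ("end", "")   # marker token appended after the real input

def parse(tokens):
    """Parse: expression  -> NUMBER expression'
              expression' -> "+" NUMBER expression' | ε"""
    pos = expression(tokens, 0)
    if tokens[pos] is not END:
        raise SyntaxError(f"unexpected token at {pos}")

def expression(tokens, pos):
    pos = number(tokens, pos)            # an expression starts with a number
    return expression_prime(tokens, pos)

def expression_prime(tokens, pos):
    kind, text = tokens[pos]
    if kind == "operator" and text == "+":   # "+" NUMBER expression'
        pos = number(tokens, pos + 1)
        return expression_prime(tokens, pos)
    return pos                               # ε: match nothing

def number(tokens, pos):
    kind, _ = tokens[pos]
    if kind != "number":
        raise SyntaxError(f"expected a number at {pos}")
    return pos + 1
```

With valid input such as `[("number", "1"), ("operator", "+"), ("number", "2"), END]`, `parse` returns quietly; with a trailing “+” and no number, it raises a SyntaxError.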
Here we use the list of tokens produced by the lexer, with one parsing function per nonterminal. Also, if we switch to a streaming lexer, we wouldn’t have an in-memory list but an iterator, so we need a marker to know when we come to the end of the input. When we come to the end, the lexer simply yields a special end-of-input token.

Walking through the code, we can see that an expression can be just a number. If there’s nothing left after it, the next token wouldn’t be a “+”, so parsing stops. If the input string has more tokens, then these would have to look like a sequence of "+" NUMBER pairs.

Generating an AST

Now that we successfully parsed our expression, it’s hard to do anything with it as is. We could put some callbacks in our parser, but that would be very cumbersome and unreadable. Instead, we will return an AST: a tree representing the input expression.
This resembles our rules, using simple data classes. Our parser now returns a useful data structure.
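A Python sketch of AST classes that mirror the two rules, together with an AST-producing parser (all names here are assumptions):

```python
from dataclasses import dataclass

# expression  -> NUMBER expression'
# expression' -> "+" NUMBER expression' | ε

@dataclass
class ExpressionPrime:
    number: int
    rest: "ExpressionPrime | None"   # None plays the role of ε

@dataclass
class Expression:
    number: int
    rest: "ExpressionPrime | None"

def parse_expression(tokens, pos=0):
    kind, text = tokens[pos]
    if kind != "number":
        raise SyntaxError(f"expected a number at {pos}")
    rest, pos = parse_prime(tokens, pos + 1)
    return Expression(int(text), rest), pos

def parse_prime(tokens, pos):
    kind, text = tokens[pos]
    if kind == "operator" and text == "+":
        kind2, text2 = tokens[pos + 1]
        if kind2 != "number":
            raise SyntaxError(f"expected a number at {pos + 1}")
        rest, pos = parse_prime(tokens, pos + 2)
        return ExpressionPrime(int(text2), rest), pos
    return None, pos                 # ε

```

For `1 + 2`, this returns `Expression(1, ExpressionPrime(2, None))`: one node per rule application.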
For an input like 1 + 2, the parser now returns a tree of these nodes rather than a plain success flag.

Simplifying Rules

Our grammar’s helper rule (expression') gets the job done, but it’s hard to recognize the pattern it represents in our grammar just by looking at it. It turns out that we can replace this recursion with a simpler construct: repetition, written with a trailing *. This construct simply means that the element it follows may appear zero or more times. Now our full grammar looks like this:

expression -> NUMBER ("+" NUMBER)*
And our AST looks nicer, too.
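With the repetition-based rule expression -> NUMBER ("+" NUMBER)*, a Python sketch of the flatter AST and a loop-based parser might look like this (names are ours):

```python
from dataclasses import dataclass, field

@dataclass
class Expression:
    first: int                                     # the leading NUMBER
    rest: list[int] = field(default_factory=list)  # one entry per "+" NUMBER

def parse_expression(tokens):
    pos = 0
    kind, text = tokens[pos]
    if kind != "number":
        raise SyntaxError("expected a number")
    expr = Expression(int(text))
    pos += 1
    while tokens[pos] == ("operator", "+"):        # ("+" NUMBER)*
        kind, text = tokens[pos + 1]
        if kind != "number":
            raise SyntaxError("expected a number after '+'")
        expr.rest.append(int(text))
        pos += 2
    return expr
```

For `1 + 2 + 3`, this yields `Expression(first=1, rest=[2, 3])`: a single flat node instead of a nested chain.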
The resulting parser is the same length but simpler to understand and use. We’ve eliminated the helper nonterminal, and we didn’t even need the recursion anymore; a simple loop does the same job. Consider that if we had multiple possible operators, like “+” and “-”, the rule would become:

expression -> NUMBER (("+" | "-") NUMBER)*

In this case, the AST then needs to record which operator precedes each subsequent number, not just the numbers themselves. Note that the parser changes very little: each loop iteration now reads one operator and one number.

Interpreter Component 3: Writing an Interpreter

Our interpreter will make use of our lexer and parser to get the AST of our input expression and then evaluate that AST whichever way we want. In this case, we’re dealing with numbers, and we want to evaluate their sum. In the implementation of our interpreter example, we will use this simple grammar:

expression -> NUMBER (("+" | "-") NUMBER)*
And this AST: a node that stores the first number together with a list of (operator, number) pairs.
(We’ve covered how to implement a lexer and a parser for similar grammars, but any readers who get stuck can peruse the lexer and parser implementations for this exact grammar in the repo.) Now we’ll see how to write an interpreter for the above grammar.
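A Python sketch of such an interpreter, assuming an AST node that holds the first number and an ordered list of (operator, number) pairs (names are ours, not the author’s):

```python
from dataclasses import dataclass

@dataclass
class Expression:
    first: int                       # there is always at least one number
    rest: list[tuple[str, int]]      # ("+" or "-", operand) pairs, in order

def interpret(expr: Expression) -> int:
    # Fold from the left: left associativity means each operator applies
    # to the running result in the order it appears.
    result = expr.first
    for op, operand in expr.rest:
        if op == "+":
            result += operand
        elif op == "-":
            result -= operand
        else:
            raise ValueError(f"unknown operator {op!r}")
    return result
```

For example, the tree for 10 - 4 + 1 evaluates as (10 - 4) + 1 = 7, not 10 - (4 + 1) = 5; the left fold is what preserves left associativity.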
If we parsed our input into an AST without encountering an error, we’re sure that we’ll always have at least one number node. The note from the beginning about the left associativity of “+” matters here: We evaluate the operations by folding from the left, applying each operator to the running result in the order it appears. But if we want to go beyond interpreting plus and minus operators, there’s another rule to define.

Precedence

We know how to parse a simple expression like 1 + 2, but what about 1 + 2 * 3? Most people agree on the convention that multiplication has higher precedence than addition. But the parser doesn’t know that. We can’t just evaluate it left to right as (1 + 2) * 3, which would give 9 rather than the expected 7. This means that we need to evaluate multiplication first. Multiplication needs to be further from the root of the AST to force it to be evaluated before addition. For this, we need to introduce yet another layer of indirection.

Fixing a Naive Grammar From Start to Finish

This is our original, left-recursive grammar, which has no precedence rules:

expression -> expression ("+" | "-" | "*") expression | NUMBER
First, we give it rules of precedence and remove its ambiguity:

expression -> expression "+" term | expression "-" term | term
term -> term "*" NUMBER | NUMBER
Then it gets non-left-recursive rules:

expression -> term (("+" | "-") term)*
term -> NUMBER ("*" NUMBER)*
The result is a beautifully expressive AST, with a node type for each precedence level.
This leaves us with a concise interpreter implementation.
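Pulling the pieces together, here is a self-contained Python sketch (all names are ours) that lexes, parses, and evaluates the precedence grammar expression -> term (("+" | "-") term)*, term -> NUMBER ("*" NUMBER)*:

```python
from dataclasses import dataclass

@dataclass
class Term:
    first: int
    factors: list[int]               # NUMBER ("*" NUMBER)*

@dataclass
class Expression:
    first: Term
    rest: list[tuple[str, Term]]     # (("+" | "-") term)*

def tokenize(source: str) -> list[str]:
    # Crude lexer for this sketch: numbers and single-character operators.
    out, i = [], 0
    while i < len(source):
        if source[i].isspace():
            i += 1
        elif source[i].isdigit():
            j = i
            while j < len(source) and source[j].isdigit():
                j += 1
            out.append(source[i:j]); i = j
        else:
            out.append(source[i]); i += 1
    return out

def parse(tokens: list[str]) -> Expression:
    pos = 0
    def term() -> Term:
        nonlocal pos
        first = int(tokens[pos]); pos += 1
        factors = []
        while pos < len(tokens) and tokens[pos] == "*":
            factors.append(int(tokens[pos + 1])); pos += 2
        return Term(first, factors)
    first = term()
    rest = []
    while pos < len(tokens) and tokens[pos] in "+-":
        op = tokens[pos]; pos += 1
        rest.append((op, term()))
    return Expression(first, rest)

def interpret(expr: Expression) -> int:
    def eval_term(t: Term) -> int:
        result = t.first
        for factor in t.factors:
            result *= factor
        return result
    result = eval_term(expr.first)
    for op, t in expr.rest:
        result = result + eval_term(t) if op == "+" else result - eval_term(t)
    return result
```

Because multiplication lives in the term nodes, deeper in the tree, `interpret(parse(tokenize("1 + 2 * 3")))` evaluates the product first and yields 7, as expected.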
As before, the ideas in the requisite lexer and grammar were covered earlier, but readers can find them in the repo if needed.

Next Steps in Writing Interpreters

We didn’t cover it here, but error handling and reporting are crucial features of any parser. As developers, we know how frustrating it can be when a compiler produces confusing or misleading errors. It’s an area with many interesting problems to solve, like giving correct and precise error messages, not deterring the user with more messages than necessary, and recovering gracefully from errors. It’s up to the developers writing an interpreter or compiler to ensure their future users have a good experience. In walking through our example lexers, parsers, and interpreters, we only scratched the surface of the theories behind compilers and interpreters.
For further reading, I recommend these resources: