Top-down parsing (LL)

The easiest way of parsing something according to a grammar in use today is called LL parsing (or top-down parsing). It works like this: for each production, find out which terminals the production can start with. This is called the start set.

Then, when parsing, you just start with the start symbol and compare the start sets of the different productions against the first piece of input to see which of the productions have been used. Of course, this can only be done if no two start sets for one symbol both contain the same terminal. If they do, there is no way to determine which production to choose by looking at the first terminal on the input.
Published (last): 5 January 2019
LL grammars are classified by a number written in parentheses, as in LL(1). The number in the parentheses tells you the maximum number of terminals you may have to look at at a time to choose the right production at any point in the grammar. With LL(0) you would not look at any terminals at all. This is only possible if all symbols have only one production, and if they only have one production the language can only have one string.
In other words: LL(0) grammars are not interesting. The most common (and useful) kind of LL grammar is LL(1), where you can always choose the right production by looking at only the first terminal on the input at any given time. With LL(2) you have to look at two symbols, and so on. There exist grammars that are not LL(k) grammars for any fixed value of k at all, and they are sadly quite common.

As an example, consider a start set analysis of a small grammar with the symbols S, FN, DL and D. For the symbol D this is easy: each production has a single digit as its start set (the one it produces), and the D symbol has the set of all ten digits as its start set.
This means that we have at best an LL(1) grammar, since in this case we need to look at one terminal to choose the right production.

With DL we run into trouble. Both productions start with D and thus both have the same start set. This means that one cannot see which production to choose by looking at just the first terminal of the input.
However, we can easily get round this problem by cheating: if the second terminal on the input is not a digit we must have used the first production, but if both are digits we must have used the second one. In other words, this is at best an LL(2) grammar. I actually simplified things a little here.

The FN symbol turns out to be even worse, since both productions have all digits as their start set.
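The start set analysis described above can be sketched in a few lines of Python. The text does not show the grammar being analysed, so the `GRAMMAR` table below is an assumption chosen to match the description (D produces single digits, both DL productions start with D, both FN productions start with DL). The function names `start_set` and `conflicts` are also just illustrative.

```python
DIGITS = "0123456789"

# Assumed grammar (not given in the text): symbol -> list of productions.
GRAMMAR = {
    "S":  [["-", "FN"], ["FN"]],
    "FN": [["DL"], ["DL", ".", "DL"]],
    "DL": [["D"], ["D", "DL"]],
    "D":  [[d] for d in DIGITS],
}

def start_set(symbol, grammar):
    """Return the terminals that `symbol` can start with.

    Assumes the grammar has no empty productions and no left recursion,
    so looking at the first symbol of each production is enough and the
    recursion terminates.
    """
    if symbol not in grammar:          # anything without productions is a terminal
        return {symbol}
    result = set()
    for production in grammar[symbol]:
        result |= start_set(production[0], grammar)
    return result

def conflicts(grammar):
    """Map each symbol to the terminals shared by two of its productions."""
    bad = {}
    for symbol, productions in grammar.items():
        sets = [start_set(p[0], grammar) for p in productions]
        overlap = set()
        for i in range(len(sets)):
            for j in range(i + 1, len(sets)):
                overlap |= sets[i] & sets[j]
        if overlap:
            bad[symbol] = overlap
    return bad

if __name__ == "__main__":
    for name in GRAMMAR:
        print(name, sorted(start_set(name, GRAMMAR)))
    print("not LL(1) because of:", sorted(conflicts(GRAMMAR)))
```

Under this assumed grammar only DL and FN are reported as conflicting, which agrees with the analysis in the text. The check covers a single terminal of lookahead only; it says nothing about whether a larger k would help.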
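Somewhat surprisingly perhaps, the S symbol is easy. If the first terminal on the input is the one that the first production starts with, the first production was used. If not, the second one was used.

An LL transformation example

However, there is no need to despair. Most grammars that are not LL(k) can fairly easily be converted to LL(1) grammars. For example, DL can be given a single production: a D followed by a new symbol DR. DR then has two productions: D DR (more digits) or nothing (no more digits).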
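Once every symbol can be decided with one terminal of lookahead, a recursive-descent parser follows almost mechanically: one method per symbol, each choosing its production by peeking at the next character. The sketch below assumes the transformed grammar shown in its comments. DR is the symbol described above, while FP (an optional fractional part) and all class and function names are hypothetical additions made so that FN also has a single production.

```python
DIGITS = set("0123456789")

class ParseError(Exception):
    pass

class Parser:
    """Recursive-descent parser for an assumed LL(1) grammar:

        S  := '-' FN | FN
        FN := DL FP
        FP := '.' DL | (nothing)      # FP is a hypothetical helper symbol
        DL := D DR
        DR := D DR | (nothing)
    """

    def __init__(self, text):
        self.text = text
        self.pos = 0

    def peek(self):
        # The single terminal of lookahead; None marks end of input.
        return self.text[self.pos] if self.pos < len(self.text) else None

    def expect(self, allowed, what):
        ch = self.peek()
        if ch is None or ch not in allowed:
            raise ParseError(f"expected {what} at position {self.pos}")
        self.pos += 1
        return ch

    def parse_s(self):
        sign = self.expect({"-"}, "'-'") if self.peek() == "-" else ""
        value = sign + self.parse_fn()
        if self.peek() is not None:
            raise ParseError(f"unexpected input at position {self.pos}")
        return value

    def parse_fn(self):
        return self.parse_dl() + self.parse_fp()

    def parse_fp(self):
        if self.peek() == ".":           # start set of the first production
            self.pos += 1
            return "." + self.parse_dl()
        return ""                        # empty production

    def parse_dl(self):
        return self.expect(DIGITS, "a digit") + self.parse_dr()

    def parse_dr(self):
        if self.peek() in DIGITS:        # more digits
            return self.expect(DIGITS, "a digit") + self.parse_dr()
        return ""                        # no more digits

def parse_number(text):
    """Parse `text` with the grammar above and return it as a float."""
    return float(Parser(text).parse_s())
```

For instance, `parse_number("-3.14")` goes through S, FN, DL, DR and FP without ever looking at more than the next character. The recursion in `parse_dr` mirrors the grammar directly; a loop would do the same job for long digit strings.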
Bottom-up parsing, the approach used by LR parsers, collects input until it finds that it can reduce an input sequence to a symbol. We start by reading 3 from the input, and then we look to see if we can reduce it to the symbol it was produced from. And indeed we can: it was produced from the D symbol, so we replace the 3 with D.
The grammar is ambiguous, which means that we could reduce further to FN, which would be wrong. For simplicity we just skip the wrong steps here, but an unambiguous grammar would not allow these wrong choices. After that we read the '.' from the input, followed by the next digit. We then reduce that digit to a D and read the next character, which is 4.
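The read-and-reduce steps described so far can be sketched as a small hand-written loop over a stack. This is not a table-driven LR parser, only an illustration: the grammar in the comment is an assumption (the text does not list it), the function name `shift_reduce` is hypothetical, and one character of lookahead is used to skip the wrong reductions mentioned above.

```python
DIGITS = set("0123456789")

def shift_reduce(text):
    """Reduce `text` towards S, returning (accepted, trace of stack states).

    Assumed grammar (not given in the text):
        S  := '-' FN | FN
        FN := DL | DL '.' DL
        DL := D | D DL
        D  := '0' | ... | '9'
    """
    stack, pos, trace = [], 0, []
    while True:
        la = text[pos] if pos < len(text) else None   # lookahead, None at end
        top = stack[-1] if stack else None
        if top in DIGITS:
            stack[-1] = "D"                           # D := digit
        elif top == "D" and la not in DIGITS:
            stack[-1] = "DL"                          # DL := D (last digit of a run)
        elif stack[-2:] == ["D", "DL"]:
            stack[-2:] = ["DL"]                       # DL := D DL
        elif stack[-3:] == ["DL", ".", "DL"] and la is None:
            stack[-3:] = ["FN"]                       # FN := DL '.' DL
        elif stack in (["DL"], ["-", "DL"]) and la is None:
            stack[-1] = "FN"                          # FN := DL
        elif stack == ["-", "FN"] and la is None:
            stack[:] = ["S"]                          # S := '-' FN
        elif stack == ["FN"] and la is None:
            stack[:] = ["S"]                          # S := FN
        elif la is not None:
            stack.append(la)                          # nothing to reduce: read input
            pos += 1
        else:
            break                                     # no input left, nothing reducible
        trace.append(" ".join(stack))
    return stack == ["S"], trace

if __name__ == "__main__":
    ok, steps = shift_reduce("3.14")   # "3.14" is an example input chosen here
    print("\n".join(steps))
    print("accepted:", ok)
```

Each line of the trace is the stack after one step, so reading it top to bottom shows the input being collected and replaced by symbols until only S remains. The order of the `elif` branches is what resolves the choices between reducing now and waiting for more input.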
The "D DL" sequence is reduced to DL, and once we have the "DL . DL" sequence we can do a reduction to FN, which is finally reduced to S. As you may have noted, we could often choose whether to do a reduction now or wait until we had more symbols and then do a different reduction.

LL or LR?

If you have to debug a parser, looking at a recursive-descent parser (a common way to program an LL parser) is much simpler than looking at the tables of an LALR parser.

Left factoring is necessary because LL parsing requires selecting an alternative based on a fixed number of input tokens. Left recursion is problematic because a lookahead token of a rule is always in the lookahead token on that same rule.
Everything in set A is in set A. This causes the rule to recurse forever and ever and ever and ever. You just have to be in the right "LL" mindset, which usually involves watching 8 hours of Dr. Who before writing the grammar.

More information

John Aycock has developed an unusually nice and simple-to-use parsing framework in Python called SPARK, which is described in his very readable paper.

Beware, though, that this is a rather advanced and mathematical book.

Henry Baker has written an article about parsing in Common Lisp, which presents a simple, high-performance and very convenient framework for parsing. The approach is similar to that of compiler-compilers, but instead relies on the very powerful macro system of Common Lisp.
Another can be found in the ISO standard.

Acknowledgements

Thanks to:

Jelks Cabaniss, for encouraging me to turn the news article into a web article, and for providing very useful criticism of the article once it appeared in web form.

I have asked for permission to quote this, but have received no reply, unfortunately.

Dave Pawson, for correcting a bad link.

Last update , by Lars M.
BNF and EBNF: What are they and how do they work?
Our tree will have a root: one non-terminal representing our entire document. The root will contain other non-terminals, which will contain other non-terminals, and so on. In this way we can go from a stream of tokens (or terminals) to an AST, which groups terminals into a hierarchy of non-terminals.

We have seen that non-terminals represent structures at different levels. Some of them can contain other statements.
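As a rough illustration of such a hierarchy, the sketch below models a tree node as a name plus a list of children, with terminals as the leaves. The `Node` class and the node names (`document`, `statement`, `expression`) are made up for the example and do not come from any particular grammar.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """A tree node: a non-terminal with children, or a terminal (leaf)."""
    name: str
    children: list = field(default_factory=list)

    def terminals(self):
        """Return the terminals under this node, left to right."""
        if not self.children:
            return [self.name]
        found = []
        for child in self.children:
            found.extend(child.terminals())
        return found

# Hypothetical tree for the token stream: x = 1 ;
tree = Node("document", [
    Node("statement", [
        Node("x"),
        Node("="),
        Node("expression", [Node("1")]),
        Node(";"),
    ]),
])

if __name__ == "__main__":
    print(tree.terminals())
```

Walking the leaves from left to right gives back the original token stream, which is one way to see that the tree only adds grouping on top of the terminals.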
EBNF: How to Describe the Grammar of a Language
Extended Backus–Naur form