Chapter 2. Grammars
The
idea of a grammar for a language has been known in India since the time
of Panini (about 5th century B.C). Panini gave a grammar for the
Sanskrit language. His work on Sanskrit had about 4000 rules (Sutras).
From that, it is clear that the concept of recursion was known to the
Indians and was used by them in very early times.
In 1959, Noam Chomsky tried to give a mathematical
definition for grammar. The motivation was to give a formal definition
for grammar for English sentences. He defined four types of grammar
viz., type 0, type 1, type 2, and type 3. At the same time, the
programming language ALGOL was being considered. It was considered as a
block-structured language and a grammar was required which could
describe all syntactically correct programs. The definition given was
called Backus Normal Form or Backus-Naur Form (BNF). This definition was
found to be the same as the definition given by Chomsky for type 2
grammar.
2.1. Definitions and Classification of Grammars
We now formally define the four types of grammars:
Definition 2.1
A phrase-structure grammar or a type 0 grammar is a 4-tuple G = (N, T, P, S), where
N is a finite set of non-terminal symbols called the non-terminal
alphabet, T is a finite set of terminal symbols called the terminal
alphabet, S ∊ N is the start symbol and P is a set of productions (also called production rules or simply rules) of the form u → υ, where u ∊ (N ∪ T)*N(N ∪ T)* and υ ∊ (N ∪ T)*.
|
2.2. Ambiguity
Ambiguity in CFL is an important concept. It has
applications to compilers. Generally, when a grammar is written for an
expression or a programming language, we expect it to be unambiguous and
during compiling, a unique code is generated. Consider the following
English statement: “They are flying planes.” It can be parsed in two
different ways.
In Figure 2.7, ‘They’ refers to the planes, and in Figure 2.8
‘They’ refers to the persons on the plane. The ambiguity arises because
we are able to have two different parse trees for the same sentence.
2.3. Simplification of CFGs
Context-free grammars can be put in simple forms. The
simplification is done on productions and symbols of the grammar. Such
simplified grammars should be equivalent to the original grammars
started with. Such simplifications on CFGs are important as CFGs have
wide applications in compilers.
A given CFG may have rules and symbols which do not
contribute ultimately to its language. Hence, one can modify the grammar
by removing such rules and symbols. Another issue is the presence of ε-rules in the productions. One may want to have a ε-free
set of context-free rules in the grammar whenever the underlying CFL
does not contain the empty string. We also give a simplified CFG that
has no unit rules, which are of the form A → B where A and B are non-terminal symbols.
2.4. Normal Forms
We have seen in Section 2.3,
how a given CFG can be simplified. In this section, we see different
normal forms of CFG i.e., one can express the rules of the CFG in a
particular form. These normal form grammars are easy to handle and are
useful in proving results. The most popular normal forms are Weak
Chomsky Normal Form (WCNF), Chomsky Normal Form (CNF), Strong Chomsky
Normal Form (SCNF), and Greibach Normal Form (GNF).
Weak Chomsky Normal Form
Definition 2.16
Let G = (N, T, P, S) be a CFG. If each rule in P is of the form A → Δ , A → a or A → ε, where A ∊ N, Δ ∊ N+, a ∊ T, then G is said to be in WCNF.
|
Problems and Solutions
Problem 1. | Give CFG for the following:
a. {anbn|n ≥ 1}
| ||||||||||||||||
Solution. | G = (N, T, P, S)
N = {S}, T = {a, b}, P consists of the following rules:
Whenever rule 1 is applied, one ‘a’ is generated on the left and one ‘b’; on the right. The derivation terminates by using rule 2. A sample derivation and derivation tree are given below (Figure 2.12).
Figure 2.12. A sample derivation tree
| ||||||||||||||||
b. | {anbmcn|n, m ≥ 1} | ||||||||||||||||
Solution. | The grammar G = (N, T, P, S), N = {S, A}, T = {a, b, c} P is given as follows:
Rule 1 and 2 generate equal number of a’s and c’s; rule 2 makes sure at least one a and one c are generated. Rule 3 and 4 generate b’s in the middle. Rule 4 makes sure at least one b is generated. It is to be noted that a’s and c’s are generated first and b’s afterwards.
| ||||||||||||||||
c. | {anbncm|n, m ≥ 1} | ||||||||||||||||
Solution. | G = (N, T, P, S), N = {S, A, B}, T = {a, b, c}
P is given as follows:
Rule 2 and 3 generate equal number of a’s and b’s. Rule 4 and 5 generate c’s. Rule 1 is applied first so that equal number of a’s and b’s are followed by a string of c’s.
In the following solutions, only the rules are given.
Capital letters stand for non-terminals and small letters stand for
terminals.
| ||||||||||||||||
d. | {anbncmdm|n, m ≥ 1} | ||||||||||||||||
Solution. |
| ||||||||||||||||
e. | {anbmcmdn|n, m ≥ 1} | ||||||||||||||||
Solution. |
| ||||||||||||||||
f. | {anbm|n, m ≥ 1, n > m} | ||||||||||||||||
Solution. |
| ||||||||||||||||
g. | {anbm|n, m ≥ 1,n ≠ m} | ||||||||||||||||
Solution. | {anbm|n, m ≥ 1,n ≠ m} = {anbm|n, m ≥ 1,n > m} ∪ {anbm|n, m ≥ 1, m > n} Rules are:
Rules 1, 2, 3, and 4 generate more a’s than b’s. Rules 1, 5, 6, and 7 generate more b’s than a’s.
| ||||||||||||||||
h. | {wcwR|w ∊ {a,b}*} | ||||||||||||||||
Solution. | Rules are:
| ||||||||||||||||
i. | {w|w ∊ {(,)}+, w is a well-formed string of parenthesis} | ||||||||||||||||
Solution. | {w|w ∊ {(,)}+, w is a well-formed string of parenthesis} Rules are:
The following grammar generates the same language plus the empty string:
| ||||||||||||||||
Problem 2. | Give regular grammars for
a. {an|n ≥ 1}
| ||||||||||||||||
Solution. | G = (N, T, P, S), N = {S}, T = {a}
P has the following rules:
| ||||||||||||||||
b. | {anbm|n, m ≥ 1} | ||||||||||||||||
Solution. | We give below the rules only. Capital letters stand for non-terminals.
| ||||||||||||||||
c. | {a2n|n ≥ 1} | ||||||||||||||||
Solution. |
| ||||||||||||||||
d. | {(ab)n|n ≥ 1} | ||||||||||||||||
Solution. |
| ||||||||||||||||
e. | {anbmcp|n, m, p ≥ 1} | ||||||||||||||||
Solution. |
| ||||||||||||||||
f. | {(abc)n|n ≥ 1} | ||||||||||||||||
Solution. |
|
No comments:
Post a Comment