Saturday, 26 January 2013

TOC Chapter 2

Chapter 2. Grammars

The idea of a grammar for a language has been known in India since the time of Panini (about 5th century B.C). Panini gave a grammar for the Sanskrit language. His work on Sanskrit had about 4000 rules (Sutras). From that, it is clear that the concept of recursion was known to the Indians and was used by them in very early times.
In 1959, Noam Chomsky tried to give a mathematical definition for grammar. The motivation was to give a formal definition for grammar for English sentences. He defined four types of grammar viz., type 0, type 1, type 2, and type 3. At the same time, the programming language ALGOL was being considered. It was considered as a block-structured language and a grammar was required which could describe all syntactically correct programs. The definition given was called Backus Normal Form or Backus-Naur Form (BNF). This definition was found to be the same as the definition given by Chomsky for type 2 grammar.

2.1. Definitions and Classification of Grammars

We now formally define the four types of grammars:

Definition 2.1

A phrase-structure grammar or a type 0 grammar is a 4-tuple G = (N, T, P, S), where N is a finite set of non-terminal symbols called the non-terminal alphabet, T is a finite set of terminal symbols called the terminal alphabet, SN is the start symbol and P is a set of productions (also called production rules or simply rules) of the form uυ, where u ∊ (NT)*N(NT)* and υ ∊ (NT)*.

2.2. Ambiguity

Ambiguity in CFL is an important concept. It has applications to compilers. Generally, when a grammar is written for an expression or a programming language, we expect it to be unambiguous and during compiling, a unique code is generated. Consider the following English statement: “They are flying planes.” It can be parsed in two different ways.
In Figure 2.7, ‘They’ refers to the planes, and in Figure 2.8 ‘They’ refers to the persons on the plane. The ambiguity arises because we are able to have two different parse trees for the same sentence.

2.3. Simplification of CFGs

Context-free grammars can be put in simple forms. The simplification is done on productions and symbols of the grammar. Such simplified grammars should be equivalent to the original grammars started with. Such simplifications on CFGs are important as CFGs have wide applications in compilers.
A given CFG may have rules and symbols which do not contribute ultimately to its language. Hence, one can modify the grammar by removing such rules and symbols. Another issue is the presence of ε-rules in the productions. One may want to have a ε-free set of context-free rules in the grammar whenever the underlying CFL does not contain the empty string. We also give a simplified CFG that has no unit rules, which are of the form AB where A and B are non-terminal symbols.

2.4. Normal Forms

We have seen in Section 2.3, how a given CFG can be simplified. In this section, we see different normal forms of CFG i.e., one can express the rules of the CFG in a particular form. These normal form grammars are easy to handle and are useful in proving results. The most popular normal forms are Weak Chomsky Normal Form (WCNF), Chomsky Normal Form (CNF), Strong Chomsky Normal Form (SCNF), and Greibach Normal Form (GNF).

Weak Chomsky Normal Form

Definition 2.16

Let G = (N, T, P, S) be a CFG. If each rule in P is of the form A → Δ , Aa or Aε, where AN, Δ ∊ N+, aT, then G is said to be in WCNF.

Problems and Solutions

Problem 1.Give CFG for the following:
a. {anbn|n ≥ 1}
Solution.G = (N, T, P, S)
N = {S}, T = {a, b}, P consists of the following rules:
{1.SaSb, and 2.Sab}

Whenever rule 1 is applied, one ‘a’ is generated on the left and one ‘b’; on the right. The derivation terminates by using rule 2. A sample derivation and derivation tree are given below (Figure 2.12).
Figure 2.12. A sample derivation tree

SaSb
 aaSbb
 aaaSbbb
 aaaaSbbbb
 aaaaa bbbbb
b.{anbmcn|n, m ≥ 1}
Solution.The grammar G = (N, T, P, S), N = {S, A}, T = {a, b, c} P is given as follows:
  1. SaSc
  2. SaAc
  3. AbA
  4. Ab
Rule 1 and 2 generate equal number of a’s and c’s; rule 2 makes sure at least one a and one c are generated. Rule 3 and 4 generate b’s in the middle. Rule 4 makes sure at least one b is generated. It is to be noted that a’s and c’s are generated first and bs afterwards.
c.{anbncm|n, m ≥ 1}
Solution.G = (N, T, P, S), N = {S, A, B}, T = {a, b, c}
P is given as follows:
  1. SAB
  2. AaAb
  3. Aab
  4. BcB
  5. Bc
Rule 2 and 3 generate equal number of as and bs. Rule 4 and 5 generate cs. Rule 1 is applied first so that equal number of as and bs are followed by a string of c’s.
In the following solutions, only the rules are given. Capital letters stand for non-terminals and small letters stand for terminals.
d.{anbncmdm|n, m ≥ 1}
Solution.
  1. SAB
  2. AaAb
  3. Aab
  4. BcBd
  5. Bcd
e.{anbmcmdn|n, m ≥ 1}
Solution.
  1. SaSd
  2. SaAd
  3. AbAc
  4. Abc
f.{anbm|n, m ≥ 1, n > m}
Solution.
  1. SaSb
  2. SaAb
  3. AaA
  4. Aa
g.{anbm|n, m ≥ 1,nm}
Solution.{anbm|n, m ≥ 1,nm} = {anbm|n, m ≥ 1,n > m} ∪ {anbm|n, m ≥ 1, m > n} Rules are:
  1. SaSb
  2. SaAb
  3. AaA
  4. Aa
  5. SaBb
  6. BbB
  7. Bb
Rules 1, 2, 3, and 4 generate more as than bs. Rules 1, 5, 6, and 7 generate more bs than a’s.
h.{wcwR|w ∊ {a,b}*}
Solution.Rules are:
  1. SaSa
  2. SbSb
  3. Sc
i.{w|w ∊ {(,)}+, w is a well-formed string of parenthesis}
Solution.{w|w ∊ {(,)}+, w is a well-formed string of parenthesis} Rules are:
  1. SSS
  2. S → (S)
  3. S → ( )
The following grammar generates the same language plus the empty string:
  1. SSaSbS
  2. Sε
Problem 2.Give regular grammars for
a. {an|n ≥ 1}
Solution.G = (N, T, P, S), N = {S}, T = {a}
P has the following rules:
  1. SaS
  2. Sa
b.{anbm|n, m ≥ 1}
Solution.We give below the rules only. Capital letters stand for non-terminals.
  1. SaS
  2. SaA
  3. AbA
  4. Ab
c.{a2n|n ≥ 1}
Solution.
  1. SaA
  2. AaS
  3. Aa
d.{(ab)n|n ≥ 1}
Solution.
  1. SaA
  2. AbS
  3. Ab
e.{anbmcp|n, m, p ≥ 1}
Solution.
  1. SaS
  2. SaA
  3. AbA
  4. AbB
  5. BcB
  6. Bc
f.{(abc)n|n ≥ 1}
Solution.
  1. SaA
  2. AbB
  3. BcS
  4. Bc

Exercises

1.Find a CFG for the languages over {0, 1} consisting of those strings in which the ratio of the number of 1’s to the number of 0’s is three to two.
2.Define CFGs that generate the following languages.
  1. The set of odd length string S in {0, 1}* with middle symbol 1
  2. {aibjck | j > i + k}.
3.Prove that the following CFGs do not generate
L = {x ∊{0, 1}* | #0(x) = #1(x)}
  1. SS01S | S10S | ε
  2. S → 0S1 | 1S0 | 01S | S01 | S10 | ε.
4.Show that the CFG SSS |a| b is ambiguous.
5.Convert the following to CNF.
SABA
AaA | ε
BbB | ε
6.Which of the following grammars are ambiguous? Are the languages generated inherently ambiguous?
    1. Sε
    2. SaSb
    3. SSS
    1. Sε
    2. SaSb
    3. SbSa
    4. SSS
    1. SbS
    2. SSb
    3. Sε
    1. SSaSa
    2. Sb
    1. SSb
    2. SaSb
    3. SSa
    4. Sa
    1. Sa
    2. SaaS
    3. SaaaS
    1. SA
    2. SaSb
    3. SbS
    4. AAa
    5. Aa
    1. SAA
    2. AAAA
    3. AbA
    4. AAb
    5. Aa
7.Consider L2 where L = {wwR | w ∊ {a, b}*}. Give an argument that L2 is inherently ambiguous.
8.Give an unambiguous grammar for
L = {w | w ∊ {a, b}+, w has equal number of a’s and b’s}
9.Consider the grammars over the terminal alphabet Σ = {a, b} with rules
  1. SwSS
    Sa
    Sb
  2. SaSS
    SbSS
    Sw
where w is some string over Σ. Show that each of these grammars is always ambiguous whatever w may be.
10.Consider the grammar SaS|aSbS|ε. Prove that the grammar generates all and only the strings of a’s and b’s such that every prefix has at least as many a’s as b’s. Show that the grammar is ambiguous. Find an equivalent unambiguous grammar.
11.Consider the following grammar. E → +EE| * EE| − EE|x|y. Show that the grammar is unambiguous. What is the language generated?
12.Which of the following CFLs you think are inherently ambiguous? Give arguments.
  1. {aibjckdl|i, j, k, l ≥ 1,i ≠ j and kl}
  2. {aibjckdl|i, j, k, l ≥ 1,i = k or j = l}
  3. {aibjckdl|i, j, k, l ≥ 1, i = j and k = l or i = l and j = k}
  4. {aibjckdl|i, j, k, l ≥ 1,i ≠ j or kl}
13.Remove the useless symbols from the following CFGs.
    1. SABB
    2. SCAC
    3. Aa
    4. BBc
    5. BABB
    6. CbB
    7. Ca
    1. SaSASb
    2. SSaa
    3. SAA
    4. AcaA
    5. AAc
    6. Bbca
    7. Aε
14.Consider the following CFGs. Construct equivalent CFGs without ε-productions.
  1. SbEf
    EbEc
    EGGc
    Gb
    GKL
    KcKd
    Kε
    LdLe
    Lε
  2. SeSe
    SGH
    GcGb
    Gε
    HJHd
    Hε
    JbJ
    Jf
15.Remove unit production from the following grammar:
  1. ScBA
  2. SB
  3. AcB
  4. AAbbS
  5. Baaa
16.Find a CFG with six productions (including ε productions) equivalent to the following grammar:
Sb | bHF | bH | bF
HbHc | bc
FdFe | de | G
GdG |d
17.Given two grammars G and G
G
a. SEFG
b. EbEc
c. Eε
d. FcFd
e. Fε
f. GdGb
g. Gε
G
1′ SEQ
2′ QFG
(b)–(g) as in G′.
Give an algorithm for converting a derivation in G′ of a terminal string into a derivation in G of the same terminal string.
18.Let G and G′ be CFGs where
G
SbSc
Sbc
SbSSc
G
SbJ
JbJc
Jc
JbJbJc
  1. Give an algorithm for converting a derivation in G of a terminal string into a derivation in G′ of the same terminal string
  2. Give an algorithm for converting a derivation in G′ of a terminal string into a derivation in G.
19.Suppose G is a CFG and wL(G) and | w | = n. How long is a derivation of w in G if:
  1. G is in CNF
  2. G is in GNF
20.Show that every CFL without ε is generated by a CFG whose productions are of the form Aa, AaB and AaBC.
21.An operator CFG represents an ε-free context-free grammar such that no production has two consecutive non-terminals on its right-hand side. That is, an ε-free CFG, G = (N, T, P, S), is an operator CFG if for all pP, rhs(p) ∉ (N ∪ T)*NN(N ∪ T)*. Prove that for every CFG G there exists an operator CFG, G′, such that L(G) = L(G′).
22.Find CNF and GNF equivalent to the following grammars:
  1. SSS
    SSS
    S → ⌍ S
    S → (S)
    Sp
    Sq
  2. EE + E
    EE * E
    E → (E)
    Ea
23.A grammar G = (N, T, P, S) is said to be self-embedding if there exists a non-terminal AN such that , α, β ∊ (N ∪ T)+. If a grammar is non-self-embedding, it generates a regular set. Prove.
24.A CFG G = (N, T, P, S) is said to be invertible if Aα and Bα implies A = B. For each CFG G, there is an invertible CFG G′ such that L(G) = L(G′). Prove.
25.For each CFG G = (N, T, P, S) there is an invertible CFG, G′= (N′, T, P′, S′) such that L(G) = L(G′). Moreover
  1. Aε is in P′ if and only if εL(G) and A = S
  2. S′ does not appear on the right-hand side of any rule in P
26.Let L be a CFL. Then show that L can be generated by a CFG G = (N, T, P, S) with the rules of the following forms:
Sε
Aa, aT
AB, BN− {S}
AαBCβ, α, β ∊ ((N ∪ T) {S})*, B, C ∊ (NT) {S} and either BT or CT.


No comments:

Post a Comment