Real Computer Science begins where we almost stop reading ...: TOC Chapter 2

Saturday, 26 January 2013

TOC Chapter 2

Chapter 2. Grammars

The idea of a grammar for a language has been known in India since the time of Panini (about the 5th century B.C.). Panini gave a grammar for the Sanskrit language. His work on Sanskrit had about 4000 rules (Sutras). From that, it is clear that the concept of recursion was known to the Indians and was used by them in very early times.

In 1959, Noam Chomsky tried to give a mathematical definition of a grammar. The motivation was to give a formal definition of a grammar for English sentences. He defined four types of grammar, viz., type 0, type 1, type 2, and type 3. At about the same time, the programming language ALGOL was being considered. It was regarded as a block-structured language, and a grammar was required that could describe all syntactically correct programs. The definition given was called Backus Normal Form or Backus-Naur Form (BNF). This definition was found to be the same as the definition given by Chomsky for type 2 grammars.

2.1. Definitions and Classification of Grammars

We now formally define the four types of grammars:

Definition 2.1

A phrase-structure grammar or a type 0 grammar is a 4-tuple G = (N, T, P, S), where N is a finite set of non-terminal symbols called the non-terminal alphabet, T is a finite set of terminal symbols called the terminal alphabet, S ∊ N is the start symbol, and P is a set of productions (also called production rules or simply rules) of the form u → v, where u ∊ (N ∪ T)*N(N ∪ T)* and v ∊ (N ∪ T)*.
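As a rough illustration of Definition 2.1, the 4-tuple can be written down directly as a data structure, and the condition on productions (the left side contains at least one non-terminal) can be checked mechanically. The names `Grammar` and `is_type0` are hypothetical and chosen only for this sketch; symbols are assumed to be single strings, and each side of a production is a tuple of symbols.

```python
from typing import NamedTuple


class Grammar(NamedTuple):
    """A grammar G = (N, T, P, S) in the sense of Definition 2.1."""
    nonterminals: frozenset  # N
    terminals: frozenset     # T
    productions: tuple       # P: pairs (u, v), each a tuple of symbols
    start: str               # S


def is_type0(g: Grammar) -> bool:
    """Check that S is in N and every rule u -> v has the required form."""
    symbols = g.nonterminals | g.terminals
    if g.start not in g.nonterminals:
        return False
    for u, v in g.productions:
        # u and v must be strings over N ∪ T.
        if not all(s in symbols for s in u) or not all(s in symbols for s in v):
            return False
        # u must contain at least one non-terminal: u ∊ (N ∪ T)*N(N ∪ T)*.
        if not any(s in g.nonterminals for s in u):
            return False
    return True


# The grammar for {a^n b^n | n >= 1} used later in the problems.
anbn = Grammar(
    nonterminals=frozenset({"S"}),
    terminals=frozenset({"a", "b"}),
    productions=((("S",), ("a", "S", "b")), (("S",), ("a", "b"))),
    start="S",
)

# A rule whose left side has no non-terminal is not allowed.
bad = anbn._replace(productions=((("a",), ("b",)),))
```

Here `is_type0(anbn)` holds, while `is_type0(bad)` does not, because the left side `a` contains no non-terminal.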

2.2. Ambiguity

Ambiguity in CFLs is an important concept with applications to compilers. When a grammar is written for an expression or a programming language, we generally expect it to be unambiguous, so that a unique code is generated during compilation. Consider the following English statement: “They are flying planes.” It can be parsed in two different ways.

In Figure 2.7, ‘They’ refers to the planes, and in Figure 2.8 ‘They’ refers to the persons on the plane. The ambiguity arises because we are able to have two different parse trees for the same sentence.
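The same effect can be observed on a small formal grammar by counting parse trees. The sketch below uses the grammar S → SS | a | b, which appears in the exercises of this chapter, and counts the distinct parse trees of a word with a memoized recursion over substrings. The names `BINARY`, `TERMINAL`, and `count_parse_trees` are assumptions of this sketch, and it only handles rules of the two shapes A → BC and A → a.

```python
from functools import lru_cache

# Rules of S -> SS | a | b, split by shape.
BINARY = {"S": [("S", "S")]}     # rules A -> B C
TERMINAL = {"S": {"a", "b"}}     # rules A -> a


def count_parse_trees(word: str, start: str = "S") -> int:
    """Count the parse trees that derive `word` from `start`."""

    @lru_cache(maxsize=None)
    def count(sym: str, i: int, j: int) -> int:
        # Number of parse trees deriving word[i:j] from sym.
        if j - i == 1:
            return 1 if word[i] in TERMINAL.get(sym, ()) else 0
        total = 0
        for left, right in BINARY.get(sym, ()):
            # Try every split point of the substring.
            for k in range(i + 1, j):
                total += count(left, i, k) * count(right, k, j)
        return total

    return count(start, 0, len(word)) if word else 0
```

A word is ambiguous in this grammar as soon as the count exceeds one; for example, `count_parse_trees("aba")` is 2, corresponding to the groupings (ab)a and a(ba).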

2.3. Simplification of CFGs

Context-free grammars (CFGs) can be put in simple forms. The simplification is done on the productions and symbols of the grammar, and the simplified grammar should be equivalent to the original one. Such simplifications are important because CFGs have wide applications in compilers.

A given CFG may have rules and symbols that do not ultimately contribute to its language, so one can modify the grammar by removing them. Another issue is the presence of ε-rules in the productions. One may want an ε-free set of context-free rules in the grammar whenever the underlying CFL does not contain the empty string. We also give a simplified CFG that has no unit rules, which are rules of the form A → B where A and B are non-terminal symbols.
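One common way to remove rules and symbols that do not contribute to the language is to first drop non-terminals that cannot derive any terminal string, and then drop those that cannot be reached from the start symbol. The sketch below follows that two-step scheme; it is not necessarily the exact construction used in the chapter. The function name `remove_useless` and the small example grammar are made up for illustration.

```python
def remove_useless(rules, terminals, start):
    """Return the rules that can take part in deriving a terminal string.

    rules: list of (head, body) pairs, body being a tuple of symbols
           (the empty tuple stands for an ε-rule).
    """
    # Step 1: find the generating non-terminals by fixed-point iteration.
    generating = set()
    changed = True
    while changed:
        changed = False
        for head, body in rules:
            if head not in generating and all(
                s in terminals or s in generating for s in body
            ):
                generating.add(head)
                changed = True
    rules = [
        (h, b) for h, b in rules
        if h in generating and all(s in terminals or s in generating for s in b)
    ]

    # Step 2: keep only rules whose head is reachable from the start symbol.
    reachable = {start}
    stack = [start]
    while stack:
        sym = stack.pop()
        for h, b in rules:
            if h == sym:
                for s in b:
                    if s not in terminals and s not in reachable:
                        reachable.add(s)
                        stack.append(s)
    return [(h, b) for h, b in rules if h in reachable]


# Hypothetical example: B never terminates and C is never reached.
example = [
    ("S", ("A", "B")),
    ("S", ("a",)),
    ("A", ("a",)),
    ("B", ("B", "b")),
    ("C", ("c",)),
]
simplified = remove_useless(example, {"a", "b", "c"}, "S")
```

In this example only S → a survives: removing B also removes S → AB, after which A is no longer reachable. This is why the generating step is performed before the reachability step.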

2.4. Normal Forms

We have seen in Section 2.3 how a given CFG can be simplified. In this section, we see different normal forms of CFGs, i.e., ways of expressing the rules of a CFG in a particular form. These normal form grammars are easy to handle and are useful in proving results. The most popular normal forms are Weak Chomsky Normal Form (WCNF), Chomsky Normal Form (CNF), Strong Chomsky Normal Form (SCNF), and Greibach Normal Form (GNF).

Weak Chomsky Normal Form

Definition 2.16

Let G = (N, T, P, S) be a CFG. If each rule in P is of the form A → Δ, A → a, or A → ε, where A ∊ N, Δ ∊ N⁺, and a ∊ T, then G is said to be in WCNF.
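Definition 2.16 translates into a simple check on each rule. The following is a minimal sketch, assuming rules are given as (head, body) pairs with the body a tuple of symbols and the empty tuple standing for ε; the name `is_wcnf` and the two example grammars are hypothetical.

```python
def is_wcnf(rules, nonterminals, terminals):
    """Check whether every rule has the form A -> Δ, A -> a, or A -> ε."""
    for head, body in rules:
        if head not in nonterminals:
            return False
        if len(body) == 0:                            # A -> ε
            continue
        if len(body) == 1 and body[0] in terminals:   # A -> a
            continue
        if all(s in nonterminals for s in body):      # A -> Δ, Δ ∊ N⁺
            continue
        return False
    return True


# S -> aSb | ab mixes terminals and non-terminals on the right-hand side.
mixed = [("S", ("a", "S", "b")), ("S", ("a", "b"))]

# The same language with terminals pushed into their own rules.
weak = [
    ("S", ("A", "S", "B")),
    ("S", ("A", "B")),
    ("A", ("a",)),
    ("B", ("b",)),
]
```

With these inputs, `is_wcnf(mixed, {"S"}, {"a", "b"})` is false and `is_wcnf(weak, {"S", "A", "B"}, {"a", "b"})` is true.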

Problems and Solutions

Problem 1.

Give CFGs for the following languages:

a. {aⁿbⁿ|n ≥ 1}

Solution.

G = (N, T, P, S)

N = {S}, T = {a, b}, P consists of the following rules:

1. S → aSb
2. S → ab

Whenever rule 1 is applied, one ‘a’ is generated on the left and one ‘b’ on the right. The derivation terminates by using rule 2. A sample derivation and derivation tree are given below (Figure 2.12).

Figure 2.12. A sample derivation tree

S ⇒ aSb
  ⇒ aaSbb
  ⇒ aaaSbbb
  ⇒ aaaaSbbbb
  ⇒ aaaaabbbbb
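The derivation above can be replayed mechanically: apply rule 1 (S → aSb) n − 1 times and then rule 2 (S → ab) once. The sketch below does this with plain string replacement; the function name `derive_anbn` is an assumption of this example.

```python
def derive_anbn(n):
    """Return the sentential forms in the derivation of a^n b^n (n >= 1)."""
    if n < 1:
        raise ValueError("n must be at least 1")
    form = "S"
    steps = [form]
    for _ in range(n - 1):
        form = form.replace("S", "aSb")  # rule 1: S -> aSb
        steps.append(form)
    form = form.replace("S", "ab")       # rule 2: S -> ab
    steps.append(form)
    return steps


for step in derive_anbn(5):
    print(step)
```

For n = 5 the printed forms are S, aSb, aaSbb, aaaSbbb, aaaaSbbbb, and aaaaabbbbb, matching the sample derivation.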

b. {aⁿb^mcⁿ|n, m ≥ 1}

Solution.

The grammar is G = (N, T, P, S) with N = {S, A} and T = {a, b, c}. P is given as follows:

1. S → aSc
2. S → aAc
3. A → bA
4. A → b

Rules 1 and 2 generate equal numbers of a’s and c’s; rule 2 makes sure that at least one a and one c are generated. Rules 3 and 4 generate the b’s in the middle; rule 4 makes sure that at least one b is generated. Note that the a’s and c’s are generated first and the b’s afterwards.

c. {aⁿbⁿc^m|n, m ≥ 1}

Solution.

G = (N, T, P, S), N = {S, A, B}, T = {a, b, c}

P is given as follows:

1. S → AB
2. A → aAb
3. A → ab
4. B → cB
5. B → c

Rules 2 and 3 generate equal numbers of a’s and b’s. Rules 4 and 5 generate the c’s. Rule 1 is applied first, so that the equal numbers of a’s and b’s are followed by a string of c’s.

In the following solutions, only the rules are given. Capital letters stand for non-terminals and small letters stand for terminals.

d. {aⁿbⁿc^md^m|n, m ≥ 1}

Solution.

S → AB
A → aAb
A → ab
B → cBd
B → cd

e. {aⁿb^mc^mdⁿ|n, m ≥ 1}

Solution.

S → aSd
S → aAd
A → bAc
A → bc

f. {aⁿb^m|n, m ≥ 1, n > m}

Solution.

S → aSb
S → aAb
A → aA
A → a

g. {aⁿb^m|n, m ≥ 1, n ≠ m}

Solution.

{aⁿb^m|n, m ≥ 1, n ≠ m} = {aⁿb^m|n, m ≥ 1, n > m} ∪ {aⁿb^m|n, m ≥ 1, m > n}

Rules are:

1. S → aSb
2. S → aAb
3. A → aA
4. A → a
5. S → aBb
6. B → bB
7. B → b

Rules 1, 2, 3, and 4 generate more a’s than b’s. Rules 1, 5, 6, and 7 generate more b’s than a’s.
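A claim of this kind can be sanity-checked (though not proved) by enumerating every string the grammar generates up to some length and comparing with the intended language. The sketch below does a breadth-first search over leftmost derivations. It assumes single-character symbols and a grammar without ε-rules, so sentential forms never shrink and can be pruned once they exceed the length bound. The names `RULES` and `generate` are hypothetical.

```python
from collections import deque

# The seven rules given for {a^n b^m | n, m >= 1, n != m}.
RULES = {
    "S": ["aSb", "aAb", "aBb"],
    "A": ["aA", "a"],
    "B": ["bB", "b"],
}


def generate(rules, start, max_len):
    """Return all terminal strings of length <= max_len derivable from start.

    Assumes no ε-rules, so a sentential form longer than max_len
    can never lead to a word within the bound.
    """
    seen = {start}
    queue = deque([start])
    words = set()
    while queue:
        form = queue.popleft()
        # Position of the leftmost non-terminal, if any.
        pos = next((i for i, s in enumerate(form) if s in rules), None)
        if pos is None:
            words.add(form)
            continue
        for body in rules[form[pos]]:
            new = form[:pos] + body + form[pos + 1:]
            if len(new) <= max_len and new not in seen:
                seen.add(new)
                queue.append(new)
    return words


print(sorted(generate(RULES, "S", 4), key=lambda w: (len(w), w)))
```

Up to length 4 the grammar yields aab, abb, aaab, and abbb, which are exactly the strings aⁿb^m with n, m ≥ 1 and n ≠ m in that range. Agreement on a bounded range is only evidence, not a proof of correctness.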

h. {wcw^R|w ∊ {a,b}*}

Solution.

Rules are:

S → aSa
S → bSb
S → c

i. {w|w ∊ {(,)}⁺, w is a well-formed string of parentheses}

Solution.

Rules are:

S → SS
S → (S)
S → ( )

The following grammar, in which a stands for ( and b stands for ), generates the same language plus the empty string:

S → SaSbS
S → ε

Problem 2.

Give regular grammars for

a. {aⁿ|n ≥ 1}

Solution.

G = (N, T, P, S), N = {S}, T = {a}

P has the following rules:

S → aS
S → a

b. {aⁿb^m|n, m ≥ 1}

Solution.

We give below the rules only. Capital letters stand for non-terminals.

S → aS
S → aA
A → bA
A → b

c. {a²ⁿ|n ≥ 1}

Solution.

S → aA
A → aS
A → a

d. {(ab)ⁿ|n ≥ 1}

Solution.

S → aA
A → bS
A → b

e. {aⁿb^mc^p|n, m, p ≥ 1}

Solution.

S → aS
S → aA
A → bA
A → bB
B → cB
B → c

f. {(abc)ⁿ|n ≥ 1}

Solution.

S → aA
A → bB
B → cS
B → c
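The regular grammars in this problem all use rules of the forms A → aB and A → a, which can be read as transitions of a nondeterministic finite automaton: A → aB moves from state A to state B on input a, and A → a moves to an accepting state. The sketch below tests membership in this way for the grammar of {(abc)ⁿ | n ≥ 1}. The names `ABC_RULES`, `FINAL`, and `accepts` are assumptions of this example, with `None` standing for the missing non-terminal in rules of the form A → a.

```python
# Rules (head, terminal, tail); tail is None for rules of the form A -> a.
ABC_RULES = [
    ("S", "a", "A"),   # S -> aA
    ("A", "b", "B"),   # A -> bB
    ("B", "c", "S"),   # B -> cS
    ("B", "c", None),  # B -> c
]

FINAL = object()  # extra accepting state reached by rules A -> a


def accepts(rules, start, word):
    """Decide whether the right-linear grammar generates `word`."""
    states = {start}
    for ch in word:
        nxt = set()
        for head, terminal, tail in rules:
            if head in states and terminal == ch:
                nxt.add(FINAL if tail is None else tail)
        states = nxt
    return FINAL in states


for w in ["abc", "abcabc", "ab", "abca", ""]:
    print(repr(w), accepts(ABC_RULES, "S", w))
```

The first two words are accepted and the remaining three are rejected, as expected for (abc)ⁿ with n ≥ 1.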

Exercises

1.

Find a CFG for the language over {0, 1} consisting of those strings in which the ratio of the number of 1’s to the number of 0’s is three to two.

2.

Define CFGs that generate the following languages.

a. The set of odd-length strings in {0, 1}* with middle symbol 1
b. {aⁱb^jc^k | j > i + k}

3.

Prove that the following CFGs do not generate

L = {x ∊ {0, 1}* | #₀(x) = #₁(x)}

a. S → S01S | S10S | ε
b. S → 0S1 | 1S0 | 01S | S01 | S10 | ε

4.

Show that the CFG S → SS | a | b is ambiguous.

5.

Convert the following grammar to CNF.

S → ABA

A → aA | ε

B → bB | ε

6.

Which of the following grammars are ambiguous? Are the languages generated inherently ambiguous?

a.
1. S → ε
2. S → aSb
3. S → SS

b.
1. S → ε
2. S → aSb
3. S → bSa
4. S → SS

c.
1. S → bS
2. S → Sb
3. S → ε

d.
1. S → SaSa
2. S → b

e.
1. S → Sb
2. S → aSb
3. S → Sa
4. S → a

f.
1. S → a
2. S → aaS
3. S → aaaS

g.
1. S → A
2. S → aSb
3. S → bS
4. A → Aa
5. A → a

h.
1. S → AA
2. A → AAA
3. A → bA
4. A → Ab
5. A → a

7.

Consider L² where L = {ww^R | w ∊ {a, b}*}. Give an argument that L² is inherently ambiguous.

8.

Give an unambiguous grammar for

L = {w | w ∊ {a, b}⁺, w has an equal number of a’s and b’s}

9.

Consider the grammars over the terminal alphabet Σ = {a, b} with rules

a.
S → wSS
S → a
S → b

b.
S → aSS
S → bSS
S → w

where w is some string over Σ. Show that each of these grammars is always ambiguous whatever w may be.

10.

Consider the grammar S → aS | aSbS | ε. Prove that the grammar generates all and only the strings of a’s and b’s such that every prefix has at least as many a’s as b’s. Show that the grammar is ambiguous. Find an equivalent unambiguous grammar.

11.

Consider the following grammar: E → +EE | *EE | −EE | x | y. Show that the grammar is unambiguous. What is the language generated?

12.

Which of the following CFLs do you think are inherently ambiguous? Give arguments.

a. {aⁱb^jc^kd^l|i, j, k, l ≥ 1, i ≠ j and k ≠ l}
b. {aⁱb^jc^kd^l|i, j, k, l ≥ 1, i = k or j = l}
c. {aⁱb^jc^kd^l|i, j, k, l ≥ 1, i = j and k = l or i = l and j = k}
d. {aⁱb^jc^kd^l|i, j, k, l ≥ 1, i ≠ j or k ≠ l}

13.

Remove the useless symbols from the following CFGs.

a.
1. S → ABB
2. S → CAC
3. A → a
4. B → Bc
5. B → ABB
6. C → bB
7. C → a

b.
1. S → aSASb
2. S → Saa
3. S → AA
4. A → caA
5. A → Ac
6. B → bca
7. A → ε

14.

Consider the following CFGs. Construct equivalent CFGs without ε-productions.

a.
S → bEf
E → bEc
E → GGc
G → b
G → KL
K → cKd
K → ε
L → dLe
L → ε

b.
S → eSe
S → GH
G → cGb
G → ε
H → JHd
H → ε
J → bJ
J → f

15.

Remove the unit productions from the following grammar:

S → cBA
S → B
A → cB
A → AbbS
B → aaa

16.

Find a CFG with six productions (including ε productions) equivalent to the following grammar:

S → b | bHF | bH | bF

H → bHc | bc

F → dFe | de | G

G → dG |d

17.

Given two grammars G and G′

G

a. S → EFG

b. E → bEc

c. E → ε

d. F → cFd

e. F → ε

f. G → dGb

g. G → ε

G′

1′ S → EQ

2′ Q → FG

Rules (b)–(g) as in G.

Give an algorithm for converting a derivation in G′ of a terminal string into a derivation in G of the same terminal string.

18.

Let G and G′ be CFGs where

G

S → bSc

S → bc

S → bSSc

G′

S → bJ

J → bJc

J → c

J → bJbJc

a. Give an algorithm for converting a derivation in G of a terminal string into a derivation in G′ of the same terminal string.
b. Give an algorithm for converting a derivation in G′ of a terminal string into a derivation in G.

19.

Suppose G is a CFG and w ∊ L(G) and | w | = n. How long is a derivation of w in G if:

a. G is in CNF
b. G is in GNF

20.

Show that every CFL without ε is generated by a CFG whose productions are of the form A → a, A → aB and A → aBC.

21.

An operator CFG represents an ε-free context-free grammar such that no production has two consecutive non-terminals on its right-hand side. That is, an ε-free CFG, G = (N, T, P, S), is an operator CFG if for all p ∊ P, rhs(p) ∉ (N ∪ T)*NN(N ∪ T)*. Prove that for every CFG G there exists an operator CFG, G′, such that L(G) = L(G′).

22.

Find CNF and GNF equivalent to the following grammars:

a.
S → S ∧ S
S → S ∨ S
S → ¬S
S → (S)
S → p
S → q

b.
E → E + E
E → E * E
E → (E)
E → a

23.

A grammar G = (N, T, P, S) is said to be self-embedding if there exists a non-terminal A ∊ N such that

A ⇒* αAβ, where α, β ∊ (N ∪ T)⁺. If a grammar is non-self-embedding, it generates a regular set. Prove.

24.

A CFG G = (N, T, P, S) is said to be invertible if A → α and B → α implies A = B. For each CFG G, there is an invertible CFG G′ such that L(G) = L(G′). Prove.

25.

For each CFG G = (N, T, P, S) there is an invertible CFG, G′ = (N′, T, P′, S′) such that L(G) = L(G′). Moreover,

a. A → ε is in P′ if and only if ε ∊ L(G) and A = S′
b. S′ does not appear on the right-hand side of any rule in P′

26.

Let L be a CFL. Then show that L can be generated by a CFG G = (N, T, P, S) with the rules of the following forms:

S → ε
A → a, a ∊ T
A → B, B ∊ N − {S}
A → αBCβ, α, β ∊ ((N ∪ T) − {S})*, B, C ∊ (N ∪ T) − {S} and either B ∊ T or C ∊ T.
