Chapter 8. Context-Free Grammars–Properties and Parsing
Pumping
lemma for regular sets presented in an earlier chapter was used to
prove that some languages are non-regular. Now, we give another pumping
lemma for context-free languages (CFL) whose application will be to show
that some languages are non-context-free. The idea behind this lemma is
that longer strings in a CFL, have substrings which can be pumped to
get infinite number of strings in the language.
8.1. Pumping Lemma for CFL
Theorem 8.1
Let L be a CFL. Then there
exists a number k (pumping length) such that if w is a string in L of
length at least ‘k,’ then w can be written as w = uυxyz satisfying the following conditions:
Proof Let G be a context-free grammar (CFG) in Chomsky normal form (CNF) generating L. Let ‘n’ be the number of non-terminals of G. Take k = 2n. Let ‘s’ be a string in L such that |s| ≥ k. Any parse tree in G for s must be of depth at least n. This can be seen as follows:
If the parse tree has depth n, it has no path of length greater than n; then the maximum length of the word derived is 2n−1. This statement can be proved by induction. If n = 1, the tree has structure . If n = 2, the tree has the structure . Assuming that the result holds upto i − 1, consider a tree with depth i. No path in this tree is of length greater than i. The tree has the structure as in the figure below.
T1 and T2 have depth i − 1 and the maximum length of the word derivable in each is 2i−2, and so the maximum length of the string derivable in T is 2i−2 + 2i−2 = 2i−1.
Choose a parse tree for s that has the least number of nodes. Consider the longest path in this tree. This path is of length at least ‘n + 1.’ Then, there must be at least n
+ 1 occurrences of non-terminals along this path. Consider the nodes in
this path starting from the leaf node and going up towards the root. By
pigeon-hole principle some non-terminal occurring on this path should
repeat. Consider the first pair of occurrences of the non-terminal A (say) which repeats while reading along the path from bottom to top. In Figure 8.1, the repetition of A thus identified allows us to replace the subtree under the second occurrence of the non-terminal A with the subtree under the first occurrence of A. The legal parse trees are given in Figure 8.1.
Figure 8.1. Derivation trees showing pumping property
We divide s as uυxyz as in Figure 8.1(i). Each occurrence of A has a subtree, under it generating a substring of s. The occurrence of A near the root of the tree generates the string ‘υxy’ where the second occurrence of A produces x. Both the occurrences of A produce substrings of s. Hence, one can replace the occurrence of A that produces x by a parse tree that produces υxy as shown in Figure 8.1(ii). Hence, strings of the form uυixyiz, for i > 0 are generated. One can replace the subtree rooted at A which produces ‘υxy’ by a subtree which produced x as in Figure 8.1(iii). Hence, the string ‘uxz’ is generated. In essence,
We have .
Hence, .
Therefore, we have .
Both υ and y simultaneously cannot be empty as we consider the grammar in CNF. The lower A will occur in the left or right subtree. If it occurs in the left subtree, y cannot be ε and if it occurs in the right subtree, υ cannot be ε.
The length of υxy is at most k, because the first occurrence of A generates υxy and the next occurrence generates x. The number of non-terminal occurrences between these two occurrences of A is less than n +1. Hence, length of υxy is at most 2n(= k). Hence the proof.
|
One
can use pumping lemma for showing that some languages are not
context-free. The method of proof will be similar to that of application
of pumping lemma for regular sets.
'
8.2. Closure Properties of CFL
In this section, we investigate the closure of CFLs
under some operations like union, intersection, difference,
substitution, homomorphism, and inverse homomorphism etc. The first
result that we will prove is closure under substitution, using which we
establish closure under union, catenation, catenation closure,
catenation +, and homomorphism.
Theorem 8.2
Let L be a CFL over TΣ and σ be a substitution on T such that σ(a) is a CFL for each a in T. Then σ(L) is a CFL.
Proof Let G = (N, T, P, S) be a CFG generating L. Since σ(a) is a CFL, let Ga = (Na, Ta, Pa, Sa) be a CFG generating σ(a) for each a ∊ T. Without loss of generality, Na ∩ Nb = φ and Na ∩ N = φ for a ≠ b, a, b ∊T. We now construct a CFG G′= (N′, T′, P′, S′) which generates σ (L) as follows:
Any derivation tree of G′ will typically look as in the following figure (Figure 8.2).
Figure 8.2. A derivation tree showing a string obtained by substitution
Here ab... k is a string of L and xaxb... xk is a string of σ(L). To understand the working of G′ producing σ(L), we have the following discussion:
A string w is in L(G′) if and only if w is in σ(L). Suppose w is in σ(L). Then, there is some string x = a1... ak in L and strings xi in σ(ai), 1 ≤ i ≤ k, such that w = x1 ... xk. Clearly from the construction of G′, Sai ... Sak is generated (for a1 ... ak ∊ L). From each Sai, xis are generated where xi ∊ σ(ai). This becomes clear from the above picture of derivation tree. Since G′ includes productions of Gai, x1 ... xk belongs to σ(L).
Conversely for w ∊ σ(L), we have to understand the proof with the help of the parse tree constructed above. That is, the start symbol of G and G′ are S. All the non-terminals of G, Ga’s are disjoint. Starting from S, one can use the productions of G′ and G and reach w = Sa1 ... Sak and w′ = a1 ... ak, respectively. Hence, whenever w has a parse tree T, one can identity a string a1a2 ... ak in L(G) and string in σ(ai) such that x1... xk ∊ σ(L). Since x1... xk is a string formed by substitution of strings xi’s for ai’s, we conclude w ∊ σ(L).
|
8.3. Decidability Results for CFL
The three decision problems that we studied for
regular languages are emptiness, finiteness, and membership problems.
The same can be studied for CFLs also. The discussion of the results
under this section is based on either the representation of a CFL as in a
PDA form or in simplified CFG form. Hence, we will be using CFG in CNF
or a PDA which accepts by empty stack or final state.
Theorem 8.8
Given a CFL L, there exists an algorithm to test whether L is empty, finite or infinite.
Proof To test whether L is empty, one can see whether the start symbol S of the CFG G = (N, T, S, P) which generates L is useful or not. If S is a useful symbol, then L ≠ φ.
To see whether the given CFL L is infinite, we have the following discussion. By pumping lemma for CFL, if L contains a word of length t, with |t| > k for a constant k (pumping length), then clearly L is infinite.
Conversely, if L is infinite it satisfies the conditions of the pumping lemma, otherwise L is finite. Hence, we have to test whether L contains a word of length greater than k.
|
8.4. SubFamilies of CFL
Definition 8.1
A CFG G = (N, T, P, S) is said to be linear if all rules are of the form A → x By or A → x, x, y ∊ T*, A, B ∊ N. i.e., the right-hand side consists of at most one non-terminal.
|
8.5. Parikh Mapping and Parikh’s Theorem
8.6. Self-embedding Property
In this section, we consider the self-embedding
property which makes CFL more powerful than regular sets. Pumping lemma
for CFL makes use of this property. By this property, it is possible to
pump equally on both sides of a substring which is lacking in regular
sets.
Definition 8.17
Let G = (N, T, P, S) be a CFG. A non-terminal A ∊ N is said to be self-embedding, if where x, y ∊ (N ∪ T)+. A grammar G is self-embedding if it has a self-embedding non-terminal.
|
8.7. Homomorphic Characterization
Earlier, we saw that the family of CFL is a full AFL. For an AFL F, if there exists a language L0 ∊ F, such that any language L in F can be obtained from L0 by means of some of these six operations, then L0 is called a generator for the AFL F. Any regular set is a generator for the family of regular sets. Let R be any regular set. Any other regular set R′ can be got by (Σ*∪ R) ∩ R′. Next, we show that the Dyck set is a generator for the family of CFL.
Definition 8.18
Consider the CFG
,
, n ≥ 1. The language L(G) is called the Dyck set over T and usually denoted by Dn.
|
Problems and Solutions
1. | Prove that the following languages are not context-free.
| ||||||||||||||||||
Solution. |
| ||||||||||||||||||
2. | Which of the following sets are context-free and which are not? Justify your answer.
a. L1 = {anbmck|n, m, k ≥ 1 and 2n = 3k, or 5n = 7m}.
| ||||||||||||||||||
Solution. |
This CFG generates L1.
| ||||||||||||||||||
b. | L2 = {aibjckdl, i, j, k, l ≥ 1, i = l, j = k}. | ||||||||||||||||||
Solution. | L2 is CFL generated by:
| ||||||||||||||||||
c. | L3 = {x ∊ {a, b, c}* |# ax = #bx = #cx}. | ||||||||||||||||||
Solution. | L3 is not context-free.
Suppose L3 is context-free.
Then, since the family of CFL is closed under intersection with regular sets L3 ∩ a* b* c* is regular.
This is {an bn cn |n ≥ 0}.
We have shown that this is not context-free
L3 is not context-free. | ||||||||||||||||||
d. | L4 = {ambn|n, m ≥ 0, 5m − 3n = 24}. | ||||||||||||||||||
Solution. | It can be seen that:
S → a3Sb5, S → a6b2 which generates L4.
| ||||||||||||||||||
e. | L5 = {ambn |n ≠ m}. | ||||||||||||||||||
Solution. | Worked out earlier in Chapter 2. | ||||||||||||||||||
3. | Let NL2 be the set of non-context-free languages. Determine whether or not.
a. NL2 is closed under union.
| ||||||||||||||||||
Solution. | No.
| ||||||||||||||||||
b. | NL2 is closed under complementation. | ||||||||||||||||||
Solution. | No.
| ||||||||||||||||||
c. | NL2 is closed under intersection. | ||||||||||||||||||
Solution. | No.
| ||||||||||||||||||
d. | NL2 is closed under catenation. | ||||||||||||||||||
Solution. | No.
| ||||||||||||||||||
e. | NL2 is closed under Kleene closure. | ||||||||||||||||||
Solution. | No.
is CFL.
is CFL.
| ||||||||||||||||||
4. | Is the language {xmyn|m, n ∊ N, m ≤ n ≤ 2m} context-free? Justify your answer. | ||||||||||||||||||
Solution. | Yes.
| ||||||||||||||||||
5. | Is the union of a collection of context-free languages always context-free? Justify your answer. | ||||||||||||||||||
Solution. | Finite union of CFL is CFL.
L1, L2, L3, ..., Lk are CFL.
Let Li be generated by Gi = (Ni, T, Pi, Si) and let Ni ∩ Nj = φ for i ≠ j
Infinite union of CFL is not CFL
Li = {ai|i is a particular prime}.
Li are CFL.
But is not a CFL.
| ||||||||||||||||||
6. | Consider the following context-free grammar:
For strings abaab and bab, bbb.
Construct the CYK parsing table.
Are these strings in L(G)?
| ||||||||||||||||||
Solution. |
|
No comments:
Post a Comment