Saturday 26 January 2013

TOC Chapter 8

Chapter 8. Context-Free Grammars–Properties and Parsing

Pumping lemma for regular sets presented in an earlier chapter was used to prove that some languages are non-regular. Now, we give another pumping lemma for context-free languages (CFL) whose application will be to show that some languages are non-context-free. The idea behind this lemma is that longer strings in a CFL, have substrings which can be pumped to get infinite number of strings in the language.

8.1. Pumping Lemma for CFL

Theorem 8.1

Let L be a CFL. Then there exists a number k (pumping length) such that if w is a string in L of length at least ‘k,’ then w can be written as w = uυxyz satisfying the following conditions:
  1. |υy| > 0
  2. |υxy| ≤ k
  3. For each i0, uυixyizL
Proof Let G be a context-free grammar (CFG) in Chomsky normal form (CNF) generating L. Let ‘n’ be the number of non-terminals of G. Take k = 2n. Let ‘s’ be a string in L such that |s| ≥ k. Any parse tree in G for s must be of depth at least n. This can be seen as follows:
If the parse tree has depth n, it has no path of length greater than n; then the maximum length of the word derived is 2n−1. This statement can be proved by induction. If n = 1, the tree has structure . If n = 2, the tree has the structure . Assuming that the result holds upto i − 1, consider a tree with depth i. No path in this tree is of length greater than i. The tree has the structure as in the figure below.
T1 and T2 have depth i − 1 and the maximum length of the word derivable in each is 2i−2, and so the maximum length of the string derivable in T is 2i−2 + 2i−2 = 2i−1.
Choose a parse tree for s that has the least number of nodes. Consider the longest path in this tree. This path is of length at least ‘n + 1.’ Then, there must be at least n + 1 occurrences of non-terminals along this path. Consider the nodes in this path starting from the leaf node and going up towards the root. By pigeon-hole principle some non-terminal occurring on this path should repeat. Consider the first pair of occurrences of the non-terminal A (say) which repeats while reading along the path from bottom to top. In Figure 8.1, the repetition of A thus identified allows us to replace the subtree under the second occurrence of the non-terminal A with the subtree under the first occurrence of A. The legal parse trees are given in Figure 8.1.
Figure 8.1. Derivation trees showing pumping property

We divide s as uυxyz as in Figure 8.1(i). Each occurrence of A has a subtree, under it generating a substring of s. The occurrence of A near the root of the tree generates the string ‘υxy’ where the second occurrence of A produces x. Both the occurrences of A produce substrings of s. Hence, one can replace the occurrence of A that produces x by a parse tree that produces υxy as shown in Figure 8.1(ii). Hence, strings of the form ixyiz, for i > 0 are generated. One can replace the subtree rooted at A which produces ‘υxy’ by a subtree which produced x as in Figure 8.1(iii). Hence, the string ‘uxz’ is generated. In essence,

We have .
Hence, .
Therefore, we have .
Both υ and y simultaneously cannot be empty as we consider the grammar in CNF. The lower A will occur in the left or right subtree. If it occurs in the left subtree, y cannot be ε and if it occurs in the right subtree, υ cannot be ε.
The length of υxy is at most k, because the first occurrence of A generates υxy and the next occurrence generates x. The number of non-terminal occurrences between these two occurrences of A is less than n +1. Hence, length of υxy is at most 2n(= k). Hence the proof.

One can use pumping lemma for showing that some languages are not context-free. The method of proof will be similar to that of application of pumping lemma for regular sets.
'

8.2. Closure Properties of CFL

In this section, we investigate the closure of CFLs under some operations like union, intersection, difference, substitution, homomorphism, and inverse homomorphism etc. The first result that we will prove is closure under substitution, using which we establish closure under union, catenation, catenation closure, catenation +, and homomorphism.

Theorem 8.2

Let L be a CFL over TΣ and σ be a substitution on T such that σ(a) is a CFL for each a in T. Then σ(L) is a CFL.
Proof Let G = (N, T, P, S) be a CFG generating L. Since σ(a) is a CFL, let Ga = (Na, Ta, Pa, Sa) be a CFG generating σ(a) for each aT. Without loss of generality, NaNb = φ and NaN = φ for ab, a, bT. We now construct a CFG G′= (N′, T′, P′, S′) which generates σ (L) as follows:
1.N′ is the union of , aT and N
2.
3.P′ consists of:
  • all productions in Pa for aT
  • all productions in P, but for each terminal a occurring in any rule of P, is to be replaced by Sa. i.e., in Aα, every occurrence of a (∊ T) in α is replaced by Sa.

Any derivation tree of G′ will typically look as in the following figure (Figure 8.2).
Figure 8.2. A derivation tree showing a string obtained by substitution

Here ab... k is a string of L and xaxb... xk is a string of σ(L). To understand the working of G′ producing σ(L), we have the following discussion:
A string w is in L(G′) if and only if w is in σ(L). Suppose w is in σ(L). Then, there is some string x = a1... ak in L and strings xi in σ(ai), 1 ≤ ik, such that w = x1 ... xk. Clearly from the construction of G′, Sai ... Sak is generated (for a1 ... akL). From each Sai, xis are generated where xiσ(ai). This becomes clear from the above picture of derivation tree. Since G′ includes productions of Gai, x1 ... xk belongs to σ(L).
Conversely for wσ(L), we have to understand the proof with the help of the parse tree constructed above. That is, the start symbol of G and G′ are S. All the non-terminals of G, Ga’s are disjoint. Starting from S, one can use the productions of G′ and G and reach w = Sa1 ... Sak and w′ = a1 ... ak, respectively. Hence, whenever w has a parse tree T, one can identity a string a1a2 ... ak in L(G) and string in σ(ai) such that x1... xkσ(L). Since x1... xk is a string formed by substitution of strings xis for ais, we conclude wσ(L).

8.3. Decidability Results for CFL

The three decision problems that we studied for regular languages are emptiness, finiteness, and membership problems. The same can be studied for CFLs also. The discussion of the results under this section is based on either the representation of a CFL as in a PDA form or in simplified CFG form. Hence, we will be using CFG in CNF or a PDA which accepts by empty stack or final state.

Theorem 8.8

Given a CFL L, there exists an algorithm to test whether L is empty, finite or infinite.
Proof To test whether L is empty, one can see whether the start symbol S of the CFG G = (N, T, S, P) which generates L is useful or not. If S is a useful symbol, then Lφ.
To see whether the given CFL L is infinite, we have the following discussion. By pumping lemma for CFL, if L contains a word of length t, with |t| > k for a constant k (pumping length), then clearly L is infinite.
Conversely, if L is infinite it satisfies the conditions of the pumping lemma, otherwise L is finite. Hence, we have to test whether L contains a word of length greater than k.

8.4. SubFamilies of CFL

In this section, we consider the special cases of CFLs.

Definition 8.1

A CFG G = (N, T, P, S) is said to be linear if all rules are of the form Ax By or Ax, x, yT*, A, BN. i.e., the right-hand side consists of at most one non-terminal.
'

8.5. Parikh Mapping and Parikh’s Theorem

We present in this section a result which connects CFLs to semi-linear sets.

8.6. Self-embedding Property

In this section, we consider the self-embedding property which makes CFL more powerful than regular sets. Pumping lemma for CFL makes use of this property. By this property, it is possible to pump equally on both sides of a substring which is lacking in regular sets.

Definition 8.17

Let G = (N, T, P, S) be a CFG. A non-terminal AN is said to be self-embedding, if where x, y ∊ (NT)+. A grammar G is self-embedding if it has a self-embedding non-terminal.

8.7. Homomorphic Characterization

Earlier, we saw that the family of CFL is a full AFL. For an AFL F, if there exists a language L0F, such that any language L in F can be obtained from L0 by means of some of these six operations, then L0 is called a generator for the AFL F. Any regular set is a generator for the family of regular sets. Let R be any regular set. Any other regular set R′ can be got by (Σ*∪ R) ∩ R′. Next, we show that the Dyck set is a generator for the family of CFL.

Definition 8.18

Consider the CFG
,
, n ≥ 1. The language L(G) is called the Dyck set over T and usually denoted by Dn.

Problems and Solutions

1.Prove that the following languages are not context-free.
  1. L1 = {ap|p is a prime}.
  2. L2 = {a,b}* − {anbn2 | n ≥ 0}
Solution.
  1. L1 = {ap|p is a prime}.
    Suppose L1 is context-free.
    Then, by pumping lemma there exists k such that for all zL1 and |z| ≥ k. z can be written in the form uvxyz such that ixyizL for all i ≥ 0. Consider some p > k. apL1
    ap = uυxyz

    Now u, υ, x, y, za*. Therefore, by pumping lemma:
    uxz(υy)iL1 for all i ≥ 0
    Let |υy| = r
    uxz(ar)iL1 for all i ≥ 0
    or
    z(ar)i − 1L1 for all i ≥ 0
    ap+ r(i − 1)L1 for all i ≥ 0
    Choose i such that p + r(i − 1) is not a prime. Select i − 1 = p. Therefore, i = p + 1.
    ap+ rpL1.
    But, ap + rp = ap (r + 1).
    p(r + 1) is not a prime. So, we come to the conclusion that as where s is not a prime belong to L1. This is a contradiction.
    Therefore, L1 is not context-free.
  2. L2 = {a, b}* − {anbn2| n ≥ 0}.
    Suppose L2 is context-free.
    Since the family of context-free languages is closed under intersection with regular sets. L2a*b* is context-free.
    This contains strings of the form L3 = {anbm| mn2}.
    We shall show this is not context-free. If L3 is context-free, then by pumping lemma there is a constant k which satisfies the conditions of pumping lemma. Choose z = anbm where n > k, mn2.
    z = uυxyz where |υxy| ≤ k, |υy| ≥ 1 such that ixyizL3 for all i ≥ 0. If υ or y consists of both a and b, then by pumping we shall get a string which is not of the form aibj. If υa*, yb*, then we have to show by pumping that we can get a string not in L3. Let υ = ap and y = bq. Then an pbm qL (i = 0).
    Choose m = (np)2 + q. i.e., we started with anb(np)2+q. Then an−pb(np)2+ q−q = a(np)b(np)2L3.
    This is a contradiction.
    L3 is not context-free and hence L2 is not context-free.
2.Which of the following sets are context-free and which are not? Justify your answer.
a. L1 = {anbmck|n, m, k ≥ 1 and 2n = 3k, or 5n = 7m}.
Solution.
SASBD
Aa3 Ac2DcD
A → a3Cc2Dc
CbCBa7Bb5
CbBa7b5

This CFG generates L1.
b.L2 = {aibjckdl, i, j, k, l ≥ 1, i = l, j = k}.
Solution.L2 is CFL generated by:
SaSd
SaAd
AbAc
Abc
c.L3 = {x ∊ {a, b, c}* |# ax = #bx = #cx}.
Solution.L3 is not context-free.
Suppose L3 is context-free.
Then, since the family of CFL is closed under intersection with regular sets L3a* b* c* is regular.
This is {an bn cn |n ≥ 0}.
We have shown that this is not context-free
L3 is not context-free.
d.L4 = {ambn|n, m ≥ 0, 5m − 3n = 24}.
Solution.It can be seen that:
a6b2L4
a9b7L4
a12b12L4
a15b17L4

Sa3Sb5, Sa6b2 which generates L4.
e.L5 = {ambn |nm}.
Solution.Worked out earlier in Chapter 2.
3.Let NL2 be the set of non-context-free languages. Determine whether or not.
a. NL2 is closed under union.
Solution.No.
L1 = {ap|p is a prime}.
L2 = {ap|p is not a prime}.
L1L2 = {an|n ≥ 1} is CFL whereas L1 and L2 are not.
b.NL2 is closed under complementation.
Solution.No.
L = {x|x ∊ {a, b}*} and not of the form ww is CFL.
is not a CFL.
c.NL2 is closed under intersection.
Solution.No.
L1 = {ap|p is a prime}.
L2 = {a2n | n ≥ 0}.
L1 and L2 are not CFL. L1L2 = {a2} is a singleton and hence a CFL.
d.NL2 is closed under catenation.
Solution.No.
L1={ap|p is a prime}.
L2={ap|p is not a prime}.
L1={a2, a3, a5, a7, a11, ...}.
L2={a, a4, a6, a8, a9, a10, a12, ...}.
L1L2={a3, a4, a6, a7, a8, a9, a10, a11, ...}
 =a* − {a, a2, a5} is CFL.
e.NL2 is closed under Kleene closure.
Solution.No.
is CFL.
is CFL.
4.Is the language {xmyn|m, nN, mn ≤ 2m} context-free? Justify your answer.
Solution.Yes.
SxSy
S → xSyy
S → ε
5.Is the union of a collection of context-free languages always context-free? Justify your answer.
Solution.Finite union of CFL is CFL.
L1, L2, L3, ..., Lk are CFL.
Let Li be generated by Gi = (Ni, T, Pi, Si) and let NiNj = φ for ij

generates L1L2 ∪ ... Lk.
Infinite union of CFL is not CFL
Li = {ai|i is a particular prime}.
Li are CFL.
But is not a CFL.
6.Consider the following context-free grammar:
SAA|AS |b
ASA|AS|a

For strings abaab and bab, bbb.
Construct the CYK parsing table.
Are these strings in L(G)?
Solution.

Exercises

1.Consider L = {y ∊ {0, 1}*| |y|0 = |y|1}. Prove or disprove that L is context-free.
2.Let G be the following grammar:
SCF|DE|AB
DBA|0
ESD
AAA|1
BBB|0
CFS
FAB|1

Use CYK algorithm to determine which of the following strings are in L(G).
10110, 0111010, 0110110.
For each of the above strings present the final table, state if the string is in L(G) and if it is, then give the derivation.
3.Let G be defined SAS|b, ASA|a. Construct CYK parsing tables for:
  1. bbaab
  2. ababab
  3. aabba
Are these strings in L(G)?
4.SSS|AA|b, AAS|AA|a
Give CYK parsing tables for:
aabb, and bbaba.

Are these strings in L(G)?
5.For the CFG
SAB |BC
A → BA|a
B → CC|b
CAB|a

Construct CYK parsing tables for:
  1. ababb
  2. bbbaaa
6.Let DCFL be the collection of languages accepted by a deterministic PDA. Given Examples to show that even if L1 and L2 are in DCFL:
L1 · L2 need not be in DCFL
L1L2 need not be in DCFL
7.Given a DPDA M show that it is decidable whether L(M) is a regular set.
8.Let L be in DCFL and R a regular set. Show that it is decidable whether R is contained in L.
9.Using a cardinality argument show that there must be languages that are not context-free.
10.Let LIN be a family of linear languages. Pumping lemma for linear languages can be stated as follows:
Let L ⊆ Σ* be in LIN. Then, there exists a constant p > 0 such that for all words z in L with |z| ≥ p can be expressed as z = uxwyυ for some u, υ, x, y, w ∊ Σ* such that:
  1. |uxyυ| < p
  2. |xy| ≥ 1
  3. for all i ≥ 0, uxiwyiυL
  1. Prove the LIN language pumping lemma.
  2. Using the LIN language pumping lemma, prove that the following languages are not linear.
  1. {aibicjdj|i, j ≥ 1}
  2. {x|x is in {a, b}* and #a(x) = #b(x)}
  3. {aibici|i ≥ 1}
11.Prove or disprove the following claims:
  1. Family of deterministic context-free languages (DCF) is closed under complementation.
  2. Family of DCF is closed under union.
  3. Family of DCF is closed under regular intersection.
  4. Family of DCF is closed under reversal.
  5. Family of LIN is closed under union.
  6. Family of LIN is closed under intersection.
  7. Family of LIN is closed under reversal.
12.A language L consists of all words w over the alphabet {a, b, c, d} which satisfy each of the following conditions:
  1. #a(w) + #b(w) = 2(#c(w) + #d(w)).
  2. aaa is a subword of w but abc is not a subword of w.
  3. The third letter of w is not c.
Prove that L is context-free.
13.Compare the family of minimal linear languages with the family regular languages. Characterize the languages belonging to the intersection of these two families.
14.Prove that there is a linear language which is not generated by any deterministic linear grammar.
15.A Parikh mapping ψ depends on the enumeration of the basic alphabet; another enumeration gives a different mapping ψ′. Prove that if ψ (L) is semi-linear for some ψ, then ψ′(L) is semi-linear for any ψ′.
16.Consider languages over a fixed alphabet T with at least two letters. Prove that, for any natural number n, there is a CFL Ln which is not generated by any type-2 grammar containing fewer than n non-terminals.
17.Consider the grammar G determined by the productions:
X0adX1da|aX0a|aca,
X1bX1b|bdX0db.

Prove that L(G) is not sequential. This shows that not all linear languages are sequential. Conversely, give an Example of a sequential language which is not metalinear.
18.Let G be a CFG with the production SAB, Aa, BAB|b. Run the CYK algorithm for the string aab.
19.
  1. Modify the CYK algorithm to count the number of parse trees of a given string and to construct one if the number is non-zero.
  2. Test your algorithm of part (i) above on the following grammar:
    SST |a
    T → BS
    B → +

    and string a + a + a.
20.Use closure under union to show that the following languages are CFL.
  1. {ambm|mn}
  2. {a, b}* − {anbn|n ≥ 0}
  3. {w ∊ {a, b}*|w = wR}
  4. {ambncpdq|n = q or mp or m + n = p + q}
21.Prove the following stronger version of the pumping lemma.
Let G be a CFG. Then, there are numbers K and k such that any string wL(G) with |w| ≥ K can be re-written as w = uυxyz with |υxy| ≤ k in such a way that either υ or y is non-empty and nxynzL(G) for every n ≥ 0.
22.Show that the class of DCFL is not closed under homomorphism.


No comments:

Post a Comment