Real Computer Science begins where we almost stop reading ...: Intersection and other closures

Monday, 24 June 2013

Intersection and other closures

A class of languages is simply a set of languages with some
  property that determines if a language is in the class (set).

  The class of languages called regular languages is the set of all
  languages that are regular. Remember, a language is regular if it is
  accepted by some DFA, NFA, NFA-epsilon, regular expression or
  regular grammar.

  A single language is a set of strings over a finite alphabet and
  is therefor countable. A regular language may have an infinite
  number of strings. The strings of a regular language can be enumerated,
  written down for length 0, length 1, length 2 and so forth.

  But, the class of regular languages is the size of a power set
  of any given regular language. This comes from the fact that
  regular languages are closed under union. From mathematics we
  know a power set of a countable set is not countable. A set that
  is not countable, like the real numbers, can not be enumerated.

  A class of languages is closed under some operation, op, when for any
  two languages in the class, say L1 and L2, L1 op L2 is in the class.
  This is the definition of "closed."

  Summary. Regular languages are closed under operations: concatenation,
  union, intersection, complementation, difference, reversal,
  Kleene star, substitution, homomorphism and
  any finite combination of these operations.
  All of these operations are "effective" because there is an
  algorithm to construct the resulting language.
  Simple constructions include complementation by interchanging final
  states and non final states in a DFA. Concatenation, union and Kleene
  star are constructed using the corresponding regular expression to
  NFA technique. Intersection is constructed using DeMorgan's theorem
  and difference is constructed using L - M = L intersect complement M.
  Reversal is constructed from a DFA using final states as starting states
  and the starting state as the final state, reversing the direction
  of all transition arcs.

  We have seen that regular languages are closed under union because
  the "+" is regular expressions yields a union operator for regular
  languages. Similarly, for DFA machines, L(M1) union L(M2) is
  L(M1 union M2) by the construction in lecture 6.

  Thus we know the set of regular languages is closed under union.
  In symbolic terms, L1 and L2 regular languages and L3 = L1 union L2
  implies L3 is a regular language.

  The complement of a language is defined in terms of set difference
                    __                     
  from sigma star.  L1 = sigma * - L1. The language L1 bar, written
  as L1 with a bar over it, has all strings from sigma star except
  the strings in L1. Sigma star is all possible strings over the
  alphabet sigma. It turns out a DFA for a language can be made a
  DFA for the complement language by changing all final states to
  not final states and visa versa. (Warning! This is not true for NFA's)
  Thus regular languages are closed under complementation.

  Given union and complementation, by DeMorgan's Theorem,
  L1 and L2 regular languages and L3 = L1 intersect L2 implies
                                  ______________
                                    __       __
  L3 is a regular language.  L3 = ( L1 union L2) .

  The construction of a DFA, M3, such that  L(M3) = L(M1) intersect L(M2)
  is given by:
  Let  M1 = (Q1, sigma, delta1, q1, F1)
  Let  M2 = (Q2, sigma, delta2, q2, F2)
  Let  S1 x S2 mean the cross product of sets S1 and S2, all ordered pairs,
  Then
       M3 = (Q1 x Q2, sigma, delta3, [q1,q2], F1 x F2)
  where [q1,q2] is an ordered pair from Q1 x Q2,
  delta3 is constructed from
       delta3([x1,x2],a) = [delta1(x1,a), delta2(x2,a)] for all a in sigma
  and all [x1,x2] in Q1 x Q2.

  We choose to say the same alphabet is used in both machines, but this
  works in general by picking the alphabet of M3 to be the intersection
  of the alphabets of M1 and m2 and using all 'a' from this set.



  Regular set properties:
  One way to show that an operation on two regular languages produces a
  regular language is to construct a machine that performs the operation. 
  Remember, the machines have to be DFA, NFA or NFA-epsilon type machines 
  because these machines have corresponding regular languages.
  
  Consider two machines M1 and M2 for languages L1(M1) and L2(M2).
  To show that L1 intersect L2 = L3 and L3 is a regular language we
  construct a machine M3 and show by induction that M3 only accepts
  strings that are in both L1 and L2.
  
  M1 = (Q1, Sigma, delta1, q1, F1) with the usual definitions
  M2 = (Q2, Sigma, delta2, q2, F2) with the usual definitions
  
  Now construct M3 = (Q3, Sigma, delta3, q3, F3) defined as
  Q3 = Q1 x Q2   set cross product
  q3 = [q1,q2]   q3 is an element of Q3, the notation means an ordered pair
  Sigma = Sigma = Sigma  we choose to use the same alphabet,
                         else some fix up is required
  F3 = F1 x F2   set cross product
  delta3 is constructed from
     delta3([qi,qj],x) = [delta1(qi,x),delta2(qj,x)]
     as you might expect, this is most easily performed on a DFA
         
  The language L3(M3) is shown to be the intersection of L1(M1) and L2(M2)
  by induction on the length of the input string.
  
  For example:
  
  M1: Q1 = {q0, q1}, Sigma = {a, b}, delta1, q1=q0, F1 = {q1}
  M2: Q2 = {q3, q4, q5}, Sigma = {a, b}, delta2, q2=q3, F2 = {q4,q5}
  
  delta1       | a  | b  |      delta2         | a  | b  |
            ---+----+----+                -----+----+----+
     q0 | q0 | q1 |                  q3 | q3 | q4 |
     ---+----+----+                -----+----+----+
     q1 | q1 | q1 |                  q4 | q5 | q3 |
     ---+----+----+                -----+----+----+
                       q5 | q5 | q5 |
       -----+----+----+
           
  M3 now is constructed as
    Q3 = Q1 x Q2 = {[q0,q3], [q0,q4], [q0,q5], [q1,q3], [q1,q4], [q1,q5]}
    F3 = F1 x F2 = {[q1,q4], [q1,q5]}
    initial state  q3 = [q0,q3]
    Sigma = technically  Sigma_1 intersect Sigma_2
    delta3 is constructed from delta3([qi,qj],x) = [delta1(qi,x),delta2(qj,x)]
  
  delta3        |    a    |    b    | 
       ---------+---------+---------+ 
   [q0,q3] | [q0,q3] | [q1,q4] |    this is a DFA when both 
       ---------+---------+---------+    M1 and M2 are DFA's
   [q0,q4] | [q0,q5] | [q1,q3] | 
       ---------+---------+---------+
   [q0,q5] | [q0,q5] | [q1,q5] |
       ---------+---------+---------+ 
   [q1,q3] | [q1,q3] | [q1,q4] |
       ---------+---------+---------+ 
   [q1,q4] | [q1,q5] | [q1,q3] | 
       ---------+---------+---------+
   [q1,q5] | [q1,q5] | [q1,q5] |
       ---------+---------+---------+
  
  As we have seen before there may be unreachable states.
  In this example [q0,q4] and [q0,q5] are unreachable.
  It is possible for the intersection of L1 and L2 to be empty,
  thus a final state will never be reachable.
  Coming soon, the Myhill-Nerode theorem and minimization to
  eliminate useless states.
Real Computer Science begins where we almost stop reading ...

Monday, 24 June 2013

Intersection and other closures

No comments:

Post a Comment