Real Computer Science begins where we almost stop reading ...: Convert NFA to regular expression

Monday, June 24, 2013

Convert NFA to regular expression

Conversion algorithm from a NFA to a regular expression.
  Start with the transition table for the NFA with the following
  state naming conventions:
     the first state is 1 or q1 or s1 which is the starting state.
     states are numbered consecutively, 1, 2, 3, ... n
  The transition table is a typical NFA where the table entries
  are sets of states and phi the empty set is allowed.
  The set F of final states must be known.

  We call the variable r a regular expression.
  It can be: epsilon, an element of the input sigma, r*, r1+r2, or r1 r2

  We use r   being the regular expression to go from state qi to qj
          ij
 
  We need multiple steps from state qi to qj and use a superscript
  to show the number of steps

   1    2     k     k-1
  r    r     r     r     are just names of different regular expressions
   12   34    1k    ij
                                      2
  We are going to build a table with n  rows and n+1 columns labeled


     |  k=0   |  k=1  |  k=2  | ... | k=n
 ----+--------+-------+-------+-----+------
     |   0    |   1   |   2   |     |   n
 r   |  r     |  r    |  r    | ... |  r       Only build column n
  11 |   11   |   11  |   11  |     |   11     for  q1 to final state
 ----+--------+-------+-------+-----+------
     |   0    |   1   |   2   |     |   n      The final regular expression
 r   |  r     |  r    |  r    | ... |  r       is then the union, +, of
  12 |   12   |   12  |   12  |     |   12     the column n
 ----+--------+-------+-------+-----+------
     |   0    |   1   |   2   |     |   n
 r   |  r     |  r    |  r    | ... |  r
  13 |   13   |   13  |   13  |     |   13 
 ----+--------+-------+-------+-----+------
     |   0    |   1   |   2   |     |
 r   |  r     |  r    |  r    | ... |
  21 |   21   |   21  |   21  |     |
 ----+--------+-------+-------+-----+------
     |   0    |   1   |   2   |     |
 r   |  r     |  r    |  r    | ... |
  22 |   22   |   22  |   22  |     |
 ----+--------+-------+-------+-----+------
     |   0    |   1   |   2   |     |
 r   |  r     |  r    |  r    | ... |
  23 |   23   |   23  |   23  |     |
 ----+--------+-------+-------+-----+------
     |   0    |   1   |   2   |     |
 r   |  r     |  r    |  r    | ... |
  31 |   31   |   31  |   31  |     |
 ----+--------+-------+-------+-----+------
     |   0    |   1   |   2   |     |
 r   |  r     |  r    |  r    | ... |
  32 |   32   |   32  |   32  |     |
 ----+--------+-------+-------+-----+------
     |   0    |   1   |   2   |     |
 r   |  r     |  r    |  r    | ... |
  33 |   33   |   33  |   33  |     |


  ^         2
  |-  Note n  rows,  all pairs of numbers from 1 to n


  Now, build the table entries for the k=0 column:


           /
    0     /  +{ x | delta(q ,x) = q }              i /= j
   r   = /                 i       j
    ij   \
          \  +{ x | delta(q ,x) = q } + epsilon     i = j
           \               i       j


            where delta is the transition table function,
            x is some symbol from sigma
            the q's are states
     0
    r   could be  phi,   epsilon,   a,   0+1,  or  a+b+d+epsilon
     ij
           notice there are no Kleene Star or concatenation in this column



   Next, build the k=1 column:

    1      0     0   *  0      0
   r   =  r   ( r   )  r   +  r    note: all items are from the previous column
    ij     i1    11     1j     ij


   Next, build the k=2 column:

    2      1     1   *  1      1
   r   =  r   ( r   )  r   +  r    note: all items are from the previous column
    ij     i2    22     2j     ij


   Then, build the rest of the k=k columns:

    k      k-1   k-1  *  k-1     k-1
   r   =  r   ( r    )  r    +  r     note: all items are from previous column
    ij     ik    kk      kj      ij

   Finally, for final states p, q, r  the regular expression is

    n      n     n
   r   +  r  +  r
    1p     1q    1r

   Note that this is from a constructive proof that every NFA has a language
   for which there is a corresponding regular expression.


  Some minimization rules for regular expressions are available.
  These can be applied at every step.
  (Actually, extremely long equations, if not applied at every step.)

  Note: phi is the empty set
        epsilon is the zero length string
        0, 1, a, b, c, are symbols in sigma
        x is a variable or regular expression
        ( ... )( ... ) is concatenation
        ( ... ) + ( ... ) is union
        ( ... )*  is the Kleene Closure = Kleene Star

      (phi)(x) = (x)(phi) = phi

      (epsilon)(x) = (x)(epsilon) = x

      (phi) + (x) = (x) + (phi) = x

      x + x  = x

      (epsilon)* = (epsilon)(epsilon) = epsilon

      (x)* + (epsilon) = (x)* = x*

      (x + epsilon)* = x*

      x* (a+b) + (a+b) = x* (a+b)

      x* y + y = x* y

      (x + epsilon)x* = x* (x + epsilon) = x*

      (x+epsilon)(x+epsilon)* (x+epsilon) = x*


  Now for an example:
  Given M=(Q, sigma, delta, q0, F) as

   delta   |  a   |  b   |  c         Q     = { q1, q2}
   --------+------+------+-----       sigma = { a, b, c }
     q1    | {q2} | {q2} | {q1}       q0    = q1
   --------+------+------+-----       F     = { q2}
     q2    | phi  | phi  | phi 
   --------+------+------+-----


Remember r   for k=0, means the regular expression directly from q1 to q1 
          11


       |  k=0        
  -----+-------------
   r   | c + epsilon    from q1 input c goes to q1 (auto add epsilon
    11 |                                 when a state goes to itself)
  -----+-------------
   r   | a + b          from state q1 to q2, either input a or input b
    12 |             
  -----+-------------
   r   | phi            phi means no transition
    21 |             
  -----+-------------
   r   | epsilon        (auto add epsilon, a state can stay in that state)
    22 |             
  -----+-------------



       |  k=0        |  k=1 (building on k=0, using e for epsilon)
  -----+-------------+------------------------------------
   r   | c + epsilon | (c+e)(c+e)* (c+e) + (c+e) = c*
    11 |             |
  -----+-------------+------------------------------------
   r   | a + b       | (c+e)(c+e)* (a+b) + (a+b) = c* (a+b)
    12 |             |
  -----+-------------+------------------------------------
   r   | phi         | phi (c+e)* (c+e) + phi = phi
    21 |             |
  -----+-------------+------------------------------------
   r   | epsilon     | phi (c+e)* (a+b) + e = e
    22 |             |
  -----+-------------+------------------------------------


       |  k=0        |  k=1     | k=2 (building on k=1,using minimized)
  -----+-------------+----------+-------------------------
   r   | c + epsilon | c*       |
    11 |             |          |
  -----+-------------+----------+-------------------------
   r   | a + b       | c* (a+b) | c* (a+b)(e)* (e) + c* (a+b)  only final
    12 |             |          |                              state
  -----+-------------+----------+-------------------------
   r   | phi         | phi      |
    21 |             |          |
  -----+-------------+----------+-------------------------
   r   | epsilon     | e        |
    22 |             |          |
  -----+-------------+----------+-------------------------

   the final regular expression minimizes to  c* (a+b)
Real Computer Science begins where we almost stop reading ...

Monday, June 24, 2013

Convert NFA to regular expression

No comments:

Post a Comment