Monday 24 June 2013

Greibach Normal Form

Greibach Normal Form of a CFG has all productions of the form

     A ->aV  Where 'A' is a variable, 'a' is exactly one terminal and
     'V' is a string of none or more variables.

  Every CFG can be rewritten in Greibach Normal Form.
  This is step 5 in the sequence of "simplification" steps for CFG's.
  Greibach Normal Form will be used to construct a PushDown Automata
  that recognizes the language generated by a Context Free Grammar. 

  Starting with a grammar: G = ( V, T, P, S)
    1a) Eliminate useless variables that can not become terminals
    1b) Eliminate useless variables that can not be reached
    2)  Eliminate epsilon productions
    3)  Eliminate unit productions
    4)  Convert productions to Chomsky Normal Form
    5)  Convert productions to Greibach Normal Form using algorithm below:
    
Re-label all variables such that the names are A1, A2, ... , Am .
The notation A(j) refers to the variable named Aj .
New variables may be created with names B1, B2, ... referred to as B(j) .
Productions may be created and/or removed (a mechanical implementation
may use coloring to track processed, added and deleted productions.)

    Step 1 (repeat until no productions are added, m-1 times at most)
      begin  (m is the number of variables, can change on each repeat)
        for k := 1 to m do
          begin
            for j := 1 to k-1 do
              for each production of the form A(k) -> A(j) alpha do
                begin
                  for all productions A(j) -> beta do
                    add production A(k) -> beta alpha
                  remove production A(k) -> A(j) alpha
                end;
          end;
        for each production of form A(k) -> A(k) alpha do
          begin
            add production b(k) -> alpha  and  B(k) -> alpha B(k)
            remove production A(k) -> A(k) alpha
          end;
        for each production A(k) -> beta where beta does not begin A(K) do
            add production A(k) -> beta B(k)
      end;

    Step 2  (sort productions and delete any duplicates )

            Remove A -> A beta  by creating  B -> alpha B
            Substitute so all rules become Greibach with starting terminal.

    For neatness, sort productions and delete any duplicates

  see book: 2nd Ed. page 271, section 7.1, Exercise 7.1.11 construction

  Example (in file format, input given to program  cykp,
           output file  greibach.pda)

  // g1.g  a test grammar 
  // G = (V, T, P, S)  V={S,A}  T={a,b}  S=S
  //
  start S
  terminal a b ;
  verbose 7      // causes every step of ever loop to be printed
                 // runs a very long time !
  S -> a A S ;
  S -> a ;
  A -> S b A ;
  A -> S S ;
  A -> b a ;
  enddef
  abaa


// greibach.pda , edited, from  cykp < g1.g > g1_cykp.out

start S

terminal a ;
terminal b ;

variable A ;
variable C_AS ;
variable C_T_bA ;
variable S ;
variable T_a ;

A      ->  a  C_AS  C_T_bA ;
A      ->  a  C_AS  S ;
A      ->  a  C_T_bA ;
A      ->  a  S ;
A      ->  b  T_a ;
C_AS   ->  a  C_AS  C_T_bA  S ;
C_AS   ->  a  C_AS  S  S ;
C_AS   ->  a  C_T_bA  S ;
C_AS   ->  a  S  S ;
C_AS   ->  b  T_a  S ;
C_T_bA ->  b  A ;
S      ->  a  C_AS ;
S      ->  a ;
T_a    ->  a ;
enddef

Note that an optional optimization could delete the last rule,
and replace T_a by S.


Push Down Automata M = ( Q, Sigma, Gamma, delta, q0, Z0, F)
  Q = a finite set of states including q0
  Sigma = a finite alphabet of input symbols (on the input tape)
  Gamma = a finite set of push down stack symbols including Z0
  delta = a set of nondeterministic transitions
          from-state tape-input top-of-stack to-state symbol(s)-to-push
  q0 = the initial state, an element of Q
  Z0 = the initial stack contents, an element of Gamma
  F = the set of final accepting states, if empty then accept
      on an empty stack at end of input


 +-------------------------+
 | input string            |
 +-------------------------+
    ^ read, move right       
    |
    |        read or write (pop or push)               
    +-------------------------------------> |Z0 |
    |  +-----+                              |   |
    |  |     |--> accept                    |   | push down
    +--+ FSM |                              |   | stack
       |     |--> reject                    |   |
       +-----+

accepts at end of input string, when in a final state,
and stack contains only Z0. Standard Finite State Machine
we have been using. The only extra part is the
push down stack.

Many parsers use a push down stack. We are covering the
basic abstract, theory, that can prove parsers work.

No comments:

Post a Comment