Sunday, 23 June 2013

Concurrency Control in Database

Why we want to run transactions concurrently?
Concurrent or overlapping execution of transactions are efficient

How we ensure the correctness of the concurrent transactions?
Concurrency control(Serializability) and Recovery are two criteria that ensure the correctness of concurrent transactions

Why concurrency control is needed
Lost Update : update of some data by one transaction is lost by update from another transaction
Dirty Read or temporary update problem : one transaction updates the value of common data and aborts before it can revert the changes transaction 2 reads the value of updated variable.
incorrect summary problem: transaction reads the data while another transaction is still changing the data

Why recovery is needed
In any kind of problem like hardware malfunction, software error exceptions or violating the concurrency property, deadlock recovery of transaction is needed

Transaction  states
  • begin transaction marks the beginning of transaction. 
  • end_transaction specifies transaction execution is complete and system check whether changes can be permanently applied.
  • rollback or abort for unsuccessful end of tranaction
Fig. transaction states
  • At commit point all transaction operations have been logged and new entry is done in log 'commit T' stating that all transaction operation permanently logged
  • before writing commit T the complete log should be written to disk from buffers
  • Rollback :  when commit T statement  is not found in log, its rollbacked

How recoverability is implemented
System log is kept on disk which logs transaction like write old value new value read.
Protocol that do not provide cascading rollback do not need to keep read entry

Recoverable Schedule : If T2 reads a data item written by T1 commit operation of T1 should appear before commit operation of  T2.

Cascadeless Schedule: If T2 reads a data item written by T1 commit operation of T1 should appear before read operation of T2.
Strict Schedule : If a write operation of T1 precedes a conflicting operation of T2 (either read or write), then the commit event of T1 also precedes that conflicting operation of T2.

Fig. schedules recoverability
pattern for recoverable schedule would be like
Fig. simple pattern for recoverability schedules on vertical time line
What is Serializability
If executing interleaved transaction results in same outcome as serial schedule(running transaction in some sequnence) then they are considered serializable. this schedule is type of nonserial schedule.

Serializabality might be compromised in some cases but recoverability compromise would mean violating database integrity. Isolation levels decide tradeoff between correctness and concurrency

Isolation level
when the changes made by one operation becomes visible to another operation
most relaxed ACID property
From more stringent to relaxed isolation levels
Serializable as if transactions execute in complete isolation, in serial fashion. read locks released immediately but write locks released at end of transaction
Read Uncommited Dirty reads allowed that is another transaction reads the data value but then first transaction can revert so other transaction has incorrect value
Repeatable read: A phantom read occurs when range of rows is read by one transaction and another transaction inserts or delete a record in between rows returned by same query is different at two points in time for a transaction that is another transaction updates and commits in between
Read commited(Non repeatable read):prevents dirty read that is reads commited data only.
UPDATE or DELETE of any of the rows read by earlier transaction

Dirty Read
Lost Update

Phantom Records
Read uncommitted

Read committed

Repeatable read



Types of serializability

View and conflict serializability

  • conflict is subset of view serializability
  • Conflict is widely utilized because it is easier to determine and covers a substantial portion of the view serializable
Equivalence to serial schedule such that

In view serializable, two schedules write and read the same data values.
and In conflict seriablizable, same set of respective chronologically ordered pairs of conflicting operations.

Conflict serializable
In conflict serializabability two schedules are conflict equivalent and we can reorder the non conflicting operation to get the serial schedule
Conflicting operation
1) they are upon same data item
2)At least one of them is write
3) they are from different transactions
Non commutative that is their orders matter

Testing conflict serializability
Test is through precedence graph. The acyclic preced graph shows conflict serialiczable schedule and topological sorting of that graph gives the serializable schedules
cycle of commited transaction can be prevented by aborting an undecided transaction on each cycle in precedence graph of all transaction

view serializability is NP complete
materialized conflict if the requested conflicting operation is actually executed

Problem 1 Consider two schedules
S1 : r1(x) w1(x) r1(y) c1
S2: r2(x) r2(y) c2
1.What the number of possible schedules.
2. How many serial schedules are possible
3. Determine type of schedules(recoverable cascadeless strict) and serializability(conflict seriablizable or not) for below schedules
S1: r 1 (X); w 1 (X); r 2 (X); w 2 (X); r 1 (Y); w 1 (Y); C 2 ; C 1
S2: r 1 (X); r 2 (X); w 1 (X); r 1 (Y); w 1 (Y); C 1 ; w 2 (X); C 2
S3: r 1 (X); w 1 (X); r 2 (X); w 2 (X); r 1 (Y); w 1 (Y); C 1 ; C 2 
In general, given m transactions with number of operations n1, n2, ..., nm, the number
of possible schedules is: (n1 + n2 + ... + nm)! / (n1! * n2! * ... * nm!)
here m=2 and n1= 4 and n2 = 3
so (3+4)! / (3!*4!)
7C3 or 7C4
2.Number of serial schedules is m! where m is number of transactions
so here we have 2!

3. To check type of schedule
we need to check all conflicting operations of schedule 
S1 : r2(X) comes after w1(x) , since no commit before r2(X) its not cascadeless and strict
commit of T1 should be before T2 in this case for it to be recoverable which is not so it is not even recoverable. Its non recoverable schedule
S2: there is only one conflicting operation w1(X) w2(X) and for that we have commit of T1 before w2(X) so its strict schedule and so is cascadeless also
S3: for one of the conflicting operation w1(X) r2(X) there is no commit in between so its not cascadeless and strict. To check for recoverable commit of T1 should be before that of T2 which is there

To check serializability
S1 precedence graph is
T1 ---> T2
acyclic so its conflict serializable

S2 precedence graph
^                 |
there is cycle so its not conflict serializable

//pending problems