Concurrency Control and its part in Databases

Published: November 30, 2015 Words: 4912

Concurrency control is an integral part of a database. Distributed concurrency control for databases is one special side of database Achieving distributed serializability, distributed database system can guarantees the concurrency control. Commitment ordering protocol can be used to achieve distributed serializability and it is the only effective technique for that purpose. Strong strict two phase locking, which is a variant of the Commitment ordering protocol is the most popular serializability technique. There are aggressive and conservative distributed database concurrency control techniques. When using two phase locking, have to deal with deadlocks. Distributed database designer have to think about how to detect and break or prevent dead locks. In this literature review all the above mentioned topics will be described.control. There are lots of centralized database concurrency control techniques are available and their distributed versions are used to achieve concurrency control in distributed database systems. Day by day new concurrency control techniques have being emerged for both centralized and distributed database systems. Two phase locking and timestamp ordering are most popular concurrency control techniques. There are lots of variations of these techniques. The technologies use to achieve database concurrency control can and does depend on the needs of the business and the sensitivity/confidentiality of the data to be stored in the database, and hence the price the business is willing to spend on ensuring data security, consistency and integrity. It is very important to have concurrency control in database systems which are use for financial applications. \\

Achieving distributed serializability, distributed database system can guarantees the concurrency control. Commitment ordering protocol can be used to achieve distributed serializability and it is the only effective technique for that purpose. Strong strict two phase locking, which is a variant of the Commitment ordering protocol is the most popular serializability technique. There are aggressive and conservative distributed database concurrency control techniques. When using two phase locking, have to deal with deadlocks. Distributed database designer have to think about how to detect and break or prevent dead locks. In this literature review all the above mentioned topics will be described.

Distributed concurrency control in databases is the concurrency control of a database distributed over a computer network. It mainly focuses on the activity of coordinating concurrent accesses to a database in a multiuser database management system (DBMS). Concurrency control allows users to access a database in a multi programmed manner by preserving the illusion (assuming) that each user is executed on a dedicated system. The concurrency control is concerned with controlling concurrent access to shared resources. \\

A distributed database is a collection of multiple logically connected databases distributed over a computer network. Distributed database system contains multiple computers called nodes or sites which are running transaction manager(TM), data manager (DM) or both of these software modules. TMs supervise interactions between users and the distributed database management system while data managers manage the actual database. Each computer must be connected by a communication network. A distributed database management system (DDBMS) is a software system which manages distributed databases while making the distribution transparent to the user. It also concern about the replication transparency and fragmentation transparency of the distributed databases. Distributed concurrency control emerges special challenges beyond centralized one, primarily due to communication and computer latency. It often requires special techniques such as distributed lock manager over fast computer networks with low latency. \\

The goal of concurrency control is to prevent interference among users who are simultaneously accessing a database. Attaining distributed concurrency control of databases is difficult due to concurrency control anomalies like lost update problem, inconsistent retrieval problem, temporary update problem and the incorrect summary problem. \\

There are lots of algorithms proposed to avoid these problems. These concurrency control problems can be subdivided into two sub problems, named as read-write and write-write synchronization. Every concurrency control algorithm consist a sub algorithm for each sub problem. Each concurrency control algorithm is a composition of only a few sub algorithms. The glue that binds the sub problems and solutions together is a mathematical theory known as serializability theory. Commitment ordering is a general serializability technique that allows to achieve distributed serializability effectively on a large scale database systems, without concurrency control information distribution, and without performance penalties which are typical to other serializability techniques. \\

Serializability theory is a collection of mathematical rules that tells whether a concurrency control algorithm works correctly. Serializability theory does its job by looking at the executions allowed by the concurrency control algorithm. The theory gives a precise condition under which an execution is correct. A concurrency control algorithm is then judged to be correct if all of its executions are correct. \\

Above mentioned sub algorithms are variations of just two basic techniques, two phase locking and timestamp ordering. Hybrid is also another basic method for transaction concurrency control. Majority protocol and biased protocol used as basic locking protocols. The most common distributed concurrency control technique is strong strict two-phase locking, which is also a common centralized concurrency control technique. It provides serializability, strictness, and commitment ordering properties. Two phase locking is simple and guarantees serializability, but the locking may damage the throughput of the system, and may cause dead-lock. There are some ways to solve the deadlock problem. They are Prevention, avoidance, detection, and resolution. Deadlock prevention is not easy to achieve since it must have the complete serial graph. It is a NP-complete problem to find the minimum cost edge for breaking the deadlock cycle. In two phase locking, and other locking techniques as well, the deadlock prevention or detection in a distributed environment, which is much more complex and costly. \\

Timestamp Ordering protocols don?t attempt to maintain serializability by mutual exclusion so that it won?t cause dead-lock. The basic timestamp ordering algorithm is simple and deadlock-free. The penalty of such mechanism is potential restart of a transaction numerous times. The global unique timestamp assignment is not an easy problem either. \\

From a user's perspective, a database consists of a collection of logical data items. These data items can be files, records, etc. A logical database state is an assignment of values to the logical data items composing a database. Each logical data item may be stored at any data manager in the system or redundantly at several data managers. A stored copy of a logical data item is called a stored data item. A stored database state is an assignment of values to the stored data items in a database. Concurrency control algorithms make all of their decisions on the basis of the data items a transaction reads and writes. These concurrency control algorithms don?t concern about computations performed by transactions. Transactions represent complete and correct computations, each transaction, if executed alone on an initially consistent database, would terminate, produce correct results, and leave the database consistent. \\

Operations are actions used by transactions to execute them. There are 5 types of operations. They are read, write, predeclare, commit and abort. A Read on a data item extracts the value of it. A write on a data item replace its value with a new value. A predeclare on a data item inform to the database that the transaction wants write on it in future. Commit inform to the database that the transaction is terminated normally and all of its effect should be permanent. An abort tell to the database that the transaction did not terminated normally and it?s all effects should remove from the database permanently. If a transaction wants to abort during it?s execution then it?s call as unilateral abort. Except that database management system aborts a transaction if the commitment of that transaction causes to an inconsistent state of the database. Transactions communicate with each other using read and write operations. Set of data items read by a transaction are called the read set of that transaction. Write set indicates the set of data items which are written to by the transaction.

\newpage

\section{Serializability}

\subsection{Correctness}

\subsubsection{Correctness-Serializability}

Sometimes when transactions executes concurrently, then their operations can be interleaved. By avoiding interleaved operations data base management system can avoid interferences among relevant transactions. One way of doing this is use serial execution. But the drawback is, since every transaction executes all of its operations before execute any operation of the next transaction, then it leads to poor use of system resources. Also serial execution not support for the concurrent access of the database. An efficient way to avoid interference is to control the execution of conflicted operations. Two operations can be conflicted if at least one of them is a write and both of them use the same data item. By using this technique can achieve results similar to serial execution. This type of execution is call serializable execution. Order of two or more transactions is serializable, if they produce the same output and same effect on the database as they execute serially. Transactions executes concurrently since it is the most efficient way. Distributed serializability is the serializability of a schedule of a distributed system.

\subsubsection{Correctness - Recoverability}

When emerging a failure to ensure correctness, an execution must be recoverable. An execution is recoverable, if commit of each transaction follows the commit of every transaction which read data from them. Aborting transactions may cause to further aborts, called cascading aborts. A DBMS can avoid cascading aborts if it allows transactions only to read values of data items which are written by committed transactions. Strict execution avoids cascading aborts and is recoverable. When DBMS use strict execution, delays all read and write operations on a data item until all the transactions which are previously written to the data item are committed or aborted. Serializabilyty is the most popular correctness criterion.

\subsection{View and Conflict Serializability}

There are two major types of serializability. They are view serializabiliy and complicit serializability. View serializability is the normal thing describe above. But the complicit serializability is a special case. Any schedule which is conflict serializable is also view serializable. Two operations are complicating, if they are belong to two transactions, act on the same data item and at least one of them is write. If two or more operations are complicating then they also require to non commutative, means that changing the order of operations cause to change the final result of these transactions. When checking for a serial schedule only consider about time order of pairs of conflicting operations since schedules belong same transactions can convert to one another by changing the order of transaction operations, and by changing the order of non conflicting operations does not affect the overall sequence of operation results. The meaning is that, if a schedule can be transformed to any serial schedule without changing orders of conflicting operations, then the outcome of both schedules is the same, and the schedule is conflict-serializable by definition. Schedule undergo with conflict serializability can check using precedence graph.

\subsection{Strong Strict Two Phase Locking (SS2PL)}

Strong strict two phase locking is a common mechanism use to ensure conflict serializability and recoverability of a schedule. When using this mechanism each data item is locked by a transaction before using it. According to the operation the lock is different. As a result of that, the data item access by another transaction may be preventing depending on the lock type and second transaction s access operation type. When using this SS2PL technique all locks acquire by a transaction are released only after the transaction has committed or aborted.

\subsection{Optimistic and pessimistic Concurrency control techniques}

There are two major types of concurrency control techniques.

\begin{itemize}

\item Pessimistic :

When using pessimistic concurrency control, a transaction blocks another transaction s data access operations which are cause to conflicts and conflicts are non-materialized until blocking is removed to ensure serializability and recoverability.

\item Optimistic :

When using optimistic concurrency control operations of one transaction are not blocked by another r transaction due to conflicts and conflicts are materialized immediately. When transaction is reaches the ready state if violation has occurred, then the transaction is aborted. Else it is committed.

\end{itemize}

A technique is inherently-blocking, if it can be enforced only by blocking transaction s data access operations until certain events occur in other transactions. The main difference of two types is how conflicts are handled. Conflicts can be generated by the way transaction operations are scheduled, independently of the type. A serializability violation is represented by a cycle of committed transactions with materialized conflicts in a precedence graph, and should be avoided for maintaining serializability. A deadlock is represented by a cycle with non materialized conflicts in a wait-for graph. Deadlock should be resolved by breaking the cycle. Both cycle types are results of conflicts and should be avoided from. At any type materialized and non materialized conflicts should be detected and considered with similar overhead. When using blocking method normally a context switching occurs with conflicts, cause to additional incurred overhead. Otherwise blocking transactions related computing resources are not properly used. Optimistic methods have an advantage if conflicts do not occur frequently. For different transaction types one technique type may perform better than the other. \\

Some mechanisms mix blocking in certain situations with not blocking in other situations, and have been referred to as semi-optimistic. These mechanisms employ materialized and non-materialized conflicts. In this case cycles in the graph which is the union of the conflict graph and wait-for graph are used to characterize serializability violations and deadlocks. Unless schedule classes are inherently blocking, they can be implemented also using optimistic techniques.

\subsection{Distributed serializability}

Distributed serializability is the serializability of a transactional distributed system like distributed database system. Such system includes distributed transactions and network nodes. Each distributed transaction has more than one local sub transactions. Each local sub transaction has its own particular state. Distributed transaction needs atomic commitment protocol to communicate between its local sub transactions to whether to commit or abort. Distributed serializability is an important goal of distributed concurrency control for correctness. \\

The way to achieve Distributed serializability is, implement distributed versions of the known centralized techniques. Distributed versions need to use the complicating information generated in different processes and remote locations. When the distributed system is relatively small and message delays in the system are small, then centralized concurrency control methods can be used without any change. Certain processes or nodes in the system have to handle the related algorithms. Even when distributed versions are used substantial performance penalties are occurred in a large scale system due to computer and communication latency. When such information is distributed, related techniques do not scale well. The only alternative, which needs only local information for its distributed version and scales well and avoid concurrency control information delays is Commitment ordering. CO can provide unconditional scalability in new experimental distributed database systems architectures also, where local sub-transactions in each processor core are one-threaded and serial, but the overall schedule is serializable.

\newpage

\section{ Commitment ordering}

Commitment ordering is both centralized and distributed serializability technique. CO is the resulting schedule property which defines a broad subclass of conflict serializability class of schedules. The most important characteristics of CO which make it an effective general distributed serializability solution are:

\begin{itemize}

\item Neither changing or blocking any transaction's operation scheduling nor adding any new operation with low overhead integration with any concurrency control mechanism CO is also a standalone serializability mechanism

\item Conflict or equivalent information distribution is not necessary

\item Automatic distributed deadlock resolution

\item Scalability

\end{itemize}

All the above characteristics, except the first one are included in SS2PL which is a special case of CO. Under certain conditions distributed CO can be used to achieve distributed serializability without paying the penalty of distributing conflict information. This is a major advantage of distributed CO when comparing with other distributed serializability techniques. Instead of that distributed CO uses messages of an atomic commitment protocol. Such protocol is an essential component for distributed transaction environment, used to maintain atomicity of distributed transactions. Distributed CO utilizes the atomic commitment protocol messages without any additional information communicated. An important side-benefit of distributed CO is that distributed deadlocks are automatically resolved.\\

Three conditions have to complete to guaranteeing the distributed serializability in a database system.

\begin{itemize}

\item Data partition: Recoverable data are partitioned among distributed transactional data managers that the distributed system comprises, is controlled by a single data manager

\item Participate in atomic commitment protocol

\item Each data manager guarantees CO locally

\end{itemize}

These conditions are applied to the distributed system's transactional data managers.

\subsection{How to achieve global serializability using Commitment ordering}

Distributed serializability in a heterogeneous system that deals with lots of transaction objects using different concurrency control techniques is called as global serializability. Each database in the system may use different concurrency control mechanisms to achieve serializability. Though every local schedule of a single database is serializable, the global schedule of entire system is not necessarily serializable. The huge communication exchange of complicit information between database systems to achieve conflict serializability would cause to performance penalties due to computer and communication latency. CO provides an effective global serializabily criteria if each database system in a heterogeneous multi database system is undergo CO. \\

If all the databases in the distributed system use only atomic commitment protocol messages to share concurrency control information, then CO is the most suitable technique to achieve global serializability. So CO is the only general technique which doesn t need the costly distribution of local concurrency control information. CO generalizes the strong strict two phase locking protocol, which has connection with two phase commit protocol is the real standard to acquire global serializability in SS2PL based database systems. As a result CO compliant database systems which use different concurrency control techniques can use SS2Pl based solutions jointly to achieve global serializability. Locking based global deadlocks are automatically resolved in a CO compliant multi database systems and completely SS2PL based systems is a further advantage. If there are read write complicit are occurred in the system then strict commitment ordering which is a combination of strictness and CO is perform better than SS2PL. The advantage of SCO is significant during lock contention. Strictness facilities both SS2PL and SCO to utilize the same database recovery techniques. \\

Extended CO and Multiversion CO are the main variations of CO. All are provide global serializability when combined with any suitable concurrency control technique and allow non blocking implementations. Both of them use additional information for relaxing CO constraints and achieving better concurrency and performance. CO and its variants interoperate transparently guarantees global serializability also together in a mixed, heterogeneous environment with different variants. \\

\subsection{How Commitment Ordering works in a distributed environment}

A schedule follows commitment ordering property, if the order in time of its transactions commitment events is compatible with the precedence order of the respective transactions, according to their local serializability graph. Any conflict serializable schedule can be changed to a CO compliant one, without aborting any transaction in the schedule. To do this schedule has to delay commitment operations to agree with the needed timing order. Including CO property in each local schedule can achieve conflict serializability globally. Atomic commitment protocol is the only needed communication between the databases for the above purpose. In each single database, a local CO algorithm can run with any local concurrency control mechanism. Also CO can be used in heterogeneous environments which may uses different serealizability techniques. The CO protocol deals with network size and the number of databases without any performance penalties. This causes CO use for global concurrency control. CO implementation by itself is not sufficient as a concurrency control technique because it lacks the important recoverability property. \\

The commitment event of a distributed transaction is generated by atomic commitment protocol, use to communicate between its processes whether to commit or abort it. This procedure is always using in distributed databases, not only in CO. In case of local commitment orders in two or more databases, which causes for a global cycle in the global complicit graph, CO generates a voting deadlock for the atomic commitment protocol. The protocol breaks that deadlock by aborting a transaction with a missing vote on the cycle and breaking the cycle.

\newpage

\section{Aggressive and Conservative Concurrency Control techniques}

Concurrency control in a database is coordinating operations of transactions which are execute in parallel, use shared data and then interfere with each other. Concurrency control combined with a recovery protocol guarantees the ACID properties even in the presence of failures. Various concurrency control mechanisms have been proposed. Each one is based on different type of scheduler. A scheduler can be conservative, aggressive or a combination of both. A conservative scheduler delays operations until it is realized that there is no any complicating operations. Sometimes there will be no any complicating operations to arrive, then the delay is unnecessary and conservative techniques suffer from lost opportunity cost. Examples of conservative techniques are conservative timestamp ordering and two-phase locking (2PL). An aggressive scheduler reduces unnecessary delay by scheduling operations immediately. But it has the risk of having to reject operations during the execution. It causes the issuing transaction to abort and restart. Aggressive techniques suffer from restart cost. Basic timestamp ordering and optimistic concurrency control are most popular aggressive techniques.

\subsection {Locking}

Locks are requested by the tm. Lock manager issue and maintain locks. When transactions commits write locks are released. But there are special lock release operations to release read locks. Every transaction acquires locks in two phases. During the growing phase the transactions obtains locks without releasing any locks. During the shrinking phase the transaction releases locks without acquiring any locks. A distributed lock manager facilitates distributed applications with a means to synchronize their accesses to shared resources.

Two Phase Locking (2PL)

In 2PL, a lock is acquired as a result of a read operation. A 2PL scheduler must hold a read lock for transaction t until commit time in order to ensure that no other transaction invalidates data items that transaction t has read, before t commits. In 2PL, write operations have the effect of making the data item unavailable until the writes are committed. In 2PL, a write lock is held until the corresponding commit arrives. On commit, a client in 2PL must send lock releases to the servers in order to release the read locks. Servers in 2PL incur the cost of either preventing deadlocks, or of detecting and breaking them. In a distributed database system, deadlock detection involves both local and global deadlock detection, the latter involving communication among multiple servers. Restarts in 2PL are caused by the need to break deadlocks. \\

Dynamic 2PL and static 2PL are popular basic 2PL technologies. In dynamic 2PL, a transaction acquires a lock only when it wants to access the corresponding data item. In static 2PL each transaction predeclares and acquires all the locks on data items that want to access, before it starts any computation. All variations of 2PL follow strict execution. When following strict execution, scheduler release all read locks of a transaction after the transaction terminates and all the write locks of the transaction after the DM has processed the transaction s commit or abort. 2PL have to face the overhead of extra massages, which use to acknowledge lock sets and to release locks. 2PL also need a mechanism to prevent or detect and break deadlocks. \\\

Basic Two Phase Locking

When using basic two phase locking a transaction must acquire a read lock on data item to read it and acquire a write lock on data item to write data to it. Read locks complicit with other write locks on the same data item. Write locks complicit with other write locks on the same data item. A basic two phase locking scheduler uses 3 rules.\\

\begin{itemize}

\item When a basic 2PL scheduler receives a lock request, it tests whether the requested lock conflicts with another lock which is already set. If there is a conflict then the lock request is queued. Else issue the lock.

\item After 2PL scheduler issuing a lock for a data item, then it cannot release the lock until DM has completed the lock s relevant operation.

\item Once the 2PL scheduler released a lock for a transaction, then it may not acquire any lock for that transaction.

\end{itemize}

A basic 2PL scheduler must have a way to prevent or detect and break deadlocks.

\subsection {Timestamp Ordering (TO)}

When using timestamp ordering technique the TM assigns particular timestamp to each transaction it executes. Each operation is assigned its transaction s timestamp. A timestamp ordering scheduler orders complicating operations according to each operation s timestamp. In a distributed system, the total ordering of timestamps is usually guaranteed by following the above method. Each TM is assigned a number which is unique to it, its site identifier. Each TM maintains a local counter and assigns the value of the counter to each transaction it generates, ensuring that each transaction gets a unique counter value. Timestamps are, therefore, totally ordered, first by their counter value and then by their unique TM numbers. The TM attaches a transaction s timestamp to each read and write operation issued by the transaction. A timestamp ordering scheduler ensures that transaction operations are executed in order according to their timestamps. If there are multiple servers, timestamps are useful in order to make the servers execute transactions in the same order.

\subsubsection {Basic Timestamp Ordering (BTO)}

BTO is an aggressive timestamp ordering technique. A BTO scheduler delivers transaction operations to the DM according to the timestamp order. A BTO scheduler executes operation immediately. The scheduler rejects an operation if it has already executed a conflicting operation with a later timestamp. When an operation is rejected, the corresponding transaction must abort and restart with a later timestamp. If a BTO scheduler receives lots of operations with an order widely different from their timestamp order, then the scheduler has to reject too many operations. It causes to lot of transactions to abort. Starvation of transactions can occur in a BTO system because of continuous conflicts and aborts.

\subsubsection {Conservative Timestamp Ordering (CTO)}

Conservative timestamp ordering is a TO technique which avoids restarts and aborts. A CTO scheduler blocks an operation until it is confirm that no complicating operations with lower time stamp will arrive. For every TM, the scheduler keeps track of the minimum timestamp of a read that it has received from that TM but not yet executed, and also the minimum timestamp of a write that it has received from that TM but not yet executed. The scheduler executes a read if it has received a write with a greater timestamp from every other TM, and buffers the read otherwise. The CTO scheduler executes a write if it has received a read or a write with a greater timestamp from every other TM, and buffers the write otherwise. To keep computation progressing, TMs periodically send timestamped null operations to every scheduler. This message sending is slow. The scheduler processes all operations in timestamp order, not just conflicting ones, thus enforcing an overly restrictive ordering.

\subsubsection {Multiversion Timestamp Ordering (MVTO)}

In a multiversion timestamp-ordering system, DMs maintain multiple versions of each data item. Maintaining multiple versions may not largely affect to the cost of concurrency control, because the versions may be needed by the recovery algorithm. An MVTO scheduler maintains timestamps of various versions of data items. When executing a read operation scheduler tells the DM which version of data item to use. If a read with a later timestamp has already read an earlier version of the data item, then the scheduler rejects a write. On the other hand, if the write can be executed without making a previous read execution invalid, the scheduler and the DM execute the write and create a new version.

\subsection {Time Intervals and Short-Term Locks}

Ordering by serialization numbers is a concurrency control technique for distributed database management systems which increases the level of concurrent execution of transactions. This technique works in the certifier model and uses time-interval techniques with short-term locks to provide serializability and prevent deadlocks. The scheduler is distributed, and the standard transaction execution policy is assumed that the read and write operations are issued continuously during transaction execution. Write operations are copied into the database only when the transaction commits.

\section {Conclusions and Future Directions}

According to this literature revive there are lots of techniques for concurrency control for distributed databases. Each technique has its advantages and drawbacks. It is important to do further research to improve these advantages and to reduce these drawbacks. Dynamic 2PL is the most popular concurrency control technique, and current database designers use it almost exclusively. Popular conception has been that timestamp ordering techniques like Basic Timestamp Ordering (BTO) perform poorly as compared to dynamic 2PL. By using global serializable schedules in distributed database systems distributed transactions can be executed concurrently without interfering with each other. Commitment ordering is a popular protocol which is used to achieve distributed serializability. Strong Strict two phase locking protocol is the best commitment ordering protocol for the above purpose.

\newpage

\section{\large References}

\bibliographystyle{plain}

\begin{thebibliography}{5}

\bibitem{key-1}Ozsu M. Tamer, Ozus ,Patrick Valduriez{}"Principles of Distributed Database systems",Available from<http://www.agiledata.org/essays/tdd.html>

{[}Accesses 12 April 2010{]}

\bibitem{key-2}Rashmi Srinivasa{},"Network-Aided Concurrency Control in Distributed Databases", Available from <http://www.rashmi.org/phd/thesis.pdf>

Available from<http://www.agiledata.org/essays/databaseRefactoring.html>

{[}Accesses 12 April 2010{]}

\bibitem{key-3}Serializability,Available from<http://en.wikipedia.org/wiki/Serializability>

{[}Accesses 22 May 2010{]}

\bibitem{key-4}Commitment Ordering,Available from<http://www.agiledata.org/essays/tdd.html>

{[}Accesses 22 May 2010{]}

\bibitem{key-5}www.cs.gsu.edu/yli/teaching/Spring08/DB/slides/gul.pp

{[}Accesses 22 May 2010{]}

\end{thebibliography}

\end{document}