Friday, 6 May 2016

SQL ISOLATION

Problem
You are looking at ways of ensuring atomicity and consistency in your database by ensuring transactional integrity. You may want to avoid locks, or ensure problems such as lost updates or dirty reads cannot occur, or otherwise ensure the quality of your data. Transactional isolation is a narrow but somewhat complex subject that might be ideal for your needs.
Solution
This article will cover the five transaction isolation settings - READ UNCOMMITTED, READ COMMITTED (locking), READ COMMITTED (snapshot), REPEATABLE READ and SERIALIZATION. You'll find short explanations on the differences between them with emphasis on practical code examples to demonstrate the effects at different isolation levels. Finally, you'll find some links at the bottom for further reading. Please execute the code in 'Setting up the Test Environment' below in your development database context to get started, or read the 'ACID: Review' section below for a bit of background in the four principles of an RDBMS system.

ACID: Review

Before I dive into transaction isolation, it's worth taking a brief look at four of the fundamental principles underlying relational database management systems. These four principles are referred to as 'ACID', and each letter is an acronym for one property of RDBMS systems that is non-negotiable for the sake of the integrity of the system. These are:
  • READ UNCOMMITTED: A query in the current transaction can read data modified within another transaction but not yet committed. The database engine does not issue shared locks when Read Uncommitted is specified, making this the least restrictive of the isolation levels. As a result, it’s possible that a statement will read rows that have been inserted, updated or deleted, but never committed to the database, a condition known as dirty reads. It’s also possible for data to be modified by another transaction between issuing statements within the current transaction, resulting in nonrepeatable reads or phantom reads.

  • READ COMMITTED: A query in the current transaction cannot read data modified by another transaction that has not yet committed, thus preventing dirty reads. However, data can still be modified by other transactions between issuing statements within the current transaction, so nonrepeatable reads and phantom reads are still possible. The isolation level uses shared locking or row versioning to prevent dirty reads, depending on whether the READ_COMMITTED_SNAPSHOT database option is enabled. Read Committed is the default isolation level for all SQL Server databases.

  • REPEATABLE READ: A query in the current transaction cannot read data modified by another transaction that has not yet committed, thus preventing dirty reads. In addition, no other transactions can modify data being read by the current transaction until it completes, eliminating nonrepeatable reads. However, if another transaction inserts new rows that match the search condition in the current transaction, in between the current transaction accessing the same data twice, phantom rows can appear in the second read.

  • SERIALIZABLE: A query in the current transaction cannot read data modified by another transaction that has not yet committed. No other transaction can modify data being read by the current transaction until it completes, and no other transaction can insert new rows that would match the search condition in the current transaction until it completes. As a result, the Serializable isolation level prevents dirty reads, nonrepeatable reads, and phantom reads. However, it can have the biggest impact on performance, compared to the other isolation levels.

  • SNAPSHOT: A statement can use data only if it will be in a consistent state throughout the transaction. If another transaction modifies data after the start of the current transaction, the data is not visible to the current transaction. The current transaction works with a snapshot of the data as it existed at the beginning of that transaction. Snapshot transactions do not request locks when reading data, nor do they block other transactions from writing data. In addition, other transactions writing data do not block the current transaction for reading data. As with the Serializable isolation level, the Snapshot level prevents dirty reads, nonrepeatable reads and phantom reads. However, it is susceptible to concurrent update errors. (not  ANSI/ISO SQL standard)

Setting up the Test Environment

-- init
SET NOCOUNT OFF
ALTER DATABASE [YourDB] SET READ_COMMITTED_SNAPSHOT ON
GO
CREATE TABLE dbo.TestIsolationLevels (
EmpID INT NOT NULL,
EmpName VARCHAR(100),
EmpSalary MONEY,
CONSTRAINT pk_EmpID PRIMARY KEY(EmpID) )
GO
INSERT INTO dbo.TestIsolationLevels 
VALUES 
(2322, 'Dave Smith', 35000),
(2900, 'John West', 22000),
(2219, 'Melinda Carlisle', 40000),
(2950, 'Adam Johns', 18000) 
GO

Experiment 1: Read using READ UNCOMMITTED

READ UNCOMMITTED is the most optimistic concurrency isolation option available in SQL Server. It allows a transaction to get the value in a row even when locks are present on the row/object or it hasn't yet been committed to disk. Reads like this are also known as 'dirty reads' since they effectively read from the transaction log rather than disk or cache - the data is unpersisted. (Note if no concurrent transactions are occurring, the read will occur from disk/cache). To show the effects of READ UNCOMMITTED, we can open a transaction as follows:
BEGIN TRAN
UPDATE  dbo.TestIsolationLevels 
SET     EmpSalary = 25000
WHERE   EmpID = 2900
Now select the value that's being updated using the following (in a separate query window):
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED
SET NOCOUNT ON
GO
SELECT EmpID, EmpName, EmpSalary
FROM dbo.TestIsolationLevels
WHERE EmpID = 2900

Note the value for empSalary reflects the current *uncommitted* value
Note the value for empSalary reflects the current *uncommitted* value. You can view the intent lock on the key (empID) and the intent exclusive locks on the object containers (the page on which the row is located and the object) imposed by the UPDATE statement using the following:
SELECT      es.login_name, tl.resource_type, 
            tl.resource_associated_entity_id,
            tl.request_mode, 
            tl.request_status
FROM        sys.dm_tran_locks tl
INNER JOIN  sys.dm_exec_sessions es ON tl.request_session_id = es.session_id 
WHERE       es.login_name = SUSER_SNAME() AND tl.resource_associated_entity_id <> 0
Now rollback the transaction to reset the EmpSalary for this employee to 22000.00:
ROLLBACK;

Experiment 2: Read using READ COMMITTED (snapshot)

There are two levels of READ COMMITTED isolation, which are locking and snapshot. Locking is the most straightforward, and simply means that once an UPDATE transaction is open, exclusive and intent-exclusive locks are taken out on the page, key range (if appropriate) and object. When reading the row using READ COMMITTED while using locking, the SELECT query used will hang until the value of LOCK_TIMEOUT (session-level parameter, if set) has expired, at which point an error will be returned.
If the value of the database-level option READ_COMMITTED_SNAPSHOT is False, locking mode for READ COMMITTED transactions is the default option. If it is True, then snapshot is the default option unless overridden by the READCOMMITTEDLOCK table hint.
Here's a demonstration of READ COMMITTED isolation with locking, by using the table hint in a database with READ_COMMITTED_SNAPSHOT ON. You don't need the table hint if this value is OFF:
ALTER DATABASE [YourDB] SET READ_COMMITTED_SNAPSHOT ON
GO
BEGIN TRAN
UPDATE  dbo.TestIsolationLevels 
SET     EmpSalary = 25000
WHERE   EmpID = 2900
Now in a separate query window:
SELECT  EmpID, EmpName, EmpSalary
FROM    dbo.TestIsolationLevels WITH (READCOMMITTEDLOCK)
WHERE   EmpID = 2900
The query will hang as it is waiting for the key lock on EmpID to be released. Allow the query to execute by issuing in your first window:
ROLLBACK;
The query will hang as it is waiting for the key lock on EmpID to be released
READ COMMITTED with snapshot is different from locking. The UPDATE statement above will take the same locks on the objects, but with the SELECT transaction session specifying snapshot isolation, the row returned will be the last row version before the UPDATE statement began. This row version is stored internally and represents the last consistent state of the row. Logically it follows that if you are using row versioning, this capability must be DB-wide, since otherwise the transaction with the UPDATE statement would not know to maintain a version of the row before issuing the UPDATE. Therefore, to use snapshot isolation the option must be set using the ALTER DATABASE statement (note that all database user connections will be killed when doing this).
Note About Row Versioning
Row versioning is an internal feature used by SQL Server to maintain recent copies of rows that have been changed, for the purposes of maintaining table consistency and ensuring better isolation from reads or writes of transactions that concurrently access the same rows. Row versioning, also called 'Row-Level Versioning (RLV)' was first introduced in SQL Server 2005. Historical rows are kept in the 'version store', inside TEMPDB, and each row that has been 'versioned' has a row pointer added to it which allows the query engine to locate the versioned row. Interestingly, the 'inserted' and 'deleted' tables used with triggers and the OUTPUT clause (to name two uses) use a similar method of versioning. There are performance sacrifices made when using this level of transaction isolation - please see the note under 'Next Steps' for more information.
Below is an example of using the READ COMMITTED with snapshot isolation:
ALTER DATABASE [YourDB] SET READ_COMMITTED_SNAPSHOT ON
GO
SET TRANSACTION ISOLATION LEVEL READ COMMITTED
SET NOCOUNT ON
GO
BEGIN TRAN
UPDATE  dbo.TestIsolationLevels 
SET     EmpSalary = 25000
WHERE   EmpID = 2900
Now in a separate query window:
SELECT  EmpID, EmpName, EmpSalary
FROM    dbo.TestIsolationLevels 
WHERE   EmpID = 2900
The query will return the last consistent row of data. Note the empSalary column is 22000.00 despite the transaction being open and the update written (but uncommitted). This is correct, and the SELECT is reading from the previous row version, not the present state of the row. This is compliant with the C in ACID - consistency.
The query will return the last consistent row of data.

Experiment 3: Read using SNAPSHOT isolation

For all intents and purposes, reads using READ COMMITTED - snapshot and SNAPSHOT are almost identical - but not identical. There are some differences when it comes to details and behavior. READ COMMITTED - snapshot will read the most recent consistent row version since the start of the statement being issued, where snapshot isolation will read the most consistent row version since the transaction started. This can cause problems with concurrent transactions since SELECTs inside the transaction that occur later than the COMMIT time of the UPDATE transaction will return an incorrect value. Likewise, update conflicts can occur for the same reason when concurrent updates are attempted.
To use snapshot isolation, you must first enable the feature as follows:
ALTER DATABASE [YourDB] SET ALLOW_SNAPSHOT_ISOLATION ON
Now start the UPDATE again, and issue the SELECT in a separate query window like so:
SET TRANSACTION ISOLATION LEVEL SNAPSHOT
SET NOCOUNT ON
GO
BEGIN TRAN
UPDATE  dbo.TestIsolationLevels 
SET     EmpSalary = 25000
WHERE   EmpID = 2900
Now in a separate query window:
SELECT  EmpID, EmpName, EmpSalary
FROM    dbo.TestIsolationLevels 
WHERE   EmpID = 2900
You will note that, like READ COMMITTED, the correct snapshot of the data row is returned, yielding empSalary = 22000, which is consistent and correct. Rollback the transaction.
You will note that, like READ COMMITTED, the correct snapshot of the data row is returned, yielding empSalary = 22000

Experiment 4: Read using REPEATABLE READ

The REPEATABLE READ isolation level is similar to the READ COMMITTED isolation level, in that it guarantees the output of uncommitted transactions won't be read by other concurrent transactions. However, if a separate concurrent transaction commits before the first one, it is possible to read the same row twice within the transaction and obtain different values. Likewise it is possible that additional 'phantom' rows could be present depending on the behavior of the concurrent transaction.
Execute the following:
SET TRANSACTION ISOLATION LEVEL READ COMMITTED 
SET NOCOUNT ON
GO
BEGIN TRAN
SELECT  EmpID, EmpName, EmpSalary
FROM    dbo.TestIsolationLevels 
WHERE   EmpID = 2900
WAITFOR DELAY '00:00:10'
SELECT  EmpID, EmpName, EmpSalary
FROM    dbo.TestIsolationLevels 
WHERE   EmpID = 2900
COMMIT
Now while this is executing, execute the following in a separate query window:
BEGIN TRAN
UPDATE  dbo.TestIsolationLevels 
SET     EmpSalary = 25000
WHERE   EmpID = 2900
COMMIT
Note the results below. Despite the two SELECTs being in one explicit transaction, the empSalary value differs between the individual statements in that transaction. The next isolation level helps to solve this problem.
Despite the two SELECTs being in one explicit transaction, the empSalary value differs between the individual statements in that transaction.
REPEATABLE READ is the isolation level to use if read requests (note: not updates) are returning inconsistent data *within one transaction*, and consists of a superset of the READ COMMITTED isolation type features (i.e. it encapsulates READ COMMITTED characteristics). Here is an example of using REPEATABLE READ when a concurrent UPDATE is occurring:
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
SET NOCOUNT ON
GO
BEGIN TRAN
SELECT  EmpID, EmpName, EmpSalary
FROM    dbo.TestIsolationLevels 
WHERE   EmpID = 2900
WAITFOR DELAY '00:00:10'
SELECT  EmpID, EmpName, EmpSalary
FROM    dbo.TestIsolationLevels 
WHERE   EmpID = 2900
COMMIT
Run the below while the above is executing:
BEGIN TRAN
UPDATE  dbo.TestIsolationLevels
SET     EmpSalary = 25000
WHERE   EmpID = 2900
COMMIT

You'll notice that the UPDATE transaction is waiting on the SELECT transaction
You'll notice that the UPDATE transaction is waiting on the SELECT transaction, and that the SELECT transaction yields the correct data if the transaction consistency as a whole is considered. Interestingly though, this still doesn't hold true for phantom rows - it's possible to insert rows into a table and have the rows returned by a calling SELECT transaction even under the REPEATABLE READ isolation level.
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ
SET NOCOUNT ON
GO
BEGIN TRAN
SELECT  EmpName
FROM    dbo.TestIsolationLevels 
WAITFOR DELAY '00:00:10'
SELECT  EmpName
FROM    dbo.TestIsolationLevels 
COMMIT
Run the below while the above is executing:
BEGIN TRAN
INSERT INTO dbo.TestIsolationLevels VALUES (3427, 'Phantom Employee 1', 30000)
COMMIT

To counter this problem, we need to use the SERIALIZABLE isolation level - the toughest of the bunch
To counter this problem, we need to use the SERIALIZABLE isolation level - the toughest of the bunch.

Experiment 5: Serializable Isolation

SERIALIZABLE has all the features of READ COMMITTED, REPEATABLE READ but also ensures concurrent transactions are treated as if they had been run in serial. This means guaranteed repeatable reads, and no phantom rows. Be warned, however, that this (and to some extent, the previous two isolation levels) can cause large performance losses as concurrent transactions are effectively queued. Here's the phantom rows example used in the previous section again but this time using the SERIALIZABLE isolation level:
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE
SET NOCOUNT ON
GO
BEGIN TRAN
SELECT  EmpName
FROM    dbo.TestIsolationLevels 
WAITFOR DELAY '00:00:10'
SELECT  EmpName
FROM    dbo.TestIsolationLevels 
COMMIT
Run the below while the above is executing:
BEGIN TRAN
INSERT INTO dbo.TestIsolationLevels VALUES (3427, 'Phantom Employee 1', 30000)
COMMIT

Experiment 5: Serializable Isolation

Note About Performance
I hope this article has been relevant and interesting, but I cannot finish without writing about performance. The reason there are five variable levels of transaction isolation in SQL Server is so the DBA or developer can tailor the isolation level to the type of query (and frequency of query) being performed. Generally, the more pessimistic the isolation level (SERIALIZABLE, READ COMMITTED - locking), the worse the performance of the query operating under that scope. This is plain to see when you consider one example, READ COMMITTED - locking, which forces other queries to wait for the resources being held for the first query. This can cause significant performance delays along the application stack, potentially leading to timeouts or other errors.

However, reducing transaction isolation levels to the most optimistic (READ UNCOMMITTED) is not necessarily a good idea under all circumstances either, for the reasons demonstrated in this article. Some systems, such as finance / banking systems, require absolute data integrity and for this to be maintained, the isolation principle is paramount. You should choose carefully the level of transaction isolation depending on what is required from the query or queries you are writing. Your queries, and associated isolation levels, should always be tested in a suitable test/development environment before deployment to production.

No comments:

Post a Comment