Tuesday, March 27, 2012

Consistent Deadlock, need to identify the rogue PAGE lock

I'm having trouble tracking down a deadlock in a large multi-threaded application. I have been able to gather some information, but not quite enough to identify the two queries at fault.

I collected the following by turning on the -T1204 and -T3605 parameters:

Deadlock encountered .... Printing deadlock information
2006-04-13 09:13:48.70 spid3
2006-04-13 09:13:48.70 spid3 Wait-for graph
2006-04-13 09:13:48.70 spid3
2006-04-13 09:13:48.70 spid3 Node:1
2006-04-13 09:13:48.70 spid3 KEY: 10:1993058136:1 (aa0069f30857) CleanCnt:3 Mode: U Flags: 0x0
2006-04-13 09:13:48.70 spid3 Grant List 1::
2006-04-13 09:13:48.71 spid3 Owner:0x1a0d5b80 Mode: U Flg:0x0 Ref:1 Life:00000000 SPID:61 ECID:0
2006-04-13 09:13:48.71 spid3 SPID: 61 ECID: 0 Statement Type: UPDATE Line #: 1
2006-04-13 09:13:48.71 spid3 Input Buf: RPC Event: sp_execute;1
2006-04-13 09:13:48.71 spid3 Requested By:
2006-04-13 09:13:48.71 spid3 ResType:LockOwner Stype:'OR' Mode: U SPID:53 ECID:0 Ec:(0x1B4E54F0) Value:0x1a0d5ae0 Cost:(0/BA0)
2006-04-13 09:13:48.71 spid3
2006-04-13 09:13:48.71 spid3 Node:2
2006-04-13 09:13:48.71 spid3 PAG: 10:4:14898 CleanCnt:2 Mode: U Flags: 0x2
2006-04-13 09:13:48.71 spid3 Grant List 0::
2006-04-13 09:13:48.71 spid3 Owner:0x1a0c6bc0 Mode: U Flg:0x0 Ref:1 Life:00000000 SPID:58 ECID:0
2006-04-13 09:13:48.71 spid3 SPID: 58 ECID: 0 Statement Type: UPDATE Line #: 1
2006-04-13 09:13:48.71 spid3 Input Buf: RPC Event: sp_execute;1
2006-04-13 09:13:48.71 spid3 Requested By:
2006-04-13 09:13:48.71 spid3 ResType:LockOwner Stype:'OR' Mode: IU SPID:61 ECID:0 Ec:(0x1C2B54F0) Value:0x1a0d5440 Cost:(0/B98)
2006-04-13 09:13:48.71 spid3
2006-04-13 09:13:48.71 spid3 Node:3
2006-04-13 09:13:48.71 spid3 PAG: 10:4:14899 CleanCnt:2 Mode: IX Flags: 0x2
2006-04-13 09:13:48.71 spid3 Grant List 1::
2006-04-13 09:13:48.71 spid3 Owner:0x1a0c6e80 Mode: IX Flg:0x0 Ref:2 Life:02000000 SPID:55 ECID:0
2006-04-13 09:13:48.71 spid3 SPID: 55 ECID: 0 Statement Type: UPDATE Line #: 1
2006-04-13 09:13:48.71 spid3 Input Buf: RPC Event: sp_execute;1
2006-04-13 09:13:48.71 spid3 Requested By:
2006-04-13 09:13:48.71 spid3 ResType:LockOwner Stype:'OR' Mode: U SPID:58 ECID:0 Ec:(0x1C0114F0) Value:0x1a0d5a20 Cost:(0/B98)
2006-04-13 09:13:48.71 spid3
2006-04-13 09:13:48.71 spid3 Node:4
2006-04-13 09:13:48.71 spid3 KEY: 10:1993058136:1 (aa0069f30857) CleanCnt:3 Mode: U Flags: 0x0
2006-04-13 09:13:48.71 spid3 Wait List:
2006-04-13 09:13:48.71 spid3 Owner:0x1a0d5ae0 Mode: U Flg:0x0 Ref:1 Life:00000000 SPID:53 ECID:0
2006-04-13 09:13:48.71 spid3 SPID: 53 ECID: 0 Statement Type: UPDATE Line #: 1
2006-04-13 09:13:48.71 spid3 Input Buf: RPC Event: sp_execute;1
2006-04-13 09:13:48.71 spid3 Requested By:
2006-04-13 09:13:48.71 spid3 ResType:LockOwner Stype:'OR' Mode: U SPID:55 ECID:0 Ec:(0x1BFB94F0) Value:0x1a0c6fc0 Cost:(0/BA0)
2006-04-13 09:13:48.71 spid3 Victim Resource Owner:
2006-04-13 09:13:48.71 spid3 ResType:LockOwner Stype:'OR' Mode: U SPID:58 ECID:0 Ec:(0x1C0114F0) Value:0x1a0d5a20 Cost:(0/B98)

I then ran the following queries: -

> select name from sysobjects where id=1993058136;
result: ACCOUNTARRANGEMENTSCHEDULES

> select name from sysindexes where indid=1 and id=1993058136;
result: PK_ACCOUNTARRANGEMENTSCHEDULES

> sp_help PK_ACCOUNTARRANGEMENTSCHEDULES
result: PK_ACCOUNTARRANGEMENTSCHEDULES , dbo , primary key cns , 2005-11-01 13:51:01.650

It looks like there is sql server selected a row lock for the ACCOUNTARRANGEMENTSCHEDULES table (sounds good), but for the UPDATE being done by nodes 2 and 3 a page lock is being used. Is there any way I can get more info on what the statement is or what table it is updating?
I found some more useful info, using the following command: -

dbcc traceon (3604)
dbcc page(10, 4, 14898,2)
dbcc traceoff (3604)

I received the following (amongst other things): -
m_objId = 1993058136

Which would indicate the page lock is on the same table (ACCOUNTARRANGEMENTSCHEDULES).|||

The page that the deadlock occurs on has little to do with what led to the deadlock.

This is the deadlock scenario:

Spid 61 -> owns KEY: 10:1993058136:1 (aa0069f30857)
-> requesting 10:4:14898
Spid 58 -> owns 10:4:14898
-> Requesting 10:4:14899
Spid 55 -> Owns 10:4:14899
-> requesting 10:1993058136:1 (aa0069f30857)

So spid 61 used an index to find a row it was interested in and the needed additional data off page 10:4:14898.

Spid 55 appears to have found a row it intends to update on page 10:4:14899, and now needs a key lock to perform the necessary index maintenance.

Spid 58 appears to be scanning as it is requesting the page with the next ID, which likely just happens to be the next page in the list.

All 3 of these spids are exectuting some kind of prepared statement that does an update, as evidenced by the "sp_execute" statement in the input buffer.

To get to the bottom of this you're going to need to look at the exectution plan of these statements. Use profiler and gather the showplan all, sp:stmtstarting, sp:stmtcompleted, rpc starting and rpc completed events. If the plan shows (as I suspect it will) something like CLUSTERED INDEX SCAN, then you need to think about proper indexes to support these statements.

|||Unfortunately, whenever I use the profiler (even if I only turn on Deadlock and Deadlock Chain events) it seems to change the timings enough that the deadlock is not produced.

I don't know how I can change the indexes. The table has a primary key, which is a composite of two columns. Since a primary key exists, why is a page lock being used? The only thing that I can think that could be causing it, is the fact that an updateable jdbc ResultSet is being used by one of the queries. So it's not strictly an UPDATE, it's a SELECT (that selects only 1 row) then an updateRow.

This deadlock only appeared when introducing sp4. With sp3, no deadlocks occur. Has the lock selection algorithm changed?
|||The lock selection algorithm has not changed. It's possible that you have a different plan now. Even if you don't hit the deadlock the plans themselves can be instructive. If you see Scan's in the plan you want to understand why. Of course a Scan is going to cause each page in an index to be read and locked. The fewer pages that are locked the fewer chances there are of encountering a blocking, or deadlocking scenario.

No comments:

Post a Comment