Friday, May 18, 2007

Debugging - Exception stack traces are not always detailed enough

We often use the Microsoft Exception Management Application Block ( MEMAB )to publish unhandled exceptions in our applications. This provides a detailed log including all inner exceptions contained within the outer most exception. For each exception the message and stack trace are output. For the inner most exception the stack trace usually contains the actual method which caused the exception.

Although most well written applications should contain small reusable methods of perhaps no more than 20 lines, this is often not the case. If methods only contain a few lines, even with a stack trace it is sometimes very difficult or impossible to pin point the exact line which is causing the exception to occur. To find out the exact line causing the error the best course of action would be to take a dump. You can find out how to do this for a .NET application in my article under “Managed First-Chance Exception”. Once you have a dump you can usually find the line of code on the production server without even using debugging symbols using the techniques highlighted below.

Assuming you have a System.NullReferenceException thrown from a .NET component you should trap access violation exceptions ( av code ) and take a full dump on this type of exception. E.g. assuming the MEMAB logs the following exception info

Exception Type: System.NullReferenceException
Message: Object reference not set to an instance of an object.
TargetSite: Void Tranmit.Common.Interfaces.IXmlSerialisable.LoadFromNode(System.Xml.XmlNode, System.Collections.Hashtable)
We can find out the exact location in LoadFromNode by loading the full dump file in windbg then executing

We can find out the exact location in LoadFromNode by loading the full dump file in windbg then executing

0:000> !u @eip
Will print '>>> ' at address: 04a34312
Normal JIT generated code
[DEFAULT] [hasThis] Void Tranmit.Sprinter.AccountCodeCollection.Tranmit.Common.Interfaces.IXmlSerialisable.LoadFromNode(Class System.Xml.XmlNode,Class System.Collections.Hashtable)
Begin 04a34168, size 25b
04a34168 55 push ebp
04a34169 8bec mov ebp,esp

04a342de ff15381ea204 call dword ptr [04a21e38] (Tranmit.Sprinter.AccountCode.ExtractIdentityFromNodeAsString)
04a342e4 8945d8 mov [ebp-0x28],eax
04a342e7 b9241fa204 mov ecx,0x4a21f24 (MT: Tranmit.Sprinter.HistoryItem)
04a342ec e827ddf8fb call 009c2018mscorwks.pdb not exist
Use alternate method which may not work.
04a342f1 8bf0 mov esi,eax
04a342f3 8b45e8 mov eax,[ebp-0x18]
04a342f6 8945c8 mov [ebp-0x38],eax
04a342f9 8b157cc31202 mov edx,[0212c37c] ("The level {0} account code {1} could not be found")
04a342ff 8955cc mov [ebp-0x34],edx
04a34302 b978afba79 mov ecx,0x79baaf78 (MT: System.Int32)
04a34307 e80cddf8fb call 009c2018mscorwks.pdb not exist
Use alternate method which may not work.

04a3430c 8945d0 mov [ebp-0x30],eax
04a3430f 8b4ddc mov ecx,[ebp-0x24]
>>> 04a34312 8b01 mov eax,[ecx]
04a34314 8b400c mov eax,[eax+0xc]
04a34317 8b803c040000 mov eax,[eax+0x43c]

Reviewing the source code we can see

String accountCodeIdentity = Tranmit.Sprinter.AccountCode.ExtractIdentityFromNodeAsString(accountCodeNode);
historyItems.Add(new HistoryItem(this,
String.Format("The level {0} account code {1} could not be found", accountCode.Level, accountCodeIdentity),

Although we have no symbols loaded on the production server we can be pretty sure that the line

04a34302 b978afba79 mov ecx,0x79baaf78 (MT: System.Int32)

Refers to the references in the String.Format method call to the Int32 property Level referenced on the accountCode property. Reviewing the code surrounding this line we can see why the account code object would be equal to a null reference and proceed to diagnose the issue further.