Matt Adamson: 2005

Sunday, August 07, 2005

Visual Basic Production Debugging

A Visual Basic (VB) error is usually raised by either an explicit call to Err.Raise or by calling another COM object which raises an error. In both cases VB provides similar error handling to VB Script through the On Error Resume Next and On Error Goto statements. In VBScript production debugging I discussed how to generate dumps and obtain error information when VB Script errors occurred. Obtaining the same information when a VB error is raised is raised takes a bit more work.

You may be using VB components from within an ASP web application to perform a lot of business logic and may find some errors are being raised but ignored by either the components or the ASP script. Trapping these errors in a production environment is therefore very useful.

The main difference between VBScript and VB errors are VB errors are raised as true native exceptions (exception code c000008f ). To trap VB script errors break points had to be placed within the scripting runtime module. In this post I will walk you through the commands you need to execute on a live or full dump.After we’ve gone through the commands we’ll put together an adplus script to capture this information which you can use on production servers.

When the dump is open switch to the VB thread - you can use ~*kb to see which threads have MSVBVM60. e.g.

0:000> ~20s

Determine the index of the EBTHREAD object in thread local storage (TLS).

0:020> dd msvbvm60!g_itlsebthread l1
6aae0ec0 0000001e

Find the location of this thread's TLS based on the thread environment block (TEB). If you want to dump the Err object for other VB threads just get the TEB for those threads.

0:020> ~. 20
Id: 758.5bc Suspend: 1 Teb: 7ffa8000 Unfrozen
0:020> ?7ffa8000 +0xe10 Evaluate expression: 2147126800 = 7ffa8e10

Get the location of our EBTHREAD object based on the index into TLS

0:020> ? 7ffa8e10 + 0000001e * 0x4
Evaluate expression: 2147126920 = 7ffa8e88
0:020> dd 7ffa8e88 l1
7ffa8e88 0977ef58

Get the address of the EXCEPINFOA member variable based on its offset into EBTHREAD

0:020> ?0977ef58+0x78
Evaluate expression: 158855120 = 0977efd0

Now dump the EXCEPINFO data

0:020> dd 0977efd0 l8
0977efd0 00000000 250613ec 250314da 00000000
0977efe0 00000000 00000000 6aaa8bcf 80004005

At this point you have all of the information in the VB Error object. In this particular instance we have the bstrSource, bstrDescription, the pfnDeferredFillIn, and the scode. The pfnDeferredFillIn is simply a function in VB that is called as needed to get more information about the error object (ie. the description). This is called when the VB component tries to get the data (ie. access the Err.Description property) which is why the information may not be filled in sometimes.

For this dump we can see the error source

0:020> du 250613ec
250613ec "Microsoft OLE DB Provider for SQ"
250613ec "L Server"

and the description

0:020> du 250314da
250314da "[DBNETLIB][ConnectionOpen (Conne"
250314da "ct()).]SQL Server does not exist"
250314da " or access denied."

and the error number

0:020> ? 80004005
Evaluate expression: -2147467259 = 80004005

As a full dump is required to access this information you can typically run into performance issues on the server if many exceptions are being generated due to the size of the files. It would be far more effective if we could use an adplus script to extract the error source, description, and number from a process at runtime. This avoids the need to create dump files and adds all the relevant information in the adplus log file. The script is shown below

<ADPlus>
<Settings>
<RunMode> CRASH </RunMode>
</Settings>
<Exceptions>
<NewException>
<Code> c000008f </Code>
<Name> VBError </Name>
</NewException>
<Config>
<Code>c000008f</Code>
<CustomActions1>
.echo VB error occurred.;
.echo ;
r $t1 = poi($teb+0xe10+poi(MSVBVM60!g_itlsebthread)*4)+0x78;
.echo Source ;
.if poi($t1 + 0x4)=0 { .echo Unknown } .else { du poi($t1 + 0x4) };
.echo Description ;
.if poi($t1 + 0x8)=0 { .echo Unknown } .else { du poi($t1 + 0x8) };
.echo Error number ;
?poi($t1 + 0x1c);
</CustomActions1>
</Config>
</Exceptions>
</ADPlus>

The script uses some useful commands such as $teb to extract the TEB memory address and .if / .else to allow commands to be conditionally executed. It also sets a pseudo register $t1 to improve the performance of the error extraction, otherwise this data would have to be evaluated three times.

For some VB errors where the description is not filled in you can open VB6 and in the immediate window just type 'error ’. If you don’t have access to debugging symbols on the server you can use the same techniques I discussed in Setting breakpoints without symbols. In this case you should find the memory address of the symbol MSVBVM60!g_itlsebthread and replace it with the memory address e.g. MSVBVM60 + 0x12345678.

Thursday, July 07, 2005

Setting breakpoints when production servers have no access to symbols

In some cases a production server may not

Have access to the internet to use the Microsoft symbol server.
Access to symbol files using a UNC path.

In both cases it is impossible to set break points using symbolic names in either a live debug session or when using adplus. This is clearly a problem for the adplus configuration script I discussed in VB Script production debugging.

If symbols aren’t available there is another solution by adding break points set using a memory address location. In the previous blog the adplus config file was setting a breakpoint on the function vbscript!CScriptRuntime::RecordErrorContext. To find out the memory address for the breakpoint we need to determine what the offset to vbscript!CScriptRuntime::RecordErrorContext is in vbscript.dll.

First we need to take a dump of the process on the production server and load it into windbg on a machine which does have access to a symbol store. Open the dump and enter the x command to find the address of the function

0:031> x vbscript!CScriptRuntime::RecordErrorContext
734753cf vbscript!CScriptRuntime::RecordErrorContext =

Then we need to find the module address for vbscript.

0:031> lmm vbscript
start end module name
73460000 734c5000 vbscript

Subtract the module start address from the symbol address

0:031> ?734753cf - 73460000
Evaluate expression: 86991 = 000153cf

Now we have our offset we can use it to set a break point either during a live debug or in the adplus config file. The dump I used in the example above was using 5.6.0.8515 of vbscript.dll. The Address element of the adplus configuration file should be modified from

vbscript!CScriptRuntime::RecordErrorContext

vbscript + 0x000153cf

Sunday, July 03, 2005

Effective Production Debugging MSDN Article

Following my recent production debugging MSDN article (http://msdn.microsoft.com/msdnmag/issues/05/07/Debugging/default.aspx.
I've written a number of posts on techniques I wanted to cover in the article but couldn't as I ran out of space.

I've already written on topics such as creating dumps through terminal services and VB Script debugging however If there's anything you'd especially like to see let me know and I'll try and cover it.

Wednesday, June 29, 2005

VBScript Production Debugging

An error typically occurs in an ASP application running VB Script when either

An On Error Resume Next statement is called prior to executing some script. A number of method calls are then made which can potentially raise errors. After each statement is completed the Err.Number property is usually checked and action taken such as logging the error and providing user friendly messages.
No On Error Resume Next statement is called prior to executing some script . In this case any statements that follow will halt the processing of the script and return a runtime application error to the user.

Whilst the first case is the preferred approach to handle errors gracefully most developers typically do not check most statements for errors. If Err.Number is not checked errors can be silently ignored by the server side script, and then consequently cause other errors which are logged. It is therefore essential to be able to log information on the original error, namely the source, description and number.

Unlike languages such as C++ / VB or .NET exceptions aren’t raised to the debugger from scripting engines such as VBScript. Consequently we can’t use adplus to capture a dump file when a specific type of exception occurs. However when an error is raised a number of functions are called within the scripting runtime.

vbscript!COleScript::OnScriptError. This is called when a script occurs in the 2nd case above
vbscript!CScriptRuntime::RecordErrorContext. This is called to record an error in the scripting runtime when On Error Resume Next has been called.

Although we can’t use adplus to create a dump file on an exception (because none is raised) we can use it to create a dump when a specific break point is reached. Opening the dump file shows the following call stack

0:023> kvn
# ChildEBP RetAddr Args to Child
00 0148f4a8 733183bf 0003aa70 000b5f48 00000000 vbscript!CScriptRuntime::RecordErrorContext
01 0148f4c4 73318320 0003aa70 0148f5cc 00000010 vbscript!CSession::RecordExcepInfoAndClear
02 0148f5b8 73304b01 0003a9c0 000b5f48 60030001 vbscript!InvokeDispatch

If we dump the first argument plus 8 we can see the EXCEPINFO structure which is passed to this method. EXCEPINFO is the automation structure which defines an error i.e.

struct EXCEPINFOA, 9 elements, 0x20 bytes
+0x000 wCode : Uint2B
+0x002 EXC_unused1 : UChar
+0x004 bstrSource : Ptr32 to Void
+0x008 bstrDescription : Ptr32 to Void
+0x00c bstrHelpFile : Ptr32 to Void
+0x010 dwHelpContext : Uint4B
+0x014 EXC_unused2 : UChar
+0x018 pfnDeferredFillIn : Ptr32 to Void
+0x01c scode : Uint4B

Reviewing the dump we see

0:023> dd 0003aa70 + 8
0003aa78 00000000 000b85bc 00121254 00000000
0003aa88 000f43e8 00000000 00000000 800a0063
0003aa98 00000000 00000000 00000000 00000000
0003aaa8 00000000 00000000 000a0004 0008012a

To find the source and description we can use the du command i.e.

0:023> du 000b85bc
000b85bc "MyVBClass"
0:023> du 00121254
00121254 "Failed to get supplier invoice, "
00121294 "Reason=Failed to open connection"
001212d4 ", reason=Object required"

However to get this information a full dump needs to be created. If many script errors occur this maybe unworkable in a production environment. A more elegant approach involves using a custom action in an adplus config file to extract the source, description and number and add to the log file adplus produces. i.e.

<ADPlus>
<Settings>
<RunMode>CRASH</RunMode>
<ProcessID><Put process id in here></ProcessID>
</Settings>
<Breakpoints>
<NewBP>
<Address>vbscript!CScriptRuntime::RecordErrorContext</Address>
<CustomActions>.echo -----;.echo VB Script On Error Resume Next error occurred.;.echo;.echo Source:;du poi(poi(esp+4)+c);.echo Description:;du
poi(poi(esp+4)+10);.echo Code:;?poi(poi(esp+4)+24);g</CustomActions>
<Actions>Log;Stack;</Actions>
<ReturnAction>G</ReturnAction>
</NewBP>
<NewBP>
<Address>vbscript!COleScript::OnScriptError </Address>
<CustomActions>.echo -----;.echo VB Script On Error Resume Next error occurred.;.echo;.echo Source:;du poi(poi(esp+4)+c);.echo Description:;du
poi(poi(esp+4)+10);.echo Code:;?poi(poi(esp+4)+24);g</CustomActions>
<Actions>Log;Stack;</Actions>
<ReturnAction>G</ReturnAction>
</NewBP>
</Breakpoints>
</ADPlus>

The most interesting line in the config file is

.echo Source:;du poi(poi(esp+4)+c);.echo Description:;du poi(poi(esp+4)+10);.echo Code:;?poi(poi(esp+4)+24);

The source and description are output as strings using the du command. From the parameter poi(poi(esp+4)+c), esp+4 is the stack pointer + 1 DWORD which equals the first parameter to the function were breaking on ( RecordErrorContext or OnScriptError ). This is dereferenced using the poi operator from which we then add 0xC to get to the offset of the Source member in the EXCEPINFO structure. The Source member is just a pointer to a string so poi is used again to dereference it. Description is handled the same and the error code is simply output using the ? evaluate expression command.

When a VB Script exception occurs you get all the relevant information provided in the log file along with any other exceptions which are raised i.e.

VB Script On Error Resume Next error occurred.

Source:
0017edf4 "MyVBClass"
Description:
0017eeac "Failed to get supplier invoice, "
0017eeec "Reason=Failed to open connection"
0017ef2c ", reason=Object required"
Code:
Evaluate expression: -2146828189 = 800a0063

Thursday, June 16, 2005

Getting dumps for apps failing on start up using terminal services

In my previous post http://mattadamson.blogspot.com/2005/05/getting-dumps-for-apps-failing-on.html I discussed the use of gflags to invoke a debugger to start when a process starts and then sending debugger commands to create dumps when exceptions occur. However if your trying to debug against a process run in session 0, e.g. a web application or service, this technique will not work. As most support teams will probably be using terminal services to access production servers clearly we need a technique to get the dumps.

Remote debugging is commonly used to debug across two machines i.e. setting up a debugger on the server and using a debugging client to connect to it using named pipes or tcp. However there is nothing stopping us using the same machine to host the debugging server and the client. We need to set up a debugger server which runs in session 0 and a debugger client which runs in the terminal services session i.e. 3 in our example.

As before open gflags and specify the image module name you need to debug however instead of simply entering the path to the debugger you should specify some command line options to tell windbg to start as a debugging server. Enter the following in the debugger field

C:\Debuggers\windbg.exe -server tcp:port=123 –Q

The server option creates a debugging server and the tcp:port option specifies the port number. The –Q option suppresses the “Save workspace” dialog which is important if the debugger is running in session 0 and can’t interact with the user. Note the command line options are case sensitive so be sure to copy this exactly as it appears. As soon as the process your debugging starts the windbg debugee should be loaded in session 0 and broken in on the LdrpInitializeProcess method.

To control the debugger server start a new instance of windbg as the debugging client. Press Ctrl+R to connect to a remote debugger session and enter the following

tcp:server=MACHINENAME,port=123

When you connect the windbg instance will connect to the remote debugger and allow you full control to enter commands e.g. you can enter

Sxe *
g

to break on all exceptions and then continue. You can also send the same commands to create dump files as mentioned in my previous post. Or you could simply use .dump to create a dump when a specific exception is raised e.g.

.dump /ma c:\debugger\ExceptionDump.dmp

If you look at task manager you should also see two instances of windbg open. If you select the “Session ID” column you can see the different terminal services sessions used for each instance.

Saturday, May 28, 2005

Choosing between an ascending or descending index

I was recently asked whether making an index ascending or descending would improve query performance in SQL Server. In almost all cases the direction would make absolutely no difference to the speed of retrieval. Performance of queries is usually best measured by examining the number of logical page reads required to find the data.

Searching a table for a specific row
Any search across a single non-clustered / clustered index to locate a specific row would always result in a similar number of pages read. In the non-clustered case the time to retrieve the row would be navigating the index pages through the binary trees root index page and then through the intermediate level index pages to find the leaf level index page which matches. Then if no clustered index existed on the table the index page would point to the data page which would then be read to obtain the data.

If a clustered index existed on the table the leaf level non clustered index page would point to the clustered index key. The clustered index key would then be examined in a similar way to the non clustered index to find the leaf level index page. However for a clustered index the data exists on the leaf level index page so no further page reads would occur.

Searching for a range of rows
Even cases when searches are performed for a range of rows e.g. ID BETWEEN 100 AND 105 direction of the index would make no difference. A single column index can be scanned well in both directions because the index pages at a given level are linked by next and prior pointers pointing to the next and previous index pages at that level. The query optimiser may decide to search for the data either by starting at ID 105 and moving backwards through the index pages, or from 100 and working forwards.

When does direction make a difference?
The direction you specify when creating an index is used to determine the order of the leaf level index pages. E.g. names Adamson, Bamber, Doherty, Roche would appear exactly in that order in an ascending index, and in a descending index they would appear as Roche, Doherty, Bamber and Adamson.

The only case I can think of when the direction would have the most impact is a query where a composite index was used and a query was submitted which
Covers all columns used in the index. This means all fields specified in the SELECT statement can be found in the index page.

Covers all columns used in the index. This means all fields specified in the SELECT statement can be found in the index page.
Specifies a different direction in the ORDER BY clause in the SELECT statement of the query to the direction the index was created in.

Consider a query

SELECT SupplierNo, CreatedDate
FROM Invoices
ORDER BY SupplierNo, CreatedDate DESC

WHERE CreatedDate BETWEEN ‘1/5/05’ AND ‘10/5/05’

If either two single non clustered indexes or a single composite index on SupplierNo ASC, CreatedDate ASC was created a Sort operator would still be used in the execution plan of the query. If we can remove the sort operator the query time should be reduced.

If you created a single composite non clustered index on SupplierNo ASC, CreatedDate DESC the index pages could be navigated without the sort operator being used and only an index scan performed. As the index is covered because all the columns requested existed on the index pages then no data pages would have to be navigated from the leaf level index pages.

If you decided to change the query and add a column which didn’t exist on the index such as a reference number, the data page would have to be located for each matching row on the index page. Depending on the total number of index pages and data page scans required to get all the data, the query optimiser may even decide not to use an index and simply perform a table scan.

This example cites the usefulness of the query execution plan to show you how slow a specific query is. This is the only effective method to diagnose potentially slow running queries and improve the performance in a methodical manner. You can either show an execution plan for a query directly in query analyser or capture the event from a SQL profiler trace in a production environment. Simply executing the query in the query analyser window and trying to time it manually will not yield good results unless the speed differences before and after are that noticeable.

Wednesday, May 18, 2005

Getting dumps for apps failing on start up

Applications such as services or out of process servers may generate errors on start up. Adplus provides a spawn option to launch processes and attach the debugger on startup. Unfortunately the spawn option cannot be used because the process is either started by the Service Control Manager (SCM) or the COM runtime loads the module. For these type of applications it is still possible to create dump files but the process is a little convoluted.

What we need is some mechanism to attach to the process as soon as it launches but before the error is raised. The debugging tools package provides a tool called gflags which allows us to specify a number of debugging options both globally and for specific applications. One of the options allows us to configure a debugger which is attached as soon as the process starts.

Enter the full image name in the image file page then press tab to enable all the other controls. Select the full path to the windbg debugger in the Debugger field which can either be cdb.exe or windbg.exe. Now start the failing application either by forcing the client to create the COM object or start the service. The debugger should then be launched ready to recieve commands.

Now we've broken in the debugger we need to send all the commands adplus usually sends to configure specific exceptions to generate dump files. The debugger command "sxe" controls the action the debugger takes when an exception occurs in the application being debugged. When you use an Adplus XML config file to configure what exception codes generate dumps for 1st or 2nd chance exceptions Adplus sends a number of these commands to the console debugger (CDB) e.g. one is shown below with some detail omitted.

sxe -c @"dump -u /mdi /c 1stchanceAVSCANmini.dmp; GN"
-c2 @"dump -u /ma /c 2ndchanceAVSCANfull.dmp; GN" av

The last parameter specifies the exception code ( access violation in this case ) and the -c and -c2 options specify the actions to take on first and second chance exceptions respectively. The /ma option effectively takes a mini dump with all additional options which is more commonly known as a full dump. However if you have a lot of different exception codes to configure this process can be very tedious and error prone. Fortunately when you run adplus it generates a file with a CFG extension in the Crash_.. folder. This file contains a list of all commands needed to be sent to CDB to configure all exception codes.

The most effective approach is to run adplus against any other process e.g. notepad, to generate a CFG file using the options you specify in an adplus XML config file. When the process is started and windbg breaks in you can use the "$<" debugger command with the full path of the cfg file i.e.

D:\Development\Debugging\Debugging Tools for Windows\DebuggerCommands.cfg

This will send all the necessary commands in the CFG file to the debugger to create the dump files for first and second chance exceptions. It is worth editing the CFG file first to modify the application from notepad in this example to the name of your app. Then simply press F5 or enter the g command to allow the debugger to continue and the dump files should be created.

Sunday, May 15, 2005

Should the GAC always be avoided?

Currently we use the Global Assembly Cache (GAC) to install shared assemblies which are called by a number of different applications; however articles such as the infamous “Avoid the GAC” http://www.sellsbrothers.com/spout/#Avoid_the_GAC article have made me question the usefulness of this approach.

To summarise our deployment we have perhaps 40 or so assemblies which are referenced by a number of different applications namely.

A windows service.
An ASP application.
A thick client windows application.
A .NET console application.

All applications use the same set of core assemblies to perform most of the functions e.g. core invoicing and ordering components as well as components which are applicable across many different functional areas.

Every time we rebuild and deploy a new component e.g. assembly A which has upgraded from say 1.0.8.0 to 1.0.9.0 we ship a new publisher policy assembly which redirects 1.0.0.0 to 1.0.9.0 references to 1.0.9.0 and obviously also re installs the new updated assembly into the GAC. E.g. 1.0.9.0 may fix an issue in the database where a stored procedure is being used with an incorrect parameter order causing data integrity issues.

This deployment works well in that components which are shared amongst all applications only need to be installed in one place. In the future we may provide breaking components with major functionality changes e.g. we introduce a new assembly A 2.0.0.0 version which may have breaking changes. In this case we envisage we may update the publisher policy with multiple redirections setup such as

1.0.0.0 - 1.0.9.0 ---> 1.0.9.0
2.0.0.0 - 2.0.2.0 ---> 2.0.2.0

In this way we can support side by side versioning of both sets of assemblies. We make a statement with the publisher policy that version increments containing changes across 1.0.0.0 – 1.0.9.0 are compatible and are needed for all applications referenced assemblies in this range however applications which reference new assemblies in the range 2.0.0.0 + are breaking and should reference a different assembly.

Currently we use the GAC to deploy all our assemblies and it seems to work well. The GAC was chosen because of the following benefits in order of importance:

Ease of deployment. When shared assemblies are updated they only have to be updated in one location as opposed to four directories namely system32, windows\system32\inetsrv, c:\program files\Tranmit\Transaction Engine and C:\program files\Tranmit\Products\SprintLog. Deploying the assemblies to more than one location is obviously more error prone and an additional burden on an administrator installing patches.
Performance. When strongly named assembly files are loaded from a location other than the GAC the CLR compares hash values when the assembly is loaded. A hash of the file is performed every time an application executes. In addition any referenced assemblies in that assemblies manifest are also hashed and compared. This performance hit is required to ensure that the assembly file’s content hasn’t been tampered with.
Security. The GAC is only accessible to those users with sufficient privilege.
Traced references. Assemblies installed in the GAC can be added with a reference to the application. This prevents assemblies being removed accidentally and breaking client applications dependent on them.

Chris Sells author of the “Avoid the GAC” article actually agreed the GAC was the best choice in our case however I am keen to hear other peoples’ opinions on this interesting topic. I would certainly consider changing to XCOPY deployment if there is good reason.

Matt Adamson