Saturday, May 28, 2005

Choosing between an ascending or descending index

I was recently asked whether making an index ascending or descending would improve query performance in SQL Server. In almost all cases the direction would make absolutely no difference to the speed of retrieval. Performance of queries is usually best measured by examining the number of logical page reads required to find the data.

Searching a table for a specific row
Any search across a single non-clustered / clustered index to locate a specific row would always result in a similar number of pages read. In the non-clustered case the time to retrieve the row would be navigating the index pages through the binary trees root index page and then through the intermediate level index pages to find the leaf level index page which matches. Then if no clustered index existed on the table the index page would point to the data page which would then be read to obtain the data.

If a clustered index existed on the table the leaf level non clustered index page would point to the clustered index key. The clustered index key would then be examined in a similar way to the non clustered index to find the leaf level index page. However for a clustered index the data exists on the leaf level index page so no further page reads would occur.

Searching for a range of rows
Even cases when searches are performed for a range of rows e.g. ID BETWEEN 100 AND 105 direction of the index would make no difference. A single column index can be scanned well in both directions because the index pages at a given level are linked by next and prior pointers pointing to the next and previous index pages at that level. The query optimiser may decide to search for the data either by starting at ID 105 and moving backwards through the index pages, or from 100 and working forwards.

When does direction make a difference?
The direction you specify when creating an index is used to determine the order of the leaf level index pages. E.g. names Adamson, Bamber, Doherty, Roche would appear exactly in that order in an ascending index, and in a descending index they would appear as Roche, Doherty, Bamber and Adamson.

The only case I can think of when the direction would have the most impact is a query where a composite index was used and a query was submitted which
Covers all columns used in the index. This means all fields specified in the SELECT statement can be found in the index page.

  1. Covers all columns used in the index. This means all fields specified in the SELECT statement can be found in the index page.
  2. Specifies a different direction in the ORDER BY clause in the SELECT statement of the query to the direction the index was created in.

Consider a query

SELECT SupplierNo, CreatedDate
FROM Invoices
ORDER BY SupplierNo, CreatedDate DESC

WHERE CreatedDate BETWEEN ‘1/5/05’ AND ‘10/5/05’

If either two single non clustered indexes or a single composite index on SupplierNo ASC, CreatedDate ASC was created a Sort operator would still be used in the execution plan of the query. If we can remove the sort operator the query time should be reduced.

If you created a single composite non clustered index on SupplierNo ASC, CreatedDate DESC the index pages could be navigated without the sort operator being used and only an index scan performed. As the index is covered because all the columns requested existed on the index pages then no data pages would have to be navigated from the leaf level index pages.

If you decided to change the query and add a column which didn’t exist on the index such as a reference number, the data page would have to be located for each matching row on the index page. Depending on the total number of index pages and data page scans required to get all the data, the query optimiser may even decide not to use an index and simply perform a table scan.

This example cites the usefulness of the query execution plan to show you how slow a specific query is. This is the only effective method to diagnose potentially slow running queries and improve the performance in a methodical manner. You can either show an execution plan for a query directly in query analyser or capture the event from a SQL profiler trace in a production environment. Simply executing the query in the query analyser window and trying to time it manually will not yield good results unless the speed differences before and after are that noticeable.

Wednesday, May 18, 2005

Getting dumps for apps failing on start up

Applications such as services or out of process servers may generate errors on start up. Adplus provides a spawn option to launch processes and attach the debugger on startup. Unfortunately the spawn option cannot be used because the process is either started by the Service Control Manager (SCM) or the COM runtime loads the module. For these type of applications it is still possible to create dump files but the process is a little convoluted.

What we need is some mechanism to attach to the process as soon as it launches but before the error is raised. The debugging tools package provides a tool called gflags which allows us to specify a number of debugging options both globally and for specific applications. One of the options allows us to configure a debugger which is attached as soon as the process starts.

Enter the full image name in the image file page then press tab to enable all the other controls. Select the full path to the windbg debugger in the Debugger field which can either be cdb.exe or windbg.exe. Now start the failing application either by forcing the client to create the COM object or start the service. The debugger should then be launched ready to recieve commands.

Now we've broken in the debugger we need to send all the commands adplus usually sends to configure specific exceptions to generate dump files. The debugger command "sxe" controls the action the debugger takes when an exception occurs in the application being debugged. When you use an Adplus XML config file to configure what exception codes generate dumps for 1st or 2nd chance exceptions Adplus sends a number of these commands to the console debugger (CDB) e.g. one is shown below with some detail omitted.

sxe -c @"dump -u /mdi /c 1stchanceAVSCANmini.dmp; GN"
-c2 @"dump -u /ma /c 2ndchanceAVSCANfull.dmp; GN" av

The last parameter specifies the exception code ( access violation in this case ) and the -c and -c2 options specify the actions to take on first and second chance exceptions respectively. The /ma option effectively takes a mini dump with all additional options which is more commonly known as a full dump. However if you have a lot of different exception codes to configure this process can be very tedious and error prone. Fortunately when you run adplus it generates a file with a CFG extension in the Crash_.. folder. This file contains a list of all commands needed to be sent to CDB to configure all exception codes.

The most effective approach is to run adplus against any other process e.g. notepad, to generate a CFG file using the options you specify in an adplus XML config file. When the process is started and windbg breaks in you can use the "$<" debugger command with the full path of the cfg file i.e.

D:\Development\Debugging\Debugging Tools for Windows\DebuggerCommands.cfg

This will send all the necessary commands in the CFG file to the debugger to create the dump files for first and second chance exceptions. It is worth editing the CFG file first to modify the application from notepad in this example to the name of your app. Then simply press F5 or enter the g command to allow the debugger to continue and the dump files should be created.

Sunday, May 15, 2005

Should the GAC always be avoided?

Currently we use the Global Assembly Cache (GAC) to install shared assemblies which are called by a number of different applications; however articles such as the infamous “Avoid the GAC” article have made me question the usefulness of this approach.

To summarise our deployment we have perhaps 40 or so assemblies which are referenced by a number of different applications namely.

  • A windows service.
  • An ASP application.
  • A thick client windows application.
  • A .NET console application.

All applications use the same set of core assemblies to perform most of the functions e.g. core invoicing and ordering components as well as components which are applicable across many different functional areas.

Every time we rebuild and deploy a new component e.g. assembly A which has upgraded from say to we ship a new publisher policy assembly which redirects to references to and obviously also re installs the new updated assembly into the GAC. E.g. may fix an issue in the database where a stored procedure is being used with an incorrect parameter order causing data integrity issues.

This deployment works well in that components which are shared amongst all applications only need to be installed in one place. In the future we may provide breaking components with major functionality changes e.g. we introduce a new assembly A version which may have breaking changes. In this case we envisage we may update the publisher policy with multiple redirections setup such as - ---> - --->

In this way we can support side by side versioning of both sets of assemblies. We make a statement with the publisher policy that version increments containing changes across – are compatible and are needed for all applications referenced assemblies in this range however applications which reference new assemblies in the range + are breaking and should reference a different assembly.

Currently we use the GAC to deploy all our assemblies and it seems to work well. The GAC was chosen because of the following benefits in order of importance:

  • Ease of deployment. When shared assemblies are updated they only have to be updated in one location as opposed to four directories namely system32, windows\system32\inetsrv, c:\program files\Tranmit\Transaction Engine and C:\program files\Tranmit\Products\SprintLog. Deploying the assemblies to more than one location is obviously more error prone and an additional burden on an administrator installing patches.
  • Performance. When strongly named assembly files are loaded from a location other than the GAC the CLR compares hash values when the assembly is loaded. A hash of the file is performed every time an application executes. In addition any referenced assemblies in that assemblies manifest are also hashed and compared. This performance hit is required to ensure that the assembly file’s content hasn’t been tampered with.
  • Security. The GAC is only accessible to those users with sufficient privilege.
  • Traced references. Assemblies installed in the GAC can be added with a reference to the application. This prevents assemblies being removed accidentally and breaking client applications dependent on them.

Chris Sells author of the “Avoid the GAC” article actually agreed the GAC was the best choice in our case however I am keen to hear other peoples’ opinions on this interesting topic. I would certainly consider changing to XCOPY deployment if there is good reason.