Friday, February 24, 2006

VBScript plug-ins for ASP applications

Recently one of the developers in my team was adding some custom code to a COM table rendering object we've built for an ASP web application. The object essentially takes source SQL with added metadata and creates an HTML table, which is returned to the calling ASP application and then written out using Response.Write.

The design required some very custom presentation within one of the table cells. The table rendering object is quite generic (knowing little about specific tables in the database schema), while the custom presentation task required work on very specific tables. Custom work is associated with a table cell using metadata tags such as "SubordinateList(%UserID%)". Assume the table object is rendering a list of users and, if a user is a manager, displaying a comma-separated list of subordinate user names.

Although this example is contrived (our real example would require a lot more explanation), the design issue is the same. The obvious easy approach would be to add handling within the table rendering object, in a switch statement matching on the start of the tag "SubordinateList", and then perform some custom processing in SQL to find the list of subordinates. The list would then be converted into a comma-separated string which would be added to the table cell. However, this business logic ideally should not live inside a generic table rendering engine.

You may have seen plug-in design patterns which support the extensibility we are trying to achieve here. Typically this requires a well-defined interface, e.g. IValueResolver, which is defined and called by the table rendering object. Implementations of IValueResolver could then be created in one or many plug-in components. The IValueResolver interface need only require one method, such as GetValue, which takes a row, a column and the table data.

Although ASP, using VBScript, doesn't support strongly typed interfaces, we can still use the late-bound IDispatch support in VBScript to implement this design. We could create a VBScript class on the ASP page such as:

Class SubordinateListValueResolver
    Dim UserColumn

    Public Function GetValue(row, col, tableData)
        ' Work to retrieve the list of subordinates from the user specified
        ' in the column UserColumn in the row from the tableData data source.
    End Function
End Class


The plug-in class could then be instantiated on the ASP page and passed to the table rendering object using code such as:

Dim oValueResolver
Set oValueResolver = New SubordinateListValueResolver
oValueResolver.UserColumn = 4 ' Means the 4th column in the result

Call oTableRenderer.AddValueResolverPlugIn(oValueResolver, "SubordinateList")

The table rendering object can then determine whether any plug-ins exist for the metadata tags passed when the data source SQL is specified, e.g. if a matching plug-in object is registered for the tag SubordinateList, the VBScript object of type SubordinateListValueResolver is returned. The GetValue method can then be called directly from within the table rendering object, passing the row, column and table data, i.e.

oCellResolvedValue = plugin.GetValue(row, col, tableData)

Although designs which use well-defined interfaces and contain the plug-in code in compiled languages are preferable to this approach, it does demonstrate how easily a simple plug-in mechanism can be developed for a script-based application.

Tuesday, February 14, 2006

Remote debugging - Release build optimisations

I recently had to diagnose an issue which required remote debugging the application, as no exception was raised and therefore no dump could be obtained and analysed. The issue was caused by an error code returned deep down in the call stack being ignored a few levels further up :(

Whilst this was trivial to debug using the windbg client and dbgsrv on the server (both part of the Debugging Tools for Windows package), I found in some cases, particularly where the error code was being returned, that local variables couldn't be seen. The message <Memory access error> was displayed in the locals window for each variable.

Our standard process is to generate debugging symbols with each release build by simply selecting the generate debugging symbols option (http://msdn.microsoft.com/library/en-us/dnvc60/html/gendepdebug.asp). However, to accurately see variables in both the watch and locals windows you also need to disable optimisations, i.e. under Project Settings / C++ / Optimizations, set this to Disable (Debug).

Obviously this should only be disabled for ad hoc builds, as you will most likely want to enable all optimisations in the standard builds you deploy. It's hard enough trying to get our applications to be as performant as possible without disabling the optimisations the compiler gives us automatically :)

Friday, February 10, 2006

Database filtering of documents - performance issues

Our application has a requirement to filter the documents a user can see based on various criteria e.g. security / approval rights. A user may only wish to see documents which they can approve or have approved. This would seem to be a common pattern for a lot of business applications.

To implement this, our business objects create a list of document identifiers which is then appended to a SQL statement that selects the data and filters the documents returned. We used to do this with an IN clause, using a pattern such as


SELECT * FROM PurchaseOrders WHERE PurchaseOrderID IN ( CommaSeparatedListOfIDs )

where the list of order IDs was generated by the business object. As usage of the application quickly increased we hit the stack overflow issue described in http://support.microsoft.com/kb/q288095. To resolve this we changed our approach to use temporary tables, i.e. insert all the IDs from the business object into a temporary table and then use it to filter the main query, e.g.

SELECT * FROM PurchaseOrders WHERE PurchaseOrderID IN ( SELECT ID FROM UniqueTempTableName )

The IDs are inserted into the temporary table using batched INSERTs, e.g. in batches of 20 statements at a time. However, when upwards of 10000 document identifiers are inserted into the table, it can take over 5 seconds. E.g. an ASP screen which presents a list of invoices takes 15 seconds, of which 11 seconds is attributed to the time taken to insert into the temporary table. Over the last day I've been exploring ways to minimise this.
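
To make the cost concrete, the generated work is essentially one INSERT statement per identifier against the temporary table, followed by the filtered query. A rough sketch of what this looks like (the ID values are made up, and UniqueTempTableName is the same placeholder as above):

CREATE TABLE UniqueTempTableName ( ID int PRIMARY KEY )

-- Sent in batches of 20 statements per round trip, repeated until all IDs are inserted
INSERT INTO UniqueTempTableName ( ID ) VALUES ( 1041 )
INSERT INTO UniqueTempTableName ( ID ) VALUES ( 1042 )
INSERT INTO UniqueTempTableName ( ID ) VALUES ( 1057 )
-- ...

SELECT * FROM PurchaseOrders WHERE PurchaseOrderID IN ( SELECT ID FROM UniqueTempTableName )

With 10000 identifiers that is 10000 individual INSERT statements, which is where most of the time goes.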

Extended stored procedure
My first thought was to create an extended stored procedure which would produce an in-memory result set from a comma-separated list of identifiers. The extended stored procedure would be called from a UDF which would return a table variable that could then be used in a SELECT statement, e.g.

CREATE FUNCTION dbo.Tfn_IDs ( @listIdentifiers text )
RETURNS @idTable TABLE ( id int )
AS
BEGIN
    INSERT INTO @idTable
    EXEC master..xp_IDContainer @listIdentifiers
    RETURN
END


However, unfortunately, EXEC cannot be used as the source for an INSERT INTO statement inside a user-defined function, so the UDF above cannot be created.
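
For what it's worth, the INSERT ... EXEC pattern itself is fine outside a function, e.g. in an ad hoc batch or stored procedure; it is only within a function body that SQL Server rejects it. A minimal sketch, assuming the hypothetical xp_IDContainer extended procedure existed and returned a single-column result set of IDs:

-- Legal in a batch or stored procedure: capture the extended procedure's result set
CREATE TABLE #ids ( id int )

INSERT INTO #ids ( id )
EXEC master..xp_IDContainer '1,2,3'

-- The equivalent INSERT ... EXEC inside the body of dbo.Tfn_IDs is rejected,
-- which is why the UDF approach above falls down.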

OPENXML
I then realised I may be able to generate an XML representation of the identifiers using the extended stored procedure, e.g. in the format

<Values>
  <Value ID="84"/>
  <Value ID="85"/>
</Values>

and then load the XML (held in @xmlIdentifiers below) with sp_xml_preparedocument and select from it with OPENXML, e.g.

DECLARE @hDoc int
EXEC sp_xml_preparedocument @hDoc OUTPUT, @xmlIdentifiers
INSERT INTO @idTable
SELECT ID
FROM OPENXML (@hDoc, '/Values/Value')
WITH ( ID int )
EXEC sp_xml_removedocument @hDoc


UDF Only
I then went back to an approach without an extended stored procedure, using only a UDF. The UDF parses the comma-separated list of values and inserts them into a table variable which is returned. I won't outline the solution in detail, but it basically uses CHARINDEX to find the start and end position of each identifier, with the comma as the delimiter (a rough sketch is below). It would be interesting to compare the timings of this and the OPENXML approach. The XML approach could be slower because of the more verbose structure of the data and the cost of parsing the XML internally in SQL Server and evaluating the XPath expression; however, that may be offset by using a single INSERT INTO ... SELECT rather than one INSERT INTO statement per identifier found in the UDF-only approach.
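
A rough sketch of such a splitting UDF is shown below. This is an illustration of the shape rather than our production code; note the varchar(8000) parameter, which is exactly the length ceiling that causes the problem described next.

CREATE FUNCTION dbo.Tfn_SplitIDs ( @listIdentifiers varchar(8000) )
RETURNS @idTable TABLE ( id int )
AS
BEGIN
    DECLARE @pos int, @nextPos int
    SET @pos = 1
    SET @nextPos = CHARINDEX(',', @listIdentifiers, @pos)
    WHILE @nextPos > 0
    BEGIN
        -- Identifier between the current position and the next comma
        INSERT INTO @idTable ( id )
        VALUES ( CAST(SUBSTRING(@listIdentifiers, @pos, @nextPos - @pos) AS int) )
        SET @pos = @nextPos + 1
        SET @nextPos = CHARINDEX(',', @listIdentifiers, @pos)
    END
    -- Final identifier after the last comma
    IF @pos <= LEN(@listIdentifiers)
        INSERT INTO @idTable ( id )
        VALUES ( CAST(SUBSTRING(@listIdentifiers, @pos, LEN(@listIdentifiers) - @pos + 1) AS int) )
    RETURN
END

It would be used in the filter query in the same way as the temporary table, e.g.

SELECT * FROM PurchaseOrders WHERE PurchaseOrderID IN ( SELECT id FROM dbo.Tfn_SplitIDs('84,85,86') )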

However, after increasing the size of the input it quickly became apparent that text data passed to CHARINDEX is truncated to varchar(8000), so this won't work with a long text value containing 10000 identifiers averaging 3 characters each, which with delimiters is well over the 8000-character limit.

Back to a UDF + extended stored procedure
To overcome the issues faced in the UDF-only solution, an extended stored procedure would probably be the logical place to put the parsing of the comma-separated string into values used for the insert. However, instead of using the result set of an EXEC call as the source of an INSERT INTO statement, which we've shown isn't possible inside a function, we could use an output parameter to retrieve each identifier. Pseudo T-SQL could look like

EXEC master..xp_IDContainer_ParseValue '1,2,3', @sessionToken OUTPUT
EXEC master..xp_IDContainer_EndOfResults @sessionToken, @endOfResults OUTPUT
WHILE @endOfResults = 0
BEGIN
    EXEC master..xp_IDContainer_GetNextID @sessionToken, @ID OUTPUT
    INSERT INTO @idTable ( id ) VALUES ( @ID )  -- insert the id into the table variable
    EXEC master..xp_IDContainer_EndOfResults @sessionToken, @endOfResults OUTPUT
END


Has anyone solved this problem using the approaches I've highlighted, or other approaches which work well with large volumes of data?