Thursday, September 06, 2007

My first open source project - Web modelling UML add-in for Enterprise Architect

I recently created my first open source project called EA WebModeller Add In which is an add in for the Enterprise Architect ( EA) UML modelling tool. The add in provides a much filled gap to the product to enable users to reverse engineer web projects into UML artifacts. Specifically using the web extensions for UML which Jim Conallen produced. The web extensions provide a standardised way of modelling the components which make up a web application.

A seemingly simple concept, which most would just think of as one web page associated with another quickly becomes very complex and is proof of why Jim could write a substantial book on this subject and still leave room for more. Classes are used with a variety of different UML stereotypes to define the various components which combine to produce a web page e.g. server side page which builds a “client side page” which aggregates a number of “forms”. Various stereotypes of association are also defined to support different associations between the components e.g. one server side page redirects to another server side page or a client side pages links to another client side page ( through hyperlinks perhaps ). Components contained within pages both on the server and client are also modelled appropriately.

The example web project included with the download available on source forge contains a number of scenarios e.g. file 1.asp includes files include1.asp and include2.asp. Both included files contain server side functions. 1.asp also contains a redirect call to 2.asp, an embedded class, some client side functions ( in java script ) and an embedded form. The 1.asp server side page and it’s associated elements are shown in the diagram below

This is produced by the following settings in the add in

The add in is built in an extensible way to allow new script parsing languages to be easily integrated within the framework e.g. a base ScriptParserBase class contains JavaScriptParser and VBScriptParser derived classes. Similarly the ServerPage class which models the UML server side page contains derived classes ActiveServerPage (ASP ) JavaServerPage (JSP) and DotNetActiveServerPage ( ASP.NET ) for each of the web application technologies. Note only the ActiveServerPage class is fully supported for ASP pages, I’m hoping the open source community will develop the JSP and ASP.NET technologies :)

The add in uses mostly regular expressions to parse the script source into it’s appropriate components. They range from very simple expressions such as #include\s+(?virtualfile)=""(?.*) which parses include directives within ASP pages. The PathType and IncludedFile are named captured groups which here capture the type of file reference e.g. virtual or absolute file location. To relatively compex expressions such as “^\s*function\s*(?\w+)\((?.*)\)\s*(?(?:(?\{)(?\})(?(LeftBrace)[^\{\}]*.*))*)” which use “balance grouping” to support recurision within the expression. This power is required here to match all function definitions because we can’t simply match an opening and closing brace as we may find the first closing brace is not closing the function, it’s in fact closing the first for loop defined within the function.

To parse the client side page the internet explorer active x control is used to load the htm file produced when navigating to the web page. By using the IE control, the parts of the web page such as embedded forms and controls within the form can be found and added to the UML components. I actually wrote a web application testing framework around this control however recently I’ve started to use the WatIN framework which is more feature rich.

Given the main web application our team develops is written in ASP with many hundreds of pages, the productivity boost from using the add in is huge. Although we couldn’t possibly try and reverse engineer the whole web application in one go, being able to selectively single out and reverse engineer a few web pages when adding new features or doing some refactoring is of great help. It clearly shows the effected pages and where the new logic is best placed and perhaps most importantly helps to show the impact of any changes on existing functionality.