Thursday, September 06, 2007

My first open source project - Web modelling UML add-in for Enterprise Architect

I recently created my first open source project, EA WebModeller Add In, an add-in for the Enterprise Architect (EA) UML modelling tool. The add-in fills a much-needed gap in the product by enabling users to reverse engineer web projects into UML artifacts, specifically using the web extensions for UML which Jim Conallen produced. The web extensions provide a standardised way of modelling the components which make up a web application.

A seemingly simple concept, which most would think of as just one web page associated with another, quickly becomes very complex. That is why Jim could write a substantial book on the subject and still leave room for more. Classes are used with a variety of UML stereotypes to define the components which combine to produce a web page, e.g. a “server side page” which builds a “client side page” which aggregates a number of “forms”. Various stereotypes of association are also defined to support the different relationships between the components, e.g. one server side page redirects to another server side page, or a client side page links to another client side page (through hyperlinks perhaps). Components contained within pages, both on the server and on the client, are also modelled appropriately.
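To make the idea of stereotyped classes and stereotyped associations concrete, here is a minimal sketch in Python. It is not taken from the add-in or from the web extensions themselves: the `Element`, `Association` and `WebModel` names, the `form1` element and the stereotype strings are assumptions chosen to mirror the wording above.

```python
from dataclasses import dataclass, field


@dataclass
class Element:
    """A UML class carrying a stereotype (hypothetical structure)."""
    name: str
    stereotype: str  # e.g. "server side page", "client side page", "form"


@dataclass
class Association:
    """A stereotyped association between two elements."""
    source: Element
    target: Element
    stereotype: str  # e.g. "builds", "aggregates", "redirects", "links"


@dataclass
class WebModel:
    elements: list = field(default_factory=list)
    associations: list = field(default_factory=list)

    def add(self, name, stereotype):
        element = Element(name, stereotype)
        self.elements.append(element)
        return element

    def relate(self, source, target, stereotype):
        self.associations.append(Association(source, target, stereotype))

    def targets(self, source, stereotype):
        """Names of the elements reached from `source` via one stereotype."""
        return [a.target.name for a in self.associations
                if a.source is source and a.stereotype == stereotype]


# A server side page builds a client side page, which aggregates a form.
model = WebModel()
server_page = model.add("1.asp", "server side page")
client_page = model.add("1.asp (client)", "client side page")
form = model.add("form1", "form")  # illustrative form name
other_page = model.add("2.asp", "server side page")

model.relate(server_page, client_page, "builds")
model.relate(client_page, form, "aggregates")
model.relate(server_page, other_page, "redirects")
```

Asking the model for `model.targets(server_page, "redirects")` then answers the kind of question a diagram answers visually: which pages does this one hand off to?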

The example web project included with the download available on SourceForge contains a number of scenarios, e.g. file 1.asp includes files include1.asp and include2.asp. Both included files contain server side functions. 1.asp also contains a redirect call to 2.asp, an embedded class, some client side functions (in JavaScript) and an embedded form. The 1.asp server side page and its associated elements are shown in the diagram below.

This is produced by the following settings in the add-in:

The add-in is built in an extensible way so that parsers for new script languages can be easily integrated within the framework, e.g. a ScriptParserBase base class has JavaScriptParser and VBScriptParser derived classes. Similarly, the ServerPage class, which models the UML server side page, has derived classes ActiveServerPage (ASP), JavaServerPage (JSP) and DotNetActiveServerPage (ASP.NET) for each of the web application technologies. Note that only the ActiveServerPage class, for ASP pages, is fully supported. I’m hoping the open source community will develop the JSP and ASP.NET technologies :)
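The shape of that hierarchy can be sketched as follows. This is Python rather than the add-in's own code, and only the class names come from the description above. The method names (`function_names`, `create_parser`, `server_function_names`) and the simplified regular expressions are assumptions made for illustration.

```python
import re
from abc import ABC, abstractmethod


class ScriptParserBase(ABC):
    """Base class for script parsers; one subclass per script language."""

    @abstractmethod
    def function_names(self, source):
        """Return the names of the functions declared in `source`."""


class JavaScriptParser(ScriptParserBase):
    _pattern = re.compile(r"\bfunction\s+(\w+)\s*\(")

    def function_names(self, source):
        return self._pattern.findall(source)


class VBScriptParser(ScriptParserBase):
    # Simplified: matches "Function X" / "Sub X" at the start of a line.
    _pattern = re.compile(
        r"^\s*(?:public\s+|private\s+)?(?:function|sub)\s+(\w+)",
        re.IGNORECASE | re.MULTILINE,
    )

    def function_names(self, source):
        return self._pattern.findall(source)


class ServerPage(ABC):
    """Models a server side page; subclasses pick the parser to use."""

    def __init__(self, path):
        self.path = path

    @abstractmethod
    def create_parser(self):
        """Return the ScriptParserBase for this page's server side script."""

    def server_function_names(self, source):
        return self.create_parser().function_names(source)


class ActiveServerPage(ServerPage):
    def create_parser(self):
        return VBScriptParser()


class JavaServerPage(ServerPage):
    def create_parser(self):
        # Placeholder: mirrors the note that JSP is not yet fully supported.
        raise NotImplementedError("JSP parsing is not implemented in this sketch")


class DotNetActiveServerPage(ServerPage):
    def create_parser(self):
        # Placeholder: mirrors the note that ASP.NET is not yet fully supported.
        raise NotImplementedError("ASP.NET parsing is not implemented in this sketch")
```

With this layout, supporting a new technology means adding one ServerPage subclass and, if needed, one ScriptParserBase subclass, without touching the existing classes.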

The add-in mostly uses regular expressions to parse the script source into its appropriate components. They range from very simple expressions such as #include\s+(?<PathType>virtual|file)=""(?<IncludedFile>.*), which parses include directives within ASP pages, to relatively complex ones. In the simple expression, PathType and IncludedFile are named capture groups, with PathType capturing the type of file reference, e.g. virtual or absolute file location. An example of a relatively complex expression is “^\s*function\s*(?\w+)\((?.*)\)\s*(?(?:(?\{)(?\})(?(LeftBrace)[^\{\}]*.*))*)”, which uses “balancing groups” to support recursion within the expression. This power is required to match whole function definitions because we can’t simply match an opening brace and the next closing brace: the first closing brace we find may not be closing the function at all, but the first for loop defined within it.
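Both ideas can be tried out in a few lines of Python. Two caveats: the patterns below are simplified rewrites rather than the add-in's own expressions, and Python's `re` module has no balancing groups, so the nested braces are handled here with an explicit depth counter instead. The function names `find_includes` and `find_functions` are made up for this sketch, and it deliberately ignores braces inside strings and comments.

```python
import re

# Named groups capture the reference type and the included file name.
INCLUDE_PATTERN = re.compile(
    r'#include\s+(?P<PathType>virtual|file)\s*=\s*"(?P<IncludedFile>[^"]*)"',
    re.IGNORECASE,
)

# Matches a function header up to and including its opening brace.
FUNCTION_HEADER = re.compile(
    r"\bfunction\s+(?P<Name>\w+)\s*\((?P<Params>[^)]*)\)\s*\{"
)


def find_includes(source):
    """Return (path type, included file) pairs for each include directive."""
    return [
        (match.group("PathType").lower(), match.group("IncludedFile"))
        for match in INCLUDE_PATTERN.finditer(source)
    ]


def find_functions(source):
    """Return (name, parameters, body) for each top-level function.

    The body ends at the brace that brings the nesting depth back to zero,
    not at the first closing brace encountered.
    """
    functions = []
    position = 0
    while True:
        match = FUNCTION_HEADER.search(source, position)
        if match is None:
            break
        depth = 1  # the header already consumed the opening brace
        index = match.end()
        while index < len(source) and depth > 0:
            if source[index] == "{":
                depth += 1
            elif source[index] == "}":
                depth -= 1
            index += 1
        if depth != 0:
            break  # unbalanced braces: stop rather than guess
        body = source[match.end():index - 1]
        functions.append(
            (match.group("Name"), match.group("Params").strip(), body)
        )
        position = index
    return functions


asp_source = (
    '<!-- #include virtual="/inc/include1.asp" -->\n'
    '<!-- #include file="include2.asp" -->'
)
js_source = (
    "function total(items) { for (var i = 0; i < 3; i++) { sum += i; } "
    "return sum; }\n"
    "function noop() {}"
)
includes = find_includes(asp_source)
functions = find_functions(js_source)
```

In the `total` example, the first closing brace belongs to the for loop, which is exactly the case the paragraph above describes. The depth counter keeps scanning until the function's own brace is reached.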

To parse the client side page, the Internet Explorer ActiveX control is used to load the HTML file produced when navigating to the web page. By using the IE control, the parts of the web page, such as embedded forms and the controls within each form, can be found and added to the UML components. I actually wrote a web application testing framework around this control; however, recently I’ve started to use the WatiN framework, which is more feature rich.
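The kind of information pulled out of the client side page can be illustrated without the IE control. The sketch below swaps in Python's standard `html.parser` module, so it only sees static HTML and not whatever a browser would build after running script. The `FormCollector` and `collect_forms` names are assumptions for this example.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collects each form on a page together with the controls inside it."""

    CONTROL_TAGS = {"input", "select", "textarea", "button"}

    def __init__(self):
        super().__init__()
        self.forms = []
        self._current = None  # the form currently being read, if any

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "form":
            self._current = {"name": attributes.get("name"), "controls": []}
            self.forms.append(self._current)
        elif tag in self.CONTROL_TAGS and self._current is not None:
            self._current["controls"].append((tag, attributes.get("name")))

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def collect_forms(html):
    """Return a list of forms, each with its name and (tag, name) controls."""
    collector = FormCollector()
    collector.feed(html)
    collector.close()
    return collector.forms


page = (
    '<html><body><form name="login">'
    '<input type="text" name="user">'
    '<input type="password" name="pass"/>'
    '<select name="role"></select>'
    '</form><input name="outside"></body></html>'
)
forms = collect_forms(page)
```

Each collected form could then become a “form” element aggregated by the client side page, with its controls as members.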

Given that the main web application our team develops is written in ASP and has many hundreds of pages, the productivity boost from using the add-in is huge. Although we couldn’t possibly try to reverse engineer the whole web application in one go, being able to single out and reverse engineer a few web pages when adding new features or doing some refactoring is of great help. It clearly shows the affected pages and where the new logic is best placed and, perhaps most importantly, helps to show the impact of any changes on existing functionality.

3 comments:

Anonymous said...

Hi,

great work. I really would like to test this addin, but I somehow can't get it to compile (dependency issue). Is there a way to obtain the addin (I am particularly interested in the JS parsing part) in binary form?

thanks,
Tobias.

Matt Adamson said...

Hi Tobias,

Let me know the dependency issue you face, as I'm really keen to understand why this won't compile.

The source code package on SourceForge should compile cleanly without any modifications.

Anonymous said...

okay, I have several issues, some of which are probably due to my studio-installation (?) I'll list them here.

1. Opening the WebModellerAddIn.sln I get an error-msg "..cannot open project-file [...]\EAModelTest.csproj because this installation does not support it". I use Visual Studio 2005 Pro; which version did you use to create the ModelTest solution? But I probably won't even need the unit tests...

2. After opening, the warnings list is filled with 11 warnings:
9 times "cannot find the component X" ("SHdocVw", "Interop.EA",
"Microsoft.mshtml", "System",
"System.configuration", "System.Data",
"System.Drawing", "System.Windows.Forms" and "System.XML").
Then "error with loading the property OutputPath. the given path is no valid path"
And "the directory F:\Development\Design\Sparx Systems\EA\ cannot be created. [...]".
I figured that the last one can be specified in the project properties (debug settings), which I did. But I keep getting it (when trying to "build", even as an "error") even after setting it to a directory which actually exists on my machine.
The "OutputPath" property warning also stays; I don't know where to modify this.
And finally the 9 warnings about the missing components: how can I get rid of these?

3. Building does not work, of course, because of the error mentioned above (it cannot find the directory F:\Development... even after I changed the project properties. Is there another place where your development directory is mentioned? I cannot find it.)

thank you, and sorry for the rough English, but I use a German version of the "Studio", so the error-msgs are probably a bit different in the English studio version.

Tobias.