Reverse Engineering an OpenAPI Specification into Sparx Enterprise Architect
Introduction
When analyzing a large OpenAPI Specification (OAS), reading thousands of lines of YAML is time-consuming and unproductive. One has to manually translate the specification into a business object model, either on paper, with a drawing tool, or in a specialized software design repository such as Sparx EA.
What’s the difference between an architecture / design repository and a drawing tool such as Lucidchart or Visio? Both types of applications allow the user to draw diagrams. However, an architecture repository stores the components and diagram information in a common database, facilitating the reuse of components and their relationships across diagrams.
Starting with version 16, Sparx EA uses SQLite for local repositories. This change was the trigger for this article, which shows how to reverse engineer an OpenAPI specification into the Sparx EA database and how to create the associated diagrams for the resources (a.k.a. the object model) and the paths.
In and Out of Scope
The code that I am sharing is a proof of concept; it is not by any means perfect or production grade. The OpenAPI Specification is vast, and covering all of its use cases is beyond the scope of the project.
In Scope
- Paths, methods, and components are analyzed and saved to the model database.
- For paths, the methods and their query and path parameters are captured.
- For resources, the schema components with their fields and field constraints are captured. Aggregation and inheritance relationships between resources are included in the model.
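To make the scope more concrete, here is a minimal sketch of how schema components could be walked to collect fields, field constraints, and relationships. It is not the code from the repository. The `describe_schema` function, the `CONSTRAINT_KEYS` list, and the sample `spec` dictionary are assumptions made for illustration; the dictionary stands in for a specification that has already been loaded from YAML.

```python
# Illustrative sketch only; not the code from the repository.
# `spec` stands in for an OpenAPI document already parsed from YAML.
spec = {
    "components": {
        "schemas": {
            "Party": {
                "type": "object",
                "properties": {"name": {"type": "string", "maxLength": 50}},
            },
            "Customer": {
                "allOf": [
                    {"$ref": "#/components/schemas/Party"},
                    {
                        "type": "object",
                        "properties": {
                            "since": {"type": "string", "format": "date"},
                            "accounts": {
                                "type": "array",
                                "items": {"$ref": "#/components/schemas/Account"},
                            },
                        },
                    },
                ]
            },
            "Account": {
                "type": "object",
                "properties": {"id": {"type": "string", "minLength": 1}},
            },
        }
    }
}

# Assumed subset of constraint keywords worth keeping on an attribute.
CONSTRAINT_KEYS = ("format", "minLength", "maxLength", "pattern", "minimum", "maximum")


def ref_name(ref):
    """Return the last segment of a $ref, e.g. 'Account'."""
    return ref.rsplit("/", 1)[-1]


def describe_schema(schema):
    """Collect attributes, parent types (allOf) and aggregated types ($ref fields)."""
    attributes, parents, aggregates = {}, [], []
    # A schema using allOf is split into referenced parents and inline parts;
    # a plain schema is treated as a single inline part.
    for part in schema.get("allOf", [schema]):
        if "$ref" in part:
            parents.append(ref_name(part["$ref"]))
            continue
        for field, definition in part.get("properties", {}).items():
            target = definition.get("items", definition)  # unwrap arrays
            if "$ref" in target:
                aggregates.append((field, ref_name(target["$ref"])))
            else:
                constraints = {k: definition[k] for k in CONSTRAINT_KEYS if k in definition}
                attributes[field] = (definition.get("type", "object"), constraints)
    return attributes, parents, aggregates


model = {name: describe_schema(s) for name, s in spec["components"]["schemas"].items()}
```

In this sketch, `Customer` ends up with `Party` as a parent (inheritance), an aggregation to `Account`, and a `since` attribute that keeps its `format: date` constraint.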
Out of Scope
- The current version of the code ignores descriptions. They may be added at a later time. Descriptions may be very verbose, and they clutter the model.
- Some folks prefer to declare parameters in the OAS and then reference them in the path definitions. At this point in time, referenced parameters are not analyzed, resulting in a loss of information at the method level. This is a gap that will be addressed in the next iteration of the code.
- The security specification is not reverse engineered because it is difficult to model.
- References to other files are not unpacked. When a type is not present in the specification, it is modelled as an attribute with the name of the referenced object.
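As a hedged illustration of what handling references could look like, the sketch below resolves a local `$ref` and falls back to the referenced name when the target lives in another file. The repository code does not do this for parameters today; `resolve_local_ref`, `resolve_parameter`, the `unresolved` flag, and the sample data are all hypothetical.

```python
# Illustrative sketch only; the repository code does not resolve parameter references.
spec = {
    "components": {
        "parameters": {
            "PageSize": {"name": "page-size", "in": "query", "schema": {"type": "integer"}}
        }
    }
}


def resolve_local_ref(document, ref):
    """Follow a local reference such as '#/components/parameters/PageSize'.

    Returns None for references that point to other files or to missing nodes.
    """
    if not ref.startswith("#/"):
        return None  # external file: left unresolved
    node = document
    for key in ref[2:].split("/"):
        # JSON Pointer escapes: '~1' stands for '/', '~0' stands for '~'
        key = key.replace("~1", "/").replace("~0", "~")
        if not isinstance(node, dict) or key not in node:
            return None
        node = node[key]
    return node


def resolve_parameter(document, parameter):
    """Return the parameter definition, dereferencing a local $ref when possible."""
    if "$ref" not in parameter:
        return parameter
    resolved = resolve_local_ref(document, parameter["$ref"])
    if resolved is None:
        # Fall back to the referenced name so the information is not lost entirely.
        return {"name": parameter["$ref"].rsplit("/", 1)[-1], "unresolved": True}
    return resolved


local = resolve_parameter(spec, {"$ref": "#/components/parameters/PageSize"})
external = resolve_parameter(spec, {"$ref": "common.yaml#/components/parameters/Limit"})
```

Here `local` is the full `page-size` definition, while `external` only carries the name `Limit`, which mirrors the "name of the referenced object" fallback described above.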
For the use cases the code does not handle, the errors / exceptions are logged and the program continues. This approach results in a model that may miss a few types, but is still useful for analysis and understanding.
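The log-and-continue approach can be sketched as follows. This is a simplified illustration, not the repository code: `import_schema` is a hypothetical handler that rejects one unsupported construct, and the logger name is made up.

```python
import logging

logging.basicConfig(level=logging.WARNING)
log = logging.getLogger("oas_reverse")  # hypothetical logger name


def import_schema(name, schema):
    """Hypothetical per-schema handler; raises on constructs it does not support."""
    if "anyOf" in schema:
        raise NotImplementedError("anyOf is not modelled")
    return {"name": name, "fields": sorted(schema.get("properties", {}))}


def import_all(schemas):
    """Import what can be imported; log and skip whatever fails."""
    imported, skipped = [], []
    for name, schema in schemas.items():
        try:
            imported.append(import_schema(name, schema))
        except Exception:
            # log.exception records the stack trace; the loop then moves on
            log.exception("Skipping schema %s", name)
            skipped.append(name)
    return imported, skipped


imported, skipped = import_all({
    "Account": {"properties": {"id": {"type": "string"}}},
    "Payload": {"anyOf": [{"type": "string"}, {"type": "integer"}]},
})
```

Keeping a list of skipped names makes it easy to see, after the run, which types are missing from the model and need manual attention.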
Notes
- The OAS allows objects to be defined using the ‘any of’ instruction. When this instruction is used, it means that the OAS designer wants to use a generic object to pass in one instance of another object type. I have not yet decided on the best way to model this instruction. To account for inheritance, one can achieve the same goal by using the ‘all of’ instruction for the more specific objects and, in addition, defining a ‘discriminator’ on the more generic object. The ‘all of’ approach is modelled using inheritance.
- The names of the path objects are inferred from the URL of the path. The code is strongly opinionated about the naming convention that the path URLs use. It assumes that URLs use the plural noun that corresponds to the root entity of the path and that, for single objects, the path parameter follows the plural noun. Example: /accounts and /accounts/{account-id}. The resulting path object names are Accounts and Account. I have seen many examples where semantically strong path naming conventions are not followed, e.g. groups of endpoints that all end with …./execute. In these situations the model will contain objects with the same name in the same package. While Sparx EA does not throw an error, manual intervention is needed to improve the quality of the information in the model.
- Some standards use the SOAP-era style of imposing constraints on field lengths or defining other basic formats, e.g. date, time, etc., as they would in an XSD. When these additional fields are defined in the specification, the model becomes polluted with unnecessary information. The OpenAPI standard allows designers to solve this problem by using the (type, format) tuple, e.g. type: string and format: date, or by defining field-level restrictions, e.g. minLength = 1, maxLength = 50, regex, etc.
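The path naming convention described in the notes can be sketched as below. This is an approximation under stated assumptions, not the repository code: the `path_object_name` and `singular` helpers, the `Root` placeholder, the naive singularisation rules, and the treatment of nested paths (the last non-parameter segment wins) are all my own choices for the example.

```python
import re


def singular(noun):
    """Naive English singularisation; good enough for an illustration."""
    if noun.endswith("ies"):
        return noun[:-3] + "y"
    if noun.endswith("s") and not noun.endswith("ss"):
        return noun[:-1]
    return noun


def path_object_name(url):
    """Infer an object name from a path URL using the plural-noun convention."""
    segments = [s for s in url.strip("/").split("/") if s]
    # The name comes from the last segment that is not a {parameter}.
    nouns = [s for s in segments if not s.startswith("{")]
    if not nouns:
        return "Root"  # assumption: placeholder name when no noun is available
    noun = nouns[-1]
    if segments[-1].startswith("{"):
        # A trailing path parameter means the path addresses a single object.
        noun = singular(noun)
    # 'credit-cards' -> 'CreditCards'
    words = [w for w in re.split(r"[-_]", noun) if w]
    return "".join(w[:1].upper() + w[1:] for w in words)


names = [
    path_object_name(u)
    for u in ("/accounts", "/accounts/{account-id}", "/payments/{id}/execute", "/orders/{id}/execute")
]
```

The last two URLs both yield `Execute`, which shows how endpoints that do not follow the convention produce duplicate object names in the same package.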
Tests
I have tested the code against several public standards, such as FDX and BIAN, and against the YAML specification generated by the forward engineering code you read about in the second article of the series.
How to Use
To start, create a local Sparx EA model. Under the root create two Views, one for paths, e.g. labelled ‘Paths’, and one for types, e.g. labelled ‘Types’. In the Sparx navigator, the model looks like:
Once you are finished, start the Python app.
The user interface takes four parameters:
- Path GUID: the GUID of the Paths package in the Sparx model
- Types GUID: the GUID of the Types package in the Sparx model
- YAML file: the path to the OpenAPI Specification in YAML format
- The Sparx DB repository: the path to the Sparx EA model (*.qea)
The user interface is shown below:
Once you have filled in the data for the 4 fields, click ‘Execute’.
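Because the repository is an SQLite file, a package GUID can be checked before running the import. The sketch below is a hedged example: the table and column names (`t_package`, `Package_ID`, `ea_guid`) are my assumption about the EA repository schema and should be verified against your own model, the GUIDs are made up, and the demo runs against an in-memory stand-in rather than a real *.qea file.

```python
import sqlite3


def find_package_id(connection, package_guid):
    """Return the Package_ID for a GUID, or None if the package is not found."""
    row = connection.execute(
        "SELECT Package_ID FROM t_package WHERE ea_guid = ?", (package_guid,)
    ).fetchone()
    return row[0] if row else None


# Demo against an in-memory stand-in for the repository. With a real model,
# connect to the *.qea file instead and work on a backup copy.
demo = sqlite3.connect(":memory:")
demo.execute(
    "CREATE TABLE t_package (Package_ID INTEGER PRIMARY KEY, Name TEXT, ea_guid TEXT)"
)
demo.executemany(
    "INSERT INTO t_package (Name, ea_guid) VALUES (?, ?)",
    [
        ("Paths", "{11111111-1111-1111-1111-111111111111}"),  # made-up GUID
        ("Types", "{22222222-2222-2222-2222-222222222222}"),  # made-up GUID
    ],
)

paths_id = find_package_id(demo, "{11111111-1111-1111-1111-111111111111}")
types_id = find_package_id(demo, "{22222222-2222-2222-2222-222222222222}")
missing_id = find_package_id(demo, "{00000000-0000-0000-0000-000000000000}")
```

A `None` result is a cheap early warning that a GUID was mistyped or copied from a different model.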
The program first analyzes the component definitions and adds them to the repository, including their attributes. In a second pass, the application analyzes the path definitions and adds them to the model. The object types referenced in the methods of the paths are linked to the type definitions.
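The two-pass flow can be sketched in a few lines. This is an illustration under assumptions, not the repository code: `first_pass`, `second_pass`, `find_refs`, and the in-memory `used_by` links are hypothetical names, nothing is written to the database, and the sample `spec` dictionary stands in for a parsed YAML file.

```python
HTTP_METHODS = ("get", "put", "post", "delete", "patch", "head", "options", "trace")

# Stand-in for an OpenAPI document already parsed from YAML.
spec = {
    "paths": {
        "/accounts/{account-id}": {
            "get": {
                "parameters": [
                    {"name": "account-id", "in": "path"},
                    {"name": "expand", "in": "query"},
                ],
                "responses": {
                    "200": {
                        "content": {
                            "application/json": {
                                "schema": {"$ref": "#/components/schemas/Account"}
                            }
                        }
                    }
                },
            }
        }
    },
    "components": {"schemas": {"Account": {"type": "object"}}},
}


def first_pass(document):
    """Register every schema component by name."""
    schemas = document.get("components", {}).get("schemas", {})
    return {name: {"name": name, "used_by": []} for name in schemas}


def find_refs(node):
    """Yield the last segment of every $ref found anywhere below `node`."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "$ref" and isinstance(value, str):
                yield value.rsplit("/", 1)[-1]
            else:
                yield from find_refs(value)
    elif isinstance(node, list):
        for item in node:
            yield from find_refs(item)


def second_pass(document, types):
    """Collect methods and parameters per path, and link them to known types."""
    operations = []
    for url, item in document.get("paths", {}).items():
        for method in HTTP_METHODS:
            if method not in item:
                continue
            operation = item[method]
            params = [
                (p.get("in"), p.get("name"))
                for p in operation.get("parameters", [])
                if "name" in p
            ]
            # Only keep references that match a type registered in the first pass.
            linked = [t for t in find_refs(operation) if t in types]
            for type_name in linked:
                types[type_name]["used_by"].append((method.upper(), url))
            operations.append(
                {"path": url, "method": method.upper(), "parameters": params, "types": linked}
            )
    return operations


types = first_pass(spec)
operations = second_pass(spec, types)
```

Running the types first means every reference found in a method can be matched against an already known type, which is what makes the links in the second pass possible.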
In the Git repository I have included the ‘accounts.yml’ file, which was forward engineered from the model in the first article of the series. After the reverse engineering operation, the model tree should look like this:
The paths and resources have now been created in the database, but we are not yet any closer to understanding the structure of the OpenAPI specification. The next step is to create the resource and path diagrams.
Path Diagram
When I created the Paths and Types packages, I also created the associated class diagrams. If you chose to create the packages only, please create the class diagrams at this point.
Steps to display the paths:
- Open the Paths diagram
- In the diagram properties, under ‘Elements’ enable ‘Tags’, and under ‘Features’ select ‘Name Only’ for feature visibility
- From the navigation tree, select all the objects in the Paths package and drag them onto the diagram
- Open the ‘Layout’ tab, select ‘Box’, sort by Name in ascending order, and apply the layout
The resulting diagram will display for each path the methods, their stereotypes and the names of the query parameters. The diagram will look like:
Types Diagram
For the types diagram, the steps are:
- Open the Types diagram
- In the diagram properties, under ‘Elements’ enable ‘Tags’
- From the navigation tree, select the resource that you want to be the root of the diagram; I will select Account. Drag the root object onto the diagram as a ‘link’
- Select the root object in the diagram, and open the context menu, then select ‘Insert Related Elements’:
- Determine the depth level you want to display; I have chosen ‘3’ to show most of the objects in the model. Select the objects in the list and execute the command. The resulting diagram is:
At this point, it is relatively easy to analyze the structure of the OpenAPI specification.
The Code
The code is in GitHub.
Thank you for reading and if you found the article useful, please don’t forget to clap.