Architecture as Code with C4 and Plantuml
(This article has also been published at DZone)
I'm lucky enough to currently work on a large microservices-based project as a solution architect. I'm responsible for designing different architecture views, each targeting a very different audience with its own concerns:
- The application view dealing with modules and data streams between them (targeting product stakeholders and developers);
- The software view (design patterns, database design rules, choice of programming languages, libraries...) that developers should rely upon;
- The infrastructure view (middleware, databases, network connections, storage, operations...) providing useful information for integrators and DevOps engineers;
- The sizing view dealing with performance;
- The security view, which is mainly transversal.
We use this Open Source Template to document our architecture.
Our current project architecture is fairly complex because of the number of modules (tens of jobs, API and GUI modules), because of the large number of external partners and because of its integration with a large legacy information system.
At this time, we have to maintain more than one hundred architecture diagrams. Following a living documentation approach, we adapt and augment diagrams, text and tables several times a day. As we will see later, it's often a collaborative process taking advantage of several great tools.
The Sample Application
We illustrate this article with a fictional AllMyData microservices application. This is a .gov web application enabling any company to retrieve all the information that public administrations hold about it.
We can split our feature "Deliver Companies Data" into two main call chains:
- A first call chain is made of the GUI requests that create requests in the system.
- A second one is made of a job launched periodically and consuming new requests. It gathers data about the company both from a local repository and from another administration IS (Information System), produces a PDF report and sends an e-mail to the original requester at the company.
The C4 Model
We use the C4 model to represent our architecture. It is beyond the scope of this tooling article to describe it in depth but I invite you to have a look at this very pragmatic approach. I find it very natural to design complex architectures. It leverages the UML2 standard and provides a great dichotomy between high level concerns and code-level ones.
Archimate could be another good fit for us, but it is probably overkill in our context of very low modeling adoption and knowledge. Also, we like the C4 KISS/low-tech approach, which takes many human psychological criteria into account. Note that some Archimate tools support C4 diagrams by mapping concepts between the two. I'm not sure it is a good idea to mix both, though.
In our context, we currently use three main C4 diagrams types (note that C4 and UML2 contain others not listed here):
- System landscape diagrams provide a very high-level view of the system. We use them to describe the general application architecture.
- Container diagrams are used to describe the middleware, databases, and many other technical components as well as data streams between them. They are similar to UML2 deployment diagrams but more natural in my opinion. In the application view, we mainly display modules and databases and in the infrastructure view, we drill down into technical devices like reverse proxies, load balancers, cluster details, etc. We also use C4 dynamic diagrams, very similar to container diagrams but including call numbering.
- Various UML2 diagrams (sequence, activity, classes). We use them sparingly, only to express a pattern or something especially important or complex, and certainly not for ordinary code.
I'm quite reluctant to use the C4 container term because of the risk of confusion with Docker/OCI containers (as pointed out by Simon Brown, the C4 creator). In our organization, we prefer to call them deployable units. The C4 model encourages terminology adaptation. A C4 container is basically a separately deployable process. The C4 documentation states: "Essentially, a container is a separately runnable/deployable unit (e.g. a separate process space) that executes code or stores data".
In the C4 model, a container can contain one or more software components. This concept doesn't refer to infrastructure components, but to large pieces of code (like a set of Java classes). We barely use C4 components in our architecture document because we don't really need to go into that level of detail (our hexagonal architecture makes things simple to design and understand just by reading the code, and our agile approach makes us prefer to limit the design documentation we have to maintain).
Plantuml is an impressive tool that instantly generates diagrams from a very simple textual DSL (Domain Specific Language).
For instance, this very short text:
@startuml
[Browser] -> [API Foo]: HTTPS
@enduml
...is enough to produce this diagram:
Plantuml comes with hundreds of features and syntax goodies, sometimes undocumented and evolving very quickly. I suggest this website as a clear and exhaustive documentary reference.
Check out some real-world examples here.
Plantuml Combined With C4
Plantuml component diagrams can be customized as C4 diagrams using this extension library.
Just import it at the top of your Plantuml diagrams and use C4 macros:
@startuml
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
!include <tupadr3/devicons2/chrome>
!include <tupadr3/devicons2/java>
!include <tupadr3/devicons2/postgresql>
LAYOUT_LEFT_RIGHT()
Container(browser, "Browser","Firefox or Chrome", $sprite="chrome")
Container(api_a, "API A","Spring Boot", $sprite="java")
ContainerDb(db_a, "Database A","Postgresql", $sprite="postgresql")
Rel(browser,api_a,"HTTPS")
Rel_R(api_a,db_a,"pg")
@enduml
is exported as:
Always export diagrams in SVG format to allow unlimited zooming. This is valuable when dealing with large diagrams.
Here we use the latest online version of the library, but you may prefer a statically downloaded version when working in an air-gapped environment.
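For the offline case, a minimal sketch could rely on the copy of the C4 library bundled in the Plantuml standard library, assuming a Plantuml release recent enough to ship it. The commented-out local path is a hypothetical location, given only as an illustration.

```plantuml
@startuml
' Assumption: the Plantuml release in use bundles C4 in its standard library,
' so no network access is needed to resolve this include.
!include <C4/C4_Container>

' Alternative (hypothetical path): a copy of the whole C4-PlantUML repository
' downloaded next to the diagrams. All the library files are needed because
' they include each other.
' !include ./C4-PlantUML/C4_Container.puml

Container(api_a, "API A", "Spring Boot")
ContainerDb(db_a, "Database A", "Postgresql")
Rel(api_a, db_a, "pg")
@enduml
```

The bundled copy may lag behind the master branch, so some recent macros could be missing from it.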
A great thing about Plantuml is its factorization capabilities, based on the !include and !includesub preprocessor directives.
It is possible to include local or remote diagrams (i.e. starting with @startuml and ending with the @enduml directive). For instance, C4 macros are included using this instruction:
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
More interestingly, it is also possible to import diagram fragments (i.e. starting with !startsub and ending with the !endsub directive) using the !includesub directive. For instance, with this fragments.iuml file:
!startsub dmz
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
!include <tupadr3/devicons2/chrome>
!include <tupadr3/devicons2/java>
Container(browser, "Browser","Firefox or Chrome", $sprite="chrome")
Container(api_a, "API A","Spring Boot", $sprite="java")
!endsub

!startsub intranet
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
!include <tupadr3/devicons2/postgresql>
ContainerDb(db_a, "Database A","Postgresql", $sprite="postgresql")
!endsub

!startsub extranet
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
!include <tupadr3/devicons2/postgresql>
ContainerDb(db_b, "Database B","Postgresql", $sprite="postgresql")
!endsub
A diagram can then import only the fragments it needs:
@startuml use-case-1
' We only include context-related sub-diagrams
!includesub fragments.iuml!dmz
!includesub fragments.iuml!intranet
Rel(browser,api_a,"HTTPS")
Rel_R(api_a,db_a,"pg")
@enduml
Filtering Unlinked Containers
Since mid-2020, Plantuml has supported a game-changing feature for software architects: the remove @unlinked directive. It keeps only the containers of a C4 diagram that call another container or are called by one, and drops all the others.
This feature (along with the diagram fragments capacities) was a requirement to achieve the diagram patterns described below.
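As a minimal illustration of this filtering, the sketch below declares three containers but links only two of them. The aliases and labels are hypothetical and reuse the naming of the earlier examples.

```plantuml
@startuml unlinked-demo
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml

Container(api_a, "API A", "Spring Boot")
Container(api_b, "API B", "Spring Boot")
ContainerDb(db_a, "Database A", "Postgresql")

' Only api_a and db_a take part in a relation
Rel(api_a, db_a, "pg")

' api_b has no incoming or outgoing relation: it is dropped from the rendering
remove @unlinked
@enduml
```

Commenting out the remove @unlinked line brings API B back, which is a quick way to check what the directive filters.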
Thousands of sprites are available to decorate the C4 containers. They are now embedded directly into the latest Plantuml releases. They include Devicons, Font-Awesome, Material, Office, Weather and many other icon libraries. Most software, hardware, network and business-oriented icons are ready to use out of the box!
From my experience, using sprites inside C4 containers makes the diagrams airier and thus more pleasant to read. Perhaps it helps our brain identify the nature of each container faster?
Note that even if you can use different background colors to differentiate C4 containers based on a specific criterion (for instance, I use a light grey for external APIs), we recommend using sprites instead to represent their nature, as it makes for cleaner diagrams and the default blue color is fine in most cases.
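One possible way to combine both ideas is sketched below, assuming a C4-PlantUML version that provides the AddElementTag macro and the $tags parameter. The "external" tag name, its colors and the partner_api container are hypothetical.

```plantuml
@startuml sprites-and-colors
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
!include <tupadr3/devicons2/java>
!include <tupadr3/devicons2/postgresql>

' Hypothetical tag: a light grey background reserved for external APIs
AddElementTag("external", $bgColor="#d3d3d3", $fontColor="#000000", $borderColor="#a9a9a9")

' The nature of each container is carried by its sprite, the default color is kept
Container(api_a, "API A", "Spring Boot", $sprite="java")
ContainerDb(db_a, "Database A", "Postgresql", $sprite="postgresql")

' The color only flags the "external" criterion
Container(partner_api, "Partner API", "REST", $tags="external")

Rel(api_a, db_a, "pg")
Rel(api_a, partner_api, "HTTPS")
@enduml
```

Keeping color for a single criterion and sprites for the nature of the containers avoids overloading the diagram with competing visual codes.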
Plantuml IDE Plugins
Plantuml is a very versatile technology that can be used in many different contexts, including:
- A simple base64 encoded URL like
- Inside a word processor like LibreOffice or Word;
- From programming languages like Groovy, Java or Python;
- In most IDEs, like Intellij IDEA thanks to this plugin;
- Or in Eclipse with this plugin.
But my own favorite is the VScode plugin. Among other features, it supports generating several diagrams from a single .puml file, as well as generating diagrams from several .puml files at once. It can be finely tuned.
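As an illustration of the multi-diagram case, a single .puml file can hold several named @startuml/@enduml blocks. The sketch below assumes that the plugin renders each block as its own diagram. The diagram names and components are hypothetical.

```plantuml
@startuml overview
[Browser] -> [API Foo]: HTTPS
@enduml

@startuml storage
[API Foo] -> [Database]: SQL
@enduml
```

Naming each block after @startuml makes the exported files easier to tell apart.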
Architecture as Code
A very nice side effect of the Plantuml IDE integration is that you not only create diagrams much faster, being freed from the layout chore, but you can also write them as you code. Diagrams can be automatically generated and refreshed as you type.
This kind of tooling enables what I would call Mob design. Especially at the beginning of our project, but still today, we regularly brainstorm about the software architecture. Using Plantuml and a large shared screen, it is very convenient to create and compare several architecture scenarios.
"What if the API A is called directly by the client B?" Or "Should it be called asynchronously by the job
In the same manner that end-users truly need to visualize screen mockups, developers and architects think better in front of diagrams. This also greatly limits misunderstandings induced by the limitation and numerous ambiguities of natural languages.
Inventory and Dependencies Diagrams
As a blueprint, we use the !include and !includesub directives to separate:
- Inventory diagrams, which show the static elements of the architecture (classified into different network zones and represented by boundaries) but don't display relations between them. They are useful to answer questions like "What does zone xyz contain?" or "Which modules cover system xyz?". They are particularly useful in the application view to clearly display the system modules of complex microservices architectures, or in the infrastructure view to represent the nodes in each network zone and their deployable units. This kind of diagram uses C4 container diagrams.
- Dependencies diagrams, which leverage the inventory diagrams but augment them with calls between the containers. Inventory diagrams can be used alone, but dependencies diagrams have to import the inventory diagram. They should answer questions like "Which module/container is called by X?" or "Which modules/containers does X call?". They are also helpful for impact studies: "What's the impact if I change
Example of an inventory diagram:
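A minimal sketch of such an inventory file is given below. Only the aliases come from the dependencies and call chain diagrams of this article. The labels, technologies and zone names are assumptions made for the illustration, and no person is declared here since the dependencies diagram is described as adding it.

```plantuml
@startuml inventory
' Hypothetical inventory.puml: aliases match the Rel() calls of the other diagrams,
' everything else (labels, technologies, zones) is assumed.
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml

Boundary(dmz, "DMZ") {
  Container(static_resources, "Static resources", "Web server")
  Container(spa, "SPA", "JavaScript")
}

Boundary(intranet, "Intranet") {
  Container(sm, "API module", "Spring Boot")
  ContainerQueue(queue, "Requests queue", "AMQP broker")
  ContainerDb(amd_db, "AllMyData database", "Postgresql")
  Container(batch, "Job", "Batch")
  Container(sreporting, "Reporting service", "PDF generation")
  Container(smails, "Mail server", "SMTP")
}

' No relation is declared: an inventory only lists what exists and where
System_Ext(saccounting, "Accounting system", "Another administration IS")
@enduml
```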
Example of a dependencies diagram (importing its inventory counterpart and adding a person and a bunch of calls):
@startuml dependencies
header Dependencies diagram
!include inventory.puml
Rel(client, static_resources, "HTTPS")
Rel(spa,sm,"REST call","HTTPS")
Rel(sm,queue,"AMQP")
Rel(sm,amd_db,"psql")
Rel(batch, queue, "AMQP")
Rel_R(batch, saccounting, "HTTPS")
Rel(batch, sreporting,"HTTP")
Rel(batch, smails, "SMTP")
remove @unlinked
@enduml
Dynamic Diagrams to Describe Call Chains
Once we have provided the big picture of the system using both an inventory and a dependencies view, we describe the detailed architecture of each main feature using a third kind of C4 diagram: C4 dynamic diagrams. C4 container and dynamic diagrams are very similar, but the latter come with automatic call numbering.
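The difference can be seen on a very small sketch: the macros are the same as in a container diagram, but including C4_Dynamic.puml prefixes each Rel() call with its rank, in declaration order. The aliases and labels below are hypothetical.

```plantuml
@startuml numbering-demo
' C4_Dynamic.puml also brings in the container macros used below
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Dynamic.puml

Container(api_a, "API A", "Spring Boot")
ContainerQueue(queue, "Queue", "AMQP broker")
ContainerDb(db_a, "Database A", "Postgresql")

' Calls are numbered in the order they are declared
Rel(api_a, db_a, "Stores the request", "JDBC")
Rel(api_a, queue, "Publishes a message", "AMQP")
@enduml
```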
Some may prefer good old UML2 sequence diagrams for complex interactions. In most cases, I find the C4 dynamic diagrams easier to read when dealing with container interactions.
When working on complex code design, we prefer UML2 sequence diagrams.
C4 dynamic diagrams target developers. They detail calls or data streams between C4 containers involved in the context of a given feature, hence providing a detailed view of each call chain.
The term feature should be understood in its agile sense (something that fulfills a stakeholder need). It can be something like "Allow an enterprise to access its data online" or "Pay for an order".
This kind of diagram can still contain zones or boundaries (already available in the inventory or dependencies diagrams), thus setting up the call chain in a more global context.
The feature architecture leverages one or more call chains and a call chain is made of a group of ordered calls or actions (like calling an API, writing a file on disk, etc.) all performed synchronously. Any further call is referenced in the next call chain.
By 'synchronous', we mean a set of activities sharing the same logical "transaction". A technically asynchronous call (like when using reactive programming) still qualifies. On the contrary, when a call chain produces a message as part of an Event-Driven Architecture, the consumption and processing of this event by another module are NOT counted in the same call chain, even if the production and the consumption of the event are technically almost instantaneous.
When considered helpful, we augment the diagrams with some textual context (using AsciiDoc) before or after the diagram, but this text should be concise and not redundant with the diagram itself. Call chain diagrams are, however, often sufficient in themselves.
We leverage inventory diagrams fragments and unlinked container filtering explained before to achieve an effective Architecture As Code pattern.
File call chain deliver-1.puml (note the remove @unlinked usage here):
@startuml deliver-1.puml
!include inventory.puml
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Dynamic.puml

' For call chains, we advise to put a header (displayed by default at the upper-right
' side of the diagram) to ease its identification.
header deliver-1

' The person alias has to match the one used in the Rel() calls below
Person_Ext(client, "Company", "[person] \nWeb client (PC, tablet, mobile)")
Rel(client, static_resources, "Visit https://allmydata.gouv", "HTTPS (R)")
Rel(client, spa, "Retrieves information via")
Rel(spa,sm,"REST call","HTTPS (W)")
RelIndex(LastIndex()-1,sm,queue,"Produces a request message to the queue","AMQP (W)")
RelIndex(LastIndex()-2,sm,amd_db,"Stores the request data","JDBC (W)")
increment()

' Remove all C4 containers imported from inventory.puml file but not involved
' in this call chain to make the diagram much cleaner
remove @unlinked
@enduml
It is paramount to standardize call chain naming (like pay-3, ...) because it becomes a strong vector of communication between developers and business analysts. It is then possible to talk using canonical names, like deliver-1 3-1 for instance. This is a massive misunderstanding killer and time saver, and it is one of the main benefits of this methodology.
I suggest simply using the <feature>-<incrementing number> naming scheme.
File call chain deliver-2.puml (note the 'remove @unlinked' usage here):
@startuml deliver-2.puml
!include inventory.puml
header deliver-2
Rel(sm,amd_db,"JDBC CRUD calls","psql")
Rel(batch, queue, "Consume each request message", "AMQP (R)")
Rel(batch, amd_db, "Read various very interesting data about the requester company", "JDBC (R)")
Rel(batch, saccounting, "Get more interesting data from the Accounting system", "HTTPS (R)")
Rel(batch, sreporting, "Produces a great PDF including great pie charts", "HTTP (W)")
Rel(batch, smails, "Send an e-mail to original requester with the attached PDF", "SMTP (W)")
Rel(batch, amd_db, "Store the request data (date, final status...)", "JDBC (W)")
remove @unlinked
@enduml
Each call should detail the network protocols used along with a modifier flag (R: Read, W: Write, E: Execute). These flags are important to figure out the intention of the call. More than one flag on the same call is possible.
In our context, these call chain diagrams provide enough architectural details to code the application. They are the only design documentation we write before actually coding. Apart from them, the real (and best) documentation is the (clean) code itself.
I hope this introduction has aroused your curiosity about coding architectures using Plantuml and C4. A future article will present our diagramming best practices and some useful Plantuml tips in an architectural context. Stay tuned!
I will finish with a personal feeling I can't formally demonstrate but have observed many times: the graphical "harmony" of an architectural diagram is directly proportional to its intrinsic quality. It is therefore possible to form a first opinion of a complex architecture with just a glimpse of the main diagram on the wall...
In the same vein, dependencies diagrams highlight the strategic modules and reflect the balances of power hidden behind the architecture (as predicted by Conway's Law).