Architecture as Code with C4 and Plantuml
(This article has also been published at DZone)
I'm lucky enough to currently work on a large microservices-based project as a solution architect. I'm responsible for designing different architecture views, each targeting a very different audience with its own concerns:
- The application view, dealing with modules and the data streams between them (targeting product stakeholders and developers);
- The software view (design patterns, database design rules, choice of programming languages, libraries...) that developers should rely upon;
- The infrastructure view (middlewares, databases, network connections, storage, operations...), providing useful information for integrators and DevOps engineers;
- The sizing view, dealing with performance;
- The security view, mainly cross-cutting.
We use this Open Source Template to document our architecture.
Our current project architecture is fairly complex because of the number of modules (tens of jobs, API and GUI modules), because of the large number of external partners and because of its integration with a large legacy information system.
At this time, we have to maintain more than one hundred architecture diagrams. Following a living documentation approach, we adapt and augment diagrams, text and tables several times a day. As we will see later, it's often a collaborative process taking advantage of several great tools.
The sample application
We illustrate this article with a fictional AllMyData microservices application. This is a .gov web application enabling any company to retrieve all the information that public administrations hold about it.
We can split our feature "Deliver Companies Data" into two main call chains:
- A first call chain is made of the GUI requests that create requests in the system.
- A second one is made of a job launched periodically and consuming new requests. It gathers data about the company both from a local repository and from another administration IS (Information System), produces a PDF report and sends an e-mail to the original requester at the company.
The C4 Model
We make heavy use of the C4 model to represent our architecture. Describing it in depth is beyond the scope of this tooling article, but I invite you to have a look at this very pragmatic approach. I find it very natural for designing complex architectures. It leverages the UML2 standard and provides a clear separation between high-level concerns and code-level ones.
Archimate could be another good fit for us but is probably overkill in our context, where modeling adoption and knowledge are very low. Also, we like the C4 KISS/low-tech approach, which takes many human psychological criteria into account. Note that some Archimate tools support C4 diagrams by mapping concepts between the two. I'm not sure it is a good idea to mix both, though.
In our context, we currently use three main C4 diagram types (note that C4 and UML2 contain others not listed here):
- System landscape diagrams provide a very high-level view of the system. We use them to describe the general application architecture.
- Container diagrams are used to describe the middlewares, databases, and many other technical components as well as the data streams between them. They are similar to UML2 deployment diagrams but more natural in my opinion. In the application view, we mainly display modules and databases, and in the infrastructure view, we drill down into technical devices like reverse proxies, load balancers, cluster details, etc. We also use C4 dynamic diagrams, very similar to container diagrams but including call numbering.
- Various UML2 diagrams (sequence, activity, classes). We use them sparingly and only to express a pattern or something especially important or complex, but certainly not for ordinary code.
I'm quite reluctant to use the C4 container term because of the risk of confusion with Docker/OCI containers (as pointed out by Simon Brown, the C4 creator). In our organization, we prefer to call them deployable units. The C4 model encourages terminology adaptation. A C4 container is basically a separately deployable process. The C4 documentation states: "Essentially, a container is a separately runnable/deployable unit (e.g. a separate process space) that executes code or stores data".
In the C4 model, a container can contain one or more software components. This concept doesn't refer to an infrastructure component but to some large piece of code (like a set of Java classes). We barely use C4 components in our architecture document because we don't really need to go into that level of detail (our hexagonal architecture makes things simple to design and understand just by reading the code, and our agile approach makes us prefer limiting the design documentation we have to maintain).
Plantuml is an impressive tool that instantly generates diagrams from a very simple textual DSL (Domain Specific Language).
For instance, this very short text:
@startuml
[Browser] -> [API Foo]: HTTPS
@enduml
...is enough to produce this diagram:
Plantuml comes with hundreds of features and syntax goodies, some of them undocumented, and it evolves very quickly. I suggest this website as a clear and exhaustive documentation reference.
Check out some real-world examples here.
Plantuml Combined With C4
Plantuml component diagrams can be customized as C4 diagrams using this extension library.
Just import it at the top of your Plantuml diagrams and use C4 macros:
@startuml
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
!include <tupadr3/devicons2/chrome>
!include <tupadr3/devicons2/java>
!include <tupadr3/devicons2/postgresql>
LAYOUT_LEFT_RIGHT()
Container(browser, "Browser","Firefox or Chrome", $sprite="chrome")
Container(api_a, "API A","Spring Boot", $sprite="java")
ContainerDb(db_a, "Database A","Postgresql", $sprite="postgresql")
Rel(browser,api_a,"HTTPS")
Rel_R(api_a,db_a,"pg")
@enduml
is exported as:
Always export diagrams in SVG format to allow unlimited zooming. This is much appreciated when dealing with large diagrams.
Here we use the latest online version of the library, but you may prefer a statically downloaded version when working in an air-gapped environment.
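For the air-gapped case, one option is to rely on the copy of the C4 macros that ships in the Plantuml standard library rather than on the GitHub URL. The sketch below assumes a Plantuml release that bundles C4-PlantUML; the bundled copy may lag behind the online master branch.

```plantuml
@startuml
' Assumption: this Plantuml release bundles C4-PlantUML in its standard library,
' so no network access is needed to resolve the include below
!include <C4/C4_Container>
!include <tupadr3/devicons2/java>

Container(api_a, "API A", "Spring Boot", $sprite="java")
ContainerDb(db_a, "Database A", "Postgresql")
Rel(api_a, db_a, "pg")
@enduml
```

The angle-bracket form resolves against the embedded library, exactly like the sprite includes used elsewhere in this article.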
A great thing about Plantuml is its factorization capabilities, available through the !include and !includesub preprocessor directives.
It is possible to include whole diagrams, locally or remotely (i.e. files starting with @startuml and ending with the @enduml directive). For instance, C4 macros are included using this instruction:
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
More interestingly, it is also possible to import diagram fragments (i.e. blocks starting with !startsub and ending with the !endsub directive), as in this static.iuml file:
!startsub dmz
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
!include <tupadr3/devicons2/chrome>
!include <tupadr3/devicons2/java>
Container(browser, "Browser","Firefox or Chrome", $sprite="chrome")
Container(api_a, "API A","Spring Boot", $sprite="java")
!endsub

!startsub intranet
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
!include <tupadr3/devicons2/postgresql>
ContainerDb(db_a, "Database A","Postgresql", $sprite="postgresql")
!endsub

!startsub extranet
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
!include <tupadr3/devicons2/postgresql>
ContainerDb(db_b, "Database B","Postgresql", $sprite="postgresql")
!endsub
A diagram can then pull in only the fragments it needs with !includesub:
@startuml use-case-1
' We only include context-related sub-diagrams
!includesub static.iuml!dmz
!includesub static.iuml!intranet
Rel(browser,api_a,"","HTTPS")
Rel_R(api_a,db_a,"pg")
@enduml
Filtering Unlinked Containers
Since mid-2020, Plantuml has supported a game-changing feature for software architects: the remove @unlinked directive. It keeps only the containers of a C4 diagram that call or are called by another one, and drops all the others.
This feature (along with the diagram fragment capabilities) was a requirement to achieve the diagram patterns described below.
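As a minimal sketch of the filtering, the diagram below declares four containers but only relates three of them. The db_b container is a hypothetical extra added for illustration; since it takes part in no relation, it should not appear in the rendered output.

```plantuml
@startuml
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml

Container(browser, "Browser", "Firefox or Chrome")
Container(api_a, "API A", "Spring Boot")
ContainerDb(db_a, "Database A", "Postgresql")
' Hypothetical container that is neither a caller nor a callee below
ContainerDb(db_b, "Database B", "Postgresql")

Rel(browser, api_a, "HTTPS")
Rel(api_a, db_a, "pg")

' Drop every element that has no relation (here: db_b)
remove @unlinked
@enduml
```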
Thousands of sprites are available to decorate the C4 containers. They are now embedded directly in the latest Plantuml releases. They include Devicons, Font-Awesome, Material, Office, Weather and many other icon libraries. Most software, hardware, network and business-oriented icons are ready to use out of the box!
In my experience, using sprites inside C4 containers makes the diagrams more airy and thus more pleasant to read. Perhaps it helps our brain identify the nature of each container faster?
Note that even if you can use different background colors to differentiate C4 containers according to a specific criterion (for instance, I use a light grey for external APIs), we recommend using sprites instead to represent the nature of a container, as it makes for cleaner diagrams, and the default blue color is fine in most cases.
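To make the two options concrete, here is a small sketch combining sprites with a background color reserved for a single criterion. The "external" tag, the grey value and the "Partner API" container are assumptions chosen for illustration, not elements of our actual diagrams.

```plantuml
@startuml
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
!include <tupadr3/devicons2/java>
!include <tupadr3/devicons2/postgresql>

' Hypothetical tag: light grey background for external APIs only
AddElementTag("external", $bgColor="#d3d3d3", $fontColor="#000000")

' The sprite conveys the nature of each container, the default blue is kept
Container(api_a, "API A", "Spring Boot", $sprite="java")
ContainerDb(db_a, "Database A", "Postgresql", $sprite="postgresql")

' The color conveys one criterion only: this API is external
Container(api_ext, "Partner API", "REST", $tags="external")

Rel(api_a, db_a, "pg")
Rel(api_a, api_ext, "HTTPS")
@enduml
```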
Plantuml IDE Plugins
Plantuml is a very versatile technology that can be used in many different contexts, including:
- Through a simple base64-encoded URL;
- Inside a word processor like LibreOffice or Word;
- From programming languages and ecosystems like Groovy, npm, Java or Python;
- In most IDEs, like IntelliJ IDEA thanks to this plugin;
- Or in Eclipse with this plugin.
But my own favorite is the VScode plugin. Among other features, it supports generating several diagrams from a single .puml file, as well as generating diagrams from several .puml files at once. It can be finely tuned.
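As an illustration of the multi-diagram capability, a single .puml file can hold several @startuml/@enduml blocks. The diagram names (overview, storage) and the "Database Foo" component below are made up for the example.

```plantuml
' Two independent diagrams stored in the same .puml file
@startuml overview
[Browser] -> [API Foo]: HTTPS
@enduml

@startuml storage
[API Foo] -> [Database Foo]: pg
@enduml
```

Giving each block a name after @startuml makes the generated outputs easier to tell apart; how exactly a given plugin names the exported files depends on its configuration.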
Architecture As Code
A very nice side effect of the Plantuml IDE integration is that you can not only create diagrams much faster, being freed from the layout chore, but also write them as you code. Diagrams can be automatically generated and refreshed as you type.
This kind of tooling enables what I would call Mob design. Especially at the beginning of our project, but still today, we regularly brainstorm about the software architecture. Using Plantuml and a large shared screen, it is very convenient to create and compare several architecture scenarios.
"What if the API A is called directly by the client B?" Or "Should it be called asynchronously by the job ...?"
Just as end users truly need to visualize screen mockups, developers and architects think better in front of diagrams. This also greatly limits the misunderstandings induced by the limitations and numerous ambiguities of natural languages.
Inventory and Dependencies Diagrams
As a blueprint, we use the !include and !includesub directives to separate:
- Inventory diagrams, showing the static elements of the architecture (classified into different network zones and represented by boundaries) but not displaying the relations between them. They are useful to answer questions like "What does zone xyz contain?" or "Which modules make up system xyz?". They are particularly useful in the application view to clearly display the modules of complex microservices architectures, or in the infrastructure views to represent the nodes in each network zone and their deployable units. This kind of diagram uses C4 container diagrams.
- Dependencies diagrams, which leverage the inventory diagrams but augment them with the calls between the containers. Inventory diagrams can be used alone, but dependencies diagrams have to import the inventory diagram. They should answer questions like "Which module/container is called by X?" or "Which modules/containers does X call?". They are also helpful for impact studies: "What's the impact if I change ...?"
Example of inventory diagram:
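The inventory listing itself is not reproduced here. As a rough sketch only, an inventory.puml file compatible with the aliases used by the dependencies and call chain diagrams of this article might look like the following. The aliases (client, spa, sm, queue...) come from those diagrams; every zone, label and technology is an assumption made for illustration.

```plantuml
@startuml inventory
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Container.puml
header Inventory diagram

' Static elements only: no Rel() call belongs in an inventory diagram.
' All zones, labels and technologies below are hypothetical.
Boundary(internet, "Internet") {
  Container(client, "Web client", "Browser")
}
Boundary(dmz, "DMZ") {
  Container(static_resources, "Static resources", "HTTP server")
  Container(spa, "GUI", "Single-page application")
}
Boundary(intranet, "Intranet") {
  Container(sm, "Requests API", "REST")
  Container(queue, "Requests queue", "AMQP broker")
  ContainerDb(amd_db, "AllMyData database", "Postgresql")
  Container(batch, "Delivery job", "Batch")
  Container(sreporting, "Reporting service", "PDF generation")
  Container(smails, "Mail relay", "SMTP")
}
Boundary(partner, "Other administration IS") {
  System_Ext(saccounting, "Accounting system")
}
@enduml
```

Call chain diagrams that rely on automatic numbering would additionally need the C4_Dynamic.puml macros, which this sketch does not include.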
Example of dependency diagram (importing its inventory counterpart and adding a person and a bunch of calls):
@startuml dependencies
header Dependencies diagram
!include inventory.puml
Rel(client, static_resources, "HTTPS")
Rel(spa,sm,"REST call","HTTPS")
Rel(sm,queue,"AMQP")
Rel(sm,amd_db,"psql")
Rel(batch, queue, "AMQP")
Rel_R(batch, saccounting, "HTTPS")
Rel(batch, sreporting,"HTTP")
Rel(batch, smails, "SMTP")
remove @unlinked
@enduml
Dynamic Diagrams to Describe Call Chains
Once we have provided the big picture of the system using both an inventory and a dependencies view, we describe the detailed architecture of each main feature using a third kind of C4 diagram: C4 dynamic diagrams. C4 container and dynamic diagrams are very similar, but the latter come with automatic call numbering.
Some may prefer good old UML2 sequence diagrams for complex interactions. In most cases, I find C4 dynamic diagrams easier to read when dealing with interactions between containers.
When working on complex code design, we prefer UML2 sequence diagrams.
C4 dynamic diagrams target developers. They detail calls or data streams between C4 containers involved in the context of a given feature, hence providing a detailed view of each call chain.
The term feature should be understood in its agile sense (something that fulfills a stakeholder need). It can be something like "Allow an enterprise to access its data online" or "Pay for an order".
This kind of diagram can still contain zones or boundaries (already available in the inventory or dependencies diagrams), thus setting up the call chain in a more global context.
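As a minimal sketch of a dynamic diagram set inside a zone, the example below reuses the browser, API A and Database A containers from the earlier example. The "Intranet" boundary and the call labels are assumptions; the numbering itself comes from the C4_Dynamic.puml macros.

```plantuml
@startuml
!include https://raw.githubusercontent.com/plantuml-stdlib/C4-PlantUML/master/C4_Dynamic.puml

Container(browser, "Browser", "Firefox or Chrome")

' Hypothetical zone giving the call chain a more global context
Boundary(intranet, "Intranet") {
  Container(api_a, "API A", "Spring Boot")
  ContainerDb(db_a, "Database A", "Postgresql")
}

' With the dynamic macros, each Rel is numbered automatically in declaration order
Rel(browser, api_a, "Submits a request", "HTTPS")
Rel(api_a, db_a, "Stores the request", "pg")
Rel(api_a, browser, "Returns an acknowledgement", "HTTPS")
@enduml
```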
A feature architecture leverages one or more call chains and a call chain is made of a group of ordered calls or actions (like calling an API, writing a file on disk, etc.) all performed synchronously. Any further call is referenced into the next call chain.
By 'synchronous', we mean a set of activities sharing the same logical "transaction". A technically asynchronous call (like when using reactive programming) still qualifies. Conversely, when a call chain produces a message as part of an Event Driven Architecture, the consumption and processing of this event by another module is NOT counted in the same call chain, even if the production and the consumption of the event are technically almost instantaneous.
When considered helpful, we augment the diagrams with some textual context (using AsciiDoc) before or after the diagram, but this text should be concise and not redundant with the diagram itself. Call chain diagrams are however often sufficient in themselves.
We leverage the inventory diagram fragments and the unlinked container filtering explained earlier to achieve an effective Architecture As Code pattern.
File call chain deliver-1.puml (note the remove @unlinked usage here):
@startuml deliver-1.puml
!include inventory.puml
' For call chains, we advise to put a header (displayed by default at the upper-right
' side of the diagram) to ease its identification.
header deliver-1
Person_Ext(company, "Company", "[person] \nWeb client (PC, tablet, mobile)")
Rel(client, static_resources, "Visit https://allmydata.gouv", "HTTPS (R)")
Rel(client, spa, "Retrieves information via")
Rel(spa,sm,"REST call","HTTPS (W)")
RelIndex(LastIndex()-1,sm,queue,"Produces a request message to the queue","AMQP (W)")
RelIndex(LastIndex()-2,sm,amd_db,"Stores the request data","JDBC (W)")
increment()
' Remove all C4 containers imported from inventory.puml file but not involved
' in this call chain to make the diagram much cleaner
remove @unlinked
@enduml
It is paramount to standardize call chain naming (like pay-3, ...) because these names become a strong vector of communication between developers and business analysts. It is then possible to talk using canonical names like deliver-1 3-1 (call 3-1 of the deliver-1 call chain), for instance. This kills many misunderstandings, saves time and is one of the main benefits of this methodology.
I suggest simply using the <feature>-<incrementing number> naming scheme.
File call chain deliver-2.puml (note the remove @unlinked usage here):
@startuml deliver-2.puml
!include inventory.puml
header deliver-2
Rel(sm,amd_db,"JDBC CRUD calls","psql")
Rel(batch, queue, "Consume each request message", "AMQP (R)")
Rel(batch, amd_db, "Read various very interesting data about the requester company", "JDBC (R)")
Rel(batch, saccounting, "Get more interesting data from the Accounting system", "HTTPS (R)")
Rel(batch, sreporting, "Produces a great PDF including great pie charts", "HTTP (W)")
Rel(batch, smails, "Send an e-mail to original requester with the attached PDF", "SMTP (W)")
Rel(batch, amd_db, "Store the request data (date, final status...)", "JDBC (W)")
remove @unlinked
@enduml
Each call should detail the network protocols used, along with a modifier flag (R: Read, W: Write, E: Execute). These flags are important to figure out the intent of the call. A single call may carry more than one flag.
In our context, these call chain diagrams provide enough architectural details to code the application. They are the only design documentation we write before actually coding. Apart from them, the real (and best) documentation is the (clean) code itself.
I hope this introduction has aroused your curiosity about coding architectures using Plantuml and C4. A follow-up article will provide our diagramming best practices and some useful Plantuml tips in an architectural context, so stay tuned!
I will finish with a personal feeling I can't formally demonstrate but have observed many times: the graphical "harmony" of an architectural diagram is directly proportional to its intrinsic quality. It is therefore possible to form a first opinion of a complex architecture with just a glimpse of the main diagram on the wall...
Along the same lines, dependencies diagrams highlight the strategic modules and reflect the balance of power hidden behind the architecture (as predicted by Conway's Law).