Figuring out systems and their organisation


"Figuring out systems and their organisationis the title for those that prefer Powerpoint, Word and TikTok.

Alternatively this post could be titled:

"Artifacts for the hardening, enhancement and transformation of a pre-existing systems architecture and corresponding organisational construct" which is the title for those that prefer LaTeX, BibTeX and Xfic.

Did you buy a Mac because it has a great UI and seldom crashes, or because under the hood it's running Unix? It's your choice. As is the decision at the start of a project whether you begin with a specification document or a #include statement.

Why would you want to re-document a system? Simple: because nobody writes documentation for someone else. Usually that’s the stated aim, but there’s always some aspect of the description that has been overlooked because the author has a deep understanding of it, so they skip over some detail, either subconsciously, because it’s obvious to them, or consciously, because clearly it’s glaringly obvious to everyone, right? There is also the case where documentation falls short because delivery pressures didn’t allow time for it to be written[1], or because the developers in question are so insecure that they want to ensure they always have to be consulted on any non-trivial aspect of the build[2]. For evidence, consider any occasion you have been presented with a system to review, support or replace, and ask yourself whether you were able to understand all of its requirements, its design and its implementation from the associated documentation. Unless the system was trivially simple, the answer is most likely “never”.

There is a school of thought that considers the visual rendering of systems to be unnecessary and documentation before coding a luxury. This view is usually held by people who, when attending a tech talk, immediately open a sticker-covered laptop, not to take notes but because they feel uncomfortable without an IDE and a screen full of code in front of them. Obviously their code requires no further explanation: it is elegant, clear, concise and self-documenting. You know who you are. They like to explain complex system interactions over videoconference by scrolling through their code, oblivious to the exasperated bemusement of their viewers. Gentlemen (and you are always gentlemen[3]), this post is for you.

Here we discuss some of the motivations for, and benefits of, mapping out a system, then consider a set of artifacts that have proven useful both in modelling systems and their support structures and in subsequently maintaining or overhauling them. We cover their benefits and who the intended audience may be. Finally we examine some organisational constructs that can be effective in building and maintaining the systems. These are proposals, not canonical prescriptions. None of this is rocket science[4]; it’s just stuff that has worked. Schematics don’t replace code analysis, but they provide a gentle path into it rather than the “dropped from a height of a kilometre at 9.8 m/s²” alternative.

Many architectural mapping tools and frameworks have been proposed over the years, covering system descriptions from various angles and to varying degrees of granularity. From a practical perspective none has gained universal acceptance for describing every size of application or system, let alone systems of different forms, technologies, distributions, topologies or complexities. That’s because there is no single modelling tool that satisfactorily and succinctly presents all required aspects of all systems, although several have been proposed[5] over the years. The extent of the potential differences in system forms is too broad, and the descriptive tools become too diluted in effectiveness, or too cumbersome. So don’t expect to be able to schematically describe every nuance of a build. Do, however, use these tools to stop and think before diving into code. When teaching undergrad and postgrad programming courses I often advocated stepping away from the keyboard to an empty desk and deploying a pencil and paper, noting that it provided a great graphical UI and enabled text and visuals to be presented in a free-form mixed-media rendering[6]. Most importantly, it forced a pause for thought rather than continued hackery.

Before we enumerate the artifacts it is worth considering some examples of why we might be analysing a system and its support structure, and what the point of any system descriptions might be.


Why re-model a system?

Consider the case where we have redesigned a system at an abstract level and are comparing it with the existing implementation. There may be design flaws in the original for a number of reasons:

  • The use case may have changed. Consider as an example a formerly collocated system which uses a database to store messages, since it provides the handy side effect of data retention and is familiar to many developers, so easy to support. But if the system’s use case changes to require geographically distributed messaging then holding data at rest in a single location is an odd choice when the requirement is for data transmission rather than storage. So we might choose to represent this as a dataflow component rather than data storage, and follow on to implement it using a wholly different technology such as messaging or event streaming with logging (if temporal history is a requirement) rather than simply data at rest in a database (see the sketch after this list).
  • Genuine architectural mistakes may have been made in the original design for a number of reasons. Often people reuse tried and tested designs. This is no bad thing provided the use case is strongly correlated with previous ones. However, it is easy to get a quick deliverable using a cookie cutter, but not necessarily an approach that delivers long-term robustness or performance. Everyone knows the developer who always uses a SQL database or a CSV filestore as a data repository no matter what the use case. Conversely, there is always the temptation to be the “cool kids” and try out the contents of the latest Apache repo for bragging rights, even if the benchmarks don’t necessarily add up. I once replaced an object request broker component in an inherited system with simple TCP/IP sockets because the request broker was just being used to stream messages.
  • A POC may have grown into a live production system because it was adequate for the launch condition, and pressure to ship may have outweighed the need to re-engineer. Design shortcuts that were permitted on the assumption that the POC was for trial purposes only are then inherited by the fully-featured system and prevent scale.
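To make the first bullet concrete, here is a minimal sketch, with invented interfaces rather than any particular system’s code, of the modelling shift from data at rest to data in flow:

```python
# Hypothetical sketch: the same business event modelled as data at rest
# versus data in flight. Interfaces are illustrative, not a real API.

class DatabaseMessageStore:
    """Original design: messages held at rest, consumers poll."""
    def __init__(self):
        self._rows = []

    def insert(self, message: dict) -> None:
        self._rows.append(message)          # handy retention side effect

    def poll_since(self, index: int) -> list[dict]:
        return self._rows[index:]           # consumers pull on a timer


class MessageStream:
    """Re-modelled design: messages in flight, consumers subscribe."""
    def __init__(self):
        self._subscribers = []

    def subscribe(self, callback) -> None:
        self._subscribers.append(callback)

    def publish(self, message: dict) -> None:
        for deliver in self._subscribers:   # push to distributed consumers
            deliver(message)


stream = MessageStream()
stream.subscribe(lambda m: print("received", m))
stream.publish({"trade_id": 42, "qty": 100})
```

The point is that the component’s role in the diagram changes from storage to transmission; the technology choice (messaging, event streaming) then follows from the model rather than from habit.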

There are other reasons to re-model. Consider the obvious one:

  • There just isn’t adequate, or even any, documentation available.

Yes. It happens. You know that it does.


System and Application Artifacts

The following half-dozen artifacts are relevant at the organisational and team level. Some are useful for organisation-wide architectures, others for single-team apps, but all have their place. It is always useful to have a firm-wide or function-wide diagram that illustrates the structure of the firm’s systems at the macroscopic level and can be drilled into to visualise the subordinate component-level views. You will most likely be familiar with all of them, but by the end of the next section you will be trying to remember the reasons you didn’t prepare any of them in your last project.

  1. Requirements Definition

This contains words. Maybe pictures, but largely words, written by humans to describe the problem at hand. It’s not ideal because it serves two masters (see later), but generally something is better than nothing here. Often it is written by a Business Analyst (BA) in conjunction with the client, tailoring the document in such a way as to enable the developer to build a design against it. As a developer I always resented the intermediation and preferred to liaise directly with the client, but I (grudgingly) acknowledge that BAs serve a purpose, since omniscience over all domains isn’t prevalent in the developer community. Alternatively it may be written by a Product Owner, who serves the same function as the BA in this context.

The document should answer a couple of questions: what are the client requirements, and can they be enumerated and ideally rendered as metrics we can measure the delivery against so we know when it’s “done”? The latter part is necessary to avoid scope creep. As an example consider a design for a volume pricing and risk system. In what timeframe will the client want to see:

  • positions/PnL?
  • linear risk?
  • non-linear risk?

If the client is a human watching their position then the answer will differ considerably from the case where the client is an algo driven by these elements as feedback parameters. These requirements, along with others such as peak trade volume per minute and per day, are all quantifiable and can be used as metrics to drive the design and determine the success or failure of the build.
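As an illustration of the idea, the sketch below captures such requirements as measurable budgets that tell you when the build is “done”. The figures are invented, not real client numbers:

```python
# Illustrative only: requirements captured as measurable targets so that
# "done" is testable. All budgets below are invented for the sketch.

from dataclasses import dataclass

@dataclass(frozen=True)
class LatencyRequirement:
    metric: str
    budget_ms: float            # the figure agreed with the client

REQUIREMENTS = [
    LatencyRequirement("positions/PnL refresh", budget_ms=250),
    LatencyRequirement("linear risk refresh", budget_ms=1_000),
    LatencyRequirement("non-linear risk refresh", budget_ms=30_000),
]

def meets_budget(metric: str, observed_ms: float) -> bool:
    """True if the observed latency meets the agreed budget."""
    budget = next(r.budget_ms for r in REQUIREMENTS if r.metric == metric)
    return observed_ms <= budget

assert meets_budget("positions/PnL refresh", observed_ms=180)
```

The same table of budgets drives both the design discussion and the acceptance tests, which is what keeps scope creep at bay.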

The next question to ask is: who is the audience for the document? This should be obvious from the discussion so far:

  • The client – to confirm the document accurately describes their use case.
  • The development team – to provide a definitive reference for their system design.


  2. System-wide Dataflow Diagram (DFD)

A DFD is used to render:

  • data at rest
  • data ingress
  • data egress
  • data transformation

The benefits of such a diagram are that it:

  • will show which systems use what data (ie data dependencies)
  • can give an indication of data lineage for audit trailing
  • can show duplication around the architecture
  • can be used as a didactic tool for large scale architectures at the organisational level

Two variants of the DFD should be considered:

  • A version that shows the architecture from a business user’s perspective only. This diagram will contain no, or very limited, technology-specific labelling. So a “Trade Presentation” process connects to a “Trade API” via an arrow labelled “trades”. This version is useful to guide discussions with business stakeholders and to give a more generalised focus on the business flows without technical clutter.
  • A version that focuses on the technology. This additionally labels processes with more technical detail. For example a “Trade Presentation” process (ie a data transformation component) could be further labelled with “Protobuf->JSON conversion”, which would connect to a “REST API” endpoint via a pipeline symbol (or plain old arrow again) additionally labelled “Kafka”. This version is noisier, but it is helpful for describing the system from a technical perspective. Both variants can be rendered from a single underlying model, as sketched below.
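A minimal sketch of that idea, with invented component and label names: keep one model of the dataflow edges and render either variant from it.

```python
# Sketch: one underlying DFD model, two rendered views.
# Component names and labels are invented for illustration.

EDGES = [
    # (source, target, business_label, technical_label)
    ("Trade Capture", "Trade Presentation", "trades", "Kafka (Protobuf)"),
    ("Trade Presentation", "Trade API", "trades", "Protobuf->JSON over REST"),
]

def render(view: str) -> None:
    for src, dst, business, technical in EDGES:
        label = business if view == "business" else f"{business} / {technical}"
        print(f"{src} --[{label}]--> {dst}")

render("business")    # clutter-free version for stakeholder discussions
render("technical")   # noisier version for the development team
```

Maintaining one model rather than two diagrams also means the business and technical views can never quietly drift apart.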


  3. Functional dependency graph or component diagram showing system components

While the DFD illustrates data dependencies, a dependency graph illustrates component dependencies, ie functional or process dependencies. The initial version can be derived from a DFD, which will present the system components. However, the dependency diagram may need to be more granular to reflect the functional dependencies between applications and their libraries. This is pertinent if, as is likely, a library and the processes that utilise it are built by different teams. A benefit of this is that it will show what applications must be tested in the event of a change to another element, and in what build order. Note that the dependency graph should be acyclic (ie a DAG) as per component-level modelling principles. If it is not acyclic then there is a problem, because changes to one component can cause recursive testing requirements. In this case the circular dependency needs to be reviewed for options to remove it.
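Here is a small sketch of that check using Python’s standard graphlib module; the component names are invented:

```python
# Sketch: deriving a build/test order from component dependencies and
# flagging cycles, using only the standard library.

from graphlib import CycleError, TopologicalSorter

# each component maps to the set of components it depends on;
# "pricing-lib" is shared by two applications built by other teams
DEPENDS_ON = {
    "risk-app": {"pricing-lib"},
    "trade-app": {"pricing-lib"},
    "pricing-lib": set(),
}

# applications that must be retested when the library changes
impacted = [app for app, deps in DEPENDS_ON.items() if "pricing-lib" in deps]
print("retest on pricing-lib change:", impacted)

try:
    # TopologicalSorter expects node -> predecessors, which is exactly
    # component -> dependencies: dependencies come first in the order
    order = list(TopologicalSorter(DEPENDS_ON).static_order())
    print("build/test order:", order)
except CycleError as err:
    # a circular dependency: review it for options to remove it
    print("circular dependency needs review:", err.args[1])
```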


  4. Control Flow diagram

A diagram that shows the flow of control through a system illustrates the temporal relationships between components and the decision points that cause state change. Neither facet is rendered in either of the preceding diagrams. It is useful for understanding state evolution but less useful as a didactic tool.
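A control flow diagram is, in essence, a state machine on paper. A minimal sketch, with an invented order lifecycle, of the states and decision points such a diagram captures:

```python
# Sketch of the state evolution a control flow diagram renders: states,
# decision points, and transitions. The order lifecycle is invented.

TRANSITIONS = {
    ("NEW", "validate_ok"): "PENDING",
    ("NEW", "validate_fail"): "REJECTED",
    ("PENDING", "fill"): "FILLED",
    ("PENDING", "cancel"): "CANCELLED",
}

def step(state: str, event: str) -> str:
    """Apply a decision point; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

state = "NEW"
for event in ["validate_ok", "fill"]:
    state = step(state, event)
print(state)  # FILLED
```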


  5. Target Operating Model architecture

A TOM diagram is the “to be” systems diagram. This can be a technical DFD for macro-systems or a component diagram for single applications. It’s not something you back out from an existing implementation; it is something you derive from a Requirements Definition and compare against an existing implementation. The organisation needs to know which components of the architecture are strategic and which need to be kept alive while a replacement is built. A TOM gives the organisation something to measure all new RFCs and all architectural decisions against. Any divergence from the TOM in an RFC should prompt additional review. There will be occasions where some development is divergent, but this will be clearly identifiable as tactical.
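A hedged sketch of that comparison, with invented component names; the real exercise is of course performed against your own diagrams rather than sets of strings:

```python
# Sketch: measuring the current estate and a proposed RFC against the TOM.
# All component names are invented for illustration.

TOM = {"trade-api", "event-stream", "risk-engine"}      # "to be"
AS_IS = {"trade-api", "message-db", "risk-engine"}      # current estate

RFC_TOUCHES = {"message-db"}                            # proposed change

strategic = AS_IS & TOM          # keep and invest
to_retire = AS_IS - TOM          # keep alive while building a replacement
divergent = RFC_TOUCHES - TOM    # tactical work: flag for review

print("strategic:", strategic)
print("to retire:", to_retire)
if divergent:
    print("RFC diverges from TOM, prompt additional review:", divergent)
```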


  6. Organisational mapping

If you map each component that appears on the DFD or dependency graph to an owner (or team) within the organisation, you can understand the functional nature of each team and spot:

  • clustering risks,
  • key-person dependencies,
  • orphaned systems,
  • nonsensical support arrangements.

Each component in the mapping will have code attached to it, and that code must have an owner: a team able to troubleshoot and enhance the component. Any system/app/component that does not have a clear owner is orphaned and therefore a risk to the organisation and to the maintainability of the architecture. Similarly, any team that is overburdened with components will be unable to turn around fixes and enhancements in a reasonable timeframe.

I have found colour-coding a DFD by team to be helpful and insightful in understanding this mapping.
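A minimal sketch of those ownership checks, with invented components, teams and thresholds:

```python
# Sketch of the ownership checks above; components and teams are invented.

OWNER = {
    "trade-api": "core-platform",
    "risk-engine": "risk-team",
    "legacy-report": None,              # nobody claims it
    "event-stream": "core-platform",
    "pricing-lib": "core-platform",
}

# orphaned systems: a risk to the organisation
orphaned = [c for c, team in OWNER.items() if team is None]

# components per team, to spot overburdened (or nonsensical) arrangements
load: dict[str, int] = {}
for team in OWNER.values():
    if team:
        load[team] = load.get(team, 0) + 1
overburdened = [t for t, n in load.items() if n > 2]   # threshold to taste

print("orphaned systems:", orphaned)
print("overburdened teams:", overburdened)
```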


Summary

Of these artifacts I find the DFD to be the most broadly useful, particularly across different areas of an organisation. Nevertheless, it is arguable that without a Requirements Definition you will be designing in a vacuum. Similarly, having a TOM to measure all requests against can be highly beneficial in the task prioritisation process, while both the DFD and the functional dependency graph help you understand who is using your stuff and who may get upset when you want to make changes.

To conclude, unless the systems you work on have a half-life of a couple of months or so, at some point someone else is going to need to understand how they fit together and how they fit into a larger architecture. Understanding the role your work plays in the greater system is an empowering and valuable motivator. Up-to-date schematics help a newcomer understand the role their component plays, which can avert some disastrous assumptions, and they also enable them to understand the component’s function and internal workings faster, making them productive sooner. So, write some words that aren’t in camelCaps, and draw some lines and boxes. Then redraw the lines and boxes each time you change something. If altruism doesn’t do it for you, consider that while you’re the expert on the greenfield build you work on today, tomorrow may bring you a brownfield nightmare. What goes around comes around.


[1] So produce it as you go along, people.

[2] Warning: The longer you keep them in your organisation the bigger the problem becomes. Take the pain sooner rather than later.

[3] With the caveat that this observation is based solely on my own unscientific empirical evidence.

[4] Given that rocket science largely involves densely packing combustible material into a tube then burning it in a loosely controlled manner…the bar is low.

[5] Booch, Rational Rose, UML anyone? Where are they now? Maybe UML died because an IDE can reverse engineer a visualisation of the logic flow pretty well and everything else was diagrammatic sugaring.

[6] Note: Exasperated students stuck on knotty Programming 101 coursework don’t have a great sense of humour.