Architecture Basics: Performance

Opening Words

The third design quality discussed here is Performance. This is one of the major, though often misunderstood, qualities.

All components of a system are measurable by their “Performance”, and have performance specifications.

Let’s see in the next paragraphs what this design quality measures, validates and implies.

Definition of Performance, KPI

The word “Performance” is overloaded and used in many ways.
When it comes to a system’s design qualities, it can mean:

  • Ability of the system to perform a task in a defined time (percentage)
  • Time needed to achieve a given task (s/req)
  • Number of tasks completed in a given time (req/s)
  • …and more!
4 tasks completed per second
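
As a rough illustration of how these three readings differ, the sketch below derives each of them from the same set of measurements. The function name `summarize`, the sample durations and the 0.5 s deadline are assumptions made up for this example, not values from a real system.

```python
from dataclasses import dataclass


@dataclass
class PerformanceSummary:
    within_deadline_pct: float   # share of tasks finished within the deadline (%)
    mean_time_per_task_s: float  # average time per task (s/req)
    throughput_per_s: float      # tasks completed per second (req/s)


def summarize(durations_s, deadline_s, window_s):
    """Summarize tasks completed during an observation window of window_s seconds."""
    if not durations_s or window_s <= 0:
        raise ValueError("need at least one task and a positive window")
    on_time = sum(1 for d in durations_s if d <= deadline_s)
    return PerformanceSummary(
        within_deadline_pct=100.0 * on_time / len(durations_s),
        mean_time_per_task_s=sum(durations_s) / len(durations_s),
        throughput_per_s=len(durations_s) / window_s,
    )


# Hypothetical sample: 4 tasks (running concurrently) completed in a 1-second window.
summary = summarize([0.25, 0.5, 0.25, 1.0], deadline_s=0.5, window_s=1.0)
print(summary)
```

The same four tasks give three different numbers, which is why the metric has to be named before it is compared or committed to.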

Performance is measured by different metrics, depending on the type of system. These metrics are tracked by application or workload group.

Some companies implement regular benchmark reports to calibrate the metrics and rely on realistic and achievable numbers.

Compute, Storage, Network

Performance metrics are ubiquitous in the infrastructure: CPU/memory, network cards, hard drives/solid-state drives, routers, switches… Each component has one or more metrics, published by the system vendors or by members of the user community who have run their own benchmarks.

Capture of Passmark website with CPU benchmarks
  • Storage is expressed in IOPS (number of writes or reads per second) and latency (time to write and acknowledge the data).
  • Network is measured by throughput, in Gbps and in packets per second (pps, or mpps, bpps).
  • CPU and GPU are measured by power consumption/TDP in watts, by the number of instructions processed per second, and by processing capacity, expressed in teraflops or as a score tied to a benchmark application (CPUMark, OpenBenchmarking…).
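
Throughput in Gbps and in pps are linked by the packet size, which is one reason a single network figure says little on its own. The sketch below shows that relation. It assumes Ethernet framing with 20 bytes of per-frame overhead on the wire (preamble plus inter-frame gap); the function name and the frame sizes are chosen for illustration only.

```python
ETHERNET_OVERHEAD_BYTES = 20  # assumption: 8-byte preamble + 12-byte inter-frame gap


def max_packets_per_second(link_gbps, frame_bytes, overhead_bytes=ETHERNET_OVERHEAD_BYTES):
    """Upper bound on packets per second for a given link speed and frame size."""
    bits_per_frame = (frame_bytes + overhead_bytes) * 8
    return link_gbps * 1e9 / bits_per_frame


for size in (64, 512, 1518):
    pps = max_packets_per_second(10, size)
    print(f"10 Gbps, {size}-byte frames: {pps / 1e6:.2f} Mpps")
```

The same link moves far fewer packets per second with large frames than with small ones, so a pps figure is only meaningful together with the packet size it was measured at.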

Application Metrics

Application performance is closer to business needs, and is tied to the components of the application.

Let’s take two examples:

  • Web/HTTP servers: Performance can be measured by page response time (the time needed to finish displaying the HTTP page), or by the maximum number of connected clients served with a response time under a defined threshold.
  • Relational database: Performance can be measured in transactions per second (tps).

With metrics directly linked to one or more components of the application, we can then combine these metrics and determine a performance level for the application as a whole.

Performance vs. Specifications

Product/software specifications can be considered to design the Performance quality attribute.

For example, the CPU frequency (in GHz), number of cores available, cache memory, pipeline depth, and others could be used to provide an approximation for the sizing of the physical servers or the sizing to be given to the cloud instance.

Screen capture of CPU specifications

While this can be a valid approximation for the CPU, it may not be appropriate for all components/systems. SSD IOPS, for example, are values given for a specific context.

The difference between theory and your use case can be (very) significant, negatively but also positively. It is therefore important to understand the vocabulary and to agree on the metrics to use, as well as on the service/offer commitments.

Adapting to use case

These metrics and measures can also be derived or adapted to make them easier to understand and compare.

For example, CPU benchmarks sometimes report performance per watt or performance per dollar. If your business is to sell computing capacity, those metrics will make more sense than raw performance when you build and bundle the offer. They also let you choose a purchasing strategy that accounts for other parameters (scalability, energy footprint…).
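
A minimal sketch of such derived metrics is shown below. The `Cpu` class and both candidates are hypothetical; none of the scores, wattages or prices come from a real product or benchmark.

```python
from dataclasses import dataclass


@dataclass
class Cpu:
    name: str
    benchmark_score: float  # score from whichever benchmark you standardize on
    tdp_watts: float
    price_usd: float

    @property
    def score_per_watt(self):
        return self.benchmark_score / self.tdp_watts

    @property
    def score_per_dollar(self):
        return self.benchmark_score / self.price_usd


# Hypothetical candidates, for illustration only.
candidates = [
    Cpu("cpu-a", benchmark_score=40000, tdp_watts=200, price_usd=4000),
    Cpu("cpu-b", benchmark_score=30000, tdp_watts=120, price_usd=2000),
]

best_raw = max(candidates, key=lambda c: c.benchmark_score)
best_per_watt = max(candidates, key=lambda c: c.score_per_watt)
best_per_dollar = max(candidates, key=lambda c: c.score_per_dollar)
print(best_raw.name, best_per_watt.name, best_per_dollar.name)
```

With these made-up figures, the CPU with the best raw score is not the one with the best score per watt or per dollar, so the ranking depends on which metric matches the business need.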

Depending on your use case, existing metrics can be used together to build a new metric that will be meaningful to you. We are seeing more and more sustainability metrics, with a CO2 cost of a cloud instance, derived from its energy consumption metrics, cores used, disk usage and access rate, network ports and usage and server age/amortization.

Carbon footprint dashboard


Offers and requests vary widely in the performance commitment they carry.

We can distinguish 3 cases:

  • Maximum: defines a danger zone, a Performance that you MUST NOT / CANNOT exceed.
    • It can be a soft limit (not technically enforced) or a hard limit (a cap fixed in the code).
    • For example, a network device “capable to deliver a total of up to 720 mpps or 600 Gbps”, or a CPU at 3.1 GHz.
    • These performance KPIs are usually almost impossible to reach, or reachable only with a very specific setup.
  • Average: a generally observed level of performance that can reasonably be expected, though not at all times.
    • e.g. an SSD at 100k IOPS, a network switching latency of 10 µs.
    • These performance levels are sometimes achieved, or even exceeded, under favorable conditions.
  • Guarantee: an infrequent offer that consists of a minimum guaranteed performance level, usually within a given context.
    • It differs from the previous case in that the vendor/provider is contractually committed to a given level of performance (in a scoped context, on validated metrics), with penalties if the performance falls below the contract.
    • For example, 1 “IOD” (IO Density, or “IOPS per GB used”) on block storage arrays: for a volume of 12 TB, the guarantee is 12,000 IOPS for up to 600 devices in concurrent access.
Needs and achieved compared to contractual cases
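
The IO density guarantee above is simple arithmetic, sketched here. The function names are made up for this example, and the volume size is taken in decimal units (1 TB = 1,000 GB), which is what the 12 TB / 12,000 IOPS figures imply.

```python
def guaranteed_iops(used_gb, io_density=1.0):
    """Guaranteed IOPS for a volume, given an IO density in IOPS per GB used."""
    if used_gb < 0 or io_density < 0:
        raise ValueError("used_gb and io_density must be non-negative")
    return used_gb * io_density


def meets_guarantee(measured_iops, used_gb, io_density=1.0):
    """True if the measured IOPS reach the contractual minimum."""
    return measured_iops >= guaranteed_iops(used_gb, io_density)


volume_gb = 12 * 1000  # 12 TB, decimal units
print(guaranteed_iops(volume_gb))          # contractual minimum at 1 IOD
print(meets_guarantee(11_000, volume_gb))  # a measurement below the contract
```

Note that the guarantee follows the used capacity: a half-empty volume is entitled to fewer IOPS than a full one under this kind of contract.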

Most architectures use the second case, the “average”. This is the most available, cost-efficient and most realistic scenario to agree upon to provide a standardized and simple architecture.

Service offerings of “guarantee” type are very specific and out of context of a global solution. Of course, this can be applicable in well-scoped use cases.

When building a cloud offering for internal customers, performance metrics are often included. They are not there for comparison purposes, but to check whether the proposed offer can match the (real) need. And when no level of performance is required, they set “limits of reason” in case of issue, escalation or dispute.

Common Mistakes

Foolish Guarantee

The first common error is to design around guarantees that are based on vendor specifications. These numbers are operational data, but they do not clearly represent a performance level. For example, stating in an offer that the CPU runs at 3 GHz does not tell you how many AES encryption operations per second that CPU can perform, which depends on its generation, offloads, etc.

This is a misunderstanding of “Performance”: specs are used to put numbers in an offer so that it can be compared with others and so that customers can get an idea of what to expect.

This can be good for a like-for-like (“iso”) comparison, but generally this kind of “performance metric” falls into the marketing trap.

For example, for an ISP: between a 56 kbps analog offer and a 1 Gbps fiber offer, bandwidth is a good indicator and the choice is obvious. But between a 100 Mbps satellite connection and a 30 Mbps cable connection, an online gaming user (with low latency requirements) will prefer the cable circuit, despite its lower bandwidth.

Fear of running out

Other architectures or end-customers “simply” request the maximum possible, “just in case” or “to be covered”. This is common when requests/needs are poorly assessed, when end-customers are unable to provide demand metrics, or in heated discussions about the service offering.

In terms of design quality, it doesn’t lead anywhere. If the need is not established, then it is a customer problem. If the need is clear but not achievable by the offer, then it is simply a no-go.

Sometimes, unreasonable requests are made deliberately to eliminate an offer, even though the needs behind them do not exist.
Some examples: 500 TB on a single virtual disk, 80 Gbps of bandwidth usage on each of thousands of hosts, etc. Such requests are easily dismissed, with the help of management if needed.

“Isolation Syndrome”

The most important and frequent issue with Performance quality is what I will call the “isolation syndrome”.

It consists in focusing on single and isolated values. A solution contains many components/systems, each of which has one or more performance metrics. Each of these metrics can be strongly affected by the others, so it is critical to consider the overall context of the system(s).

Before an instruction can be executed on a CPU, it has to pass through pipelines, which add latency on top of the time taken by the instruction cycles

In storage, for example, sizing a storage array only on the advertised IOPS (e.g. 500,000 IOPS @ 4 kB, 70/30 W/R), without considering the latency and the number of clients and parallel requests at this performance level, will misrepresent the performance quality of the offer.
The communicated number might be achievable, but at the expense of other needs that are often implicit for the requesters.
With this storage offering of 500k IOPS, if the figure holds with 300 clients, each seeing a latency of 30 ms with an OIO (outstanding I/O) queue of 32, then the advertised performance (500k IOPS) will not be suitable for a use case with 600 clients and a maximum allowable latency of 10 ms.
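
One way to make that check explicit is to record the context next to the headline figure and compare the requirement against the whole context, not just the IOPS. The sketch below does this with the numbers from the example above; the class and function names are hypothetical, and the comparison is deliberately simplistic (it flags what falls outside the benchmarked conditions, it does not predict actual performance).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BenchmarkContext:
    iops: int
    clients: int
    latency_ms: float
    outstanding_io: int  # OIO queue per client; recorded for reference, not checked here


@dataclass(frozen=True)
class Requirement:
    iops: int
    clients: int
    max_latency_ms: float


def gaps(offer, need):
    """List the reasons why the advertised figure does not cover the need."""
    reasons = []
    if need.iops > offer.iops:
        reasons.append(f"needs {need.iops} IOPS, offer measured at {offer.iops}")
    if need.clients > offer.clients:
        reasons.append(f"needs {need.clients} clients, offer measured with {offer.clients}")
    if need.max_latency_ms < offer.latency_ms:
        reasons.append(
            f"needs latency <= {need.max_latency_ms} ms, offer measured at {offer.latency_ms} ms"
        )
    return reasons


offer = BenchmarkContext(iops=500_000, clients=300, latency_ms=30, outstanding_io=32)
need = Requirement(iops=500_000, clients=600, max_latency_ms=10)
for reason in gaps(offer, need):
    print(reason)
```

The IOPS figure matches the need exactly, yet two gaps are reported. An empty list would only mean that the need sits inside the benchmarked conditions, not that the offer is proven suitable.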

When communicating on any Design Quality, you need realistic, accurate and useful data. Giving a single performance figure (e.g. 500kIOPS) without specifying the context of benchmarks/tests cannot produce a viable offering.

This is where adapted metrics find their place. For example, EUC architectures use “user experience” (UX) as a major group of metrics. Tools like LoginVSI simulate load and user activities (browser, office…) and thus produce contextualized metrics, such as user density.

Login VSI result metrics

Impact on other Design Qualities

Performance has impacts on other design qualities, especially on Scalability, but other qualities are also concerned indirectly, like Recoverability and Availability.

If the solution has a certain level of performance, then when the service recovers from a failure (including on another system), the same level of performance must be delivered. Similarly, in the case of disaster recovery to another datacenter and system, the performance level must be identical, or the offering must be adapted.

The impact of Performance is also visible on Sustainability / GreenIT: a solution at a higher performance level often consumes more energy than the same solution at a lower performance level.

Different architectures have different power needs

How to change Performance quality

There are several factors to consider.

The main and easiest lever for improvement is scaling. Adding more systems (horizontal scale) or adding resources to the system (vertical scale) is the most classic way to increase performance.

Vertical scaling consists, for example, of replacing the CPU with a more powerful one or one with more cores, adding a second CPU, or adding or enlarging disks. Horizontal scaling, which adds systems to a cluster, also increases Availability.

SQL Server performance with vertical scaling on VMware Cloud on AWS

However, the increase is not linear. Beyond a certain level, scaling is no longer enough and can even be counterproductive: performance drops because the systems spend too many resources managing themselves, without any performance gain.

A common scenario is oversized virtual machines, where a database on a 32 vCPU VM performs worse than it did with 24 vCPUs. There can be multiple reasons: vCPUs added without any reconfiguration (or with a bad one), misalignment with the underlying system and its hardware (NUMA, for example), or partitions or shards that were not redistributed.
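
One commonly used model for this kind of curve, which is not taken from this article, is the Universal Scalability Law: relative capacity is N / (1 + α(N − 1) + βN(N − 1)), where α stands for contention and β for the cost of keeping units coherent. The sketch below uses hypothetical coefficients, chosen only so that the curve peaks between 16 and 32 units; they are not measurements.

```python
def relative_capacity(n, contention, coherency):
    """Universal Scalability Law: throughput of n units relative to a single unit."""
    return n / (1 + contention * (n - 1) + coherency * n * (n - 1))


# Hypothetical coefficients, chosen only to illustrate the shape of the curve.
CONTENTION = 0.05  # share of the work that is serialized
COHERENCY = 0.002  # cost of keeping the units consistent with each other

for vcpus in (8, 16, 24, 32):
    print(vcpus, round(relative_capacity(vcpus, CONTENTION, COHERENCY), 2))
```

With these coefficients, capacity still grows from 8 to 16 units but is lower at 32 than at 24, which mirrors the oversized-VM scenario. Real coefficients have to be fitted from measurements of the actual workload.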

Other levers to change Performance are related to development, such as optimizing the code, using an optimizing compiler or a more powerful rendering engine, or even changing the development language.

But all these changes, for example moving from Python to Go (a compiled language) or changing the JavaScript workflow engine from Rhino to V8, have a high implementation cost, especially compared to “simply using compute resources”.

Compiler, runtime, language comparisons

Performance can also be lowered deliberately. Although this is less costly, it should not be neglected in the architecture. Some reasons for decreasing performance:

  • Tasks that have to be ordered in a business or automation workflow. Waiting loops are then used to slow down the execution.
  • Emulators of old terminals (e.g. IBM 3270), to avoid “race conditions” that could happen with a (too) fast solution.
  • Simulators of low-bandwidth, high-jitter network connectivity and so on. We can add “parasites” to the solution, to deliberately keep the systems busy with dummy tasks.
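
As a minimal sketch of the first case, the function below runs the steps of a workflow in order and enforces a minimum interval between their starts. It uses a sleep instead of a busy waiting loop, and the name `run_paced`, the three dummy steps and the 50 ms interval are assumptions made for this example.

```python
import time


def run_paced(tasks, min_interval_s):
    """Run callables in order, starting them at least min_interval_s apart."""
    results = []
    next_start = time.monotonic()
    for task in tasks:
        delay = next_start - time.monotonic()
        if delay > 0:
            time.sleep(delay)  # deliberately wait instead of running at full speed
        next_start = time.monotonic() + min_interval_s
        results.append(task())
    return results


started = time.monotonic()
results = run_paced(
    [lambda: "step-1", lambda: "step-2", lambda: "step-3"],
    min_interval_s=0.05,
)
elapsed = time.monotonic() - started
print(results, f"{elapsed:.2f}s")
```

Sleeping keeps the CPU free while waiting. A “parasite” load, as in the last bullet, would do the opposite and burn cycles on purpose.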

Final Words

How should Performance quality be used in a solution, and what is it for? We have already covered the KPIs, the contractual cases and the offers. Including Performance in a solution is almost mandatory, whatever the contractual case (average, maximum, guarantee).
As with other design qualities, increasing the performance quality implies increasing the cost of the solution (whether by adding hardware or man-days).

Performance numbers are a kind of promise under control, provided that the end customer has read carefully and has a clear idea of the offer and associated KPIs.

Context and deviation over time must be taken into account, and a complex solution integrated in a broader information system (and its context) cannot reliably commit to a performance level with isolated, out-of-context metrics.

A good architecture starts with correctly collected and traced requirements. Performance must be adapted to the need, and its definition and metrics must be clearly defined and understood by customers, even if that means adapting these metrics to the context.

Concrete data and metrics/KPIs should be included in the solution offering. Performance indicators ensure that the solution is adapted for the use case – not too powerful, not too slow – just the right choice, of the right solution, for the right use case.

Published by bsarda78

My name is Benoit Sarda. My job is simply to "make it happen" at the customer's IT once a project has been confirmed, from strategic thinking to tactical delivery, no matter the difficulties. I'm one of the key individuals, peering with the architects, LoB and technical experts, while driving the internal workforce to the right target. I have the chance to work on my passion: I'm keen on tech. I love the variety of scenarios, from simplicity to complexity, but also the diversity of my customers and the impact that a team can make. I'm a tech dissector, love to learn and share, be on the edge and beyond, as a future thinker.

One thought on “Architecture Basics: Performance”

  1. Some topics:
    Expected Performance and Measured Performance. Audit and Adapt.
    Performance in a sustained environment: What do we expect in a “normal situation” -> Giving information on the system’s limits
    Performance at startup and during peak: One of the most difficult to measure. Generally you know that when you hit the peak…
    Network Performance: Throughput, yes but not only 😉 Latency/Jitter/Resiliency
    Method to add “fuel” in the system: Adding resources and hoping performance will be better ?
    And so on…
    Will be happy to have a brainstorm session on this.

