Consider the pros and cons in the same way that you would any other service cost. Typically a decision point is driven by observation: an existing service may be end of life, so the metric is replacement cost through build, buy, or gluing a tessellated set of applications and services together. Either way there will be a capex hit, so presenting an opex-based managed service may be preferable from an accounting perspective. Consequently the managed service cost needs to be assessed against the:
- Cost of replacement versus another on-premise solution
- Cost of replacement versus an inhouse build
When the incumbent system is not end-of-life, measure the cost of the managed service against the maintenance costs of the current system.
Concessions To Consider
Each of the forms of managed service has different challenges, so we will consider each in turn.
- With the IaaS model you traditionally save on capex, time to provision, support costs and management oversight (aka aggravation). However, you will pay more at high utilisation because the provider will add a margin to their costs. For more recent offerings such as colocation and FPGAs the positives are, if anything, amplified by their scarcity or by how heavily you will utilise them. Implementing the technology yourself could be a periodic one-off hit that doesn't justify a capex purchase; the supporting build may be so extensive that it is uneconomical; or you may simply lack the time or inclination to negotiate contracts, purchase hardware and hire support engineers.
- Where the PaaS virtualised instance category is concerned, being abstracted away from the underlying hardware carries some risk. Your provider may make changes to the underlying configuration of their instances, and the more abstracted your deployment becomes, the greater the likelihood that some low-level requirement of yours will break. Consider the case of a firm that made extensive use of lightweight processes but found that the image it had previously deployed successfully, and that was still running, could not be redeployed. The vendor had inadvertently changed the configuration of the underlying supporting service layer, preventing one of the libraries the firm relied on from running. It would still run on a fat instance, but not on the lightweight one. Note that volume PaaS providers will not necessarily give advance warning of changes to underlying layers of their stack. Caveat emptor. Consider also that you cannot necessarily choose your chipset, so if you rely on manufacturer-specific libraries you may not be able to leverage them. Worse still, if you are running virtualised, which is usually the case, deployments may be non-deterministic. While some historical issues, such as the Intel MKL not running on AMD, have been (mostly) resolved, there remains a risk in a heavily optimised codebase.
- Independent managed services have the drawback that you may be accepting layered third-party vendor dependencies. You may grandfather in exposure to a vendor you know nothing about, be it a library vendor or a cloud provider, and you may need to carry out increased vendor due diligence to ensure you know where your data is domiciled. You may think you have a multi-cloud policy, but that becomes difficult to enforce. On that point, consider a well-known SaaS CRM provider that runs on a particular cloud provider: you now have exposure to that cloud provider, and it holds the crown jewels of your client information.
- In the case of multifunctional managed technology services, using such services reduces key person risk, and permanent staff can focus on strategy, internal client advisory, and the elements of support and planning that require institutional knowledge. Furthermore, if something goes wrong, as with independent managed services, you only have to phone one firm to get the issue addressed; there's no passing the buck back and forth between vendors, each blaming the other. The downside is the inverse of the upside: you have a lot of exposure to one firm, so they had better know what they're doing. Furthermore, the more functions you are unable to transfer to the supporting firm, the more diluted the service becomes.
- Federated business services offer similar benefits to multifunctional managed technology services, but move up the service stack to include business services. When you review a firm offering federated services the win is the one-stop shop: if you have a problem there is, again, only one phone number you need, you might even have a named individual looking after you, and you don't end up getting batted between vendors each blaming the other. Consider, however, that the same drawbacks are also inherited. Your own support structure shrinks massively so you can focus on alpha creation, but you are wholly dependent on the provider's SLOs and SLAs.
SLAs, SLSs, SLOs and SLIs
Service Level Agreements (SLAs), Service Level Standards (SLSs), Service Level Objectives (SLOs) and Service Level Indicators (SLIs) all aim for the same thing: they are measures to assess performance. Ironically, there is some dispute around the web as to the precise definition of each of these terms, so I'm going to propose some below that we can argue about later.
An SLA is a contractual agreement with a third party that governs the levels of service that can be expected based on a number of measures. The set of measures are the SLSs. An SLO is an individual goal in the SLSs. If a service supplier fails to deliver to a level above the threshold stipulated in the SLA then they become liable for whatever remuneration or other actions are stipulated by the SLA.
An SLO is an individual goal that a vendor will be measured against; if they fail to achieve it, they suffer the penalty documented in the SLA. An SLI is an indicator that measures the vendor's achievement against an SLO. It does not itself trigger an external penalty; rather, it prompts the vendor to review their levels of provision and potentially prioritise, for example, resilience over features. It is the warning level that gives a vendor time to review performance trends before being penalised, hopefully heading off a charge.
Consider the case of an ISP that externally targets an SLO of 3 x 9s uptime. Their SLI may be showing 5 x 9s, which is comfortably within their safe target zone. If, however, their performance drifts out to 4 x 9s, they may consider remedial action. As a consumer of managed services you are unlikely to be quoted SLIs, since they are internal measures.
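To make the nines concrete, here is a minimal sketch (illustrative only, not drawn from any particular vendor's terms) that converts an availability target into an annual downtime budget and flags when a measured SLI drifts toward the SLO threshold:

```python
# Convert "N nines" availability into allowed downtime and compare an SLI to an SLO.
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def availability_from_nines(nines: int) -> float:
    """3 nines -> 0.999, 5 nines -> 0.99999, and so on."""
    return 1 - 10 ** -nines

def downtime_budget_minutes(availability: float) -> float:
    """Annual downtime permitted at a given availability level."""
    return (1 - availability) * MINUTES_PER_YEAR

slo = availability_from_nines(3)      # contractual objective: 99.9%
sli = availability_from_nines(5)      # currently measured: 99.999%
warning = availability_from_nines(4)  # hypothetical internal review threshold: 99.99%

print(f"SLO permits {downtime_budget_minutes(slo):.1f} min/year of downtime")
print(f"Current SLI equates to {downtime_budget_minutes(sli):.1f} min/year")
if sli < warning:
    print("Performance drifting toward the SLO - consider remedial action")
```

Three nines sounds impressive until you see it written as nearly nine hours of permitted downtime a year; translating nines into minutes is a useful habit when reading an SLA.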
An acceptable SLA is not a cast-iron guarantee of availability, but supported by a demonstrable track record of delivery and empirical evidence of achievement it contributes towards the justification to adopt. It is worth noting that no vendor will remunerate you for business lost as a result of the loss of their service; the most you can hope to receive is a percentage of their fee, which is why you must consider other aspects of their capabilities and track record.
Vendor Risk Assessments
Always undertake a vendor risk assessment. A considered review across your organisation is necessary to examine the claims of the provider and their resilience as an organisation. Are they well funded? Do they have other clients? Can they demonstrate expertise in their area? Will they offer a viable Proof of Concept (POC) engagement? Implementing a risk assessment procedure will enable you to assess different vendors accurately and dispassionately. Do not rely on the experience of your newest senior trader who is going to revolutionise your business by bringing in his friend's firm that will solve all your problems. A defined process removes bias and enables comparative benchmarking.
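A defined process can be as simple as a weighted scoring matrix applied identically to every candidate. The criteria, weights and scores below are hypothetical examples, not a recommended set; the point is that each vendor is measured against the same yardstick:

```python
# Illustrative weighted scoring matrix for comparative vendor benchmarking.
# Criteria, weights and scores are hypothetical placeholders.
WEIGHTS = {
    "funding_stability": 0.25,
    "client_base": 0.20,
    "domain_expertise": 0.25,
    "poc_offered": 0.15,
    "data_domicile_controls": 0.15,
}

def weighted_score(scores: dict) -> float:
    """Scores are 0-10 per criterion; returns a weighted total out of 10."""
    return sum(WEIGHTS[criterion] * score for criterion, score in scores.items())

vendor_a = {"funding_stability": 8, "client_base": 6, "domain_expertise": 9,
            "poc_offered": 10, "data_domicile_controls": 7}
vendor_b = {"funding_stability": 5, "client_base": 9, "domain_expertise": 7,
            "poc_offered": 4, "data_domicile_controls": 8}

for name, scores in [("Vendor A", vendor_a), ("Vendor B", vendor_b)]:
    print(f"{name}: {weighted_score(scores):.2f} / 10")
```

The useful part is not the arithmetic but the discipline: the weights are agreed before anyone scores a vendor, so the senior trader's friend gets the same treatment as everyone else.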
A third-party provider can't necessarily guarantee their tech won't fail any more than you can guarantee it yourself if you adopt the same technologies. But they can add layers of failover support that you cannot necessarily build yourself, whether due to a lack of specialist talent or to the economies of scale that come with a mutualised support model.
Evaluation
Your evaluation should focus on the provider's track record and capabilities versus your requirements, so look at their client base and the breadth of their capabilities, and try to run a POC against their service suite. Do you have to take all the services, or can you start with a subset and work your way up as you gain confidence, or is the most effective adoption model "all in" from day one? Your POC should be an accurate reflection of the actual use case. In some cases a partial adoption will be harder work than going all in, because you may need to do additional interfacing if you only take part of a workflow.
When building a business case for any of these categories, be clear about the level of usage that is appropriate based on your projected use case. If you are going all-in on cloud in an IaaS or PaaS scenario, your costs could conceivably go up if your systems typically have a low level of utilisation. Remember that cost is not the only metric to consider, and may not be the primary one for you. If you are taking on a suite of business processes, can they integrate with other third parties (administrators, brokers, exchanges) or other systems you want to keep? When kicking off a new line of business there may be no incumbent system to measure against, so factors other than functionality or cost come into play; in particular, time to market is a pertinent metric. You should generally expect time to market for a managed service to be faster than a "build" or "buy and integrate" solution. When measuring against an inhouse build, note that it takes a long time to arrange contracts, get hardware delivered, racked and configured, and in some cases hire support staff, so be realistic when assessing your own internal delivery capabilities.
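The utilisation point can be made concrete with some back-of-the-envelope arithmetic. All rates below are hypothetical placeholders, not quotes. A lifted-and-shifted system that keeps instances running 24/7 pays the cloud premium for idle hours; a workload that scales to demand only pays for the hours it uses:

```python
# Back-of-the-envelope comparison of two cloud adoption patterns against a
# flat on-premise cost. All figures are hypothetical illustrations.
HOURS_PER_MONTH = 730
RATE_PER_HOUR = 2.50     # hypothetical on-demand instance rate
ONPREM_MONTHLY = 955.0   # hypothetical amortised hardware + support per month

# An always-on instance bills every hour, busy or idle.
always_on = RATE_PER_HOUR * HOURS_PER_MONTH

for utilisation in (0.10, 0.50, 0.95):
    # Scale-to-demand bills only for the hours actually used.
    scale_to_demand = RATE_PER_HOUR * HOURS_PER_MONTH * utilisation
    print(f"{utilisation:.0%} utilisation: always-on {always_on:.0f}, "
          f"scale-to-demand {scale_to_demand:.0f}, on-premise {ONPREM_MONTHLY:.0f}")
```

Under these made-up numbers the always-on pattern costs more than on-premise at every utilisation level, while scale-to-demand only overtakes on-premise once utilisation is sustained and high. The crossover point will be different for your workloads, which is exactly why the projection belongs in the business case.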
There are a number of ways to justify a managed service, but fundamentally you can measure the adoption argument against a single metric: does it generate revenue for your line of business? That may be because the service is cheaper, quicker to market, or offers you something that you cannot efficiently or effectively build yourself.
In all cases a cogent vendor risk assessment process is key to making a balanced decision.
Replacing Inhouse Solutions
Assuming the problem domain is well understood, a greenfield implementation will usually be easier than a brownfield one. On occasion it is possible to swap out an existing component or application in favour of a third-party service, but our experience of replacing applications is that the effort is typically non-trivial. Even so, in most cases switching to a managed service will be simpler than replacing the application with another inhouse system: you still have the effort of plumbing in to existing up- and downstream dataflows, but it will require less implementation effort because you do not need to worry about self-hosting.
Containerised applications facilitate portability, so you may be able to colocate critical internal data sources alongside the managed service, since most such services are also cloud-deployed. It also helps to adopt the same supporting technology services on cloud as on premise: cybersecurity software, databases, caches, messaging platforms, and so on.
The timing of the service adoption can play a part in its acceptance. Replacing when your existing application or service reaches EOL or a renewal point means you will face an imminent cost, implementation effort, or both, so any additional cost or service adoption effort can be measured against the cost of replacement.
Generally a POC or parallel run period is preferable; that way you are not forced into a cliff-edge adoption with a pre-ordained go-live date. Where a contract renewal for an existing service is concerned, this obviously means forward planning, just as you would for any application replacement.
Commercial viability
I've left the most important criterion until last. You need guardrails around your managed service environments to ensure costs don't spiral out of control - like when a developer spins up a test cluster to replicate production and then forgets to tear it down (they will), or when someone decides it would be a great idea to pull all the intermediate datasets, as well as the results set, back on-premise when the scaled-out cloud calc is complete (there is always one, and they will not know that you pay egress charges on download). More importantly still, you need to be confident that the total cost of the managed service stacks up versus doing it yourself. For example, a cloud build may cost more than an on-premise build, but you deem the flexibility to horizontally scale the service to be worth the additional cost. However, one, three or five years later you are going to be held up against those costs and will need to defend your decision. So you need guardrails, plus a comfortable degree of confidence in your cost projections. I know of one CTO of a large fund who does not use cloud purely on commercial grounds. It's not always the answer. So the rule is as when hanging a picture: measure once, measure twice, then measure again.
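The egress point is easy to quantify. The volumes and per-GB rate below are hypothetical (egress pricing varies by provider, region and tier), but the shape of the bill is the lesson:

```python
# Illustrative egress bill: pulling intermediate datasets back on-premise
# versus downloading only the results. Volumes and rates are hypothetical.
EGRESS_RATE_PER_GB = 0.09  # hypothetical per-GB egress charge

results_gb = 5             # the answer you actually need
intermediates_gb = 2_000   # scratch data the calc produced along the way

results_only = results_gb * EGRESS_RATE_PER_GB
everything = (results_gb + intermediates_gb) * EGRESS_RATE_PER_GB

print(f"Results only:      ${results_only:.2f}")
print(f"Results + scratch: ${everything:.2f}")
```

A few cents versus a few hundred dollars per run, and that is for a single modest calculation; run it nightly and the "pull everything back" habit becomes a line item. This is exactly the kind of guardrail worth automating rather than policing by hand.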
Conclusion
To summarise, we have considered different adoption scenarios based on differing use cases. In general, however, there are a number of ground rules, such as always carrying out a vendor risk assessment, that provide guardrails in the vendor selection and onboarding process. Where possible, try before you buy; this will enable you to select a vendor based on your use case, not theirs. Be mindful of the benefit you are seeking: is it time to market, cost reduction, or both? Also be aware that whatever choice you make you will be swapping some risks around, so identify which you can accept and which you cannot.
Picture: This picture is the DALL·E 2 interpretation of "Safeguarding managed technology and commoditised financial services by service level agreements, contracts, risk assessments and proof of concepts". To be fair it makes more sense than a lot of SLAs and contracts I've seen.