22 December 2020

End-2-End test environments, a dead End road

End-to-End test environments are environments that are set up to test the integration between systems. But with the rise of distributed architectures, independent DevOps teams and automated CI/CD, the End-to-End test environment is under pressure. These environments are unstable, hard to trust and difficult to maintain. Why is the use of End-to-End test environments a dead end, and what are the alternatives?

As you might have expected, I’m not a big fan of End-to-End test environments. In recent years I have developed a love-hate relationship with these environments. The discussions about them were frustrating in the beginning, but gradually I started to enjoy sparring about them with others. Personally, I don’t see the value of these environments, especially when you compare it with all the energy that goes into them. I don’t want to say outright that End-to-End test environments are never necessary, but I have not yet encountered a problem for which an End-to-End test environment is the best solution.

Definition

By an End-to-End test environment we mean an environment in which teams deploy their deliverables in order to test them against the deliverables of other teams, with the aim of integrated testing. So it’s not an environment that a team sets up to test the integration of all their own software, or a front-end that is tested together with a backend. We are talking about environments specifically intended to test the integration between different teams. They contain no “Test Doubles” (Stubs, Mocks, etc.) (1), only software that will also run in production. An End-to-End test environment tends to hide under different names, such as:

Chain environment
User Acceptance Environment
Broad-stack Environment
Theory (In Theory it works, in production it doesn’t)
Acceptance Environment
Pre-Production Environment
Enterprise-wide Integration Test Environment
Full-stack Environment
Production-like Environment

Cons

There are a number of aspects of End-to-End test environments that, to put it nicely, are not ideal.

Test-Data

You need data to be able to test. Since we want to test against systems from other teams, those systems also need data. So we have to make sure that the test data is aligned. When data is consumed by a test, how do you ensure that the starting situation is the same in the next test round? And what happens if another team consumes the data on which your tests are based? In short, you are dependent on the data that others have in their systems.

Production-like

The semantics of the data and the deployed software are the same as in production. At least, they should be if the environment is ‘Production-like’. As far as the semantics of the data are concerned, that seems feasible to me. In any case, it will be better than a Test Double (Stub, Mock, etc.), but the versions of the deployments will sometimes differ from those in production, especially in a (microservice) architecture where several independent teams work on separate services that communicate with each other. Teams are constantly deploying new versions in the End-to-End test environment for testing purposes. Until they also deploy that version in production, they make the End-to-End test environment look less like production for all the other teams. The more teams, the bigger the problem. Everyone does their utmost to make the environment as unrelated to production as possible.

Unstable tests

End-to-End test environments ‘live’: they persist and are constantly changing. Data is mutated, deleted and added, so the starting position for the next test is different. The environment has a ‘state’ and that state changes constantly. This is in contrast to Unit tests and Integration tests, which should always have the same starting state. Because the environment keeps changing, the tests that run on it become unstable.

Long feedback time

When you are developing an application, you want to know quickly whether what you are building is good. If you first have to deploy everything in an environment before you can test, the feedback time increases. If you can test the same thing in a lower part of the test pyramid, for example with Unit tests, you will get faster feedback.

Longer ’time to market’

This argument always works well with Product Owners. Unstable tests, longer feedback time and test dependencies on other teams lengthen the time between the idea and the use of the solution.

Costs

An End-to-End test environment costs money. It might be worth mentioning again: the virtual machines that everything runs in don’t run on virtual iron. Think of the costs of hardware, cloud, licenses and maintenance. In addition, all of the above points also contribute to higher costs.

Fake pros

There seem to be some advantages to using End-to-End test environments. These can usually be divided into the following topics:

Quality of Test data

In general, the test data in an End-to-End test environment resembles production data more closely than data from Test Doubles (1) does. Think of the semantic correctness of data, such as a field that comes back with a value you did not include in your Enum. This does not mean that an End-to-End test environment is the best place to test this. It would be better to:

create strict and clear specifications and contracts between systems.
test logic that depends on output from other systems in Unit tests.
extract the data to a Stub if the data in an End-to-End test environment is highly valued.
“build for failure”: either way, you are going to get unexpected answers from other systems. Make sure these are handled in a user-friendly manner rather than with an HTTP 500 and a stack trace.
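To make “build for failure” a bit more concrete, here is a minimal sketch of handling a value that another system returns and that is not in your Enum. The names `OrderStatus`, `parse_status` and `describe_order` are hypothetical and only serve as illustration; the point is that an unexpected value ends in a friendly message instead of an unhandled exception.

```python
from enum import Enum


class OrderStatus(Enum):
    """Statuses that our (hypothetical) consuming application knows about."""
    OPEN = "OPEN"
    SHIPPED = "SHIPPED"
    UNKNOWN = "UNKNOWN"  # fallback for values we did not anticipate


def parse_status(raw):
    """Map a value from another system onto our Enum without raising."""
    try:
        return OrderStatus(raw)
    except ValueError:
        # The other system sent something that is not in our Enum.
        return OrderStatus.UNKNOWN


def describe_order(payload):
    """Build a user-facing message; an unexpected value never leaks a stack trace."""
    status = parse_status(payload.get("status"))
    if status is OrderStatus.UNKNOWN:
        return "We could not determine the status of your order. Please try again later."
    return f"Your order is {status.value.lower()}."
```

Because this logic lives in a plain function, it can be covered by Unit tests that feed it both expected and unexpected values.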

Trust

“I want to see it work before we go to production” is a common argument: click through it before it goes to production and watch it work. Actually, this is an indication that there is no confidence in the automated tests. Or that there are no automated integration tests and there is a love for manual testing. What if a manual test is forgotten?

Therefore, build trust by testing automatically.
In case of a bug, first make a failing test and then fix the bug so that the bug won’t come back.
Invest in automation
Show that no new problems are found in the End-to-End test environment and that it contributes nothing.
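The “failing test first” habit could look like the sketch below. The function `parse_amount` and the bug it once had (crashing on a comma as decimal separator) are made up for illustration. The first test is written while the bug still exists, is seen to fail, and stays in the suite after the fix.

```python
import unittest


def parse_amount(text):
    """Parse "12.50" or "12,50" into a float (fixed version of a hypothetical bug)."""
    return float(text.strip().replace(",", "."))


class ParseAmountRegressionTest(unittest.TestCase):
    def test_comma_as_decimal_separator(self):
        # Written first, while the bug was still there, and seen failing.
        self.assertEqual(parse_amount("12,50"), 12.5)

    def test_dot_as_decimal_separator_still_works(self):
        # Guards the behavior that already worked before the fix.
        self.assertEqual(parse_amount("12.50"), 12.5)
```

Running the suite on every build (for example with `python -m unittest`) means the bug cannot come back unnoticed.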

Maturity

The responses of the services you use still comply with the contract, but something goes wrong anyway and you have a production incident. With a bit of luck this could have been prevented by a manual test in the End-to-End test environment, but that only treats the symptoms. Address the root of the problem.

Strict and clear contracts so that there are fewer interpretation differences.
Be ready for the unexpected and make sure that the user notices as little as possible of unexpected behavior.
Apply Consumer Driven Contract testing to test expectations and agreements. (See later in this article)

Compensation

Not every team is able (for whatever reason) to deliver the quality they would like to deliver. In an End-to-End test environment you can verify whether this causes problems for your team before they bring a new version to production. This only keeps the problem alive: another team compensates for the lack of quality with extra testing in an End-to-End test environment. It would be better to:

Leave the responsibility where it belongs and solve it there.
Make sure you are prepared for errors and that they are handled properly; an HTTP 500 with a stack trace is not proper handling.
If you still want to verify something before they go to production, make sure you have automated tests for your expectations. These confirm that what you expect to get back is correct, without having to test it in an End-to-End test environment.

Tips

CDC

Consumer Driven Contract tests (2) are, as the name suggests, contract tests driven by the consumer. The idea is simple: when you use a service from another team, there is always a producing team and one or more consuming teams. Traditionally, the producing team creates a specification for their service that the consuming team can consume. CDC provides contract testing between services by testing both request and response expectations in isolation to ensure agreement on the documented contract. In practice, this means that the consuming team can offer its expectations to the producing team in the form of tests. The producing team can then include those tests when building the project. This has a number of advantages:

The consuming team can be confident that the service continues to work according to the contract.
The producing team can test whether they break the expectations of other teams when making adjustments.
Expectations can be used as Stub by the consuming team so that they can test it against their own application, in isolation.

But CDC has even more advantages: it can be used for the design of specifications. If the consuming team knows exactly what it needs, why not offer the specification to the producing team in the form of a test? The producing team then only has to write the implementation that meets the expectations. The producing team can also clean up all endpoints for which there are no tests, because apparently nobody expects an answer from them. Projects that can help you with CDC are Spring Cloud Contract, Pact, and also Postman/Newman.
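The mechanics of CDC can be sketched without any tooling. The example below is plain Python and deliberately does not use the API of Pact or Spring Cloud Contract; the contract format, the `/customers` endpoint and all function names are assumptions made up for this illustration. The consuming team publishes its expectations as data, the producing team replays them against its handler, and the consuming team reuses the same expectations as a Stub.

```python
# Expectations published by the consuming team (hypothetical contract format).
CONTRACT = [
    {
        "request": {"method": "GET", "path": "/customers/42"},
        "response": {"status": 200, "body": {"id": 42, "name": "Ada"}},
    },
    {
        "request": {"method": "GET", "path": "/customers/999"},
        "response": {"status": 404, "body": {"error": "not found"}},
    },
]


def provider_handle(method, path):
    """Stand-in for the producing team's real request handler."""
    customers = {42: {"id": 42, "name": "Ada", "email": "ada@example.com"}}
    if method == "GET" and path.startswith("/customers/"):
        customer = customers.get(int(path.rsplit("/", 1)[1]))
        if customer is None:
            return 404, {"error": "not found"}
        return 200, customer
    return 405, {"error": "method not allowed"}


def verify_provider(handler, contract):
    """Producer side: replay every expectation and collect the violations."""
    failures = []
    for interaction in contract:
        request, expected = interaction["request"], interaction["response"]
        status, body = handler(request["method"], request["path"])
        if status != expected["status"]:
            failures.append(f"{request['path']}: status {status}, expected {expected['status']}")
        # Only fields the consumer relies on are checked; extra fields are allowed.
        for key, value in expected["body"].items():
            if body.get(key) != value:
                failures.append(f"{request['path']}: field {key!r} is {body.get(key)!r}, expected {value!r}")
    return failures


def stub_from_contract(contract):
    """Consumer side: turn the same expectations into a Stub."""
    table = {(i["request"]["method"], i["request"]["path"]): i["response"] for i in contract}

    def stub(method, path):
        response = table[(method, path)]
        return response["status"], response["body"]

    return stub
```

In the producing team's build, `verify_provider(provider_handle, CONTRACT)` should return an empty list. Note that the producer may add fields such as `email` without breaking anyone, but removing `name` would show up as a failure before it reaches production.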

Lower the risks

After you’ve tested everything extensively, without an End-to-End test environment of course :), it’s time to go to production with that new version. No matter how well you test, something can always go wrong. You can’t always prevent things from going wrong, but you can control how badly they go wrong! There is no satisfaction in heroic actions during a crisis that your own team caused with a new release. Real heroes prevent crises from happening.

Separate deploy from release

By deploying we mean installing a new version, and by releasing we mean making the new functionality available. First you deploy the latest version in production, with the new functionality not yet activated. Then you watch how the new version behaves; if everything goes well, you activate the new functionality, for example through a ‘feature toggle’.
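A feature toggle does not have to be complicated. The sketch below keeps the toggles in memory; in a real setup they would probably come from configuration or a toggle service. The class `FeatureToggles`, the toggle name `bulk-discount` and the discount rule are all hypothetical.

```python
class FeatureToggles:
    """In-memory toggles; a real setup would read them from config or a service."""

    def __init__(self, enabled=()):
        self._enabled = set(enabled)

    def is_enabled(self, name):
        return name in self._enabled

    def enable(self, name):
        self._enabled.add(name)

    def disable(self, name):
        self._enabled.discard(name)


def checkout_total(prices, toggles):
    """The deployed code contains both paths; the toggle decides what is released."""
    total = sum(prices)
    if toggles.is_enabled("bulk-discount") and len(prices) >= 3:
        total *= 0.9  # new functionality, inactive until the toggle is switched on
    return round(total, 2)
```

With the toggle off, the new version behaves exactly like the old one. Switching it on is the release, and switching it off again is the quickest rollback you can have.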

Release fast and often

The risk of one small adjustment is smaller than the risk of many small adjustments at once. If the release process hurts, do it more often and make sure it doesn’t hurt anymore.

Canary release

Don’t immediately send all traffic to the new version, but first see how the new version behaves with a smaller part of the traffic before rolling it out to all users. (3)
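One possible way to pick “a smaller part of the traffic” is to hash a user id into a stable bucket, so that the same user keeps getting the same version. This is only a sketch under that assumption; the function names are made up, and in practice a load balancer or service mesh often does this for you.

```python
import hashlib


def bucket(user_id):
    """Map a user to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(user_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def choose_version(user_id, canary_percent):
    """Send roughly `canary_percent` percent of the users to the canary."""
    return "canary" if bucket(user_id) < canary_percent else "stable"
```

Because the bucket of a user never changes, raising `canary_percent` only adds users to the canary; nobody flips back and forth between versions.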

Monitoring

To know what is happening in production, you have to monitor closely. You want to know, among other things:

how many errors occur
how many requests come in
the response codes and how often each occurs

For the most important statistics you probably also want push notifications to mail/chat/phone when a certain threshold is reached.
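As a rough illustration of counting requests and alerting on a threshold, consider the sketch below. `RequestMonitor` and its `notify` callback are hypothetical; a real setup would use your monitoring stack of choice and have the callback post to mail, chat or phone.

```python
from collections import Counter


class RequestMonitor:
    """Counts response codes and calls `notify` when the error rate is too high."""

    def __init__(self, error_rate_threshold, notify):
        self.error_rate_threshold = error_rate_threshold
        self.notify = notify  # callback, e.g. one that sends a chat message
        self.status_counts = Counter()

    def record(self, status_code):
        self.status_counts[status_code] += 1

    @property
    def total_requests(self):
        return sum(self.status_counts.values())

    @property
    def error_count(self):
        # Server-side errors only: HTTP 5xx.
        return sum(n for code, n in self.status_counts.items() if code >= 500)

    def check(self):
        if self.total_requests == 0:
            return
        rate = self.error_count / self.total_requests
        if rate >= self.error_rate_threshold:
            self.notify(f"error rate {rate:.0%} reached threshold {self.error_rate_threshold:.0%}")
```

Here `status_counts` answers “which response codes and how many”, `total_requests` answers “how many requests come in”, and `check()` is the part that pushes a notification.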

Blue/Green deployments

With Blue/Green deployments (4) you deploy a new release (green) next to your current production release (blue). Then you make sure that, for example, 5% of the requests go to the new release. You monitor this and, if you are satisfied, you increase the share of requests to the new release until everything goes to the new version. The old release can then be removed and green becomes the next blue. This has the advantage that you are able to roll back quickly and that the impact of problems is kept to a minimum by not immediately sending 100% to the new version.
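The step-by-step shift with a quick rollback could be sketched like this. The function `roll_out`, the step percentages and the `observe_error_rate` callback are assumptions for the sake of the example; in reality the observation would come from your monitoring and the traffic split from your load balancer.

```python
def roll_out(steps, observe_error_rate, max_error_rate=0.01):
    """Shift traffic from blue to green step by step; fall back to blue on trouble.

    steps: increasing percentages of traffic for green, ending at 100.
    observe_error_rate: callback returning green's error rate at a given percentage.
    Returns the final percentage on green and the history of percentages tried.
    """
    green_percent = 0
    history = []
    for percent in steps:
        green_percent = percent
        history.append(green_percent)
        if observe_error_rate(green_percent) > max_error_rate:
            green_percent = 0  # roll back: all traffic returns to blue
            history.append(green_percent)
            break
    return green_percent, history
```

For example, `roll_out([5, 25, 50, 100], lambda percent: 0.0)` ends with all traffic on green, while a green release that starts failing at 50% is rolled back before it ever receives 100% of the requests.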

Conclusion

Moving away from End-to-End test environments is certainly not always easy. It involves trial and error, but in the end you go a lot faster. Every step away from End-to-End test environments is worth celebrating. More info:

Roy Braam

Roy Braam is director and founder of OpenValue Amsterdam. He loves Java, DevOps, and everything that comes with developing good solutions. Besides developing, as an architect, he loves software architecture and solving 'the bigger' puzzle.