Canary testing is a method of evaluating the quality of a new version of an application by making it available to a small group of users. The principle involves gradually rolling out new features to a limited number of consumers, while the rest continue to use the previous version of the software until the changes are fully accepted. The modified code is deployed in real-time, and the end users participating in the testing are not notified about it.

🔴 Note! You may also encounter other terms for this type of testing:

Canary deployment
Incremental, Phased, Staged rollout
Canary release

Why canary testing?

The term canary testing comes from the work of coal miners. They used canaries to detect dangerous accumulations of carbon monoxide in the mines. These birds are more sensitive to toxic gases than humans, so they would succumb first, signaling to the miners that they needed to ascend to the surface.

In the software development context, users who first test new features of a digital solution in production environments act as the canaries in a coal mine.

Canary testing is popular among software developers because, during the process, code changes affect only a small group of users. This minimizes the potential negative impact on the global user experience. It also gives the development team ample time to fix any defects before the software is made available to a wider audience.

How Does Canary Deployment Work?

This process is quite simple and does not require significant resources. As a result, such deployment can be used during every new release that potentially contains critical bugs and poses risks to the broader public. Canary testing occurs in several stages:

#1: Selecting the Canary Group

At this stage, the development team and software testers choose a subgroup of users who will participate in the testing.

The principle of how canary deployment works

It’s important to strike a balance here. The subgroup should be large enough to ensure the reliability of the results, but small enough to minimize risks. Several options are possible:

1% to 5% of the entire user base. This option is most commonly used by testers. It allows teams to monitor the behavior of the new release in a real-world environment, without exposing the majority of users to potential issues.
0.1% to 1% of real users. This approach may be used for particularly large releases. If no critical defects are found, the test group gradually expands.
Beta testing. This is intensive testing of an almost finished version of an app before the final software release. It helps identify as many errors as possible. Beta testing involves a limited number of users, often selected from within the team.
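
To make the percentage options above more concrete, here is a minimal Python sketch of one common way to pick a canary group: hashing each user ID into a stable bucket. The function name in_canary_group, the salt value, and the user IDs are assumptions made for illustration only; they do not come from any particular tool.

```python
import hashlib


def in_canary_group(user_id: str, percent: float, salt: str = "release-42") -> bool:
    """Return True if the user falls into the canary bucket.

    percent is the share of users to include, e.g. 1.0 for 1%.
    The salt is a hypothetical per-release value so that different
    releases pick different users.
    """
    # Hash the salted user id so the assignment is stable across requests.
    digest = hashlib.sha256(f"{salt}:{user_id}".encode("utf-8")).hexdigest()
    # Map the first 8 hex digits to a bucket in [0, 10000).
    bucket = int(digest[:8], 16) % 10_000
    return bucket < percent * 100


# Demo with made-up user ids: roughly 1% should land in the canary group.
users = [f"user-{i}" for i in range(100_000)]
canary = [u for u in users if in_canary_group(u, percent=1.0)]
print(f"{len(canary) / len(users):.2%} of users are in the canary group")
```

Because the bucket depends only on the salt and the user ID, a user who is in the 1% group stays in the group when the share is raised to 5%, which suits a gradually expanding test group.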

#2: Setting Up the Testing Environment

This stage involves creating an environment that runs in parallel with the production environment but initially serves no real users. The new version of the software that needs to be tested is deployed in this environment.

Then, using a load balancer, traffic will be redistributed in such a way that only the selected small group will interact with the new release.
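
The routing rule described here can be sketched in a few lines of Python. This is only an illustration of the idea, not the configuration of a real load balancer: the CanaryRouter class, the backend host names, and the user IDs are all hypothetical, and the canary group is assumed to be already known as a set of user IDs.

```python
from dataclasses import dataclass, field


@dataclass
class CanaryRouter:
    """Toy stand-in for a load balancer rule (all names are hypothetical)."""

    stable_backend: str
    canary_backend: str
    canary_users: set[str] = field(default_factory=set)
    canary_enabled: bool = True

    def route(self, user_id: str) -> str:
        # Only users in the selected group reach the new release.
        if self.canary_enabled and user_id in self.canary_users:
            return self.canary_backend
        # Everyone else keeps using the previous version.
        return self.stable_backend


router = CanaryRouter(
    stable_backend="stable.internal.example",
    canary_backend="canary.internal.example",
    canary_users={"user-17", "user-204"},
)
print(router.route("user-17"))   # a canary user
print(router.route("user-999"))  # a regular user
```

Setting canary_enabled to False sends every user back to the stable backend, which is the simplest form of rollback at the routing level.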

Canary Testing Process and Metrics Evaluation

Once the test environment is set up, users begin interacting with the updated features of the web or mobile apps. At this stage, it is crucial to closely monitor all system metrics:

error rates;
response times;
CPU and memory usage;
latency, etc.

If any metric rises or falls to an unacceptable level, the canary test is stopped. Users are redirected to the previous version, and the new feature is sent back for further development.
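
As a rough sketch of this stop condition, the Python example below compares canary metrics with a baseline from the previous version. The Metrics fields, the limit values, and the sample numbers are assumptions chosen for the example; real thresholds depend on the product and its service-level goals.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Metrics:
    error_rate: float       # fraction of failed requests, 0.0 to 1.0
    p95_latency_ms: float   # 95th percentile response time
    cpu_percent: float
    memory_percent: float


# Hypothetical limits: how much worse than the baseline the canary may be.
MAX_ERROR_RATE_INCREASE = 0.01
MAX_LATENCY_RATIO = 1.20
MAX_CPU_PERCENT = 85.0
MAX_MEMORY_PERCENT = 90.0


def violations(baseline: Metrics, canary: Metrics) -> list[str]:
    """Return the names of metrics that moved to an unacceptable level."""
    problems = []
    if canary.error_rate - baseline.error_rate > MAX_ERROR_RATE_INCREASE:
        problems.append("error rate")
    if canary.p95_latency_ms > baseline.p95_latency_ms * MAX_LATENCY_RATIO:
        problems.append("latency")
    if canary.cpu_percent > MAX_CPU_PERCENT:
        problems.append("cpu")
    if canary.memory_percent > MAX_MEMORY_PERCENT:
        problems.append("memory")
    return problems


# Made-up sample values for the demo.
baseline = Metrics(error_rate=0.002, p95_latency_ms=180, cpu_percent=55, memory_percent=60)
canary_run = Metrics(error_rate=0.030, p95_latency_ms=190, cpu_percent=58, memory_percent=61)

found = violations(baseline, canary_run)
if found:
    print("Stop the canary and roll back:", ", ".join(found))
else:
    print("Canary looks healthy, continue")
```

Comparing against a baseline rather than fixed numbers helps separate problems caused by the new release from normal fluctuations in production traffic.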

To conduct such tests, use feature flags.

What is a feature flag in a canary test?

This is a software development method that allows developers to enable or disable certain features of an application without deploying new code. Feature flags make it possible to grant access to features for different groups of users.

In practice, a feature flag in a canary test works like this: if the flag is ON, a specific part of the code is executed, and the canary group uses the new feature. If a defect is found, the flag is immediately turned OFF, and the code is bypassed.

This scheme describes how a feature flag handles an error detected by canary tests: the new functionality will not work until the error is fixed.
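
The ON/OFF behavior can be illustrated with a small Python sketch. It uses a simple in-memory flag store written for this example; the FeatureFlags class, the flag name new_checkout, and the checkout function are hypothetical, and a real project would more likely rely on a dedicated flag service.

```python
class FeatureFlags:
    """In-memory flag store, used here only for illustration."""

    def __init__(self) -> None:
        self._flags: dict[str, bool] = {}

    def set(self, name: str, enabled: bool) -> None:
        self._flags[name] = enabled

    def is_on(self, name: str) -> bool:
        # Unknown flags default to OFF so new code stays dark by default.
        return self._flags.get(name, False)


flags = FeatureFlags()


def checkout(cart_total: float, in_canary: bool) -> str:
    # The new code path runs only for canary users while the flag is ON.
    if in_canary and flags.is_on("new_checkout"):
        return f"new checkout flow: {cart_total:.2f}"
    # Otherwise the new code is bypassed and the old path is used.
    return f"old checkout flow: {cart_total:.2f}"


flags.set("new_checkout", True)
print(checkout(10.0, in_canary=True))    # flag ON: canary user sees the new flow

flags.set("new_checkout", False)         # a defect was found, turn the flag OFF
print(checkout(10.0, in_canary=True))    # the same user is back on the old flow
```

Note that turning the flag OFF changes behavior without a new deployment, which is what makes the reaction to a detected defect so fast.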

Evaluation of Testing Results

At this stage, there are several possible outcomes:

The results of the canary testing are satisfactory, meaning all metrics are at the desired level. In this case, a release decision is made regarding the feasibility of full-scale deployment.
The development team has doubts about the stability of the new release. To gather more comprehensive data, more users are involved in the canary testing. Then, detailed monitoring of the impact of software changes on user experience takes place. Once the desired results are achieved, the final new version is deployed to the production environment, and the testing environment is deactivated.
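
The outcomes at this stage can be expressed as a small decision helper. The Python sketch below is one possible way to model it; the ramp percentages in RAMP_STEPS and the two boolean inputs are assumptions for the example, not a prescribed process. A rollback branch is included for the case where metrics are unacceptable, as described in the monitoring stage.

```python
from enum import Enum


class Decision(Enum):
    PROMOTE = "promote to all users"
    EXPAND = "expand the canary group"
    ROLLBACK = "roll back to the previous version"


# Hypothetical rollout steps, as percentages of the user base.
RAMP_STEPS = [1, 5, 25, 50, 100]


def decide(metrics_ok: bool, team_confident: bool) -> Decision:
    """Map the evaluation result to one of the possible outcomes."""
    if not metrics_ok:
        return Decision.ROLLBACK
    return Decision.PROMOTE if team_confident else Decision.EXPAND


def next_percent(current: int, decision: Decision) -> int:
    """Return the share of users who should get the new version next."""
    if decision is Decision.ROLLBACK:
        return 0
    if decision is Decision.PROMOTE:
        return 100
    # EXPAND: move to the next larger step, capped at 100.
    larger = [step for step in RAMP_STEPS if step > current]
    return larger[0] if larger else 100


decision = decide(metrics_ok=True, team_confident=False)
print(decision.value, "->", next_percent(5, decision), "% of users")
```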

Here, you can watch a video guide on canary releases from an industry expert: What is Canary deployment?

Why Is Canary Testing Effective?

Staged rollout is an effective approach to the development and deployment of software products. This is explained by several advantages of such releases:

Minimizing Risks. Releasing the new version of the application to a limited number of users allows the development team to fix errors before they affect the global user base.
Budget Savings. This type of testing provides fast feedback. As a result, defects are fixed before they reach the full user base, which reduces the cost of fixing them across the SDLC (Software Development Life Cycle).
Total Control Over Progress. Incremental rollout involves gradually increasing the number of users participating in the testing. This allows software developers to monitor system performance at each stage of development and track consumer feedback.
Confidence in the Final Product Quality. Regular canary tests give teams confidence that the final version of the product, deployed to all users, will not contain significant bugs or errors.
Simplicity of Implementation and Interpretation. Staged rollouts do not require complex infrastructure or maintenance, do not lead to system downtime, and all unsuccessful versions can be easily rolled back to previous ones. Testers also have clear metrics to indicate whether tests have been successful or not.

Canary releases are an excellent method of feature management, requiring minimal investment and significantly reducing project risks. This type of testing allows you to implement code changes without affecting the global target audience.
Mykhailo Poliarush
CEO Testomat.io

How to Determine When Canary Testing Makes Sense?

So, in the previous section, we were able to confirm that staged rollout can bring a lot of benefits to the QA team. However, is it always reasonable to launch such tests? Let’s break down how to determine if implementing canary testing will be justified👇

Define the nature of codebase changes. Such tests are most suitable for checking software after high-risk changes, the introduction of experimental features, and fixes that may impact system performance.
Assess the potential impact on the end user. Run canary tests when the application is being developed for a wide audience, since limiting a new release to a test group keeps negative feedback from the wider user base to a minimum. These tests are also necessary for digital solutions in industries where the cost of failure can be too high, such as in healthcare or finance.
Analyze the probability of failures. If regressions or bugs occurred in similar rollouts, conduct canary testing to reduce the risk of their recurrence.
Evaluate the team and infrastructure readiness. A phased rollout can be initiated if everything is ready: automation tools are in place, CI/CD pipelines are set up, and there are resources for monitoring, analyzing, and responding to test results.
Consider the project’s development approach. This type of testing aligns with the approach of teams that prefer gradual deployment, meaning the incremental introduction of changes to the codebase.

— Have you confirmed that your project requires an incremental rollout?

👉 Then, you must consider the other side of this testing process — the challenges you may face.

Disadvantages of Incremental Rollout

Along with its many benefits for teams and consumers, this type of testing has some limitations. To optimize QA processes on a digital project, it is important to familiarize yourself with these before the test launch.

→ Impact on Users. Although canary tests only affect a small percentage of real users, they still influence consumer opinions about the product.

→ Susceptibility to Errors Due to Human Factors. Despite having clear criteria for evaluating test results, the analysis process can be quite time-consuming, and manual analysis leaves room for human error.

QA engineers are actively working on solving this issue by implementing various tools to automate canary analysis. One such example is the Kayenta tool from Google and Netflix.

Automated canary analysis is an essential part of the production deployment process at Netflix, and we are excited to release Kayenta. Our partnership with Google on Kayenta has yielded a flexible architecture that helps perform automated canary analysis on a wide range of deployment scenarios.
Greg Burrell, Senior Reliability Engineer at Netflix

→ Limited Capabilities of This Testing Type. This QA process is not suitable, for instance, for testing standalone desktop applications, regardless of the device type. Complications may also arise if the selected users have different versions of the application or different devices.

Canary Release VS Other Deployment Models

Incremental rollout is not the only model used by teams when releasing new versions of digital solutions. Despite having the same overall goal — minimizing risks — they differ in several ways. Let’s review the main ones.

Blue/Green Deployment

This software release management strategy involves two environments. As the name of the model suggests, they are called blue and green. The first is active and serves users, while the second is designed for the new version of the solution.

How the Blue/Green Model Works:

The active blue environment handles all incoming traffic.
The green environment is where the new version of the software is deployed. This environment does not receive any traffic initially.
In the green environment, testing of the new version takes place. This may include load testing, performance testing, etc.
Once the new version’s stability is confirmed, traffic is switched between environments — from blue to green. This can be done at the DNS server level or through a load balancer.
After the traffic switch, the green environment becomes active, and the blue environment is taken out of service.
In case any issues are detected, traffic can be quickly redirected back to the blue environment. The rollback process is simple and involves minimal downtime.
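
The switch and rollback steps can be sketched as a toy router in Python. This is a simplified model under stated assumptions: the BlueGreenRouter class and the version labels v1 and v2 are invented for illustration, and in a real setup the switch would happen at the DNS or load balancer level, as noted in the steps.

```python
from typing import Optional


class BlueGreenRouter:
    """Toy model of two environments with one active at a time."""

    def __init__(self) -> None:
        # Blue starts as the active environment running the old version.
        self.versions: dict[str, Optional[str]] = {"blue": "v1", "green": None}
        self.active = "blue"

    @property
    def idle(self) -> str:
        return "green" if self.active == "blue" else "blue"

    def deploy_to_idle(self, version: str) -> None:
        # The new version goes to the environment that receives no traffic.
        self.versions[self.idle] = version

    def switch(self) -> None:
        # Move all traffic to the other environment in one step.
        if self.versions[self.idle] is None:
            raise RuntimeError("idle environment has no version deployed")
        self.active = self.idle

    def serve(self) -> str:
        return f"{self.active}:{self.versions[self.active]}"


router = BlueGreenRouter()
router.deploy_to_idle("v2")   # deploy and test in green
router.switch()               # traffic goes from blue to green
print(router.serve())

router.switch()               # issue detected: switch back to blue
print(router.serve())
```

In this sketch a rollback is simply a second call to switch, which reflects why reverting is quick in this model: the previous environment is still running.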

Progressive, Rolling Deployment

This is another approach to deployment that helps minimize the risk of errors and downtime. It involves gradually deploying new versions of the product in the production environment by replacing components of the old version with those of the new one.

How the Rolling Model Works

At the beginning of a rolling deployment, all system components operate under the old version of the application.
The new version of the digital solution is deployed on a small subset of servers or containers in the production environment.
Traffic is distributed between all components, meaning that users are served by servers running both the old and new versions.
The number of components involved in deploying the new version gradually increases. Initially, this may be no more than 10%, but over time it can reach 50% or more. Each stage is accompanied by careful monitoring of system metrics and user feedback.
If errors occur at any point, a rollback to the previous version can be performed to fix the defect. After that, deployment can resume.
Once all components have been updated, the deployment is considered complete.
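
The steps above can be sketched as a short Python function that updates servers in batches and restores the previous state if a health check fails. The function name rolling_update, the server names, and the health check are assumptions for the example; orchestration platforms implement this logic with many more safeguards.

```python
from typing import Callable


def rolling_update(
    servers: dict[str, str],
    new_version: str,
    batch_size: int,
    healthy: Callable[[dict[str, str]], bool],
) -> bool:
    """Update servers batch by batch; restore all of them on failure.

    servers maps a server name to the version it runs and is changed in place.
    Returns True if every server ends up on new_version.
    """
    previous = dict(servers)  # snapshot used for rollback
    names = list(servers)
    for start in range(0, len(names), batch_size):
        for name in names[start:start + batch_size]:
            servers[name] = new_version
        # Check the system after each stage before continuing.
        if not healthy(servers):
            servers.clear()
            servers.update(previous)
            return False
    return True


# Demo with a made-up fleet of ten servers and an always-passing health check.
fleet = {f"srv-{i}": "v1" for i in range(10)}
ok = rolling_update(fleet, "v2", batch_size=2, healthy=lambda s: True)
print("completed" if ok else "rolled back", sorted(set(fleet.values())))
```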

A/B Testing Deployment

This strategy, alongside canary testing, is used to test different versions of the application with real users. The goal of A/B testing is not only to evaluate the stability of the new release but also to compare different versions of the application to determine which one performs better. A/B testing is more of a marketing tool that indicates product readiness.

How the A/B Testing Model Works

The team creates two or more versions of the app. For example, these could be different variations of a feature, design, interface, etc. Each of these versions is given a name, such as A and B, or A, B, and C.
When using the app, users are split between versions according to the approved strategy. For example, 50% of users will use version A, and another 50% will use version B.
During user interaction, the team tracks certain metrics, such as conversion rate, number of clicks, retention, or audience engagement.
After gathering sufficient data, the team needs to analyze which version works better.
If the result is clear, the winning version is deployed for all users. If the winner is not obvious, A/B testing is adjusted and repeated.
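
For the analysis step, one common way to decide whether a winner is clear is a two-proportion z-test on conversion rates. The Python sketch below is a simplified illustration, not the only valid method: the visitor and conversion counts are made up, and the threshold of 1.96 corresponds to the usual 5% two-sided significance level.

```python
import math
from typing import Optional


def z_score(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-proportion z statistic; positive means B converts better than A."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        return 0.0  # no variation at all, so no evidence either way
    return (p_b - p_a) / std_err


def pick_winner(
    conv_a: int, n_a: int, conv_b: int, n_b: int, threshold: float = 1.96
) -> Optional[str]:
    """Return "A", "B", or None when the winner is not obvious."""
    z = z_score(conv_a, n_a, conv_b, n_b)
    if z > threshold:
        return "B"
    if z < -threshold:
        return "A"
    return None


# Made-up results: 5,000 visitors per version.
print(pick_winner(200, 5000, 260, 5000))  # clear difference in conversion rate
print(pick_winner(200, 5000, 205, 5000))  # difference too small to call
```

When the function returns None, the sketch matches the last step of the list: the test is adjusted and repeated, for example with more users or a longer run.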

The common distinguishing feature of these three deployment models is that they require extensive IT infrastructure to deploy both the new and old versions of the digital product. This is not necessary with a canary test release.

Basic Deployment Strategy

This is the simplest approach to deployment. It is most often used for simple apps, where the main goal is a new release with minimal operational costs. The strategy offers less complex rollback mechanisms compared to more advanced strategies.

The Basic Deployment Strategy works by deploying the new version of the software in the production environment. It replaces the old version all at once, meaning all users get access to the updated version simultaneously.

User Acceptance Testing (UAT)

This is the final stage of the Software Release Life Cycle (SRLC). It involves the verification of the final version of the application by end users and stakeholders. If the digital product works as expected in real-world scenarios, it is considered ready for deployment in the production environment.

Principle of Conducting UAT

Testing is carried out by end users and stakeholders. During the testing, the product’s compliance with functional and business requirements is checked.
Real-world scenarios and a realistic testing environment, close to the production environment, are used for testing.
Testing efforts should not depend on QA or development teams. This helps ensure the most unbiased evaluation.
The focus of the review is on the entire system workflow, not its individual components. This ensures the seamless operation of the entire application.
After the testing process is complete, the testing and development teams are promptly notified about any found issues for immediate resolution.

For clarity, we present a comparative table of different deployment models:

	Canary Testing	A/B Testing	User Acceptance Testing (UAT)	Blue-Green Deployment	Rolling Deployment
Goal	Reducing risk when introducing new features	Determining the product version that works better	Checking the app’s compliance with requirements before release	Smooth transition from the old version of a digital solution to the new one	Gradual replacement of one product version with another
Focus	Stability and performance of the app during gradual deployment	User preferences	Functionality and usability	No downtime or data loss during deployment	Stability and performance of the system during deployment
Environment	Production environment	Production environment	Pre-production or UAT environment	Production environment	Production environment
User Involvement	Indirect – users are unaware of testing	Direct – users actively interact with test versions	Direct – users test functionality	Indirect – users are unaware of testing	Indirect – users are unaware of testing
Rollback Complexity	Quick rollback if errors are detected	Rollback depends on which version of the software is saved	Feedback is gathered – immediate rollback is not possible	Can quickly revert to the previous version	Gradual stoppage of deployment and rollback of changes is allowed

The Role of Agile Incremental Development & CD

Canary deployment is closely related to Agile development methodologies and continuous delivery (CD). Here are the key intersections of these processes:

Iterative Development. Agile teams emphasize making gradual changes to the product, and canary testing fully supports this idea. It allows deploying small components of the application to ensure that each one works as intended.
The Importance of Feedback. In Agile methodology, quick feedback from end users and stakeholders is crucial. Canary tests provide real-time feedback.
Automation of the Deployment Process. CD focuses on automating the deployment process. This ensures quick and reliable delivery of new features. Canary testing easily integrates into the CI/CD pipeline.
Frequent Deployments. Continuous delivery involves very frequent deployments. A staged rollout helps ensure that they occur without significant failures and with minimal impact on users.
Collaboration Between Teams. Canary testing aligns with this Agile principle, fostering close interaction between teams on the project.

Considering all of the above, it is clear that incremental rollout aligns with the goals and methods of Agile development. In fact, this testing approach actively supports the practical application of its core principles.

Automation of Canary Deployment to Optimize the Process

The deployment model under consideration involves real users in canary testing. However, to optimize the process, it is advisable to automate many of its aspects using specialized tools.

Here are some of them:

Kubernetes. This platform allows for automating the gradual rollout of updates. Its capabilities include configuring deployment policies, as well as setting up replica sets.
Spinnaker. A continuous delivery service that easily integrates with various cloud providers and deployment systems.
AWS CodeDeploy. A tool that can be configured for incremental rollouts and allows automation of the deployment process.
Service Meshes (Istio, Linkerd, etc.). These tools help distribute traffic between different versions of the software product.

Automating canary releases makes them consistent, fast, and safe. It minimizes downtime, reduces human error, and provides continuous feedback.

Final Thoughts on Canary Testing

Canary testing is used to minimize the risks that arise when deploying a new version of the software to all users simultaneously. It is highly effective and plays an important role in the software development life cycle.

Do you have any questions? Contact the experts at testomat.io. We will be happy to answer each of them 😀
