# Revitalizing a 25-Year-Old Project with DORA

Imagine inheriting a project burdened with technical debt and a team exhausted by firefighting. This was my reality in 2020 when I took over a 25-year-old project labeled as "stuck" by top management. Overwhelmed by years of mergers, losses of domain experts, and outdated tools, even basic tasks had become a challenge.

However, my team and I managed to turn things around, and adopting DORA metrics was pivotal in this journey.

In this article, I’ll share how these metrics helped us detect bottlenecks, redesign our tools, and streamline processes, transforming a project once thought unsalvageable into a well-oiled machine.

# What Are DORA Metrics?

Before diving into the details of our story, let's quickly review DORA metrics and why they matter.

The [DevOps Research and Assessment (DORA)](https://dora.dev/research/) is a multi-year research that examines thousands of companies to explore the connection between software development practices and business outcomes.

The book *Accelerate: The Science of Lean Software and DevOps* explains this research in-depth and is a must-read for anyone looking to improve software development efficiency.

DORA's research identified a set of [engineering, process, and cultural capabilities](https://dora.dev/devops-capabilities/) closely tied to company performance. Also, it introduced four key metrics to measure software delivery performance, which fall into two categories:

**Throughput Metrics:**

* **Deployment Frequency**: How often the team deploys code to production.
    
* **Lead Time for Changes**: The time it takes for a commit to reach production.
    

**Stability Metrics:**

* **Change Failure Rate**: The percentage of deployments that result in a failure.
    
* **Time to Restore Service**: How long it takes to recover from a failure.
    

According to the DORA research, these metrics reveal underlying issues like poor design, excessive manual work, or bureaucratic obstacles. They provide insights into the overall efficiency of the development process.

### 💡A vital note on metrics

The idea of measuring and improving development efficiency isn’t new.

However, traditional metrics like lines of code or test coverage have often become the subject of jokes rather than true success indicators. For example, linking compensation to lines of code leads to bloated, inefficient software. Similarly, setting test coverage as a target can result in excess tests without meaningful assertions.

DORA metrics are a breath of fresh air because they focus on team performance rather than individual output. They balance speed and stability, helping us avoid the common trap of over-prioritizing one at the expense of the other. Furthermore, DORA metrics show real work, making them more valuable than velocity-based metrics like story points or the number of tickets, which can be "improved" without changing the actual work.

That said, like any metrics, they can be misused, leading to unintended and harmful outcomes. I highly recommend reading the *Accelerate* book and watching Bryan Finster's talk "[How to Misuse DORA DevOps Metrics](https://www.youtube.com/watch?v=0vi7XW15UIg)" before introducing any efficiency metrics to your team.

# **Assessing the State of Our Project**

When we first collected DORA metrics from our CI system and visualized them in Grafana, the results were alarming:

[![DORA Metrics for our project for the first month of collecting them](https://cdn.hashnode.com/res/hashnode/image/upload/v1719824807730/a2fa2d27-a584-43aa-be34-8e3bf722e9fa.png align="center")](https://cdn.hashnode.com/res/hashnode/image/upload/v1719824807730/a2fa2d27-a584-43aa-be34-8e3bf722e9fa.png)

<center><small>DORA Metrics for our project for the first month of collecting them</small></center>

* **Deployment Frequency:** every two days
    
* **Lead Time for Changes:** 10 days
    
* **Change Failure Rate:** 25%
    
* **Time to Restore Service:** 29 hours
    

While deployment frequency had already improved from the monthly "big-bang" releases [when we were optimizing our team processes](https://ordinarytech.blog/empower-an-overwhelmed-team), the other metrics painted a grim picture. A **25% failure rate** and a **29-hour recovery time** meant that one in four releases would break the system and take a full day and night to fix.

Imagine being a developer in this situation. You want to make a slight improvement, but you know it will take **ten days** for the change to reach production, and there’s a **1 in 4 chance** that the release will fail. If it does, you'll spend an entire day and night fixing it, all while juggling your current work. Would you take the risk of introducing additional changes to the code in such conditions? Probably not. Even without knowing the exact numbers, experience tells you it’s better not to.

Even worse, this fear of change compounds over time. Flaws pile up, the codebase becomes harder to understand, and the fear of breaking things escalates. This is a [**vicious cycle**](https://en.wikipedia.org/wiki/Vicious_circle).

But just as vicious cycles exist, so do virtuous ones. The [more often we make changes](https://martinfowler.com/bliki/FrequencyReducesDifficulty.html), the easier they become. This gradually reduces our fear of change and increases our confidence and ability to improve in smaller and smaller steps with lower and lower risk.

Our 25-year-old project had its share of architectural and code quality issues, but DORA metrics revealed another big problem: our delivery pipelines were causing significant overhead and risk with every change. We decided to focus on optimizing the delivery pipeline first, making it easier to clean up the project later.

So, we focused on improving the delivery pipeline and making it easier to troubleshoot deployment failures.

# **Uncovering the Major Delivery Bottleneck**

Our project was structured as a monolith with 13 services, which inherited a delivery pipeline from the original monolith years ago.

Here is how the code went through git branches and CI pipelines:

[![Our delivery pipeline before any changes are applied](https://cdn.hashnode.com/res/hashnode/image/upload/v1719067957410/7f2a900a-9228-4cde-9618-e44c91de70d2.png align="center")](https://cdn.hashnode.com/res/hashnode/image/upload/v1719067957410/7f2a900a-9228-4cde-9618-e44c91de70d2.png)

<center><small>Our delivery pipeline before any changes are applied</small></center>

One red flag in our pipeline was the repetition of identical steps. Moreover, our end-to-end (e2e) tests that we ran repeatedly were a known bottleneck.

These tests were comprehensive, running 3,500 tests across four production-like clusters and verifying all the critical business cases. However, they had serious drawbacks:

* **Limited hardware**: Only one e2e-test pipeline could run at a time.
    
* **Unstable execution times**: Tests could take anywhere from 30 to 90 minutes.
    
* **Flakiness**: Failure rates averaged 25%, spiking to 50% during bad periods.
    
* **Debugging difficulties**: There was no way to run or debug tests locally, and according to the metrics we gathered, fixing a flaky test could take up to five days.
    

Our e2e-tests were a constant source of frustration. Developers frequently said, "I finished my changes yesterday, but I’m still waiting for the e2e-tests to pass."

To see how such tests can turn even simple changes into arduous toil, let’s imagine that we need to quickly fix the infamous [Log4j critical vulnerability](https://nvd.nist.gov/vuln/detail/CVE-2021-44228) across our 14 services. Our steps will be the following:

1. **Create 14 pull requests and get 14 “green” e2e-test runs**. The up to 50% failure rate of tests combined with 14 services meant **up to 28 test runs.** Considering the ability to make only one test run at a time, this alone took up to **four workdays**.
    
2. **Repeat the same process for the** `develop` **and** `main` **branches**: this added another **eight days** of testing and waiting.
    
3. **Release all 14 services**: this took another **half day**.
    

Finally, it takes a grueling **12 workdays** to update a dependency and fix a critical vulnerability! Something as simple became an overwhelming ordeal.

# **Streamlining Delivery**

So, we identified two root problems to address: the instability and complexity of our delivery pipeline and the challenges with our e2e tests.

For the delivery pipeline, we removed all the redundant steps, such as creating dedicated release branches, and automated the rest, like preparing release notes and notifying stakeholders.

Our commitment to pipeline optimization has gone so far that we have built a Slack bot integrated with the Jenkins pipeline to manage releases (I’ve detailed this [in a separate article](https://ordinarytech.blog/integrate-jenkins-and-slack)). This bot automates everything from changelog collection to health checks and rollbacks. Additionally, it allowed us to deploy the entire system at once when needed.

[![Release initiation with Slack bot](https://cdn.hashnode.com/res/hashnode/image/upload/v1720599682955/153c4afc-ba67-483b-87a8-b62b16047df5.png align="center")](https://cdn.hashnode.com/res/hashnode/image/upload/v1720599682955/153c4afc-ba67-483b-87a8-b62b16047df5.png)

<center><small>Release initiation with Slack bot</small></center>

Regarding the e2e-tests, we did the following:

* **Reducing unnecessary e2e-tests demand**: Instead of running e2e-tests on each service individually, we implemented a pipeline that runs tests and generates release artifacts for all 14 services at once. In cases like updating dependency across all services it reduces test demand by **up to** **14 times**. Additionally, we eliminated the `develop` branch that was practically redundant to us. That helped us avoid running e2e-tests twice on identical code and reduced unnecessary merges.
    
* **Making tests optional**: initially, e2e-tests were running automatically on each branch. We decided to run e2e-tests for feature branches only if the developer explicitly specified that. After three months of monitoring, we confirmed that the stability of the `main` branch was unaffected by this change. Unit tests, which we began prioritizing, proved sufficient for verifying changes before merging.
    
* **Optimizing test execution**: We stabilized test execution time from 30-90 minutes to the stable **25 minutes.**
    

Let’s look at how we would handle the same Log4j vulnerability after applying all these changes:

1. **Create 14 pull requests and get 14 successful builds**: Without the need for e2e-tests on each branch, it takes about **1 hour in total**.
    
2. **Merge the 14 pull requests into the** `main` **branch and get one “green” e2e-tests**: While we've optimized the test execution time, we still face occasional stability issues. As a result, this step may take up to **two e2e-test runs,** resulting in **30-60 minutes**.
    
3. **Deploying all 14 services**: now, with the Slack bot and the ability to deploy all services in a row, this step takes **30 minutes**.
    

Thus, what once took **12 long workdays** now only takes **a couple of hours**. Isn’t that great?

We implemented all described changes throughout December 2022-January 2023. The graphs with 4 DORA-metrics below show clearly that starting from February 2023, the situation has changed dramatically:

[![Deployment frequency. December 2022-December 2023](https://cdn.hashnode.com/res/hashnode/image/upload/v1720600682253/35a36c6c-abc1-406f-bc84-9c978c33d69c.png align="center")](https://cdn.hashnode.com/res/hashnode/image/upload/v1720600682253/35a36c6c-abc1-406f-bc84-9c978c33d69c.png)

<center><small>Deployment frequency by months</small></center>

[![Commit delivery lead time. December 2022-December 2023](https://cdn.hashnode.com/res/hashnode/image/upload/v1720600770145/72312470-38e0-434b-996b-2708bf7a68b4.png align="center")](https://cdn.hashnode.com/res/hashnode/image/upload/v1720600770145/72312470-38e0-434b-996b-2708bf7a68b4.png)

<center><small>Commit delivery lead time by months</small></center>

[![Deployment failure rate. December 2022-December 2023](https://cdn.hashnode.com/res/hashnode/image/upload/v1720600805303/6432c8c7-ea52-4971-9e0d-3b6d1b6d4cef.png align="center")](https://cdn.hashnode.com/res/hashnode/image/upload/v1720600805303/6432c8c7-ea52-4971-9e0d-3b6d1b6d4cef.png)

<center><small>Deployment failure rate by months</small></center>

[![Mean time to recover. December 2022-December 2023](https://cdn.hashnode.com/res/hashnode/image/upload/v1720600849703/850f2d38-55e4-41da-ab8b-52e9c118752b.png align="center")](https://cdn.hashnode.com/res/hashnode/image/upload/v1720600849703/850f2d38-55e4-41da-ab8b-52e9c118752b.png)

<center><small>Mean time to recover by months</small></center>

These changes opened the door to continuous improvement and we aimed to make our pipeline even more efficient. Here's what we did:

* **Implemented health checks and alerts**: We added additional checks to detect issues immediately after releases. We scrutinized every deployment failure over the next three months and steadily made our process more robust.
    
* **Enhanced e2e-test debugging**: we invested in the e2e-tests debugging experience and reduced the average time to fix flaky tests **from** **five days to one day**.
    
* **Optimized e2e-test triggers**: We discovered that most of our e2e-test runs generated release artifacts for the monolith. Even when a push to a microservice repository triggered e2e-tests, another push to the monolith often existed, triggering the tests again. So, we limited the e2e-test triggers to just two scenarios: pushes to the **monolith’s** `main` **branch** and **twice-daily scheduled runs**. This simple adjustment additionally reduced demand for e2e-tests without compromising our throughput or overall efficiency.
    

To tell the full story, these improvements significantly boosted our throughput, but this increased the frequency of changes and caused e2e-tests to remain a bottleneck. Because of hardware limitations, we could only run one e2e-test at a time, and the average idle time before starting e2e-tests was about 30 minutes.

So, we reused the hardware set to be retired and built a second test environment, which reduced the wait time for running e2e-tests from 30 minutes to 0 minutes. This persuaded leadership to acquire more hardware, enabling us to run two e2e-test pipelines simultaneously and remove the bottleneck completely.

Another point worth mentioning is that we haven’t been able to eliminate flaky tests — they still crop up occasionally. Our long-term strategy is to reduce reliance on e2e-tests by gradually replacing them with more efficient integration and unit tests.

# The outcome

To illustrate the transformation, here’s a comparison between our initial metrics and where we stand now:

* **Deployment Frequency**: Once every two days (before) → multiple times a day (now)
    
* **Lead Time for Changes**: 10 days (before) → 14 hours (now)
    
* **Change Failure Rate**: 25% (before) → 0.7% (now)
    
* **Time to Restore Service**: 29 hours (before) → 26 minutes (now)
    

[![DORA-metrics for the first month of measurements (April 2024)](https://cdn.hashnode.com/res/hashnode/image/upload/v1719824807730/a2fa2d27-a584-43aa-be34-8e3bf722e9fa.png align="center")](https://cdn.hashnode.com/res/hashnode/image/upload/v1719824807730/a2fa2d27-a584-43aa-be34-8e3bf722e9fa.png)

<center><small>DORA-metrics before the changes applied</small></center>

[![DORA-metrics for March-June 2024](https://cdn.hashnode.com/res/hashnode/image/upload/v1719824356388/3f16c4c4-1da0-48d0-9733-98d31854b7e5.png align="center")](https://cdn.hashnode.com/res/hashnode/image/upload/v1719824356388/3f16c4c4-1da0-48d0-9733-98d31854b7e5.png)

<center><small>DORA-metrics ath the time of writing</small></center>

Beyond the improvement in DORA metrics, I’d like to share my personal observations on how these changes impacted the team and the project as a whole:

* **Team morale** has drastically improved. Without the constant overhead and firefighting, we can make steady improvements while meeting business demands. We still have technical debt from the past 25 years to tackle, but now we have the confidence and momentum to resolve these issues over time.
    
* **Releases no longer require all-hand coordination**. Previously, we had to schedule releases for specific times, and the whole team had to gather together. The necessity to release in the evening, on Friday, or before the holiday was highly stressful. Now, any team member can initiate a release at any convenient time.
    
* **Development throughput** also has increased:
    
    * The time from the first commit on a feature branch to merging into the main branch has dropped from 17 days to less than a day — a clear indicator of improved code maintainability and tool quality.
        
    * Task completion has gone from an average of 15 days to 5 days.
        
* **Emergency fixes** have all but disappeared. Previously, we handled about 6 "expedited" monthly tasks to fix critical issues. These disruptions are so rare and swiftly managed now that we no longer log them in Jira.
    
* We’ve achieved things that seemed impossible before. In just five months, we [resolved 97% of the security debt](https://ordinarytech.blog/caring-for-your-team-user-experience) accumulated over the previous 25 years. During that period, we deployed over **400 releases** without disturbing a single user. This level of throughput and stability was unimaginable before we applied DORA.
    

# Conclusion

The right approach and tools can revive even the most challenging project. I hope this story inspires you to take on technical debt in your projects confidently.

In addition to the *Accelerate* book, here are some resources I found invaluable:

* Brian Finster's talk: [*How to Misuse DORA DevOps Metrics*](https://www.youtube.com/watch?v=0vi7XW15UIg)
    
* *Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation* book by Jez Humble and David Farley
    
* *Continuous Delivery Pipelines: How to Build Better Software Faster* by David Farley
    

Finally, if you're interested, [here’s the GitHub repository](https://github.com/plesk/engineering-efficiency-assets) containing the database structure and Grafana dashboard we implemented to track and visualize our DORA metrics.

---

Life is so beautiful,

Fedor
