NELKINDA SOFTWARE CRAFT

Transforming Delivery: How Small Practices Drove Big Engineering Wins

A quick case study about an XP transformation

Author:
Siddhesh Nikude, CEO / COO at Nelkinda Software Craft Pvt Ltd
First Published:
by Nelkinda Software Craft Private Limited
Last Modified:
by Siddhesh Nikude
Approximate reading time:

1 Introduction

This case study highlights the transformation of a remote engineering team that moved from shallow practices and fragmented workflows to a culture rooted in Test-Driven Development (TDD), automation, and continuous improvement.

At the start, the team faced several challenges. Test coverage numbers were high but misleading—most tests lacked meaningful assertions, and infrastructure behavior was often mocked instead of validated. Automation tests weren’t integrated into the CI pipeline and lacked proper failure signals. Silos between frontend, backend, and QA created inefficiencies and frequent ticket reopens. Pairing was inconsistent, and stand-ups had become status rituals rather than alignment tools.

Despite these issues, the team had several strengths that laid the foundation for change. Developers were open to working across silos and already had a solid understanding of DevOps. They had recently transitioned from Java to a modern stack with React and Node.js, and showed commendable commitment to learning while delivering under pressure. Most importantly, the organization supported the shift fully—offering dedicated Slack Time, trusting the team's process, and accepting a short-term dip in productivity for long-term gains.

This case study documents how these conditions enabled a disciplined shift: replacing fake coverage with meaningful tests, integrating automation into delivery pipelines, embracing cross-functional collaboration, and establishing quality as a shared, everyday responsibility.

2 Empowering Stand-ups

Figure 2-1: Empowering Stand-ups

Converted stand-ups from status updates into proactive learning sessions:

A Daily Stand-up meeting is a lot more than mere status updates. In fact, given that the Jira board reflects the team's current status, a well-maintained board drastically reduces the need for status updates. It already provides a clear view of who is working on what, along with the current status of each task. We moved beyond this traditional format by transforming stand-ups into proactive learning and improvement sessions.

Instead of just sharing updates, we treated problems from the previous day as learning opportunities. Each time an issue surfaced—be it a defect, a test failure, or a misunderstanding—we asked ourselves what we could change so that the same issue would not happen again.

Failing is fine—but repeating the same failure is not. This small mindset shift made our stand-ups much more valuable.

Identified patterns causing reopened tickets:

Reopened tickets are expensive. A developer completes a ticket, QA reviews it, and it gets reopened due to unmet expectations. At this point, the developer might already be deep into a new task, and now has to switch context. QA also needs to reprioritize and revalidate. It's costly for both sides.

Instead of just fixing them, we studied them. Reopen discussions were not about blame—they were about pattern discovery. Here's what we found:

Interestingly, we saw more reopens than bugs. This meant the internal feedback loop was working—but needed better structure.

Improved alignment on acceptance criteria and QA-dev collaboration:

When needed, we conducted focused "Three Amigos" sessions involving a developer, QA, and Product Owner. These short discussions helped establish a shared understanding of the story before development began. The acceptance criteria were refined during the call—made more specific, backed with concrete test data, and supplemented with relevant edge cases and alternate flows.

This early collaboration significantly reduced the chances of tickets being reopened. Instead of validating after the fact, the team started aligning on expectations up front.

Traditionally, a QA's effectiveness is measured by the number of bugs found. But in this organization, we shifted QA from a "bug finder" role to a "bug preventer" responsibility. QAs worked proactively with developers to prevent defects before they occurred, since preventing a defect is far cheaper than fixing it later. The Product Owner was often available in the same call, providing feedback on the validity of certain scenarios and helping the team prioritize what mattered most.

This collaborative model produced two key outcomes:

It's worth noting that this wasn’t intended as a long-term dependency. The goal was to build the developers’ ability to think in test cases. Once they could independently identify most scenarios, the need for detailed Three Amigos sessions diminished naturally.

And what about the QAs? As developers matured in their testing mindset, we helped QAs transition into development roles by pairing them with engineers through pair and mob programming. This shift maintained their impact while growing their technical skills—and helped us build a more cross-functional, future-ready team.

Implemented ticket demos and leveraged AI for test scenarios:

To catch issues early, we introduced a small but powerful habit: developers demoed their changes to QAs before pushing to the pipeline. This informal preview helped verify alignment with designs and scenarios discussed earlier. Any mismatch was corrected on the spot—saving later churn.

In addition, developers had access to tools like ChatGPT to assist with generating edge cases, identifying test gaps, or validating assumptions—essentially using AI as a virtual pairing partner to sharpen quality.

Shifted stand-ups to a ticket-centric focus, promoting task completion and board hygiene:

Running the stand-up around tickets rather than people improved board hygiene significantly and reinforced the idea of team ownership.

A simple change, walking the board from right to left, i.e., starting with what is closest to completion and moving towards what has yet to be started, reinforced the idea of "stop starting, start finishing": finish whatever is closest to completion before picking up a new item.

We also paid attention to board hygiene. Tickets had clear owners, were kept up to date, and blockers were made explicit. The board became a trustworthy reflection of team progress, not just a ritualistic artifact.

You can also refer to a great tweet on running stand-ups this way.

3 Busting 100% Test Coverage

Figure 3-1: Busting 100% Test Coverage

Highlighted shortcomings of mandated coverage metrics:

Code coverage can give developers confidence, but only when it is a natural byproduct of Test-Driven Development (TDD). In our case, the team reported over 90% test coverage—but it came as a top-down mandate from management, not from engineering discipline. What we discovered was alarming:

As a result, even when entire function bodies were commented out, these tests still passed. The team realized quickly that such test coverage offers no real safety net. It was clear: mandating code coverage metrics led to superficial tests. True coverage—the kind that protects you—is best achieved as a side effect of good TDD practices.
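A hypothetical, simplified example of the pattern (assuming Jest; the real code differed):

    // The test below executes code but asserts nothing, so commenting out
    // the body of applyDiscount would not make it fail.
    function applyDiscount(order: { total: number }): void {
      order.total = order.total * 0.9;
    }

    test('applies discount', () => {
      const order = { total: 1000 };
      applyDiscount(order);
      // A meaningful test would now assert: expect(order.total).toBe(900);
    });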

Transitioned superficial unit tests to meaningful integration tests:

The codebase had many repository-layer tests that did nothing more than mock the response from the repository. This meant even if the entire logic was removed from the production method, the tests would still pass. These were not tests—they were illusions of safety.

We replaced these with proper integration tests that interacted with an in-memory database. Though slower than mocked unit tests, these integration tests offered real validation and were moved to a separate test suite. We didn’t chase high code coverage. Instead, we aimed for honest, valuable tests. Every test had to fail for the right reason before it passed for the right reason.
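As a simplified sketch of that shift, assuming Jest as the test runner and an in-memory SQLite database via better-sqlite3 (the project's actual database and names may differ):

    import Database from 'better-sqlite3';

    // Hypothetical repository function under test; names are made up for illustration.
    function findUserByEmail(db: InstanceType<typeof Database>, email: string) {
      return db.prepare('SELECT id, email FROM users WHERE email = ?').get(email);
    }

    // Before: the repository call is mocked, so the SQL (the real logic) is never exercised.
    // Gutting findUserByEmail would not make this test fail.
    test('finds user by email (illusion of safety)', () => {
      const repository = { findUserByEmail: jest.fn().mockReturnValue({ id: 1, email: 'a@b.c' }) };
      expect(repository.findUserByEmail('a@b.c')).toEqual({ id: 1, email: 'a@b.c' });
    });

    // After: the same behaviour validated against a real, in-memory database.
    test('finds user by email (integration)', () => {
      const db = new Database(':memory:');
      db.exec('CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)');
      db.prepare('INSERT INTO users (email) VALUES (?)').run('a@b.c');

      expect(findUserByEmail(db, 'a@b.c')).toEqual({ id: 1, email: 'a@b.c' });
    });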

Introduced mutation testing to expose test weaknesses:

We introduced mutation testing—a method of evaluating the quality of your test suite by intentionally introducing changes (mutations) in the code under test and checking if the tests catch them. If a test still passes after the code has been mutated, it is a sign of weak or ineffective coverage.

This approach exposed several blind spots. It helped us move from "tests that run" to "tests that protect." It also served as a learning tool, helping the team understand how to write more meaningful and resilient test cases.
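For a Node.js/TypeScript codebase, StrykerJS is a typical tool for this; a minimal configuration might look like the sketch below (the tool choice, paths, and thresholds are illustrative, not a record of the exact setup):

    // stryker.conf.js — minimal StrykerJS configuration (illustrative; adjust to the project)
    module.exports = {
      mutate: ['src/**/*.ts', '!src/**/*.spec.ts'], // files to mutate
      testRunner: 'jest',                           // assumes the suite runs under Jest
      reporters: ['clear-text', 'progress', 'html'],
      coverageAnalysis: 'perTest',                  // run only the tests that cover each mutant
      thresholds: { high: 80, low: 60, break: 50 }  // fail the run if the mutation score drops below 50
    };

Running npx stryker run then reports which mutants survived, pointing directly at tests that do not actually protect the code they claim to cover.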

Emphasized Test-Driven Development (TDD) as the path to genuine coverage:

We doubled down on Test-Driven Development (TDD) as the primary way to build quality into the system. Writing tests before implementation helped us clearly define expected behavior and catch misunderstandings early. A key aspect of our approach was ensuring that every test initially failed for the right reason—a critical step in the red-green-refactor cycle.

This deliberate practice of making tests fail during the red phase not only strengthened new test cases but also exposed flaws in existing ones. In several cases, tests we expected to fail were passing—revealing false positives. This led to valuable improvements, including:

For existing tests that resembled these patterns, we intentionally broke the code under test to confirm the tests failed for the correct reasons, and then refactored the tests accordingly.
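As a small, hypothetical illustration of a test failing for the right reason during the red phase (assuming Jest; the function and files are made up):

    // discount.ts — hypothetical stub, written only after the failing test below.
    export function calculateDiscount(total: number): number {
      return 0; // deliberately wrong so the test fails for a behavioural reason
    }

    // discount.test.ts
    import { calculateDiscount } from './discount';

    test('orders of 1000 or more get a 10% discount', () => {
      // Red phase: this must fail with "expected 100, received 0",
      // not with a missing import, a compile error, or broken test plumbing.
      expect(calculateDiscount(1000)).toBe(100);
    });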

By reinforcing TDD and improving the surrounding toolchain and practices, we built a stronger foundation of trust in our test suite—and eliminated the false sense of security that comes from high, but shallow, coverage.

4 Automation

Figure 4-1: Manually Automated

Expanded automation beyond limited local tests:

Initially, the QA team had written some automation scripts—but they covered only a small fraction of the workflows, and even those were executed manually on local machines. As a result, every release required half a day of manual validation across multiple environments. Production deployments happened just once every two weeks, creating a large batch size. If an issue occurred in production, it was often difficult to debug or required a full rollback.

We wanted to reduce the batch size and enable a faster feedback loop. So we identified the most business-critical workflows and prioritized automating them first. Our goal was not just automation for the sake of it—but automation that truly enabled agility, confidence, and early defect detection.
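As an illustration of such a workflow test, here is a sketch in TypeScript with selenium-webdriver and Jest; the URL, selectors, and credentials are placeholders, not the project's real ones:

    import { Builder, By, until, WebDriver } from 'selenium-webdriver';

    jest.setTimeout(30_000); // browser startup and navigation need more than the default timeout

    describe('login workflow (business-critical)', () => {
      let driver: WebDriver;

      beforeAll(async () => {
        driver = await new Builder().forBrowser('chrome').build();
      });

      afterAll(async () => {
        await driver.quit();
      });

      it('lands the user on the dashboard after a valid login', async () => {
        await driver.get('https://staging.example.test/login'); // illustrative URL
        await driver.findElement(By.name('email')).sendKeys('qa@example.test');
        await driver.findElement(By.name('password')).sendKeys('secret');
        await driver.findElement(By.css('button[type="submit"]')).click();

        // An explicit expectation, so a broken flow fails the run instead of just logging.
        await driver.wait(until.urlContains('/dashboard'), 10_000);
      });
    });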

Reduced manual validation time drastically by pipeline integration and parallelization:

Once the automation scripts were ready, we integrated them into the Continuous Integration (CI) pipeline and parallelized their execution. Since each test workflow had its own dedicated test data, there were no data conflicts while running in parallel. This brought down the total test execution time from around 45 minutes to just 15 minutes.
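One simple pattern that keeps parallel runs conflict-free is to namespace each workflow's test data; a sketch of the idea (helper and field names are hypothetical):

    import { randomUUID } from 'node:crypto';

    // Each workflow seeds its data under a unique run ID, so spec files
    // can execute in parallel without stepping on each other's records.
    export function workflowTestData(workflow: string) {
      const runId = `${workflow}-${randomUUID()}`;
      return {
        runId,
        userEmail: `${runId}@example.test`,
        orderReference: `order-${runId}`,
      };
    }

    // Usage inside a spec file:
    // const data = workflowTestData('checkout');
    // await createTestUser(data.userEmail); ...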

Rather than waiting for validation results, QAs used those 15 minutes to focus on exploratory testing—the one area where human intuition and curiosity still outperform automation. This shifted QA from a repetitive gatekeeper role to a more strategic, value-adding function.

Increased deployment frequency significantly:

The combination of faster tests and reliable automation gave the team greater confidence in their releases. Deployment frequency jumped from once every two weeks to multiple times per day. Smaller, incremental changes meant less risk, easier debugging, and quicker recovery in case of failure. Automation became the backbone of rapid, safe delivery.

Improved clarity of automated test failure reports:

One often overlooked area of automation is reporting. Initially, the automation scripts used deeply nested if-else statements to log test outcomes. These logs were difficult to scan, and failures had to be manually searched for among layers of generic logs.

We replaced these conditional checks with proper assert statements. This not only made the failure messages more targeted and meaningful, but also cleaned up the test code by removing unnecessary nesting. The resulting automation suite was simpler to read, easier to debug, and far more useful when things went wrong.
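A simplified before-and-after of that change (the values are illustrative stand-ins for what the scripts actually read from the page):

    import { strict as assert } from 'node:assert';

    const expectedTitle = 'Orders';
    const actualTitle = 'Orders';
    const items = [{ id: 1 }];

    // Before: nested conditionals that only logged; the runner still reported "passed".
    if (actualTitle === expectedTitle) {
      console.log('Title ok');
      if (items.length > 0) {
        console.log('Items loaded');
      } else {
        console.log('ERROR: no items loaded');
      }
    } else {
      console.log('ERROR: wrong title');
    }

    // After: flat assertions with targeted messages; any mismatch fails the run loudly.
    assert.equal(actualTitle, expectedTitle, `Expected title "${expectedTitle}" but got "${actualTitle}"`);
    assert.ok(items.length > 0, 'Expected at least one item to be loaded');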

5 Slack Time

Figure 5-1: Slack Time

Dedicated daily improvement sessions post-stand-up:

You’ve probably heard the phrase: "Sharpen the saw before cutting the tree." That is exactly what our daily slack time was about. Immediately after stand-up, we dedicated 30–60 minutes purely to improving the codebase. These sessions were focused, non-negotiable windows of craftsmanship—unless a critical release was scheduled.

Even when releases did interfere, we used the slack time to retro on what went wrong and made targeted improvements to ensure smoother future releases. This daily rhythm created a powerful feedback loop of learning, refining, and evolving the codebase.

Systematic reduction of ESLint issues:

We integrated ESLint into the CI pipeline and discovered ~2000 linting issues across the frontend and backend. To stop things from getting worse, we introduced a simple but effective guard: a max-lint file tracked the current count of warnings, and the build was set to fail if new warnings were introduced beyond that number.

This file was version-controlled, making progress visible and enforceable. The team used slack time to chip away at the warnings, day by day, and eventually brought them down to under 15 across the entire codebase.
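The guard itself can be a small script in the pipeline. Here is a sketch of the idea, assuming the max-lint file holds a single number and the ESLint Node API is available; the project's actual script and paths may differ:

    // check-max-lint.ts — fail the build if lint issues grow beyond the committed budget.
    import { ESLint } from 'eslint';
    import { readFileSync } from 'node:fs';

    async function main() {
      const maxAllowed = Number(readFileSync('max-lint', 'utf8').trim()); // version-controlled budget
      const eslint = new ESLint();
      const results = await eslint.lintFiles(['src/**/*.{ts,tsx,js}']);
      const current = results.reduce((sum, r) => sum + r.errorCount + r.warningCount, 0);

      console.log(`Lint issues: ${current} (budget: ${maxAllowed})`);
      if (current > maxAllowed) {
        console.error('New lint issues introduced. Fix them or pay down the budget.');
        process.exit(1);
      }
    }

    main().catch((error) => {
      console.error(error);
      process.exit(1);
    });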

Continuous backlog of improvement tasks:

To keep momentum, we maintained a curated backlog of small improvement tasks. This gave developers the flexibility to choose what to work on during slack time. Once lint issues were mostly resolved, developers naturally shifted to refactoring difficult areas, writing missing tests, or cleaning up tech debt they had been putting off.

This also gave them a safe space to explore complex refactorings without the usual pressure of delivery deadlines.

Regular merges to production enhancing team confidence:

One of the most impactful practices introduced during Slack Time was the habit of merging improvement changes to production on a daily basis. Initially, some developers were hesitant. They preferred keeping their changes in a long-lived branch until they felt they had "done enough" to justify a merge. But this led to large batch sizes, which increased the risk of conflicts, regressions, and rework.

Sometimes, it takes a painful experience—like a messy merge or a hard-to-debug issue—for developers to understand the value of small, frequent integrations. And that's okay. Letting them fail in a safe, recoverable way created long-term learning that no process documentation ever could.

As our automation test suite grew stronger, so did the team's confidence. Daily merges soon became the norm, and with it came faster feedback, fewer integration issues, and a much stronger sense of ownership over production code.

6 Pair/Mob Programming

Figure 6-1: End-to-End Accountability

Broke functional silos by adopting vertical story slicing:

The team originally worked in silos—backend (Node.js), frontend (React), and QA (TypeScript with Selenium). However, they already shared a basic understanding of DevOps and managed without a dedicated DevOps engineer. This openness to shared responsibilities laid the groundwork for cross-functional collaboration.

Initially, pairing happened only within disciplines: backend developers paired with backend, frontend with frontend. Each story was split horizontally—separating frontend and backend—requiring handoffs and coordination between the two groups. When stories were reopened, both sides often had to jump back in, increasing context-switching and effort.

We aimed to change that by introducing vertical slicing: each story would represent a thin, functional slice that covered just enough backend and frontend to deliver value end-to-end. This reduced handoffs, encouraged shared ownership, and simplified delivery.

Fixed challenges around cross-functional skills:

At first, the team hesitated. Vertical stories felt more complex, especially to those unfamiliar with the other domain's language or tooling. But we pointed out that the current approach—splitting, handing off, re-integrating—wasn’t exactly simple either. Integration and debugging consumed a lot of effort.

We challenged the assumption that learning a new layer of the stack was too difficult. In reality, with pairing and a bit of curiosity, most developers can learn just enough to be productive. The 80/20 rule applies well to programming too—you only need 20% of the knowledge to accomplish 80% of the tasks. And pairing/mobbing can easily facilitate that learning.

To help with slicing, we introduced the SPIDR technique, which helped in breaking down stories into smaller, testable units. Eventually, the team became cross-functional not just across frontend and backend, but QA as well—with developers and testers collaborating on automation scripts using Selenium.

This cross-functionality led to better accountability, faster delivery, and a reduced bus factor.

Implemented effective pair programming and mob programming practices:

Although the team was remote and technically "collaborating," they weren’t truly pairing. Typically, one developer would drive while the other passively observed—an unbalanced dynamic that often led to driver fatigue and reduced effectiveness. As a result, early attempts at pairing faced some resistance.

We introduced structured pair programming with regular role switching, which helped balance cognitive load and engagement. The team started using mob.sh to automate driver-navigator rotations, making the process smoother and more enjoyable. Proper pair programming gradually became a daily habit. As confidence grew, the team also experimented with mob programming—particularly for complex stories or collaborative knowledge-sharing. While mob programming wasn’t used every day yet, the team developed the skill and mindset to switch to it effectively whenever needed.

Optimized remote collaboration via Zoom:

The team initially used Slack Huddles for pairing, but it had limitations. Slack is great for async communication—but not ideal for continuous collaboration. Developers in huddles were often unreachable until they happened to notice a message, and jumping between conversations was cumbersome.

We switched to Zoom for collaboration. The team would join a shared Zoom session with breakout rooms. Pairs would move into their own rooms, and reaching someone became as simple as entering their room and saying hello—just like tapping them on the shoulder in an office. The Product Owner stayed in the main room, available for quick clarifications at any time.

This setup initially felt intense, but within a few days, it started to feel natural. The visual of everyone in breakout rooms, quietly working and occasionally jumping rooms, gave the team a strong sense of shared presence and togetherness—even while working remotely.

7 Miscellaneous Improvements

Informative Workspace: While Jira was effective for managing workflow, it lacked the flexibility needed for brainstorming and planning. To bridge this gap, we introduced an informative workspace—a visual board where work items could be broken into smaller, manageable tasks. These tasks were color-coded based on their status and enriched with snapshots, diagrams, or sketches. This workspace offered immediate context and proved especially useful when switching pairs or transitioning to mob programming, enabling new contributors to onboard quickly without losing context.

Mastering the IDE: Developers often underestimate the power of their IDEs. We ran sessions to help them fully leverage features like automated refactorings, context-aware suggestions, and efficient keyboard shortcuts. This not only improved developer speed but also reduced errors and made code navigation and modification much more seamless.

Smart Use of AI: Blindly copying code from ChatGPT created more problems than it solved. We encouraged strategic usage instead. Developers first used ChatGPT to generate a failing test (Red), then attempted the smallest working implementation (Green), and finally leveraged it for refactoring (Refactor). With prompts based on the Red-Green-Refactor cycle and the Three Laws of TDD, ChatGPT became a helpful assistant—when used with intention and discipline.

Figure 7-1: New Feedback Loop

Feedback Loops: We focused on increasing both the quantity and quality of feedback loops. IDEs became the first and fastest feedback mechanism—giving instant responses as developers typed. ESLint was introduced as a local quality gate, offering quicker insights than SonarQube, which was still integrated late in the CI/CD pipeline. Automation tests were moved into the pipeline, and unit tests and Pair Programming were improved to provide better feedback. Not only did the number of feedback loops increase, but the time it took to receive feedback also dropped: unit tests and automation tests got much faster, and production deployments became far more frequent. ChatGPT, too, proved to be a valuable part of the feedback loop in certain situations.

Code Design: Our code improvements weren’t limited to production code. We focused equally on writing clear, maintainable, and well-structured test code. Well-named fixtures and helper functions made tests easier to write and understand. We embraced "squint tests"—tests that are readable even when slightly blurred, emphasizing structure and purpose. ESLint was used to guide design by highlighting deep nesting, long methods, and high cyclomatic complexity. We started with lenient thresholds and gradually tightened them as the team matured.
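For reference, the design-pressure rules mentioned above are standard ESLint rules; an assumed starting configuration (thresholds illustrative, tightened over time) could look like this:

    // .eslintrc.cjs (excerpt) — design-pressure rules, started lenient and tightened gradually.
    module.exports = {
      rules: {
        'complexity': ['warn', { max: 10 }],   // cyclomatic complexity per function
        'max-depth': ['warn', 3],              // nesting depth
        'max-lines-per-function': ['warn', { max: 40, skipBlankLines: true, skipComments: true }],
        'max-params': ['warn', 4],
      },
    };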

Support from the Organization: None of this would have been possible without strong support from the organization. XP practices like Pair Programming were not only accepted—they were embedded in the career growth path for developers. While some practices needed a push at first, developers were more willing to embrace them knowing these expectations were supported at all levels. This organizational alignment gave the team the psychological safety and motivation needed to step out of their comfort zones and evolve.

8 Conclusion

At the heart of this transformation lies a simple philosophy: teach them how to fish, not just give them a fish. Sustainable improvement doesn’t come from enforcing rigid processes—it comes from enabling teams to see problems clearly and equipping them with feedback loops to guide their own evolution.

There is no silver bullet. Every team is unique and must discover its own path to better practices. Our role was to help this team notice the friction, understand the cost of inaction, and recognize how seemingly small changes—like better tests, tighter feedback loops, and cross-functional collaboration—can lead to lasting results.

It's a lot like the frog-in-boiling-water analogy: if you don't notice the temperature rising, you'll stay stuck until it's too late. This case study is about learning to notice early, respond thoughtfully, and grow together—one improvement at a time.