4 Questions for Evaluating Experiments

When you try something new, when do you expect to see results? How do you evaluate an experiment your team chose in a retrospective, so you know whether your hypothesis was correct?

This week I read a post that described a situation I see too often.

The poster described a retrospective in which the team evaluated their two-week experiment with Test Driven Development (TDD, a software development approach). Two senior members of the team declared that it hadn’t reduced released defects by much, but it had slowed them down.

For the sake of this little story, it doesn’t matter whether you have an opinion on TDD, how these folks were thinking about TDD, or even whether you know what TDD is.

What struck me was that these senior members were ready to drop the practice after two weeks. That’s not enough time to learn to do TDD, let alone see substantial results. They reached a premature conclusion and missed out on potential benefits.

You can avoid that trap by asking these four questions when you evaluate your team’s experiments.

1. Have we allowed enough time to learn?

If your team is trying a new practice or approach, consider how long it takes to develop proficiency. I have several friends who are experts in software development. They invested months of practice before TDD felt natural and comfortable. Learning almost always entails a period of reduced productivity before the benefits appear.

This is normal, natural, entirely expected.

For example, the second time I make a dish always goes faster than the first. By the time I’ve made it a half dozen times, I can make it without looking at the recipe, while talking on the phone, and it takes far less time than my first attempt. We don’t expect a kid to ride a bike with skill and speed the first few times. We mostly know this, at least at home.

But somehow, we forget this at work.

2. Is it reasonable to expect desired outcomes yet?

I do advocate tiny experiments with fast feedback. AND you need to consider the time scale in which you can reasonably expect to see results. I’ve heard people say it can take months to fully realize the benefits of TDD, in terms of a greatly improved code base.

Now, that is a very long feedback loop. Long feedback loops mean a big investment of time, money, or both before you realize the outcome you hoped for. It’s not unusual to have a delay between action and results.

This can happen for a number of reasons. In some cases, the effect isn’t visible until another event triggers it. Sometimes small effects accumulate and become visible over time. For example, most people don’t expect immediate results when they start a new gym routine. Changes at the cellular level take time to manifest as greater strength and cardiovascular performance. But it is possible to measure those effects. Things are changing long before the desired outcomes are visible.

3. What do our steering signals tell us?

Especially when realizing benefits takes weeks or months, you need steering signals. What might you observe or detect that indicates you’re headed in a better direction? The signal might be small or subtle: different questions, or slightly more or less time spent on an activity. It might be internal: are you thinking or feeling differently?

Outcome measures matter, but you need steering signals to help you get there.

4. Is it worth investing more time to see results?

Some decisions are obvious. You know within days that your experiment isn’t headed in the right direction. All your steering signals are shouting “wrong way.” Then it’s time to cut things short.

The team in the post didn’t invest enough time to develop proficiency, let alone evaluate effectiveness.

But some cases aren’t so clear. Then the decision becomes “Should we invest more time?” That might involve reassessing whether your steering signals provide enough useful information. You might look at what you’ve learned so far, or compare your subjective experience of the practice now with your experience when you started.

The team described in the post mentioned above jumped to a conclusion. From the post, it’s pretty clear they didn’t include the learning curve in their assessment. They considered one aspect of improvement (fewer defects) and expected immediate results. It also sounds like they didn’t consider the whole team’s experience.

I have a hunch they would have come to a different conclusion had they asked these questions when evaluating their experiment.