Archive for the ‘Metrics’ Category.

A Closer Look at Risk Burndown

I like the idea of the risk burndown chart. Burndown is an effective and satisfying visual indicator of progress and it’s relatively easy to calculate to boot. But does looking at a project’s risks through the lens of a burndown chart make sense?

I see several problems with thinking about risk in this way.

Numbers can be Misleading

The first key to effective risk management is to value accuracy over precision. This means that it’s better to be roughly right in your predictions than it is to be precisely wrong. Remember, risk is about assessing your likelihood of project success. It doesn’t matter if you miss your threshold of success by a little or a lot; either way you still fail the project!

Pop quiz. Say there are two risks in your project. There’s a 25% probability that Risk A will become a problem while Risk B only has a 20% probability. For now, assume the impact is the same for both risks. Which risk is a greater threat to the project?

That one’s easy. Risk A is a greater threat because, impacts aside, Risk A is 5 percentage points more likely to turn into a problem. Ok. What if I told you that I made up the probabilities based on my gut feelings so I could easily rank risks? Now which risk is a greater threat to the project?

The real question I’m asking you is this. Are you willing to bet the success of your project on those numbers? Because if my best guess, gut feeling probabilities are off by more than 5 percentage points, the project could be in serious trouble depending on the risks’ impacts.
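To make the point concrete, here is a minimal sketch of how little estimation error it takes to scramble the ranking. It assumes both risks share the same impact, as in the quiz, so comparing probabilities is enough. The function name and the error margins are made up for illustration.

```python
def ranking_is_stable(pct_a, pct_b, error_pts):
    """Return True if Risk A still outranks Risk B when each gut-feeling
    probability (in whole percentage points) may be off by error_pts
    in either direction. Assumes equal impact for both risks."""
    lowest_a = max(pct_a - error_pts, 0)      # A could really be this low
    highest_b = min(pct_b + error_pts, 100)   # B could really be this high
    return lowest_a > highest_b


# Risk A at 25%, Risk B at 20%, as in the pop quiz.
for error in (0, 2, 5):
    print(error, ranking_is_stable(25, 20, error))
```

With perfect estimates the ranking holds. Once each guess can be off by five points, the two ranges overlap and the numbers no longer say which risk is worse.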

I know, I know. That was a trick question. Nobody on your team would make up numbers on one of your software projects. In all fairness, nobody goes out of their way to fabricate false values. Use your logic. If you were any good at guessing the probability of future events occurring, you would not be reading this post right now. You would be a multi-millionaire, off enjoying your gambling winnings from the ponies. Too much precision gives folks too much confidence in the correctness of your assessment when the reality is that probability and impact are based on best guesses and gut feelings. Probability and impact numbers just make it easier to calculate exposure so risks can be ranked automatically. Burndown is a fairly precise metric.

Not all Risks are Created Equal

If you are monitoring project risk with a risk burndown chart, how do you know whether the right risks are being reduced? Let’s take a look at an example.  Which of these sets of risks should be addressed?

Set 1 with a total exposure of 7 days made up of the following risks:

  • Risk A has a probability of 20% and an impact of 15 days for an exposure of 3 days.
  • Risk B has a probability of 25% and an impact of 10 days for an exposure of 2.5 days.
  • Risk C has a probability of 30% and an impact of 5 days for an exposure of 1.5 days.

Or Set 2 with a total exposure of 7 days (6.7 rounded up) made up of the following risk:

  • Risk D has a probability of 95% and an impact of 7 days for an exposure of 6.7 days.

In the first set, I can mitigate 3 risks, each with very low probability of becoming problems. In the second set I mitigate only 1 risk that is almost certainly going to become a problem. Reducing the imminent risk seems to make the most sense but this choice is not reflected in a risk burndown chart. Simply reducing risk over time is not enough. You have to reduce the right risks.
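The arithmetic behind the two sets can be sketched as follows. This is only an illustration using the numbers above; the dictionary layout and helper names are assumptions, and probabilities are kept as whole percentages to sidestep floating point noise.

```python
def exposure(probability_pct, impact_days):
    """Exposure in days = probability x impact."""
    return probability_pct * impact_days / 100


def total_exposure(risks):
    """Sum of exposures: the single number a burndown chart would plot."""
    return sum(exposure(pct, days) for pct, days in risks.values())


def highest_probability(risks):
    """Probability (in percent) of the most likely risk in the set."""
    return max(pct for pct, _ in risks.values())


# (probability in percent, impact in days)
set_1 = {"A": (20, 15), "B": (25, 10), "C": (30, 5)}
set_2 = {"D": (95, 7)}

for name, risks in (("Set 1", set_1), ("Set 2", set_2)):
    print(name, total_exposure(risks), highest_probability(risks))
```

Both sets come out at roughly 7 days of exposure, so they would look about the same on the chart. Only the second column, the probability of the likeliest risk, shows that Set 2 is the one about to bite.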

Impact Isn’t Really About Money or Effort

The only way for a visual chart such as risk burndown to work is if we’re able to quantify risks. This is generally done with exposure. Exposure = probability x impact. Impact is a funny thing. Impact is an assessment of how much the consequence of a risk will affect the project if the risk becomes a problem. Traditionalists like to think about this from a money perspective (which makes sense since software engineers stole most of our risk management practices from the finance world, originally anyway). For small teams, effort is a better measure: the number of person days it will cost to fix a risk that becomes a problem. This is a quantifiable loss.

There’s a problem with thinking about impact in terms of days of loss. Since not all risks are created equal, not all loss is truly equal either. Some kinds of loss can’t be measured in terms of effort. It really all depends on your project’s threshold of success. Some example risks (which don’t rely on ye olde life-critical system standby) from which you might never recover if they became problems include:

  • We don’t have a reliable backup solution; might lose all of our project data. (Lost yer data? You’re up a creek, son!)
  • We don’t have backup power for our data center; the data center might go offline for more than a few hours. (How many days will it take you to get those customers back?)
  • The demo has bugs and our contract renewal is based exclusively on how much the client likes our demo; a bug might occur during the demo. (HA! HA! You don’t have a job!)

In all of these cases you would reduce the risk by working on attributes other than impact (e.g. reduce probability, eliminate the condition, extend the time frame). Enough said. When it comes to calculating exposure, each of these risks has a catastrophic impact. That’s catastrophic, short for epic failure. No amount of days can really capture the essence of complete catastrophe.  Impact works best when considered in terms of success, not days or dollars lost.

Forget Risk Burndown

I want risk burndown to make sense, but given the problems I can’t help but think of it as a meaningless metric. Sure, some risks will be reduced and some will go away by converting into problems or being overcome by events. And a chart showing this would be really neat. But you’ll also uncover new risks as the project goes on. And some risks are just not worth caring about while others deserve a lot of attention. Risk management is about identifying the things that are most likely to kill your project so you can deal with them before it becomes too expensive (or impossible).  A burndown chart doesn’t reflect any of these things directly.

Burndown masks project risks too much and gives teams a false sense of confidence. To put it another way, there’s a risk with using risk burndown:

Our new risk management strategy assumes our estimation precision is better than it is; we may not mitigate the right risks.

Exposure is a ruse. And risk burndown is a metric for showing a reduction in exposure over time. To wax poetic, perception is reality and risk burndown provides a false perception.

That said, any risk management is better than none at all. If a risk burndown chart helps to get your team thinking about risk, then so be it. But there are other ways to manage risk (perhaps not as fancy) which are easier and more effective.

Project Signaling

Van Halen may have known more about project management than most program managers. Van Halen’s legendary “No Brown M&Ms Rider” is simultaneously the greatest example of rock star excess and project signaling I’ve ever seen. As David Lee Roth puts it:

The contract rider read like a version of the Chinese Yellow Pages because there was so much equipment, and so many human beings to make it function. So just as a little test, in the technical aspect of the rider, it would say “Article 148: There will be fifteen amperage voltage sockets at twenty-foot spaces, evenly, providing nineteen amperes . . .” This kind of thing. And article number 126, in the middle of nowhere, was: “There will be no brown M&M’s in the backstage area, upon pain of forfeiture of the show, with full compensation.”

So, when I would walk backstage, if I saw a brown M&M in that bowl . . . well, line-check the entire production. Guaranteed you’re going to arrive at a technical error. They didn’t read the contract. Guaranteed you’d run into a problem. Sometimes it would threaten to just destroy the whole show. Something like, literally, life-threatening.

In economics, signals are indicators that convey specific meaning between producers and consumers. For example, when you see THX on the side of a set of speakers, you know the speakers are probably going to be of audiophile quality. The THX logo is the speaker manufacturer’s signal to you, the consumer, that these speakers are really good. To David Lee Roth and the Van Halen road crew, the presence of brown M&Ms indicated that the hosting venue had not understood all details of the contract and had very likely made a mistake in configuring the set. One mistake in this case could cause malfunctions during the show or even the death of a crew member.

As it turns out, signaling software projects isn’t that difficult. The 12-step Joel Test is a reasonable signal for software development companies. While the Joel Test is nice for getting a feel for a company before you work for them, the concept is still useful once you’ve got the job and the project is in full swing.

Ultimately signals, also known as tripwires or triggers, are really just binary metrics for uncovering potential problems your project might be facing before the problems explode in your hands. When some condition is met (the signal), you know it has specific significance and prompts certain actions to prevent a problem from occurring. Triggers are most often used with risk management but their use should not be exclusive to that practice. In fact, if you’re collecting real data, you have even more opportunities for identifying signals outside of risk management.

On past projects I’ve used signals for a variety of issues. Here are some examples.

  • During the past 3 iterations the team identified between 15 and 20 defects. I expect a similar number of defects to be detected for this iteration. If more defects are detected, there may be a disconnection in understanding between requirements, design, and implementation. If fewer defects are detected, tests may not have been as rigorously defined as they should have been.
  • A Fagan inspection completed in less than one hour with a rate of 400 LOC/hour. Most inspections have covered only 250 LOC/hour, so it is likely that the inspection team sped through the code and that this inspection was not effective and its results not reliable.
  • When evaluating potential open source libraries, a SourceForge project without a website shows a general lack of dedication to the project and indicates that the software is probably of poor quality or ill-maintained; the library is worth neither the time nor effort to use.
  • Tasks that have been estimated to require longer than 9 hours have probably not been thoroughly thought through.
  • No risks have been identified for this project or risks have not been updated for several iterations. This implies that the team doesn’t have a realistic understanding of what problems the project faces.

In each of these examples, when the signal was heard, I knew there was going to be a problem on the project.
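A few of the signals above can be sketched as plain yes-or-no checks. This is a rough illustration, not the tooling used on those projects. The function names are hypothetical, and the `tolerance` factor for how far above the usual inspection rate counts as "too fast" is an assumption; pick thresholds from your own team's data.

```python
def defect_count_signal(found, expected_low=15, expected_high=20):
    """Tripped when this iteration's defect count falls outside the range
    seen over the past few iterations (too many OR too few)."""
    return not (expected_low <= found <= expected_high)


def inspection_rate_signal(loc_inspected, hours, typical_rate=250, tolerance=1.25):
    """Tripped when an inspection ran well above the usual LOC/hour rate.
    tolerance is an assumed cutoff, not a value from the post."""
    return loc_inspected / hours > typical_rate * tolerance


def task_size_signal(estimate_hours, limit=9):
    """Tripped when a task estimate is longer than the limit."""
    return estimate_hours > limit


print(defect_count_signal(8))             # fewer defects than usual
print(inspection_rate_signal(400, 1.0))   # 400 LOC/hour against a usual 250
print(task_size_signal(6))                # comfortably under 9 hours
```

Each check returns a single boolean, which is the whole point: a tripped signal doesn't say what the problem is, only that it's time to go look.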

Work with your team to establish signals for your project. The best part is that once you’ve decided on the signals for your team, when triggers are tripped you can throw a Van Halen-sized rock star fit in your cubicle! Well, try to resist throwing your monitor out the window anyway.

Binary is a Metric Too

Software developers are, in their heart of hearts, dataphiles – people who are absolutely in love with data. When was the last time you had a passionate discussion about frame rates, hardware benchmarks, gadget specs, sports statistics, dungeons and dragons, the merits of high def…the list goes on. Face it, you love data. You love comparing things using data. You don’t feel comfortable making decisions without a comprehensive comparison of data.

Why then do most software developers treat software development differently?

Tom DeMarco recently brought his own famous quote into question (pdf), musing that not only is it possible to control what you can’t measure, but the most important stuff you need to control on a software project is impossible to measure. Once again, DeMarco is wrong (in my opinion anyway). To prove his point DeMarco pointed at Wikipedia, something extremely valuable that was built without the use of metrics or formal control. This is a romanticized view of Wikipedia.

Wikipedia is one of the most controlled projects on the planet

On the surface, Wikipedia is the Wild West of online content. Not only can anyone edit any page, but content from Wikipedia is widely proliferated in the media and (sadly) school reports. Wikipedia is the single greatest success of user generated content in the history of mankind (“The Internet,” as the medium, doesn’t count). What started with a dozen humble articles has evolved into the most comprehensive encyclopedia ever created and includes everything from the fundamentals of science to the definitive source on Babylon 5.

What folks seem to forget is that even in the Wild West, there were laws and there were lawmen. Though we love to think romantically about such brigands and gunslingers as Jesse James, Billy the Kid, and Butch Cassidy, most stories about these historic figures are greatly exaggerated. So too is the case with Wikipedia.

Let’s take a closer look at the Wikipedia entry for Billy the Kid. This article belongs to a number of internal WikiProjects, visible from the top of the article’s talk page. The WikiProject Biography is not unlike most projects in Wikipedia. There are defined processes for assessing articles and conducting peer reviews. There are rubrics defined for assessing the quality of articles within the project. People even take on specific roles and responsibilities within the project. The collection of processes and information serves as the main means of coordination for contributors and helps the group control articles within the scope of the project.

The WikiProject Biography even collects metrics on articles which it then uses to make decisions concerning the articles under the project. The metrics are derived from quantifiable data and help control the project.

As it turns out, Wikipedia is not the lawless territory of the internet it has been made out to be.

You can measure the immeasurable

Wikipedia works because people were able to figure out ways to measure things that usually can’t be measured. The fundamental principle that many people overlook is that binary is a metric too. Yes or no questions can be just as effective a measure as any complex metric. Did everyone fill out their task data today? Yes or no. Did the estimate match the actual? Yes or no. Did the test pass? Yes or no. Is the project done? Yes or no. Have we identified risks? Yes or no. Has this risk become a problem? Yes or no.

At the heart of every complicated metric is really a series of yes or no, binary questions. When considering whether the project is done, you have to define done. One way of defining done is in terms of a checklist. Is feature 1 done? Is feature 2 done? Defining done for a feature could be as simple as checking whether all the tests have passed for the feature, again a binary measure.
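Here is a minimal sketch of that idea: a "percent done" number built entirely out of yes-or-no answers. The feature names and test results are invented for illustration, and treating a feature with no tests as not done is an assumption of this sketch.

```python
def feature_done(test_results):
    """A feature is done only if it has tests and every one of them passed."""
    return len(test_results) > 0 and all(test_results)


def project_done(features):
    """The project is done when every feature on the checklist is done."""
    return all(feature_done(results) for results in features.values())


def percent_done(features):
    """A 'complicated' metric that is really just a count of yes answers."""
    done = sum(feature_done(results) for results in features.values())
    return 100 * done // len(features)


# Hypothetical checklist: feature name -> pass/fail result of each test.
features = {
    "login": [True, True, True],
    "search": [True, False],
    "export": [],  # no tests yet, so it cannot count as done
    "reports": [True],
}

print(project_done(features), percent_done(features))
```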

For more subjective assessments, you can rely on observation-based, experience-defined rubrics. Does the team get along with one another? In the simplest form, this could be a binary metric (Am I friends with everyone on the team?) but it could also be more complicated, relying on gut feelings and a guiding rubric (“we never hang out together and don’t trust one another” might indicate low harmony while “we hang out often and feel comfortable sharing personal stories” could indicate high harmony). Teachers use rubrics and experience to judge subjective assignments every day. The difference is that they slap a grade on it and send it home as a report card.

While DeMarco is correct that many of the most critical things in a project are the most difficult to measure, it is possible to create measurements if you feel it is important enough to do so. How would you assess whether you have a good architecture that solves the problem at hand? Rubrics might play a part but so too might binary gates based on quality attribute scenarios or intricate observations concerning design trends over time. If you think hard enough, you’ll find that it’s extremely easy to find measuring points for nearly every aspect of a software project.

Whatever you do, don’t become a mindless, data-driven robot

I love data and I know you do too. While it’s tempting to inject data collection and derive metrics for every aspect of a project (because it’s fun and informative!) don’t. Collecting data and calculating metrics can be expensive. Not so expensive that you shouldn’t use it, but expensive enough so that you shouldn’t use it on everything. I like to compare using metrics to eating out at restaurants. Once or twice a week isn’t that big a deal, but it’s not something you should do every day if you’re trying to watch your budget.

DeMarco is right about one thing: control is not the end-all-be-all of software engineering. Consider carefully, what are the most risky parts of my project? What are the parts of my project that even require control? What are the parts in which I need more insight or want to improve? Strategically develop metrics for these areas and don’t worry about measuring the rest. Trust me, the world won’t end. If you don’t know what you’re doing, start with a simple binary measure. And above all, if something isn’t working, change it.