Choosing a Software Design Strategy

I was reading an article from the Joel on Software archives and was struck by this quote from The Project Aardvark Spec:

I can’t tell you how strongly I believe in Big Design Up Front, which the proponents of Extreme Programming consider anathema. I have consistently saved time and made better products by using BDUF and I’m proud to use it, no matter what the XP fanatics claim. They’re just wrong on this point and I can’t be any clearer than that.

When thinking about design there are two extremes pivoting around how much up front design work needs to be done before you start writing code. Proponents of up front design point out, as Joel has in the quote above, that making changes in the design is a lot cheaper in a 20 page design document than in a partially or fully implemented system. With a solid plan you stand a better chance of not painting yourself into a corner and of avoiding expensive design changes in the code later. Opponents of BDUF worry greatly about analysis paralysis, which can severely delay a team’s ability to provide working software to the client. Codified in this thinking are the assumptions that clients don’t really know what they want until they see it and that only by getting working software in the hands of users can we understand whether a proposed solution is a good enough solution. To these folks, writing a design document is wasted effort because clients don’t know what they want and because they believe to their core that things are going to change anyway.

Both of these perspectives are correct. Both of these perspectives are also wrong. Since everyone loses when thinking in extremes and ultimatums, let’s figure out what is really going on.

Choosing a Software Design Approach: Theory

Building software falls into an interesting class of problems known as wicked problems. Basically, this means that there are many possible solutions and that no solution is really “right” or “wrong”, only “better” or “worse” given your current understanding of the problem. Further, each problem is essentially unique, there isn’t an explicit stopping rule, and as the designer you will be held liable for the consequences of your solution.

With this in mind, in thinking about software design there are two variables which will determine how you go about designing a solution.

  • How well do you understand the problem domain?
  • How well do you understand the solution space?

In the software world, understanding the problem domain means thinking about technical constraints, functional requirements, quality attributes, and business constraints. Understanding constraints is critically important since these will define some of the boundaries of the solution space. Functional requirements deal with what the system will do while quality attributes deal with how a system should behave when performing certain functions. Finally business constraints are things the customer simply requires of a solution usually for business reasons (a common example is a budget or delivery date).

The solution space is a multi-dimensional landscape filled with nearly limitless possibilities. As the designer it is your job to figure out how to navigate this space in search of a solution to a problem. Your current understanding of the problem limits your ability to see solutions and often the fitness of a solution can only be understood in reference to another possible solution. I imagine the solution space as a mountain range and my job as the designer is to find the “best” mountain for my client. Of course some mountains might block my view from others which means I may have to climb a peak just to see a better solution. This implies that my understanding of the problem dictates my understanding of potential solutions and that my understanding of the solution will yield further insights into the problem. This is one reason why it is often helpful to get working software in front of users quickly.

Graphical depiction showing how a team's knowledge of the solution space and problem domain is bounded by constraints that prevent fully rational designs.

When designing something we have a natural tendency to prefer making rational design decisions where, after careful examination of all information, we choose the best or optimized solution. To make a rational choice requires not only that we know everything about the problem domain but also everything about every possible solution to the problem. Both business constraints, by constraining time or money, and shortcomings in the human brain limit our ability to seek this knowledge, creating a boundary within which a solution must be found. And it is in this world of bounded rationality that we must find a solution for our customer’s problem. Searching for a solution then is not as much a matter of optimization (finding the “absolute best”) as it is a matter of satisficing, finding the best solution with the information currently available. To complicate matters further, an individual’s understanding of the problem domain and solution space is relative to that individual’s experience and capabilities, the size of the problem, and the complexity of the solution.

When designing software it’s natural to want to move as far up the knowledge curve, and get as close to making a rational, optimal design, as possible. This tendency stems from our instinctual preference to avoid loss [1]. In other words, designers assume that an optimal solution exists and so are predisposed to reject less than optimal solutions.

Choosing a Software Design Approach: Application

In the software design world there are four basic types of design strategies.

  • Planned Design: All design is completed before beginning implementation. Often referred to as Big Design Up Front by detractors and associated with waterfall lifecycles. The essential assumption is that you can fully design a system before beginning construction.
  • Minimally Planned Design: Some design is completed before beginning implementation. Sometimes referred to as Little Design Up Front. In using this strategy you acknowledge that some change is likely to occur but still want to avoid painting yourself into the obvious corners.
  • Evolutionary Design: The design of the system grows as the system is implemented, but growth is deliberate and controlled by experienced engineers and proven engineering practices such as reference designs and refactoring. At least some requirements are usually specified in the beginning but they are not expected to be exhaustive.
  • Emergent Design: The design of the system is allowed to occur organically as the system is implemented without specific intentions.

Graphical representation showing when specific design strategies are applicable based on design stability and the amount of desired planning.

The design strategies appropriate for your team are determined by the amount of risk remaining in the bounded knowledge graph. Less risk implies that you anticipate less change (or at least that you feel you can deal with the change reasonably) and that you have more options in choosing a design strategy. Choosing a design strategy then is a matter of deciding the degree to which your team would like to plan within the determined appropriate strategies. The amount of planning will depend on your team’s preferences and experience, or can be driven by customers’ business needs. From a theory perspective, the amount of planning appropriate to the project is directly proportional to how much you know about the problem domain and the solution space. In other words, the less you know about a problem and the fuzzier the solution seems to you, the less capable you will be of making planned designs. As designers we run into problems in our designs when we are wrong about what we think we know about the problem domain or solution space. Unfortunately, as eternal optimists, software engineers get these things wrong often.

Don’t feel comfortable with the design strategies appropriate for your project? Remember that the appropriateness of a design strategy is based on your current understanding of the solution space and problem domain. Therefore, to make more design strategies available to your team you must learn more about the solution space and problem domain. I find that examining the project risks is the best way to determine what you think you know about the problem domain and solution space. Understanding the engineering risks your project faces also serves as a blueprint for moving along the knowledge curve. For example, if your team feels that they don’t know enough about the problem based on the identified and prioritized risks, you can increase your knowledge about the problem domain. By the same token, if the team feels there is a lot of risk concerning the solution space you can explore different designs, run experiments, or create prototypes for your users to try. Another option other than looking at risk is to use a design process such as Architecture Centric Design Methodology (ACDM). ACDM is a staged design process that encourages teams to explore the design space through experiments by addressing issues in proposed designs. ACDM strongly focuses on quality attributes and will nudge your team toward a planned design strategy but the process can be adjusted to use a different design strategy, if that is something your team strongly desires, by adjusting the go/no-go criteria between stages.

Remember too that you learn more about the problem and solution as you implement the system. So by the end of the project there will be no risk, you’ll know everything about the system you can, and you’ll have all the information necessary for a great (and practically useless) planned design.

The Project Aardvark Design Approach

Let’s apply this information to Project Aardvark and figure out why Joel is such a strong supporter of BDUF at Fog Creek Software. Project Aardvark, like most other software systems developed at Fog Creek, is a developer tool. This means that the folks developing the system are also going to be at least one of the groups of end users. We can assume based on Fog Creek’s hiring strategies that the developers are pretty smart. Further, Aardvark falls into a class of problems which has been well studied meaning there are examples available for study and some folks at Fog Creek may have even implemented similar (but not identical) systems or subsystems.

Determining what design strategies are appropriate for Project Aardvark based on knowledge concerning the design.

The implication is that the Project Aardvark problem domain is well understood and the solution space doesn’t need much exploration because Joel doesn’t perceive much risk in the solution. Since Joel has a firm grasp on the problem domain and since he feels confident about the solution space, he doesn’t anticipate much change in the design throughout development. Based on the information Joel has available about the project relative to his knowledge and experience with the problem domain and solution space, a planned design strategy is appropriate for this project. Joel could use other approaches further down the application curve too, but since Joel has relevant experience with planned design, the design probably won’t be that big (in fact it was less than 20 pages), and the Fog Creek team is used to following planned designs, planned design is probably a good fit for Project Aardvark.

One last note. Bounded rationality implies that the planned approach to design may not always be possible. There is a limit based on the size of the problem domain, solution space, and business constraints (especially time and resources available) that may prohibit you from effectively using Big Design Up Front. In other words, it may not always be possible to be high enough on the knowledge curve for a planned approach to make sense. Opponents of BDUF will tell you that most software falls into this category. In the case of Project Aardvark, the system was relatively small and Joel ensured that he had plenty of resources and time for exploring the solution space. It is critically important to understand where the boundaries that might prevent you from rational design are. Since we prefer rational optimization over decision making with less information, failing to recognize these boundaries will result in analysis paralysis.

By understanding some of the theory behind design decision making it’s easier to know where in the spectrum of possible design strategies your project belongs. Examining a project’s engineering risks is a relatively simple and repeatable way to determine where you are on the knowledge curve for a project. Now it’s possible to move your team up the knowledge curve, as necessary, to reach a point where you feel comfortable with the risks your project faces. You move along this curve by addressing the project’s risks through prototyping, research, experiments, evaluating proposed designs, and interacting with the customer. You could also use a specific design process such as ACDM. It is important to recognize that by moving up the knowledge curve you are necessarily moving your team closer to being able to use a planned design strategy. Your location on the knowledge curve will determine which design strategies you can use but you still need to determine which makes the most sense for your team and the project. Finally, these curves are relative and the scale is going to change from project to project and team to team. I think the principles behind the curves are more important than the curves themselves and you shouldn’t take the curves literally as a mathematical function.

End Notes

[1] This also explains why humans behave the way we do in free market economies, gambling, and other similar situations. I am attempting to apply information I learned watching this TED Talk on monkey economics and irrational decision making by Laurie Santos.

The Reality of Risk Exposure

Over the past few weeks I’ve been thinking a lot about risk exposure in the context of managing projects. Exposure is a technique used almost universally when managing risks, yet as I’ve already discussed, exposure can cause major problems because it’s a precise number based on mostly made-up information. At the same time, exposure is used widely and successfully – otherwise there wouldn’t be as much literature throughout the web telling you to calculate risk exposure.

This raises the question: is risk exposure really as meaningless as I’ve made it out to be? I’ve collected some data that helps answer this question.

Data Collection and Context

Risk management is one of the basic subjects covered in the Managing Software Development course, one of the five core courses students of the Carnegie Mellon Master of Software Engineering program take in completing their degree. Students learn about the continuous risk management paradigm from the Software Engineering Institute. Two of the cornerstones of this technique are threshold of success and condition-consequence based risk statements.

Having ready access to risk management experts at the SEI, nearly every team conducts a facilitated small team risk evaluation workshop in which risks are collected with the help of a taxonomy-based questionnaire (pdf), analyzed, and prioritized using group multi-voting. The basic workshop has been conducted the same way for close to a decade and many teams have put their risk data collected during the workshop in the MSE’s project archive.

I’ve gathered data from these small team risk evaluation workshops for 9 MSE Studio teams, a total of 164 identified, analyzed, prioritized risks.

What’s in the Data?

During a risk evaluation workshop, teams identify risks using their threshold of success as a guide. Once identified, risks are briefly analyzed and assigned an impact, probability, and time frame value based on a rough average from the team members’ initial gut feeling on the risk. These values are assigned simply so when a manager asks to see the probability, for example, there is a value to give him. Each of impact, probability, and time frame can only be one of 3-4 values. The idea is that by decreasing the precision we can increase the accuracy. Values are assigned based on a rubric. For the purposes of calculating a risk exposure I assigned each of the analysis categories a number. Time frame is not used in calculating exposure.

Impact

  • Catastrophic – The team will be unable to meet threshold of success. (numeric value 4)
  • Critical – The team can only meet the threshold of success with significant additional effort and stress. (numeric value 3)
  • Marginal – The team can meet the threshold of success with minimal extra effort. (numeric value 2)
  • Negligible – There is no real impact on achieving the threshold of success or little increase in effort. (numeric value 1)

Probability

  • High – Chance of becoming a problem is above about 80%. (numeric value .8)
  • Medium – Chance of becoming a problem is about 50/50. (numeric value .5)
  • Low – Chance of becoming a problem is below about 20%. (numeric value .2)

Time Frame

  • Short – May occur in about a month or less.
  • Medium – May occur in 1 to 3 months.
  • Long – May occur in more than 3 months.
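
The rubric above maps directly onto a small lookup table. The following is a minimal sketch in Python, assuming only the numeric values listed above; the names `IMPACT`, `PROBABILITY`, and `exposure` are hypothetical, chosen for illustration, and are not part of the SEI method or the workshop tooling.

```python
# Numeric values copied from the rubric above; all names are illustrative only.
IMPACT = {"catastrophic": 4, "critical": 3, "marginal": 2, "negligible": 1}
PROBABILITY = {"high": 0.8, "medium": 0.5, "low": 0.2}


def exposure(impact: str, probability: str) -> float:
    """Return exposure = probability x impact for a single risk.

    Time frame is deliberately left out, matching the analysis in this post.
    """
    return PROBABILITY[probability.lower()] * IMPACT[impact.lower()]


if __name__ == "__main__":
    # A critical-impact, medium-probability risk: 0.5 * 3
    print(exposure("Critical", "Medium"))
```

With four impact values and three probability values there are only 12 possible impact/probability combinations, so many risks in a data set like this one necessarily share the same exposure value.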

Instead of relying on the results from the analysis, teams perform 3 to 4 rounds of multi-voting. The final multi-voting rank is shown. Not all teams ranked all risks since teams generally only deal with the top few risks, usually less than 10. This idea is captured in the priority. A risk is either a high priority, meaning the team is actively addressing it, or a low priority meaning the team is aware of it but it was not ranked high enough to deal with yet. Teams might choose different strategies for determining priority. The two most popular are to only examine the top X or to rely on consensus derived from how the risks clustered as a result of multi-voting. Usually there is strong team consensus for the top 4 to 5 risks and weak consensus after this.

Analysis and Discussion

My hypothesis is that teams’ rankings will generally match exposure, meaning that risks that are ranked highly will also have a high exposure. As the data shows, this is generally the case. On average nearly every team’s high priority risks were also the ones with the highest exposure.

Graph showing Teams' Average Risk Exposure by Priority.

Examining the risks’ rank and exposure tells a similar story but not convincingly. There is a relatively weak negative correlation (correlation coefficient of -0.22) between exposure and team-assigned rank. Basically the best that can be said is that there is a general downward trend in exposure as the rank increases, but there is enough variation that I can’t really say anything for certain.
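
For readers who want to run the same kind of check on their own risk list, here is a minimal sketch of a correlation calculation. It assumes the coefficient is Pearson's r, which this post does not actually state, and the function name `pearson` and the sample numbers are made up for illustration; they are not the workshop data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    # Raises ZeroDivisionError if either sequence is constant.
    return cov / sqrt(var_x * var_y)


# Hypothetical (rank, exposure) pairs for illustration; NOT the workshop data.
ranks = [1, 2, 3, 4, 5, 6]
exposures = [2.4, 1.5, 2.4, 1.0, 1.6, 0.8]

r = pearson(ranks, exposures)
print(round(r, 2))  # negative when exposure tends to fall as rank increases
```

A value near -1 would mean rank and exposure agree almost perfectly (rank 1 being the most important risk), while a value near 0 would mean the two orderings are essentially unrelated.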

Graph showing risk data for all teams.

I have two possible explanations for this. First, traditional risk exposure does not take into account time frame while teams evaluating risks in this data set do. So, all things being equal from an exposure perspective, a long-term risk might be ranked very low while a short-term risk will be ranked much higher. If this were the case, we’d see more short-term risks assigned high ranks than long-term risks, and this is indeed what the data shows. In fact, the majority of risks identified are short-term risks, with nearly three times more short-term risks being identified than long-term risks. Mid-term risks are, unsurprisingly, in the middle. A better exposure number might be had by taking into account risks’ time frame values.

Graph showing the count of risks per time frame by rank

The second possible explanation I have is that 3 – 4 buckets isn’t sufficient to allow for enough variation to form a strong correlation between rank and exposure. Indeed this is one of the greatest differences between this data set and traditional risk exposure calculations in which impact might take on nearly any number and exposure is usually a percentage from 10 – 100%. That said there still is a general trend which shows that most of the time, multi-vote ranking very roughly corresponds to exposure.

There is one more catch about this data and it’s a subtle but important one. Values for probability, impact, and time frame were determined as a team using a sort of rough average approach where team members vote and the approximate averages are rounded to the nearest bucket. Since all the values and rankings were determined through a group effort, it would make sense that they should roughly correspond.

Conclusions

As it turns out, risk exposure is a rough and somewhat accurate indicator for relative risk priority, at least when calculating exposure or rank using group-driven techniques. Teams relying only on exposure are likely to rank some risks higher than they otherwise might. Part of this is due to exclusion of the concept of time from traditional exposure, part of it might be differences of opinion within the group as far as impact or probability are concerned.

Other MSE alumni I’ve talked with say, and I mostly agree with them, that the most important thing about risk management is bringing up concerns and talking about them. Delphi multi-voting is an easy way to encourage conversation since differences of opinion are addressed as part of the multi-voting process. No matter what technique you use, exposure (with time somehow included), multi-voting, or some combination, do not reduce risk management to simple numbers. It’s really all about communication. Encourage this communication using whatever techniques work for your team.

Raw data used for analysis in CSV format.

A Closer Look at Risk Burndown

I like the idea of the risk burndown chart. Burndown is an effective and satisfying visual indicator of progress and it’s relatively easy to calculate to boot. But does looking at a project’s risks through the lens of a burndown chart make sense?

I see several problems with thinking about risk in this way.

Numbers can be Misleading

The first key to effective risk management is to value accuracy over precision. This means that it’s better for a prediction to be roughly right than to be stated to a precision you can’t back up. Remember, risk is about assessing your likelihood for project success. It doesn’t matter if you miss your threshold of success by a little or a lot; either way you still fail the project!

Pop quiz. Say there are two risks in your project. There’s a 25% probability that Risk A will become a problem while Risk B only has a 20% probability. For now, assume the impact is the same for both risks. Which risk is a greater threat to the project?

That one’s easy. Risk A is a greater threat because, impacts aside, Risk A’s probability of turning into a problem is 5 percentage points higher. Ok. What if I told you that I made up probabilities based on my gut feelings so I could easily rank risks? Now which risk is a greater threat to the project?

The real question I’m asking you is this. Are you willing to bet the success of your project on those numbers? Because if my best guess, gut feeling probabilities are off by more than 5%, the project could be in serious trouble depending on the risks’ impacts.
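
One way to see how fragile such a ranking is: treat each gut-feeling probability as a range rather than a point. The sketch below assumes, purely for illustration, that every estimate may be off by the same number of percentage points in either direction; the function name `ranking_is_stable` is hypothetical.

```python
def ranking_is_stable(p_a: float, p_b: float, error: float) -> bool:
    """Return True if risk A still outranks risk B in the worst case for A.

    Each true probability may lie anywhere in [p - error, p + error].
    Impacts are assumed equal, as in the pop quiz above.
    """
    worst_a = max(0.0, p_a - error)  # A is really less likely than guessed
    best_b = min(1.0, p_b + error)   # B is really more likely than guessed
    return worst_a > best_b


# Estimates from the quiz: Risk A at 25%, Risk B at 20%.
print(ranking_is_stable(0.25, 0.20, 0.05))  # False: the order could flip
print(ranking_is_stable(0.25, 0.20, 0.01))  # True: the order holds
```

Under this assumption the quiz's ranking only survives if the guesses are accurate to within a couple of percentage points, which is far more than a gut feeling can promise.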

I know, I know. That was a trick question. Nobody on your team would make up numbers on one of your software projects. In all fairness, nobody goes out of their way to fabricate false values. Use your logic. If you were any good at guessing the probability of future events occurring, you would not be reading this post right now. You would be a multi-millionaire, off enjoying your gambling winnings from the ponies. Too much precision gives folks too much confidence in the correctness of your assessment when the reality is that probability and impact are based on best guesses and gut feelings. Probability and impact numbers just make it easier to calculate exposure so risks can be ranked automatically. Burndown is a fairly precise metric.

Not all Risks are Created Equal

If you are monitoring project risk with a risk burndown chart, how do you know whether the right risks are being reduced? Let’s take a look at an example.  Which of these sets of risks should be addressed?

Set 1 with a total exposure of 7 days made up of the following risks:

  • Risk A has a probability of 20% and an impact of 15 days for an exposure of 3 days.
  • Risk B has a probability of 25% and an impact of 10 days for an exposure of 2.5 days.
  • Risk C has a probability of 30% and an impact of 5 days for an exposure of 1.5 days.

Or Set 2 with a total exposure of 7 days (6.7 rounded up) made up of the following risk:

  • Risk D has a probability of 95% and an impact of 7 days for an exposure of 6.7 days.

In the first set, I can mitigate 3 risks, each with very low probability of becoming problems. In the second set I mitigate only 1 risk that is almost certainly going to become a problem. Reducing the imminent risk seems to make the most sense but this choice is not reflected in a risk burndown chart. Simply reducing risk over time is not enough. You have to reduce the right risks.
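
The arithmetic behind the two sets can be checked with a few lines of Python. This is a minimal sketch using the probabilities and impacts listed above; the helper names `total_exposure` and `max_probability` are hypothetical.

```python
# (name, probability, impact in days) copied from the two sets above.
set_1 = [("A", 0.20, 15), ("B", 0.25, 10), ("C", 0.30, 5)]
set_2 = [("D", 0.95, 7)]


def total_exposure(risks):
    """Sum of probability x impact over a set of risks, in days."""
    return sum(p * impact for _, p, impact in risks)


def max_probability(risks):
    """Highest single-risk probability in the set."""
    return max(p for _, p, _ in risks)


for label, risks in (("Set 1", set_1), ("Set 2", set_2)):
    # Print the set's total exposure next to its most likely risk.
    print(label, round(total_exposure(risks), 2), max_probability(risks))
```

The totals come out nearly identical, which is all a burndown chart would plot, while the second column shows what the chart hides: the most likely risk in Set 1 sits at 30%, whereas Risk D is close to certain.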

Impact Isn’t Really About Money or Effort

The only way for a visual chart such as risk burndown to work is if we’re able to quantify risks. This is generally done with exposure. Exposure = probability x impact. Impact is a funny thing. Impact is an assessment of how much the consequence of a risk will affect the project if the risk becomes a problem. Traditionalists like to think about this from a money perspective (which makes sense since software engineers stole most of our risk management practices from the finance world, originally anyway). For small teams, effort is a better measure as in the number of person days a risk that becomes a problem will cost to fix. This is a quantifiable loss.

There’s a problem with thinking about impact in terms of days of loss. Since not all risks are created equal, not all loss is truly equal either. Some kinds of loss can’t be measured in terms of effort. It really all depends on your project’s threshold of success. Some example risks (which don’t rely on ye olde life-critical system standby) from which you might never recover if they became problems include:

  • We don’t have a reliable backup solution; might lose all of our project data. (Lost yer data? You’re up a creek, son!)
  • We don’t have backup power for our data center; data centers might go offline for more than a few hours. (How many days will it take you to get those customers back?)
  • The demo has bugs and our contract renewal is based exclusively on how much the client likes our demo; a bug might occur during the demo. (HA! HA! You don’t have a job!)

In all of these cases you would reduce the risk by working on attributes other than impact (e.g. reduce probability, eliminate the condition, extend the time frame). Enough said. When it comes to calculating exposure, each of these risks has a catastrophic impact. That’s catastrophic, short for epic failure. No amount of days can really capture the essence of complete catastrophe.  Impact works best when considered in terms of success, not days or dollars lost.

Forget Risk Burndown

I want risk burndown to make sense, but given the problems I can’t help but think of it as a meaningless metric. Sure, some risks will be reduced and some will go away by converting into problems or being overcome by events. And a chart showing this would be really neat. But you’ll also uncover new risks as the project goes on. And some risks are just not worth caring about while others deserve a lot of attention. Risk management is about identifying the things that are most likely to kill your project so you can deal with them before it becomes too expensive (or impossible).  A burndown chart doesn’t reflect any of these things directly.

Burndown masks project risks too much and gives teams a false sense of confidence. To put it another way, there’s a risk with using risk burndown:

Our new risk management strategy assumes our estimation precision is better than it is; we may not mitigate the right risks.

Exposure is a ruse. And risk burndown is a metric for showing a reduction in exposure over time. To wax poetic, perception is reality and risk burndown provides a false perception.

That said, any risk management is better than none at all. If a risk burndown chart helps to get your team thinking about risk, then so be it. But there are other ways to manage risk which are easier and more effective, even if they are not as fancy.

SWOT vs. Risk Management

I was recently asked by a coworker how software risk management is different from traditional SWOT analysis. SWOT is a technique commonly used for strategic planning where the strengths, weaknesses, opportunities, and threats facing a group are compiled and analyzed to determine an appropriate course of action. Software risk management (as defined using the continuous risk management paradigm from the Software Engineering Institute) is similar in that risk management can be used for strategic planning but risks yield much different information which is applied in a very different way.

The first step when performing a SWOT analysis is to define the business objectives. This is very similar to defining a threshold of success in software risk management. The main difference is a business objective takes the form of the desired end state whereas the threshold of success is the minimum objectives necessary for the project to be successful. For example, a perfectly valid business objective might be to deliver all 100 story points by the end of the year while the threshold of success might be to deliver the core functionality (worth only about 50 story points). Would more stories completed be better? Of course, but what if you end up only completing 75 story points by the end of the year? How did you do? You missed your goal, but you still succeeded right? It’s difficult to tell without understanding the difference between wants and needs.

The main part of a SWOT analysis consists of a group session where strengths and weaknesses internal to the group and opportunities and threats external to the group are identified. People like to put SWOTs into a 2×2 grid so it’s easier to look at. While there is some great advice out there for understanding what goes into a SWOT, the analysis is largely subjective, relying on a team’s gut feelings to know the strengths from the weaknesses, the opportunities from the threats. Software risk management can be a much more systematic approach to understanding the potential dangers that face a project based on known facts when tools such as the SEI’s Taxonomy Based Questionnaire for risks (pdf) are used. Guts still come into play, but there is enough engineering in place to help people make the right decisions.

Risks are specifically actionable – depending on the risk you might be able to mitigate it by manipulating the timeline, impact of the consequence, probability of the risk occurring, or by addressing the condition. You might transfer the risk to someone else or simply accept the risk. SWOT by itself is merely a collection of statements relative to internal or external entities which may or may not actually be true. Are you good at testing? How do you know that? Is Bing really a threat to Google Search? Should you do anything about your weaknesses? Will they prevent you from achieving your business objectives? Without further analysis there really is no way to know and other than prioritizing there really is no way to analyze a SWOT, nor is there any clear direction for next steps.

Look, when planning a project you really need both SWOT analysis and risk management. SWOT is a tool for assessing capabilities while risk management is a tool for assessing the likelihood of success. Each technique serves a very different purpose. SWOT is most useful at the beginning of a project to help you figure out what you’re doing and come up with an overall strategy. Risk management, though is an ongoing activity that makes sure you don’t fall flat on your face in trying to achieve your business objectives.

Threshold of Success

When I was a kid my brother and I used to play a game called Make Believe. My favorite variant of the game was simple. Together we would build some kind of fortress, and then one person got the fort and the other person tried to invade it. In theory, the game ended when the fort had been overtaken by the invader. What made the game fun was that as the invasion began, the rules of the game always changed. The first thing to go was any notion of death. If one of us was “killed” in battle, we invented near-instantaneous respawning. Shortly after that we skipped respawning and simply became invincible. Soon the fort became invisible, which meant the invader just had to run around trying to find it. Sometimes someone gained super strength or the ability to force other people to move in slow motion. We almost always created super weapons (such as a handheld Death Star), which for some reason could always be defended against. Nearly every game ended in tragedy, with someone crying or upset: “That’s not fair! You can’t do that! I’m invisible! You can’t do that!”

Kids’ stuff, right?

A lot of software projects with teams made up of working adults still play this game. The scenario goes something like this. A team is put together to build some software. Neither the clients nor the team talks about the objectives of the project other than building “some software”. After a few months, something goes wrong or someone doesn’t like what’s happening, so someone changes the rules. Before too long, one side or the other is upset that they can’t win, and somebody throws a fit and goes home. Instead of summoning invisible armor, software projects change the rules by cutting features, adding more requirements, moving due dates, wasting resources, and things like that.

We make believe that we’re software engineers.

While Make Believe was a fun game as a kid, changing the rules when there’s real money on the line isn’t as fun. My brother and I ran into problems as kids because we got the objectives of the game wrong. Actually, there were no common objectives, which is why we could change the rules so easily. The same thing happens on a software project when the objectives aren’t well known.

Defining and committing to a clear picture of success establishes the common ground rules for a project by making the basic project goals explicit. The technique is known as Threshold of Success.

Defining What Success Looks Like

The Threshold of Success for a project is the minimum set of conditions that must be met for the project to be considered successful. If the team fails to meet even one of the conditions then the project is a failure. A good Threshold of Success is made up of about 3-4 SMART goals (no more than a few bullets on a single PowerPoint slide). SMART is a mnemonic which stands for Short/Specific, Measurable, Achievable, Relevant, and Time bound.

Some other pointers for defining a Threshold of Success:

  • The Threshold of Success should be built as a team. Since this is the measure by which you will define success or failure, everyone on the team must buy into it. If you can include your client that’s even better.
  • Threshold of Success goals should be challenging, but it’s important that they are achievable. If the goals are too easy, victory will be meaningless; if they are too difficult, victory will be elusive.
  • Once the Threshold is established, don’t change it! The only reason to modify the Threshold of Success is if the project has changed so drastically that the Threshold no longer makes sense (for example if someone leaves the project).
  • Revisit the Threshold of Success regularly (a good time is when planning iterations) so everyone remembers what success looks like. Put it on your team wiki so that it’s readily accessible.
  • Be sure that the goals in your Threshold are SMART! The point of defining a Threshold of Success is to take away the wiggle room for defining what it means to succeed or fail. The goals you define should make this black and white. The more specific the goal is the better.

Building a Threshold of Success

The easiest way to create a Threshold of Success is to first create a minimum picture of failure, then convert failure into success. Here’s an example:

Failure for my current project might look something like this.

  • Essential features are not ready by the end of the second quarter.
  • Team members are dissatisfied or bored with their jobs.
  • Newly hired team members don’t feel like they’re part of the team by March 31.
  • There isn’t enough money to continue development after this fiscal year and we have to fire people.

Now that I know what failure looks like, seeing success is easy. I don’t want any of these things to happen. The threshold of success for my current project might look something like this.

  • By the end of the second quarter, all “Must Have” features are implemented and pass acceptance tests with no known critical defects.
  • All team members give an average score of 5 or better on a job satisfaction survey taken quarterly.
  • By March 31, the team has successfully executed at least three team building activities with all team members present.
  • Funds of at least $1 million are secured by December 31 to allow for future development without a reduction in team size.

Notice that only one of the four success goals in this example is related to software functionality. While goals might come from anywhere, teams traditionally focus on goals related to people and relationships, process, resources (such as budget or schedule), and product (software functionality and quality).
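The all-or-nothing nature of a Threshold of Success can be sketched in a few lines of Python. This is only an illustration: the `Goal` class and the `project_succeeded` and `unmet_goals` helpers are hypothetical names, and the `met` values are made up to show what happens when a single goal is missed.

```python
from dataclasses import dataclass


@dataclass
class Goal:
    description: str
    met: bool  # judged against the measurable condition in the description


def project_succeeded(threshold):
    """A project succeeds only if every goal in its threshold is met."""
    if not threshold:
        raise ValueError("a Threshold of Success needs at least one goal")
    return all(goal.met for goal in threshold)


def unmet_goals(threshold):
    """Return the goals that would make the project a failure today."""
    return [goal for goal in threshold if not goal.met]


# The four example goals from above; the met/unmet values are invented.
threshold = [
    Goal("All 'Must Have' features pass acceptance tests by end of Q2", met=True),
    Goal("Average job satisfaction score of 5 or better each quarter", met=True),
    Goal("Three team building activities held by March 31", met=True),
    Goal("At least $1 million secured by December 31", met=False),
]

print(project_succeeded(threshold))  # False: one missed goal is enough
for goal in unmet_goals(threshold):
    print("Not met:", goal.description)
```

Because each goal reduces to a yes or no answer, there is no wiggle room: three out of four is still a failure.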

As this technique originated with the Software Engineering Institute (pdf), nearly every studio team in the Carnegie Mellon Master of Software Engineering program creates a Threshold of Success for their projects. The MSE Studio Archive has extensive examples of both good and bad pictures of success that teams have created. The Square Root Team’s threshold (my team) is a good place to start, but there are plenty of other examples.

There might be many goals for a project. In the Team Software Process you actually identify at least three different kinds! But there is only one Threshold of Success for a project. Knowing what success looks like gives you a better chance of actually achieving it. Without it, you’re just pretending that you know what’s going on.