Posts tagged ‘planning’

Lightweight Experiments for Process Improvement

[This post is a recap of the second talk I gave at XP2010. This was the big one, the experience report talk, one of 15 experience reports published at XP2010. You can download the slides (pdf) or the full paper (pdf) from this website or from XP2010.org.]

Process improvement is important for nearly all teams but it can sometimes be difficult for a team to know what is working, what isn’t working, and what techniques or methods to try when attempting to improve. Performing a scientific experiment is one way to help overcome these problems but, as academic research has shown us, while experimentation can yield interesting results, running an experiment is time consuming, expensive, and requires some serious thinking and control to pull off. From a practitioner’s standpoint this means that experimentation is a non-starter.

Of course, that’s only if you run experiments like an academic.

Banner from the XP2010 conference in front of the hotel.

Back Story

Just over a year ago, my MSE studio team at Carnegie Mellon had a problem. We had decided we would use Extreme Programming for the construction phase of our project but some team members had doubts concerning pair programming. We had decided that we would use some kind of peer review, having already seen the many benefits of inspection when reviewing other artifacts. The dispute arose over whether pair programming would give similar enough results. Also, not all team members had experience with pair programming but everyone on the team knew and enjoyed solo programming.

The number one concern was whether pair programming would allow us to meet our very strict deadline. We had just over three months to complete the construction phase of the project. According to our threshold of success this meant implementing all “must have” requirements with a minimum level of quality. Did we really have time to waste by having two people working on the same code at the same time? Wouldn’t working independently and inspecting code on an as needed basis allow us to get more work done faster?

At the time it just so happened that I was taking a reading class with Mary Shaw and in that class we discussed some research findings that might help settle this debate. Research from Laurie Williams, Ward Cunningham, Barry Boehm, and many others showed that pair programming requires more effort (but never double the effort) but is faster than programming alone (pdf). Also pair programming creates code of about the same quality as coding alone with inspection (pdf). Of course, the research may not apply to us since Square Root is closer to a professional team working on a large project with a real client, not undergrads working on short term toy projects.

After an iteration where some teammates used pair programming and others refused, we decided to try an experiment to see which practice actually worked better. The original idea was that we might be able to validate some of the research, but we decided instead that it was more important just to resolve our own internal conflicts and figure out which processes worked better.

Conducting a Lightweight Experiment

With the scientific method as our guide we planned and executed a lightweight experiment which pitted programming alone against pair programming. The results were amazing (and you can find the raw data in our project archive). In conducting the experiment we used a set of novel techniques which I think can be useful in conducting other lightweight experiments. There’s more background in the experience report so I’m only putting the meaty stuff in this post.

Focus narrowly on a single question – The essential key to keeping an experiment light is to only tackle one thing at a time. In this case we focused on comparing and contrasting a single technique, pair programming, rather than multiple techniques or an entire process (such as XP vs. TSP).

Divide work, not teams – If I were comparing pair programming to programming alone in an academic setting, I would put together two teams of about the same experience and have them each build their own version of the same software, one team using pair programming, the other programming alone. In a business setting this is a complete waste and few companies can afford to have two teams duplicating effort. By dividing work instead of teams you may lose some control over variables in the experiment but in most cases isolating more variables doesn’t add any further clarity to helping answer the narrowly focused question. To divide work successfully you need to have some way of estimating work units for division. We used use case points as shown in the figure depicting our modified planning game.

Steps in modified planning game for dividing work into experiment groups
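As a rough illustration of dividing work rather than teams, the sketch below splits one iteration's features into two groups with roughly equal use case points. The feature names, the point values, and the greedy strategy are all assumptions made for this example; the team's actual modified planning game is described in the experience report.

```python
def divide_work(features):
    """Greedily split (name, points) pairs into two groups of similar size.

    The largest features are placed first, each into whichever group
    currently holds fewer points.
    """
    groups = {"pair": [], "solo": []}
    totals = {"pair": 0, "solo": 0}
    for name, points in sorted(features, key=lambda f: f[1], reverse=True):
        target = min(totals, key=totals.get)  # group with fewer points so far
        groups[target].append(name)
        totals[target] += points
    return groups, totals


# Hypothetical iteration backlog: (feature, use case points)
backlog = [("login", 8), ("search", 5), ("export", 5), ("audit log", 3), ("help page", 2)]
groups, totals = divide_work(backlog)
print(groups)
print(totals)
```

A greedy split will not always be perfectly balanced, but for a lightweight experiment "about half" is all that is needed.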

Continue making releases – Since we still needed to make a comparison, rather than dividing into teams and duplicating effort we divided the features that were released each iteration. In this way we built about half the features released during an iteration using each technique. Working on about half the features using pair programming meant that at least some features were being built by individuals. At the time this was a risk reduction decision to make sure that if pair programming completely failed we’d still have something to ship at the end of the iteration. Explicitly managing risks is the only way to know if the lightweight experiment may cause problems for making releases. Also, we had a strictly defined cut-off for stopping the experiment if it ever stopped us from shipping to our client.

Use the data you have – In almost all cases we were able to get the data we needed to evaluate our hypothesis from our current process. When we couldn’t, we only had to make minor modifications to our data collection practices, for example adding a check box to our SharePoint server for indicating whether a task was paired or individual.

One of the more interesting things we did was to create a “tally sheet” for collecting pair programming issue detection statistics in real time, as the issues were discovered. Given the near instantaneous code-inspect-fix cycle when programming in pairs, this was the only way to collect similar data for comparing pair programming to inspection.

Example of a real time tally sheet used for tracking issues discovered while pair programming.

Statistical significance is overrated – The whole point of running a lightweight experiment is to collect just enough data to help you make a better decision or validate your gut feeling. This technique is not meant for uncovering universal truths or proving something to the rest of the world. In exchange for keeping the experiment light, the results will only apply to your team. Over the course of an iteration or two, 4-6 weeks, you’ll only get enough data to start to see trends. In our case the results were not statistically significant using individual t-tests but that didn’t matter. The most important thing is that we had data that could be used for comparison, data that everyone felt good about and that helped us gain clarity into what we did and how well it worked.
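If you do want a quick look at significance, a two-sample t statistic is easy to compute with nothing but the standard library. The sketch below uses Welch's variant, which is my choice for the example and not necessarily the test we ran, and the numbers are made up purely to show the arithmetic.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up hours per use case point; NOT the Square Root team's data.
pair_hours = [3.0, 4.0, 5.0, 4.0]
solo_hours = [5.0, 6.0, 7.0, 6.0]
t, df = welch_t(pair_hours, solo_hours)
print(f"t = {t:.2f}, df = {df:.1f}")
```

With only a handful of tasks per group the degrees of freedom stay small, which is one reason an iteration or two of data tends to show trends rather than significant results.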

Retrospectives get immediate value – The whole reason the experiment is light is to reduce cost and decrease the lag time to providing value to the team. Just to give you a little perspective, it took us 6 weeks to run the experiment and we had enough data and casual observations to make a decision during the retrospective when the analyzed data was shared. That event occurred in early August of 2009. This experience report required almost nine full months of gestation from the paper proposal to the talk I gave at the conference. The gestation period on “universal truth” research can be even longer. We, as practitioners, don’t have to wait for those universal truths to be born to get value from research. By running your own quick and dirty, lightweight experiments, you can get results in a timely fashion that you know will apply to your team because your team was the subject of the experiment. It’s all about closing the gaps between research and practice and taking the information you need now instead of waiting for academic research to catch up.

Overall Conclusions

For the Square Root team it turned out that pair programming was faster, cheaper, and produced code that had more predictable albeit slightly worse quality. The more important lesson is that we discovered a technique, lightweight experimentation, for learning other interesting things about our team and about software engineering in general.

My paper and this blog post were all about trying to describe the technique, using our experiment as an example. I think it would be awesome if teams around the world conducted lightweight experiments on a variety of topics. If enough folks share what they learn, we might start to see trends emerge across teams that could lead to universal truths, validate research, or at least discover some great rules of thumb.

What else might make for a great experiment? Anything you’ve got a question about on your team!

  • What is the clearer way to write requirements, user stories or use cases?
  • Which estimation technique is more accurate of X and Y?
  • Can we skip unit testing if we use inspection (looking at quality, knowledge sharing)?
  • Is UML a better design notation than the one we made up as a team?
  • What else…?

If you do a lightweight experiment, let me know! Share what you learn as a blog post or whitepaper. Let others know what you’ve learned! Even if the specific results only apply to your team and the way you’ve executed your project, your experiences help form a baseline, a sort of shared understanding for how software development works, how some of these practices work. And there’s so much about software engineering that we have yet to learn.

Acknowledgements

This paper was my first experience report and it was an awesome journey. Naturally a lot of folks helped me along the way and I would like to take a moment to make sure they know that I appreciate their influences and support. The Square Root team: Marco Len, Yi-Ru Liao, Abin Shahab, and especially my fellow experiment co-champion Sneader Sequeira for having the guts to go along with this idea in the first place. Some of the faculty at Carnegie Mellon: Dave Root and John Robert (my studio mentors) for bringing up the idea of writing a paper, and Jonathan Aldrich for helping review my proposal. Artem Marchenko was my XP2010 paper shepherd after the proposal was accepted, and the quality of each draft only improved because of his input. A group of my fellow employees at Net Health Systems sat through an early draft of the presentation I gave and shared valuable feedback for improving it. And finally I thank Marie, my wife, who was with me from start to finish and read more drafts and sat through more practice talks than anyone else. She’s probably as much an expert on this subject by now as I.

A Final Aside

I wrote the initial draft of this paper as my final reflection paper for my Master of Software Engineering degree (pdf). That draft has a very different tone, approach, conclusion, and direction than what I eventually published for XP2010. This is partly because there was no hard page limit, but also because I had a lot more time to think about what was really important when writing for XP2010. There’s some useful information in that draft, mostly in the lessons learned, that might be worth a look. You should check out my Square Root teammates’ reflection papers as well since they are all interesting and well written.

SWOT vs. Risk Management

I was recently asked by a coworker how software risk management is different from traditional SWOT analysis. SWOT is a technique commonly used for strategic planning where the strengths, weaknesses, opportunities, and threats facing a group are compiled and analyzed to determine an appropriate course of action. Software risk management (as defined using the continuous risk management paradigm from the Software Engineering Institute) is similar in that risk management can be used for strategic planning but risks yield much different information which is applied in a very different way.

The first step when performing a SWOT analysis is to define the business objectives. This is very similar to defining a threshold of success in software risk management. The main difference is that a business objective takes the form of the desired end state whereas the threshold of success is the minimum set of objectives necessary for the project to be successful. For example, a perfectly valid business objective might be to deliver all 100 story points by the end of the year while the threshold of success might be to deliver the core functionality (worth only about 50 story points). Would more stories completed be better? Of course, but what if you end up only completing 75 story points by the end of the year? How did you do? You missed your goal, but you still succeeded, right? It’s difficult to tell without understanding the difference between wants and needs.

The main part of a SWOT analysis consists of a group session where strengths and weaknesses internal to the group and opportunities and threats external to the group are identified. People like to put SWOTs into a 2×2 grid so it’s easier to look at. While there is some great advice out there for understanding what goes into a SWOT, the analysis is largely subjective, relying on a team’s gut feelings to know the strengths from the weaknesses, the opportunities from the threats. Software risk management can be a much more systematic approach to understanding the potential dangers that face a project based on known facts when tools such as the SEI’s Taxonomy Based Questionnaire for risks (pdf) are used. Guts still come into play, but there is enough engineering in place to help people make the right decisions.

Risks are specifically actionable – depending on the risk you might be able to mitigate it by manipulating the timeline, impact of the consequence, probability of the risk occurring, or by addressing the condition. You might transfer the risk to someone else or simply accept the risk. SWOT by itself is merely a collection of statements relative to internal or external entities which may or may not actually be true. Are you good at testing? How do you know that? Is Bing really a threat to Google Search? Should you do anything about your weaknesses? Will they prevent you from achieving your business objectives? Without further analysis there really is no way to know and other than prioritizing there really is no way to analyze a SWOT, nor is there any clear direction for next steps.
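To make "actionable" a little more concrete, one common convention is to rank risks by exposure, the probability of the risk occurring multiplied by the impact of its consequence. The sketch below assumes that convention; the risk statements, probabilities, and the 1 to 5 impact scale are invented for illustration and are not taken from the SEI material.

```python
def exposure(probability, impact):
    """Risk exposure: chance of occurring (0.0 to 1.0) times impact (1 to 5)."""
    return probability * impact


# Hypothetical risks: (statement, probability, impact)
risks = [
    ("Key developer leaves mid-project", 0.2, 5),
    ("Client changes core requirements", 0.5, 4),
    ("Build server outage", 0.3, 2),
]

# Highest exposure first, so mitigation effort goes where it matters most.
ranked = sorted(risks, key=lambda r: exposure(r[1], r[2]), reverse=True)
for statement, probability, impact in ranked:
    print(f"{exposure(probability, impact):.1f}  {statement}")
```

Lowering either number through mitigation lowers the exposure, which gives a risk list a clear next step that a SWOT grid lacks.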

Look, when planning a project you really need both SWOT analysis and risk management. SWOT is a tool for assessing capabilities while risk management is a tool for assessing the likelihood of success. Each technique serves a very different purpose. SWOT is most useful at the beginning of a project to help you figure out what you’re doing and come up with an overall strategy. Risk management, though, is an ongoing activity that makes sure you don’t fall flat on your face in trying to achieve your business objectives.

Threshold of Success

When I was a kid my brother and I used to play a game called Make Believe. My favorite variant of the game was simple. Together we would build some kind of fortress and then one person got the fort and the other person tried to invade it. In theory, the game ended when the fort had been overtaken by the invader. What made the game fun was that as the invasion began, the rules of the game always changed. The first thing to go was any notion of death. If one of us was “killed” in battle then near instantaneous respawning was created. Shortly after that we skipped respawning and simply became invincible. Soon the fort became invisible, which meant the invader just had to run around trying to find it. Sometimes someone gained super strength or the ability to force other people to move in slow motion. We almost always created super weapons (such as a hand held Death Star) which for some reason could always be defended against. Nearly every game ended in tragedy, someone crying or upset: “That’s not fair! You can’t do that! I’m invisible! You can’t do that!”

Kids’ stuff, right?

A lot of software projects with teams made up of working adults still play this game. The scenario goes something like this. A team is put together to build some software. Neither the clients nor the team talk about the objectives of the project other than building “some software”. After a few months, something goes wrong or someone doesn’t like what’s happening so someone changes the rules. Before too long, one side or the other is upset that they can’t win, somebody throws a fit, and goes home. Instead of summoning invisible armor, software projects change the rules by cutting features, adding more requirements, moving due dates, wasting resources, and things like that.

We make believe that we’re software engineers.

While Make Believe was a fun game as a kid, changing the rules when there’s real money on the line isn’t as fun. My brother and I ran into problems as kids because we got the objectives of the game wrong. Actually, there were no common objectives, which is why we could change the rules so easily. The same thing happens on a software project when the objectives aren’t well known.

Defining and committing to a clear picture of success establishes the common ground rules for a project by making the basic project goals explicit. The technique is known as Threshold of Success.

Defining What Success Looks Like

The Threshold of Success for a project is the minimum set of conditions that must be met for the project to be considered successful. If the team fails to meet even one of the conditions then the project is a failure. A good Threshold of Success is made up of about 3-4 SMART goals (no more than a few bullets on a single PowerPoint slide). SMART is a mnemonic which stands for Short/Specific, Measurable, Achievable, Relevant, and Time bound.

Some other pointers for defining a Threshold of Success:

  • The Threshold of Success should be built as a team. Since this is the measure by which you will define success or failure, everyone on the team must buy into it. If you can include your client that’s even better.
  • Threshold of Success goals should be challenging, but it’s important that they are achievable. If the goals are too easy, victory will be meaningless; if they are too difficult, victory will be elusive.
  • Once the Threshold is established, don’t change it! The only reason to modify the Threshold of Success is if the project has changed so drastically that the Threshold no longer makes sense (for example if someone leaves the project).
  • Revisit the Threshold of Success regularly (a good time is when planning iterations) so everyone remembers what success looks like. Put it on your team wiki so that it’s readily accessible.
  • Be sure that the goals in your Threshold are SMART! The point of defining a Threshold of Success is to take away the wiggle room for defining what it means to succeed or fail. The goals you define should make this black and white. The more specific the goal is the better.

Building a Threshold of Success

The easiest way to create a Threshold of Success is to first create a minimum picture of failure, then convert failure into success. Here’s an example:

Failure for my current project might look something like this.

  • Essential features are not ready by the end of the second quarter.
  • Team members are dissatisfied or bored with their jobs.
  • Newly hired team members don’t feel like they’re part of the team by March 31.
  • There isn’t enough money to continue development after this fiscal year and we have to fire people.

Now that I know what failure looks like, seeing success is easy. I don’t want any of these things to happen. The threshold of success for my current project might look something like this.

  • By the end of the second quarter, all “Must Have” features are implemented and pass acceptance tests with no known critical defects.
  • All team members give an average score of 5 or better on a job satisfaction survey taken quarterly.
  • By March 31, the team has successfully executed at least three team building activities with all team members present.
  • Funds of at least $1 million are secured by December 31 to allow for future development without a reduction in team size.

Notice that only 1/4 of the success goals in this example are related to software functionality. While goals might come from anywhere, teams traditionally focus on goals related to people and relationships, process, resources (such as budget or schedule), and product (software functionality and quality).
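Because a Threshold of Success is all-or-nothing, it can be written down as a list of checks that must all pass. The sketch below encodes the example threshold above; the measurement names and values are hypothetical, and the deadlines are left out to keep it short.

```python
# Hypothetical end-of-project measurements; not real project data.
status = {
    "must_have_features_done": True,
    "critical_defects": 0,
    "avg_satisfaction": 5.4,
    "team_building_events": 3,
    "funds_secured": 1_200_000,
}

# One check per goal in the threshold.
threshold = {
    "Must Have features pass acceptance tests":
        lambda s: s["must_have_features_done"] and s["critical_defects"] == 0,
    "Average job satisfaction is 5 or better":
        lambda s: s["avg_satisfaction"] >= 5,
    "At least three team building activities":
        lambda s: s["team_building_events"] >= 3,
    "At least $1 million secured":
        lambda s: s["funds_secured"] >= 1_000_000,
}

unmet = [goal for goal, check in threshold.items() if not check(status)]
success = not unmet  # missing even one condition means the project failed
print("success" if success else f"failure: {unmet}")
```

If a goal can't be turned into a check like this, it probably isn't measurable enough to be SMART.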

As this technique originated with the Software Engineering Institute (pdf), nearly every studio team in the Carnegie Mellon Master of Software Engineering program creates a Threshold of Success for their projects. The MSE Studio Archive has extensive examples of both good and bad pictures of success that teams have created. The Square Root Team’s threshold (my team) is a good place to start, but there are plenty of other examples.

There might be many goals for a project. In the Team Software Process you actually identify at least three different kinds! But there is only one threshold of success for a project. Knowing what success looks like gives you a better chance of actually achieving it. Without it, you’re just pretending that you know what’s going on.

Process Affordances: Ignore at Your Own Peril

The Amsterdam airport was able to reduce the amount of urine “spillage” that hit the men’s room floor by 80% simply by etching a life-like image of a fly near the urinals’ drains. The fly was specifically engineered into the urinals to alter gentlemen’s behavior without their having to think about it. The concept is called nudging and it’s been used in domains other than restroom sanitation to encourage desired behavior. Other examples include the use of uncomfortable chairs in fast food restaurants to encourage people not to linger and real-time gas mileage displays in cars to encourage more economical driving. If you’ve read Donald Norman’s The Design of Everyday Things then you’ll know this as an affordance – a hint given to the user prompting them to take a specific action at a specific time.

Obviously the idea of affordances is directly applicable to devices as well as software usability but it wasn’t until I read about the urinal flies that I realized affordances don’t always have to have a physical representation. For example, a well designed software process should gently nudge a team to do the right thing. Since there is no one-size-fits-all process that works for all teams it is essential that the process complements the team and that the process’s affordances nudge team members to do what’s best for the project and the team.

Using a process that lacks the right affordances could have one of two possible outcomes. In the best case, the team abandons the process because they realize subconsciously that it is telling them to do the wrong things at the wrong times. This is bad because it sacrifices repeatability; you’ve regressed back to an ad hoc, “make it up as we go” state. In the worst case, the team sticks with the process and it leads them astray. This introduces risks into the project and could lead to complete project failure.

Software is already difficult enough to build successfully and processes are supposed to make software development easier. Unfortunately, knowing when something isn’t working is not an exact science, but with a dash of experience and a little team reflection (for example from regular postmortems) it is possible to figure out when you are working for your process instead of your process working for you. To demonstrate this I am going to tell you a story.

Our Process

My studio team in the Carnegie Mellon software engineering program is charged with building a web-based requirements elicitation tool that helps users follow the SQUARE process out of the SEI. About halfway through the Elaboration Phase of the project (sometime in the spring semester) the project was going downhill. The warning signs were fairly apparent: we were missing milestones, tasking priorities were confusing, and a lot of work was stalling out at different levels of partial completion. Though we knew there was something wrong, we weren’t really sure what was causing it or what we were doing wrong in our planning and tracking process.

The planning process we were using was fairly simple. At the beginning of the phase we looked at all the activities and artifacts that needed to be completed by the end of that phase. For each identified milestone we enumerated specific entry criteria, general tasking, validation procedures, and exit criteria. This is a technique known as ETVX (entry, tasking, validation, and exit). Next we used planning poker to estimate how long we thought each milestone would take to complete. Finally, with this information we created a phase timeline which included known due dates and dependencies between milestones.

Since we’re using an iterative approach to complete work in a phase, iterations follow largely the same planning process on a smaller scale. As a team we identify the milestones on which we will work during the iteration. Each milestone is assigned an owner whose job it is to ensure the milestone is completed by either delegating tasks or working on it themselves. The planning poker estimate is used to determine the approximate workload allocation on the team. This estimate is validated with bottom-up estimates that team members create based on their individual tasking.

There are several good things about this process. First, it’s written down and the team follows it. This is good because it means we can produce repeatable results over time. Second, this process makes use of several practices that are generally considered “good” by software experts. ETVX is a great way to clearly identify project milestones. Planning poker is similar to the Wideband Delphi estimation technique. Third, we’re using two forms of estimation to validate the plan as more information becomes known. Finally, the engineers responsible for the work determine the specific tasking and create the bottom-up estimates.

You’re Good, but not That Good

In spite of all the good things we were doing, something still wasn’t connecting. The big aha! moment occurred about two weeks into the second iteration. Up to that point I had been working on my tasks that had carried over from the first iteration. The team leader noticed that almost no work had been started on the milestones I owned. [An aside: this, to me, says that at least our tracking process works somewhat well.] During the discussion that followed I became extremely defensive when the team leader asked me to shift priorities for the rest of the iteration. What should have been a simple request turned into a heated debate over tasking. I felt compelled to complete the past due work and here was this jerk trying to stop me. “Sure,” I thought, “I’ll do what you ask, buddy, but when this whole project comes crashing down it’s on your head, not mine.”

Later, as I looked back at the incident, I wondered to myself, “Why was I so defensive in light of such a simple request?” The reality was that the project wouldn’t come crashing down if I shifted priorities and I knew that. So why defend these older tasks when it was obvious that there were more immediate needs?

It turns out that the affordances built into the planning process were encouraging my behavior. There were a few simple things at play that, when combined, decreased our ability to plan effectively.

First, our process encouraged us to plan more work than time allowed. This was due to a missing connection between day-to-day progress and the “big picture,” the overall plan. Second, though the new team leader may have believed there was consensus, the team in fact did not wholly agree with the priorities for iterations. This disagreement was not specifically discouraged by our planning process and so was allowed to persist. Third, leftover work was not addressed during planning. Some tasks might simply expire while others may change priority, becoming more or less important with a new iteration. Since this wasn’t addressed it created a sense of urgency for individuals carrying over work from iteration to iteration. Finally, assigning milestone owners had unanticipated side effects. The goal was to ensure that someone was taking responsibility for coordinating and monitoring milestone work. This worked so effectively that milestone owners exhausted themselves attempting to finish milestones and resisted changes to the plan that prevented them from finishing what was promised.

When it came time to make a necessary modification to the plan, our process encouraged us to fight against the best course of action for the team. We didn’t have the level of flexibility needed due to our process’s affordances nudging us to do the wrong things. Milestones were slipping and people wanted to finish what they started. Project priorities were shifting as the project matured but team members were wearing blinders, ignoring the changing facts around us. To stand a chance at success we had to change the affordances in the planning process. We had to nudge the team in a new direction.

Our Solution

To try to solve this problem we decided to incorporate some of the planning principles from Scrum, specifically the product backlog, sprint backlog, and sprint planning meeting, into our planning process. Scrum takes a more task-oriented approach when planning iterations and correlates the sprint backlog with the product backlog. This better encourages the team to not plan more work than there is time to complete while connecting day-to-day work with the overall plan. Scrum also requires that the team reprioritize work when planning iterations and that we agree on the resulting priorities. This will hopefully eliminate the prioritization conflicts we experienced during iterations. With Scrum, leftover work from iterations is saved in the product backlog. This change decreases the anxiety team members feel when work is left undone (because the work is not forgotten) while simultaneously giving the team more flexibility to change direction as the project progresses. Finally, the team, rather than individuals, takes ownership over the milestones held in the product backlog. With each commitment made during iteration planning, the whole team buys in, effectively shifting the passion and dedication individuals held for owned milestones to the commitments we agreed on as a team.

I’m not really sure how Scrum is going to turn out for us. I think the most important thing is that we recognized that something was not working and took action to correct it. I personally would rather see the team fail in a new and spectacular way than repeat the same mistakes again and again.
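Two of these nudges are easy to sketch: only commit to what fits the team's capacity, and send leftover work back to the product backlog to be reprioritized. The code below is a simplified illustration and not a description of Scrum itself; the task names, priorities, estimates, and the 40 hour capacity are all hypothetical.

```python
def plan_sprint(product_backlog, capacity_hours):
    """Pull items in priority order until the next one no longer fits.

    Items are (priority, task, estimate_hours); a lower priority number
    means more important. Returns (sprint_backlog, remaining_backlog).
    """
    ordered = sorted(product_backlog)
    sprint, used = [], 0
    for i, item in enumerate(ordered):
        hours = item[2]
        if used + hours > capacity_hours:
            return sprint, ordered[i:]
        sprint.append(item)
        used += hours
    return sprint, []


backlog = [
    (2, "architecture doc", 20),
    (1, "risk review", 6),
    (3, "UI prototype", 16),
    (4, "test plan", 10),
]
sprint, backlog = plan_sprint(backlog, capacity_hours=40)
print("committed:", [task for _, task, _ in sprint])

# At the end of the sprint, unfinished work goes back into the product
# backlog with a fresh priority and estimate instead of silently carrying over.
leftover = [(5, "architecture doc", 8)]
backlog = sorted(backlog + leftover)
print("product backlog:", [task for _, task, _ in backlog])
```

The point of the second step is that leftover work competes with everything else at the next planning meeting, so nobody feels obliged to defend it.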

Add This to Your Silver Toolbox

Unfortunately, I don’t think there is a trick for detecting these sorts of process failures. Data and metrics can help but only if the process is repeatable and the team has the knowledge and discipline to collect the data in the first place. Team postmortems can help but if individuals are afraid to raise concerns, you’ll find yourself on a trip to Abilene before you realize it. In many cases, if you think something isn’t going well, others are probably thinking the same thing. Once I spoke up I found out that others also thought something wasn’t working. I was just the first person who was able to articulate it.

Affordances are powerful but subtle mechanisms. In well designed things, we aren’t supposed to be consciously aware of them. But that doesn’t mean they always nudge us to do the right thing.