Posts tagged ‘software engineering’

Improvisational Architecture

If you were to go to two improvisational jazz performances, even two concerts by the same band, you’d hear different music at each show. The thrill of experiencing something unique, as it is created, without any kind of real plan or rehearsal, is both exciting and entertaining. Sometimes the music is so spectacular you feel like you’re witnessing a miracle. Other times the group can’t seem to get it together and the show is one long, painful indulgence in artistic expression. No matter the outcome, what you hear during that performance will never be experienced again.

Music has always served as a great metaphor for thinking about software development. At XP2010, during one of the most original keynotes I’ve ever seen, Bjørn Alterhaug and John Pål Inderberg, professors of musicology and improvisation respectively at the Norwegian University of Science and Technology in Trondheim, Norway, and both veteran jazz musicians, discussed through music and words how they approach improvisational jazz, the mechanics for making it work, and the general implications for collaboration among artists. It’s tempting to romanticize improvisational jazz as a truly spontaneous creation, but this isn’t exactly how it works in reality. For any improvisational jazz band to be successful, its members must exhibit seven key characteristics.

  1. Provocative Competence Interrupting Habit Patterns
  2. Embracing Errors as a Source of Learning
  3. Minimum Structures that allow Maximum Flexibility
  4. Distributed Task: Continual Negotiation toward Dynamic Synchronisation
  5. Reliance on Retrospective Sense Making as Form
  6. Hanging out: Membership in Communities of Practice
  7. Alternating between Soloing and Supporting

I propose that these seven characteristics apply directly to how teams who espouse emergent design should approach software architecture.

XP2010 Keynote: Bjørn Alterhaug and John Pål Inderberg improvising jazz music on stage.

Music and Design

Experienced musicians have an attuned ear which enables them to hear music in ways that novices can’t. Drawing on a well-developed playbook of clichés, an expert on the standup bass, for example, can not only follow the minimal, well-defined structure of a song but also weave pleasing bass licks into the musical ramblings of his band mates, responding appropriately to the paths they choose as the song develops. Part of it is musical maturity, a taste or style developed over years of playing. Part of it is trust, both in himself as a musician and in the rest of the group: that they too will not overreact to mistakes and are able to follow a musical flow as it develops.

Well-defined but minimalist structure. Contrast this with the music played by a symphony which is also well-defined but has explicit and detailed structure.

When I hear this I immediately think of architecture as seen through the lens of agile software processes which advocate strongly for emergent design. Agile software development processes and agile culture in general strongly encourage teams to become improvisational architects. Create a minimalist, well-defined design of the system and then allow the developers to evolve the system, creating the design as they develop, playing off the code written by their fellow teammates as the system emerges. The romantic view of this is appealing to some: “Let’s go have a code jammin’ session, you dig?” But the results of an emergent design can be just as varied as improvisational jazz bands; some designs are elegant and spectacular successes, some designs are flaming piles of disaster, most turn out somewhere in between – usually good enough for customers who aren’t paying too much and who expected a somewhat unpredictable but functionally adequate outcome.

So if you enjoy the thrill of experiencing once-in-a-lifetime moments of creation, the outcome of which can never be predicted or known ahead of time, then emergent design is probably right up your alley. As demonstrated by the near exclusive emphasis on code and shipping, many agile developers fancy themselves as code jammin’, improvisational architects.

The exact opposite of emergent design, of course, is the easy-to-hate (and rightly so) enemy of creating things that actually work, Big Design Up Front, where a heavy cost is levied against a project in the beginning without producing any usable code whatsoever. Sticking with the music metaphor, the well-defined, explicit, and detailed structures of a symphony might take a composer months or even years to create. But in exchange for this planning you get a piece of music that is guaranteed to be executed (for all intents and purposes) in a nearly identical way in every performance in which it is played. I am not saying you should use a Big Design Up Front approach. In reality, even Mozart tried out music and made incremental deliveries to his sponsors as he wrote.

While certainly a degree of improvisation happens when we develop software, customers paying for software usually expect more predictability in the outcome. So, while it might be ok to spend 30 bucks for a fun first date with a girl at a jazz club – if only for the experience of it – when I pay 100 bucks a ticket to see the symphony play Mozart’s Requiem, I would be more than a little disappointed, angry, and confused if the first chair violinist “felt a groove” in the middle of the second movement and began jamming. The greater the cost of a project, the more highly valued predictability in it will be.

Who can be an Improvisational Architect?

Mozart was a true genius, a master of composition. To play Mozart, you need to be well practiced but you don’t need to have his level of genius. On the other hand, most members of a great improvisational jazz group must be experienced musicians – experienced in composition as well as technical playing ability. Even then, a great group will usually avoid disaster but occasionally it will still happen because of the nature of improvisation.

In my experience, teams that plan to allow their designs to emerge do an ok job of providing the minimal structure that is essential for evolving a design. But only a few developers have the necessary experience and background to truly improvise a design. The implication? Most teams are terrible at improvising architecture because they have neither a solid enough understanding of the core fundamentals for software architecture nor the experience with using architectural design to reason about a system.

I believe that most of the seven characteristics of improvisation are deeply ingrained in agile culture, but agile teams often fail to fully embrace all seven when designing a software system. A team that lacks experience and doesn’t understand design clichés (architectural styles and patterns) will have an unpredictable outcome when allowing the design of a system to emerge as the system is developed. This is something the agile community must work to resolve.

Bonus

Cory Foy had the foresight to record video of the performance/keynote. It comes in 6 parts.

Ingvald Skaug was also inspired by this talk and wrote an interesting essay which explains kanban as agile jazz. Check it out for a process perspective on the idea of “minimal structure for maximum flexibility.” If you believe Conway’s Law, then it isn’t coincidence that we would apply the characteristics of improvisation in jazz to both process and design.

Hacking is not a Dirty Word

I used to think hacking was bad. It was something you did when you didn’t have a plan, when you didn’t know what you were doing; it’s what amateurs do, noodling around the code without clear direction or intent. Or it was something you did, quick and dirty, just to get something out the door. Hacking, I thought, was a surefire way to waste time and create mountains of unmaintainable code you’d have to rewrite anyway. I’ve always struggled with this since I feel so drawn to hacking culture – DIY, playing with things just to see how they work, ignoring the instruction manual, reveling in the aesthetic of a job well done, making what you can within limited constraints – and many of the brightest minds in software development speak quite passionately about the merits of hacking.

Thinking about this for a while, I’ve come to the conclusion that there’s nothing wrong with hacking. Quite the contrary, hacking could be one of the most positive things a team can do. In fact, many of the agile processes and techniques that have worked for me in the past seem to be designed to cultivate teams of hackers.

It seems I had fallen for some of the myths about hackers.

Myth: Hackers don’t have a plan

Hackers do have a plan but they acknowledge that the plan will change and then plan with this in mind. This means creating plans only to the level of detail to which you are currently comfortable making plans. For example we all know that making a detailed plan at the start of a project, down to the code level, for all but the tiniest programs, is silly since the plan is mere conjecture of a positive outcome. Rather than writing a fantasy on paper, hackers lay plans based on the current knowledge. These plans then expose areas for further exploration (hacking!), the resultant knowledge of which can be turned into more detailed plans.

In XP these detailed plans are created just in time during the iteration planning game, sometimes even later than that. That’s not to say that there wasn’t a plan before the iteration began. Not so. A great degree of planning, communication, and coordination was required to be able to plan so adeptly, so quickly.

Myth: Hackers don’t know what they’re doing

In the literal sense, a lot of times hackers don’t know what they are doing. That is why they are hacking. The purpose is to figure out how something works or the best approach for something. And that’s the key. At a meta-level, hackers know exactly what they are doing – they are searching for information, for knowledge, for a means to achieve an end.

Description of the cyclic nature of action research

Action research is a relatively new-to-software-engineering research method that seemed to take agile researchers by storm at XP2010. The basic premise is that a researcher works closely with a consenting organization to observe practices in software engineering. Rather than acting as a passive observer, as is traditionally done, the researcher injects practices directly into the team and observes before and after reactions. For example, in recent work by Tor Erlend Fægri, having observed issues with team collaboration he suggested several techniques to help solve the problem. Some suggestions worked, some didn’t. Fægri was able to take the time to understand what happened and why because he was dedicated to understanding almost as a participant, watching the events unfold. This take on research makes me think of Jane Goodall’s work with chimpanzees in which she directly interacted with wild primates in order to better observe their behavior.

Jane Goodall with a chimpanzee

When I first heard about action research it sounded a lot to me like hacking – trying things out to understand how something works.

Myth: Hackers don’t create value

Yes, the code that comes out of a hard core hacking session might be a big pile of spaghetti, but consider the purpose of this work in context. Another perspective is that hacking may incur technical debt which sooner or later will need to be paid off. But it’s important to keep in mind that not all technical debt is bad. When you are hacking, you are either in learning mode or in finish mode. Learning mode is never wasted effort. Finish mode, where you are just trying to get something finished so it can be used, can leave scars on your code, but the immediate value provided to customers can be enormous. When you have a desperate and immediate need, anything that works today is better than something perfect that works next week.

The same goes for architecture. Of course, many of the architecture researchers use words and phrases like “experiments” or “trade-off analysis” to describe this same act of quickly creating something for the sake of knowledge. That’s really just hacking too.

Sometimes the artifacts you produce are throwaway. Sometimes the working solution is good enough and you’ll never have to worry about it again. Sometimes you’ll have to pay off the technical debt. But hacking for the right reasons is never wasted effort.

Agile Development Helps Cultivate Good Hackers

Agile software development, be it controlled through time boxes or flow, is all about ensuring that work doesn’t teeter out of control a day at a time. The focus on providing customers with value creates an environment that encourages action and creation. The agile manifesto gently nudges hacking behaviors. Frequent releases of working software, welcoming changing requirements, focus on simplicity and continuous attention to technical excellence, helping the customer understand what they need and then doing your best to make it for them. Even the idea of continuous process improvement is about hacking your team to make yourselves better. Hacking is the very lifeblood of agile software development.

Hacking isn’t a dirty word. Hacking could be the most important thing you do all day.

Lightweight Experiments for Process Improvement

[This post is a recap of the second talk I gave at XP2010. This was the big one, the experience report talk, one of 15 experience reports published at XP2010. You can download the slides (pdf) or the full paper (pdf) from this website or from XP2010.org.]

Process improvement is important for nearly all teams but it can sometimes be difficult for a team to know what is working, what isn’t working, and what techniques or methods to try when attempting to improve. Performing a scientific experiment is one way to help overcome these problems but, as academic research has shown us, while experimentation can yield interesting results, running an experiment is time consuming, expensive, and requires some serious thinking and control to pull off. From a practitioner’s standpoint this means that experimentation is a non-starter.

Of course, that’s only if you run experiments like an academic.

Banner from the XP2010 conference in front of the hotel.

Back Story

Just over a year ago, my MSE studio team at Carnegie Mellon had a problem. We had decided we would use Extreme Programming for the construction phase of our project but some team members had doubts concerning pair programming. We had decided that we would use some kind of peer review, having already seen the many benefits of inspection when reviewing other artifacts. The dispute arose over whether pair programming would give similar enough results. Also, not all team members had experience with pair programming but everyone on the team knew and enjoyed solo programming.

The number one concern was whether pair programming would allow us to meet our very strict deadline. We had just over three months to complete the construction phase of the project. According to our threshold of success this meant implementing all “must have” requirements with a minimum level of quality. Did we really have time to waste by having two people working on the same code at the same time? Wouldn’t working independently and inspecting code on an as needed basis allow us to get more work done faster?

At the time it just so happened that I was taking a reading class with Mary Shaw and in that class we discussed some research findings that might help settle this debate. Research from Laurie Williams, Ward Cunningham, Barry Boehm, and many others showed that pair programming requires more effort (but never double the effort) but is faster than programming alone (pdf). Also pair programming creates code of about the same quality as coding alone with inspection (pdf). Of course, the research may not apply to us since Square Root is closer to a professional team working on a large project with a real client, not undergrads working on short term toy projects.

After an iteration where some teammates used pair programming and others refused, we decided to try an experiment to see which practice actually worked better. The original idea was that we might be able to validate some of the research, but we decided instead that it was more important just to resolve our own internal conflicts and figure out which processes worked better.

Conducting a Lightweight Experiment

With the scientific method as our guide we planned and executed a lightweight experiment which pitted programming alone against pair programming. The results were amazing (and you can find the raw data in our project archive). In conducting the experiment we used a set of novel techniques which I think can be useful in conducting other lightweight experiments. There’s more background in the experience report so I’m only putting the meaty stuff in this post.

Focus narrowly on a single question – The essential key to keeping an experiment light is to only tackle one thing at a time. In this case we focused on comparing and contrasting a single technique, pair programming, rather than multiple techniques or an entire process (such as XP vs. TSP).

Divide work, not teams – If I were comparing pair programming to programming alone in an academic setting, I would put together two teams of about the same experience and have them each build their own version of the same software, one team using pair programming, the other programming alone. In a business setting this is a complete waste and few companies can afford to have two teams duplicating effort. By dividing work instead of teams you may lose some control over variables in the experiment but in most cases isolating more variables doesn’t add any further clarity to helping answer the narrowly focused question. To divide work successfully you need to have some way of estimating work units for division. We used use case points as shown in the figure depicting our modified planning game.

Steps in modified planning game for dividing work into experiment groups
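
To make the work-division step concrete, here is a minimal sketch in Python. It is not the procedure from our planning game, just one simple way to do it: a greedy split that hands each feature, largest first, to whichever group currently has fewer estimated points. The Feature type, the divide_work function, the group names, and the backlog values are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    name: str
    points: int  # estimated size, e.g. in use case points


def divide_work(features):
    """Greedily split features into two groups with similar total points."""
    groups = {"pair": [], "solo": []}
    totals = {"pair": 0, "solo": 0}
    # Place the largest features first; each goes to the lighter group.
    for feature in sorted(features, key=lambda f: f.points, reverse=True):
        target = min(totals, key=totals.get)
        groups[target].append(feature)
        totals[target] += feature.points
    return groups, totals


# Hypothetical iteration backlog; names and estimates are made up.
backlog = [
    Feature("login", 8),
    Feature("search", 5),
    Feature("export", 5),
    Feature("audit log", 3),
    Feature("settings", 2),
]
groups, totals = divide_work(backlog)
```

A greedy split will not always find the most even division, but for a lightweight experiment roughly equal groups are probably good enough, and the team can always swap a feature or two by hand.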

Continue making releases – Since we still needed to make a comparison, rather than dividing into teams and duplicating effort we divided the features that were released each iteration. In this way we built about half the features released during an iteration using each technique. Working on about half the features using pair programming meant that at least some features were being built by individuals. At the time this was a risk reduction decision to make sure that if pair programming completely failed we’d still have something to ship at the end of the iteration. Explicitly managing risks is the only way to know if the lightweight experiment may cause problems for making releases. Also, we had a strictly defined cut-off for stopping the experiment if it ever stopped us from shipping to our client.

Use the data you have – In almost all cases we were able to get the data we needed to evaluate our hypothesis from our current process. When we couldn’t, we only had to make minor modifications to our data collection practices, for example adding a check box to our SharePoint server for indicating whether a task was paired or individual.

One of the more interesting things we did was to create a “tally sheet” for collecting pair programming issue detection statistics in real time, as the issues were discovered. Given the near instantaneous code-inspect-fix cycle when programming in pairs, this was the only way to collect similar data for comparing pair programming to inspection.

Example of a real time tally sheet used for tracking issues discovered while pair programming.
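
If paper isn’t handy, the same idea can be sketched in a few lines of Python. This is only an illustration: the TallySheet class and the issue categories are assumptions, not the columns from our actual sheet.

```python
from collections import Counter

# Hypothetical issue categories; the real tally sheet's columns may differ.
CATEGORIES = ("logic", "style", "design", "requirements")


class TallySheet:
    """Counts issues by category as a pair discovers them."""

    def __init__(self):
        self.counts = Counter()

    def mark(self, category):
        # Reject unknown categories so typos don't create stray columns.
        if category not in CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.counts[category] += 1

    def total(self):
        return sum(self.counts.values())


# Example session: the pair marks each issue the moment it is found.
sheet = TallySheet()
for found in ["logic", "style", "logic", "design"]:
    sheet.mark(found)
```

The point is that marking an issue takes a second, so it doesn’t interrupt the pair’s flow.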

Statistical significance is overrated – The whole point of running a lightweight experiment is to collect just enough data to help you make a better decision or validate your gut feeling. This technique is not meant for uncovering universal truths or proving something to the rest of the world. In exchange for keeping the experiment light, the results will only apply to your team. Over the course of an iteration or two, 4-6 weeks, you’ll only get enough data to start to see trends. In our case the results were not statistically significant using individual t-tests but that didn’t matter. The most important thing is that we had data that could be used for comparison, data that everyone felt good about and that helped us gain clarity into what we did and how well it worked.
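
For anyone curious what such a comparison looks like, here is a minimal sketch of a two-sample t statistic using only the Python standard library. The choice of Welch’s variant (which does not assume equal variances) is an assumption made for this sketch, the welch_t function is hypothetical, and the numbers are made up for illustration; they are not our data.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Return Welch's t statistic and approximate degrees of freedom."""
    # Squared standard error contributed by each sample.
    va, vb = variance(a) / len(a), variance(b) / len(b)
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Made-up hours per use case point, for illustration only.
pair_hours = [2.0, 3.0, 4.0]
solo_hours = [3.0, 5.0, 7.0]
t, df = welch_t(pair_hours, solo_hours)
```

With samples this small the statistic will rarely cross a conventional significance threshold, which is exactly the situation described above: the trend in the means is what the team actually looks at.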

Retrospectives get immediate value – The whole reason the experiment is light is to reduce cost and decrease the lag time to providing value to the team. Just to give you a little perspective, it took us 6 weeks to run the experiment, and we had enough data and casual observations to make a decision during the retrospective when the analyzed data was shared. That event occurred in early August of 2009. This experience report required almost nine full months of gestation from the paper proposal to the talk I gave at the conference. The gestation period on “universal truth” research can be even longer. We, as practitioners, don’t have to wait for those universal truths to be born to get value from research. By running your own quick and dirty, lightweight experiments, you can get results in a timely fashion that you know will apply to your team because your team was the subject of the experiment. It’s all about closing the gaps between research and practice and taking the information you need now instead of waiting for academic research to catch up.

Overall Conclusions

For the Square Root team it turned out that pair programming was faster, cheaper, and produced code that had more predictable albeit slightly worse quality. The more important lesson is that we discovered a technique, lightweight experimentation, for learning other interesting things about our team and about software engineering in general.

My paper and this blog post were all about trying to describe the technique, using our experiment as an example. I think it would be awesome if teams around the world conducted lightweight experiments on a variety of topics. If enough folks share what they learn, we might start to see trends emerge across teams that could lead to universal truths, validate research, or at least discover some great rules of thumb.

What else might make for a great experiment? Anything you’ve got a question about on your team!

  • What is the clearer way to write requirements, user stories or use cases?
  • Which estimation technique is more accurate of X and Y?
  • Can we skip unit testing if we use inspection (looking at quality, knowledge sharing)?
  • Is UML a better design notation than the one we made up as a team?
  • What else…?

If you do a lightweight experiment, let me know! Share what you learn as a blog post or whitepaper. Let others know what you’ve learned! Even if the specific results only apply to your team and the way you’ve executed your project, your experiences help form a baseline, a sort of shared understanding for how software development works, how some of these practices work. And there’s so much about software engineering that we have yet to learn.

Acknowledgements

This paper was my first experience report and it was an awesome journey. Naturally a lot of folks helped me along the way and I would like to take a moment to make sure they know that I appreciate their influences and support. The Square Root team: Marco Len, Yi-Ru Liao, Abin Shahab, and especially my fellow experiment co-champion Sneader Sequeira for having the guts to go along with this idea in the first place. Some of the faculty at Carnegie Mellon: Dave Root and John Robert (my studio mentors) for bringing up the idea of writing a paper, and Jonathan Aldrich for helping review my proposal. Artem Marchenko was my XP2010 paper shepherd after the proposal was accepted, and the quality of each draft only improved because of his inputs. A group of my fellow employees at Net Health Systems sat through an early draft of the presentation I gave and shared valuable feedback for improving it. And finally I thank Marie, my wife, who was with me from start to finish and read more drafts and sat through more practice talks than anyone else. She’s probably as much an expert on this subject by now as I.

A Final Aside

I wrote the initial draft of this paper as my final reflection paper for my Master of Software Engineering degree (pdf). That draft has a very different tone, approach, conclusion, and direction than what I eventually published for XP2010. This is partly because the draft had no hard page limit, but also because I had a lot more time to think about what was really important when writing for XP2010. The draft contains some additional material, mostly in the lessons learned, that may be worth a read. You should check out my Square Root teammates’ reflection papers as well since they are all interesting and well written.

Lessons from a Software Engineering Dojo

[Here's a recap of the first talk I gave at XP2010. Since it was a Lightning Talk, rather than posting slides I've summarized my talk and added references directly in this post. I welcome comments and discussion.]

Craftsmanship is an interesting model for thinking about how to teach someone to become a great software engineer. Industry hasn’t always done the best job taking advantage of this metaphor for enabling training and instruction. Sure, there are agile coaches and conferences like XP2010 where peers can collaborate, but rarely does an organization, a business, deliberately encourage and enable engineering growth for the software engineers they hire.

As we learn how to build software we go through the three stages of craftsmanship. For most of us, we are apprentices in university, taking courses and learning the basics of computer science and software development by imitating our professors and the books we read. We are journeymen the first few years on the job as we start our careers, applying the lessons we learned in school in practical settings and trading tips with fellow journeymen. Eventually some of us pass some kind of test under the tutelage of a master and are ourselves declared as such. The frustrating part is that so few people find masters to help when attempting to cross the threshold from journeyman to master. How do you know when you’ve made it? Where are these great masters, these mentors for helping to learn how to be a great software engineer?

3 stages of craftsmanship - apprentice, journeyman, master

Wouldn’t it be great if there were a place, a dojo if you will, that we could go to practice with other journeymen under the guidance of masters, interacting with apprentices just starting out on their craftsman journey?  As it turns out there is such a place.

The Master of Software Engineering program at Carnegie Mellon has been teaching professional software engineers how to build software better for just over 20 years now. The faculty and staff at the MSE have honed some practices that can be directly applied in industry. Normally I wouldn’t advocate transitioning academic education practices to an industrial environment but the MSE is a near perfect hybrid of industry and academia. The studio project, the capstone project which forms about 50% of the curriculum is a long duration (16 months), real project with business clients who expect software that will provide real business value. Commitment varies and during the summer semester, student teams are working on the studio project as a full time job, dedicating over 40 hours a week to the project. In addition, unlike most academic programs, all students are experienced engineers with at least 2 years of industry experience.

A dojo is a place for training, a place where a variety of students with different backgrounds come together to practice and become better at their craft. So while the MSE makes for an excellent dojo, it’s not easy for everyone to move to Pittsburgh for 16 months of intense study. So, how can the success of the MSE be applied within industry? I think that there are six key practices where the MSE excels that industry should take note of for training and professional development. These are practices which can be applied in nearly any business setting with effective results.

Education – In school we can take classes. On the job we can read books, start discussion groups, read blogs, and go to conferences. Education becomes a catalyst for growth.

Mentoring – Mentors are guides who encourage growth by asking probing questions and pushing those being mentored out of their comfort zone. Mentors are there to dust you off when you fail, and they never directly solve problems for those being mentored (a favorite technique of the MSE mentors is to answer questions with questions). In the MSE program, every student meets with a mentor once a week for 30 minutes to discuss how the project is going and thoughts on software engineering. This is a significant commitment for industry, so holding a mentor meeting over lunch, perhaps once a month, may be sufficient. The point is to help novices and journeymen find masters for guidance.

Proposals – Proposals help teams focus on the think-act-reflect cycle for approaching software from an engineering perspective. In a proposal, teams think through the methods, processes, and techniques that will be used, and this written proposal (it may be as brief or as detailed as necessary) becomes the basis for evaluation and reflection for the team. In essence the proposal acts as a plan for determining an approach to software engineering practices. The whole point is to get student engineers to start thinking in terms of the cyclic think-act-reflect approach to problem solving, which is simple to see but takes a lifetime to master. For some concrete examples of how proposals work, see this article from my studio team’s reflection blog on understanding when decisions are made, and the complete archive of proposals from my team and others, available from the MSE studio archive.

think-act-reflect cycle

Presentation & Critique – Communication and collaboration are powerful tools for learning. During a presentation and critique, a team presents a proposal and that proposal is then critiqued by both mentors and peers. The comments and questions are then taken into consideration when revising or changing proposals. This is a powerful practice that doesn’t cost much and fosters knowledge sharing across an organization.

Peer Collaboration – This is so obvious I shouldn’t have to say it but simply talking to peers is one of the most often overlooked sources of information and learning. Many professional environments inadvertently create physical barriers which further prevent collaboration. Team lunches are nice for getting to know each other, but genuine collaboration must involve asking hard questions and then collaborating with a diverse group of individuals to help answer those questions.  Presentation & Critique is one way to facilitate this.  Setting up the environment to encourage collaboration is another.

Reflection – I have come to believe that this is the single most important practice in software engineering. If only more professionals would take the time to reflect on what they do and use that reflection to drive improvements then many of the most difficult problems we face as an industry would begin to resolve themselves. Effective reflection is ongoing, mentally intense, and difficult to do well. It involves both hard data and soft feelings.

All of this combines to create a place filled with passionate software engineers of all different levels of mastery, each learning from one another, and taking the field of software engineering to an entirely new and better plane of existence. If you are ever in the Pittsburgh area, please stop by the Cave (the place where the studio teams do their work) for a tour! You can find contact information and further details about the program on the MSE Website.

See you at XP2010 in Trondheim, Norway!

As I’m writing this I am making final preparations to leave for Trondheim, Norway to present an experience report at the 11th International Conference on Agile Software Development and Extreme Programming, or XP2010. After my experience report was accepted, the conference organizers opened up a second round for lightning talk proposals, and in a moment of whimsy I decided to propose another talk. That was accepted as well. So I’ll be giving two talks at XP2010.

If you’re at the conference I would very much like to meet you! This is my first time presenting at a conference and only my second conference attended (my first was OOPSLA this past October) so I’m a little nervous and unsure what to expect. Above all I hope that I’m able to share some information that is useful, interesting, and inspirational. I think I have some interesting insights and perspectives that can help make the way we build software just a little bit better and I want to hear your opinions, your thoughts and experiences too. If you can’t make it to my talks, come find me, I’d be glad to discuss any ideas with you. I plan to have a copy of the paper and slide decks available on this website along with a brief synopsis of the talks shortly after the conference. And of course I’ll be looking for people to eat meals with and hang out with while I’m here on Wednesday and Thursday. I’m here to present, to learn, and to discuss cool ideas with other practitioners.

Here’s a summary of the two talks I’m giving along with some background information. The background isn’t required reading; it’s just interesting context related to the talks.

Put it to the Test: Using Lightweight Experiments to Drive Process Improvement

This experience report tells the story of how my team ran a lightweight experiment to figure out whether we should use pair programming or program alone and review our code with Fagan inspection. With only a few hours of work and only a few weeks’ time we discovered that pair programming was instrumental to our eventual success.

In this talk I will discuss what we learned about setting up and running the experiment so you can run lightweight experiments of your own on whatever topics your team finds most interesting or pressing. Experimentation doesn’t have to be this overbearing, lofty, academic thing that it has become. My hope is that teams around the world will use this technique to discover just a little more about how software engineering works and that they’ll share what they learn in white papers, blog posts, and future experience reports. We can help close the gap between research and industry with just a sprinkling of scientific thinking.

Background Information

Lessons from a Software Engineering Dojo: The MSE at Carnegie Mellon University

At OOPSLA in Orlando, Florida this past October I heard proclamations like this more than a few times: “The only way to teach software engineering is through experience. What we really need is a software engineering program that uses a capstone project, a non-trivial, long term project that lets you practice what you learn. No university programs currently have such a project in their curriculum.” I agree. In fact, I agree so much that I attended the only university in the world in which a long-term, realistic (in both scope and complexity), team-based capstone project is an essential part of the software engineering curriculum.

I have two goals with this lightning talk. First, I aim to spread the word about the existence of the Master of Software Engineering program at Carnegie Mellon University. Carnegie Mellon is on the forefront of software engineering education research and the Master of Software Engineering program has been teaching professional software engineers to become true masters in the field for over 20 years. Second, since the studio component (the capstone project) of the MSE degree is so similar to an industrial setting, there’s a lot industry teams can use for training and educating software engineers on the job. There are tons of lessons that can be taken from the MSE in the form of both research and battle-tested experiences. This second goal will be the greater emphasis of the talk.

Background Information

The Reality of Risk Exposure

Over the past few weeks I’ve been thinking a lot about risk exposure in the context of managing projects. Exposure is a technique used almost universally when managing risks, yet as I’ve already discussed, exposure can cause major problems because it’s a precise number based on mostly made-up information. At the same time, exposure is used widely and successfully – otherwise there wouldn’t be as much literature throughout the web telling you to calculate risk exposure.

This raises the question: is risk exposure really as meaningless as I’ve made it out to be? I’ve collected some data that helps answer this question.

Data Collection and Context

Risk management is one of the basic subjects covered in the Managing Software Development course, one of the five core courses students of the Carnegie Mellon Master of Software Engineering program take in completing their degree. Students learn about the continuous risk management paradigm from the Software Engineering Institute. Two of the cornerstones of this technique are threshold of success and condition-consequence based risk statements.

Having ready access to risk management experts at the SEI, nearly every team conducts a facilitated small team risk evaluation workshop in which risks are collected with the help of a taxonomy-based questionnaire (pdf), analyzed, and prioritized using group multi-voting. The basic workshop has been conducted the same way for close to a decade and many teams have put their risk data collected during the workshop in the MSE’s project archive.

I’ve gathered data from these small team risk evaluation workshops for 9 MSE Studio teams, a total of 164 identified, analyzed, prioritized risks.

What’s in the Data?

During a risk evaluation workshop, teams identify risks using their threshold of success as a guide. Once identified, risks are briefly analyzed and assigned an impact, probability, and time frame value based on a rough average from the team members’ initial gut feeling on the risk. These values are assigned simply so when a manager asks to see the probability, for example, there is a value to give him. Each of impact, probability, and time frame can only be one of 3-4 values. The idea is that by decreasing the precision we can increase the accuracy. Values are assigned based on a rubric. For the purposes of calculating a risk exposure I assigned each of the analysis categories a number. Time frame is not used in calculating exposure.

Impact

  • Catastrophic – The team will be unable to meet threshold of success. (numeric value 4)
  • Critical – The team can only meet the threshold of success with significant additional effort and stress. (numeric value 3)
  • Marginal – The team can meet the threshold of success with minimal extra effort. (numeric value 2)
  • Negligible – There is no real impact on achieving the threshold of success or little increase in effort. (numeric value 1)

Probability

  • High – Chance of becoming a problem is above about 80%. (numeric value .8)
  • Medium – Chance of becoming a problem is about 50/50. (numeric value .5)
  • Low – Chance of becoming a problem is below about 20%. (numeric value .2)

Time Frame

  • Short – May occur in about a month or less.
  • Medium – May occur in 1 to 3 months.
  • Long – May occur in more than 3 months.
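
To make the calculation concrete, here is a minimal sketch of how exposure falls out of the rubric above. The numeric values come straight from the impact and probability lists; the function name and the three sample risks are hypothetical and are not rows from the workshop data.

```python
# Numeric values taken from the impact and probability rubric above.
IMPACT = {"catastrophic": 4, "critical": 3, "marginal": 2, "negligible": 1}
PROBABILITY = {"high": 0.8, "medium": 0.5, "low": 0.2}


def exposure(impact: str, probability: str) -> float:
    """Return exposure = impact value x probability value for one risk."""
    return IMPACT[impact.lower()] * PROBABILITY[probability.lower()]


# Hypothetical risks for illustration only; not taken from the workshop data.
sample_risks = [
    ("Key developer may leave", "critical", "medium"),
    ("Client may change scope", "catastrophic", "low"),
    ("Build server may fail", "marginal", "high"),
]

for name, impact, probability in sample_risks:
    print(f"{name}: exposure {exposure(impact, probability):.1f}")
```

With only four impact values and three probability values there are just twelve possible combinations, so exposure computed this way is coarse by design. Time frame is deliberately left out, matching the traditional calculation.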

Instead of relying on the results from the analysis, teams perform 3 to 4 rounds of multi-voting. The final multi-voting rank is shown. Not all teams ranked all risks since teams generally only deal with the top few risks, usually fewer than 10. This idea is captured in the priority. A risk is either a high priority, meaning the team is actively addressing it, or a low priority, meaning the team is aware of it but it was not ranked high enough to deal with yet. Teams might choose different strategies for determining priority. The two most popular are to only examine the top X or to rely on consensus derived from how the risks clustered as a result of multi-voting. Usually there is strong team consensus for the top 4 to 5 risks and weak consensus after this.

Analysis and Discussion

My hypothesis is that teams’ rankings will generally match exposure, meaning that risks that are ranked highly will also have a high exposure. As the data shows, this is generally the case. On average nearly every team’s high priority risks were also the ones with the highest exposure.

Graph showing Teams' Average Risk Exposure by Priority.

Examining the risks’ rank and exposure tells a similar story but not convincingly. There is a relatively weak negative correlation (correlation coefficient of -0.22) between exposure and team assigned rank. Basically the best that can be said is that there is a general downward trend in exposure as the rank increases but there is enough variation that I can’t really say anything for certain.
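
For readers who want to run the same kind of check on their own risk lists, here is a minimal sketch of a correlation calculation. It assumes Pearson's r, which is a common choice; the exact method behind the -0.22 figure isn't spelled out here. The ranks and exposures below are invented for illustration and are not the workshop data.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 items")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of products of deviations from the means (unnormalized covariance).
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return covariance / (spread_x * spread_y)


# Hypothetical ranks (1 = top risk) and exposures, invented for illustration.
ranks = [1, 2, 3, 4, 5, 6]
exposures = [2.4, 3.2, 1.5, 2.0, 0.8, 1.6]
print(f"r = {pearson(ranks, exposures):.2f}")
```

A value near -1 would mean exposure falls steadily as rank number climbs, while a value near 0 means rank tells you little about exposure. The sketch does not guard against a column where every value is identical, which would divide by zero.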

Graph showing risk data for all teams.

I have two possible explanations for this. First, traditional risk exposure does not take into account time frame while teams evaluating risks in this data set do. So, all things being equal from an exposure perspective, a long-term risk might be ranked very low while a short-term risk will be ranked much higher. If this were the case, we’d see more short-term risks assigned high ranks than long-term risks and this is indeed the case. In fact, the majority of risks identified are short-term risks, with nearly three times more short-term risks being identified than long-term risks. Mid-term risks are, unsurprisingly, in the middle. A better exposure number might be had by taking into account risks’ time frame values.

Graph showing the count of risks per time frame by rank

The second possible explanation I have is that 3 – 4 buckets isn’t sufficient to allow for enough variation to form a strong correlation between rank and exposure. Indeed this is one of the greatest differences between this data set and traditional risk exposure calculations in which impact might take on nearly any number and exposure is usually a percentage from 10 – 100%. That said there still is a general trend which shows that most of the time, multi-vote ranking very roughly corresponds to exposure.

There is one more catch about this data and it’s a subtle but important one. Values for probability, impact, and time frame were determined as a team using a sort of rough average approach where team members vote and the approximate averages are rounded to the nearest bucket. Since all the values and rankings were determined through a group effort, it would make sense that they should roughly correspond.

Conclusions

As it turns out, risk exposure is a rough and somewhat accurate indicator for relative risk priority, at least when calculating exposure or rank using group-driven techniques. Teams relying only on exposure are likely to rank some risks higher than they otherwise might. Part of this is due to exclusion of the concept of time from traditional exposure, part of it might be differences of opinion within the group as far as impact or probability are concerned.

Other MSE alumni I’ve talked with say, and I mostly agree with them, that the most important thing about risk management is bringing up concerns and talking about them. Delphi multi-voting is an easy way to encourage conversation since differences of opinion are addressed as part of the multi-voting process. No matter what technique you use, exposure (with time somehow included), multi-voting, or some combination, do not reduce risk management to simple numbers. It’s really all about communication. Encourage this communication using whatever techniques work for your team.

Raw data used for analysis in CSV format.

A Closer Look at Risk Burndown

I like the idea of the risk burndown chart. Burndown is an effective and satisfying visual indicator of progress and it’s relatively easy to calculate to boot. But does looking at a project’s risks through the lens of a burndown chart make sense?

I see several problems with thinking about risk in this way.

Numbers can be Misleading

The first key to effective risk management is to value accuracy over precision. This means that it’s better for your predictions to be approximately right than to be stated exactly. Remember, risk is about assessing your likelihood for project success. It doesn’t matter if you miss your threshold of success by a little or a lot; either way you still fail the project!

Pop quiz. Say there are two risks in your project. There’s a 25% probability that Risk A will become a problem while Risk B only has a 20% probability. For now, assume the impact is the same for both risks. Which risk is a greater threat to the project?

That one’s easy. Risk A is a greater threat because, impacts aside, Risk A’s probability of turning into a problem is 5 percentage points higher.  Ok.  What if I told you that I made up those probabilities based on my gut feelings so I could easily rank risks? Now which risk is a greater threat to the project?

The real question I’m asking you is this. Are you willing to bet the success of your project on those numbers? Because if my best guess, gut feeling probabilities are off by more than 5 percentage points, the project could be in serious trouble depending on the risks’ impacts.

I know, I know. That was a trick question. Nobody on your team would make up numbers on one of your software projects. In all fairness, nobody goes out of their way to fabricate false values. Use your logics. If you were any good at guessing the probability of future events occurring, you would not be reading this post right now. You would be a multi-millionaire, off enjoying your gambling winnings from the ponies. Too much precision gives folks too much confidence in the correctness of your assessment when the reality is that probability and impact are based on best guesses and gut feelings. Probability and impact numbers just make it easier to calculate exposure so risks can be ranked automatically.  Burndown is a fairly precise metric.

Not all Risks are Created Equal

If you are monitoring project risk with a risk burndown chart, how do you know whether the right risks are being reduced? Let’s take a look at an example.  Which of these sets of risks should be addressed?

Set 1 with a total exposure of 7 days made up of the following risks:

  • Risk A has a probability of 20% and an impact of 15 days for an exposure of 3 days.
  • Risk B has a probability of 25% and an impact of 10 days for an exposure of 2.5 days.
  • Risk C has a probability of 30% and an impact of 5 days for an exposure of 1.5 days.

Or Set 2 with a total exposure of 7 days (6.7 rounded up) made up of the following risk:

  • Risk D has a probability of 95% and an impact of  7 days for an exposure of 6.7 days.

In the first set, I can mitigate 3 risks, each with very low probability of becoming problems. In the second set I mitigate only 1 risk that is almost certainly going to become a problem. Reducing the imminent risk seems to make the most sense but this choice is not reflected in a risk burndown chart. Simply reducing risk over time is not enough. You have to reduce the right risks.
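
The arithmetic behind the two sets can be checked with a short sketch. The probabilities and impacts are the ones listed above; the helper names are hypothetical and exist only for this illustration.

```python
# (name, probability, impact in days), using the numbers from the two sets above.
set_1 = [("Risk A", 0.20, 15), ("Risk B", 0.25, 10), ("Risk C", 0.30, 5)]
set_2 = [("Risk D", 0.95, 7)]


def total_exposure(risks):
    """Sum of probability x impact (in days) over a set of risks."""
    return sum(probability * impact for _, probability, impact in risks)


def highest_probability(risks):
    """Largest single probability found in the set."""
    return max(probability for _, probability, _ in risks)


for label, risks in (("Set 1", set_1), ("Set 2", set_2)):
    print(
        f"{label}: total exposure {total_exposure(risks):.2f} days, "
        f"highest probability {highest_probability(risks):.0%}"
    )
```

Both totals land at roughly 7 days, so a chart that plots only summed exposure draws the two sets at nearly the same height. The second figure, the highest single probability, is what separates a handful of long shots from one near certainty.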

Impact Isn’t Really About Money or Effort

The only way for a visual chart such as risk burndown to work is if we’re able to quantify risks. This is generally done with exposure. Exposure = probability x impact. Impact is a funny thing. Impact is an assessment of how much the consequence of a risk will affect the project if the risk becomes a problem. Traditionalists like to think about this from a money perspective (which makes sense since software engineers stole most of our risk management practices from the finance world, originally anyway). For small teams, effort is a better measure as in the number of person days a risk that becomes a problem will cost to fix. This is a quantifiable loss.

There’s a problem with thinking about impact in terms of days of loss. Since not all risks are created equal, not all loss is truly equal either. Some kinds of loss can’t be measured in terms of effort. It really all depends on your project’s threshold of success. Some example risks (which don’t rely on ye olde life-critical system standby) from which you might never recover if they became problems include:

  • We don’t have a reliable backup solution; might lose all of our project data. (Lost yer data? You’re up a creek, son!)
  • We don’t have backup power for our data center; data centers might go offline for more than a few hours. (How many days will it take you to get those customers back?)
  • The demo has bugs and our contract renewal is based exclusively on how much the client likes our demo; a bug might occur during the demo. (HA! HA! You don’t have a job!)

In all of these cases you would reduce the risk by working on attributes other than impact (e.g. reduce probability, eliminate the condition, extend the time frame). Enough said. When it comes to calculating exposure, each of these risks has a catastrophic impact. That’s catastrophic, short for epic failure. No amount of days can really capture the essence of complete catastrophe.  Impact works best when considered in terms of success, not days or dollars lost.

Forget Risk Burndown

I want risk burndown to make sense, but given the problems I can’t help but think of it as a meaningless metric. Sure, some risks will be reduced and some will go away by converting into problems or being overcome by events. And a chart showing this would be really neat. But you’ll also uncover new risks as the project goes on. And some risks are just not worth caring about while others deserve a lot of attention. Risk management is about identifying the things that are most likely to kill your project so you can deal with them before it becomes too expensive (or impossible).  A burndown chart doesn’t reflect any of these things directly.

Burndown masks project risks too much and gives teams a false sense of confidence. To put it another way, there’s a risk with using risk burndown:

Our new risk management strategy assumes our estimation precision is better than it is; we may not mitigate the right risks.

Exposure is a ruse. And risk burndown is a metric for showing a reduction in exposure over time. To wax poetic, perception is reality and risk burndown provides a false perception.

That said, any risk management is better than none at all.  If a risk burndown chart helps to get your team thinking about risk, then so be it.  But there are other ways to manage risk (they might not be as fancy) which are easier and more effective.

Is Better the Enemy of Good Enough?

“Better is the enemy of good enough” is a phrase often held up as the reason for not making changes on a team. If everything seems “good enough,” the effort to make something better is regarded as waste. A lot of times, “good enough” is defined in terms of “providing value to the customer,” often stated as the “shipping working software” metric. So if you are shipping working software and receiving generally positive feedback from your customers, then what you’re doing is good enough and there is no need to do things differently.

But if good enough is really all you need, why is it so dissatisfying?

I’ve mentioned before that it’s a good idea to have a project threshold of success, a set of minimum goals that must be completed for a project to be considered successful. Failure to meet all goals in the threshold of success means you’ve failed the project. So while you will succeed if you meet your threshold goals, only meeting the goals means you’ve done the absolute minimum amount of work necessary to be successful. Satisfying the threshold of success for a project is like getting a C in school. It’s certainly good enough, but it isn’t exactly awesome.

While “good enough” is perfectly acceptable (You didn’t fail!), it always feels nice to achieve more. It feels good when I’m able to gently exceed my client’s expectations. It also feels good to do things right and not merely get things done. This is why merely shipping software, the minimum requirement for succeeding on a project, is not enough for me.

Equally important to me is how the work is done, not just that the work gets done. Is overtime or heroic effort necessary? Are we reflecting on our work and attempting to improve? Does my team work together well, take risks, and innovate in many different areas of the project? Do we try to use data to understand what is happening in the project? Are teammates given the support they need to grow as professionals? Am I having fun, looking forward to work every day, and happy with my contributions on the team? Is the software something I am proud of and actually useful to people? Is the product well supported by documentation? Is my code beautiful and maintainable? And most importantly, did I either learn something new or achieve something great while working on this project?

This is actually a lesson in understanding what “good enough” really means and why tools such as Threshold of Success are so important. Only when everyone agrees at a conceptual level that what the team is doing is “good enough” can everyone on the team move forward and be happy. Sometimes this will mean getting working software out the door no matter the cost. For most teams “good enough” will be a mix between a working software product, a happy and healthy team, and laying foundation for future work.

Making improvements in how my team operates or to the software itself just feels good. It’s fun. So while “better is the enemy of good enough,” avoiding change is tantamount to avoiding the very things that make me happy. The trick is making sure that I’m not making changes just for the sake of change. Sound engineering and a good understanding of the project’s threshold of success can help to avoid this fate. Once you’ve met your threshold, earn some extra credit by improving your code base or making process improvements. Or, better still, choose a threshold of success which requires you to do just a little more than the bare minimum. Because no one should settle for good enough when awesome is within reach.

Book Review: The Design of Design by Fred Brooks

During one of the last discussions in my great papers in software engineering reading course Mary Shaw, our discussion moderator, casually alluded to a follow-up class in the next fall semester, “Fred sent me an early draft manuscript of a book he’s been working on about design. It’s shaping up really well and I might be able to convince him to let us use it as the centerpiece for another discussion course. I’d be willing to put a class together if anyone is interested. Let me know.” Being a huge Fred Brooks fan I was one of the first people to sign up for the course. Throughout the fall of 2009 a group of professors, software professionals, and students met Wednesdays during lunch to talk about design and software engineering while reading Fred Brooks’ new book, The Design of Design, in addition to a few other design classics.

The Design of Design by Fred Brooks in final book and special draft form.

Putting software aside for a moment, designing anything is challenging even for experienced professionals. Simply understanding the problem that needs to be solved requires a great deal of effort. It’s rare that all the requirements for a project are known up front – so rare that I haven’t seen such a project since my sophomore year of college! Design is as much about understanding the problem as it is about finding a solution to that problem. As you’ve probably experienced many times, your boss always wants changes made to the software after you’ve shown him something that works.

Throughout The Design of Design Brooks wrestles with the idea that design is an iterative exploration of both the solutions and problems. The notion that everything can be known up front about a problem is absurd yet that’s the way people tend to want to build software. As Brooks writes, “The waterfall model is wrong and harmful; we must outgrow it.” Amen, brother.

Design is not rational. Problems do not simply present themselves and from this, solutions flow forth. Instead, designs are born iteratively, initial problems beget partial solutions which lead to further insights concerning the problem and so on until a satisficing solution is reached – one that is, essentially, good enough for all intents and purposes. Of course, that assumes that the right intents and purposes were correctly understood and articulated.

Throughout the book, Brooks draws on examples from his own experience, some odd, such as the design of his dream home in Chapel Hill, others classic, such as the design of the OS/360 architecture. While Brooks’ sudden realization that entertaining guests would be awkward in his newly designed home since there was nowhere “to put the coats” seemed out of place, stories like these made the abstract verb/noun/concept of design more concrete and relatable, even when considering software design. Besides, design is supposed to be fun.

The Design of Design is one of the most important books for software engineers since The Mythical Man-Month. Unlike The Mythical Man-Month, however, I found that I had more questions than answers by the end of the book. The Design of Design made me feel more confident as a designer and software engineer but also more unsure of what to do next. The book is full of amazing, empowering ideas but very little that can be applied practically today. Many concepts I thought I understood suddenly revealed additional dimensions for my consideration, new ways of thinking about the world. I love it.

The Design of Design by Frederick Brooks is available now from Amazon.com. I highly recommend it.

Carpool Musings on Women in Science and Engineering

Over the past few weeks there has been a rash of studies published discussing why there are so few women in the science and technology fields. On a high note, one of these studies noticed that recently about the same number of women are graduating with science, technology, engineering, and math bachelor’s degrees as men. Unfortunately researchers found that a disproportionately large number of those women choose not to continue studying science, technology, engineering, and math in graduate school. Nearly every study mentioned in the news recently concludes that women are discouraged, either directly (by peers or, worse, mentors straight up telling them to avoid the fields) or indirectly (for example, through a lack of female role models) from entering an engineering, mathematically-inclined, technical, or scientific field.

My wife and I have been debating the results and implications of these studies based on what we learn from radio news snippets while carpooling to work. So when Ada Lovelace Day came across my Twitter stream I asked my wife if she would like to write an article with me. I thought it might be interesting to hear two different perspectives (one from a man, the other from a woman) on how women in science, engineering, math, or technology have influenced our thinking in some way.

Interview with Marie

Marie chose to discuss Dr. Martha Case.

Who is Dr. Martha Case?

Marie: She is a professor at the College of William and Mary and was my undergraduate advisor while working toward my degree in Biology.

What about Dr. Case inspires you?

Marie: She was one of my few female professors in college. She is well respected in her field, by students, other professors, and researchers. She was also given leadership roles in the biology department. What I admired most about Dr. Case is that she was able to maintain her femininity while being a woman of strength and great knowledge. Her ability to share her knowledge and passion inspired me to become a teacher so I could inspire others to love plants too.

What can young girls learn from Dr. Case’s example?

Marie: You should find something you love, learn everything you can about it, and then get out there and tell others. If you’re passionate then people will listen.

Interview with Michael

Michael chose to discuss Mary Shaw.

Who is Mary Shaw?

Michael: This is tough. Mary Shaw is a professor at Carnegie Mellon University. She has written a ton of papers on software engineering covering everything from software engineering education and research to architecture and design and everything in between. While I was working toward my Masters in Software Engineering, I had the opportunity to take two discussion courses – one on great papers in software engineering and one on design. Mary was the moderator of these discussions. I also had an opportunity to write a paper with Mary that was published in IEEE.

What about Mary Shaw inspires you?

Michael: She is ridiculously smart, she has put out a lot of really good ideas, and she is extremely influential in the software engineering world. And she’s able to articulate her ideas extremely well. Just being able to sit around and have these discussions with her and other PhD students was empowering. It has nothing to do with her as a woman and everything to do with her as a software engineer.

What can young girls learn from Mary Shaw’s example?

Michael: Carnegie Mellon isn’t run just by guys. And it doesn’t matter what gender you are – people value the ideas.

Wrap-up

That was much tougher than either of us thought it would be and will probably only add fuel to our carpool discussions. Interestingly, we both chose a college professor with whom we directly interacted; people we personally know.

Normally I [Michael speaking] wouldn’t have thought this sort of discussion would be necessary. Generally speaking, tech blogs like this are preaching to the choir – most folks reading this either already think similarly to me or have a strong desire to learn the information I’m sharing. It would be rare for people who hate software engineering, for example, to read this blog. So, if I’m trying to change your mind about a controversial topic, blogging isn’t the most effective way to do this. Sadly, I’m not sure that everyone in the software industry thinks that gender equality is something that needs attention. It’s one of those slow-change ideas and I’m happy to see inroads like Ada Lovelace Day.