If you have been in software development for any length of time, you have been there.
A crisis arises. Management announces that the crisis threatens the very existence of the team, department, or business.
The Crisis
The team works overtime. The team guru comes up with an ingenious workaround. A patch is rushed through QA. The crisis is resolved and the team celebrates.
But should they be celebrating?
While everyone is rushing around putting out fires and inventing workarounds for the suddenly realized risk,
- No-one is thinking strategically.
- No-one is taking advantage of opportunities for process and productivity improvement.
- No-one is developing their skills.
- To resolve the short-term crisis, compromises are made that damage the long-term viability of the project, program, or business.
- Quality, maintainability, automated tests, and manual testing are usually the first things to go.
- Effort and energy that could be used to drive the project forward are instead spent preventing the project from going backwards.
The Root Cause
The crisis happened for a reason. Unless root cause analysis is undertaken, and follow-up action designed to resolve the underlying issue is completed, the crisis will recur. Chances are it has already occurred several times under different guises.
The response heard time after time to any suggestion of root cause analysis and a permanent fix is "We do not have time." Of course, the reason they do not have time is that they do not do root cause analysis, and therefore have to solve the same problems over and over again.
The trouble is that the players in this little drama may not be especially motivated to resolve the underlying causes. Though the business has suffered losses, many of the actors involved may perceive gains.
Managers are seduced by the illusion of rapid progress. Team members get to be heroes.
The Manager
On the management side there is even a term for this pursuit of illusory progress: management by crisis.
Unfortunately, the possible outcomes of this strategy are:
- the employees run around in mindless panic undertaking counterproductive activities, or
- the employees start ignoring management because it has lost all credibility (like the boy who cried wolf).
Neither situation is conducive to long-term business survival.
The Heroic Team Member
If a team member indulges in an all-nighter, not only can he or she receive praise, but the resulting functionality guarantees future employment, because other developers will have difficulty fixing, extending, or even comprehending that area of the code.
Conceived in a flash of creativity and brilliance by a mind suffering from an excess of caffeine and a deficit of sleep, this solution is likely to be:
- overly complicated,
- internally inconsistent,
- inconsistent with the rest of the project, and
- lacking unit tests.
Furthermore, mesmerized by the memory of the power and genius imprinted by the surge of adrenaline that accompanied the birth of this monstrosity, this hero will resist the idea of replacing it with something
- simpler,
- more reliable,
- more maintainable, and
- covered by unit tests.
The Confession
Now for a confession. I was once a hero, indulging in many of the counter-productive behaviors described above.
However, I grew tired of saving the company only to watch it get itself right back into trouble. Convinced that there had to be a better way, I started thinking more long term.
- I found that by fixing things permanently instead of applying a quick half-solution every month or so,
- I had more time to develop reusable solutions and tools, which meant
- I no longer had to spend as much time on repetitive infrastructure, which meant
- I had time to learn techniques like test-driven development, which meant
- I spent less time debugging, and
- so on, in a virtuous circle.
Over time I found that I had to save the company less frequently.
I have experience with both ways of doing things, and I can tell you that resisting panic over a short-term crisis and taking a longer view is better for your health and better for your employer's long-term prospects.
Update
This is an example of the drama triangle. In this scenario the manager who is managing by crisis is the Victim and the heroic team member is the Rescuer. Their efforts achieve only temporary pain relief, because the underlying cause is never addressed.