Saturday, August 2

Developing Useful Applications (Part 2): Persistence

Persistence: How Information Sticks Around and What Happens if it Doesn't

Software is all about manipulating information. Numbers, text, images, and sounds: it's all ones and zeroes. But when that information sticks around—when it persists—things start to get tricky.

Suddenly, you have to think about all sorts of headaches:

  • Where is the data stored?
  • Who can access it—and when?
  • What happens when two people try to change it at once?
  • Should we lock it? Merge it? Roll it back?
  • Are we aiming for ACID (strong consistency) or BASE (eventual consistency)?
  • What if the data isn’t what we expected? It needs validation and correction.
  • How do we migrate it safely when the APIs or internal schema change?

And once you've solved those, you're only a third of the way there.
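To make just one of those headaches concrete (two people changing the same record at once), here is a minimal sketch of one common answer: optimistic locking. Everything in it is invented for illustration; the accounts table, its version column, and the function name are assumptions, not anything from a real system.

    import sqlite3

    def update_balance(conn, account_id, new_balance):
        """Optimistic locking: apply the update only if nobody changed the row since we read it."""
        row = conn.execute(
            "SELECT version FROM accounts WHERE id = ?", (account_id,)
        ).fetchone()
        if row is None:
            raise KeyError(f"no account {account_id}")
        (version,) = row

        # The UPDATE succeeds only if the version we read is still the current one.
        cursor = conn.execute(
            "UPDATE accounts SET balance = ?, version = version + 1 "
            "WHERE id = ? AND version = ?",
            (new_balance, account_id, version),
        )
        conn.commit()
        return cursor.rowcount == 1  # False means another writer won: retry, merge, or report

    if __name__ == "__main__":
        conn = sqlite3.connect(":memory:")
        conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance REAL, version INTEGER)")
        conn.execute("INSERT INTO accounts VALUES (1, 100.0, 0)")
        print(update_balance(conn, 1, 75.0))  # True: no concurrent writer this time

Read a version, write conditioned on that version, and treat a failed write as a signal to retry or merge rather than silently overwrite. Pessimistic locks and eventual-consistency merges are other ends of the same trade-off.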


Code is Data Too

One of the more profound and useful realisations in computer science is that code is data. It can be stored, moved, diffed, merged, and—most importantly—it can break or be misplaced if it is not managed.

That means everything you worry about with persistent data? You get to worry about it again when dealing with the source code. Version control isn't optional; it's survival. And not just for your code files. Think bigger.

You’ll want to track:

  • Sounds and images
  • Test datasets
  • Code generators
  • Build scripts
  • Image generation scripts

   

Entropy Never Sleeps

Even if you get all this right, there are other dependencies your software relies on.

Operating systems evolve (or vanish), programming languages shift, frameworks deprecate things without warning, and that obscure build tool you relied on might be abandoned.

Left unattended, even well-built software starts to rot. Not because the logic broke—but because the world moved on without it. 


Analysing Version Control Data

There is information in your version control system that goes beyond your current build.
Are there files that get changed frequently? Those components and modules probably have too many responsibilities, or maybe they need to be more extensible.

Or maybe when you look back, the code changes, then changes back, looping back and forth. You have two sets of customers with two different preferences, and both groups complain when they don't get their way. The problem will not go away until both groups are satisfied.
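A rough way to surface those hot spots is sketched below, assuming only a local checkout with git on the PATH; any real analysis tool will do this better, but the data is already sitting in your history:

    import subprocess
    from collections import Counter

    def file_churn(repo_path=".", since="1 year ago"):
        """Count how many commits touched each file; high counts hint at overloaded modules."""
        log = subprocess.run(
            ["git", "-C", repo_path, "log", f"--since={since}", "--name-only", "--pretty=format:"],
            capture_output=True, text=True, check=True,
        ).stdout
        return Counter(line for line in log.splitlines() if line.strip())

    if __name__ == "__main__":
        for path, count in file_churn().most_common(10):
            print(f"{count:5d}  {path}")

Files near the top of that list are where the next refactoring conversation should start; files that keep reappearing, month after month, are the ones looping back and forth.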

Have you run static analysis on your code base? How much duplication is there? Are there code smells or unreachable code?
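Dedicated clone detectors and linters answer these questions far better, but even a naive sketch like the one below (the src directory, the Python-only glob, and the six-line window are all arbitrary assumptions) shows that duplication is something you can measure rather than argue about:

    import hashlib
    from collections import defaultdict
    from pathlib import Path

    WINDOW = 6  # flag runs of six or more identical non-blank lines seen in more than one place

    def duplicate_blocks(root="src"):
        """Naive clone detection: hash sliding windows of stripped lines across .py files."""
        seen = defaultdict(list)
        for path in Path(root).rglob("*.py"):
            lines = [ln.strip() for ln in path.read_text(errors="ignore").splitlines() if ln.strip()]
            for i in range(len(lines) - WINDOW + 1):
                digest = hashlib.sha1("\n".join(lines[i:i + WINDOW]).encode()).hexdigest()
                seen[digest].append((str(path), i + 1))
        return {digest: places for digest, places in seen.items() if len(places) > 1}

    if __name__ == "__main__":
        for places in duplicate_blocks().values():
            print(" == ".join(f"{path} (block {i})" for path, i in places))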

During a retrospective I raised the results of static analysis. When they didn't want to tackle item one, I said fine: each team and its circumstances are different. When there was pushback on the entire top ten, I gave them homework: pick one of the top ten issues and suggest improvements to the code base, using that issue as a jumping-off point. The suggestions were due at the next retrospective, two weeks later.

If you are not improving, you are moving backward.

Guess What, Your Project has Data Too

Every project management methodology that I am aware of breaks down work, schedules that work, and tracks progress.
PMBOK has rolling wave planning and work breakdown structure (WBS), while Scrum has progressive elaboration and backlog refinement.

Whether you call them tickets, work items, stories, or tasks, you need to track work. Whether you use Jira, Trello, or something else, that data needs to be treated with the same respect you give your application data or your version control data.

Analysing Work Item Data

It's worthwhile analysing your project management data. Agile has velocity; PMBOK has earned value management (EVM). You need to know where you are going and where you are on the journey to get there.
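The raw arithmetic behind both is small enough to sketch. The numbers below are invented, but the formulas are the standard ones: velocity as a rolling average of completed points, CPI as earned value over actual cost, SPI as earned value over planned value.

    def velocity(points_completed_per_sprint, window=3):
        """Rolling average of completed story points: a rough forecast of near-term capacity."""
        recent = points_completed_per_sprint[-window:]
        return sum(recent) / len(recent)

    def evm(earned_value, planned_value, actual_cost):
        """Earned value management ratios: values above 1.0 mean under budget / ahead of schedule."""
        return {
            "CPI": earned_value / actual_cost,    # cost performance index
            "SPI": earned_value / planned_value,  # schedule performance index
        }

    print(velocity([21, 18, 26, 24]))                              # about 22.7 points per sprint
    print(evm(earned_value=32, planned_value=40, actual_cost=35))  # CPI ~0.91, SPI 0.80

Neither number means much on its own; the trend across several sprints or reporting periods is what tells you where you are on the journey.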

While you need metrics, you need to remember that all metrics are proxies for what you really want to measure. That is because there is zero overlap between what is easy, or even possible, to measure and what will make a real impact over the long term.

And anytime you tie metrics to performance, your metrics become worthless. The people evaluated will game the system in the most counter-productive way possible. This is just human nature.

The data you obtain is always subject to interpretation; therefore, you need to run experiments. The experiments need to be simple, quick, and cheap, because that determines how often you can run them.



Garbage In, Garbage Out

I worked with a team that was constantly re-estimating their work items, and they were not including QA time or time for rework. They were also experiencing all the symptoms of stories that were too large. By the time stories were finished, their estimates were zero points, which made estimating velocity impossible.

To get any halfway decent analysis out of the tracking software, I had to assume all the stories were the same size and that the differences would come out in the wash. To complicate things, my manager would close items from three or four years ago without backdating them. These were items that had been finished years earlier but had never been marked as finished in the tracking software. You would think there could not be many of those, but somehow he seemed to find two to seven of them each sprint.
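In practice, the workaround amounted to something like the sketch below. The Item shape, the one-year staleness cutoff, and the fixed fortnightly sprint calendar are all assumptions standing in for the real export; the point is simply to count same-sized items per sprint while keeping the back-closed relics out of the totals.

    from collections import Counter, namedtuple
    from datetime import date, timedelta

    # Hypothetical shape of an exported work item; a real Jira or Trello export looks different.
    Item = namedtuple("Item", "key created closed")

    STALE = timedelta(days=365)  # closed more than a year after creation: almost certainly back-closed

    def sprint_throughput(items, sprint_length_days=14, start=date(2023, 1, 2)):
        """Closed items per sprint, treating every story as the same size and
        dropping back-closed relics so they don't inflate the sprint they landed in."""
        counts = Counter()
        for item in items:
            if item.closed is None or item.closed - item.created > STALE:
                continue  # never finished, or finished years late on paper: exclude
            sprint = (item.closed - start).days // sprint_length_days
            counts[sprint] += 1
        return counts

    if __name__ == "__main__":
        items = [
            Item("APP-1", date(2024, 1, 3), date(2024, 1, 10)),
            Item("APP-2", date(2024, 1, 4), date(2024, 1, 24)),
            Item("APP-3", date(2020, 5, 1), date(2024, 1, 25)),  # back-closed relic, excluded
        ]
        print(sprint_throughput(items))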

 
My manager insisted the team's productivity had dropped, while I thought it had risen. I held my nose, rolled up my sleeves, and got to work on the data. The first thing I noticed was that the issues they had with project management before I arrived were a lot worse than I had been led to believe. There was a clear change in the data when I arrived; before I started, the data was disorganised and incomplete. Of course, they could have been doing a lot more than what they had recorded in the tool, and they pretty much would have had to.

After cleaning the data as best I could, it showed two dips in productivity, two to three months apart, about two years before I started. The team had never fully recovered from whatever happened at that time, though there had been improvement since I started. Still, with my increased awareness of what had happened before I joined, I was surprised the improvement was not more dramatic.

My analysis and conclusions were only as good as the data, and the quality of the data, even after I cleaned it, was not good.

Still, I saw enough in the data to encourage me to push harder on changes I knew were needed.

Starting with this:
  • Stop sabotaging your workflow tracking system.
  • Stop erasing estimates.
  • Stop injecting misleading data that makes it look like you’re shipping four-year-old work every sprint.

Because if your planning data is garbage, then all your decisions based on it will be garbage too.


Note: My manager did not want to tell me what had happened two years before I started.

In Summary

Persistence isn’t just about databases. It’s everywhere: in your code, your tooling, your workflows, and your team’s memory. If you don’t treat all of that data with care, your systems will degrade—even if no one touches a line of code.


Books

PMBOK Guide
 
