Getting into shape after Scrum
Since the very beginning of my team, we followed Scrum as closely as we could. As time went by, however, the cracks in the method started to show. Pressure was mounting, productivity appeared to be lagging and quality was decreasing. Only long after these cracks started to appear did we realize that a lot of our struggles were due to the nature of the Scrum process itself. This post will outline how and why we came up with a process that works for us, and why Scrum, in retrospect, was simply the worst possible match for me and my team. Note that we don’t work in a consumer market, but in an industrial field where things are even more regulated and restricted: bugs and carelessness could cost lives, and we don’t have the luxury of quickly pushing an update or bug fix to devices.
Anyway, I mentioned we had struggles. But they were not just mere struggles: they were more like freight trains holding us back. To name but a few of the issues we had:
- A 3-week sprint was too tight to fully understand, develop, test and release the complex features and/or bug fixes we committed to. There was always left-over work spanning up to 3 sprints later;
- Due to personnel changes and high workload at the time, our team was lacking both a technically minded Product Owner and focused architect;
- The backlog was continually growing at too high a pace for the team to adequately process. Because of this, the backlog became an administrative burden filled with stale, obsolete or — even worse — long overdue items;
- There was no Scrum Master, but it didn’t seem we needed one either;
- Daily stand-ups often devolved into long-winded discussions because of the overly detailed task board;
- Because of the workload, sprint planning & backlog refinements were often abandoned, leading to unclear work items;
- Honoring project deadlines, maintaining technical upkeep, providing 3rd line support and developing new features was too much to handle for 1 small development team of ~5 FTE.
Long story short: we never got to use Scrum to its fullest (or at least, insofar as it’s advertised) and as a result, it started hindering us instead of giving us a steady rhythm. To us, it seems Scrum thrives on “excessive” administration and communication: the day-to-day activities needed to keep projects and products afloat prevented us from giving the process itself the attention it requires. Scrum is restrictive despite what it claims to be, and we needed something more… agile.
Trying Kanban
A Kanban board is a very simple concept. The team identifies the major stages any work item passes, and visualizes those stages as columns on a whiteboard. The work items — represented by post-it notes — pass from one column to the other as the work progresses. Whereas Scrum is dependent on upfront planning and estimates, Kanban is more flexible in this respect. Also, Kanban encourages an ongoing rhythm in the team’s process, whereas Scrum with its sprints features fairly slow starts with pressure mounting as the end of the sprint comes closer. The steady pace encouraged by Kanban seemed to be a better fit for our team as it can better accommodate urgent issues: there is no sprint to be cancelled or goal to be missed.
Unfortunately, Kanban introduced some problems of its own. As mentioned, Kanban is less restrictive than Scrum. In some important ways, this made it a much better fit than Scrum. However, the work items also became increasingly less defined, leaving lots of room for interpretation which, in turn, led to development work that was not entirely what was needed. Significant time was lost in reworking these items into proper shape. To combat this, we introduced the concept of blueprints along the way. Any work item a developer picked up would need to be investigated by the developer first, after which he or she would write up a technical implementation plan (dubbed the blueprint) of how the work item would be resolved. This blueprint was first reviewed by a senior developer, and only after the senior developer approved it would any code be written.
The blueprints were helpful in building the work items as intended from the start, but even after this, the high-level nature of Kanban still proved too open for the team. Work items would linger and stay on the board for much longer than was desired. Even though Scrum was too restrictive and demanding overall, it offered lots of support during actual development. Kanban on the other hand demanded nothing up front but was too loose or vague when actual development was underway. The blueprints only remedied this partially.
An introspective
Rather than put the blame solely on Scrum or Kanban, let’s list our own faults as well. Why were they impeding our perceived productivity and how did they contribute to rising frustrations?
Note that these flaws amplified each other, making their negative effects on the team even stronger.
Dependence on individual people
Many work items were of such a nature or complexity that only 1 or 2 people knew enough about the subject matter to resolve the issue. Looming deadlines and fear of the unknown led to undue amounts of pressure on these people.
Necessity for clear-cut specs
Work items were often entered into the administration system with the absolute minimum of information. This left enormous room for interpretation, often resulting in development that missed the mark and thus rework.
Inaccurate estimates
When asked how much time a certain development would need, the team consistently underestimated the work. Even bug fixes that were marked as a single day’s work often needed 3 or 4 days.
Unpredictable, ad hoc release cycle
There was no clear cadence to releases. Instead, we often hurried a release when the moment called for it.
Understaffed team given the ubiquity of the product
Our main product is used for and with nearly every project our company has. It’s even the very backbone of our projects, yet the team’s size did not reflect this dependence.
Under-representation of support tickets
Incoming 3rd line support tickets (or even 1st and 2nd line) were never given proper attention at the beginning of sprints, so they were always disruptive.
Hitting our own stride
In our efforts to achieve a stable, continuously improving product with a steady and predictable release cycle, clear focus during development, room for both innovation and upkeep without getting lost in a project-versus-product discussion, we envisioned a method that would give us flexibility like Kanban, structure like Scrum and room for growth.
Some of our core considerations that went into this (other than the past experiences I mentioned earlier) are, surprisingly, quite against the grain of Scrum:
- Feature planning more than 3 months ahead is never accurate;
- Don’t ask how much time a piece of work needs: ask how much it’s worth;
- Incomplete specifications lead to incorrect implementations;
- “First-time-right” requires a time investment;
- Strengthen the team by challenging their weak areas;
- Minimum Viable Product means knowing when to quit;
- Allowing personal input improves engagement and thus the product as a whole;
- The development team should not be burdened with the administration or origin of a work item;
- Our industrial line of work is way more restricted than your average consumer market.
This might just be an effect of the industry we’re in, but my team simply cannot afford to “move fast and break stuff” at an interval of 2 to 3 weeks: there can actually be lives at stake.
So, how do we bring these considerations into practice? How do we help the development team create the features we need while avoiding the same pitfalls all over again? Almost as if it was meant to be, I came across Shape Up by the people at Basecamp. It seemed like a perfect fit, so we used Shape Up as the very solid foundation for our new process.
Know what to build…
The very first stage we pass through is where we gather context: problems are fully analysed so we know what they actually entail. With this knowledge, we aim to find and specify the solution. Only then can we fully understand and communicate what needs to be built. Looking back at our earlier efforts, this is very reminiscent of our blueprints. It stands to reason, then, that the very first stage of our iterative cycle involves a supercharged blueprint or, put more professionally, a functional feature specification. A small but important part of these specifications is their worth. The Feature Council, described next, does not estimate how much time a certain feature needs to be built: it decides how much time the feature is worth. This number comes into play when the feature is picked up.
For my team, feature specifications can have one of 5 origins. Each origin has a dedicated analyst with specific domain knowledge or access to the origin. These analysts together form the Feature Council. If you’re thinking “Wait a minute… isn’t this the shaping part from Shape Up?”, you’re exactly right.
Above it all stands the General: this single person has the overall responsibility over the product as well as the entire team. He is a free agent, allowed to intervene at any point during the process with a veto.
… then take the time to build.
We have experienced first hand how a work item that is not finished in its sprint has a cascade effect on the subsequent work items, impacting their planning as well as their quality. In order to build a feature right the first time, ample time must be given. Therefore, just like Shape Up, we use 6 weeks of full-on development time: this is our Focus Mode. The Feature Council & General trust the development team (or Squad) to develop the features in time, and will not inquire or attempt to influence the result of Focus Mode.
Speaking of time: remember that features had a time worth assigned to them by the Council. The Squad in turn estimates the time needed to develop a feature. The difference between these two numbers offers room for individual team members to take on development work they’re not familiar with. Unfamiliar territory simply takes more time, but we also want our developers to broaden their knowledge and skills. The time difference between worth and estimate makes room for knowledge transfer and personal development.
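To make the arithmetic concrete, here is a minimal sketch of how the room between worth and estimate could be calculated. The `Feature` class, its field names, the use of days as the unit and the idea of a fixed "learning overhead" are all assumptions made for illustration; they are not part of our actual tooling.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    """Hypothetical record of a feature handed to the Squad."""
    name: str
    worth_days: float     # time the Feature Council decided the feature is worth
    estimate_days: float  # time the Squad estimates it needs to build it

    @property
    def slack_days(self) -> float:
        # Room left over for knowledge transfer and personal development.
        return self.worth_days - self.estimate_days


def fits_unfamiliar_developer(feature: Feature, learning_overhead_days: float) -> bool:
    """True if the slack covers the extra time a developer new to the subject needs."""
    return feature.slack_days >= learning_overhead_days


# Illustrative numbers only.
export_feature = Feature("export module", worth_days=15, estimate_days=10)
print(export_feature.slack_days)                       # 5
print(fits_unfamiliar_developer(export_feature, 4))    # True
print(fits_unfamiliar_developer(export_feature, 8))    # False
```

A negative slack would mean the Squad expects the feature to cost more than the Council thinks it is worth, which is a signal to renegotiate the feature rather than to start building it.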
After the 6 weeks are up, another 3 weeks are reserved to build a proper release, test the new features, tie up any loose ends and take on any open support tickets. Yup, this is Shape Up’s Cooldown. The transition from Focus Mode to Cooldown is the ideal moment for a retrospective. How did we do, and if we screwed up, how can we set things right?
If a feature development missed its 6-week deadline, Cooldown can also be used as a grace period to actually finish it, effectively extending Focus mode. However, a strong re-evaluation of the feature is required so a battle plan can be formed together with the Feature Council and General.
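The Squad's cycle described above can be sketched as a simple calendar: 6 weeks of Focus Mode followed by 3 weeks of Cooldown, repeating. The function name and the zero-based week numbering are assumptions made for this example only.

```python
FOCUS_WEEKS = 6      # full-on development time
COOLDOWN_WEEKS = 3   # release, testing, loose ends, support tickets
CYCLE_WEEKS = FOCUS_WEEKS + COOLDOWN_WEEKS


def squad_phase(week: int) -> str:
    """Return the Squad's phase for a zero-based week counted from the first cycle."""
    if week < 0:
        raise ValueError("week must be zero or positive")
    position_in_cycle = week % CYCLE_WEEKS
    return "Focus Mode" if position_in_cycle < FOCUS_WEEKS else "Cooldown"


# Print the first cycle plus the start of the second one.
for week in range(CYCLE_WEEKS + 1):
    print(week, squad_phase(week))
```

Weeks 0 through 5 fall in Focus Mode, weeks 6 through 8 in Cooldown, and week 9 starts the next Focus Mode. The grace period for a late feature is deliberately not modelled here: that is a decision made by people, not by a calendar.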
A tale of two cycles
The two paragraphs above made it clear that there are two separate tracks at play: a formative track where the Feature Council designs and specifies features, and a hands-on track where the Squad realises these features.
These two tracks are so completely different in their work that their cycles are not really compatible. For a Squad where the work to do is laid out in detail, 6 weeks is long enough to do it right and short enough to keep momentum going. For the Feature Council however, 6 weeks would be far too long. Design, prototype and specification work benefits much more from shorter cycles with frequent feedback loops: the iterative process this enables allows for much sturdier and better-fitting solutions.
So, the Feature Council and the Squad each have their own cycle: a weekly cycle with a weekly feedback session for the Feature Council, and a 6+3 week cycle for the Squad. So how do these two teams work together? There has to be a point where features are fully specified and ready for handover for development. Besides: feature specifications only describe the feature in functional terms. How are these functional specs translated into actionable, technical work items?
That’s where the Gatekeeper comes in.
Hold the door
In many ways, the Gatekeeper is similar to Scrum’s Product Owner. However, unlike a Product Owner, a Gatekeeper is an integral part and representative of the Squad instead of an outside force “managing and expressing business and functional expectations for a product”. Ideally, he or she has experience as an architect but it can really be anyone with enough technical background to participate as a full member of the Squad. In fact, we actually switch who’s the Gatekeeper with every cycle.
The Gatekeeper’s role changes as the cycle progresses.
The Gatekeeper has multiple responsibilities. He or she:
- acts as a technical consultant,
- guards the team from distractions and overload,
- takes on work him- or herself,
- judges and reviews the feature specifications, and
- with the team, translates the functional specifications into actionable work items for the next 6 weeks.
As far as the Squad is concerned, there is no backlog to wade through: for them, there is only the current cycle and its features to develop.
Synchronous versus asynchronous work
Priorities change. As a result, any planning made prior goes out the window to accommodate the shifted priorities until the priorities shift yet again. That’s why Rhythm, as we call our process, doesn’t rely on planning, but allows priorities to change freely. It is only during the negotiation between Feature Council and Gatekeeper that a plan for the upcoming 6–9 weeks is established. Any planning beyond 9 weeks is, as far as the Squad and Gatekeeper are concerned, non-existent. The Feature Council is responsible for prioritizing the work we know we have to do at some point in time and offering it to the team before it is due. This is the synchronous work: known work that follows the steady momentum of analysis, design, specification and development.
Note that synchronous work is not just features: known low to medium impact bugs follow the same rhythm. The main difference is that these bugs are not processed by the Feature Council in advance and thus not offered for resolution in Focus Mode. Instead, the Cooldown period is ideal for these bugs to be picked off one after another.
Developing features based on specifications and prototypes made in advance is all well and good, but surprises, nondescript bugs or other sudden urgent matters (like contractually obligated 3rd line support) always lurk in the shadows. They cannot be planned in advance but do require immediate attention. This is the asynchronous work: unknown issues that may appear at any point in the cycle.
Whenever asynchronous work appears, it has the potential to interfere with the current cycle and the team’s goal of developing the committed features. Remember that the Gatekeeper guards the Squad from being overloaded. The Gatekeeper keeps an eye on the workload of the Squad and the feasibility of successfully developing the negotiated features. But how, then, can we ensure the asynchronous work will be addressed in a timely manner? Who is the Gatekeeper for asynchronous work?
We call this role the Ambassador. He or she is responsible for handling the (3rd line) support tickets and, as such, acts as an envoy for the customer that filed the ticket. It is the Ambassador’s primary concern to have a support ticket resolved as soon as needed (not as soon as possible). Our customers don’t follow our cycles, so neither does the Ambassador. This person is a free agent just as the General is. However, instead of being allowed to immediately intervene in a cycle to resolve an issue, the Ambassador can only request intervention from the Gatekeeper. The Gatekeeper is the one who assesses if, when and at what cost an incoming support ticket can be addressed during the cycle. It is up to the Ambassador to convince the Gatekeeper of the urgency of a ticket, and it is up to the Gatekeeper to fit it into the cycle. In the unlikely event the two roles cannot come to an agreement, the General steps in to decide with his right of veto.
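The escalation path above can be sketched as a small decision function. This is only an illustration: the `Decision` options, the function name and the reduction of "if, when and at what cost" to two outcomes are simplifying assumptions, not a description of any tool we use.

```python
from enum import Enum
from typing import Optional


class Decision(Enum):
    """Hypothetical outcomes for an incoming support ticket."""
    HANDLE_NOW = "handle during Focus Mode"
    DEFER_TO_COOLDOWN = "defer to Cooldown"


def resolve_ticket(
    ambassador_request: Decision,
    gatekeeper_ruling: Decision,
    general_ruling: Optional[Decision] = None,
) -> Decision:
    """Return the final decision for a ticket.

    The Ambassador can only request; the Gatekeeper rules. Only when the two
    disagree does the General's veto decide the matter.
    """
    if ambassador_request == gatekeeper_ruling:
        return gatekeeper_ruling
    if general_ruling is None:
        raise ValueError("Ambassador and Gatekeeper disagree: the General must decide")
    return general_ruling


# Agreement: no escalation needed.
print(resolve_ticket(Decision.HANDLE_NOW, Decision.HANDLE_NOW))
# Disagreement: the General has the final say.
print(resolve_ticket(Decision.HANDLE_NOW, Decision.DEFER_TO_COOLDOWN,
                     general_ruling=Decision.HANDLE_NOW))
```

In practice most tickets never reach the third argument: the negotiation between Ambassador and Gatekeeper is meant to settle things before the General gets involved.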
Keeping the quality
The start of Cooldown is the moment when QA people (fancifully dubbed the Monitors) start testing the results of this cycle’s development work. They have 3 full weeks to find issues with the work and report them to the Squad to fix. Fortunately, since Cooldown means a period of relative rest for the Squad, they’ll be able to address these issues right away and iteratively, with an immediate feedback loop back to QA. At the end of Cooldown, the Squad will have a new release with new features fully tested and approved.
During Focus mode, the Monitors broaden their scope and give the entire product a properly extensive regression test. Any issues they may find are reported to the Gatekeeper who can then decide how and when to handle the issue: either immediately or during Cooldown and by the Gatekeeper him/herself or the original developer.
Wrapping up
We tried to take the best of each product development method we tried or investigated, and molded it into something that should work for us. Instead of trying to shoehorn ourselves into a widespread method like Scrum “because it’s the thing to do and everybody uses it”, we formed a method around the practices that demonstrably benefitted us and our industry.
Shape Up already came *very* close. It’s just that, due to the specific traits of our industry, we added and tweaked some nuts and bolts in the process.
It may not be fully agile as most people understand it, but at least we’re ready and willing to adapt this new process as we go: that alone makes it more agile than Scrum claims to be.
Does it have waterfall-y elements? Sure, but why did waterfall ever become a bad word? Apparently, just a small hint of waterfall helps us deliver value and quality to our customers and that’s what matters most, not how fast you can deploy. If you’re doing Scrum but failing to deliver, then maybe do what we did and take the plunge: processes should never be leading.