We have been having many conversations about portfolio management with various people, which is how we came into contact with Hakan Altintepe. After speaking with him about WSJF specifically, we invited him to write a guest blog post sharing his thoughts on the topic. He will also be at our booth (107) on Day 2 of the Global SAFe Summit to discuss it.
Weighted Shortest Job First (WSJF) is a job scheduling technique that is vital to the success of agile organizations. Without it, they cannot maximize outcomes or speed. Yet, it is arguably the least understood and most underappreciated part of their agile playbook.
In the digital age, where the pace of change is rapid and opportunities are unpredictable or short-lived, agile methodology is a game-changer and a superior management technique for delivering business outcomes through technology initiatives. However, it is inherently complex to implement at scale, and its superiority comes at a steep price when those complexities are not properly addressed. The leading agile frameworks and enterprise agile tools have progressed significantly over the past few years, but there is more work to be done. Further advances could yield another 30% to 40% improvement in business outcomes at today’s agile organizations. By putting a spotlight on the remaining complexities of scaling agile organizations, we can accelerate the conversation on which framework and tool advances are needed to capture this opportunity. This article is the first in a series toward that goal.
Below, I will zero in on WSJF, one of the most innovative management techniques for scheduling agile jobs by prioritizing team backlogs, and explain when it works best and when it doesn’t. I assume that the reader has an advanced understanding of agile methodologies in general, and of backlog prioritization and cost of delay concepts in particular.
An Overview of WSJF
WSJF has its origins in queuing theory and was introduced by Donald Reinertsen in his book “The Principles of Product Development Flow” to optimize the product development process from the perspective of product managers (Fig. 1). What makes WSJF unique is that it aims to minimize the opportunity cost of outcomes left unrealized because of delays, i.e., the cost of delay (CoD), measured per unit of time such as a day, week, or sprint. More formally, CoD is the partial derivative of the total expected value with respect to time. Reinertsen first wrote about CoD back in the ’80s, and it was recently adapted for agile development by SAFe. Today, enterprises rely on WSJF to sequence agile jobs on their product roadmaps and to prioritize team backlogs.
WSJF is a brilliant technique, but its current definition is not perfect, and it can become counterproductive if its underlying assumptions are not well understood.
Figure 1 – WSJF technique as defined in SAFe 4.6
A practical interpretation of WSJF for agile organizations
Fundamentally, the goal of a prioritization technique is to optimize value (numerator) per unit of the primary constraint of a system (denominator). In agile development, jobs are prioritized by their WSJF rating, defined as CoD (numerator) divided by duration (denominator); when duration is not available, job size is used as a proxy for it. From the perspective of product managers, this definition serves well. For example, when an insurance product is launched sequentially in three different geographies with ready-to-purchase customers, what matters most to product managers is the launch date. Hence, the opportunity cost of unrealized demand is the value to optimize, and the entire duration of the development process, including both work time and wait time, is the primary constraint.
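To make the product manager's view concrete, here is a minimal Python sketch that rates jobs by CoD over duration and sequences them. The market names and the CoD and duration figures are hypothetical, not the values in Fig. 1, and `total_cod` assumes launches are delivered one after another, each accruing its CoD every day until it ships.

```python
from dataclasses import dataclass


@dataclass
class Job:
    name: str
    cod_per_day: float    # cost of delay per day (hypothetical figures below)
    duration_days: float  # expected duration: work time plus wait time


def wsjf_by_duration(job: Job) -> float:
    """WSJF rating from the product manager's view: CoD divided by duration."""
    return job.cod_per_day / job.duration_days


def sequence(jobs: list[Job]) -> list[Job]:
    """Order jobs from the highest to the lowest WSJF rating."""
    return sorted(jobs, key=wsjf_by_duration, reverse=True)


def total_cod(ordered: list[Job]) -> float:
    """Total delay cost when jobs are delivered one after another.

    Assumption: each job accrues its CoD for every day until it is delivered.
    """
    clock, total = 0.0, 0.0
    for job in ordered:
        clock += job.duration_days
        total += job.cod_per_day * clock
    return total


# Hypothetical launches in three ready markets (not the values from Fig. 1).
launches = [
    Job("Market C", cod_per_day=1, duration_days=10),
    Job("Market B", cod_per_day=3, duration_days=3),
    Job("Market A", cod_per_day=10, duration_days=1),
]
best = sequence(launches)
print([j.name for j in best])                # ['Market A', 'Market B', 'Market C']
print(total_cod(best), total_cod(launches))  # 36.0 189.0
```

Under that sequential-delivery assumption, the WSJF order yields a far lower total delay cost than the reverse order, which is the intuition behind dividing CoD by duration.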
For product owners, on the other hand, the primary constraint is team capacity, not job duration. By managing team productivity, they can influence the work time, but the wait time on backlogs is generally outside their purview, as it depends on the size of the funded capacity, the portfolio work intake rate, and the effective management of cross-team dependencies. Hence, product owners strive to find the most valuable jobs to fill their entire team capacity every sprint.
Let’s revisit the scenario introduced in Fig. 1 from the perspective of product owners. To simplify the illustration in Fig. 2 below, we include only features A and B, and we add a new column to capture job size.
Figure 2 – An illustration of WSJF from the perspective of product owners
If this team has a capacity of 10 people, feature A would consume the entire team for 1 day, while feature B would require only two-thirds of one person for 3 days. Following the rationale of the original scenario in Fig. 1, the product owner would wonder what the rest of the team would do between days 2 and 4. In a more realistic scenario, there are multiple feature A-type and feature B-type jobs on the backlog. In that case, the team could work on 15 feature B-like jobs simultaneously to use all of its capacity (Part-3 of Fig. 2).
Part-1 of the above illustration shows the original WSJF ratings based on job duration. In Part-2, we calculate revised WSJF ratings based on job size. In Part-4, we compare the total CoD of the resulting job sequences and find that the team should prioritize feature B-like jobs over feature A, the opposite of the conclusion shown in Fig. 1.
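The arithmetic behind Parts 1 through 4 can be sketched in Python as follows. The staffing levels follow the text (feature A uses all 10 people for 1 day; feature B uses two-thirds of a person for 3 days), but the per-day CoD values are hypothetical placeholders, and `total_cod` is a simplified model in which batches run back to back and jobs within a batch run in parallel.

```python
from fractions import Fraction
from typing import NamedTuple

TEAM_CAPACITY = 10  # people, as in the Fig. 2 scenario


class Job(NamedTuple):
    name: str
    cod_per_day: int   # hypothetical cost of delay per day
    people: Fraction   # staffing the job can absorb
    days: int          # duration at that staffing level

    @property
    def size(self) -> Fraction:
        """Job size in person-days."""
        return self.people * self.days


def wsjf_by_duration(job: Job) -> Fraction:
    return Fraction(job.cod_per_day) / job.days


def wsjf_by_size(job: Job) -> Fraction:
    return Fraction(job.cod_per_day) / job.size


def total_cod(batches: list[tuple[Job, int]]) -> int:
    """Total delay cost of batches run one after another.

    Jobs inside a batch run in parallel and must fit within team capacity.
    """
    clock, total = 0, 0
    for job, count in batches:
        if job.people * count > TEAM_CAPACITY:
            raise ValueError(f"{count} x {job.name} exceeds team capacity")
        clock += job.days
        total += job.cod_per_day * count * clock
    return total


A = Job("Feature A", cod_per_day=10, people=Fraction(10), days=1)
B = Job("Feature B", cod_per_day=3, people=Fraction(2, 3), days=3)

parallel_b = int(TEAM_CAPACITY / B.people)       # 15 B-like jobs fill the team
print(wsjf_by_duration(A), wsjf_by_duration(B))  # 10 1
print(wsjf_by_size(A), wsjf_by_size(B))          # 1 3/2
print(total_cod([(A, 1), (B, parallel_b)]))      # A first: 190
print(total_cod([(B, parallel_b), (A, 1)]))      # B first: 175
```

With these assumed numbers, duration-based WSJF favors feature A, size-based WSJF favors feature B, and running the 15 B-like jobs first gives the lower total CoD, mirroring the reversal described above.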
Does this mean that the “S” in WSJF should read “smallest” rather than “shortest”?
The ratio of CoD to the expected delivery time, or cycle time, is a practical way for product managers to calculate WSJF ratings to sequence epics and features on roadmaps. For high-level planning purposes, job size estimates can also be used as a proxy for job duration.
To prioritize individual team backlogs, however, delivery time (cycle time) is a poor basis for WSJF ratings, because it implicitly includes the wait time on backlogs, which backlog prioritization cannot directly improve.
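A small Python sketch illustrates the distortion, using two hypothetical stories that are identical in CoD, size, and work time, except that one is expected to wait five days in queues that backlog prioritization cannot shorten. The function name and all figures are illustrative assumptions.

```python
def wsjf_ratings(cod_per_day: float, size: float,
                 work_days: float, wait_days: float) -> tuple[float, float]:
    """Return (rating by cycle time, rating by size) for one backlog item."""
    cycle_time = work_days + wait_days  # delivery time includes queue time
    return cod_per_day / cycle_time, cod_per_day / size


# Hypothetical stories P and Q: same CoD, size, and work time;
# Q is expected to wait five days in other teams' queues.
p = wsjf_ratings(cod_per_day=4, size=2, work_days=1, wait_days=0)
q = wsjf_ratings(cod_per_day=4, size=2, work_days=1, wait_days=5)
print(p)  # (4.0, 2.0)
print(q)  # by cycle time Q ranks far below P; by size the two tie
```

Rated by cycle time, the waiting story drops far down the backlog; rated by size, the two stories tie, which reflects what the product owner can actually influence.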
The practical implication of this illustration is limited, as most agile organizations already use job size, not duration, when calculating WSJF for backlog prioritization. However, by understanding the rationale behind the formulas, agile organizations can better judge when and how to use a complex technique like WSJF effectively.
How about CoD?
In Fig. 1, we assumed that all three markets (A, B, and C) are ready for our product, and hence any delay has an opportunity cost. What if some markets are not yet ready for launch, or IT is not the longest pole in the tent, e.g., demand must first be built through a lengthy marketing campaign? What happens to CoD in these scenarios?
CoD applies only when a delay is expected to occur; otherwise, there is no CoD. During portfolio planning, agile organizations focus on "doing the right things" to minimize CoD by rightsizing investments, sequencing major jobs such as epics and features on product roadmaps, and funding team capacity. When investments, expected outcomes, and teams’ ability to deliver are all aligned, jobs are placed on a program backlog. At that point, a best practice is to reset the CoD to 0 so that the agile teams own any delays that occur during program execution. A CoD greater than 0 at the onset of a job would indicate that demand and capacity are not yet aligned and that the delivery teams are expected to catch up as they go.
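One way to encode this rule is sketched below in Python, assuming each job on the program backlog carries a committed date agreed during portfolio planning and a current forecast completion date. Both concepts, the function names, and the dates are illustrative assumptions rather than part of SAFe.

```python
from datetime import date


def effective_cod(cod_per_day: float, committed: date, forecast: date) -> float:
    """CoD rate used for prioritization: zero unless a delay is forecast."""
    return cod_per_day if forecast > committed else 0.0


def expected_delay_cost(cod_per_day: float, committed: date, forecast: date) -> float:
    """Opportunity cost of the forecast slip: CoD rate times days late."""
    days_late = max(0, (forecast - committed).days)
    return cod_per_day * days_late


# Hypothetical job: CoD of 5 per day once it slips past its committed date.
committed = date(2024, 6, 1)
print(effective_cod(5.0, committed, date(2024, 5, 28)))       # 0.0 (on track)
print(effective_cod(5.0, committed, date(2024, 6, 4)))        # 5.0 (slipping)
print(expected_delay_cost(5.0, committed, date(2024, 6, 4)))  # 15.0
```

Because the committed date already reflects aligned demand and capacity, a job starts with an effective CoD of 0 and acquires one only when the team's own forecast slips.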
During program execution, backlog prioritization becomes a vital part of “doing things right”, because it directly influences the scope and schedule of an entire program; for example, lowering priority may put certain jobs out of scope and increasing priority may accelerate their execution.
During execution, an epic or feature is decomposed into stories, which are placed on multiple team backlogs. These stories progress at different speeds; some fall behind and start accumulating CoD. Technically, CoD applies to a story only when it is on the critical path of an epic or feature, which might be its parent job or an unrelated job to which the story is linked through a dependency on another job. In short, the actual CoD of a job changes continuously throughout execution, and calculating CoD during program execution is no simple feat, even though that is when backlog prioritization matters most.
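Identifying which stories sit on the critical path is a classic critical path method (CPM) calculation. The Python sketch below runs a forward and a backward pass over a hypothetical dependency graph and returns the stories with zero slack; the story names, durations, and the flat parent CoD are illustrative assumptions, and real stories would typically span several team backlogs.

```python
from graphlib import TopologicalSorter  # Python 3.9+


def critical_stories(durations: dict[str, int], deps: dict[str, set[str]]) -> set[str]:
    """Return the stories with zero slack, i.e., on the critical path.

    durations: story -> remaining work in days (must list every story)
    deps: story -> stories that must finish before it can start
    """
    graph = {s: deps.get(s, set()) for s in durations}
    order = list(TopologicalSorter(graph).static_order())

    # Forward pass: earliest finish of each story.
    earliest_finish: dict[str, int] = {}
    for s in order:
        start = max((earliest_finish[p] for p in graph[s]), default=0)
        earliest_finish[s] = start + durations[s]
    end = max(earliest_finish.values())

    # Backward pass: latest finish that does not push out the end date.
    successors: dict[str, list[str]] = {s: [] for s in graph}
    for s, preds in graph.items():
        for p in preds:
            successors[p].append(s)
    latest_finish: dict[str, int] = {}
    for s in reversed(order):
        latest_finish[s] = min(
            (latest_finish[n] - durations[n] for n in successors[s]), default=end
        )

    return {s for s in graph if latest_finish[s] == earliest_finish[s]}


# Hypothetical stories: S3 depends on S1 and S2; S4 is independent.
durations = {"S1": 3, "S2": 1, "S3": 2, "S4": 2}
deps = {"S3": {"S1", "S2"}}
critical = critical_stories(durations, deps)
print(sorted(critical))  # ['S1', 'S3']

parent_cod = 12  # hypothetical CoD per day of the delayed feature
inherited = {s: parent_cod for s in durations}                         # applied to all
targeted = {s: parent_cod if s in critical else 0 for s in durations}  # critical path only
```

Here S2 and S4 have slack, so inheriting the parent's CoD would raise their priority without bringing the feature's delivery date any closer.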
Typically, agile organizations track critical path information at the epic level, and some at the feature level. When critical path information is not available, product owners either guess CoD manually or inherit it directly from an appropriate parent job. Both methods are extremely risky and may result in prioritizing the wrong jobs at the expense of the right ones. Here is why:
- When a job is prioritized, other jobs are deprioritized. Due to the cross-team dependencies and shared constrained resources, all team backlogs work as an integrated dynamic system. A job with a miscalculated CoD may have significant unfavorable implications beyond its immediate backlog.
- When a parent job is delayed, typically only a portion of its components are on the critical path. If the parent’s CoD is applied equally to all components, jobs that are not on the critical path are prioritized at the expense of others.
- Delays usually start at the most granular component level, such as stories. By the time an epic or feature is tagged with a potential delay, the recovery options may be severely limited.
To mitigate these risks, agile organizations should avoid oversimplified versions of the WSJF technique, which is inherently complex for good reason. Once they have designed an appropriate backlog prioritization method, they should test its effectiveness and assess its potential side effects and weaknesses specific to their operating environment in a controlled setting, such as a simulation model or a limited-scope pilot, before rolling it out to the entire organization. These tests should not only measure the value gained by the prioritized jobs but also articulate the impact on the deprioritized jobs. All results should be quantifiable and traceable. If these cautions are followed, agile organizations will realize visible improvements in outcomes and speed by successfully implementing the WSJF technique.
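As a starting point for such a test, the Python sketch below compares a baseline backlog order with a candidate order and reports, job by job, the change in finish day and in delay cost, so that losses on deprioritized jobs stay visible next to the gains. It assumes a hypothetical backlog, size-based WSJF as the candidate rule, and a team that works one job at a time at full capacity, which is a deliberate simplification.

```python
def completion_days(order: list[str], size: dict[str, float],
                    capacity: float) -> dict[str, float]:
    """Finish day of each job when the whole team works one job at a time."""
    clock, finish = 0.0, {}
    for name in order:
        clock += size[name] / capacity  # size in person-days, capacity in people
        finish[name] = clock
    return finish


def compare(baseline: list[str], candidate: list[str], size: dict[str, float],
            cod: dict[str, float], capacity: float) -> dict[str, tuple[float, float]]:
    """Per-job (change in finish day, change in delay cost), candidate minus baseline."""
    before = completion_days(baseline, size, capacity)
    after = completion_days(candidate, size, capacity)
    report = {}
    for name in baseline:
        shift = after[name] - before[name]
        report[name] = (shift, cod[name] * shift)
    return report


# Hypothetical backlog: sizes in person-days, CoD per day.
size = {"X": 20.0, "Y": 10.0, "Z": 20.0}
cod = {"X": 2.0, "Y": 5.0, "Z": 1.0}
baseline = ["X", "Y", "Z"]  # current backlog order
candidate = sorted(baseline, key=lambda n: cod[n] / size[n], reverse=True)  # WSJF by size

report = compare(baseline, candidate, size, cod, capacity=10.0)
for name, (days, cost) in report.items():
    print(f"{name}: {days:+.1f} days, {cost:+.1f} CoD")
print("net change in CoD:", sum(cost for _, cost in report.values()))
```

In this example, job Y finishes two days earlier, job X one day later, and the net change in delay cost is negative; a realistic simulation would add cross-team dependencies, parallel work, and uncertainty in the estimates.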
About the Author
Hakan Altintepe is a technology strategy and digital transformation advisor to senior executives at global enterprises. For over 20 years, he has focused on maximizing the business value of technology. He has designed and implemented numerous enterprise IT operating models at global companies and leading regional enterprises with up to 15,000 FTEs, $4B in IT operating expenses, and $2.5B in program budgets.
Hakan is the founder of Technology for Alpha LLC. Prior to that, he was a managing director at Accenture, a global technology consulting and services company, and worked as an engagement manager at A.T. Kearney, a global operations strategy firm.
Hakan specializes in business-IT alignment, strategy development, technology, data and analytics operating model design, investment planning/portfolio management, and transformation management.
Hakan’s current thought-leadership focus includes lean IT and agile organizations at scale. He authors the ‘Financial Services IT Goes Digital’ blog on CIO.com.
Hakan holds an MBA from Carnegie Mellon University and an M.A.Sc. in Electrical Engineering from the University of Ottawa.
Note on Fig. 2: When job duration can be derived from job size, i.e., no wait time is incorporated, both methods produce identical sequences. For example, if the sizes of features A and B were 10 and 30, respectively, both methods would say that feature A should be done first.