Time Estimates and Story Points

I would like to share an excellent write-up by Shawn Clowes, the JIRA Agile Product Manager at Atlassian. It was written a while ago, but the depth and detail of Shawn's reply to one of the questions on the Answers site still amazes us. It is his explanation of time estimates and story points in the agile context.





Here it goes:

I'd like to provide a full explanation of why we've offered 'Original Time Estimate' as an 'Estimate' value and not 'Remaining Estimate'. Some of my discussion refers to agile concepts that anyone reading probably knows well, but I've included them because the context is important. Note that the discussion refers to the best practices we've implemented as the main path in GreenHopper; you can choose not to use this approach if you feel it's really not suitable.

Estimation is separate from Tracking

In Scrum there is a distinction between estimation and tracking. Estimation is typically performed against Product Backlog Items (PBIs, usually stories) and is used to work out how long portions of the backlog might take to be delivered. Tracking refers to monitoring the progress of the sprint to be sure it will deliver all of the stories that were included. Tracking is often performed by breaking down stories into tasks and applying hour estimates to them during the planning meeting, then monitoring the remaining time in a burndown during the sprint.

Estimation is all about Velocity

The primary purpose of applying estimates to the PBIs is to use that information to work out how long it will take to deliver portions of the backlog.

In traditional development environments teams would estimate items in 'man hours' and these would be assumed to be accurate. They could then count up the hours in the backlog for a project, divide by the number of people on the team and hours in the week to reach a forecast date. Of course, these estimates often proved to be wildly inaccurate because they did not take into account the natural estimation characteristics of the team (for over/under estimation), unexpected interruptions or the development of team performance over time. The inaccuracy of the estimates, combined with the significant cost of the time spent trying to 'force' them to be accurate, makes the 'man hours' approach difficult if not impossible to make work.

So in the Scrum world most teams do not try to achieve estimation accuracy; instead they aim to achieve a reliable velocity. The velocity is a measure of the number of estimation units that a team tends to complete from sprint to sprint. After their first few sprints most teams will achieve a reasonably consistent velocity. Armed with velocity and estimates on the PBIs in the backlog, teams can look forward to predict how long portions of the backlog will take to complete.

The key is that it does not matter what the estimation unit is, just that from sprint to sprint it becomes reasonably predictable. For example, teams can choose to use 'ideal hour' estimates but it's neither necessary nor expected that those hours will have any close relationship to elapsed time. If a team has a 'man hour' capacity of 120h in each sprint but a velocity of 60h, that makes no difference, because the 60h velocity can still be used to estimate the number of sprints that portions of the backlog will take to complete and therefore the elapsed time. Many people then start wondering where 'the other 60 hours' went and implying that there is something wrong with team productivity. But that's usually got nothing to do with it: a team's estimates merely represent their view of how hard items will be, and they're always polluted by the team's natural behaviour (for example over/under estimation) as well as organisational overhead etc. The velocity is all that matters from a planning perspective.
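The forecasting step described above can be sketched in a few lines of Python. This is only an illustration: the function names, the sprint history and the backlog figures are all hypothetical, and averaging recent sprints is just one simple way to arrive at a velocity.

```python
import math


def average_velocity(completed_per_sprint):
    """Mean number of estimation units completed per sprint."""
    return sum(completed_per_sprint) / len(completed_per_sprint)


def sprints_to_complete(backlog_estimates, velocity):
    """Whole sprints needed to work through the given estimates."""
    return math.ceil(sum(backlog_estimates) / velocity)


# Hypothetical history: 'ideal hours' completed in the last three sprints.
# The team's raw capacity (say 120h per sprint) never enters the calculation.
history = [58, 62, 60]
velocity = average_velocity(history)

# Hypothetical slice of the backlog, estimated in the same unit (300h in total).
backlog = [40, 80, 30, 90, 60]
forecast = sprints_to_complete(backlog, velocity)

print(f"velocity: {velocity} units/sprint, forecast: {forecast} sprints")
```

The unit cancels out in the division, which is why the same sketch would work unchanged with story points instead of hours.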

Since the units are not related to time, most teams now choose to use story points (an arbitrary number that measures the complexity of one story relative to others) as their estimation unit. Story points clearly break the mental link with time.

Inaccurate Estimates are good, as long as they are equally Inaccurate

Velocity will only reach a stable state as long as the team estimates each backlog item with the same level of accuracy. In fact, it's probably better to say that each item should be estimated to exactly the same level of inaccuracy. At the risk of repeating the obvious, the goal of velocity is to be able to look at a backlog of not particularly well understood stories and understand how many sprints it will take to complete. This requires a similar level of uncertainty for all of the estimates that are in the backlog.

One of the counterintuitive implications is that teams should estimate each item once and not change that estimate even if they discover new information about the item that makes them feel their original estimate was wrong. If the team were to go ahead and update estimates, this 'discovery of new information' would happen regularly and result in a backlog that has some items with higher accuracy but most without. This would pollute velocity because sprints with a larger percentage of high accuracy estimates will complete a different number of units compared to those with a lower percentage of high accuracy estimates. As a result the velocity could not be used for its primary purpose: estimating the number of sprints it will take for a set of not well understood stories in the backlog to be completed. Therefore it's critical to use the first estimates so that the team's velocity realistically represents their ability to complete a certain number of units of not well understood work far ahead into the future.

But what about when teams realise they've gotten it wrong?

Consider the following scenario:
  • Issue X has an original estimate of 5 days.
  • The estimate was too optimistic, and before the next sprint is planned the team realises the issue will actually take 15 days.

Some people would argue that using the original estimate will endanger the sprint's success because the team will take what they think is 5 days of work into the next sprint when it's actually 15.

However, the inaccurate estimate of 5 days is unlikely to be an isolated occurrence; in fact the estimates are always going to be wrong (some very little, some wildly so). Often this will be discovered after the sprint has started rather than before. As long as the team estimates the same way across the whole backlog this will work itself out over time. For example, if they always underestimate they may find that for a 10 day sprint with 4 team members they can only really commit to 20 days of their estimation unit. If they have established a stable velocity this has no effect, because from a planning perspective we can still estimate how much work we'll get done in upcoming sprints with good certainty.
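To see why a consistent bias washes out, here is a rough Python sketch of the example above. It assumes, purely for illustration, that the team's real effort is always exactly twice its estimate and that there is no other overhead; the factor of 2 and the 100-day backlog are hypothetical figures chosen to match the 40-versus-20 numbers in the text.

```python
SPRINT_DAYS = 10
TEAM_SIZE = 4
capacity = SPRINT_DAYS * TEAM_SIZE  # 40 person-days available per sprint

# Assumption for this sketch: every item takes twice as long as estimated.
UNDERESTIMATION_FACTOR = 2

# What the team observes: only 20 'estimated days' get completed per sprint.
velocity = capacity / UNDERESTIMATION_FACTOR

# Hypothetical backlog slice, in the team's (biased) estimation unit.
backlog_estimated = 100

# Forecast using nothing but the biased estimates and the observed velocity.
sprints_from_velocity = backlog_estimated / velocity

# Forecast an all-knowing observer would make from the real effort.
real_effort = backlog_estimated * UNDERESTIMATION_FACTOR
sprints_from_real_effort = real_effort / capacity

print(sprints_from_velocity, sprints_from_real_effort)
```

Both forecasts agree because the bias appears in the backlog estimates and in the velocity alike, so it cancels. The cancellation only holds while the bias stays the same across items, which is the point the article makes about estimating everything to the same level of inaccuracy.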

But doesn't that break the Sprint Commitment?

When it comes time to start a sprint, the team can use the velocity as an indication of items from the backlog they can realistically commit to completing based on the amount they have successfully completed in the past. However, many people immediately question how that can be right when the original estimates will not include information about work that might have already been done or discovered information about how hard the work is.

As an example, consider the following scenario:
  • Issue X has an original estimate of 10 days.
  • The team works 5 days on the issue in the current sprint.
  • The team discovers a bad bug somewhere else in the project and decides that fixing that bug in the current sprint is far more important than completing issue X as planned.
  • The sprint gets finished and the issue returns to the backlog.

In the next sprint the team would be tempted to update the estimate for the issue to 5 days and use that to make their decision about whether to include it in the sprint. The implication is that they might not include enough work in the next sprint if they used its original estimate of 10d. However, the reason that the task was not completed previously is unplanned work. It's unrealistic to assume that won't happen again in the future, perhaps even in the next sprint, so the 10d is a realistic number to use in the absence of certainty. As a result the cost of the unplanned work that may happen is eventually accounted for in the original estimate. Even if the work does turn out to be insufficient for the next sprint, the team will correct that by dragging more work into the sprint.

In the same example, suppose this was the only issue in that sprint and will be the only issue in the next. If the issue is completed in the second sprint and we use the remaining estimate, the velocity will be (0d + 5d) / 2 = 2.5d, but the team can clearly complete more work than that in future sprints. If we use the original estimates the velocity will be (0d + 10d) / 2 = 5d. The use of the original estimate accounts for the fact that the team cannot commit to 10d in every sprint because unplanned work will likely make that impossible; it also realistically accounts for the fact that unplanned work will not happen in every sprint.
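The two velocity figures above can be reproduced with a short Python sketch. The function and variable names are illustrative only and do not correspond to anything in GreenHopper.

```python
def velocity(units_completed_per_sprint):
    """Average estimation units credited per sprint."""
    return sum(units_completed_per_sprint) / len(units_completed_per_sprint)


original_estimate = 10   # days, set once and never changed
remaining_estimate = 5   # days left after 5 days of work in sprint 1

# Sprint 1: the issue is not completed, so it earns 0 units.
# Sprint 2: the issue is completed and earns whichever estimate we count.
velocity_using_remaining = velocity([0, remaining_estimate])
velocity_using_original = velocity([0, original_estimate])

print(velocity_using_remaining)  # 2.5
print(velocity_using_original)   # 5.0
```

Counting the remaining estimate discards the 5 days of work done in the first sprint, which is why it understates what the team can complete.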

Why not estimate on sub-tasks and roll that up for Velocity and Commitment?

Many teams break down stories into sub-tasks shortly before the sprint begins so they can use the stories for tracking. This raises the possibility of using the sum of the estimates on the sub-tasks as a way to decide which issues to commit to in the sprint (and potentially for velocity).

As described above, tracking is really a separate process from estimation and velocity. The estimates that are applied to the sub-tasks are clearly higher accuracy than those that were originally applied to the story. Using them for velocity would cause the velocity to have both high and low accuracy estimates, making it unusable for looking further out in the backlog where stories have only low accuracy estimates. In addition, only items near the top of the backlog are likely to have been broken into tasks, so using task estimates for velocity means that the velocity value could only ever predict the time to complete the backlog up to the last story that has been broken into tasks.

Using the sub-task rollup to decide the sprint commitment would also be dangerous because, unlike the velocity value, it does not take into account the overhead of unplanned work or interruptions.

Conclusion

Many industry leaders are moving away from hour estimates of any sort. This makes sense because the main questions to be answered are 'How much work can we realistically commit to completing this sprint?' and 'How long will this part of the backlog take to deliver?'. A story point approach based on original estimates can deliver the answers to these questions without the anxiety around 'accuracy' that teams feel when asked to estimate in hours.

The GreenHopper team itself uses the approach described in this article and has established a reliable velocity that we have used to plan months in advance, even when new work has been encountered during those months.

We recommend this approach because while it is sometimes counter-intuitive it is also powerful, fast and simple.

All of that said, one of the key precepts of Agile is finding the way that works for you. So GreenHopper does support the alternatives described above, including the use of remaining estimates for sprint commitment, hours for estimation and hour estimates on sub-tasks.
