Brian the BA learns about the INVEST criteria

31 Jan



Brian the BA – explains specification by example

29 Jan


User Story Smells

24 Jan


“User story smells” is a term used by Mike Cohn in User Stories Applied. It describes anti-patterns that happen when writing user stories. Mike Cohn provided a number of story smells.

With 9 years of Business Analysis experience, I decided to write my top 10 story smells. They are based on my observations. I’ve even created a game for people to try.



Smell 1 – Everything in a Sprint should be written as a user story

This seems to happen with less experienced agile teams. They use the story format for everything in a Sprint (e.g. As a developer … I want … So that).

Why is it bad? User stories are written from the perspective of end users. They ensure what you build is anchored on a user need. Technical tasks can be sub-tasks of user stories (preferred option), or just tasks that need to be done to keep the lights on (e.g. renew a cert).

User stories are one type of item in the product backlog. Other types of item include bugs, tasks, epics and spikes. Items can be in a Sprint without being user stories. Don’t spend time thinking about how a technical sub-task can fit into the user story format.


Smell 2 – Stories should be sliced by technology layer, because that’s how our development team will approach them

Teams can have different groups of developers (e.g. front end and backend developers). There can be pressure to slice stories accordingly, because each story will be done by a different development team. Another reason is that breaking it down by technology layer removes a dependency on other developer teams. This is an artefact of how the development team is split.

The problem with this approach is that technology slices do not produce a valuable deliverable for the end user. The front end slice must plug into the backend to add value. Vertical slices of functionality are preferred to horizontal technology slices. Vertical slices are much more likely to be potentially shippable.



Smell 3 – Stories don’t need acceptance criteria

This is a strange one – I’ve seen it before. The idea is that the BA/product team should not solutionise. They should present the user need/story to the developer and not come with a list of acceptance criteria/constraints.

The problem is – you need a clear outcome for a story. And there are often clear requirements from the business, or constraints to be considered. Just putting AS I … I WANT … SO THAT and leaving out the acceptance criteria means you won’t know when a ticket is done. It’s not specific enough.

Collaborative specifications, or collaborative specification reviews (e.g. 3 Amigos) work around this. Stories have to have acceptance criteria in order to be testable and closeable.



Smell 4 – The product owner is a user

One of the most common smells. The product owner is a proxy for the user, but 9 times out of 10 they’re not the end user of the service.

The user in a user story has to be an end user of the system. They can be personas/types of user (e.g. admin, front line staff, loyal user etc). Product Owners, BAs, members of the dev team are not the end users.

Writing AS A product owner I WANT something SO THAT value isn’t a user story.



Smell 5 – Acceptance criteria must specify how features look & behave

Some developers like lots of detail. And that’s OK … but generally speaking acceptance criteria specify behaviour (i.e. what the system does in certain scenarios). They don’t need to specify how it looks.

There can be times when describing how a feature looks is useful – or even necessary. Generally attaching a visual or link to a component library is sufficient.

A picture is worth a thousand words.



Smell 6 – System-wide NFRs should be written as user stories

NFRs are tricky. There are obviously NFRs that affect the end user e.g. system availability. They can convincingly be written in the user story format.

One problem with writing system-wide NFRs as user stories (e.g. availability, system backups) is that they cut across the entire system. It’s difficult to test these NFRs until the entire system is built. I prefer to have system wide NFRs either as “definition of done criteria” which get tested against each ticket, or as items for regression testing at the end of a release.

Story-specific NFRs might be written as ACs against a ticket (e.g. audit log for a reduction decision).



Smell 7 – Specifying what the user wants is enough!

I’ve seen several people excluding the 3rd line of a user story. It’s the reason why the user wants something – the 3rd line helps us to understand why we’re doing the work.

The 3rd line of the user story (So that … ) can be driven from user research, or observations, or data analytics etc. Either way we need to understand the why before we start to solve the problem. At the minimum a story needs to include “So that”. This helps with prioritisation.



Smell 8 – User stories should be incredibly detailed

User stories should specify the appropriate level of information. There’s a tendency from BAs, and sometimes from the development team, to try to put all the information they have into a ticket.

Having a ticket that is too detailed adds little value. It makes it likely that people will scan over the ticket and miss the most important information. An incredibly detailed ticket is not necessarily better than a less detailed ticket – it’s about having the appropriate level of information.

As a story is worked on it might be that more detail emerges. But a story should contain enough information for the team to develop and test it.



Smell 9 – User stories can depend on other stories in the Sprint

Ideally user stories should meet the INVEST criteria. That means each story should be independent.

Unless it’s agreed at Sprint planning & made visible on the ticket – all user stories should be independent. There may be cases where two dependent stories are brought into the same Sprint – however the goal should be that stories do not depend on other stories.



Smell 10 – Stories should be very small

This is more for teams that are using Gherkin & TDD, however some teams aim to have very small user stories. Almost at the level of a handful of scenarios.

One advantage of smaller user stories is that we can track progress in a Sprint to a more granular level. But a note of caution – small user stories are essentially a grouping of scenarios. They can make the Sprint board less manageable and in themselves deliver very little value to a user. It is difficult to make very small stories both independent and valuable.


The game

Here’s a link to a game we created. It lists the 10 smells + 10 example bad user stories. See if you can match them:

It’s a great team exercise – with either a product or a BA team. It helps reiterate some of the key points above. And makes examples tangible.

Any smells I’ve missed? Enjoy!


Bit of BA humour from me …

24 Jan



Try this at your next Agile meetup. Let me know your feedback!!

24 Jan


New cartoon published

26 Nov

My latest cartoon was published on Modern Analyst. Very happy with it:

10 lessons from A/B testing

24 Aug


We implemented A/B testing into our product 6 months ago. During that time we conducted a variety of A/B tests to generate insights about our users’ behaviour. We learnt a lot about our specific product. More generally, we learnt about how to run valuable A/B tests.

Below is a Buzzfeed-esque TOP 10 LESSONS I learnt RUNNING A/B TESTS. It’s tips & tricks – plus things to avoid doing. It’s written from a product/BA perspective.




Lesson 1: A/B vs MVT testing

A/B and MVT testing are very similar. In fact the terms are sometimes used interchangeably.

A/B and MVT tests both serve up different experiences to the audience and measure which experience performs the best. They are both run with the same 3rd party tools (e.g. Optimizely, Maxymiser) and have similar experiment lifecycles.

The key difference between A/B and MVT tests is how many elements they vary to the audience.

A/B test

This is where you change one element of a page (e.g. the colour of a button). You might compare a blue button (challenger) against a red button (control) and examine what effect the button’s colour has on user behaviour. For example:

| Button Colour | Variant Name |
| Blue          | Challenger   |
| Red           | Control      |

Pros: Simple to build, faster results, easier to interpret results

Cons: limited to one element of a user experience (e.g. button colour)

As a note – A/B tests aren’t limited to 2 variants. You could show a blue button, red button, purple button etc; as long as you change only one element of an experience (button colour) it’s an A/B test.

MVT test

This is where you change a combination of elements. You might compare changing the button colour and its text label. You would test all combinations of those changes and see what effect it has on user behaviour. For example:

| Button Colour | Text Copy  | Variant Name |
| Blue          | Click here | Challenger 1 |
| Blue          | Click      | Challenger 2 |
| Red           | Click here | Challenger 3 |
| Red           | Click      | Control      |

Pros: Greater insights, identifies the optimal user experience, more control

Cons: Longer to get results, more complex, requires more traffic
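
To see why an MVT test needs more traffic, it can help to list out the combinations. Below is a minimal Python sketch, assuming the two elements from the table above. The dictionary keys and the function name are my own and purely illustrative; they don't come from any testing tool.

```python
from itertools import product

# Elements and their options, taken from the MVT table above.
# The key names are hypothetical labels for illustration.
elements = {
    "button_colour": ["Blue", "Red"],
    "text_copy": ["Click here", "Click"],
}


def mvt_variants(elements):
    """Return every combination of element options (a full-factorial design)."""
    names = list(elements)
    return [dict(zip(names, combo)) for combo in product(*elements.values())]


variants = mvt_variants(elements)
for variant in variants:
    print(variant)
print(len(variants), "variants")
```

Two elements with two options each give 4 variants. Adding a third element with 3 options would give 12, and each one needs its own share of the audience.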

Which one to pick?

This depends on what you want to test & your testable hypothesis. In the early stages of running experiments you might start with A/B tests and then move onto MVT tests. This is because A/B tests are simpler to create & interpret. MVT tests are slightly more complex but provide greater product insight.

As an example: we ran an MVT experiment where we changed the promotional copy on a page and a CTA label. We thought both elements would impact the click-through rate. Taken individually, the winning promotional copy was emotive copy and the best CTA was “Get started”. However, the optimal variant was descriptive copy with “Get started”. Why? Perhaps because the tone between the two elements was more aligned. If we had run this as 2 A/B tests then we wouldn’t have identified the optimal combination.


Lesson 2: Have a clear hypothesis


An experiment is designed to test a hypothesis. The purpose of an experiment is to make a change and analyse the effect. Tests need to have a clear reason and a measurable outcome.

When creating an A/B test it’s crucial to create a clear hypothesis. What is the problem you’re trying to solve? What are the success metrics? Why do you think this change will have an effect?

We use a variation of the Thoughtworks format to write testable hypotheses:

We predict that <change>

Will significantly impact <KPI/user behaviour>

We will know this to be true when <measurable outcome>

By having clearly defined hypotheses we can:

  1. Compare the merits of different hypotheses and select the most valuable one first. For example if hypothesis 1 predicts a 5% uplift in a KPI and hypothesis 2 predicts a 50% uplift in the same KPI, then we would test hypothesis 2 first.
  2. Agree the success metric upfront before starting development. For example if changing the mobile navigation is the test, what are the success metrics: more users clicking on the menu button, more items in the menu being clicked, increased usage and retention of brand new users? Having clear success metrics/goals is key when trying to identify the winning variant later on.
  3. Ensure the test is focussed on solving a user problem or improving a KPI that matters to the product. We don’t want to run tests simply because we can – they need to solve problems and offer benefits. The above format aligns each test with business KPIs/user problems.
  4. Make it incredibly easy for anyone to generate a hypothesis. The Thoughtworks format means that anyone in our team can generate a hypothesis. Some of the best ideas we’ve had are from “non-creatives” such as QA.

Note – we often put a “background” section with research in the testable hypothesis (e.g. how many people currently use a feature, industry average, user feedback etc).


Lesson 3: Forecast sample size

When designing an A/B experiment it’s crucial to calculate the sample size. You will need to forecast the sample size required to detect the MDE (Minimum Detectable Effect). This forecast will inform:

  1. Whether you can run the experiment (do you have enough users?)
  2. The maximum number of variants you can create
  3. What proportion of the audience will need to be in the experiment
  4. Potentially the experiment duration (e.g. it will take 2 weeks to get that many users)

There are several tools online to help you forecast. Without upfront forecasting you run the risk of creating an experiment that will never reach an outcome.

For example: imagine your product has 100k weekly users. You plug in the numbers and forecast that each variant requires 22k users to detect a 0.05 statistical effect size. That means you should build no more than 4 variants, otherwise you won’t detect a significant result. At least 44% of users need to be in the experiment (22% see a variant, 22% see a control). If the change is radical, based on these numbers you may only want to create one variant; this is because you don’t want to show the experiment/significant UX changes to a large proportion of the audience.
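
For anyone who wants to sanity-check an online calculator, here is a rough Python sketch of the kind of arithmetic involved. It assumes a two-sided test comparing two conversion rates using the standard normal-approximation formula, which may not be what your A/B tool uses. The function names, the default significance level and the default power are my assumptions. The 22k and 100k figures are the ones from the example above, and I'm reading "4 variants" as 4 groups in total, including the control.

```python
import math
from statistics import NormalDist


def users_per_variant(baseline_rate, mde, alpha=0.05, power=0.8):
    """Approximate users needed in each group to detect an absolute lift of `mde`.

    Uses the normal-approximation formula for comparing two proportions.
    """
    p1 = baseline_rate
    p2 = baseline_rate + mde
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided significance
    z_power = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_power) ** 2 * variance / mde ** 2)


def max_groups(available_users, per_variant):
    """How many groups (control included) the audience can fill."""
    return available_users // per_variant


def min_traffic_share(available_users, per_variant, groups=2):
    """Share of the audience needed to fill `groups` groups."""
    return groups * per_variant / available_users


# The worked example above: 100k weekly users, 22k users per group.
print(max_groups(100_000, 22_000))                  # 4
print(f"{min_traffic_share(100_000, 22_000):.0%}")  # 44%
```

The smaller the effect you want to detect, the more users each group needs, so halving the MDE roughly quadruples the sample size.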


Lesson 4: More variants the better

Optimizely ran an analysis of their customers’ successful A/B tests. What they found was interesting. The more variants run in an experiment (up to a limit), the more likely you are to find an effect. Why?

One reason is that if you ask UX to create 2 variants they may create two similar visuals. If you ask them to create 8 there might be greater differences between them. It’s likely with 2 variants you’re playing it safe. The Optimizely results suggest running about 5 variants in a test.


Lesson 5: Implement a health metric

The purpose of a health metric is to ensure that an experiment doesn’t maximise one KPI (the experiment’s primary goal) to the detriment of other KPIs. Popular health metrics include: average weekly visits, content consumption, session duration etc. Essentially health metrics are key business KPIs you don’t want to see go down during an experiment. If the health metric fails, then you pull the experiment early, or do not release the winning variant.

For example: imagine you have 3 variants of a sign-in prompt. One variant of the prompt is non-dismissible. If your primary goal is to maximise sign-ins then this variant will win. However the variant could be so annoying that it reduces overall user engagement with the product. Your health metric ensures you don’t maximise sign-ins to the detriment of core product KPIs (e.g. average weekly sessions).
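
A health check can be as simple as a guardrail rule agreed before the test starts. The sketch below is a hypothetical Python version: the function name, the 2% tolerance and the session numbers are all made up for illustration. In practice you would probably also want to test whether the drop is statistically significant rather than compare raw averages.

```python
def health_check_passes(control_value, variant_value, max_relative_drop=0.02):
    """Return True if the variant's health metric has not fallen more than
    `max_relative_drop` (as a fraction) below the control's value."""
    if control_value <= 0:
        raise ValueError("control_value must be positive")
    relative_change = (variant_value - control_value) / control_value
    return relative_change >= -max_relative_drop


# Hypothetical numbers: average weekly sessions per user.
control_sessions = 4.0
non_dismissible_prompt_sessions = 3.5  # a 12.5% drop against control

if not health_check_passes(control_sessions, non_dismissible_prompt_sessions):
    print("Health metric failed: pull the variant, even if sign-ins went up")
```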

In our case – the BA worked with stakeholders/the product owner to identify & track the health metrics. The health metrics will vary depending on the product.



Lesson 6: Get management buy-in

Based on experience, I recommend getting management buy-in early on. A/B testing is a significant culture change. It challenges the idea that a Product Owner/UX/Managers know what the best user experience is. It replaces gut decisions with data-based decisions. Essentially A/B testing can transition a team from a HIPPO culture (HIghest Paid Person’s Opinion) to a data-driven culture.

To get management buy-in for A/B testing there are a variety of tactics:

  1. Ensure the 1st A/B test you run offers real business value. Don’t run a minor/arbitrary change as your 1st test. Try to solve an important problem or turn the dial on a key business KPI. Even better if the result might challenge existing beliefs.
  2. Reiterate the benefits of A/B testing. These include:
    • Increasing collaboration by empowering the team to generate their own hypotheses, which can be delivered as “small bets”
    • Increasing openness by encouraging a data-driven culture to decision making, rather than a HIPPO culture
    • Increasing innovation by learning more about user behaviour and adapting the product
    • Increasing innovation because delivering changes to a sub-set of the live audience means you can experiment more and take more risks
    • Challenging assumptions and decisions to create a more valuable product. Gut feelings can be wrong
    • Small bets are better than big bets. They are less risky & can have significant user benefits
    • Empowering the team to improve the quality of solutions
  3. Create experiments in collaboration with the entire team so that it’s not seen as a threat to the PO/UX
  4. Create a fun testing environment. Get people to place bets on the winner.


Lesson 7: Assumptions can be wrong

We’ve had several examples of where our assumptions about user behaviour were wrong.

Our 1st A/B test was a prompt. We thought it would increase usage of a new service. We were so confident about it as an in-app notification that we were going to make it a re-usable component. We actually had 3 more prompts on the roadmap.

What did we find out with an A/B test? The prompt significantly reduced general usage of the app. It was a dramatic drop in usage. The results challenged our assumptions and changed our roadmap.

By having a control group that we could compare against & by serving the experiment to a sub-set of the audience we were able to challenge our assumptions early & with a relatively small subset of users.
We never put the prompt live. Test your assumptions.


Lesson 8: Broadly it’s a 6-step process

This is a slight simplification – below is the typical lifecycle of an experiment.

STEP 1 – Business goals

Identify the business goals (KPIs) and significant user problems for your product.

STEP 2 – Generate hypotheses

Generate testable hypotheses to solve these goals/problems. Prioritise the most valuable tests.

STEP 3 – Create the test

  • Work with UX & developers to create n variants
  • Forecast the number of users required for the MDE
  • Decide on traffic allocation (e.g. 50% see A, 50% see B)
  • Identify target conditions (e.g. only signed in users, only 10% of users)
  • Implement conversion goals (one primary and optional secondary goals)
  • Implement the health check
  • Set the statistical significance level
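
Tools like Optimizely handle traffic allocation and targeting for you, but the idea underneath is simple. Here is a minimal Python sketch of one common approach, which is to hash the user ID into a stable bucket. This is an illustration under my own assumptions, not how any particular tool works. The function names, the bucket count and the experiment name are hypothetical.

```python
import hashlib

BUCKETS = 10_000  # resolution of the split (an arbitrary choice)


def bucket(user_id, experiment):
    """Map a user to a stable bucket in [0, BUCKETS) for this experiment."""
    key = f"{experiment}:{user_id}".encode("utf-8")
    return int(hashlib.sha256(key).hexdigest(), 16) % BUCKETS


def assign(user_id, experiment, exposure=0.10, variants=("A", "B"), signed_in=True):
    """Return the variant for this user, or None if they are not in the experiment."""
    if not signed_in:  # target condition: only signed in users
        return None
    b = bucket(user_id, experiment)
    if b >= round(exposure * BUCKETS):  # target condition: only 10% of users
        return None
    return variants[b % len(variants)]  # even split, e.g. 50% see A, 50% see B


print(assign("user-123", "sign-in-prompt"))
```

Because the bucket comes from a hash of the user ID and the experiment name, the same user always sees the same variant, and ramping up is just a case of raising `exposure`.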

STEP 4 – Run the experiment

  • Run the experiment for at least 1 business cycle
  • Actively monitor it
  • Potentially ramp up number of users

STEP 5 – Analyse results

  • Review the performance of variants
  • Analyse the health check
  • Identify winner
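
Your A/B tool will normally report significance for you. If you want to check a result by hand, one textbook option is a two-proportion z-test. The Python sketch below assumes that test and uses made-up conversion numbers. Your tool may well use a different statistical method, so treat this as a rough cross-check only.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Compare the conversion rates of control (a) and a variant (b).

    Returns (z, p_value) for a two-sided test using the pooled standard error.
    """
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("cannot test when nobody or everybody converted")
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Hypothetical results: 10,000 users in each group.
z, p_value = two_proportion_z_test(conv_a=1000, n_a=10_000, conv_b=1200, n_b=10_000)
print(f"z = {z:.2f}, p = {p_value:.6f}")
print("significant at the 5% level" if p_value < 0.05 else "not significant")
```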

STEP 6 – Promote the winner

  • Promote the winner to 100% of the audience
  • Learn the lessons
  • Archive the experiment


Lesson 9: Make testing part of the process

When we started A/B testing we committed to run 3 tests in the first quarter. It was a realistic target. It meant we were either developing a test, or analysing the results of a test (tests typically ran for 2 weeks). The more tests we ran, the easier they were to create.

Getting into a regular cycle is important in the early stages. For any feature or change you should ask “Could we A/B test that?”

I have seen several teams “implement A/B testing” and only run 1-2 tests. The key to getting value from A/B testing is to make it part of the product development lifecycle.


Lesson 10: There’s a community out there …

There are a huge number of resources out there.


I learnt a huge amount from Olivier Tatard, Sibbs Singh, Sam Brown, Toby Urff and the folks at Optimizely. Big thanks also to the rest of the app team, we all went on the journey together.

If you made it down this far then you get 10 bonus points.


Applying Build, Measure, Learn to Sprint Demos

25 Jul


Like most Scrum teams, we held a “Sprint Review Meeting” every two weeks. We would gather as a team to demo what was recently built & receive feedback. Although it was a great opportunity to showcase recent work, we identified a number of problems with “Sprint Review Meetings” for our mature product:

  1. Stakeholder attendance was poor. Stakeholders saw the Sprint Review Meetings as a technical show & tell. The demos often didn’t work fully & business value wasn’t necessarily communicated.
  2. Because developers demoed the work, it put disproportionate pressure on the development team. We presented recent work & we often had problems with test environments/connections/mock data etc.
  3. More generally – the development team wanted regular updates from the product team. Our retros identified a need for the product team to provide regular updates about recent features; did a recently released feature meet our hypothesis? What did we learn? Will we iterate? How did it impact our quarterly OKRs?
  4. Sprint Review Meetings felt like a conveyor belt. We would demonstrate work, get feedback about quality, and then watch it leave the factory. But we wanted to learn how customers actually used the new product. We wanted external as well as internal feedback.


Build, Measure, Learn (BML) sessions

To address the above issues, we replaced Sprint Review Meetings with “Build, Measure, Learn” sessions. As advocates of the Build, Measure, Learn approach – we were keen to review recently released features with the team. We launched features every 2 weeks – so the natural cadence was to report on features at the end of the following Sprint.

The basic format is simple:


  • Every 2 weeks, at the end of the Sprint. Replaces the Sprint Review Meeting.
  • Team (Product, Devs, UX) & Stakeholders.
  • 1 hour.


The session is divided into two sections:

  1. Build = demo from the development team about what was built during the Sprint. It’s a chance to get feedback from the Product Owner/Stakeholders.
  2. Measure/Learn = product reporting back on stats/usage/insights of recently launched features. Typically on features & changes launched 2 & 4 weeks ago. This provides an external feedback loop.

The Measure/Learn section became as valuable as the demo section. It also provided practical breathing space for setting up/fixing demos – if we had problems we would start off with the Measure/Learn section 😉


Build section

As with the Sprint Review meeting – this section was the development team demoing what was built during the Sprint.

This was an opportunity for product/stakeholders to provide feedback and ask any questions. Changes were noted by the BA and put on the product backlog.

It was also an opportunity to praise the team & celebrate success.


Measure/Learn section

In the Measure/Learn section the BA or Product Owner would cover the following areas:

  1. General product performance: how we are performing against quarterly goals/OKRs
  2. For each recently released feature:
    • Present the testable hypothesis
    • Present the actuals. Key trends/unexpected findings/verbatim feedback from the audience about the feature
    • Present key learnings/actions: Build a v2/pivot/stop at v1/kill the feature?
  3. Wider insights (optional):
    • Present recent audience research/lab testing
    • Present upcoming work that UX are exploring & get feedback on it



We found that BML sessions were a great replacement for Sprint Review Meetings. They ensured we kept the measurement & learning part of the lifecycle front and center in the team. The Measure/Learn section also ensured we reported back on business value regularly.

Main benefits:

  1. Learnings/insights about recently released features were shared with the team – this kept us focused on our original hypotheses and business value. It enabled us to discuss the learnings based on external audience feedback.
  2. Encouraged a shared sense of ownership about the end of Sprint session and the performance of features
  3. Increased stakeholder attendance & stakeholder engagement as there was a focus on audience feedback and KPIs
  4. We were still able to demo the newly developed features & get Product Owner/Stakeholder feedback

Simpsons humour

15 Jul


How Might We … brainstorm ideas

13 Jul


“How Might We …” is a group brainstorming technique we have used for over 6 months to solve creative challenges. It originated with Basadur at Procter & Gamble in the 1970s, and is used by IDEO/Facebook/Google/fans of Design Thinking.

“How Might We …” is a collaborative technique to generate lots of solutions to a challenge. Our team modified the technique slightly to ensure that we also prioritise those solutions. More on that below …

In essence “How Might We …” frames problems as opportunity statements in order to brainstorm solutions. For example:

  • How Might We promote our new service to the audience?
  • How Might We improve our membership offering?
  • How Might We completely re-imagine the personalisation experience?
  • How Might We find a new way to accomplish our download target?
  • How Might We get users excited & ready for the Rio Olympics?

How Might We works well with a range of problem statements. Ideally the question shouldn’t be too narrow or broad.



How Might We sessions involve a mixture of participants: product (Product Owner/BA), technical (Developers/Tech Lead/QA) and stakeholders. The duration is 1 – 1.5 hours.

The format is:

  1. Scene setup (background/constraints/goals)
  2. Introduce the question (How Might We …)
  3. Diverge (generate as many solutions as possible)
  4. Converge (prioritise the solutions)


1. Scene Setup

Scene setup is about introducing the background, constraints, goals & groundrules of the How Might We session.

For example we held a session about: “How Might We get app users excited & ready for the Rio Olympics?” We invited 10 participants across product, technical and stakeholder teams. For 5 minutes we set up the scene. As part of scene setup:

  • Background: Rio 2016 is the biggest sporting event. We expect record downloads & app traffic. There will be high expectations. There will be hundreds of events & hours of live coverage.
  • Constraints: We want to deliver the best possible experience without building a Rio specific app.
  • Session goal: Generate ideas for new features & to promote current features.
  • Commitment: We will take the best ideas forward to explore further.


2. Introduce the question

The How Might We question is presented to participants and put on a wall/physical board.

The question shouldn’t be too restrictive; wording is incredibly important. Check the wording with others before the session. We circulate the question to participants ahead of the session – this allows them to generate some solutions before the meeting.

Framing the question in context/time will help. It makes the problem more tangible. For example:

“It’s 3 days before the Olympics. How Might We get users excited & ready for the Rio Olympics?”


3. Diverge

Use a technique like Crazy 8s to generate ideas. Give people 5-10 minutes to think of many solutions to the question.

These solutions are typically written on post-it notes. At the end of 10 minutes we ask each participant to stand up and present their post-it notes ideas to the group. Participants explain their ideas; common ideas are grouped together. For example:

[Image: post-it note ideas]

With 10 participants you can generate 50 – 80 ideas. Once ideas are grouped together you can have 20 – 30 unique ideas.


4. Converge

We ask people to pick their favourite idea. It can be their own idea, or another person’s post-it note idea.

For 10-15 minutes they explore that idea in more detail. Participants can add notes/draw user flows/write a description about the idea.

At the end of this time, each participant is asked to present back their idea to the group. For example:

[Image: idea example]

Once each participant has presented their idea (10 people = 10 ideas), participants are invited to dot vote. Each participant has 3 votes to select their favourite 3 ideas.

Typically this is where a HMW ends ….

BUT we would often find ourselves in a position where the top voted idea was the most difficult to implement. The top ideas were often elaborate & had a cool factor – but were very complicated to build/offered limited business value. For example: “We could build VR into the app. It would offer all sports in immersive 3D and recommend videos based on the user’s Facebook likes”.

AND we found that stakeholders weren’t comfortable having an equal say (3 dot votes) to QA/developers in terms of the product proposition.

SO we implemented a further step to converge on more realistic options. We took the top voted ideas + any ideas that stakeholders were particularly keen on from the How Might We session. We allowed UX to explore these ideas in more detail. An example of a more refined idea is an Olympics branded menu:

[Image: Olympics branded menu]

We took these ideas into the prioritisation session.



With the more refined ideas we held a prioritisation session with the key stakeholders (product owner, tech lead, primary stakeholders).

As a group we would rank these ideas in terms of business value and technical complexity (1-5). The business value was driven by a KPI or agreed mission. The technical complexity was an estimate of effort.

Complexity 5 = hard

Complexity 1 = easy

Impact 5 = high impact

Impact 1 = low impact

We would end up with a relative ranking of the top ideas. For example:

[Image: cost/value example]

The top left quadrant is tempting (high impact, low effort). The bottom right quadrant is not tempting (low impact, high effort).
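
If you capture the scores in a spreadsheet or script, the quadrants fall out quite easily. Here is a small Python sketch of the idea. The scores for the two example ideas are invented for illustration, and treating a score of 3 or more as "high" is my assumption rather than a rule we followed.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    name: str
    impact: int      # 1 = low impact, 5 = high impact
    complexity: int  # 1 = easy, 5 = hard


def quadrant(idea, midpoint=3):
    """Place an idea in one of the four impact/effort quadrants."""
    impact = "high impact" if idea.impact >= midpoint else "low impact"
    effort = "high effort" if idea.complexity >= midpoint else "low effort"
    return f"{impact}, {effort}"


def rank(ideas):
    """Most tempting first: highest impact, then lowest complexity."""
    return sorted(ideas, key=lambda idea: (-idea.impact, idea.complexity))


# Hypothetical scores for two ideas mentioned in this post.
ideas = [
    Idea("VR in the app", impact=2, complexity=5),
    Idea("Olympics branded menu", impact=4, complexity=2),
]
for idea in rank(ideas):
    print(idea.name, "->", quadrant(idea))
```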

We used the relative weightings & dot voting to select the best idea. We would go on to shape & build the best idea.