Frank Hopkins

Introduction

Performing ad-hoc analysis for stakeholders can be time-consuming, and there are a few questions that I get asked on a reasonably frequent basis. So I have been developing some tools in R for my “non-technical” colleagues to use.

One of the most commonly asked questions is “How big of a sample do I need to achieve significance?”, which is often followed by “How long do I need to run my experiment for?”. To answer these, I have written some simple code: all the user needs to do is pass a few baseline numbers into two functions I have created, and they can determine their sample size requirements and experiment duration on an ad-hoc basis.

Sample size, statistical power and experiment duration

Luckily, given a few simple pieces of information, the pwr package in R can answer both questions with relative ease. pwr helps you perform a power analysis before conducting an experiment, which tells you how large your sample needs to be in each experimental condition.

The four quantities involved in a power analysis are closely related, so we can compute any one of them given the other three:

1. sample size (n)

2. effect size

3. significance level (alpha) = P(Type I error) = probability of finding an effect that is not there

4. power = 1 - P(Type II error) = probability of finding an effect that is there

Because the significance level (3) and power (4) are usually fixed (typically at 0.05 and 0.8), you only need the effect size (2) between your control and variant to determine the required sample size (1).
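The relationship between these quantities can be seen in the normal approximation behind a two-sided test on an effect size h: n = ((z(1 - alpha/2) + z(power)) / h)^2, where z() is the standard normal quantile function. The sketch below is a minimal base-R illustration under that approximation. n_for() and power_for() are hypothetical helpers, not pwr functions, and their results closely approximate, but may differ slightly from, what pwr.p.test() solves numerically.

```r
# Hypothetical helpers (not part of pwr) illustrating how sample size,
# effect size, significance level and power determine one another,
# using the normal approximation for a two-sided test on effect size h.

# Solve for the sample size n, given h, alpha and power
n_for <- function(h, sig.level = 0.05, power = 0.8) {
  z_alpha <- qnorm(1 - sig.level / 2)  # two-sided critical value
  z_beta  <- qnorm(power)              # quantile corresponding to the target power
  ((z_alpha + z_beta) / abs(h))^2
}

# Solve for power, given h, n and alpha
# (ignores the negligible chance of rejecting in the opposite tail)
power_for <- function(h, n, sig.level = 0.05) {
  z_alpha <- qnorm(1 - sig.level / 2)
  pnorm(abs(h) * sqrt(n) - z_alpha)
}

h <- 0.05                  # an illustrative effect size
n <- n_for(h)              # about 3140 observations for 80% power at alpha = 0.05
power_for(h, n)            # plugging n back in recovers a power of 0.8
```

Since n grows with 1/h^2, halving the effect size quadruples the required sample size, which is why small uplifts are so expensive to detect.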

Thankfully, the ES.h() function in the pwr package computes the effect size (Cohen's h, for two proportions) that we pass into the power analysis. We will usually know the current conversion rate of our control condition, but the performance of the variant is, almost by definition, unknown. However, we can calculate an expected effect size for a desired uplift. This effect size is then passed to the pwr.p.test() function, which computes the sample size as long as n is left unspecified. To make this analysis user-friendly, I have wrapped both functions in a new function called sample_size_calculator().
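To make the effect-size step concrete, the sketch below reproduces it in base R. It assumes the arcsine definition of Cohen's h, h = 2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2)), which is what ES.h() computes; cohens_h() is a hypothetical stand-in rather than a pwr function, and the inputs are the example control rate and uplift used in this post.

```r
# Hypothetical base-R stand-in for pwr's ES.h(), assuming the
# arcsine-transformation definition of Cohen's h for two proportions
cohens_h <- function(p1, p2) {
  2 * asin(sqrt(p1)) - 2 * asin(sqrt(p2))
}

control <- 0.034567
uplift  <- 0.01                      # relative uplift: +1% on the control rate
variant <- (uplift + 1) * control    # expected variant conversion rate

h <- cohens_h(control, variant)
h                                    # roughly -0.00189 (negative because variant > control)
```

A 1% relative uplift on a 3.46% conversion rate gives a very small h, and because the required sample size grows with 1/h^2, small uplifts on low baseline rates need very large samples.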

Because we then use this result to calculate how many days the experiment needs to run, I have also created a days_calculator() function, which takes the output of the sample size calculation:

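# install.packages("pwr") if it is not already installed
library(pwr)

# Required sample size per variant, given the control conversion rate
# and the desired relative uplift (e.g. 0.01 = a 1% improvement)
sample_size_calculator <- function(control, uplift) {
  variant <- (uplift + 1) * control
  # Both rates must be valid proportions, and the uplift must be non-zero
  if (control <= 0 || control >= 1 || variant <= 0 || variant >= 1 || uplift == 0) {
    stop("control and variant must lie strictly between 0 and 1, and uplift must be non-zero")
  }
  # Cohen's h effect size between the control and expected variant rates
  effect_size <- ES.h(control, variant)
  # n is left unspecified, so pwr.p.test() solves for it
  pwr.p.test(h = effect_size,
             sig.level = 0.05,
             power = 0.8)
}

# Days needed to collect the sample, assuming traffic is split evenly
# between the two conditions (control and variant)
days_calculator <- function(sample_size_output, average_daily_traffic) {
  if (average_daily_traffic <= 0) {
    return("N/A")
  }
  # Round up: stopping early would leave the experiment underpowered
  days_required <- ceiling((sample_size_output * 2) / average_daily_traffic)
  paste("It will take this many days to reach significance with your current traffic:", days_required)
}

To use the tool, specify your control conversion rate and your desired uplift. The uplift is relative: 0.01 means a 1% improvement on the control rate (here from 3.4567% to roughly 3.49%), not an increase of one percentage point:

control <- 0.034567

uplift <- 0.01

Then run the sample_size_calculator() function and extract the sample size, rounded up to a whole number of users:

result <- sample_size_calculator(control, uplift)

sample_size_output <- ceiling(result$n)

sample_size_output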

This returns the required sample size for these values (remember that this requirement is per variant):

[1] 230345

Now that we have this figure, we can determine how long the experiment needs to run. The only other input you need is your average daily traffic into the experiment:

average_daily_traffic <- 42000

Run the days_calculator() function:

days_calculator(sample_size_output, average_daily_traffic)

And you will get the following output:

[1] "It will take this many days to reach significance with your current traffic: 11"

Used together, these two functions can be very useful to stakeholders, as they can plan their experimentation roadmaps around these predetermined numbers. They can also help you judge whether certain experiments are feasible, or whether the uplifts you are hoping for are unrealistic.
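For a quick feasibility check across several candidate uplifts, a sketch along these lines could be used. It is a self-contained base-R approximation rather than the pwr-based functions: days_for_uplift() is a hypothetical helper that computes Cohen's h with the arcsine formula, uses the normal approximation to the one-sample sample size (close to, but not necessarily identical to, pwr.p.test()), and, like days_calculator(), assumes traffic is split evenly between two conditions.

```r
# Hypothetical feasibility helper: days needed for each candidate uplift.
# Normal approximation to the one-sample arcsine test, 80% power,
# alpha = 0.05, traffic split evenly between control and variant.
days_for_uplift <- function(control, uplift, average_daily_traffic,
                            sig.level = 0.05, power = 0.8) {
  variant <- (uplift + 1) * control
  h <- 2 * asin(sqrt(control)) - 2 * asin(sqrt(variant))   # Cohen's h
  n_per_variant <- ((qnorm(1 - sig.level / 2) + qnorm(power)) / abs(h))^2
  # Round the sample up to whole users, then the duration up to whole days
  ceiling(ceiling(n_per_variant) * 2 / average_daily_traffic)
}

control <- 0.034567
average_daily_traffic <- 42000
uplifts <- c(0.01, 0.02, 0.05, 0.10)

# All operations are vectorised, so every uplift is evaluated in one call
feasibility <- data.frame(
  uplift = uplifts,
  days   = days_for_uplift(control, uplifts, average_daily_traffic)
)
feasibility
```

Laying the candidate uplifts out side by side makes it easy to see which ones fit within a realistic testing window.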

This code only applies to an AB design (i.e. an experiment with just two conditions). However, the functions can be amended to calculate the required sample size for multiple experimental conditions, for example by using the pwr.anova.test() function within sample_size_calculator() in place of pwr.p.test(). Note that pwr.anova.test() expects an effect size on the Cohen's f scale (for a balanced one-way ANOVA) rather than Cohen's h.
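The duration step generalises in a similar way: days_calculator() doubles the per-variant sample because an AB test has two conditions, so with k conditions sharing the traffic evenly the multiplier becomes k. The sketch below assumes an even split and a per-condition sample size that you have already computed; days_for_k_arms() is a hypothetical helper, and it does not recompute the per-condition n, which would itself change with more arms.

```r
# Hypothetical generalisation of days_calculator() to k conditions,
# assuming traffic is split evenly across all of them
days_for_k_arms <- function(n_per_arm, average_daily_traffic, k = 2) {
  stopifnot(k >= 2, average_daily_traffic > 0)
  # Total sample across all arms, divided by daily traffic, rounded up
  ceiling(n_per_arm * k / average_daily_traffic)
}

days_for_k_arms(230345, 42000, k = 2)  # AB test: 11 days
days_for_k_arms(230345, 42000, k = 3)  # control plus two variants: 17 days
```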


