Creating Power Analysis Through Posterior ROPE Estimation #368
Codecov Report
Attention: Patch coverage is
Additional details and impacted files
@@ Coverage Diff @@
## main #368 +/- ##
==========================================
+ Coverage 85.60% 86.54% +0.93%
==========================================
Files 22 24 +2
Lines 1716 1895 +179
==========================================
+ Hits 1469 1640 +171
- Misses 247 255 +8
Cool. I think the ROPE and MDE framing makes much more sense. I'm going to play devil's advocate here in order to really distill this down into its raw essence 💎

Why is this needed?
Let's say Bob already uses CausalPy Bayesian synthetic control methods. They run a synthetic control model after the experiment is done and they get a credible interval on the causal impact and they can use this to make a judgement about whether the intervention had a meaningful effect. At the moment, what you have under the

What do we get (if anything) beyond a validation period approach
In the (currently WIP PR #367) we can do synthetic control after the intervention has happened, but do parameter estimation on a smaller training period before a validation period. For example:

What you could do, for example, is build a ROPE from the intervention period on the causal impact space (not the raw outcome space) and use that to define when your observed causal impacts are 'meaningful'. Something like this:

Let's set aside the goals we have for this in terms of multi-unit synthetic control, and focus just on traditional vanilla synthetic control. Can we really distill the core essence of why ROPE and MDE are important and what the proposed method can do better than what we do already, or what we could do when the intervention period PR is merged? Is there anything really crucial about running the analysis before we have the post intervention data?

Hope this helps rather than frustrates!
Hey team, here is the new PR (follow-up to #292) that adds power estimation for experiments, based on a chosen ROPE derived from our posterior distribution!
Context
Let's assume your intervention is scheduled for December 10. In the preceding week, you would use CausalPy to create a causal model based on interrupted time series methodologies. This model would then make predictions for a period before the launch of your experiment (say, the last week of November). If your model is well calibrated, with accurately estimated factors, the mean of your predictions should align closely with the actual outcomes (the difference between reality and the posterior should be a distribution with mean zero and some standard deviation sigma).

By making predictions over a period where no change is anticipated, we can use the posterior to estimate the mean or cumulative values we would expect in the absence of an intervention. We can then establish a threshold area, or region of practical equivalence (ROPE), to gauge how large an effect must be before it is deemed significant. In essence, we are determining the change necessary for the target value to deviate from the posterior. Under this procedure, the MDE is a value outside the given ROPE, whose width is set by our alpha.

This estimation allows for an assessment of the model's sensitivity to changes and the experiment's feasibility.
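As a rough illustration of the idea above (not the implementation in this PR), the sketch below derives a ROPE and an MDE from posterior draws over a no-intervention period. The function name `rope_and_mde`, its inputs, and the choice to define the MDE as the larger distance from the posterior mean to a ROPE bound are all assumptions made for the example; the draws are simulated rather than taken from a fitted CausalPy model.

```python
import random
import statistics


def rope_and_mde(posterior_samples, alpha=0.05):
    """Return (lower, upper, mde) for a placebo-period posterior.

    posterior_samples: draws of the predicted mean outcome over a period
        with no intervention (hypothetical input for this sketch).
    alpha: total probability mass left outside the ROPE, split evenly
        between the two tails.
    """
    ordered = sorted(posterior_samples)
    n = len(ordered)
    # Empirical quantiles at alpha/2 and 1 - alpha/2 define the ROPE.
    lower = ordered[int((alpha / 2) * (n - 1))]
    upper = ordered[int((1 - alpha / 2) * (n - 1))]
    center = statistics.fmean(ordered)
    # Assumption: the MDE is the shift from the posterior mean needed to
    # leave the ROPE on its wider side.
    mde = max(upper - center, center - lower)
    return lower, upper, mde


# Simulated posterior draws standing in for real model output.
rng = random.Random(0)
samples = [rng.gauss(100.0, 5.0) for _ in range(4000)]
lower, upper, mde = rope_and_mde(samples, alpha=0.05)
print(f"ROPE: [{lower:.1f}, {upper:.1f}]  MDE: {mde:.1f}")
```

A smaller alpha widens the ROPE and therefore raises the MDE, which is the trade-off the Context section describes.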
Pre-Experimentation setup
By applying this method before the experiment period, we can determine which model setup minimizes our MDE and maximizes power. Using this method we can answer questions like:
Assessment
We could end up with several results outside the ROPE. How do we determine which is more extreme than the others?
Think of our posterior distribution as a range of possible values that we might see, with the mean representing the most probable outcome. We can then evaluate how likely a new value is to belong to this distribution by measuring how far it deviates from the mean and how far it falls outside the ROPE.
If a value is exactly at the mean, it has a probability of 1 of falling within our posterior. As the value moves away from the mean towards either extreme of the distribution, the probability decreases and approaches zero. This lets us determine how 'typical' or 'atypical' a new observation is, based on the posterior estimated by our model.
In simple terms, we are checking whether the true effect size falls within the estimated posterior credible interval.
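One way to picture the score described above is a two-sided tail probability under a normal approximation to the posterior. This is only a sketch under that assumption, not necessarily how the PR computes it; `tail_probability` is a hypothetical helper and the draws are simulated.

```python
import random
import statistics


def tail_probability(posterior_samples, value):
    """Two-sided tail probability of `value` under a normal approximation.

    Returns 1.0 when `value` equals the posterior mean and approaches 0
    as `value` moves into either tail.
    """
    approx = statistics.NormalDist(
        mu=statistics.fmean(posterior_samples),
        sigma=statistics.stdev(posterior_samples),
    )
    cdf = approx.cdf(value)
    # Double the smaller tail so the score is symmetric around the mean.
    return 2 * min(cdf, 1 - cdf)


# Simulated posterior draws standing in for real model output.
rng = random.Random(1)
samples = [rng.gauss(0.0, 1.0) for _ in range(4000)]
center = statistics.fmean(samples)
for shift in (0.0, 1.0, 3.0):
    score = tail_probability(samples, center + shift)
    print(f"shift={shift}: {score:.4f}")
```

Two results that both fall outside the ROPE can then be ranked by this score: the lower the value, the more extreme the observation relative to the posterior.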
A few examples
This function is similar to how Google's CausalImpact estimates the "posterior tail area probability".
📚 Documentation preview 📚: https://causalpy--368.org.readthedocs.build/en/368/