A Python library for integrating model-based and judgmental forecasting
Quickstart | Docs | Examples
We'll relate three questions on the Metaculus crowd prediction platform using a generative model:
import ergo

# Log into Metaculus
metaculus = ergo.Metaculus(username="ought", password="")
# Load three questions
q_infections = metaculus.get_question(3529, name="Covid-19 infections in 2020")
q_deaths = metaculus.get_question(3530, name="Covid-19 deaths in 2020")
q_ratio = metaculus.get_question(3755, name="Covid-19 ratio of fatalities to infections")
# Relate the three questions using a generative model
def deaths_from_infections():
    infections = q_infections.sample_community()
    ratio = q_ratio.sample_community()
    deaths = infections * ratio
    ergo.tag(deaths, "Covid-19 deaths in 2020")
    return deaths
# Compute model predictions for the `deaths` question
samples = ergo.run(deaths_from_infections, num_samples=5000)
# Submit model predictions to Metaculus
q_deaths.submit_from_samples(samples)
You can run the model here.
- Open this Colab
- Select "Runtime > Run all" in the menu
- Edit the code to load other questions, improve the model, etc., and rerun
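To see the structure of the model above without a Metaculus account, here is a minimal offline sketch using only the Python standard library. The two samplers are hypothetical stand-ins for `sample_community()`, with made-up illustrative distributions, and `run` is a simplified stand-in for `ergo.run`, not Ergo's implementation.

```python
import math
import random
import statistics


def sample_infections(rng):
    # Hypothetical stand-in for q_infections.sample_community():
    # a log-normal with median 1,000,000 (illustrative numbers only).
    return rng.lognormvariate(math.log(1_000_000), 1.0)


def sample_ratio(rng):
    # Hypothetical stand-in for q_ratio.sample_community():
    # a beta distribution with mean 0.01 (illustrative numbers only).
    return rng.betavariate(2, 198)


def deaths_from_infections(rng):
    # Same structure as the model above: deaths = infections * ratio.
    infections = sample_infections(rng)
    ratio = sample_ratio(rng)
    return infections * ratio


def run(model, num_samples, seed=0):
    # Simplified stand-in for ergo.run: call the model repeatedly
    # and collect one sample per call.
    rng = random.Random(seed)
    return [model(rng) for _ in range(num_samples)]


samples = run(deaths_from_infections, num_samples=5000)
print(f"median deaths: {statistics.median(samples):.0f}")
```

Each call to the model draws one value from every input distribution and combines them, so the list of samples approximates the distribution over deaths implied by the two inputs.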
This notebook is closest to a tutorial right now:
- El Paso workflow
  - This notebook shows multi-level decomposition, Metaculus community distributions, ensembling, and beta-binomial and log-normal distributions using part of the El Paso Covid-19 model.
The notebooks below were created at different times and use Ergo in inconsistent ways. Most are rough work-in-progress scratchpads that haven't been cleaned up for public consumption:
- Relating Metaculus community distributions: Infections, Deaths, and IFR
  - A notebook for the model shown above, which updates Metaculus community distributions towards consistency
- Model-based predictions of Covid-19 spread
  - End-to-end example:
    - Load multiple questions from Metaculus
    - Compute model predictions based on assumptions and external data
    - Submit predictions to Metaculus
- Model-based predictions of Covid-19 spread using inference from observed cases
  - A version of the previous notebook that infers growth rates before and after lockdown decisions
-
  - Get rich metadata on open Metaculus questions
-
  - Show Metaculus prediction results as a dataframe
  - Filter Metaculus questions by date and status
-
  - Illustrates how to load all questions for a Metaculus category (in this case for the El Paso series)
Outdated Ergo notebooks:
- Predicting how long lockdowns will last in multiple locations
- Estimating the number of active Covid-19 infections in each country using multiple sources
To install Ergo and its dependencies, we recommend PyEnv and Poetry. With both installed:
mkdir my-ergo-project && cd my-ergo-project
pyenv install 3.6.9 && pyenv local 3.6.9
poetry init -n
# Edit pyproject.toml to set python = "~3.6.9"
poetry add git+https://github.com/oughtinc/ergo.git
poetry install
Now Ergo is available in your project:
poetry run python
>>> import ergo
>>> ergo.flip(.5)
DeviceArray(True, dtype=bool)
Ergo is an open source project and we love contributions!
See our instructions for contributors for more.
The theory behind Ergo:
- Many of the pieces necessary for good forecasting work are out there:
  - Prediction platforms
  - Probabilistic programming languages
  - Superforecasters + qualitative human judgments
  - Data science tools like numpy and pandas
  - Deep neural nets as expressive function approximators
- But they haven't been connected yet in a productive workflow:
  - It's difficult to get data in and out of prediction platforms
  - Submitting questions to these platforms takes a long time
  - The questions on prediction platforms aren't connected to decisions, or even to other questions on the same platform
  - Human judgments don't scale
  - Models often can't take into account all relevant considerations
  - Workflows aren't made explicit so they can't be automated
- This limits their potential:
  - Few people build models
  - Few people submit questions to prediction platforms, or predict on these platforms
  - Improvements to forecasting accrue slowly
  - Most decisions are not informed by systematic forecasts
- Better infrastructure for forecasting can connect the pieces and help realize the potential of scalable high-quality forecasting
Ergo is still at an early stage. Pre-alpha, or whatever the earliest possible stage is. Functionality and API are in flux.
Here's what Ergo provides right now:
- Express generative models in a probabilistic programming language
  - Ergo provides lightweight wrappers around Pyro functions to make the models more readable
  - Specify distributions using 90% confidence intervals, e.g. `ergo.lognormal_from_interval(10, 100)`
  - For Bayesian inference, Ergo provides a wrapper around Pyro's variational inference algorithm
  - Get model results as Pandas dataframes
- Interact with the Metaculus and Foretold prediction platforms
  - Load question data given question IDs
  - Use community distributions as variables in generative models
  - Submit model predictions to these platforms
    - For Metaculus, we automatically fit a mixture of logistic distributions for continuous-valued questions
  - Plot community distributions
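To illustrate what specifying a distribution by a 90% confidence interval can mean, here is a minimal standard-library sketch. It assumes the interval is central, so the endpoints are the 5th and 95th percentiles. Whether `ergo.lognormal_from_interval` uses exactly this parameterization is an assumption, and the helper name `lognormal_params_from_interval` is hypothetical.

```python
import math
import random

# 95th percentile of the standard normal distribution.
Z_95 = 1.6448536269514722


def lognormal_params_from_interval(low, high):
    # Hypothetical helper: treat (low, high) as the 5th and 95th
    # percentiles of a log-normal and solve for the mean and standard
    # deviation of the underlying normal distribution.
    if not 0 < low < high:
        raise ValueError("need 0 < low < high")
    log_low, log_high = math.log(low), math.log(high)
    mu = (log_low + log_high) / 2
    sigma = (log_high - log_low) / (2 * Z_95)
    return mu, sigma


mu, sigma = lognormal_params_from_interval(10, 100)

# Check by sampling: about 90% of draws should land inside the interval.
rng = random.Random(0)
samples = [rng.lognormvariate(mu, sigma) for _ in range(20000)]
inside = sum(10 <= s <= 100 for s in samples) / len(samples)
print(f"fraction of samples inside [10, 100]: {inside:.3f}")
```

Under this parameterization the median of the distribution is the geometric mean of the two endpoints.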
WIP:
- Documentation
- Clearer modeling API
Planned:
- Interfaces for all prediction platforms
  - Search questions on prediction platforms
  - Use distributions from any platform
  - Programmatically submit questions to platforms
  - Track community distribution changes
- Common model components
  - Index/ensemble models that summarize fuzzy large questions like "What's going to happen with the economy next year?"
  - Model components for integrating qualitative adjustments into quantitative models
  - Simple probability decomposition models
  - E.g. see The Model Thinker (Scott Page)
- Better tools for integrating models and platforms
  - Compute model-based predictions by constraining model variables to be close to the community distributions
- Push/pull to and from repositories for generative models
  - Think Forest + GitHub
If there's something you want Ergo to do, let us know!