
Interactive Composition Explorer 🧊

Interactive Composition Explorer (ICE) is a debugger for compositional language model programs, applied here to decomposition of paper Q&A using humans and language models.

Table of contents

  • Design principles
  • Running ICE locally
  • Advanced command line usage
  • Evaluation
  • Using PyTorch
  • Development
  • Contributions
  • Sharing recipe traces

Design principles

  • Recipes are decompositions of a task into subtasks.

    The meaning of a recipe is: If a human executed these steps and did a good job at each workspace in isolation, the overall answer would be good. This decomposition may be informed by what we think ML can do at this point, but the recipe itself (as an abstraction) doesn’t know about specific agents.

  • Agents perform atomic subtasks of predefined shapes, like completion, scoring, or classification.

    Agents don't know which recipe is calling them. Agents don’t maintain state between subtasks. Agents generally try to complete all subtasks they're asked to complete (however badly), but some will not have implementations for certain task types.

  • The mode in which a recipe runs is a global setting that can affect every agent call, for instance, whether to use humans or machine agents. Recipes can also run with specific RecipeSettings, which map a task type to a particular agent_name, overriding which agent is used for that type of task.
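
To make these principles concrete, here is a minimal recipe sketch in the style of the ICE primer. It is an illustration under assumptions: the placebo-classification task and prompt are hypothetical, and the recipe.agent()/complete calls follow the primer's documented usage rather than being verified against this repository's exact version.

    from ice.recipe import recipe

    async def classify_placebo(
        abstract: str = "Patients were randomized to drug or placebo.",  # hypothetical example input
    ) -> str:
        # The recipe only decomposes the task; it does not know whether a
        # human or a language model will answer. The run mode decides that.
        prompt = f"Does this study use a placebo?\n\n{abstract}\n\nAnswer yes or no:"
        answer = await recipe.agent().complete(prompt=prompt, stop="\n")
        return answer.strip()

    recipe.main(classify_placebo)

Running the same file in different modes (e.g. human vs. machine) swaps the agent without changing the recipe.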

Running ICE locally

Prerequisites

  1. Install Docker Desktop

Setup

  1. Add required secrets to .env, using .env.example as a template (see the sketch after these steps).

  2. Start ICE in its own terminal and leave it running:

    scripts/run-local.sh
  3. Go through the tutorial.
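
For step 1, a hypothetical .env might look like the line below, assuming an OpenAI-backed agent; the authoritative list of required keys is in .env.example:

    OPENAI_API_KEY=sk-...  # hypothetical entry; copy the real keys from .env.example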

Advanced command line usage

Gold standards

To run a recipe against gold standards:

scripts/run-recipe.sh --mode machine

You can run on the iteration gold standards of a specific recipe like this:

scripts/run-recipe.sh --mode machine -r placebotree -q placebo -g iterate

To run over multiple gold standard splits, just provide them separated by spaces:

scripts/run-recipe.sh --mode machine -r placebotree -q placebo -g iterate validation

Streamlit

The Streamlit apps require the streamlit variant of the Docker image:

STREAMLIT=1 scripts/run-local.sh

Run the streamlit apps like this:

scripts/run-streamlit.sh

This opens a multi-page app that lets you select specific scripts.

To add a page, create a script in the streamlits/pages folder, as in the sketch below.
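
For example, a minimal page could look like this (the filename and contents are hypothetical; any script placed in streamlits/pages becomes a page):

    # streamlits/pages/my_demo.py (hypothetical filename)
    import streamlit as st

    st.title("My recipe demo")
    question = st.text_input("Question")
    if question:
        st.write(f"You asked: {question}")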

Evaluation

When you run a recipe, ICE will evaluate the results based on the gold standards in gold_standards/. You'll see the results on-screen, and they'll be saved as CSVs in data/evaluation_csvs/. You can then upload the CSVs to the "Performance dashboard" and "Individual paper eval" tables in the ICE Airtable.

Evaluate in-app QA results

  1. Set up both ice and elicit-next so that they can run on your computer
  2. Switch to the eval branch of elicit-next, or a branch from the eval branch. This branch should contain the QA code and gold standards that you want to evaluate.
  3. If the ice QA gold standards (gold_standards/gold_standards.csv) might be out of date, download the corresponding Airtable view (all rows, all fields) as a CSV and save it as gold_standards/gold_standards.csv
  4. Duplicate the "All rows, all fields" view in Airtable, then, in your duplicated view, filter to exactly the gold standards you'd like to evaluate and download it as a CSV. Save that CSV as api/eval/gold_standards/gold_standards.csv in elicit-next
  5. Make sure api/eval/papers in elicit-next contains all of the gold standard papers you want to evaluate
  6. In ice, run scripts/eval-in-app-qa.sh <path to elicit-next>. If you have elicit-next cloned as a sibling of ice, this would be scripts/eval-in-app-qa.sh $(pwd)/../elicit-next/.

This will generate the same sort of eval as for ICE recipes.

Using PyTorch

To use PyTorch, run the torch variant of the Docker image:

TORCH=1 scripts/run-local.sh
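
To confirm the variant is active, you can check that torch imports inside the container (a quick sanity check, not an official step):

    docker compose exec ice python -c "import torch; print(torch.__version__)"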

Development

Running tests

Cheap integration tests:

scripts/run-recipe.sh --mode test

Unit tests:

scripts/run-tests.sh

Adding new Python dependencies

  1. Manually add the dependency to pyproject.toml (see the sketch after these steps)
  2. Update the lock file and install the changes:
    docker compose exec ice poetry lock --no-update
    docker compose exec ice poetry install  # if you're running a variant image, pass --extras streamlit or --extras torch

The lockfile update step will take about 15 minutes.

You do not need to stop, rebuild, and restart the docker containers.
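
For step 1, a dependency entry in pyproject.toml looks like this; the package name and version are a hypothetical example, assuming Poetry's standard dependencies table:

    [tool.poetry.dependencies]
    # ... existing entries ...
    httpx = "^0.23"  # hypothetical new dependency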

Upgrading poetry

To upgrade poetry to a new version:

  1. In the Dockerfile, temporarily change pip install -r poetry-requirements.txt to pip install poetry==DESIRED_VERSION (see the sketch after these steps)
  2. Generate a new poetry-requirements.txt:
    BUILD=1 DETACH=1 scripts/run-local.sh
    docker compose exec ice bash -c 'pip freeze > poetry-requirements.txt'
  3. Revert the Dockerfile changes
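
Concretely, step 1 amounts to swapping one line, assuming the install happens in a RUN instruction (DESIRED_VERSION is a placeholder from the step above):

    # Before
    RUN pip install -r poetry-requirements.txt
    # After (temporary)
    RUN pip install poetry==DESIRED_VERSION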

Contributions

Before making a PR, check linting, types, tests, etc:

scripts/checks.sh

Sharing recipe traces

Reminder: Traces contain source code, so be sure you want to share all the code called by your recipe.

  1. Publish the trace to https://github.com/oughtinc/static and wait for the github-pages workflow to finish.
  2. Add the trace information to ui/helpers/recipes.ts.
