[Breaking] Update Evaluation Functionality #7388

Merged
merged 29 commits into from
Jul 13, 2023

@hinthornw hinthornw commented Jul 8, 2023

  • Migrate from deprecated langchainplus_sdk to langsmith package
  • Update the run_on_dataset() API to use an eval config
  • Update a number of evaluators, as well as the loading logic
  • Update docstrings / reference docs
  • Update tracer to share a single HTTP session

@dosubot dosubot bot added the 🤖:enhancement A large net-new component, integration, or chain. Use sparingly. The largest features label Jul 8, 2023
@hinthornw hinthornw requested a review from agola11 July 8, 2023 03:40
@hinthornw
Opened a PR to add langsmith to conda-forge conda-forge/staged-recipes#23337

- Add an enum
- Remove support for multiple criteria in a single evaluator

Use `EvalConfig` in the `run_on_dataset` function, plus:
- Add proactive validation for compatibility with the evaluators (e.g.,
check that inputs can be converted to a prompt or messages for an LLM, or check example
input keys against the chain)
- Improve error messages for mismatches between the dataset and the model you're testing
- Add integration tests for the combinations of dataset formats with LLMs,
chat models, and chains
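
The "check example input keys against the chain" validation mentioned above could look roughly like the following. This is a minimal sketch and not the code in this PR: `validate_example_inputs` and `InputKeyMismatchError` are hypothetical names, and the single-key fallback is an assumption about how a one-input chain might accept a differently named dataset column.

```python
from typing import Any, Dict, List


class InputKeyMismatchError(ValueError):
    """Hypothetical error raised when a dataset example does not fit the chain."""


def validate_example_inputs(
    example_inputs: Dict[str, Any], chain_input_keys: List[str]
) -> None:
    """Check one example's input keys against the keys a chain expects.

    Assumption: a chain with exactly one input key can accept an example
    with exactly one input value, whatever that key is called.
    """
    if not example_inputs:
        raise InputKeyMismatchError("Example has no inputs.")
    missing = set(chain_input_keys) - set(example_inputs)
    if not missing:
        return
    if len(chain_input_keys) == 1 and len(example_inputs) == 1:
        return  # The single value can be mapped onto the single expected key.
    raise InputKeyMismatchError(
        f"Example is missing input keys {sorted(missing)}; "
        f"chain expects {sorted(chain_input_keys)}, "
        f"example provides {sorted(example_inputs)}."
    )
```

Running a check like this on the first example before any model call would surface a dataset/chain mismatch immediately, instead of partway through a run.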


Deltas off [#7508](#7508)
(split the criteria evaluator into the reference-free and labeled
classes), which builds off
[#7388](#7388), which migrates
from the langchainplus_sdk package to the langsmith package

<details> <summary>Dataset/ model setup</summary><pre><code>
# Dataset and model setup for the evaluation example.
from uuid import uuid4

import pandas as pd
from langchain.chains import LLMChain
from langchain.chat_models import ChatOpenAI
from langchain.client.runner_utils import run_on_dataset
from langsmith import Client

client = Client()

# Upload a small question/answer dataset under a unique name.
dataset_name = f"Testing - {str(uuid4())[-8:]}"
df = pd.DataFrame(
    {
        "some_input": [
            "What's the capital of California?",
            "What's the capital of Nevada?",
            "What's the capital of Oregon?",
            "What's the capital of Washington?",
        ],
        "some_output": ["Sacramento", "Carson City", "Salem", "Olympia"],
    }
)
ds = client.upload_dataframe(
    df, dataset_name, input_keys=["some_input"], output_keys=["some_output"]
)


def chain_constructor() -> LLMChain:
    """Build a fresh chain to evaluate on the dataset."""
    llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
    chain = LLMChain.from_string(llm, "What's the capital of {input}?")
    return chain
</code></pre></details>



**Relevant snippet**
```
from langchain.evaluation.run_evaluators.config import (
    RunEvalConfig,
)

evaluation_config = RunEvalConfig(
    evaluator_configs=[
        RunEvalConfig.Criteria(criteria="helpfulness"),
        RunEvalConfig.Criteria(
            criteria={"my-criterion": "Is the answer fewer than 10 words?"}
        ),
        "qa",  # Or could do RunEvalConfig.ContextQA(), etc.
        "context_qa",
        "embedding_distance",
    ]
)


run_on_dataset(
    dataset_name,
    llm_or_chain_factory=chain_constructor,
    run_evaluator_config=evaluation_config,
)

```
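
The snippet above passes some evaluators as plain strings (`"qa"`, `"context_qa"`, `"embedding_distance"`). One way an enum could back those names is sketched below. This is an illustration only: `EvaluatorKind` and `normalize_evaluators` are hypothetical names, not the enum or loader added in this PR, and the sketch covers only the string entries, not config objects such as `RunEvalConfig.Criteria`.

```python
from enum import Enum
from typing import List, Union


class EvaluatorKind(str, Enum):
    """Hypothetical enum of the string evaluator names used in the snippet."""

    QA = "qa"
    CONTEXT_QA = "context_qa"
    EMBEDDING_DISTANCE = "embedding_distance"


def normalize_evaluators(
    specs: List[Union[str, EvaluatorKind]],
) -> List[EvaluatorKind]:
    """Convert string names to enum members, failing early on unknown names."""
    normalized = []
    for spec in specs:
        try:
            # Enum lookup by value accepts both raw strings and enum members.
            normalized.append(EvaluatorKind(spec))
        except ValueError:
            valid = ", ".join(kind.value for kind in EvaluatorKind)
            raise ValueError(
                f"Unknown evaluator {spec!r}. Valid names: {valid}"
            ) from None
    return normalized
```

Because the enum subclasses `str`, existing call sites that compare against the raw string keep working while typos are rejected before any run starts.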
@hinthornw hinthornw changed the title Switch to LangSmith [Breaking] Update LangSmith Evaluation Functionality Jul 13, 2023
@hinthornw hinthornw changed the title [Breaking] Update LangSmith Evaluation Functionality [Breaking] Update Evaluation Functionality Jul 13, 2023
@hinthornw hinthornw merged commit a673a51 into master Jul 13, 2023
@hinthornw hinthornw deleted the wfh/langsmith branch July 13, 2023 09:13