[Breaking] Update Evaluation Functionality #7388

hinthornw · 2023-07-08T03:40:32Z

Migrate from deprecated langchainplus_sdk to langsmith package
Update the run_on_dataset() API to use an eval config
Update a number of evaluators, as well as the loading logic
Update docstrings / reference docs
Update tracer to share single HTTP session
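
The last item, sharing a single HTTP session across tracer instances, can be pictured with a small generic sketch. This is not the code from this PR: `FakeClient`, `get_shared_client`, `get_shared_executor`, and `Tracer` are hypothetical names, and the stand-in client only mimics owning a session.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import lru_cache


class FakeClient:
    """Hypothetical stand-in for an API client that owns one HTTP session."""

    def __init__(self) -> None:
        # Placeholder for a real session object (e.g. a requests.Session).
        self.session = object()


@lru_cache(maxsize=1)
def get_shared_client() -> FakeClient:
    """Create the client on first use, then return the same instance."""
    return FakeClient()


@lru_cache(maxsize=1)
def get_shared_executor() -> ThreadPoolExecutor:
    """Return one process-wide executor for background submissions."""
    return ThreadPoolExecutor(max_workers=1)


class Tracer:
    """Hypothetical tracer: every instance reuses the shared client."""

    def __init__(self) -> None:
        self.client = get_shared_client()
        self.executor = get_shared_executor()
```

With this pattern, constructing many tracers does not open a new session each time; they all hold references to the same cached client and executor.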

hinthornw · 2023-07-12T14:47:25Z

Opened a PR to add langsmith to conda-forge conda-forge/staged-recipes#23337

Use `EvalConfig` in the `run_on_dataset` function, plus:

- Add proactive validation for compatibility with the evaluators (e.g., check that the input can be converted to a prompt or messages for an LLM, or check example input keys against the chain)
- Improve error messaging for the dataset <-> model you're testing
- Integration tests for the combinations of dataset formats and LLMs, chat models, and chains

Deltas off [#7508](#7508) (split the criteria evaluator into the reference-free and labeled classes), which builds off [#7388](#7388), which migrates from langchainplus_sdk to the langsmith package.

<details>
<summary>Dataset/ model setup</summary>

<pre><code>
from uuid import uuid4

import pandas as pd
from langchain.client.runner_utils import run_on_dataset
from langsmith import Client

client = Client()
dataset_name = f"Testing - {str(uuid4())[-8:]}"

df = pd.DataFrame(
    {
        "some_input": [
            "What's the capital of California?",
            "What's the capital of Nevada?",
            "What's the capital of Oregon?",
            "What's the capital of Washington?",
        ],
        "some_output": ["Sacramento", "Carson City", "Salem", "Olympia"],
    }
)
ds = client.upload_dataframe(
    df, dataset_name, input_keys=["some_input"], output_keys=["some_output"]
)

from langchain.chat_models import ChatOpenAI
from langchain.chains import LLMChain


def chain_constructor() -> LLMChain:
    """Build a fresh chain to evaluate on the dataset."""
    llm = ChatOpenAI(model="gpt-3.5-turbo", temperature=0)
    chain = LLMChain.from_string(llm, "What's the capital of {input}?")
    return chain
</code></pre>
</details>

**Relevant snippet**

```
from langchain.evaluation.run_evaluators.config import (
    RunEvalConfig,
)

evaluation_config = RunEvalConfig(
    evaluator_configs=[
        RunEvalConfig.Criteria(criteria="helpfulness"),
        RunEvalConfig.Criteria(
            criteria={"my-criterion": "Is the answer fewer than 10 words?"}
        ),
        "qa",  # Or could do RunEvalConfig.ContextQA(), etc.
        "context_qa",
        "embedding_distance",
    ]
)

run_on_dataset(
    dataset_name,
    llm_or_chain_factory=chain_constructor,
    run_evaluator_config=evaluation_config,
)
```
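
To make the "check example input keys against the chain" idea concrete, here is a minimal sketch of what such a proactive check could look like. It is not the validation code from this change: `validate_example_inputs` is a hypothetical helper, and the single-key shortcut is an assumption added so that a dataset keyed on `some_input` could still feed a one-variable prompt.

```python
from typing import Any, Dict, List


def validate_example_inputs(
    example_inputs: Dict[str, Any], chain_input_keys: List[str]
) -> None:
    """Raise a descriptive error if the example cannot feed the chain."""
    missing = [key for key in chain_input_keys if key not in example_inputs]
    if not missing:
        return
    # Assumption: a single-input chain can take a single-key example
    # regardless of the key's name.
    if len(chain_input_keys) == 1 and len(example_inputs) == 1:
        return
    raise ValueError(
        f"Example inputs {sorted(example_inputs)} are missing keys "
        f"{missing} expected by the chain {chain_input_keys}."
    )
```

Running a check like this once, before any model calls, surfaces a dataset/model mismatch as one clear error rather than a failure partway through the run.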

dosubot bot added the 🤖:enhancement label Jul 8, 2023

hinthornw requested a review from agola11 July 8, 2023 03:40

Switch to langsmith

4d50092

vowelparrot force-pushed the wfh/langsmith branch from a7bea23 to 4d50092 Compare July 10, 2023 17:49

hinthornw added 2 commits July 10, 2023 13:52

merge

fcfe67b

update test from merge

8c60e72

vowelparrot force-pushed the wfh/langsmith branch from 2c15e64 to 8c60e72 Compare July 11, 2023 00:30

hinthornw added 3 commits July 10, 2023 23:15

bump to 0.0.3

c29d505

Merge branch 'master' into wfh/langsmith

8d393cc

lint

9147b3f

hinthornw mentioned this pull request Jul 11, 2023

Use evaluator config in run_on_dataset #7498

Merged

hinthornw added 3 commits July 11, 2023 13:26

mv langsmith notebook

937ddd8

rename

c97346e

bump

8280bdf

hinthornw added 6 commits July 12, 2023 07:50

reformat notebook

61acd6b

Share client

424a8a0

Share executor

1144586

shared global

b8a5ce3

merge and update version

35c7666

merge

4a632f9

vowelparrot force-pushed the wfh/langsmith branch from b01a3b9 to 4a632f9 Compare July 12, 2023 21:16

agola11 approved these changes Jul 12, 2023

vowelparrot mentioned this pull request Jul 12, 2023

Issue: langchainplus-sdk dependency #5905

Closed

hinthornw added 5 commits July 12, 2023 16:20

Split labeled criteria evaluator to new class (#7508)

3722e85

- Add an enum
- Rm support for multi-criteria in single evaluator

Merge branch 'wfh/shared_client' into wfh/langsmith

4fc3e20

merge with master

a10c571

mv dir

a71e302

hinthornw added 4 commits July 13, 2023 01:03

fix spelling

c55036c

update int test

0baacf0

mv notebook

c04c2b5

update langsmith flow

e7be266

hinthornw changed the title ~~Switch to LangSmith~~ [Breaking] Update LangSmith Evaluation Functionality Jul 13, 2023

hinthornw added 3 commits July 13, 2023 01:39

update docstrings

c4846bd

update ref docs to include async funcs

60144ad

Update arg docstring

8c8acf2

hinthornw force-pushed the wfh/langsmith branch from ad35ab2 to 8c8acf2 Compare July 13, 2023 08:54

hinthornw added 2 commits July 13, 2023 02:05

maxsplit

f5b206c

docstring

a6d422c

hinthornw changed the title ~~[Breaking] Update LangSmith Evaluation Functionality~~ [Breaking] Update Evaluation Functionality Jul 13, 2023

hinthornw merged commit a673a51 into master Jul 13, 2023

hinthornw deleted the wfh/langsmith branch July 13, 2023 09:13

nikhase mentioned this pull request Jul 31, 2023

[Bug]: LibreChat exit code 1 after docker-compose up danny-avila/LibreChat#735

Closed
