
Making custom LLMs compatible with TestsetGenerator? #230

Closed
Data-drone opened this issue Oct 29, 2023 · 13 comments
Labels
documentation (Improvements or additions to documentation) · question (Further information is requested)

Comments

@Data-drone

Hi Team,

I am trying to get the TestsetGenerator to work properly with a Langchain custom LLM.

Its _call method currently expects a string prompt, but it receives a ChatPromptTemplate from ragas. What is the best way to handle this?

@Data-drone (Author)

I got things working by running format() on the prompt my LLM received whenever it arrived as a ChatPromptTemplate, but I'm not sure that is the best approach.

I have set chat_qa=0.0, so shouldn't all the prompts be coming in as strings?
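
For reference, a minimal sketch of that workaround inside a langchain custom LLM's _call method (the class name and the endpoint call are assumptions for illustration, based on the 0.0.x-era APIs discussed in this thread):

    from typing import Any, List, Optional

    from langchain.llms.base import LLM
    from langchain.prompts import ChatPromptTemplate

    class MyVicunaLLM(LLM):
        """Hypothetical wrapper around a local Vicuna 1.5 13B endpoint."""

        @property
        def _llm_type(self) -> str:
            return "my-vicuna"

        def _call(self, prompt: Any, stop: Optional[List[str]] = None, **kwargs: Any) -> str:
            # ragas may hand over a ChatPromptTemplate rather than a plain str,
            # so flatten it into a single prompt string before querying the model.
            if isinstance(prompt, ChatPromptTemplate):
                prompt = prompt.format()
            return self._query_endpoint(prompt)

        def _query_endpoint(self, prompt: str) -> str:
            raise NotImplementedError  # call the actual Vicuna server here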

@shahules786 (Member)

Hey @Data-drone, can you share the ragas and Python versions you're using?

@Data-drone (Author)

ragas==0.0.18, Python 3.10.12

@shahules786 (Member)

Hi @Data-drone, we will look into the langchain custom LLMs issue. Regarding chat_qa=0: it controls whether or not the test set should contain conversational questions.

@shahules786 (Member)

Hi @Data-drone, which model are you using? Are you using an instruction-tuned model (i.e., one that is not chat-based)?

@Data-drone (Author)

Vicuna 1.5 13b

@jjmachan (Member)

hey @Data-drone, we do have this snippet inside RagasLLM that should handle this logic for you:

        if isinstance(self.llm, BaseLLM):
            # completion-style models expect plain string prompts
            ps = [p.format() for p in prompts]
            result = self.llm.generate(ps, callbacks=callbacks)
        else:  # if BaseChatModel
            # chat models expect lists of messages instead
            ps = [p.format_messages() for p in prompts]
            result = self.llm.generate(ps, callbacks=callbacks)

If your custom model is a subclass of either, it should do the conversion for you, but I'm not sure why it's not working here. Could you show us how you're calling the TestsetGenerator, along with your custom model's code?
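
For anyone following along, a hedged sketch of how such a custom model might be wired into the generator on ragas 0.0.x (the wrapper name and import paths are assumptions that may differ between versions; MyVicunaLLM is the hypothetical class sketched earlier in this thread):

    from langchain.embeddings import OpenAIEmbeddings

    from ragas.llms import LangchainLLM      # RagasLLM wrapper in ragas 0.0.x
    from ragas.testset import TestsetGenerator

    # Wrap the custom langchain LLM so ragas can drive it.
    generator_llm = LangchainLLM(llm=MyVicunaLLM())
    critic_llm = LangchainLLM(llm=MyVicunaLLM())

    generator = TestsetGenerator(
        generator_llm=generator_llm,
        critic_llm=critic_llm,
        embeddings_model=OpenAIEmbeddings(),
        chat_qa=0.0,  # no conversational questions in the test set
    )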

@msunkarahend

@shahules786 @jjmachan In the same context, TestsetGenerator uses cross-encoder/stsb-TinyBERT-L-4 from Hugging Face. Is there a way I can avoid using that and just use GPT models from Azure OpenAI alone to generate a synthetic dataset?

@jjmachan jjmachan added documentation Improvements or additions to documentation question Further information is requested labels Nov 29, 2023
@jjmachan jjmachan added this to the v0.1.0 milestone Nov 29, 2023
@jjmachan (Member)

hey @msunkarahend, thanks for bringing that up. You can change it via the class:

class TestsetGenerator:

    """
    Ragas Test Set Generator

    Attributes
    ----------
    generator_llm: LangchainLLM
        LLM used for all the generator operations in the TestGeneration paradigm.
    critic_llm: LangchainLLM
        LLM used for all the filtering and scoring operations in TestGeneration
        paradigm.
    embeddings_model: Embeddings
        Embeddings used for vectorizing nodes when required.
    chat_qa: float
        Determines the fraction of conversational questions in the resulting test set.
    chunk_size: int
        The chunk size of nodes created from data.
    testset_distribution: dict
        Distribution of different types of questions to be generated from given
        set of documents. Defaults to {"easy":0.1, "reasoning":0.4, "conversation":0.5}
    """

    def __init__(
        self,
        generator_llm: RagasLLM,
        critic_llm: RagasLLM,
        embeddings_model: Embeddings,
        testset_distribution: t.Optional[t.Dict[str, float]] = None,
        chat_qa: float = 0.0,
        chunk_size: int = 1024,
        seed: int = 42,
    ) -> None:

So if you pass in a different Embeddings instance, you will be able to change it.

let me know if you want the finished snippet and I'll share that too 😄
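
A hedged sketch of what that finished snippet could look like with Azure OpenAI (deployment names are placeholders, and Azure credentials are assumed to be configured via environment variables):

    from langchain.chat_models import AzureChatOpenAI
    from langchain.embeddings import OpenAIEmbeddings

    from ragas.llms import LangchainLLM
    from ragas.testset import TestsetGenerator

    # Placeholder deployment names; replace with your own Azure deployments.
    azure_llm = LangchainLLM(llm=AzureChatOpenAI(deployment_name="my-gpt4-deployment"))
    azure_embeddings = OpenAIEmbeddings(deployment="my-ada002-deployment")

    generator = TestsetGenerator(
        generator_llm=azure_llm,
        critic_llm=azure_llm,
        embeddings_model=azure_embeddings,  # replaces the default embeddings model
    )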

@jjmachan (Member)

but @shahules786, we should probably change the from_defaults class method to with_openai or something like that to make it easier?
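
A hedged sketch of what such a convenience constructor on TestsetGenerator could look like (illustrative only; the names, defaults, and the API that actually shipped may differ):

    @classmethod
    def with_openai(
        cls,
        generator_model: str = "gpt-3.5-turbo",
        critic_model: str = "gpt-4",
        **kwargs,
    ) -> "TestsetGenerator":
        # Convenience path: build OpenAI-backed LLMs and embeddings by default,
        # while still letting callers override the remaining arguments.
        from langchain.chat_models import ChatOpenAI
        from langchain.embeddings import OpenAIEmbeddings
        from ragas.llms import LangchainLLM

        return cls(
            generator_llm=LangchainLLM(llm=ChatOpenAI(model=generator_model)),
            critic_llm=LangchainLLM(llm=ChatOpenAI(model=critic_model)),
            embeddings_model=OpenAIEmbeddings(),
            **kwargs,
        )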

@msunkarahend

@jjmachan If you can share the finished snippet for the TestsetGenerator, that would be great. Thanks in advance.

@hbj52 commented Jan 19, 2024

I notice that if installed from source, the version of ragas on GitHub is 0.0.23.dev44+g506ad60, while the latest version I downloaded from pip is 0.0.22, the same as the latest release shown on GitHub. It seems that 0.0.22 hits a similar issue with list[str] vs. list[ChatPromptTemplate].

@mspronesti (Contributor) commented Feb 28, 2024

Could you guys try #670?

jjmachan added a commit that referenced this issue Mar 12, 2024
…ngs (#670)

## **User description**
The current version of `with_openai` contains a hardcoded instantiation
of `langchain_openai.chat_models.ChatOpenAI`, which makes
`TestsetGenerator` very limited and not compatible with completion
models, Azure OpenAI models, and open-source models.

This PR extends `TestsetGenerator` to any `BaseLanguageModel` and
`Embeddings` from langchain for versatility, addressing #230, #342,
#635, and #636.

Lastly, I've removed all occurrences of mutable default arguments (a bad antipattern; see
[here](https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments)).

---------

Co-authored-by: Shahules786 <Shahules786@gmail.com>
Co-authored-by: jjmachan <jamesjithin97@gmail.com>
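
For context, a minimal illustration of the mutable-default-argument gotcha that the PR removes (a generic example, not the actual ragas code):

    # Antipattern: the dict is created once, at function definition time,
    # and shared across every call that relies on the default.
    def make_distribution(dist={}):
        dist.setdefault("easy", 0.1)
        return dist

    a = make_distribution()
    b = make_distribution()
    assert a is b  # same object -- mutations leak between calls

    # Fix: default to None and create a fresh object inside the function.
    def make_distribution_fixed(dist=None):
        if dist is None:
            dist = {}
        dist.setdefault("easy", 0.1)
        return dist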
@jjmachan jjmachan removed this from the v0.1.0 milestone May 1, 2024
@dosubot dosubot bot added the stale label (issue has not had recent activity or appears to be solved; stale issues are closed automatically) Jun 1, 2024
@dosubot dosubot bot closed this as not planned (won't fix, can't repro, duplicate, stale) Jun 8, 2024
@dosubot dosubot bot removed the stale label Jun 8, 2024