Testset generator not working using AzureOpenai key. #636

rahul1-995 · 2024-02-20T10:29:19Z

I am trying to generate synthetic data using azure openai api, taking long time to run and after that getting error.

Ragas version: 0.1.1
Python version: 3.10

Code to Reproduce
import os
os.environ["AZURE_OPENAI_API_KEY"] = "AZURE_OPENAI_API_KEY"

azure_configs_gen = {
"base_url": "",
"model_deployment": "gpt-35-turbo-16k",
"model_name": "gpt-35-turbo-16k",
"embedding_deployment": "text-embedding-ada-002",
"embedding_name": "text-embedding-ada-002",
}

azure_configs_critic = {
"base_url": "",
"model_deployment": "gpt-4",
"model_name": "gpt-4",
"embedding_deployment": "text-embedding-ada-002",
"embedding_name": "text-embedding-ada-002",
}
generator_llm = AzureChatOpenAI(
openai_api_version="2023-05-15",
azure_endpoint=azure_configs_gen["base_url"],
azure_deployment=azure_configs_gen["model_deployment"],
model=azure_configs_gen["model_name"],
validate_base_url=False,
)
generator_llm = LangchainLLMWrapper(generator_llm)

critic_llm = AzureChatOpenAI(
openai_api_version="2023-05-15",
azure_endpoint=azure_configs_critic["base_url"],
azure_deployment=azure_configs_critic["model_deployment"],
model=azure_configs_critic["model_name"],
validate_base_url=False,
)

critic_llm = LangchainLLMWrapper(generator_llm)

embed_model = AzureOpenAIEmbeddings(
openai_api_version="2023-05-15",
azure_endpoint=azure_configs_gen["base_url"],
azure_deployment=azure_configs_gen["embedding_deployment"],
model=azure_configs_gen["embedding_name"],
)
embed_model = LangchainEmbeddingsWrapper(embed_model)

pdf_path = r"machinelearning-lecture01.pdf"
documents = SimpleDirectoryReader(input_files=[pdf_path]).load_data()
#type(documents)

splitter = TokenTextSplitter(chunk_size=2000, chunk_overlap=100)
keyphrase_extractor = KeyphraseExtractor(llm=generator_llm)
docstore = InMemoryDocumentStore(
splitter=splitter,
embeddings=embed_model,
extractor=keyphrase_extractor,
)
from ragas.testset import TestsetGenerator
from ragas.testset.evolutions import simple, reasoning, multi_context

test_generator = TestsetGenerator(
generator_llm=generator_llm,
critic_llm=critic_llm,
embeddings=embed_model,
docstore=docstore,
)

testset = test_generator.generate_with_llamaindex_docs(documents=documents[:5],
test_size=3,distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})

Error trace
Exception in thread Thread-7:
Traceback (most recent call last):
File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\threading.py", line 1045, in _bootstrap_inner
self.run()
File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\ragas\executor.py", line 75, in run
results = self.loop.run_until_complete(self._aresults())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\asyncio\base_events.py", line 653, in run_until_complete
return future.result()
^^^^^^^^^^^^^^^
File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\ragas\executor.py", line 63, in _aresults
raise e
File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\ragas\executor.py", line 58, in _aresults
r = await future
^^^^^^^^^^^^
File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\asyncio\tasks.py", line 615, in _wait_for_one
return f.result() # May raise f.exception().
^^^^^^^^^^
File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\ragas\executor.py", line 91, in wrapped_callable_async
return counter, await callable(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\ragas\testset\evolutions.py", line 150, in evolve
) = await self.aevolve(current_tries, current_nodes)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\ragas\testset\evolutions.py", line 253, in aevolve
passed = await self.node_filter.filter(current_nodes.root_node)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\ragas\testset\filters.py", line 54, in filter
results = await self.llm.generate(prompt=prompt)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\ragas\llms\base.py", line 92, in generate
return await agenerate_text_with_retry(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\tenacity_asyncio.py", line 88, in async_wrapped
return await fn(*args, kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\tenacity_asyncio.py", line 47, in call
do = self.iter(retry_state=retry_state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\tenacity_init.py", line 325, in iter
raise retry_exc.reraise()
^^^^^^^^^^^^^^^^^^^
File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\tenacity_init.py", line 158, in reraise
raise self.last_attempt.result()
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\concurrent\futures_base.py", line 449, in result
return self.__get_result()
^^^^^^^^^^^^^^^^^^^
File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\concurrent\futures_base.py", line 401, in __get_result
raise self._exception
File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\tenacity_asyncio.py", line 50, in call**
result = await fn(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^
File "C:\Users\rabalasa\Anaconda3\envs\genai\Lib\site-packages\ragas\llms\base.py", line 177, in agenerate_text
result = await self.langchain_llm.agenerate_prompt(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'LangchainLLMWrapper' object has no attribute 'agenerate_prompt'. Did you mean: 'agenerate_text'?

ExceptionInRunner Traceback (most recent call last)
Cell In[4], line 18
9 from ragas.testset.evolutions import simple, reasoning, multi_context
11 test_generator = TestsetGenerator(
12 generator_llm=generator_llm,
13 critic_llm=critic_llm,
14 embeddings=embed_model,
15 docstore=docstore,
16 )
---> 18 testset = test_generator.generate_with_llamaindex_docs(documents=documents[:5],
19 test_size=3,distributions={simple: 0.5, reasoning: 0.25, multi_context: 0.25})

File ~\Anaconda3\envs\genai\Lib\site-packages\ragas\testset\generator.py:128, in TestsetGenerator.generate_with_llamaindex_docs(self, documents, test_size, distributions, with_debugging_logs, is_async, raise_exceptions, run_config)
113 def generate_with_llamaindex_docs(
114 self,
115 documents: t.Sequence[LlamaindexDocument],
(...)
122 ):
123 # chunk documents and add to docstore
124 self.docstore.add_documents(
125 [Document.from_llamaindex_document(doc) for doc in documents]
126 )
--> 128 return self.generate(
129 test_size=test_size,
130 distributions=distributions,
131 with_debugging_logs=with_debugging_logs,
132 is_async=is_async,
133 run_config=run_config,
134 raise_exceptions=raise_exceptions,
135 )

File ~\Anaconda3\envs\genai\Lib\site-packages\ragas\testset\generator.py:246, in TestsetGenerator.generate(self, test_size, distributions, with_debugging_logs, is_async, raise_exceptions, run_config)
244 test_data_rows = exec.results()
245 if test_data_rows == []:
--> 246 raise ExceptionInRunner()
248 except ValueError as e:
249 raise e

ExceptionInRunner: The runner thread which was running the jobs raised an exeception. Read the traceback above to debug it. You can also pass raise_exception=False incase you want to show only a warning message instead.

Expected behavior
It should generate the test dataset from the input pdf..

Additional context
Same error occur sometimes when we are using openai key instead of azure openai key.

The text was updated successfully, but these errors were encountered:

rahul1-995 · 2024-02-21T12:10:30Z

Any updates or workaround over above problem?

shahules786 · 2024-02-22T02:45:43Z

Hey @rahul1-995 sorry for the late reply, are you able to make the evaluation work with azure open ai?
Can you try updating langchain-core

rahul1-995 · 2024-02-22T05:37:39Z

Hi @shahules786 , I am not facing problem while evaluating using Azureopenai, I am facing problem with testset generation using azure, I have given code snippet above, please refer error below:
AttributeError: 'LangchainLLMWrapper' object has no attribute 'agenerate_prompt'. Did you mean: 'agenerate_text'?
Can you please look into this??

Pranshul200 · 2024-02-22T11:08:44Z

Hi Rahul can you explain how you have used the evaluation using azure openai if you haven't got the test data generated I am also facing the same problem did you generated the test data from any other method than please tell I also need to create the synthetic test data

rahul1-995 · 2024-02-22T12:28:15Z

@Pranshul200, I am currently using openai api key for testset generation.

shahules786 · 2024-02-26T17:45:24Z

Hey @rahul1-995 did you try updating langchain-core as requested?

rahul1-995 · 2024-02-27T06:11:44Z

Yes @shahules786 , I have tried updating langchain-core, still not able to run the testset generator...

mspronesti · 2024-02-28T10:55:40Z

@rahul1-995 Can you try using the version of #670 (not merged yet)?

git clone https://github.com/mspronesti/ragas/
cd ragas
pip install .

The usage with Azure OpenAI would be

from langchain_openai import AzureChatOpenAI, AzureOpenAIEmbeddings
import os

os.environ["AZURE_OPENAI_API_KEY"] = "..."
os.environ["AZURE_OPENAI_ENDPOINT"] = "..."
os.environ["OPENAI_API_VERSION"] = "2023-12-01-preview"

generator_llm = AzureChatOpenAI(deployment_name="...")
critic_llm = AzureChatOpenAI(deployment_name="...")
embeddings = AzureOpenAIEmbeddings(deployment="...")

generator = TestsetGenerator.from_langchain(
    generator_llm,
    critic_llm,
    embeddings
)

wikp · 2024-02-29T05:43:36Z

I'm not the original poster but I had the same problem and it disappeared in this version. Thanks :)

mspronesti · 2024-02-29T10:24:24Z

@wikp Thanks for the confirmation!

…ngs (#670) ## **User description** The current version of `with_openai` contains a hardcoded instantiation of `langchain_openai.chat_models.ChatOpenAI`, which makes `TestsetGenerator` very limited and not compatible with completion models, Azure OpenAI models, and open-source models. This PR extends `TestsetGenerator` to any `BaseLanguageModel` and `Embeddings` from langchain for versatility, addressing #230, #342, #635, and #636. Lastly, I've removed all the occurrences of mutable default arguments (bad antipattern, read [here](https://docs.python-guide.org/writing/gotchas/#mutable-default-arguments)). --------- Co-authored-by: Shahules786 <Shahules786@gmail.com> Co-authored-by: jjmachan <jamesjithin97@gmail.com>

subho-das · 2024-09-17T19:41:48Z

I'm not the original poster but I had the same problem and it disappeared in this version. Thanks :)

which version?

rahul1-995 added the bug Something isn't working label Feb 20, 2024

mspronesti mentioned this issue Feb 27, 2024

feat(generator): extend construction to any langchain LLM and Embeddings #670

Merged

dosubot bot added the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label May 20, 2024

dosubot bot closed this as not planned Won't fix, can't repro, duplicate, stale Jun 1, 2024

dosubot bot removed the stale Issue has not had recent activity or appears to be solved. Stale issues will be automatically closed label Jun 1, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Testset generator not working using AzureOpenai key. #636

Testset generator not working using AzureOpenai key. #636

rahul1-995 commented Feb 20, 2024

rahul1-995 commented Feb 21, 2024

shahules786 commented Feb 22, 2024 •

edited

Loading

rahul1-995 commented Feb 22, 2024

Pranshul200 commented Feb 22, 2024 •

edited

Loading

rahul1-995 commented Feb 22, 2024 •

edited

Loading

shahules786 commented Feb 26, 2024

rahul1-995 commented Feb 27, 2024

mspronesti commented Feb 28, 2024 •

edited

Loading

wikp commented Feb 29, 2024

mspronesti commented Feb 29, 2024

subho-das commented Sep 17, 2024

Testset generator not working using AzureOpenai key. #636

Testset generator not working using AzureOpenai key. #636

Comments

rahul1-995 commented Feb 20, 2024

rahul1-995 commented Feb 21, 2024

shahules786 commented Feb 22, 2024 • edited Loading

rahul1-995 commented Feb 22, 2024

Pranshul200 commented Feb 22, 2024 • edited Loading

rahul1-995 commented Feb 22, 2024 • edited Loading

shahules786 commented Feb 26, 2024

rahul1-995 commented Feb 27, 2024

mspronesti commented Feb 28, 2024 • edited Loading

wikp commented Feb 29, 2024

mspronesti commented Feb 29, 2024

subho-das commented Sep 17, 2024

shahules786 commented Feb 22, 2024 •

edited

Loading

Pranshul200 commented Feb 22, 2024 •

edited

Loading

rahul1-995 commented Feb 22, 2024 •

edited

Loading

mspronesti commented Feb 28, 2024 •

edited

Loading