---
page_type: sample
languages:
- python
products:
- ai-services
- azure-openai
description: Evaluate.
---

Evaluate

Overview

This tutorial provides a step-by-step guide on how to evaluate generative AI base models or AI applications with Azure. Each of these samples uses the azure-ai-evaluation SDK.

When selecting a base model for building an application—or after building an AI application (such as a Retrieval-Augmented Generation (RAG) system or a multi-agent framework)—evaluation plays a pivotal role. Effective evaluation ensures that the chosen or developed AI model or application meets the intended safety, quality, and performance benchmarks.

In both cases, running evaluations requires specific tools, methods, and datasets. Here’s a breakdown of the key components involved:

  • Testing with Evaluation Datasets

    • Bring Your Own Data: Use datasets tailored to your application or domain.
    • Red-Teaming Queries: Design adversarial prompts to test robustness.
    • Azure AI Simulators: Leverage Azure AI's context-specific or adversarial dataset generators to create relevant test cases.
  • Selecting the Appropriate Evaluators or Building Custom Ones

  • Generating and Visualizing Evaluation Results: The Azure AI Evaluation SDK enables you to evaluate target functions (such as endpoints of your AI application or your model endpoints) on your dataset with either built-in or custom evaluators. You can run evaluations remotely in the cloud or locally on your own machine.
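For a concrete sense of that flow, here is a minimal sketch of a local evaluation run with one of the SDK's built-in quality evaluators. The endpoint, key, deployment, and data.jsonl values are placeholders for your own setup, and the exact fields each row must contain depend on the evaluator you choose:

```python
# Minimal sketch of a local evaluation run with the azure-ai-evaluation SDK.
# All <...> values are placeholders for your own Azure OpenAI setup.
from azure.ai.evaluation import evaluate, RelevanceEvaluator

# Built-in AI-judge evaluators are configured with the model that acts as judge.
model_config = {
    "azure_endpoint": "https://<your-resource>.openai.azure.com",
    "api_key": "<your-api-key>",
    "azure_deployment": "<your-deployment>",
}

relevance = RelevanceEvaluator(model_config)

# Run the evaluator over a JSONL dataset; each line supplies the inputs the
# evaluator expects (for relevance, a "query" and a "response" field).
result = evaluate(
    data="data.jsonl",
    evaluators={"relevance": relevance},
)
print(result["metrics"])
```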

Objective

The main objective of this tutorial is to help users understand the process of evaluating an AI model in Azure. By the end of this tutorial, you should be able to:

  • Simulate interactions with an AI model
  • Evaluate both deployed model endpoints and applications
  • Evaluate using quantitative NLP metrics, qualitative metrics, and custom metrics
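As an illustration of the first objective, the sketch below uses the adversarial simulator to generate test interactions against a target callback. The azure_ai_project values are placeholders, and the callback body is a stand-in for a call to your own model or application endpoint:

```python
# Minimal sketch of adversarial simulation; all <...> values are placeholders.
import asyncio
from azure.identity import DefaultAzureCredential
from azure.ai.evaluation.simulator import AdversarialScenario, AdversarialSimulator

azure_ai_project = {
    "subscription_id": "<subscription-id>",
    "resource_group_name": "<resource-group>",
    "project_name": "<project-name>",
}

# The simulator calls this once per turn; wire it to your model or app endpoint.
async def callback(messages, stream=False, session_state=None, context=None):
    reply = "..."  # placeholder: generate a reply from messages["messages"]
    messages["messages"].append({"role": "assistant", "content": reply})
    return {
        "messages": messages["messages"],
        "stream": stream,
        "session_state": session_state,
        "context": context,
    }

async def main():
    simulator = AdversarialSimulator(
        azure_ai_project=azure_ai_project, credential=DefaultAzureCredential()
    )
    outputs = await simulator(
        scenario=AdversarialScenario.ADVERSARIAL_QA,
        target=callback,
        max_simulation_results=3,
    )
    print(outputs)  # simulated conversations, ready for an evaluation run

asyncio.run(main())
```

The simulated conversations can then be written to JSONL and passed to an evaluation run like the one sketched earlier.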

Our samples cover the following tools for evaluating AI models in Azure:

| Sample name | adversarial simulator | conversation starter | index | raw text | against model endpoint | against app | qualitative metrics | custom metrics | quantitative NLP metrics |
|---|---|---|---|---|---|---|---|---|---|
| Simulate_Adversarial.ipynb | X | | | | X | | X | | |
| Simulate_From_Conversation_Starter.ipynb | | X | | | X | | X | | |
| Simulate_From_Azure_Search_Index.ipynb | | | X | | X | | X | | |
| Simulate_From_Input_Text.ipynb | | | | X | X | | X | | |
| Evaluate_Base_Model_Endpoint.ipynb | | | | | X | | X | | |
| Evaluate_App_Endpoint.ipynb | | | | | | X | X | | |
| AI_Judge_Evaluators_Quality.ipynb | | | | | X | | X | | |
| Custom_Evaluators.ipynb | | | | | X | | | X | |
| NLP_Evaluators.ipynb | | | | | X | | | | X |
| AI_Judge_Evaluators_Safety_Risks.ipynb | X | | | | | | X | | |
| Simulate_Evaluate_Groundedness.py | | | X | X | X | | X | | |
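On the custom-metrics column above: a custom evaluator is simply a callable that returns a dictionary of scores, and it plugs into an evaluation run alongside the built-in evaluators. The class name and metric below are illustrative, not part of the SDK:

```python
# Minimal sketch of a custom evaluator; the class and metric are illustrative.
from azure.ai.evaluation import evaluate

class ResponseLengthEvaluator:
    """Scores each row by the character length of its response."""

    def __call__(self, *, response: str, **kwargs):
        return {"response_length": len(response)}

result = evaluate(
    data="data.jsonl",  # each line is expected to contain a "response" field
    evaluators={"response_length": ResponseLengthEvaluator()},
)
```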

Prerequisites

To use the azure-ai-evaluation SDK, install it with `pip install azure-ai-evaluation`. Python 3.8 or later is required to use this package.

  • See the Python reference documentation for the azure-ai-evaluation SDK here for more granular details on input/output requirements and usage instructions.
  • Check out the GitHub repo for the azure-ai-evaluation SDK here.

Programming Languages

  • Python

Estimated Runtime: 30 mins