This is a Jupyter notebook

Build an LLM Chat UI with 🤗 Gradio and trace it with 🪢 Langfuse

This is a simple end-to-end example notebook which showcases how to integrate a Gradio application with Langfuse for LLM Observability and Evaluation.

Note: We recommend running this notebook in Google Colab (see link above). This notebook is also available as a Hugging Face Space template here.

Thank you to @tkmamidi for the original implementation and contributions to this notebook.

Introduction

What is Gradio?

Gradio is an open-source Python library that enables quick creation of web interfaces for machine learning models, APIs, and Python functions. It allows developers to wrap any Python function with an interactive UI that can be easily shared or embedded, making it ideal for demos, prototypes, and ML model deployment. See docs for more details.

What is Langfuse?

Langfuse is an open-source LLM engineering platform that helps build reliable LLM applications via LLM Application Observability, Evaluation, Experiments, and Prompt Management. See docs for more details.

Walkthrough

We’ve recorded a walkthrough of the implementation below. You can follow along with the video or the notebook.

Outline

This notebook will show you how to

Build a simple chat interface in Python and render it in a notebook using the Gradio Chatbot
Add Langfuse Tracing to the chatbot
Implement additional Langfuse tracing features used frequently in chat applications: chat sessions, user feedback

Setup

Install requirements. We use OpenAI for this simple example. We could use any model here.

# pinning httpx as the latest version is not compatible with the OpenAI SDK at the time of creating this notebook
!pip install gradio langfuse openai httpx==0.27.2

Set the credentials and initialize the Langfuse SDK client, which is used later on to add user feedback.

You can either create a free Langfuse Cloud account or self-host Langfuse in a couple of minutes.

import os
 
# Get keys for your project from the project settings page
# https://cloud.langfuse.com
os.environ["LANGFUSE_PUBLIC_KEY"] = ""
os.environ["LANGFUSE_SECRET_KEY"] = ""
os.environ["LANGFUSE_HOST"] = "https://cloud.langfuse.com" # 🇪🇺 EU region
# os.environ["LANGFUSE_HOST"] = "https://us.cloud.langfuse.com" # 🇺🇸 US region
 
# Your openai key
# We use OpenAI for this demo, could easily change to other models
os.environ["OPENAI_API_KEY"] = ""
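
Empty keys tend to surface later as confusing authentication errors, so it can help to check them before building the UI. The following is a minimal sketch, not part of the Langfuse or OpenAI SDKs: `REQUIRED_KEYS` and `missing_credentials` are hypothetical names introduced here for illustration.

```python
import os

# Hypothetical helper for this notebook; the variable names match the cell above.
REQUIRED_KEYS = ("LANGFUSE_PUBLIC_KEY", "LANGFUSE_SECRET_KEY", "OPENAI_API_KEY")


def missing_credentials(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


# Example with an explicit mapping instead of the real environment
example_env = {"LANGFUSE_PUBLIC_KEY": "pk-example", "OPENAI_API_KEY": "sk-example"}
print(missing_credentials(example_env))  # ['LANGFUSE_SECRET_KEY']
```

Calling `missing_credentials()` without arguments checks `os.environ`; an empty list means all three keys are set.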

import gradio as gr
import uuid
from langfuse import Langfuse
 
langfuse = Langfuse()

Implementation of Chat functions

Sessions/Threads

Each chat message belongs to a thread in the Gradio Chatbot which can be reset using clear (reference).

The following function, set_new_session_id, generates a session_id that is stored in a global variable; calling it again replaces the id with a fresh one. This session_id is used for Langfuse Sessions.

session_id = None
def set_new_session_id():
    global session_id
    session_id = str(uuid.uuid4())
 
# Initialize
set_new_session_id()
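
Because session_id is a module-level global, every browser tab connected to the same running app shares one id, which is fine for a single-user demo. As a rough sketch of an alternative, the id could live in a small object instead of a global. `SessionState` is a hypothetical class for illustration and is not part of Gradio or Langfuse.

```python
import uuid


class SessionState:
    """Hypothetical holder for a session id (illustration only)."""

    def __init__(self):
        self.session_id = str(uuid.uuid4())

    def reset(self):
        """Replace the current id with a fresh random UUID and return it."""
        self.session_id = str(uuid.uuid4())
        return self.session_id


state = SessionState()
first_id = state.session_id
second_id = state.reset()  # what a "clear" event would trigger
print(first_id != second_id)  # True
```

In a multi-user deployment, one such object per user would be needed; this notebook keeps the simpler global variable.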

Response handler

When implementing the respond method, we use the Langfuse @observe() decorator to automatically log each response to Langfuse Tracing.

In addition, we use the Langfuse OpenAI integration, as it simplifies instrumenting the LLM call to capture model parameters, token counts, and other metadata. Alternatively, we could use the integrations with LangChain, LlamaIndex, or other frameworks, or instrument the call itself with the decorator (example).

# Langfuse decorator
from langfuse.decorators import observe, langfuse_context
# Optional: automated instrumentation via OpenAI SDK integration
# See note above regarding alternative implementations
from langfuse.openai import openai
 
# Global reference for the current trace_id which is used to later add user feedback
current_trace_id = None
 
# Add decorator here to capture overall timings, input/output, and manipulate trace metadata via `langfuse_context`
@observe()
async def create_response(
    prompt: str,
    history,
):
    # Save trace id in global var to add feedback later
    global current_trace_id
    current_trace_id = langfuse_context.get_current_trace_id()
 
    # Add session_id to Langfuse Trace to enable session tracking
    global session_id
    langfuse_context.update_current_trace(
        name="gradio_demo_chat",
        session_id=session_id,
        input=prompt,
    )
 
    # Add prompt to history
    if not history:
        history = [{"role": "system", "content": "You are a friendly chatbot"}]
    history.append({"role": "user", "content": prompt})
    yield history
 
    # Get completion via OpenAI SDK
    # Auto-instrumented by Langfuse via the import, see alternative in note above
    response = {"role": "assistant", "content": ""}
    oai_response = openai.chat.completions.create(
        messages=history,
        model="gpt-4o-mini",
    )
    response["content"] = oai_response.choices[0].message.content or ""
 
    # Customize trace output for better readability in Langfuse Sessions
    langfuse_context.update_current_trace(
        output=response["content"],
    )
 
    yield history + [response]
 
async def respond(prompt: str, history):
    async for message in create_response(prompt, history):
        yield message
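
The handler above yields twice: once right after appending the user message, so the UI can show it immediately, and once more when the assistant reply is available. The sketch below isolates that pattern without Langfuse or OpenAI. `fake_completion` is a hypothetical stand-in for the model call, and `create_response_sketch` and `collect` are illustrative names only.

```python
import asyncio


def fake_completion(messages):
    """Hypothetical stand-in for the LLM call: echoes the last user message."""
    return f"You said: {messages[-1]['content']}"


async def create_response_sketch(prompt, history):
    # Copy the incoming history; start with a system message if it is empty
    history = list(history or [])
    if not history:
        history = [{"role": "system", "content": "You are a friendly chatbot"}]
    history.append({"role": "user", "content": prompt})
    yield history  # first update: the user message appears in the chat

    reply = {"role": "assistant", "content": fake_completion(history)}
    yield history + [reply]  # second update: the assistant reply is added


async def collect(prompt, history):
    """Gather every intermediate update produced by the async generator."""
    return [update async for update in create_response_sketch(prompt, history)]


# In a notebook with a running event loop, use `await collect("Hi", [])` instead
updates = asyncio.run(collect("Hi", []))
print(len(updates))  # 2
```

Gradio treats each yielded value as a new state for the Chatbot component, which is why the user message shows up before the model has answered.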

User feedback handler

We implement user feedback tracking in Langfuse via the like event of the Gradio Chatbot (reference). This method reuses the current trace id stored in the global state of this application.

def handle_like(data: gr.LikeData):
    global current_trace_id
    if data.liked:
        langfuse.score(value=1, name="user-feedback", trace_id=current_trace_id)
    else:
        langfuse.score(value=0, name="user-feedback", trace_id=current_trace_id)
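
The mapping from a like or dislike to a score can also be kept in a small pure function, which makes it testable without a Langfuse client. This is a sketch only: `like_to_score` is a hypothetical helper, and the keyword names simply mirror the `langfuse.score(...)` calls above.

```python
def like_to_score(liked, trace_id):
    """Hypothetical helper: build the keyword arguments for one feedback score."""
    return {
        "name": "user-feedback",
        "value": 1 if liked else 0,  # 1 = thumbs up, 0 = thumbs down
        "trace_id": trace_id,
    }


print(like_to_score(True, "trace-123"))
# {'name': 'user-feedback', 'value': 1, 'trace_id': 'trace-123'}
```

Inside the handler, this could be used as `langfuse.score(**like_to_score(data.liked, current_trace_id))`.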

Retries

We allow users to retry a completion via the Gradio Chatbot retry event (docs). This is not specific to the integration with Langfuse.

async def handle_retry(history, retry_data: gr.RetryData):
    new_history = history[: retry_data.index]
    previous_prompt = history[retry_data.index]["content"]
    async for message in respond(previous_prompt, new_history):
        yield message
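
To make the slicing in the retry handler concrete, the following sketch applies the same two expressions to a plain list. It assumes, as the handler above does, that the retry index points at the user message being retried; `split_for_retry` is a hypothetical name used only here.

```python
def split_for_retry(history, index):
    """Return (history before the retried message, the retried prompt text)."""
    new_history = history[:index]  # everything before the user message
    previous_prompt = history[index]["content"]  # the message to send again
    return new_history, previous_prompt


history = [
    {"role": "system", "content": "You are a friendly chatbot"},
    {"role": "user", "content": "Tell me a joke"},
    {"role": "assistant", "content": "Why did the chicken cross the road?"},
]

new_history, previous_prompt = split_for_retry(history, 1)
print(previous_prompt)   # Tell me a joke
print(len(new_history))  # 1
```

The old assistant answer is dropped because it sits after the retried message, and the prompt is then sent through respond again, which creates a new trace.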

Run Gradio Chatbot

After implementing all methods above, we can now put together the Gradio Chatbot and launch it. If run within Colab, you should see an embedded Chatbot interface.

with gr.Blocks() as demo:
    gr.Markdown("# Chatbot using 🤗 Gradio + 🪢 Langfuse")
    chatbot = gr.Chatbot(
        label="Chat",
        type="messages",
        show_copy_button=True,
        avatar_images=(
            None,
            "https://static.langfuse.com/cookbooks/gradio/hf-logo.png",
        ),
    )
    prompt = gr.Textbox(max_lines=1, label="Chat Message")
    prompt.submit(respond, [prompt, chatbot], [chatbot])
    chatbot.retry(handle_retry, chatbot, [chatbot])
    chatbot.like(handle_like, None, None)
    chatbot.clear(set_new_session_id)
 
 
if __name__ == "__main__":
    demo.launch(share=True, debug=True)

Explore data in Langfuse

When interacting with the Chatbot, you should see traces, sessions, and feedback scores in your Langfuse project. See video above for a walkthrough.

Example trace, session, and user feedback in Langfuse (public link):

Gradio Traces, sessions and user feedback in Langfuse

If you have any questions or feedback, please join the Langfuse Discord or create a new thread on GitHub Discussions.