
Generative AI Examples v1.2 Release Notes

@chensuyue chensuyue released this 27 Jan 02:20
· 4 commits to main since this release

OPEA Release Notes v1.2

We are excited to announce the release of OPEA version 1.2, which includes significant contributions from the open-source community. This release addresses over 320 pull requests.

More information about how to get started with OPEA v1.2 can be found on the Getting Started page. All project source code is maintained in the repository. To pull Docker images, please visit Docker Hub. For instructions on deploying Helm Charts, please refer to the guide.

What's New in OPEA v1.2

This release focuses on code refactoring for GenAIComps, the epic efforts aimed at reducing redundancy, addressing technical debt, and enhancing overall maintainability and code quality. As a result, OPEA users can expect a more robust and reliable OPEA with clearer guidance and improved documentation.

OPEA v1.2 also introduces more scenarios with general availability, including:

  • LlamaIndex and LangChain Integration: Enables OPEA as a backend. The LlamaIndex integration currently supports ChatQnA only.
  • Model Context Protocol (MCP) Support: Experimental support for MCP in the Retriever.
  • Cloud Service Providers (CSP) Support: Automated Terraform deployment using Intel® Optimized Cloud Modules for Terraform, available for major cloud platforms including Amazon Web Services (AWS), Google Cloud Platform (GCP), and Microsoft Azure.
  • Enhanced Security: Istio mutual TLS (mTLS) and OpenID Connect (OIDC) based authentication with APISIX.
  • Enhancements for GenAI Evaluation: Specialized evaluation benchmarks tailored for Chinese language models, focusing on their performance and accuracy on Chinese datasets.
  • Helm Charts Deployment: Added support for the Text2Image and SearchQnA examples and their microservices.

Highlights

Code Refactoring for GenAIComps

This is an epic task in v1.2. We refactored the entire GenAIComps codebase. This comprehensive effort focused on reducing redundancy, addressing accumulated technical debt, and enhancing the overall maintainability and code quality. The refactoring not only streamlined the architecture but also laid a stronger foundation for future scalability and development.

At the architecture level, OPEA introduces OpeaComponentRegistry and OpeaComponentLoader. The OpeaComponentRegistry manages the lifecycle of component classes, including their registration and deregistration, while the OpeaComponentLoader instantiates components based on the classes in the registry and executes them as needed. Unlike previous implementations, this approach makes the lifecycle of a component class transparent to the user, and components are instantiated only when actively used. This design enhances efficiency, clarity, and flexibility in the system.
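The registry/loader interplay described above can be sketched as follows. This is a minimal illustration of the pattern, not the actual GenAIComps implementation: the class names, method names, and the EchoEmbedding example component are all stand-ins.

```python
class ComponentRegistry:
    """Tracks component classes without instantiating them."""
    _classes = {}

    @classmethod
    def register(cls, name):
        def decorator(component_cls):
            cls._classes[name] = component_cls
            return component_cls
        return decorator

    @classmethod
    def unregister(cls, name):
        cls._classes.pop(name, None)

    @classmethod
    def get(cls, name):
        return cls._classes[name]


class ComponentLoader:
    """Instantiates a registered component only when it is first used."""

    def __init__(self, name, **kwargs):
        self.name, self.kwargs = name, kwargs
        self._instance = None

    def invoke(self, *args):
        if self._instance is None:  # lazy instantiation on first use
            self._instance = ComponentRegistry.get(self.name)(**self.kwargs)
        return self._instance.invoke(*args)


@ComponentRegistry.register("echo_embedding")
class EchoEmbedding:
    """Toy component: embeds text as its length, repeated `dim` times."""

    def __init__(self, dim=3):
        self.dim = dim

    def invoke(self, text):
        return [float(len(text))] * self.dim


loader = ComponentLoader("echo_embedding", dim=2)
print(loader.invoke("hello"))  # the component is instantiated here, not earlier
```

The key property the sketch demonstrates is that registering a class is cheap and side-effect free; the component object itself only comes into existence when a request first reaches it.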

At the component level, each OPEA component is structured into two layers: the service wrapper and the service provider (named integrations in the code). The service wrapper, which is optional, acts as a protocol hub and manages service access, while the service provider delivers the actual functionality. This architecture allows components to be seamlessly integrated or removed without code changes, enabling a modular and adaptable system. All existing components have been ported to the new architecture.
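A minimal sketch of this two-layer structure, using a hypothetical reranking component: the wrapper handles the request protocol, the provider does the work, and swapping one provider for another requires no change to callers. All names below are illustrative, not the GenAIComps code.

```python
class RerankProvider:
    """Service provider layer: delivers the actual functionality."""

    def rerank(self, query, docs):
        raise NotImplementedError


class KeywordOverlapReranker(RerankProvider):
    """Toy provider: rank documents by word overlap with the query."""

    def rerank(self, query, docs):
        def score(doc):
            return len(set(query.split()) & set(doc.split()))
        return sorted(docs, key=score, reverse=True)


class RerankService:
    """Service wrapper layer: a protocol hub that manages access to
    whichever provider it was configured with."""

    def __init__(self, provider: RerankProvider):
        self.provider = provider

    def handle(self, request: dict) -> dict:
        ranked = self.provider.rerank(request["query"], request["docs"])
        return {"docs": ranked}


svc = RerankService(KeywordOverlapReranker())
out = svc.handle({"query": "opea release",
                  "docs": ["unrelated text", "opea v1.2 release"]})
print(out["docs"][0])  # the overlapping document ranks first
```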

Additionally, we reduced code redundancy, merged overlapping modules, and implemented adjustments to align with the new architectural changes.

Note

We suggest that users and contributors review the documentation to understand the impact of the code refactoring.

Supporting Cloud Service Providers

OPEA offers automated Terraform deployment using Intel® Optimized Cloud Modules for Terraform, available for major cloud platforms, including AWS, GCP, and Azure. To explore this option, check out the Terraform deployment guide.

Additionally, OPEA supports manual deployment on virtual servers across AWS, GCP, IBM Cloud, Azure, and Oracle Cloud Infrastructure (OCI). For detailed instructions, refer to the manual deployment guide.

Enhanced GenAI Components

  • vLLM support for embeddings and rerankings: Integrate vLLM as a serving framework to enhance the performance and scalability of embedding and reranking models.
  • Agent Microservice:
    • SQL agent strategy: Takes the user question, optional hints, and history (when available), then reasons step by step to solve the problem by interacting with a SQL database. OPEA currently has two types of SQL agents: sql_agent_llama for use with open-source LLMs and sql_agent for use with OpenAI models.
    • Enabled user-customized tool subsets: Added support for user-defined subsets of tools for the ChatCompletion API and Assistant APIs.
    • Enabled persistence: Introduced Redis to persist Agent configurations and historical messages for Agent recovery and multi-turn conversations.
  • Long-context Summarization: Supported multiple modes: auto, stuff, truncate, map_reduce, and refine.
  • Standalone Microservice Deployment: Enabled the deployment of OPEA components as independent services, allowing for greater flexibility, scalability, and modularity in various application scenarios.
  • PDF Inputs Support: Support PDF inputs for dataprep, embeddings, LVMs, and retrievers.
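The long-context summarization modes listed above can be illustrated with a toy dispatcher. This is a sketch of the mode semantics only, not the OPEA DocSum service: fake_summarize is a stand-in for an LLM call, the chunk sizes are arbitrary, and the refine mode (which iteratively updates a running summary chunk by chunk) is omitted for brevity.

```python
def fake_summarize(text, limit=40):
    """Stand-in for an LLM summarization call: truncate to `limit` chars."""
    return text[:limit]


def summarize(doc, mode="auto", chunk_size=100, limit=40):
    if mode == "auto":
        # auto: pick a strategy based on whether the input fits one window
        mode = "stuff" if len(doc) <= chunk_size else "map_reduce"
    if mode == "stuff":
        # stuff: one pass over the whole input
        return fake_summarize(doc, limit)
    if mode == "truncate":
        # truncate: drop everything past the context window, then summarize
        return fake_summarize(doc[:chunk_size], limit)
    if mode == "map_reduce":
        # map_reduce: summarize each chunk, then summarize the summaries
        chunks = [doc[i:i + chunk_size] for i in range(0, len(doc), chunk_size)]
        partial = " ".join(fake_summarize(c, limit) for c in chunks)
        return fake_summarize(partial, limit)
    raise ValueError(f"unknown mode: {mode}")


print(summarize("short text"))                  # auto resolves to stuff
print(summarize("x" * 500, mode="map_reduce"))  # chunked two-stage summary
```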

New GenAI Components

  • Bedrock: OPEA LLM now supports Amazon Bedrock as the backend of the text generation microservice. Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies through a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI.
  • OpenSearch Vector Database: OPEA vector stores now support AWS OpenSearch, an open-source, enterprise-grade search and observability suite that brings order to unstructured data at scale.
  • Elasticsearch Vector Database: OPEA vector stores now support the Elasticsearch vector database, Elasticsearch's open-source vector database offering an efficient way to create, store, and search vector embeddings.
  • Guardrail Hallucination Detection: Added the capability to detect hallucination, which spans a wide range of issues that can impact the reliability, trustworthiness, and utility of AI-generated content.

Enhanced GenAI Examples

Enhanced GenAIStudio

In this release, GenAI Studio enables Keycloak for multi-user management, supports a sandbox environment for multi-workflow execution, and adds Grafana-based visualization dashboards with built-in Prometheus performance metrics for model evaluation and functional-node performance.

Newly Supported Models

  • bge-base-zh-v1.5
  • Falcon2-40B/11B
  • Falcon3

Newly Supported Hardware

Deprecations and Behavior Changes

GenAIComps

Removals

  • Remove embedding microservices: MOSEC.
  • Remove reranking microservices: fastRAG, MOSEC.
  • Remove vector store microservices: LanceDB, Chroma.
  • Remove intent_detection microservice.
  • Remove module cores/mega/gateway.

Merges

  • TGI, vLLM, and Ollama integrations in LLM/text-generation have been merged into LLM/text-generation/integrations/native.

Movements

  • [vector stores] Move redis, milvus, elasticsearch, opensearch, pathway, and pgvector to comps/third_parties.

Renamings

  • Rename comps/reranks to comps/rerankings.

Versioning

  • [animations] Remove fixed version constraints from all dependencies, and use the latest versions instead.
  • Upgrade HabanaAI/vllm-fork to the latest.

Behavior Changes

  • [llm] Exclude yield/reply time from first token latency metric.
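This behavior change can be illustrated with a toy timing harness: the clock for first-token latency now stops as soon as the model produces the first token, before any time is spent yielding the reply to the client. The function names and sleep durations below are hypothetical stand-ins, not the OPEA metrics code.

```python
import time


def token_stream():
    """Stand-in for a streaming LLM: the first token takes the longest."""
    time.sleep(0.05)   # model computes the first token
    yield "Hello"
    time.sleep(0.01)   # later tokens arrive faster
    yield " world"


def measure_ttft(gen):
    """Return (first-token latency, all tokens).

    The measurement stops when the first token is produced, so the time
    spent yielding the rest of the reply does not inflate the metric."""
    start = time.perf_counter()
    first = next(gen)
    ttft = time.perf_counter() - start
    tokens = [first, *gen]   # remainder drained after the clock stopped
    return ttft, tokens


ttft, tokens = measure_ttft(token_stream())
print(f"first token latency: {ttft:.3f}s, tokens: {tokens}")
```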

API Changes

  • Dataprep
    • Ingest a file: change from v1/dataprep to v1/dataprep/ingest.
    • Get a file: change from v1/dataprep/get_file to v1/dataprep/ingest/get.
    • Delete a file: change from v1/dataprep/delete_file to v1/dataprep/delete.
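The Dataprep path changes above can be captured in a small migration table. The endpoint paths come from this release note; the base URL and port are assumptions for illustration only, and the helper name is hypothetical.

```python
BASE = "http://localhost:6007"  # assumed host/port for illustration

# Pre-v1.2 Dataprep endpoints
ENDPOINTS_V1_1 = {
    "ingest": f"{BASE}/v1/dataprep",
    "get":    f"{BASE}/v1/dataprep/get_file",
    "delete": f"{BASE}/v1/dataprep/delete_file",
}

# v1.2 replacements
ENDPOINTS_V1_2 = {
    "ingest": f"{BASE}/v1/dataprep/ingest",
    "get":    f"{BASE}/v1/dataprep/ingest/get",
    "delete": f"{BASE}/v1/dataprep/delete",
}


def migrate(old_url: str) -> str:
    """Map a pre-v1.2 Dataprep URL onto its v1.2 replacement."""
    for op, url in ENDPOINTS_V1_1.items():
        if old_url == url:
            return ENDPOINTS_V1_2[op]
    raise ValueError(f"unknown dataprep endpoint: {old_url}")


print(migrate(f"{BASE}/v1/dataprep/get_file"))
```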

GenAIExamples

Versioning

  • Upgrade tgi-gaudi to 2.3.1.

Behavior Changes

  • ChatQnA: Use vLLM as the default serving framework on Gaudi to leverage its optimized performance characteristics, thereby improving throughput and reducing latency in inference tasks.
  • DocSum: Change the default UI to a Gradio-based UI.

GenAIEval

Behavior Changes

  • Retriever benchmark: Change the default top k from 4 to 1.

GenAIInfra

Behavior Changes

  • Change the imagePullPolicy from IfNotPresent to default.

Docker Images

Deprecations

The following Docker images are deprecated and will be removed in OPEA v1.3:

Merges

The merged Docker images are deprecated and will be removed in OPEA v1.3. Please use the new images instead.

Notable Changes

GenAIExamples
  • Functionalities

    • New GenAI Examples

    • Enhanced GenAI Examples

      • [AgentQnA] Add SQL agent to AgentQnA. (287f03a8)
      • [ChatQnA] Enable OpenTelemetry Tracing for ChatQnA TGI serving on Gaudi. (81022355)
      • [DocIndexRetriever] Enable the without-rerank flavor. (a50e4e6f)
      • [DocSum] Support Long context for DocSum. (50dd959d)
      • [DocSum] Added URL summary option to DocSum Gradio-UI. (84a6a6e9)
      • [EdgeCraftRAG] Add new feature and bug fix for EC-RAG. (6169ea49)
      • [MultimodalQnA] Add audio querying to MultimodalQ&A Example. (c760cac2)
    • Changed Defaults

      • [DocSum] Changed Default UI to Gradio. (00b526c8)
  • Performance

    • [ChatQnA] Remove enforce-eager to enable HPU graphs for better vLLM perf. (4c01e146)
  • New Hardware Support

    • Added compose example for MultimodalQnA deployment on AMD ROCm systems. (236ea6bc)
    • Added docker compose example for AgentQnA deployment on AMD ROCm. (df7c1928)
    • Added compose example for VisualQnA deployment on AMD ROCm systems. (77e640e2)
  • Deployment

    • Use staged builds to minimize final image sizes. (0eae391f)
    • Check duplicated dockerfile. (aa5c91d7)
    • Add helm deployment instructions for GenAIExamples. (c795ef22)
    • Add helm deployment instructions for codegen. (5638075d)
  • Versioning

    • Remove vllm hpu commit id limit. (7d218b9f)
  • Bug Fixes

  • Documentation

    • Update README.md for adding K8S cluster link for Gaudi. (91ff520b)
    • Update README.md for supporting matrix. (41374d86)
    • Update README.md for quick start guide. (00241d01)
    • Add example for AudioQnA deploy in AMD ROCm. (006c61bc)
  • CI/CD/UT

    • CI: Add check for conflict image build definition. (8182a833)
    • Check image and service names and Dockerfile in build.yaml. (e8cffc61)
    • Detect dangerous command. (736155ca)
GenAIComps
  • Code Refactoring

    • Core & Components

      • GenAIComps microservices refactor. (f57e30dd)
      • Remove examples gateway. (f5efaf1f)
      • Refactor llm predictionguard. (4c21738a)
      • Refactor llm Docsum. (88f93733)
      • Refactor lvms. (feef30b0)
      • Refactor FaqGen. (ea72c943)
      • Refine embedding naming and move dependency to 3rd_party. (b91911a5)
      • Finetuning code refactor. (efd95780)
      • Text2image code refactor. (2587a297)
      • Refactor prompt registry microservice. (179b5da0)
      • Feedback management microservice refactor. (ec66b91c)
      • Refactor web retriever. (962e0978)
      • Refactor guardrails microservice. (631b5704)
      • Refactor reranking. (267cad1f)
      • Refine Component Interface. (bf097395)
      • Refine agent directories. (cf90932f)
      • Refactor text2sql based on ERAG. (2cfd014b)
      • Image2video code refactor. (90a86345)
      • Refactor asr/tts components. (a19c2226)
      • Refactor image2image. (10408750)
      • Refactor Animation based on ERAG. (a7888ab2)
      • [Reorg] Remove redundant file in retrievers/redis. (f3aaaebf)
    • Deployment

      • Add kubernetes deployment for GenAIComps. (1cc4d211)
  • Functionalities

    • New microservices:

      • Add opensearch integration for OPEA. (8d6b4b0a)
      • Feature/elasticsearch vector store integration - Infosys. (5ed041bd)
      • Build guardrail "Hallucination Detection" microservice. (4db13298)
    • Enhanced microservices:

      • [agent] Add tool choices for agent. (3a7ccb0a)
      • [agent] Add SQL agent strategy. (717c3c10)
      • [llm] Modify Params to Support Falcon3 Model. (6acefae7)
      • [llm/summarization] Add auto mode for long context. (45d00020)
    • Removed microservices

  • Performance

    • Remove enforce-eager to enable HPU graphs for better vLLM perf. (ddd372d3)
  • Behavior Changes

    • Exclude yield/reply time from first token latency metric. (5663e168)
  • Dependency Versioning

    • [animations] Remove version restrictions. (3f23bf58)
    • [asr] Add the dependency to pydantic. (145f3fb8)
  • Bug Fixes

    • Fix docker compose health check issue. (fe24decd)
    • Fix OpenAI API compatible issue: embedding. (c955e5e4)
    • Fix OpenAI API compatible issue: vllm comps support openai API ChatCompletionRequest. (48ed5898)
    • Fix OpenAI API compatible issue: ASR. (c3948ad5)
  • CI/CD/UT

    • Add dangerous cmd check. (766c757f)
    • Enhance asr/tts tests. (9a0d91a5)
    • CI: Add check for conflict image build definition. (0e94eecb)
GenAIEval
  • Bug Fixes

    • [FaqGen] Fix the metrics parse and statistics for benchmark. (5d717e8)
    • Update upload_file_no_rerank.txt. (0155ec3)
    • Update crag eval with benchmark results. (6f7c3bc)
  • Changed Defaults

    • Modify retrieval top_k parameter to 1 for benchmark. (30e32ba)
GenAIInfra
  • HelmChart

    • helm chart: Add service account support. (9bb7c3a)
    • Add vLLM support for DocSum. (0943764)
    • Modify embedding-usvc to support multimodal embedding. (ecb4866)
    • Add minimal resource requests for tgi. (3b7f28b)
    • Add text2image microservice support. (7b35326)
    • Adapt latest changes in asr/tts related components. (9f9b1d5)
    • Add lvm related microservices. (b0c760f)
    • Adapt rerank/web-retriever to latest changes. (386d6d6)
    • Adapt to latest changes in llm microservice family. (70ad650)
    • docsum: reduce microservices in docsum. (68e7d06)
    • audioqna: reduce microservice numbers. (07c163b)
    • Add vLLM+HPA support to ChatQnA Helm chart. (baed0b5)
    • Helm: Add audioqna UI support. (7a26d06)
  • CSP

    • Azure automated deployment for OPEA applications - Infosys. (e9dc58a)
  • Monitoring

    • Add monitoring for rest of ChatQnA + DocSum components. (590991b)
  • Changed Defaults

    • docsum: Use docsum-gradio-ui by default. (95d6398)
    • Use default kubernetes imagePullPolicy. (0f21681)
  • Documentation

  • Bug Fixes

    • [AgentQnA] Fix OpenAI compatible issue: streaming -> stream. (88a7b52)
    • Fix model-downloader and tgi in multi shard case. (a4a96ab)
  • CI/CD/UT

GenAIStudio
  • Add Keycloak theme under assets. (00da22d)
  • Add new basic workflow after solving the bug. (96f6590)
  • Let initial inputs at least match one key for prompt. (e6c4229)
  • Add more keywords, and retry another question. (c2a6e70)
  • Update openai version in studio-frontend. (11ac0ba)
  • Update readme and removed deprecated chromium version. (62a35ea)

Full Changelogs

Contributors

This release would not have been possible without the contributions of the following organizations and individuals.

Contributing Organizations

  • Amazon: Bedrock and OpenSearch vector database integration.
  • AMD: AMD CPU/GPU support for GenAIExamples.
  • Infosys: Elasticsearch vector database integration.
  • Intel: Development and improvements to GenAI examples, components, infrastructure, and evaluation.

Individual Contributors

For a comprehensive list of individual contributors, please refer to the Full Changelogs section.