Azure SignalR Service with LLAMA2 integration

This is a chatroom sample integrated with the Llama 2 language model. It demonstrates how Azure SignalR Service can work with a locally hosted language model, so that a group chat can include the model as a participant. Llama 2 is a large language model. In this sample we use llama.cpp, a Llama 2 runtime that can run on an ordinary desktop with 4-bit integer quantization. llama.cpp has many language bindings; this sample uses LlamaSharp.

Prerequisites

The following software is required to build this tutorial.

Run the sample

Acquire language model

This repo doesn't contain the language model itself (it's too large). You can get a Llama 2 language model from Hugging Face, for example llama-2-7b-chat.Q2_K. You can also choose a larger model according to your needs and machine.

Put the language model in the model folder and update the config file src/appsettings.json:

{
  "LLamaOptions": {
    "Models": {
        "ModelPath": "<path-to-model>"
    }
  }
}
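
At startup, this section is typically bound to a typed options class. The sketch below shows one common way to do that; the class shapes simply mirror the JSON above and are an assumption for illustration, not necessarily the sample's exact types.

// Hypothetical options classes mirroring the LLamaOptions section above.
public class LLamaOptions
{
    public ModelOptions Models { get; set; }
}

public class ModelOptions
{
    public string ModelPath { get; set; }
}

// In Program.cs, bind the configuration section so it can be injected via IOptions<LLamaOptions>:
builder.Services.Configure<LLamaOptions>(builder.Configuration.GetSection("LLamaOptions"));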

Create SignalR Service

Create an Azure SignalR Service instance using the Azure CLI:

resourceGroup=myResourceGroup
signalrName=mySignalRName
region=eastus

# Create a resource group.
az group create --name $resourceGroup --location $region

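# Create an Azure SignalR Service instance.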
az signalr create -n $signalrName -g $resourceGroup --sku Premium_P1

# Get connection string for later use.
connectionString=$(az signalr key list -n $signalrName -g $resourceGroup --query primaryConnectionString -o tsv)

Edit src/appsettings.json and paste the connection string into the following property:

{
  "Azure": {
    "SignalR": {
      "ConnectionString": "<connection-string>"
    }
  }
}
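
On the server side, the Azure SignalR SDK plugs into the regular SignalR pipeline. AddAzureSignalR (from the Microsoft.Azure.SignalR package) reads Azure:SignalR:ConnectionString from configuration by default, which is why the property above is all that is needed. A minimal sketch, assuming a .NET 6+ style Program.cs and an assumed /chat route:

var builder = WebApplication.CreateBuilder(args);

// AddAzureSignalR picks up Azure:SignalR:ConnectionString from configuration.
builder.Services.AddSignalR().AddAzureSignalR();

var app = builder.Build();
app.UseStaticFiles();
app.MapHub<ChatSampleHub>("/chat"); // the route is an assumption for illustration
app.Run();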

Start the sample

cd src
dotnet run

Play with the sample

You can group chat with other people as usual through the webpage. You can also type a message starting with @llama, e.g. @llama how are you, to talk to the Llama 2 model, and Llama 2 will broadcast its response to all participants.

NOTE: A relatively small model will result in poor conversation quality. Running on CPU only will result in very slow responses.


Details in the sample

The sample uses LlamaSharp, a C# binding of llama.cpp. llama.cpp is the runtime responsible for talking to the language model. LlamaSharp provides high-level APIs and a stateful context, which means it can "remember" what you have just asked.
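
As a rough illustration of that stateful API, the sketch below loads a quantized model and streams tokens from a chat session. Type and method names follow a recent LlamaSharp release and vary between versions, so treat this as a sketch rather than the sample's exact code:

using LLama;
using LLama.Common;

// Load the quantized model once; ModelParams also controls context size, GPU layers, etc.
var parameters = new ModelParams("<path-to-model>") { ContextSize = 1024 };
using var weights = LLamaWeights.LoadFromFile(parameters);
using var context = weights.CreateContext(parameters);

// InteractiveExecutor keeps conversation state, so the session "remembers" earlier turns.
var session = new ChatSession(new InteractiveExecutor(context));

var inferenceParams = new InferenceParams { AntiPrompts = new List<string> { "User:" } };
await foreach (var token in session.ChatAsync(
    new ChatHistory.Message(AuthorRole.User, "how are you"), inferenceParams))
{
    Console.Write(token); // tokens arrive one by one, which maps naturally to SignalR streaming
}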

The sample uses Azure SignalR Service, a managed SignalR service that provides reliability and scalability. It shares the same protocol as the self-hosted SignalR library. In the sample, we created a ChatSampleHub and defined several hub methods. Inference is the one that, when invoked by a client, sends the message to Llama 2 and waits for the response tokens. The server generates a unique ID per invocation and streams the tokens, together with the ID, to all clients as they are produced. When a client receives a message from the server, it creates a new div for a new ID to show the response from Llama 2, or appends to the existing div if the ID already exists.
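
Put together, the hub method might look roughly like the following sketch. The client callback name and the ILlamaService wrapper are assumptions for illustration, not the sample's verbatim code:

using Microsoft.AspNetCore.SignalR;

public interface ILlamaService
{
    // Hypothetical wrapper around the LlamaSharp chat session.
    IAsyncEnumerable<string> InferAsync(string prompt);
}

public class ChatSampleHub : Hub
{
    private readonly ILlamaService _llama;

    public ChatSampleHub(ILlamaService llama) => _llama = llama;

    // Invoked by a client for "@llama ..." content; streams the response to everyone.
    public async Task Inference(string user, string message)
    {
        // One unique ID per invocation lets clients group streamed tokens into a single div.
        var id = Guid.NewGuid().ToString();
        await foreach (var token in _llama.InferAsync(message))
        {
            await Clients.All.SendAsync("newLlamaMessage", id, token);
        }
    }
}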