Run, create, and share large language models (LLMs).
Note: Ollama is in early preview. Please report any issues you find.
- Download for macOS
- Download for Windows and Linux (coming soon)
- Build from source
To run and chat with Llama 2, the new model by Meta:
ollama run llama2
Ollama supports a list of open-source models available at ollama.ai/library.
Here are some example open-source models that can be downloaded:
Model | Parameters | Size | Download |
---|---|---|---|
Llama2 | 7B | 3.8GB | ollama pull llama2 |
Llama2 13B | 13B | 7.3GB | ollama pull llama2:13b |
Llama2 70B | 70B | 39GB | ollama pull llama2:70b |
Llama2 Uncensored | 7B | 3.8GB | ollama pull llama2-uncensored |
Code Llama | 7B | 3.8GB | ollama pull codellama |
Orca Mini | 3B | 1.9GB | ollama pull orca-mini |
Vicuna | 7B | 3.8GB | ollama pull vicuna |
Nous-Hermes | 7B | 3.8GB | ollama pull nous-hermes |
Nous-Hermes 13B | 13B | 7.3GB | ollama pull nous-hermes:13b |
Wizard Vicuna Uncensored | 13B | 7.3GB | ollama pull wizard-vicuna |
Note: You should have at least 8 GB of RAM to run the 3B models, 16 GB to run the 7B models, and 32 GB to run the 13B models.
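The memory guidance above can be expressed as a small lookup, which may be handy when scripting a check before pulling a model. This is only a sketch: the figures are copied from the note, the function names `min_ram_gb` and `runnable_sizes` are hypothetical, and sizes the note does not cover (such as 70B) deliberately return no value rather than a guess.

```python
# Suggested minimum RAM in GB per parameter count, copied from the note above.
# Sizes the note does not mention (for example 70B) are intentionally absent.
MIN_RAM_GB = {"3B": 8, "7B": 16, "13B": 32}


def min_ram_gb(parameters):
    """Return the suggested minimum RAM in GB, or None if the note gives no figure."""
    return MIN_RAM_GB.get(parameters.strip().upper())


def runnable_sizes(available_gb):
    """List the parameter sizes whose suggested minimum fits in available_gb."""
    return [size for size, need in MIN_RAM_GB.items() if need <= available_gb]
```

For example, `runnable_sizes(16)` lists the 3B and 7B sizes, which matches the note's guidance for a 16 GB machine.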
ollama pull llama2
This command can also be used to update a local model. Only the parts that have changed are downloaded.
ollama run llama2
>>> hi
Hello! How can I help you today?
For multiline input, you can wrap text with """:
>>> """Hello,
... world!
... """
I'm a basic program that prints the famous "Hello, world!" message to the console.
$ ollama run llama2 'tell me a joke'
Sure! Here's a quick one:
Why did the scarecrow win an award? Because he was outstanding in his field!
$ cat <<EOF >prompts.txt
tell me a joke about llamas
tell me another one
EOF
$ ollama run llama2 <prompts.txt
>>> tell me a joke about llamas
Why did the llama refuse to play hide-and-seek?
nobody likes to be hided!
>>> tell me another one
Sure, here's another one:
Why did the llama go to the bar?
To have a hay-often good time!
$ ollama run llama2 "summarize this file:" "$(cat README.md)"
Ollama is a lightweight, extensible framework for building and running language models on the local machine. It provides a simple API for creating, running, and managing models, as well as a library of pre-built models that can be easily used in a variety of applications.
Pull a base model:
ollama pull llama2
Create a Modelfile:
FROM llama2
# set the temperature to 1 [higher is more creative, lower is more coherent]
PARAMETER temperature 1
# set the system prompt
SYSTEM """
You are Mario from Super Mario Bros. Answer as Mario, the assistant, only.
"""
Next, create and run the model:
ollama create mario -f ./Modelfile
ollama run mario
>>> hi
Hello! It's your friend Mario.
For more examples, see the examples directory. For more information on creating a Modelfile, see the Modelfile documentation.
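If you want to produce several variations of a Modelfile like the Mario one, generating the text from a script may be convenient. The sketch below uses only the FROM, PARAMETER temperature, and SYSTEM instructions shown in the example. The helper name `render_modelfile` is hypothetical, and the check that rejects a system prompt containing `"""` is a cautious assumption rather than a documented rule.

```python
def render_modelfile(base, temperature=None, system=None):
    """Return Modelfile text using the FROM, PARAMETER, and SYSTEM instructions."""
    lines = [f"FROM {base}"]
    if temperature is not None:
        lines.append(f"PARAMETER temperature {temperature}")
    if system is not None:
        # Assumption: a prompt containing the delimiter would end the block early,
        # so refuse it instead of writing a Modelfile that may not parse.
        if '"""' in system:
            raise ValueError('system prompt must not contain """')
        lines.append(f'SYSTEM """\n{system.strip()}\n"""')
    return "\n".join(lines) + "\n"
```

Writing the returned text to `./Modelfile` and then running `ollama create mario -f ./Modelfile` follows the same steps as the manual example.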
ollama list
ollama rm llama2
Ollama bundles model weights, configurations, and data into a single package, defined by a Modelfile.
Install cmake and go:
brew install cmake
brew install go
Then generate dependencies and build:
go generate ./...
go build .
Next, start the server:
./ollama serve
Finally, in a separate shell, run a model:
./ollama run llama2
See the API documentation for all endpoints.
Ollama has an API for running and managing models. For example, to generate text from a model:
curl -X POST http://localhost:11434/api/generate -d '{
"model": "llama2",
"prompt":"Why is the sky blue?"
}'
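The same request can be sent from a script. The sketch below uses only the Python standard library and mirrors the curl call: the same URL, model, and prompt. It assumes the server started with `ollama serve` is listening on localhost:11434, and it assumes the reply arrives as a stream of JSON objects, one per line, each carrying a fragment of text in a `response` field and a final object marked `done`. That response shape is not described in this README, so check it against the API documentation. The function names are hypothetical.

```python
import json
import urllib.request

# Assumption: same address as the curl example above.
GENERATE_URL = "http://localhost:11434/api/generate"


def build_generate_request(model, prompt, url=GENERATE_URL):
    """Build the same POST request that the curl example sends."""
    body = json.dumps({"model": model, "prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def collect_response(lines):
    """Join the text fragments from an iterable of JSON lines.

    Assumption: each non-empty line is a JSON object whose optional
    "response" field holds a piece of the generated text, and the
    last object has "done" set to true.
    """
    parts = []
    for raw in lines:
        if isinstance(raw, bytes):
            raw = raw.decode("utf-8")
        raw = raw.strip()
        if not raw:
            continue
        chunk = json.loads(raw)
        parts.append(chunk.get("response", ""))
        if chunk.get("done"):
            break
    return "".join(parts)


def generate(model, prompt):
    """Send the request and return the full text (requires a running server)."""
    with urllib.request.urlopen(build_generate_request(model, prompt)) as resp:
        return collect_response(resp)
```

With the server running, `print(generate("llama2", "Why is the sky blue?"))` would print the complete answer once the stream ends. Keeping request building and stream parsing in separate functions lets each be checked without a live server.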
- LangChain and LangChain.js with a question-answering example.
- Continue - embeds Ollama inside Visual Studio Code. The extension lets you highlight code to add to the prompt, ask questions in the sidebar, and generate code inline.
- LiteLLM - a lightweight Python package to simplify LLM API calls.
- Discord AI Bot - interact with Ollama as a chatbot on Discord.
- Raycast Ollama - Raycast extension to use Ollama for local llama inference on Raycast.
- Simple HTML UI for Ollama
- Emacs client for Ollama