Tiny inference-only implementation of LLaMA

recmo/cria

Inference-only implementation of LLaMA in plain NumPy

It uses only NumPy, so it runs without a GPU. It also loads the weights through memory-mapped files, so it can run with little memory.
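The memory-mapping idea can be sketched with `np.memmap`: the OS pages weight data in from disk on demand, so a weight file much larger than RAM can still be used like an ordinary array. The file name and shape below are hypothetical, not the repo's actual layout.

```python
import numpy as np

# Hypothetical demo: write a small weight matrix to disk.
weights = np.arange(12, dtype=np.float32).reshape(3, 4)
weights.tofile("demo_weights.bin")

# Memory-map the file read-only; data is paged in lazily by the OS,
# so very large weight files need not fit in memory at once.
w = np.memmap("demo_weights.bin", dtype=np.float32, mode="r", shape=(3, 4))

# The mapped array behaves like a regular (read-only) ndarray.
total = float(w.sum())
```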

Inspired by picoGPT.

Besides NumPy, it currently also depends on Google's SentencePiece for tokenization.
