Why Developers Are Flocking to LLaMA, Meta’s Open Source LLM
When it comes to generative AI, the open source community has embraced Meta AI’s LLaMA (Large Language Model Meta AI), which was released in February. Meta made LLaMA available in several sizes (7B, 13B, 33B, and 65B parameters), but at first it was restricted to approved researchers and organizations. However, when the weights were leaked online in early March for anyone to download, the model became effectively open to all, even though its license remained non-commercial.
To get an understanding of how developers are using LLaMA, and what benefits it gives them over similar LLMs from the likes of OpenAI and Google, I spoke to Sebastian Raschka from Lightning AI. He told me that developers are attracted to Meta’s LLaMA because — unlike with GPT and other popular LLMs — LLaMA’s weights can be fine-tuned. This allows devs to create more advanced and natural language interactions with users, in applications such as chatbots and virtual assistants.
Raschka should know. His role at Lightning AI is “Lead AI Educator,” reflecting both his academic background (he was previously a university professor of statistics) and his high-profile social media presence (he has 192,000 followers on Twitter and runs a Substack newsletter entitled Ahead of AI).
LLaMA vs. GPT: Release the Weights!
LLaMA isn’t that different from OpenAI’s GPT-3 model, Raschka said, except that Meta has shared the weights. The other major LLMs have not done that.
In the context of AI models, “weights” refers to the parameters learned by a model during the training process. These parameters are stored in a file and used during the inference or prediction phase.
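To get a feel for how large such a weights file is, here is a rough back-of-the-envelope sketch. It assumes the nominal parameter counts from the model names (the exact counts differ slightly) and standard storage widths of 4, 2, and 1 bytes per parameter; the helper name `weight_size_gb` is made up for this illustration.

```python
# Nominal parameter counts taken from the model names (assumption: the exact
# counts are slightly different, so treat the results as estimates).
NOMINAL_PARAMS = {"7B": 7e9, "13B": 13e9, "33B": 33e9, "65B": 65e9}

# Bytes needed to store one parameter at common numeric precisions.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "int8": 1}


def weight_size_gb(n_params, dtype):
    """Approximate size of the stored weights in gigabytes (10^9 bytes)."""
    return n_params * BYTES_PER_PARAM[dtype] / 1e9


for name, n_params in NOMINAL_PARAMS.items():
    sizes = ", ".join(
        f"{dtype}: {weight_size_gb(n_params, dtype):.0f} GB"
        for dtype in BYTES_PER_PARAM
    )
    print(f"LLaMA {name} -> {sizes}")
```

By this estimate the smallest model needs about 14 GB at 16-bit precision, which helps explain why the 7B version is the one most often run on a single machine.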
What Meta did, specifically, was release LLaMA’s model weights to the research community under a non-commercial license. Other powerful LLMs, such as GPT, are typically only accessible through limited APIs.
“So you have to go through OpenAI and access the API, but you cannot really, let’s say, download the model or run it on your computer,” said Raschka. “You cannot do anything custom, basically.”
In other words, LLaMA is much more adaptable for developers. This is potentially very disruptive to the current leaders in LLMs, such as OpenAI and Google. Indeed, as revealed by a leaked internal Google memo this week, the big players are already concerned:
“Being able to personalize a language model in a few hours on consumer hardware is a big deal, particularly for aspirations that involve incorporating new and diverse knowledge in near real-time.”
As noted LLM developer Simon Willison put it, “while OpenAI and Google continue to race to build the most powerful language models, their efforts are rapidly being eclipsed by the work happening in the open source community.”
Use Cases
So what are some of the use cases for applications being built on top of LLaMA?
Raschka said that finance and legal use cases are good candidates for fine-tuning. However, he noted that larger companies may want to go beyond just fine-tuning and instead pre-train the entire model using their own data. Classification tasks are also popular so far — such as toxicity prediction, spam classification, and customer satisfaction ranking.
According to Raschka, using LLaMA can provide improved performance in apps compared to traditional machine learning algorithms, with accuracy improvements ranging from 5% to 10%. Mostly, this can be achieved just with fine-tuning.
“It’s something that is also accessible to people,” he said, “because you don’t need to pre-train the model. You can just fine-tune it, essentially.”
LoRA and Other Tools
One of the tools developers can use to fine-tune LLaMA is LoRA (Low-Rank Adaptation of Large Language Models), which is available for free on Microsoft’s GitHub account. I asked Raschka how this works.
He began by saying there are various techniques for fine-tuning LLMs, such as hard tuning, soft tuning, prefix tuning, and adapter methods. He explained that the adapter method is attractive because it trains only a small set of added layers while keeping the rest of the transformer frozen, which results in fewer trainable parameters and faster training. LoRA is one type of adapter method, and Raschka said it uses a mathematical trick to decompose large matrices into smaller ones, resulting in fewer parameters and more efficient storage. In effect, this means fine-tuning takes much less time.
“When I do the smaller method, where I only have these intermediate layers like LoRA, it takes only one to three hours instead of 18 hours on the same data set, basically. So it’s an advantage because you have smaller parameters.”
Techniques like LoRA are useful for deploying LLMs to multiple customers, he added, as it only requires saving the small matrices.
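The low-rank idea can be sketched in a few lines of dependency-free Python. This is a simplified illustration, not Microsoft’s implementation: the names `LoRALinear`, `down`, `up`, and `adapter_state` are invented here, the 4096 x 4096 layer size and rank of 8 are assumed values, and a real project would use a tensor library such as PyTorch.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def lora_param_counts(d_in, d_out, rank):
    """Return (full, lora): trainable values for full fine-tuning vs. LoRA."""
    return d_in * d_out, rank * (d_in + d_out)


class LoRALinear:
    """Frozen weight matrix plus a trainable low-rank update (illustrative only)."""

    def __init__(self, weight, rank, alpha=1.0, seed=0):
        rng = random.Random(seed)
        d_in, d_out = len(weight), len(weight[0])
        self.weight = weight        # d_in x d_out, pretrained and kept frozen
        self.scale = alpha / rank   # scaling factor applied to the update
        # Small trainable factors: `down` starts random and `up` starts at
        # zero, so the layer initially behaves exactly like the pretrained one.
        self.down = [[rng.gauss(0.0, 0.01) for _ in range(rank)] for _ in range(d_in)]
        self.up = [[0.0] * d_out for _ in range(rank)]

    def forward(self, x):
        """x is batch x d_in; returns x @ weight + scale * (x @ down @ up)."""
        base = matmul(x, self.weight)
        update = matmul(matmul(x, self.down), self.up)
        return [
            [b + self.scale * u for b, u in zip(base_row, update_row)]
            for base_row, update_row in zip(base, update)
        ]

    def adapter_state(self):
        """Only these small matrices need to be saved per fine-tuned variant."""
        return {"down": self.down, "up": self.up}


# Assumed sizes: one 4096 x 4096 layer and a rank of 8.
full, lora = lora_param_counts(d_in=4096, d_out=4096, rank=8)
print(f"full: {full:,}  LoRA: {lora:,}  ratio: {full // lora}x")
```

Under those assumed sizes, the low-rank factors hold 65,536 values instead of 16,777,216 for the full matrix, a 256-fold reduction for that one layer. That is also why serving many customers is cheap: the frozen base weights are shared, and each customer only needs its own copy of what `adapter_state()` returns.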
Devs and Fine-Tuning
Fine-tuning is a step beyond prompt engineering, so I asked Raschka whether developers will need to learn how to do it.
Raschka thinks that understanding how to use language models will be a useful skill for developers, but they won’t need to be in charge of fine-tuning the models at their company unless they have very specific needs. Small companies can use a general tool like GPT, he said, while larger companies will likely have a team member who is in charge of fine-tuning the models.
What developers are definitely interested in is integrating AI models into their existing applications. This is where Raschka’s employer, Lightning AI, comes in. It offers an open source framework called PyTorch Lightning, which is used for implementing deep learning models. Lightning AI also offers cloud access and helps users deploy machine learning systems on the cloud. Incidentally, the creator of PyTorch Lightning, William Falcon, was a Ph.D. intern at Facebook AI Research during 2019, which likely influenced Lightning AI’s support of LLaMA.
Also worth noting: Lightning AI has its own implementation of the LLaMA language model called Lit-LLaMA, which is available under the Apache 2.0 license. Researchers from Stanford University have also trained a fine-tuned model based on LLaMA, called Alpaca.
Conclusion
LLaMA does seem like a great option for developers wanting more flexibility in using large language models. But as Raschka points out, while fine-tuning is becoming increasingly accessible, it is still a specialized skill that may not be necessary for every developer to learn.
Regardless of whether or not they do the fine-tuning, developers increasingly need to understand how to use LLMs to improve certain tasks and workflows in their applications. So LLaMA is worth checking out, especially since it’s more open than GPT and other popular LLMs.