In a February 2023 paper, Meta claimed that its 13-billion-parameter LLaMA large language model (LLM) outperformed GPT-3. Is its open-source clone, OpenLLaMA, equally good? How does it fare on common language tasks? Can you reliably use it for your business? Find out in this article.
LLaMA is an LLM that Meta released under a research-only license. It grew popular when its model weights were leaked, possibly unlawfully, which enabled an ecosystem of custom LLaMA models to emerge. Such self-hosted, custom LLaMA models can help businesses automate many workflows, but the associated licensing and legal risks are not worth it.
Given those risks, businesses prefer LLMs whose weights and code are open-sourced under clear licenses. OpenLLaMA is one such family of open-source, permissively licensed LLMs created by OpenLM Research. It aims to accurately reproduce the LLaMA models based on the LLaMA research paper and other published information.
The OpenLLaMA model weights, as well as the EasyLM framework used to model and train them, are published under the business-friendly Apache 2.0 license. Businesses are free to use, modify, and deploy them for commercial purposes, with only minimal obligations such as preserving the license and attribution notices.
In the rest of this article, we explore the OpenLLaMA models, their internals, and their capabilities.
As of August 2023, OpenLM Research has published five OpenLLaMA models:
All five models are available for both the Hugging Face transformers and Flax libraries.
Keep in mind that these OpenLLaMA models are base language models (like OpenAI's GPT-3), not instruction-following models like ChatGPT. To turn them into conversational assistants, you must fine-tune them on chat datasets, like databricks-dolly-15k, using either supervised fine-tuning or reinforcement learning from human feedback.
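As a rough illustration of the supervised route, here's a minimal sketch using the TRL library's SFTTrainer on databricks-dolly-15k. The prompt template and hyperparameters are placeholders rather than tuned values, and the MosaicML-based workflow later in this article is an alternative way to accomplish the same thing.

```python
# A rough sketch of supervised fine-tuning on databricks-dolly-15k using the TRL
# library's SFTTrainer. The prompt template and hyperparameters are illustrative
# placeholders, not tuned values.
from datasets import load_dataset
from transformers import TrainingArguments
from trl import SFTTrainer

dataset = load_dataset("databricks/databricks-dolly-15k", split="train")

def to_text(example):
    # Flatten each record into a single instruction-response training string.
    return {
        "text": f"### Instruction:\n{example['instruction']}\n\n### Response:\n{example['response']}"
    }

dataset = dataset.map(to_text)

trainer = SFTTrainer(
    model="openlm-research/open_llama_7b_v2",  # loaded as a causal LM under the hood
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=1024,
    args=TrainingArguments(
        output_dir="open_llama_7b_v2-dolly-sft",
        per_device_train_batch_size=1,
        num_train_epochs=1,
    ),
)
trainer.train()
```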
The second version models are better in two aspects:
But first, what's under the hood of these OpenLLaMA models?
Like LLaMA, OpenLLaMA is based on the classic transformer decoder architecture and implements most of the architectural improvements described in the LLaMA paper:
The latest OpenLLaMA version two models are trained on the following datasets:
The older first-version models are trained on the entire RedPajama collection alone.
The OpenLLaMA models are trained with the following configuration:
The native format of the model checkpoints is Flax. EasyLM provides a conversion script to convert Flax to PyTorch models that you can load with Hugging Face transformers.
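For example, loading a converted checkpoint (or the PyTorch weights published on the Hugging Face Hub under openlm-research) looks roughly like this; the generation settings are illustrative:

```python
# A minimal sketch of loading a converted OpenLLaMA checkpoint with Hugging Face
# transformers. "openlm-research/open_llama_7b_v2" is the published Hub ID; replace
# it with the local path of your own converted checkpoint if needed.
import torch
from transformers import LlamaTokenizer, LlamaForCausalLM

model_path = "openlm-research/open_llama_7b_v2"

tokenizer = LlamaTokenizer.from_pretrained(model_path)
model = LlamaForCausalLM.from_pretrained(
    model_path, torch_dtype=torch.float16, device_map="auto"
)

prompt = "Q: What is the largest animal?\nA:"
input_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(model.device)

output = model.generate(input_ids, max_new_tokens=32)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```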
In the next section, we find out how OpenLLaMA does on common language tasks.
Since OpenLLaMA is a pure language model, if you don't plan on fine-tuning it, your prompts must either ask for text completions or provide few-shot examples. The tasks below demonstrate these use cases against the second-version OpenLLaMA 7Bv2 model.
For text generation, we started by prompting it to generate something factual:
We observed that:
We were able to recreate this pretty easily in base davinci GPT-3. The color coding of log probabilities shows that "Risk" becomes more and more likely as the first word as the list goes on. The list eventually eats itself and produces the same item over and over. Okay, back to OpenLLaMA.
Next, we asked OpenLLaMA for something more creative, like a poem:
The first few lines of the generated text aren't great, but they aren't too bad either. The output has a poem-like feel even without any prompt instructions or goal-state definition. It does fall into the same trap we saw above, where it begins to repeat itself. A few solutions for this:
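One practical lever is to adjust the decoding settings: penalize repeated tokens, block repeated n-grams, and sample rather than decode greedily. Below is a sketch using the Hugging Face transformers generate() API, reusing the model and tokenizer loaded earlier; the parameter values are illustrative, not tuned. Beyond decoding settings, better prompts and task-specific fine-tuning also help.

```python
# Decoding-time settings that reduce verbatim repetition, reusing the model and
# tokenizer loaded in the earlier snippet. The values are illustrative, not tuned.
output = model.generate(
    input_ids,
    max_new_tokens=128,
    do_sample=True,           # sample instead of greedy decoding
    temperature=0.8,
    top_p=0.95,
    repetition_penalty=1.2,   # down-weight tokens that have already appeared
    no_repeat_ngram_size=3,   # block exact 3-gram repeats
)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```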
For a few-shot language task, we picked customer intent recognition. The idea was to supply the LLM with pairs of customer questions and corresponding intent categories. Would OpenLLaMA behave like a text classifier and output an intent category? Intent recognition is very popular in chatbot workflows because it tells us whether we can rely on the model's underlying training to answer the query or whether we need to call an external API, such as an inventory service, a mortgage calculator, or another service.
We supplied 11 few-shot examples with "Q:" being the customer queries and "A:" being the corresponding intent categories:
Then we added the query we're interested in classifying:
The model generated the following:
Observations:
Another surprise was that it was quite sensitive to the syntax of the last query. We asked the same query again, with the only change being that it ended with "A:":
We expected it to start the completion with the intent category. Instead, it didn't generate any category at all and continued with garbage text:
We conclude that:
Since the second-version OpenLLaMA models are trained on the StarCoder coding dataset, we tested its code generation capabilities with this prompt:
It generated this code:
The code is syntactically perfect and runs, but it is functionally wrong: it doesn't parse even basic URLs like "https://www.google.com/somepage" correctly. Nonetheless, getting the syntax right is impressive. As with the other tasks, we recommend some fine-tuning, or stronger prompting with frameworks such as ReAct.
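For reference, here's what Python's standard library recovers from that example URL; a functionally correct solution should produce equivalent components. This is just a reference point for our reading of the task, not the code the model generated.

```python
# Reference behavior from Python's standard library for the example URL above.
from urllib.parse import urlparse

parts = urlparse("https://www.google.com/somepage")
print(parts.scheme)  # https
print(parts.netloc)  # www.google.com
print(parts.path)    # /somepage
```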
We next examine fine-tuning of OpenLLaMA.
To get good results from general-purpose LLMs like OpenLLaMA or DollyV2, you almost always need to fine-tune them on your custom datasets for your specific language tasks. For example, if you want a custom chatbot for your banking or financial service business, you must fine-tune them on your specific frequently asked questions or sample customer dialogues.
In this section, we explain a fine-tuning and deployment workflow for custom LLMs using the MosaicML platform. MosaicML orchestrates all the infrastructure provisioning, training dataset streaming, and training session monitoring you'd need to easily adapt an LLM to your needs.
MosaicML brings several beneficial features:
MosaicML is pretty awesome and really does make it easy to train models. One of the most difficult things we work with clients on is the actual deployment of these models. Everyone is eager to train their own LLM for their specific use case, but nobody wants to talk about deployment! Deploying these models and managing the required infrastructure is a challenging task on your own. MosaicML lets you do it at costs very similar to AWS.
This section is an end-to-end walkthrough of the MosaicML fine-tuning workflow for your LLM.
Upload your training data to cloud storage like AWS S3, Azure, or any popular S3-compatible provider.
MosaicML ingests training data from any of these clouds through its Streaming framework, so the dataset must first be converted to the framework's streaming format, as sketched below.
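For example, converting raw text samples into the Streaming framework's MDS format and writing the shards straight to S3 looks roughly like this sketch; the column schema and bucket path are placeholders for your own data.

```python
# A sketch of converting training samples to MosaicML's streaming (MDS) format with
# the mosaicml-streaming library; the schema and output location are placeholders.
from streaming import MDSWriter

columns = {"text": "str"}  # one text field per training sample
samples = [
    {"text": "First training document..."},
    {"text": "Second training document..."},
]

# MDSWriter accepts a local directory or a remote URI such as s3://bucket/path.
with MDSWriter(
    out="s3://your-bucket/openllama-finetune-data", columns=columns, compression="zstd"
) as writer:
    for sample in samples:
        writer.write(sample)
```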
MosaicML orchestrates distributed training using Docker containers. Since OpenLLaMA training requires the EasyLM software, create a Docker image that can run the EasyLM training script.
Create a configuration file similar to the one shown below, changing the details to match your environment and Docker images.
Infrastructure provisioning is as simple as two or three lines in that file. How many GPUs or TPUs do you need? And what type? MosaicML handles the rest behind the scenes.
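A minimal sketch of what those lines look like in a run configuration is shown below; the run name, Docker image, and GPU type are placeholders for your environment.

```yaml
name: openllama-7b-v2-finetune
image: your-registry/easylm:latest   # the EasyLM-capable Docker image built in the previous step
compute:
  gpus: 8
  gpu_type: a100_80gb
command: |
  # the EasyLM fine-tuning command from the next step goes here
```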
The command to fine-tune an OpenLLaMA model (include it in the "command:" section of the configuration file) would look something like this:
Use the MosaicML command-line utility to start the fine-tuning run:
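Assuming the run configuration above is saved as openllama-finetune.yaml (a placeholder filename):

```
mcli run -f openllama-finetune.yaml
```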
Use the same utility to monitor your runs:
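For example, using the run name MosaicML assigned to your run:

```
mcli get runs
mcli logs openllama-7b-v2-finetune  # use the run name that "mcli get runs" reports
```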
Once the run ends, MosaicML uploads the trained model to the Hugging Face Hub or another repository. The next section covers how you can deploy your fine-tuned model.
You can then deploy your fine-tuned model to production with the following steps.
The deployment configuration file tells MosaicML details like:
An example deployment configuration is shown below (for another model; change it to match OpenLLaMA):
Use the command-line utility to deploy the model. MosaicML does all the provisioning needed to publish it:
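Assuming the deployment configuration is saved as openllama-deploy.yaml (again a placeholder filename):

```
mcli deploy -f openllama-deploy.yaml
```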
The model is automatically deployed and made available at an endpoint.
Each deployed model is given a unique name by MosaicML. You need that to send requests to the model. Run the utility to list all the deployments:
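With the same mcli utility:

```
mcli get deployments
```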
You'll see your deployments listed:
You can submit prompts to the deployed model from your applications. The deployed model is identified by its MosaicML name. Use the following code to submit user prompts and get completions from the LLM:
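A sketch using the mcli Python SDK is shown below; we're assuming the get_inference_deployment and predict helpers available in recent mcli versions, and the deployment name and request payload are placeholders you should adapt to your deployment and model handler.

```python
# A sketch of querying the deployed model through the mcli Python SDK. The deployment
# name and the request payload format are placeholders; adjust them to match your
# deployment and model handler.
from mcli import get_inference_deployment, predict

deployment = get_inference_deployment("openllama-7b-v2-finetune-deploy")  # your deployment's name
response = predict(deployment, {"inputs": ["What documents do I need to open a savings account?"]})
print(response)
```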
In this article, we saw that OpenLLaMA's out-of-the-box results are far from perfect, but they can be improved with strategies like fine-tuning and prompt optimization. We specialize in integrating LLMs into your business workflows so that your employees and customers can ask natural language questions and get informative answers and insights. Contact us for a free consultation.