Contract review and understanding has become a focal point in the legal AI space for a number of reasons. With the rise of LLMs and domain-focused fine-tuning, the assumption that billable hours must be racked up just to understand these documents has put the space at an ROI inflection point. Many companies are turning to LLMs to quickly build understanding and extract data from these documents, freeing up time for other, higher-touch legal tasks.
In this article, we take a deep dive into our techniques for implementing contract understanding systems using LLMs and other popular methods. We cover the important summarization approaches: extractive, abstractive, blended, and aspect-based. We also explain how we address common client concerns, like confidentiality and data security, when applying LLMs to contracts.
Extractive summarization focuses on selecting key sentences from contracts without altering the original contract text. It strives to retain all the legal language, jargon, stylized phrasing, punctuation, and even the typos and grammatical mistakes of contracts exactly as written.
Such verbatim reproduction of the original text is preferred by legal professionals like lawyers, paralegals, and law interns.
With the growth of AI in legal use cases, a new group of users has also become very interested in extractive summarization: users of “chat with document” and “document Q&A” systems rely on these verbatim outputs frequently.
In contrast to verbatim reproduction, abstractive summarization tries to paraphrase and simplify the sentences in contracts to make them accessible and comprehensible to other professional stakeholders like accountants, procurement specialists, and business executives.
These are diverse roles with very different interests, perspectives, and areas of focus. For example, a summary of a contract generated to assist an accountant will be very different from one for a procurement specialist. So, abstractive summarization techniques must always include additional knowledge and data relevant to the target stakeholders and industries, as we shall demonstrate later.
For our demonstrations, we use publicly available contracts, such as those from the Contract Understanding Atticus Dataset (CUAD).
In this section, we explain and demonstrate our extractive summarization tips and tricks. As explained earlier, such summaries are preferred by legal professionals and associated roles who place a lot of importance on the exact wording of a contract.
An end user like a law intern or a paralegal who's been asked to summarize the key clauses of a contract will probably try ChatGPT or similar tools first since they're accessible, user-friendly, and often free for everyone including small and medium-sized law firms.
So we'll first try them out and show what happens.
Here’s a two-page contract from the CUAD dataset:
We prompt ChatGPT as follows:
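The wording below is illustrative rather than the precise prompt we used; the part that matters is the explicit instruction to keep sentences verbatim:

```
Extract the key salient sentences from the contract below. Reproduce every
selected sentence verbatim; do not paraphrase, shorten, or correct anything.
Present the sentences as a numbered list grouped by section.

<contract text>
```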
ChatGPT handles this short contract without complaint and generates a well-formatted extractive summary with all the selected sentences kept verbatim.
But the big question here is: Are these sentences, selected by the LLM, the most "key" or "salient" from the point of view of the end user? We'll revisit this question later.
While ChatGPT is the preferred tool for end users, OpenAI provides another web application called the OpenAI Playground. Although it's for developers to prototype LLM experiments, any tech-savvy end user can sign up and use it.
Its advantages include access to the full 128K-token context window, the option to switch the underlying LLM, and control knobs like the maximum number of output tokens and the temperature (which decides whether the LLM will be precise or creative).
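These same knobs are exposed programmatically. Here's a minimal sketch using the OpenAI Python SDK (the model name and values are illustrative):

```python
# The Playground's control knobs map directly to API parameters.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="gpt-4o",      # switch the underlying LLM here
    max_tokens=4096,     # cap on the length of the generated summary
    temperature=0.0,     # 0 = precise and deterministic; higher = more creative
    messages=[{"role": "user", "content": "Extract the key salient sentences ..."}],
)
print(response.choices[0].message.content)
```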
Here's what happens when we give it a long, 106-page contract with the same prompt as above:
The assistant extracts this summary:
The selected sentences are kept verbatim, which is a good thing.
But we can also observe these shortcomings:
- The summary only covers the early sections of the 106-page contract; later sections are barely represented.
- The output stops once the maximum output token limit is reached, so the summary is incomplete.
- There's still no guarantee that the selected sentences are the ones a particular end user would consider salient.
The above experiments to generate baseline summaries with accessible tools reveal the following problems:
- Input length: long contracts strain or exceed the context limits of these tools.
- Output length: output token limits cap how comprehensive a single-pass summary can be.
- Salience: the LLM decides what's "key" using its own generic notion of importance, not the end user's.
For the length problems — both input and output — some kind of divide-and-conquer strategy is required. For personalizing the abstract idea of salience to each end user, additional knowledge about their perceptions must be integrated into the summarization workflows.
These are the solutions that we'll explain next.
The obvious approach is to break up long contracts into smaller chunks, summarize each chunk, and combine all the chunk summaries into a final summary as shown below.
Since the summary is extractive, the combiner need not do anything special to maintain continuity and coherence while joining chunk summaries. Instead, it simply reproduces the level of coherence already present in the original contract; if it's strong there, it'll be strong in the final summary as well.
The key decision here is how to break up the contract into smaller chunks. Luckily, most contracts are already structured as distinct legal sections (like the 11 articles in the long contract above). We just turn each top-level logical section into a chunk. If a chunk is still too long, we divide it again along its subsections or into groups of sentences. The extractive nature of the task means we don't have to worry much about where these splits fall, as long as they're not arbitrary.
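Here's a simplified sketch of the divide-and-conquer loop. The section splitter is deliberately naive, and `llm` stands in for whatever function calls your model:

```python
import re

def split_into_sections(contract_text: str) -> list[str]:
    # Naive splitter that breaks on "ARTICLE N" headings; real contracts
    # may need a more careful, structure-aware splitter.
    parts = re.split(r"(?=^ARTICLE\s+\d+)", contract_text, flags=re.MULTILINE)
    return [p.strip() for p in parts if p.strip()]

def extractive_summary(contract_text: str, llm) -> str:
    # `llm` is any callable that takes a prompt string and returns the completion.
    chunk_summaries = []
    for section in split_into_sections(contract_text):
        prompt = (
            "Extract the key salient sentences from this contract section, "
            "keeping each selected sentence verbatim:\n\n" + section
        )
        chunk_summaries.append(llm(prompt))
    # Chunk summaries can simply be concatenated: the extractive summary
    # inherits whatever coherence the original contract already has.
    return "\n\n".join(chunk_summaries)
```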
The chunking and recombining approach solves both the input and summary length restrictions of LLMs, enabling you to summarize even contracts that run into thousands of pages.
With chunking implemented, the extractive summary covers the entire contract, is more comprehensive, and isn't restricted by LLM output token limits.
Each chunk is separately summarized with a prompt like this followed by the chunk text:
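The per-chunk prompt doesn't need to be elaborate; something along these lines works (the wording is illustrative):

```
You will receive one section of a longer contract. Extract the key salient
sentences from this section only. Reproduce every selected sentence verbatim;
do not paraphrase, shorten, or correct anything.

<chunk text>
```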
A part of the combined extractive summary generated for the long contract is shown below:
We have been using the phrase "key salient sentences" to guide the LLM. However, nowhere have we explicitly told the LLM how to judge the salience or importance of a sentence. We have implicitly assumed that all end users have a shared perception of what's salient.
But common sense says that's not likely at all. As we said earlier, what an IP lawyer considers salient will differ from what a civil court judge's intern perceives as salient.
How do we implement these different perceptions into the summarization workflow? In the LLM ecosystem, this process is called LLM alignment fine-tuning.
First, we need to collect a dataset of contract chunks and the salient sentences a particular end user selects from them. We gather this data through a guided training workflow.
The guided training workflow creates a new training dataset in which the end user chooses their salient sentences and also explains why they chose them.
The explanations enable the end user to explicitly convey to the LLM their reasoning behind choosing those sentences as salient. These explanations are appended to the expected outputs of the training set like this:
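For illustration, a single training example might look like the record below. The field names and wording are ours, not a required schema:

```python
# Hypothetical preference record for saliency fine-tuning. The [EXPLANATION]
# tag is appended to the expected output so the model learns the reasoning too.
record = {
    "prompt": "Extract the key salient sentences from this contract section:\n"
              "<section text>",
    "chosen": "1. This Agreement shall automatically renew for successive "
              "one-year terms unless either party gives 60 days notice.\n"
              "[EXPLANATION] Auto-renewal silently extends obligations, so I "
              "always flag it as salient.",
    "rejected": "1. This Agreement is made and entered into by the parties.",
}
```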
Next, the LLM is fine-tuned using parameter-efficient fine-tuning (PEFT) and direct preference optimization (DPO) techniques for rapid training. When the LLM is fine-tuned on a dataset like the one shown above, it learns to not only select salient sentences like a particular end user but also to implicitly reason about it in the same way as them.
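A minimal sketch of such a run using the Hugging Face TRL and PEFT libraries follows; the model choice, hyperparameters, and file names are illustrative:

```python
# PEFT (LoRA) + DPO fine-tuning sketch with Hugging Face TRL.
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

model_name = "meta-llama/Meta-Llama-3-8B-Instruct"
model = AutoModelForCausalLM.from_pretrained(model_name)
tokenizer = AutoTokenizer.from_pretrained(model_name)

# Records with "prompt", "chosen", and "rejected" fields, as shown above.
train_dataset = load_dataset("json", data_files="saliency_prefs.jsonl", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="saliency-dpo", beta=0.1),
    train_dataset=train_dataset,
    processing_class=tokenizer,  # `tokenizer=` in older TRL versions
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),
)
trainer.train()
```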
Since the explanations are clearly marked with [EXPLANATION] tags in the LLM's replies, our system is able to easily segregate the summary results from their reasoning. This reasoning is shown to the end user until they are happy with the fine-tuned model's results.
Once the fine-tuned LLM goes live, the explanations are silently discarded and only the summaries are shown to end users.
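Separating the two is simple string processing; a sketch:

```python
import re

def split_reply(reply: str) -> tuple[str, list[str]]:
    """Split an LLM reply into the summary text and its [EXPLANATION] lines."""
    explanations = [m.strip() for m in re.findall(r"\[EXPLANATION\]\s*(.+)", reply)]
    summary = re.sub(r"\[EXPLANATION\].*", "", reply).strip()
    return summary, explanations

reply = ("1. The Term shall commence on the Effective Date.\n"
         "[EXPLANATION] The term start date drives all downstream deadlines.")
summary, reasons = split_reply(reply)  # show `reasons` during evaluation only
```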
Another improvement we make for our LLMs to generate effective contract summaries is to fine-tune them on clause libraries. Every legal specialization expects some standard clauses with standard phrasing to be present in their respective contracts. This enables our summarizer to do two important checks at once:
- confirm that the standard clauses expected for that specialization are actually present in the contract
- flag clauses whose phrasing deviates from the standard wording in the library
Our summarization systems use ReAct reasoning and clause library matching agents as follows:
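The agent wiring varies by deployment, but the core clause-matching tool looks something like this sketch. The library entries, embedding model, and threshold are illustrative:

```python
# Clause-matching tool that a ReAct agent can call for each contract clause.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")
clause_library = [
    "Either party may terminate this Agreement upon thirty (30) days written notice.",
    "Each party shall keep the other party's Confidential Information strictly confidential.",
]
library_embeddings = model.encode(clause_library, convert_to_tensor=True)

def match_clause(contract_clause: str, threshold: float = 0.75):
    """Return the closest standard clause and its score, or None if no match."""
    query = model.encode(contract_clause, convert_to_tensor=True)
    scores = util.cos_sim(query, library_embeddings)[0]
    best = int(scores.argmax())
    if float(scores[best]) < threshold:
        return None  # clause has no standard counterpart; flag it for review
    return clause_library[best], float(scores[best])
```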
An optional functionality is the inclusion of relevant case law for clauses in the contract.
Our systems integrate with case law lookup services like LexisNexis and the Caselaw Access Project.
A case law search agent is configured to be invoked by the LLM. For each clause in a contract, the LLM invokes the agent with the clause. The agent then searches for that clause in the case law search engine to find matches relevant to the contract's industry, jurisdiction, type of contract, and similar criteria.
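Exposed as a function-calling tool, the agent's interface might look like the sketch below. The function name and parameters are hypothetical, not LexisNexis's or the Caselaw Access Project's actual APIs:

```python
# Illustrative tool schema for letting the LLM invoke the case law search agent.
case_law_tool = {
    "type": "function",
    "function": {
        "name": "search_case_law",
        "description": "Find case law relevant to a contract clause.",
        "parameters": {
            "type": "object",
            "properties": {
                "clause_text": {"type": "string"},
                "jurisdiction": {"type": "string"},
                "contract_type": {"type": "string"},
            },
            "required": ["clause_text"],
        },
    },
}
```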
This integration enables us to generate highly informative summaries for the legal professionals working with our clients.
In this section, we cover abstractive summarization of contracts. That means paraphrasing and simplifying the wording of a contract for non-legal stakeholders. The problems and shortcomings we talked about under extractive summarization are also relevant to abstractive summarization. We implement the same techniques to overcome those issues here as well. However, abstractive summarization also has some unique issues of its own.
For extractive summarization, we can blindly combine the chunk summaries without doing anything special to maintain continuity and coherence because those aspects are already present in the contract and the summary is just a verbatim subset.
In contrast, abstractive summaries paraphrase the text of the chunks. So continuity and coherence between consecutive chunks are not guaranteed. We must take special measures to ensure them.
The technique we use is a variant of topic-infused chunking, similar to the logic shown below.
It works like this:
1. Split the contract into logical chunks, just as for extractive summarization.
2. When summarizing a chunk, also ask the LLM to list the main topics that chunk covers.
3. Infuse the accumulated topics from earlier chunks into the prompt for the next chunk so that its summary reads as a continuation.
4. Combine the chunk summaries into the final abstractive summary, as shown in the sketch after this list.
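In code, the loop might look like this sketch, where `llm` again stands in for your model call and the [TOPICS] tag is our own convention:

```python
# Topic-infused chunk summarization: each chunk's prompt carries the topics
# already covered so that consecutive summaries stay continuous and coherent.
def abstractive_summary(chunks: list[str], llm) -> str:
    summaries, topics_so_far = [], []
    for chunk in chunks:
        prompt = (
            "Topics covered so far: " + ("; ".join(topics_so_far) or "none") + "\n\n"
            "Summarize the following contract section in plain language so it "
            "reads as a continuation of those topics. After the summary, list "
            "this section's main topics on one line after a [TOPICS] tag.\n\n"
            + chunk
        )
        reply = llm(prompt)
        summary, _, topics = reply.partition("[TOPICS]")
        summaries.append(summary.strip())
        topics_so_far.extend(t.strip() for t in topics.split(",") if t.strip())
    return "\n\n".join(summaries)
```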
For abstractive summaries, we must guide the LLM on which aspects to focus on and what information to discard by combining various techniques.
One technique is suitably guiding the LLM via the prompt. Our systems provide default prompts but also allow end users to supply custom prompts to guide the LLM better.
For example, an accountant may only be interested in the payment terms and financial implications of a service contract. Once we have split the contract into logical chunks, we apply a custom prompt like this to each chunk:
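The wording is illustrative, but the custom prompt reads something like this:

```
Summarize the following contract section in plain language for an accountant.
Keep only payment terms, fees, invoicing schedules, penalties, and other
financial implications. Omit everything else.

<chunk text>
```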
These chunk summaries are then combined and optionally summarized once more. We get a simplified contract summary that only contains information of interest to an accountant as shown below:
If many end users of a role desire these summaries to contain particular details, it's more efficient to create a shared LLM that's fine-tuned for their requirements rather than ask each user to craft detailed prompts.
For such fine-tuning, we follow the same approach as saliency personalization explained earlier. Our systems guide these end users to create training sets consisting of contract fragments, their desired summary for each fragment, and any reasoning involved.
These fine-tuned LLMs learn to summarize and reason just like the end users who trained them.
Based on our clients' use of our summarization systems and their feedback, we have observed that many end users' summarization needs are not strictly extractive or abstractive. We have seen lawyer clients ask for simplified summaries and procurement clients ask the LLM to include named entities and other specific information in the summaries.
So we often deploy a technique that blends extractive and abstractive summarization for users who need both. The generated summary mixes paraphrased text with verbatim extracts of key information from the input document, and the amount of verbatim detail varies with how much exact information the user requests.
The technique uses our custom prompt optimization framework that focuses on dynamically building the prompt at runtime based on the input. It enables us to dynamically put relevant information in front of our input to guide the LLM toward blended summarization.
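The framework itself is proprietary, but the core idea can be sketched in a few lines; names here are illustrative:

```python
# Runtime prompt assembly for blended summarization: paraphrase by default,
# but quote verbatim wherever the user has asked for exact information.
def build_blended_prompt(chunk: str, verbatim_targets: list[str]) -> str:
    prompt = "Summarize the following contract section in plain language."
    if verbatim_targets:
        prompt += (
            " Quote verbatim, exactly as written, any sentence that mentions: "
            + ", ".join(verbatim_targets) + "."
        )
    return prompt + "\n\n" + chunk

print(build_blended_prompt(
    "Fees of $10,000 are due within thirty (30) days of invoice...",
    ["payment amounts", "party names", "deadlines"],
))
```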
Aspect-based summarization is a more advanced personalization technique for summaries. Let's understand what it is.
So far, we have assumed that an end user is only interested in one type of summary. While that may be true for specialized roles like accountants, it's not true for business executives with wider responsibilities. For example, the same management user may put on different hats and prompt our system with different key terms to create three different summaries, each focusing on one of the following aspects:
- expenses and financial implications
- procurement requirements
- regulatory concerns
The user's requirements may not even be this broad but far more narrow. For example, they may ask for a summary of all contractual terms that must be fulfilled by next week.
Each of these (expenses, procurement requirements, regulatory concerns, and time-limited expectations) constitutes a different aspect of the contract. It's inefficient to create a new fine-tuned LLM for every such aspect that some end user is interested in, often only temporarily.
We tackle such dynamic use cases by implementing aspect-based summarization capabilities. Based on the aspects the end user is interested in, we synthesize a dynamic prompt that guides the LLM toward the expected summary. We use a large library of predefined aspect prompts to generate these prompts accurately and dynamically.
The prompt provides very detailed instructions to the LLM on which keywords and semantic concepts to focus on, which ones to ignore, and how to present the aspect-relevant information in the contract coherently.
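A heavily simplified sketch of such an aspect prompt library (ours is much larger and more detailed):

```python
# Predefined aspect prompts, composed into one dynamic instruction at runtime.
ASPECT_PROMPTS = {
    "expenses": "Focus on amounts payable, fees, penalties, and payment schedules.",
    "regulatory": "Focus on compliance obligations, governing law, and required filings.",
    "deadlines": "Focus on dates, notice periods, and time-bound obligations.",
}

def build_aspect_prompt(aspects: list[str], contract_text: str) -> str:
    instructions = " ".join(ASPECT_PROMPTS[a] for a in aspects)
    return (
        "Summarize the contract below. " + instructions +
        " Ignore all information unrelated to these aspects.\n\n" + contract_text
    )
```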
Many businesses remain worried about using third-party LLM services like OpenAI's. Are their confidential contracts being used for AI training, and could they inadvertently end up in the hands of competitors? Such concerns worry many professionals, and that's likely to remain the case for a few more years until LLMs become ubiquitous. Though LLM services are third-party services just like Outlook Mail or Google Docs, they have not yet earned the same trust from law firms and other businesses.
What's a good solution to these concerns? For clients that want stronger confidentiality and data security for their contracts, we deploy open-source LLMs like Llama 3 and Mistral on-premise. Since the models are self-hosted, the contracts they receive and the summaries they generate stay under the client's control instead of passing through OpenAI or another cloud service. All the summarization techniques described above work with these open-source LLMs too.
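Conveniently, self-hosted inference servers like vLLM and Ollama expose OpenAI-compatible APIs, so client code like the earlier examples works unchanged; only the endpoint and model name (placeholders below) differ:

```python
# Point an OpenAI-compatible client at your own on-premise server.
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")
response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    messages=[{"role": "user", "content": "Extract the key salient sentences ..."}],
)
print(response.choices[0].message.content)
```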
In this article, we explained the approaches, tips, and tricks we use to accurately and comprehensively summarize contracts to help different stakeholders.
Contact us if you want to deploy such customized contract understanding and other legal document processing systems for your business, based on state-of-the-art natural language processing and LLM techniques.