Hands-On Expert-Level Contract Summarization Using LLMs

Matt Payne
·
October 31, 2024

Contract review and understanding has become a focal point in the legal AI space for a number of reasons. With the rise of LLMs and domain-focused fine-tuning, the assumption that billable hours must be racked up just to understand these documents has put the space at an inflection point of ROI. Many companies are turning to LLMs for fast understanding and data extraction from these documents so that time can instead be spent on other, higher-touch legal tasks.

In this article, we take a deep dive into our techniques for implementing contract understanding systems using LLMs and other popular methods. We cover all the important summarization approaches like extractive, abstractive, blended, and aspect-based. We also explain how we address common concerns we hear from our clients, like confidentiality and data security, when applying LLMs to contracts.

Who Uses Contract Summarization?

Extractive summarization focuses on selecting key sentences from contracts without altering the original contract text. It strives to retain all the legal language, legal jargon, stylized phrasing, punctuation, and even the typos and grammatical mistakes of contracts exactly as written.

Such verbatim reproduction of the original text is preferred by legal professionals like:

  • Corporate lawyers
  • Judges
  • Paralegals
  • Arbitrators
  • Legal academics
  • Legal advisors and analysts
  • Patent attorneys and agents
  • Trust officers
  • Bankruptcy administrators

With the growth of AI in legal use cases, a new group of users is very interested in extractive summarization: users of “chat with document” or “document Q&A” systems rely on these verbatim outputs frequently.

In contrast to verbatim reproduction, abstractive summarization tries to paraphrase and simplify the sentences in contracts to make them accessible and comprehensible to other professional stakeholders like:

  • Executives who must understand the business implications of contracts
  • Procurement and supply chain specialists tasked with understanding the operational expectations and consequences of contracts
  • Accounting professionals who must analyze the financial consequences of contracts
  • Risk and compliance officers tasked with identifying possible long-term risks in contracts
  • Investors and asset managers interested in the financial outcomes of contracts and agreements
  • Claims adjusters in the insurance industry who must analyze customers' insurance policies

These are diverse roles with very different interests, perspectives, and areas of focus. For example, a summary of a contract generated to assist an accountant will be very different from one for a procurement specialist. So, abstractive summarization techniques must always include additional knowledge and data relevant to the target stakeholders and industries, as we shall demonstrate later.

Datasets Used in This Case Study

For our demonstrations, we use contracts from the following sources:

Extractive Contract Summarization

In this section, we explain and demonstrate our extractive summarization tips and tricks. As explained earlier, such summaries are preferred by legal professionals and associated roles who give a lot of importance to the exact wording of a contract.

Baseline Summaries

An end user like a law intern or a paralegal who's been asked to summarize the key clauses of a contract will probably try ChatGPT or similar tools first since they're accessible, user-friendly, and often free for everyone including small and medium-sized law firms.

So we'll first try them out and show what happens.

Let's Start With ChatGPT as the Baseline

Here’s a two-page contract from the CUAD dataset:

We prompt ChatGPT as follows:

For this short contract, ChatGPT has no trouble and generates a well-formatted extractive summary with all the selected sentences kept verbatim.

But the big question here is: Are these sentences, selected by the LLM, the most "key" or "salient" from the point of view of the end user? We'll revisit this question later.

A Baseline Summary From the OpenAI Playground

While ChatGPT is the preferred tool for end users, OpenAI provides another web application called the OpenAI Playground. Although it's for developers to prototype LLM experiments, any tech-savvy end user can sign up and use it.

Its advantages include letting you use the full 128K-token context, switch the LLM, and adjust control knobs like the maximum number of output tokens and the temperature (which determines whether the LLM is precise or creative).

Here's what happens when we give it a long, 106-page contract with the same prompt as above:

The assistant extracts this summary:

The selected sentences are kept verbatim, which is a good thing.

But we can also observe a major shortcoming:

  • It covers only three sections; the other eight sections of the contract aren't included. This is probably due to the rather small output limits of 4K-8K tokens for OpenAI as well as most other LLMs.

What Can We Conclude From These Baseline Summaries?

The above experiments to generate baseline summaries with accessible tools reveal the following problems:

  • Many LLMs can't handle long inputs: Even paid services like ChatGPT have problems with long inputs. There's a size threshold above which these tools are useless.
  • Many LLMs can't generate lengthy summaries either: Even if they can handle long inputs, replies are limited to 4K-8K tokens for most LLMs, even commercial ones. For a long contract, a comprehensive summary too must be proportionately long. We can't restrict it based on the technical limitations of LLMs.
  • The quality of the summaries is difficult to judge: Are the selected sentences the most salient for every end user role? Would an intellectual property (IP) lawyer agree with a law intern assisting a judge on which sentences are salient, given their different areas of focus?

For the length problems — both input and output — some kind of divide-and-conquer strategy is required. For personalizing the abstract idea of salience to each end user, additional knowledge about their perceptions must be integrated into the summarization workflows.

These are the solutions that we'll explain next.

Handling Long Contracts

The obvious approach is to break up long contracts into smaller chunks, summarize each chunk, and combine all the chunk summaries into a final summary as shown below.

Since the summary is extractive, the combiner need not do anything special to maintain continuity and coherence while joining chunk summaries. Instead, it simply reproduces the level of coherence already present in the original contract; if it's strong there, it'll be strong in the final summary as well.
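As a rough sketch of this flow: summarize each chunk with an extractive prompt, then join the verbatim extracts. This assumes the OpenAI Python client and a `chunk_contract` helper like the one sketched in the next section; the prompt wording and model name are illustrative rather than our production values.

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

EXTRACTIVE_PROMPT = (
    "You are assisting a legal professional. From the contract section below, "
    "select the key salient sentences and reproduce them verbatim. Do not paraphrase."
)

def summarize_chunk(chunk_text: str) -> str:
    """Extractively summarize a single contract chunk."""
    response = client.chat.completions.create(
        model="gpt-4o",   # illustrative model choice
        temperature=0,    # keep the model precise rather than creative
        messages=[
            {"role": "system", "content": EXTRACTIVE_PROMPT},
            {"role": "user", "content": chunk_text},
        ],
    )
    return response.choices[0].message.content

def summarize_contract(contract_text: str) -> str:
    """Chunk the contract, summarize each chunk, and concatenate the extracts."""
    chunks = chunk_contract(contract_text)  # see the chunking sketch below
    return "\n\n".join(summarize_chunk(c) for c in chunks)
```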

Chunking for Extractive Summarization

The key decision here is how to break up the contract into smaller chunks. Luckily, most contracts are already structured as distinct legal sections (like the 11 articles in the long contract above). We simply turn each top-level logical section into a chunk. If a chunk is still too long, we divide it again along its subsections or into groups of sentences. Because the task is extractive, we don't need to agonize over exactly where these splits fall, as long as they follow the document's structure rather than being arbitrary.
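A minimal sketch of that chunking step, assuming contracts whose top-level sections begin with headings like "ARTICLE I"; the heading pattern and the size limit are assumptions you would adapt per document set.

```python
import re

MAX_CHUNK_CHARS = 12_000  # rough stand-in for the per-chunk input budget

def chunk_contract(text: str) -> list[str]:
    """Split a contract into top-level sections, subdividing any that are too long."""
    # Assumed heading pattern; real contracts also use numbered sections, schedules, etc.
    sections = re.split(r"\n(?=ARTICLE\s+[IVXLC0-9]+)", text)
    chunks = []
    for section in sections:
        if len(section) <= MAX_CHUNK_CHARS:
            chunks.append(section)
            continue
        # Fall back to grouping paragraphs when a section exceeds the budget
        current = ""
        for paragraph in section.split("\n\n"):
            if current and len(current) + len(paragraph) > MAX_CHUNK_CHARS:
                chunks.append(current)
                current = ""
            current += paragraph + "\n\n"
        if current:
            chunks.append(current)
    return [c.strip() for c in chunks if c.strip()]
```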

The chunking and recombining approach solves both the input and summary length restrictions of LLMs, enabling you to summarize even contracts that run into thousands of pages.

Improved Extractive Summary After Chunking

With chunking implemented, the extractive summary covers the entire contract, is more comprehensive, and isn't restricted by LLM output token limits.

Each chunk is separately summarized with a prompt like this followed by the chunk text:

A part of the combined extractive summary generated for the long contract is shown below:

Personalizing Saliency Through Fine-Tuning

We have been using the phrase "key salient sentences" to guide the LLM. However, nowhere have we explicitly told the LLM how to judge the salience or importance of a sentence. We have implicitly assumed that all end users have a shared perception of what's salient.

But common sense says that's not likely at all. As we said earlier, what an IP lawyer considers as salient will be different from what a civil court judge's intern perceives.

How do we build these different perceptions into the summarization workflow? In the LLM ecosystem, this process is called LLM alignment fine-tuning.

First, we need to collect a dataset of contract chunks and selected sentences. We do this using these techniques:

  • We interview and understand the roles of the end users who'll be using our summarization system.
  • We maintain a library of training data for common roles like IP lawyers, patent attorneys, and so on. If an end user's role matches one that's already in our library, we just use that training data to fine-tune a separate summarization model specifically for that role.
  • If the role isn't already present in our library, our system provides an LLM training workflow with a user-friendly graphical user interface to let the user train a new fine-tuned LLM specifically for that role.

Our system is depicted below:

The guided training workflow is meant to create a new training dataset where the end user chooses their salient sentences and also explains why they chose them:

The explanations enable the end user to explicitly convey to the LLM their reasoning behind choosing those sentences as salient. These explanations are appended to the expected outputs of the training set like this:
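For illustration only, a record in such a training set might look like the following; the field names, the sample clause, and the wording shown here are our shorthand for the idea, not a fixed schema.

```python
training_example = {
    "prompt": (
        "Select the key salient sentences, verbatim, from this contract section:\n"
        "<contract section text>"
    ),
    "expected_output": (
        '"Licensee shall not assign this Agreement without the prior written '
        'consent of Licensor."\n'
        "[EXPLANATION] As an IP lawyer I always flag assignment restrictions, "
        "because they determine who can end up holding the licensed rights."
    ),
}
```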

Next, the LLM is fine-tuned using parameter-efficient fine-tuning (PEFT) and direct preference optimization (DPO) techniques for rapid training. When the LLM is fine-tuned on a dataset like the one shown above, it learns to not only select salient sentences like a particular end user but also to implicitly reason about it in the same way as them.
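A compressed sketch of that step using Hugging Face's peft and trl libraries; exact argument names vary across trl versions, and the base model, hyperparameters, and dataset path here are assumptions rather than our production setup.

```python
from datasets import load_dataset
from peft import LoraConfig
from transformers import AutoModelForCausalLM, AutoTokenizer
from trl import DPOConfig, DPOTrainer

base_model = "meta-llama/Meta-Llama-3-8B-Instruct"  # illustrative base model
model = AutoModelForCausalLM.from_pretrained(base_model)
tokenizer = AutoTokenizer.from_pretrained(base_model)

# Preference dataset with "prompt", "chosen" (the end user's selections plus
# [EXPLANATION] reasoning), and "rejected" (off-target selections) columns.
dataset = load_dataset("json", data_files="saliency_preferences.jsonl", split="train")

trainer = DPOTrainer(
    model=model,
    args=DPOConfig(output_dir="saliency-dpo", beta=0.1, num_train_epochs=1),
    train_dataset=dataset,
    processing_class=tokenizer,  # older trl releases take `tokenizer=` instead
    peft_config=LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"),  # PEFT via LoRA
)
trainer.train()
```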

Since the explanations are clearly marked with [EXPLANATION] tags in the LLM's replies, our system is able to easily segregate the summary results from their reasoning. This reasoning is shown to the end user until they are happy with the fine-tuned model's results.

Once the fine-tuned LLM goes live, the explanations are silently discarded and only the summaries are shown to end users.
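Since the explanation always follows the [EXPLANATION] marker, separating the two parts is a simple string split; a minimal sketch under that assumption:

```python
def split_summary_and_reasoning(llm_output: str) -> tuple[str, str]:
    """Separate the extractive summary from the model's [EXPLANATION] reasoning."""
    summary, _, explanation = llm_output.partition("[EXPLANATION]")
    return summary.strip(), explanation.strip()

# During review, show both parts to the end user; once the model goes live,
# keep only the summary and drop the reasoning.
```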

Integrating Clause Libraries

Another improvement that helps our LLMs generate effective contract summaries is integrating them with clause libraries. Every legal specialization expects some standard clauses with standard phrasing to be present in its contracts. This enables our summarizer to do two important checks at once:

  • Check if a clause in the contract matches all the important terms in a standard clause from the library
  • Check if important clauses and key points are missing from a section

Our summarization systems use ReAct reasoning and clause library matching agents as follows:

  1. The agent essentially implements a typical retrieval-augmented generation (RAG) system. It calculates embeddings for all the entries in a clause library and stores them in a vector database.
  2. Whenever the LLM runs into a contract clause, it's configured to invoke the clause library matching agent.
  3. The agent uses an LLM to classify the given clause into a category. A category corresponds to a legal specialization or area. For example, there are IP clauses, non-compete clauses, indemnity clauses, and so on.
  4. The agent looks up the standard clauses for that category.
  5. The agent calculates the embedding for the contract's clause and then matches it against those of the standard clauses for its category.
  6. For the closest matches, it does string comparisons and LLM-based semantic evaluations to determine how standard the contract clause is.
  7. If the contract clause closely matches a standard clause, the agent tells the LLM to continue. If not, the agent tells the LLM to include a warning for that clause in its summary.
  8. Finally, the LLM invokes the agent one last time to enquire about missing standard clauses. The agent compares all the clauses it received so far against the list of standard clauses, identifies omissions, and sends the omitted clauses to the LLM.
  9. The LLM generates the summary and includes the warnings and omissions it got from the clause library agent.
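The core of that matching loop (roughly steps 3 through 7) can be sketched as follows. For brevity it uses an in-memory cosine search instead of a vector database, and the embedding model, classification prompt, and similarity threshold are assumptions:

```python
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify_clause(clause: str) -> str:
    """Ask an LLM which legal category (IP, non-compete, indemnity, ...) a clause belongs to."""
    resp = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[{
            "role": "user",
            "content": "Classify this contract clause into one category such as 'IP', "
                       "'non-compete', or 'indemnity'. Reply with the category only.\n\n" + clause,
        }],
    )
    return resp.choices[0].message.content.strip()

def check_clause(clause: str, clause_library: dict[str, list[str]]) -> str | None:
    """Return a warning string if the clause deviates from its closest standard clause."""
    category = classify_clause(clause)
    standards = clause_library.get(category, [])
    if not standards:
        return f"No standard clauses on file for category '{category}'."
    clause_vec = embed(clause)
    best = max(cosine(clause_vec, embed(s)) for s in standards)
    if best < 0.85:  # assumed similarity threshold
        return f"Clause deviates from standard '{category}' wording (similarity {best:.2f})."
    return None  # close enough to standard; no warning needed
```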

Integrating Case Law

An optional functionality is the inclusion of relevant case law for clauses in the contract.

Our systems integrate with case law lookup services like LexisNexis and the Caselaw Access Project.

A case law search agent is configured to be invoked by the LLM. For each clause in a contract, the LLM invokes the agent with the clause. The agent then searches for that clause in the case law search engine to find matches relevant to the contract's industry, jurisdiction, type of contract, and similar criteria.
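In OpenAI-style function calling, exposing such an agent to the LLM looks roughly like the tool definition below; the search function itself is a placeholder, since LexisNexis and the Caselaw Access Project each have their own APIs and authentication.

```python
# Tool schema handed to the LLM so it can request a case law lookup for a clause.
case_law_tool = {
    "type": "function",
    "function": {
        "name": "search_case_law",
        "description": "Find case law relevant to a contract clause.",
        "parameters": {
            "type": "object",
            "properties": {
                "clause": {"type": "string"},
                "jurisdiction": {"type": "string"},
                "contract_type": {"type": "string"},
            },
            "required": ["clause"],
        },
    },
}

def search_case_law(clause: str, jurisdiction: str = "", contract_type: str = "") -> list[str]:
    """Placeholder: query the configured case law provider and return matching citations."""
    raise NotImplementedError("Wire this up to your case law provider's search API.")
```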

This integration enables us to generate highly informative summaries for the legal professionals working with our clients.

Abstractive Contract Summarization

In this section, we cover abstractive summarization of contracts. That means paraphrasing and simplifying the wording of a contract for non-legal stakeholders. The problems and shortcomings we talked about under extractive summarization are also relevant to abstractive summarization. We implement the same techniques to overcome those issues here as well. However, abstractive summarization also has some unique issues of its own.

Handling Long Contracts

For extractive summarization, we can blindly combine the chunk summaries without doing anything special to maintain continuity and coherence because those aspects are already present in the contract and the summary is just a verbatim subset.

In contrast, abstractive summaries paraphrase the text of the chunks. So continuity and coherence between consecutive chunks are not guaranteed. We must take special measures to ensure them.

The technique we use is a variant of topic-infused chunking, similar to the logic shown below:

It works like this:

  • We split the contract chunks based on logical sections like before.
  • For each chunk, we take the previous and following chunks. We extract the top key topic from each using classifier prompts for an LLM.
  • A blending prompt is then used to guide the LLM to blend the top key topics into the current chunk.
  • The outcome is a new chunk that has a bit more information about what topics are included just before and after the chunk.
  • This information helps the paraphrased chunk summaries maintain continuity with their neighbors.
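A simplified sketch of the idea follows; here the neighbors' topics are prepended as bracketed context rather than blended by a separate LLM pass, and the prompts are paraphrases of the approach rather than our production prompts.

```python
from openai import OpenAI

client = OpenAI()

def llm(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip()

def top_topic(chunk: str) -> str:
    """Classifier-style prompt: ask for the single main topic of a chunk."""
    return llm("Name the single main topic of this contract section in a short phrase:\n\n" + chunk)

def infuse_topics(chunks: list[str]) -> list[str]:
    """Add each chunk's neighboring topics so its abstractive summary stays coherent."""
    infused = []
    for i, chunk in enumerate(chunks):
        prev_topic = top_topic(chunks[i - 1]) if i > 0 else "the start of the contract"
        next_topic = top_topic(chunks[i + 1]) if i + 1 < len(chunks) else "the end of the contract"
        context = (f"[Context: the previous section covers {prev_topic}; "
                   f"the next section covers {next_topic}.]\n\n")
        infused.append(context + chunk)
    return infused
```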

Personalizing Abstractive Summaries

For abstractive summaries, we must guide the LLM on which aspects to focus on and what information to discard by combining various techniques.

One technique is suitably guiding the LLM via the prompt. Our systems provide default prompts but also allow end users to supply custom prompts to guide the LLM better.

For example, an accountant may only be interested in the payment terms and financial implications of a service contract. Once we have split the contract into logical chunks, we apply a custom prompt like this to each chunk:
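An illustrative prompt in that spirit (the exact wording differs per engagement) might be:

```python
ACCOUNTANT_PROMPT = """You are summarizing a contract section for an accountant.
Rewrite it in plain language, keeping only payment terms, fees, penalties,
invoicing schedules, and other financial obligations. Omit everything else.

Contract section:
{chunk}
"""
```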

These chunk summaries are then combined and optionally summarized once more. We get a simplified contract summary that only contains information of interest to an accountant as shown below:

If many end users of a role desire these summaries to contain particular details, it's more efficient to create a shared LLM that's fine-tuned for their requirements rather than ask each user to craft detailed prompts.

For such fine-tuning, we follow the same approach as saliency personalization explained earlier. Our systems guide these end users to create training sets consisting of contract fragments, their desired summary for each fragment, and any reasoning involved.

These fine-tuned LLMs learn to summarize and reason just like the end users who trained them.

Blended Contract Summarization

Based on our clients' use of our summarization systems and their feedback, we have observed that for many end users, their summarization needs are not strictly extractive or abstractive. We have seen lawyer clients ask for simplified summaries and procurement clients ask the LLM to include named entities and specific information in the summaries.

So we often deploy a technique that blends extractive and abstractive summarization for users who need it. The generated summary mixes paraphrased passages with verbatim extracts based on key information in the input document, and the number of verbatim entities varies with how much exact information is requested.

The technique uses our custom prompt optimization framework that focuses on dynamically building the prompt at runtime based on the input. It enables us to dynamically put relevant information in front of our input to guide the LLM toward blended summarization.
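Our framework itself is proprietary, but the runtime-assembly idea can be illustrated with a toy version: the user (or an upstream detector) supplies the items that must stay verbatim, and the prompt is composed around them.

```python
def build_blended_prompt(chunk: str, exact_fields: list[str]) -> str:
    """Assemble a prompt at runtime that mixes abstractive and extractive instructions.

    `exact_fields` lists what must be reproduced verbatim (party names, dates,
    dollar amounts, etc.); everything else gets paraphrased.
    """
    extractive_part = ""
    if exact_fields:
        extractive_part = ("Quote the following verbatim wherever they appear: "
                           + ", ".join(exact_fields) + ".\n")
    return ("Summarize the contract section below in plain language.\n"
            + extractive_part
            + "Keep the summary faithful to the original terms.\n\n"
            + chunk)
```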

Aspect-Based Contract Summarization

Aspect-based summarization is a more advanced personalization technique for summaries. Let's understand what it is.

So far, we have assumed that an end user is only interested in one type of summary. While that may be true for specialized roles like accountants, it's not true for business executives with wider responsibilities. For example, the same management user may put on different hats and prompt our system with different key terms to create three different summaries, each focusing on one of the following aspects:

  • Long-term financial expenses incurred due to the terms of the agreement
  • Immediate procurement requirements to fulfill the contract terms
  • Regulatory compliance concerns because the other party in the contract is in a high-risk country

The user's requirements may not even be this broad; they may be far narrower. For example, they may ask for a summary of all contractual terms that must be fulfilled by next week.

Each of these — expenses, procurement requirements, regulatory concerns, and time-limited expectations — constitutes a different aspect of the contract. It's inefficient to create a new fine-tuned LLM for every such aspect that some end user is interested in, often only temporarily.

We tackle such dynamic use cases by implementing aspect-based summarization capabilities. Based on the aspects the end user is interested in, we synthesize a dynamic prompt that guides the LLM toward the expected summary. We use a large library of predefined aspect prompts to generate these prompts accurately and dynamically.

The prompt provides very detailed instructions to the LLM on which keywords and semantic concepts to focus on, which ones to ignore, and how to present the aspect-relevant information in the contract coherently.
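A toy version of that aspect prompt library and the dynamic composition step; real entries are far more detailed than these one-liners.

```python
# Stand-in for the aspect prompt library; production entries spell out keywords,
# concepts to ignore, and presentation instructions in much more detail.
ASPECT_PROMPTS = {
    "financial": "Focus on fees, payment schedules, penalties, and long-term cost exposure.",
    "procurement": "Focus on deliverables, delivery dates, quantities, and supplier obligations.",
    "compliance": "Focus on regulatory, sanctions, and jurisdiction-related obligations and risks.",
}

def build_aspect_prompt(aspects: list[str], extra_constraint: str = "") -> str:
    """Compose a summarization prompt from the selected aspects plus any ad hoc constraint."""
    instructions = "\n".join(ASPECT_PROMPTS[a] for a in aspects if a in ASPECT_PROMPTS)
    if extra_constraint:
        instructions += "\nAdditional constraint: " + extra_constraint
    return ("Summarize the contract below, covering only the requested aspects "
            "and ignoring unrelated clauses.\n" + instructions + "\n\nContract:\n")

# Example: contractual obligations that must be fulfilled within the next week.
prompt_prefix = build_aspect_prompt(["procurement"],
                                    "Only include obligations due within the next 7 days.")
```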

Confidentiality, Security, and Privacy Concerns Over LLMs for Legal Tasks

Many businesses remain worried about using third-party LLM services like OpenAI. Are their confidential contracts being used for AI training and inadvertently ending up in the hands of their competitors? These issues worry many professionals, and that's likely to stay true for a few more years until LLMs become ubiquitous. Though LLMs are just like other third-party services such as Outlook Mail or Google Docs, they have not yet gained the same trust from law firms and other businesses.

What's a good solution to these concerns? For clients who want higher confidentiality and data security for their contracts, we deploy on-premise open-source LLMs like Llama 3 and Mistral. Since they're self-hosted, the contracts they receive and the summaries they generate remain entirely under the client's control rather than passing through OpenAI or other cloud services. Additionally, all the summarization techniques described above can be used with these open-source LLMs too.
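One low-friction way to make that switch is to serve the open-source model behind an OpenAI-compatible endpoint and point the existing client at it; the sketch below assumes vLLM as the server, with the port and model name as examples.

```python
# Serve the model locally first, for example:
#   vllm serve meta-llama/Meta-Llama-3-8B-Instruct --port 8000
from openai import OpenAI

client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")  # self-hosted endpoint

chunk_text = "..."  # a contract section produced by the chunking step

response = client.chat.completions.create(
    model="meta-llama/Meta-Llama-3-8B-Instruct",
    temperature=0,
    messages=[{"role": "user",
               "content": "Select the key salient sentences from this contract "
                          "section, verbatim:\n\n" + chunk_text}],
)
print(response.choices[0].message.content)
```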

Proficient Contract Summarization With LLMs

In this article, we explained the approaches, tips, and tricks we use to accurately and comprehensively summarize contracts to help different stakeholders.

Contact us if you want to deploy such customized contract summarization and other legal document understanding systems for your business, based on state-of-the-art natural language processing and LLM techniques.