Modern developments in artificial intelligence (AI), deep learning, and natural language processing (NLP), such as the advent of large language models like GPT-4 and ChatGPT, are extremely useful in the finance industry for increasing revenues, automating processes, and streamlining workflows. In this deep dive, we explore the applications of NLP in finance.
NLP can be very useful for a variety of roles and tasks in the finance sector. A few examples include summarizing long financial documents, extracting information from complex documents like invoices and loan applications, automating document-heavy business workflows, monitoring regulatory compliance, and answering employees' questions through chatbots.
In the following sections, we explain in depth how you can apply NLP in finance to each of these tasks.
In financial services, analysts regularly have to read long-form financial reports running into hundreds of pages to gain insights. For example, each annual report of Silicon Valley Bank, a bank that recently went bust, contains about 200 pages of explanations and tables. Reading them page by page takes a lot of time and effort. Had that information been more accessible and queryable, external analysts might have noticed the increased probability of bankruptcy and sent out warnings.
For such long-form financial documents, you can use modern NLP models to summarize all the important information while retaining key details verbatim, all within a few seconds. This mix of abstractive and extractive summarization is called blended summarization and is very useful for any use case where key details must be kept unchanged.
The biggest limitation of the GPT models is their cap on the number of tokens they can process at once. GPT-3 models were limited to about 4,000 sub-word tokens, while GPT-4 models support 8,000 to 32,000 tokens depending on the variant. Any document that exceeds those limits must be broken into smaller pieces (known as chunks) and sent to GPT separately.
But doing so creates new problems, like loss of context or repeated information, which lead to inaccurate summaries. To solve these issues, we use a custom GPT-based pipeline (shown below) that offers clever solutions like context-aware chunking and prompt optimization.
Let's understand its special components and capabilities.
The chunking algorithm's job is to split long documents with minimal loss of context while retaining their fine-grained, specific details. That means the pipeline should not split apart clauses and paragraphs that supply important context to the text around them. At the same time, how much context is carried over must be carefully balanced, because the longer the input prompt gets, the higher the risk of the model ignoring parts of it and producing poor summaries.
So, given a document, the pipeline decides the optimal size for each chunk and, using custom logic, decides how much context from the previous chunk must be carried over. Next, it builds a chunk-specific custom prompt for GPT that has been found to produce a better-quality summary.
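To make the chunking step concrete, here's a minimal sketch using the tiktoken tokenizer, assuming fixed token budgets and a simple overlap between consecutive chunks. The token limits are illustrative, and the production pipeline uses custom logic to pick split points rather than raw token offsets.

```python
# Minimal token-based chunking sketch with overlap between consecutive chunks.
# Assumes tiktoken is installed; the token budgets are illustrative, not tuned values.
import tiktoken

MAX_CHUNK_TOKENS = 3000  # leave headroom for the prompt examples and the summary
OVERLAP_TOKENS = 200     # context carried over from the previous chunk

def chunk_document(text: str, model: str = "gpt-4") -> list[str]:
    enc = tiktoken.encoding_for_model(model)
    tokens = enc.encode(text)
    chunks, start = [], 0
    while start < len(tokens):
        end = min(start + MAX_CHUNK_TOKENS, len(tokens))
        chunks.append(enc.decode(tokens[start:end]))
        if end == len(tokens):
            break
        # Step back so the next chunk repeats the last OVERLAP_TOKENS tokens of this one.
        start = end - OVERLAP_TOKENS
    return chunks
```

In the real pipeline, the split points additionally respect clause and paragraph boundaries instead of cutting at arbitrary token offsets.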
The GPT models produce better results when we prefix the actual input chunk in the prompt with examples of input text and ideal summaries. This is called few-shot learning. Additionally, if the prompt examples are also made relevant to the input text, the generated summaries are of much better quality.
The task of pulling in the most relevant examples for each input text is called prompt optimization. For that, we maintain a database of gold reference prompt examples and summaries. Given an input text, we use a semantic similarity algorithm to pick the most relevant prompts and summaries and inject them in the prompt dynamically. This ensures that the few-shot examples in the prompt are highly relevant to the input text and summarization task.
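As a rough sketch of this prompt-optimization step, the snippet below selects the most similar gold examples with OpenAI's embeddings API and assembles a few-shot prompt. The example database, texts, and model names are illustrative, not our production data.

```python
# Pick the most relevant few-shot examples for an input chunk and build a prompt.
# GOLD_EXAMPLES stands in for the real database of reference texts and summaries.
import numpy as np
from openai import OpenAI

client = OpenAI()

GOLD_EXAMPLES = [
    {"text": "Net interest income rose 12% year over year to ...", "summary": "Net interest income grew 12% YoY."},
    {"text": "The company issued $500M in senior notes due 2030 ...", "summary": "Issued $500M senior notes due 2030."},
]

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

def build_prompt(chunk: str, k: int = 2) -> str:
    example_vecs = embed([ex["text"] for ex in GOLD_EXAMPLES])
    chunk_vec = embed([chunk])[0]
    # Cosine similarity between the chunk and every gold example.
    sims = example_vecs @ chunk_vec / (
        np.linalg.norm(example_vecs, axis=1) * np.linalg.norm(chunk_vec)
    )
    top = [GOLD_EXAMPLES[i] for i in np.argsort(-sims)[:k]]
    shots = "\n\n".join(f"Text: {ex['text']}\nSummary: {ex['summary']}" for ex in top)
    return f"{shots}\n\nText: {chunk}\nSummary:"
```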
Blended summarization is achieved by pairing each input chunk with the right dynamic prompt and prompt examples, showing the GPT model that we want abstractive summaries for some sections and extractive (verbatim) summaries for others. A database of prompts specific to finance and banking is maintained for this purpose.
An output algorithm then recombines the summaries generated for each chunk so that the final result doesn't read as choppy to the reader but flows as one cohesive summary.
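One simple way to implement this recombination is a final GPT pass over the per-chunk summaries, as sketched below. This assumes the chunk summaries are already available; the prompt wording and model choice are illustrative.

```python
# Stitch per-chunk summaries into one cohesive summary with a final GPT pass.
from openai import OpenAI

client = OpenAI()

def stitch_summaries(chunk_summaries: list[str]) -> str:
    joined = "\n\n".join(chunk_summaries)
    resp = client.chat.completions.create(
        model="gpt-4",
        temperature=0.2,
        messages=[
            {"role": "system",
             "content": "Merge these section summaries into one flowing summary. "
                        "Do not change any figures, dates, or verbatim clauses."},
            {"role": "user", "content": joined},
        ],
    )
    return resp.choices[0].message.content
```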
Finance, accounting, and banking professionals often need information from complex documents to make decisions on investments, loan approvals, trade reports, and other financial operations. Such information includes:
Let's see some practical applications.
Finance professionals may need to analyze Certificate of Incorporation (COI) documents to understand a company's legal structure, capitalization, and key business details for investment decisions, evaluation of initial public offerings, or due diligence prior to a merger or acquisition. Reviewing such documents also helps finance professionals ensure compliance with laws and regulations.
The important information in such documents includes:
Such information can be identified and stored in a database using automated information extraction pipelines that accept digital documents or scans and accurately extract all the essential fields using AI, computer vision, and NLP. We'll explain how this works in a later section.
Another common application is information extraction from invoices and purchase orders so that accounting professionals can either automate or optimize tasks like:
A third application is the automation of your business workflows, such as loan approvals. Such automated pipelines deliver enormous savings on manual labor and costs. They also inject some neutrality into the evaluation process to help prevent fraud and collusion.
Intelligent document processing pipelines enable your business to reliably automate the understanding of complex documents and extraction of information from them instead of using manual labor or unreliable semi-automated, template-based approaches. Regardless of the process you're trying to automate, these pipelines all work the same way with the following five stages.
Raw documents in different formats, such as PDFs or images, are received, stored, and pre-processed to prepare them for the rest of the pipeline.
This is the primary stage where all the information extraction happens, using computer vision and NLP models to recognize the text, classify the documents, and extract the relevant fields.
The extracted information is passed through rule engines that check the fields against business policy rules. Documents that clear all the rules may then be sent to more sophisticated statistical models. For example, a loan application may be sent to a loan default risk model or an interest projection model to estimate its associated business value and risk.
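A toy sketch of such a rule engine is shown below. The field names and thresholds are invented for illustration; a real engine would encode your actual business policies.

```python
# Toy rule engine: run extracted fields through business policy checks.
# All field names and thresholds are illustrative, not real policy values.
from typing import Callable

Rule = tuple[str, Callable[[dict], bool]]

RULES: list[Rule] = [
    ("loan amount within product limit", lambda f: f["loan_amount"] <= 500_000),
    ("applicant income documented", lambda f: f.get("annual_income", 0) > 0),
    ("debt-to-income below threshold", lambda f: f["monthly_debt"] / f["monthly_income"] < 0.43),
]

def check_application(fields: dict) -> list[str]:
    """Return the names of the rules this application violates."""
    return [name for name, rule in RULES if not rule(fields)]

violations = check_application({
    "loan_amount": 250_000,
    "annual_income": 90_000,
    "monthly_income": 7_500,
    "monthly_debt": 2_000,
})
# An empty list means the application can move on to the statistical models.
```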
The extracted information is stored in external systems like databases or ERP.
The extracted information is supplied to, or fetched by, other business workflows and acted upon. For example, based on loan default risk models or other business data analytics, loan applications are prioritized and sent for manual reviews and approvals by loan officers.
Regulatory compliance is one of the most challenging tasks in finance and banking. Non-compliance or mistakes in compliance carry risks like civil penalties, criminal prosecution, and damage to reputation. But compliance isn't simple. The difficulties include:
From a business perspective, compliance is often seen as a cost center with few benefits. So achieving full compliance with minimal cost and effort is a desired goal of all businesses. But its inherent and emergent complexities make it a difficult goal to achieve in practice.
The sheer volume of regulatory documents being published and constantly amended means there is always a niggling uncertainty about becoming inadvertently non-compliant simply because a minor change in some document went unnoticed. So compliance departments have no choice but to read them all.
AI and NLP can help with that and reduce compliance costs by reducing the time and labor expended on document reading and comprehension. In this section, we'll explain how NLP pipelines can automatically deduce whether a regulatory change is relevant to your business and notify compliance officers about the change in real time. Once notified, they can use question-answering chatbots to seek answers to complex questions about the regulations, a use case we cover later on.
The main question — "Does this clause of this regulation apply to my business?" — can be treated as a semantic similarity problem.
On one side is your business with various aspects like:
On the other side are the laws, regulations, and guidelines of all those jurisdictions with a complex set of conditions to decide who is regulated, when, and how.
So the semantic similarity problem is to take those complex regulatory conditions and match them against the aspects of your business. Most of these aspects will be in your ERP. A secondary semantic similarity problem is to detect whether a regulatory clause has changed between the last time it was fetched and now.
One thing GPT models are very good at is intelligently judging semantic similarity without requiring hardcoded rules or thresholds as in traditional NLP. We can simply supply the text of the regulatory clauses along with the business aspect values from your ERP and ask GPT whether anything matches. Every match is a regulation that's potentially relevant to your business. You can then dig deeper with the help of a question-answering chatbot (covered in the next section).
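A minimal sketch of this matching step is below, assuming the OpenAI Python client. The business profile fields and prompt wording are illustrative, not the production pipeline.

```python
# Ask GPT whether a regulatory clause is relevant to a given business profile.
# BUSINESS_PROFILE stands in for the aspect values pulled from your ERP.
from openai import OpenAI

client = OpenAI()

BUSINESS_PROFILE = {
    "industry": "retail banking",
    "jurisdictions": ["US", "UK"],
    "products": ["consumer loans", "savings accounts"],
}

def clause_is_relevant(clause: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        temperature=0,
        messages=[
            {"role": "system",
             "content": "Decide whether the regulatory clause applies to the business "
                        "described. Answer YES or NO, then explain briefly."},
            {"role": "user",
             "content": f"Business profile: {BUSINESS_PROFILE}\n\nClause:\n{clause}"},
        ],
    )
    return resp.choices[0].message.content
```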
Similarly, GPT is used to detect changes between the previously fetched regulatory text and the current text to find deltas. If there's a delta, task frameworks like LangChain can be integrated with the GPT model to notify the relevant compliance officers in the affected jurisdictions.
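As a simpler illustrative stand-in for the GPT-based delta check, a line-level diff can flag changed clauses before any model call or notification goes out. The notify_compliance_officers function below is hypothetical.

```python
# Flag clauses that changed between the last fetched version and the current one.
# difflib is a simple stand-in; the pipeline described above asks GPT to judge deltas.
import difflib

def changed_clauses(previous: str, current: str) -> list[str]:
    diff = difflib.unified_diff(previous.splitlines(), current.splitlines(), lineterm="")
    return [line[1:] for line in diff
            if line.startswith("+") and not line.startswith("+++")]

def notify_compliance_officers(deltas: list[str], jurisdiction: str) -> None:
    """Hypothetical hook: send an email or chat alert per jurisdiction."""
    ...

old_text = "Firms must report suspicious transactions within 30 days."
new_text = "Firms must report suspicious transactions within 15 days."
deltas = changed_clauses(old_text, new_text)
if deltas:
    notify_compliance_officers(deltas, jurisdiction="UK")
```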
In this section, we explain how GPT chatbots are used to help employees understand complex financial documents.
Finance professionals often have to read long, dense documents like compliance regulations, annual reports, financial analyses, and investor reports. Likewise, banking professionals have to wade through documents like investment reports, compliance regulations, and corporate loan applications.
Reading them requires concentration and attention to detail because a lot of important information and answers are scattered over hundreds of pages. These professionals may have specific questions in mind, but the structures of those documents force them to expend a lot of brain cycles and time (and time is money) on reading and comprehension to find the answers.
Chatbots streamline this enormously: they can digest thousands of pages of documents in seconds and provide accurate, natural-language answers to complex questions about those documents.
Chatbots are not new, but users trust them only when they can give accurate and complete answers to complex questions every time. That has always been a sticking point with traditional chatbots, even those, like Google's Dialogflow, that are driven by older AI techniques.
But chatbots backed by the awesome power of large language models like GPT-4 are a different breed altogether. Trained on vast quantities of real-world documents, they achieve unparalleled levels of semantic comprehension, abstraction, and accuracy.
Business benefits of using GPT chatbots include:
The example below shows one of our banking chatbots in action answering complex questions with natural language answers:
Frameworks like Google Dialogflow and Meta's BlenderBot are available for creating AI chatbots. But they have the following drawbacks compared to custom GPT chatbots:
A custom architecture to turn a GPT chatbot into one capable of understanding complex internal business documents and answering complex questions is shown below:
We'll walk you through the high-level steps that go into readying your information desk chatbot.
First, we collect all the documents, like regulations, FAQs, and knowledge base articles, that contain useful information for your employees. This information isn't specific to any one user but applies to everyone, like in this example:
Potential questions and informational facts are extracted from such content using manual annotation or web scraping. They go into a knowledge bank that provides the details and context required for the answers.
Prompts are key to getting the most out of GPT. When we build GPT chatbots, we must frame the relevant details and questions in particular ways for GPT to interpret them correctly. We can provide a few examples of ideal prompts and answers (few-shot learning) to GPT so that it can dynamically figure out what's expected of it based on the patterns in the examples.
We do this by maintaining a database of gold-standard prompts and answers. When a customer query is received, we dynamically select the most relevant examples from that database and prefix them to the customer's query before asking GPT. This helps GPT interpret the query correctly and return high-quality responses.
Placeholder variables are another important aspect of this phase. Instead of hardcoding details and links, we train GPT to output placeholders. These variables are replaced later with country-specific or department-specific information to provide personalized answers. For example, currencies and policy pages may be different in each country. So we ask GPT to generate placeholders for them instead of hardcoding a currency or page link.
In the example below, GPT outputs a placeholder for the link to a pricing page, which will be replaced later in the pipeline:
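As a minimal sketch of how such placeholders can be resolved downstream, the snippet below substitutes locale-specific values. The placeholder names, locale table, and answer text are all illustrative.

```python
# Replace GPT-generated placeholders with country- or department-specific values.
# The placeholder syntax ({{NAME}}) and the locale table are illustrative only.
import re

LOCALE_VALUES = {
    "US": {"CURRENCY": "USD", "PRICING_PAGE_URL": "https://example.com/us/pricing"},
    "UK": {"CURRENCY": "GBP", "PRICING_PAGE_URL": "https://example.com/uk/pricing"},
}

def resolve_placeholders(answer: str, country: str) -> str:
    values = LOCALE_VALUES[country]
    return re.sub(r"\{\{(\w+)\}\}", lambda m: values.get(m.group(1), m.group(0)), answer)

raw = "You can compare plans at {{PRICING_PAGE_URL}}. Prices are shown in {{CURRENCY}}."
print(resolve_placeholders(raw, "UK"))
```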
In addition to few-shot examples and prompt optimization, another step that can potentially improve the quality of answers is fine-tuning the GPT model by supplying a dataset of questions and answers. Fine-tuning essentially creates a custom GPT model, stored on OpenAI's systems, that's available only to your company. It's a good approach if the nature of information, prompt syntax, and answer formats are very different and domain-specific compared to the standard text generated by GPT.
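A minimal sketch of starting such a fine-tuning job with OpenAI's API is below, assuming a chat model that supports fine-tuning (such as gpt-3.5-turbo) and a prepared JSONL file of question-answer conversations. The file name and example content are illustrative.

```python
# Upload a question-answer dataset and start an OpenAI fine-tuning job.
# finance_qa.jsonl is an illustrative file; each line holds one training conversation:
# {"messages": [{"role": "system", "content": "You answer banking policy questions."},
#               {"role": "user", "content": "What is the wire transfer cutoff time?"},
#               {"role": "assistant", "content": "Wire transfers submitted before ..."}]}
from openai import OpenAI

client = OpenAI()

training_file = client.files.create(
    file=open("finance_qa.jsonl", "rb"),
    purpose="fine-tune",
)

job = client.fine_tuning.jobs.create(
    training_file=training_file.id,
    model="gpt-3.5-turbo",
)
print(job.id)  # the resulting custom model is then used in place of the base model
```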
The essential idea here is: given a customer query, look up our database of extracted questions and answers, find the stored question that is most similar to the query, and return its associated answer.
To implement this idea of finding the most similar question, we convert all questions and queries into mathematical representations called embeddings. They are essentially vectors that encode linguistic and contextual information as numbers. Once converted to vectors, we can use techniques like cosine similarity to find the question vector that is most similar to the customer query vector. The answer associated with that question vector is then the most relevant answer to the customer's query.
For converting questions and queries to vectors, we use a model called Sentence-BERT from the sentence-transformers library. It provides excellent results for such similarity tasks, and we can further fine-tune it on our domain-specific question-answer datasets to achieve very high semantic relevance.
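A minimal sketch using the sentence-transformers library is shown below. The model name and example questions are illustrative; a fine-tuned domain model would slot in the same way.

```python
# Embed the stored questions and a customer query, then pick the closest match.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")

questions = [
    "What is the minimum balance for a savings account?",
    "How do I report a lost corporate card?",
    "What documents are needed for a small business loan?",
]
answers = ["The minimum balance is ...", "Call the card desk at ...", "You will need ..."]

question_vecs = model.encode(questions, convert_to_tensor=True)
query_vec = model.encode("Which papers do I submit for an SME loan?", convert_to_tensor=True)

scores = util.cos_sim(query_vec, question_vecs)[0]  # cosine similarity to each stored question
best = int(scores.argmax())
print(questions[best], "->", answers[best])
```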
In the previous step, there are likely to be thousands or even millions of stored questions and answers. So we need a system that can store millions of vectors and calculate similarities quickly. Such systems are called vector databases, and Pinecone and FAISS are popular options.
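A small sketch of the same lookup backed by FAISS is below, using an inner-product index over L2-normalized vectors, which is equivalent to cosine similarity. The dimensions and random vectors are stand-ins for real embeddings.

```python
# Store question embeddings in a FAISS index and retrieve the nearest stored question.
# Inner product over L2-normalized vectors equals cosine similarity.
import faiss
import numpy as np

dim = 384  # embedding size of the sentence-transformers model above
question_vecs = np.random.rand(10_000, dim).astype("float32")  # stand-in embeddings
faiss.normalize_L2(question_vecs)

index = faiss.IndexFlatIP(dim)
index.add(question_vecs)

query_vec = np.random.rand(1, dim).astype("float32")  # stand-in query embedding
faiss.normalize_L2(query_vec)

scores, ids = index.search(query_vec, 5)  # top-5 most similar stored questions
print(ids[0], scores[0])
```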
In this article, we showed how large language models can streamline a variety of tasks in finance and banking. While these models are already very capable, the ability to build custom models and pipelines from them boosts their capabilities even more. Here at Width.ai, we have years of expertise in customizing and fine-tuning large language models and other machine-learning algorithms for multiple industries.
Contact us to find out how we can help bring the power of large language models to your business!