A legal tech company was looking for an NLP-based software solution that could rewrite poorly written legal clauses into higher-quality versions. The rewritten versions would be evaluated by legal professionals and reviewed for improved clause clarity and stronger language that reduces long-term risk for the parties.
Low-quality input clauses contain a number of issues such as vague information, missing key qualifiers, incorrect parties, or simply poor wording around the outlined terms. This is common when using template-based legal contracts or when trying to merge documents from different sources or common law practices. Strong clauses often need specific information about the exact engagement or other details spelled out in the contract. Generic clauses or multi-source documents do not account for this level of granularity.
A well-written Buyer’s Inspection Period includes precise information on the legal requirements of the exact engagement, with clear expectations for both Buyer and Seller. Many generic versions never get into the details of that determination, or simply state the wrong number of days for the inspection period.
On top of the clause rewriting product, we developed another endpoint that allows legal teams to upload “best case” clauses by contract type and compare them to input clauses for similarity. This acts as a faster path that lets teams quickly pull clauses from a repository of versions they already consider the best rewrite of an input clause.
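For reference, a client call to this kind of upload endpoint might look like the sketch below. The URL and payload fields are hypothetical placeholders for illustration, not the product’s actual API schema.

```python
# Hypothetical upload of a "best case" clause to the repository.
# The URL and field names are illustrative assumptions, not the real API schema.
import requests

payload = {
    "contract_type": "Purchase Agreement",
    "clause_type": "Inspection Period",
    "text": (
        "Buyer shall have ten (10) business days from the Effective Date "
        "to complete all inspections of the Property."
    ),
}

resp = requests.post("https://api.example.com/v1/best-fit-clauses", json=payload, timeout=30)
resp.raise_for_status()
print(resp.json())
```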
Teaching natural language processing models to rewrite input text into a different version is already a tough task. What is even more challenging is teaching a model to rewrite text that, in most cases, doesn’t inherently have anything “wrong” with it. From an NLP perspective, these input texts mostly use proper grammar, are complete sentences, and use proper nouns correctly. In our task, we’re often asking these models to go backward and ignore their training on billions of sentences of standard English.
We ask these models to write sentences with odd insertions like “(if applicable)” or long run-ons that sprawl across multiple points. Telling a model that was trained on something very different (and that its training is wrong) is much harder than we expected. The way proper nouns such as “Buyer” or “Seller” are used is strange compared to everyday language, and even small things like teaching the models to capitalize those terms can be a challenge. These small differences between legal jargon and normal text are more of a challenge than you might think.
The biggest challenge when rewriting clauses to stronger and clearer language is understanding the context of the rest of the document. The context surrounding the type of contract, other available clauses, and engagement specifics are a huge part of what makes a clause stronger and better written. Figuring out how to include this context when generating a single clause is challenging.
The main difficulty in the clause similarity use case is figuring out what information best determines whether two clauses are “similar”. This is a common task when comparing text records such as product data or customer data on more than just semantic similarity.
We built an NLP-based product that leverages custom fine-tuned large language models (LLMs) to generate rewritten legal clauses. We tested a number of different architectures to find the best way to address the difficulties above across the training and testing data. While the models are trained on the starter dataset of bad clauses and correct clauses, we’ve developed a pipeline that automatically improves model performance as the product is used, by leveraging real-time feedback on the output clauses and auto-optimization of the rewriting generation.
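To make the setup concrete, here is a minimal sketch of what a rewriting call against a fine-tuned seq2seq model could look like. The base model name, prompt format, and generation settings are assumptions for illustration; the production pipeline uses custom fine-tuned LLMs with its own prompting and feedback loop.

```python
# Minimal sketch of a clause-rewriting call with a fine-tuned seq2seq model.
# Model name, prompt prefix, and generation settings are illustrative assumptions.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

MODEL_NAME = "t5-base"  # stand-in; the real product uses a custom fine-tuned LLM

tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
model = AutoModelForSeq2SeqLM.from_pretrained(MODEL_NAME)

def rewrite_clause(bad_clause: str, contract_type: str) -> str:
    """Rewrite a weak clause into a stronger version, conditioned on contract type."""
    prompt = f"rewrite legal clause | contract type: {contract_type} | clause: {bad_clause}"
    inputs = tokenizer(prompt, return_tensors="pt", truncation=True, max_length=512)
    output_ids = model.generate(**inputs, max_new_tokens=256, num_beams=4)
    return tokenizer.decode(output_ids[0], skip_special_tokens=True)

print(rewrite_clause(
    "Buyer may inspect the property for some days after signing (if applicable).",
    "Real Estate Purchase Agreement",
))
```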
The generation models use a number of techniques to pull in whatever surrounding context (contract type, other available clauses, engagement specifics) is useful for understanding key information about the engagement. This context pipeline gave us an 8% boost in accuracy and continues to improve with more training.
Our models can even handle tasks such as translating clauses using the exact same pipeline we use to rewrite them into better versions.
Gathering data for generation and correction tasks can be a challenge, especially when the data is expensive to create or label. We leveraged our initial dataset and external contract examples to build a new set of models focused on generating new examples of bad clauses from a given correct one. This allowed us to scale our training dataset to more examples, improving data variance and accuracy.
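As a rough illustration of the augmentation idea, the starter pairs can simply be flipped so a model learns to produce plausible weak clauses from strong ones. The record format and prompt prefix below are assumptions for illustration, not the exact training format we used.

```python
# Sketch of building an "inverse" training set for a clause-corruption model:
# (bad, good) pairs are flipped so a model learns to generate weak clauses
# from strong ones. Field names and prompt prefix are illustrative.
import json

def build_corruption_dataset(pairs, out_path="corruption_train.jsonl"):
    """pairs: iterable of (bad_clause, good_clause) tuples from the starter dataset."""
    with open(out_path, "w", encoding="utf-8") as f:
        for bad, good in pairs:
            record = {
                "input": f"corrupt legal clause: {good}",  # strong clause in
                "target": bad,                             # weak clause out
            }
            f.write(json.dumps(record) + "\n")

build_corruption_dataset([
    ("Buyer may inspect the property for some days.",
     "Buyer shall have ten (10) business days from the Effective Date to inspect the Property."),
])
```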
We built our clause similarity pipeline using various NLP-based similarity architectures trained to learn key relationships between clauses beyond just semantics. This lets legal professionals retrieve “best fit” clauses based on document type, clause type, and clause content for a much better search of the repository. Users can upload new “best fit” records at any time via a simple API.
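The production similarity models learn relationships past plain semantics, but a baseline version of the “best fit” lookup can be sketched with off-the-shelf sentence embeddings plus metadata filtering. The embedding model and metadata fields here are assumptions for illustration only.

```python
# Baseline sketch of the "best fit" clause lookup: filter repository clauses by
# contract and clause type, then rank by embedding cosine similarity.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # stand-in embedding model

repository = [
    {"contract_type": "Purchase Agreement", "clause_type": "Inspection Period",
     "text": "Buyer shall have ten (10) business days from the Effective Date to inspect the Property."},
    {"contract_type": "Purchase Agreement", "clause_type": "Inspection Period",
     "text": "Buyer may conduct inspections within the Inspection Period defined in Section 4."},
]

def best_fit_clauses(query: str, contract_type: str, clause_type: str, top_k: int = 3):
    """Return the top_k repository clauses most similar to the query clause."""
    candidates = [r for r in repository
                  if r["contract_type"] == contract_type and r["clause_type"] == clause_type]
    if not candidates:
        return []
    query_emb = model.encode(query, convert_to_tensor=True)
    cand_embs = model.encode([c["text"] for c in candidates], convert_to_tensor=True)
    scores = util.cos_sim(query_emb, cand_embs)[0]
    ranked = sorted(zip(candidates, scores.tolist()), key=lambda x: x[1], reverse=True)
    return ranked[:top_k]
```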
Our legal clause rewriter pipeline is evaluated on the percentage of correction from the bad clause to the strong clause. This lets us measure the difference between the two and track how often the model reaches various thresholds for correct change, whether that change means adding text, removing text, or rewording it.
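One possible way to operationalize “percentage of correction” is to measure how far the rewrite moves from the bad clause toward the reference strong clause. The formula below is a simplified assumption for illustration, not the exact production metric.

```python
# Illustrative correction metric: how much of the gap between the bad clause and
# the reference strong clause does the model's rewrite close?
from difflib import SequenceMatcher

def similarity(a: str, b: str) -> float:
    return SequenceMatcher(None, a, b).ratio()

def correction_pct(bad: str, rewritten: str, strong: str) -> float:
    baseline = similarity(bad, strong)        # how close the bad clause already was
    achieved = similarity(rewritten, strong)  # how close the rewrite gets
    if baseline >= 1.0:
        return 1.0
    return max(0.0, (achieved - baseline) / (1.0 - baseline))

def hit_rate(examples, threshold=0.9):
    """Fraction of evaluation clauses whose rewrite reaches the correction threshold."""
    hits = sum(correction_pct(bad, rewritten, strong) >= threshold
               for bad, rewritten, strong in examples)
    return hits / len(examples)
```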
The pipeline reached 90% correction in the resulting text on 86% of the evaluation set clauses. These results are groundbreaking for such a small dataset and should continue to improve as the dataset and scope grow.
Interested in seeing how you can use NLP and ML in the legal or financial industry? We’ve built a ton of applications in this space and would love to hear about your use case.