A wholesale marketplace in the manufacturing space needed a product categorization system for adding new vendor products to their catalog based on the Google Product Taxonomy. They had been relying on an old-school, keyword-based categorization system that caused constant issues, especially given the multiple languages their product data arrives in.
They knew they needed a better automated system as they scaled and their catalogs continued to grow. By leveraging our product categorization API service and fine-tuning it on their specific multilingual product data, they were able to completely automate the process.
Many vendors use their own internal taxonomies, so the categories they provide don’t match the ones the customer (you) uses. That leaves you either spending significant time building a mapping, or relying on some internal system to produce the new categories (some teams still do this manually!).
If you’re using a keyword- or fuzzy-matching-based system, things like different languages, OEM names, missing information, and variations between vendors can ruin your ability to categorize properly. These systems also aren’t predictive models, so you can’t use confidence scores to route borderline products to manual reviewers.
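To make that concrete, here’s a minimal sketch of the kind of confidence-based review routing a predictive model enables; the classifier interface, record fields, and the 0.80 threshold are illustrative assumptions, not the customer’s actual pipeline.

```python
# Sketch: route low-confidence categorizations to a manual review queue.
# The classifier interface, record fields, and threshold are assumptions.
REVIEW_THRESHOLD = 0.80  # predictions below this confidence go to a human

def route_product(product: dict, classifier) -> dict:
    """Categorize one product and flag it for review if confidence is low."""
    text = f"{product['title']} {product.get('description', '')}".strip()
    category, confidence = classifier.predict(text)  # hypothetical interface
    return {
        "sku": product.get("sku"),
        "predicted_category": category,
        "confidence": confidence,
        "needs_review": confidence < REVIEW_THRESHOLD,
    }
```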
The customer needed a better solution to support their growing business, with new vendors and new product categories they wanted to expand into. Their keyword-based solution wasn’t working well and was in no way set up to support the larger new vendors they were onboarding from Germany. The complexity of products and product attributes in the manufacturing and industrial space makes these products even harder to categorize. Vendors also expect catalogs to be up and online on a fairly quick turnaround, so manual approaches simply don’t work.
A few key specifics made this especially difficult: much of the product data arrives in German, many products share exactly the same titles and descriptions, and titles are packed with vendor-specific abbreviations and OEM codes. A typical German product title looks like this:
Zypern ER29 - KG rechts, Edelstahl matt,mit EK300G RL 71, ohne PZ Rosetten
Our AI-powered PIM automation platform Pumice.ai has a taxonomy-based categorization API powered by our SOTA categorization architecture. We take our base categorization model and fine-tune it on customer-specific taxonomies and data to guide the model’s understanding of categorization toward the customer’s specific use case. Everyone has variations in their taxonomy and in the way their product data looks, so we always fine-tune our model on this dataset and deploy a custom model. We guarantee at least 90% accuracy on the dataset we create when we fine-tune.
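As a rough illustration of how such an API gets used (the endpoint URL, auth header, and payload/response fields below are assumptions for the sketch, not Pumice’s documented interface), a categorization request might look like this:

```python
import requests

# Hypothetical call to a fine-tuned categorization endpoint.
# The URL, auth header, and field names are illustrative assumptions.
API_URL = "https://api.pumice.ai/v1/categorize"  # placeholder endpoint
API_KEY = "YOUR_API_KEY"

product = {
    "title": "Zypern ER29 - KG rechts, Edelstahl matt, mit EK300G RL 71, ohne PZ Rosetten",
    "description": "",
    "vendor": "example-vendor-de",
}

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={"products": [product], "taxonomy": "google_product_taxonomy"},
    timeout=30,
)
response.raise_for_status()
for result in response.json()["results"]:
    print(result["category_path"], result["confidence"])
```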
We also deployed our language translation model to translate all text to English. Many NLP models perform better on English text than on other languages, so we split tested translation plus the categorization model on English-only products against fine-tuning the categorization model on all languages. The results clearly showed that translation + categorization worked best. This is most likely because the base categorization model is heavily English-trained, and multilingual fine-tuning can’t make much of a dent in the overwhelming majority of English text the base model has seen. Language translation is also one of the more accurate NLP tasks: our model is over 93% accurate at the character level and 98% on full-sentence analysis.
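A minimal sketch of the translate-then-categorize flow we split tested, using an off-the-shelf open-source DE→EN model purely as a stand-in for our translation model; the `categorize` callable is a hypothetical placeholder for the fine-tuned categorization model.

```python
from transformers import pipeline

# Open-source DE->EN model, used here only as a stand-in for our translation model.
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-de-en")

def translate_then_categorize(product: dict, categorize) -> str:
    """Translate German product text to English, then categorize the English text.

    `categorize` is a hypothetical callable standing in for the fine-tuned model.
    """
    text = f"{product['title']} {product.get('description', '')}".strip()
    english = translator(text, max_length=256)[0]["translation_text"]
    return categorize(english)
```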
The customer's dataset was more than large enough for us to fine-tune our model. The data was already categorized, which let us use it without any changes. In some instances we’ll work with the customer on the dataset if we see serious issues and need to improve the data for better fine-tuning.
Every time we fine-tune, we create an evaluation sheet that breaks the data down at a category level and presents accuracy metrics. We do this partly so the customer can stay in the loop while we fine-tune and understand where we’re at, and partly so we can produce smarter iterations. These evaluation sheets tell us a great deal about where the model is having issues and let us make adjustments in the next iteration to address them. Most of the time the issue is simply that some categories have little to no data, and we build a game plan with the customer for handling those categories. The other key data point in these evaluation sheets is how specific fields perform in categorization. Sometimes the customer provides additional vendor fields they believe will be useful, such as vendor categories or vendor IDs, but these turn out to actually hurt the model. We like to get as much data from the customer as possible and then make decisions based on these evaluation sheets and the accuracy they produce.
This was especially relevant for this customer, given the issue mentioned above of many products sharing exactly the same titles and descriptions.
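At its core, such an evaluation sheet is a per-category accuracy breakdown over a held-out set; here is a minimal pandas sketch (the column names are assumptions):

```python
import pandas as pd

# Sketch of a category-level evaluation breakdown.
# Column names ("true_category", "predicted_category") are assumptions.
def category_accuracy(df: pd.DataFrame) -> pd.DataFrame:
    df = df.assign(correct=df["true_category"] == df["predicted_category"])
    return (
        df.groupby("true_category")
          .agg(products=("correct", "size"), accuracy=("correct", "mean"))
          .sort_values("accuracy")
    )

# Rows with low accuracy and low product counts point to categories that need more data.
```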
In the evaluation sheet, each category level links to specific evaluations for that level. We also break down the number of products per category at each level, since some products’ category paths do not go all the way down the taxonomy.
We also review word-level semantic metrics to understand how strongly specific words in the product data correlate with categories.
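One simple way to approximate that kind of word-level signal (a sketch, not our internal tooling) is a chi-squared score between bag-of-words features and the category labels:

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.feature_selection import chi2

# Sketch: score how strongly individual words correlate with category labels.
# `texts` and `categories` are assumed parallel lists of product text and labels.
def word_category_scores(texts, categories, top_k=20):
    vectorizer = CountVectorizer(min_df=5)
    X = vectorizer.fit_transform(texts)
    scores, _ = chi2(X, categories)
    words = vectorizer.get_feature_names_out()
    return sorted(zip(words, scores), key=lambda p: p[1], reverse=True)[:top_k]
```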
We were able to achieve 97% accuracy on top-level categories across the entire Google Product Taxonomy, and 92.37% on the bottom-most category level, in just a few iterations on this dataset. As the customer rolls the Pumice categorization API out to production, we will collect data and further fine-tune the model. The top 10 bottom-level categories had an accuracy of 95.92%, with only two of those categories having more than 1,800 product records and none having more than 2,700.
We expect the model's bottom-most category level to improve greatly over time, as the only categories below 84.37% accuracy were those with fewer than 300 total records. Not every category at that level is low-accuracy, but every lower-accuracy one falls below that record threshold.
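The level-by-level numbers come from comparing predicted and true taxonomy paths one level at a time; since Google Product Taxonomy paths are " > "-separated, a sketch of that calculation looks like this (the helper below is illustrative, not our production code):

```python
# Sketch: accuracy at a given taxonomy level, given full " > "-separated paths,
# e.g. "Hardware > Tools > Drills" (illustrative path only).
def level_accuracy(true_paths, pred_paths, level: int) -> float:
    """Share of products whose predicted path matches the true path down to `level`."""
    def truncate(path: str) -> tuple:
        return tuple(part.strip() for part in path.split(">")[:level])

    pairs = [(truncate(t), truncate(p)) for t, p in zip(true_paths, pred_paths)]
    # Only count products whose true path actually reaches this level.
    pairs = [(t, p) for t, p in pairs if len(t) == level]
    return sum(t == p for t, p in pairs) / len(pairs) if pairs else 0.0
```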
Our architecture supports up to 50 million product records per day, with built-in storage of records for fine-tuning and future optimization. This made it easy for us to further fine-tune on the error set and on the categories with little product data.
While this customer didn’t have product images from their vendors, our architecture supports images as a data field for categorization. We strongly recommend including images, as they can be a key field in many use cases. Not every use case improves with images, so when images are provided we split test text-only against text + image.
Our product lets you get started with a production version of the model in weeks, not months: any taxonomy, any amount of existing product data, and any number of product fields. Hundreds of ecommerce companies, marketplaces, and analytics platforms already use Pumice for their categorization. Reach out via the “Get In Touch” page for more information, and include “GCP” in your message for free implementation.