97% top level and 93% 6 level deep accuracy for a marketplace with 5585 categories and massive volume.

February 4, 2025

A growing marketplace needed a high-end product categorization API to add new vendor products to its catalog, which is organized by the Google Product Taxonomy. The marketplace adds millions of products per month, and its use case also requires rerunning parts of the catalog every month to make sure existing products still fit the taxonomy relative to new products.

That calls for a highly accurate model that doesn’t depend on manual effort, plus infrastructure that supports extremely high volume, large catalog uploads, and scalability. This is very common with external seller marketplaces that want to scale.

They used our product categorization API on our upgraded infrastructure to automate this process, and set up a confidence threshold system that tells them both when to review results and when to retrain the model for categories that need more training data.
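To make the idea of a confidence threshold system concrete, here is a minimal sketch of how predictions could be routed to review and how categories could be flagged for retraining. The `Prediction` type, the function names, and both threshold values are assumptions for illustration, not the Pumice.ai API or the customer's actual settings.

```python
from collections import Counter
from dataclasses import dataclass

# Hypothetical thresholds; real values would be tuned on held-out data.
REVIEW_THRESHOLD = 0.80  # below this score, send the prediction to manual review
RETRAIN_SHARE = 0.30     # flag a category once this share of its predictions is low-confidence


@dataclass
class Prediction:
    product_id: str
    category: str      # predicted taxonomy path
    confidence: float  # model score in [0, 1]


def needs_review(pred: Prediction) -> bool:
    """A prediction is reviewed when its score is under the review threshold."""
    return pred.confidence < REVIEW_THRESHOLD


def categories_to_retrain(preds: list[Prediction]) -> list[str]:
    """Return categories whose share of low-confidence predictions is too high."""
    totals = Counter(p.category for p in preds)
    low = Counter(p.category for p in preds if needs_review(p))
    return sorted(c for c in totals if low[c] / totals[c] >= RETRAIN_SHARE)


# Made-up predictions, for illustration only.
preds = [
    Prediction("sku-1", "Apparel & Accessories > Clothing", 0.97),
    Prediction("sku-2", "Animals & Pet Supplies > Pet Supplies > Bird Supplies", 0.55),
    Prediction("sku-3", "Animals & Pet Supplies > Pet Supplies > Bird Supplies", 0.91),
    Prediction("sku-4", "Apparel & Accessories > Clothing", 0.88),
]
review_queue = [p.product_id for p in preds if needs_review(p)]
retrain = categories_to_retrain(preds)
```

In this toy data only `sku-2` lands in the review queue, and the bird supplies category is flagged because half of its predictions fall under the review threshold.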

Problem Statement

Many vendors that marketplaces onboard use their own internal taxonomies with different categories and hierarchies, so the categories they provide when integrating their products don’t match the ones the marketplace uses. That means the marketplace either has to spend a lot of time building a mapping from each onboarded vendor’s taxonomy to its own, or rely on some internal system to recategorize these products. Some marketplaces even do this manually!

If you’re using an old-school approach like keyword, semantic, fuzzy, or string matching, then different languages, OEM versus generic names, and differing vendor naming conventions make it incredibly difficult to categorize properly at scale. These methods also aren’t predictive models, so they don’t improve as more data comes in that could be used to learn the relationship between products and categories.
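A toy example shows why plain keyword matching breaks down. The keyword lists and product titles below are made up for illustration; the point is only that a non-English title or a generic name with no overlapping keyword gets no category at all.

```python
from typing import Optional

# Hypothetical keyword lists per category (illustrative only).
CATEGORY_KEYWORDS = {
    "Bird Perches": {"bird", "perch"},
    "Pet Bowls, Feeders & Waterers": {"bowl", "feeder", "waterer"},
}


def keyword_match(title: str) -> Optional[str]:
    """Return the category sharing the most keywords with the title, or None."""
    words = set(title.lower().split())
    best, best_hits = None, 0
    for category, keywords in CATEGORY_KEYWORDS.items():
        hits = len(words & keywords)
        if hits > best_hits:
            best, best_hits = category, hits
    return best


english = keyword_match("Natural wood bird perch")       # shares "bird" and "perch"
german = keyword_match("Vogelsitzstange aus Naturholz")  # same product type, German title
generic = keyword_match("Stainless steel pet dish")      # a bowl described without the word "bowl"
```

Only the English title matches. Extending the keyword lists by hand for every language and vendor naming convention is exactly the manual effort a trained model is meant to replace.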

The new Pumice customer did not have a solution in place, and was smart in understanding that in 2024 there are better options. The ability to quickly onboard product catalogs with accurate product categorization is the only way for marketplaces to scale, and getting it right out of the gate will save a ton of time and money down the road.

A few key specifics:

The customer has about 500k records already categorized to the Google Product Taxonomy that we can use for initial fine-tuning. We also set up a game plan for long-term fine-tuning, so that as they use the system, the model is trained on the data that runs through it.
Multiple languages are included.
They have images, titles, and descriptions.
Wide range of products, with a heavy focus on apparel and home & garden. Multiple categories have almost no data.
Google Product Taxonomy contains 5585 distinct categories with a structure that has broad categories at the top and reaches 6 levels deep.
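Since the taxonomy is a hierarchy of up to 6 levels, it helps to treat each category as a path. The sketch below assumes paths are written with a " > " separator, as in the category examples later in this post; the helper names are hypothetical.

```python
SEPARATOR = " > "
MAX_DEPTH = 6  # deepest level mentioned in this case study


def split_path(path: str) -> list[str]:
    """Split a taxonomy path into its levels, top level first."""
    return path.split(SEPARATOR)


def truncate(path: str, depth: int) -> str:
    """Return the ancestor of `path` at the given depth (1 = top level)."""
    return SEPARATOR.join(split_path(path)[:depth])


path = "Animals & Pet Supplies > Pet Supplies > Pet Bowls, Feeders & Waterers"
levels = split_path(path)   # three levels for this category
top = truncate(path, 1)     # "Animals & Pet Supplies"
```

Truncating to a depth larger than the path's own depth simply returns the full path, which is convenient when categories stop before level 6.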

Our Solution Outline

Figure: Full Pumice.ai categorization pipeline

Our product data automation platform, Pumice.ai, has a categorization API with a fine-tunable model. We take our SOTA base categorization architecture and train it on customer-specific product data to teach the models the relationship between product data and product categories. Everyone’s product data is a bit different in terms of naming conventions, length, and so on, so it’s important to learn these relationships. For our fine-tuned models, we guarantee at least 90% accuracy or you don’t pay.

We also set the customer up with our high-end infrastructure to allow for higher volume. For customers that either have multiple models in the pipeline or need serious volume, we have a separate architecture designed for both constant streams of products and bulk runs. This customer needed both, as they use multiple models (categorization, color extraction, attribute extraction) and have a high-volume use case.

The starting dataset the customer had was good enough to get us out of the gate on fine-tuning our architecture. They had a dataset of products from a few vendors, already categorized to their internal taxonomy, that we could use. We also set up a long-term game plan of constant fine-tuning to keep improving the model’s accuracy on categories that did not have as much training data.

Every time we go through a training iteration, we create an evaluation sheet that breaks down the results at a category level and shows where the models stand. We do this so the customer can stay up to date on their models, and so we can make better decisions based on metrics. These evaluation sheets give us a ton of insight into where the categorization model might be having issues and what we can do to improve the results moving forward. Most of the time, issues can be tied back to a lack of data in a category relative to the complexity of that category and the other category options in the taxonomy.

As the results show, accuracy changes as you go further down into the nodes of the taxonomy, because differentiating products becomes more challenging. It’s much easier to learn the difference between top-level categories like baby & toddler vs. animals & pet supplies than “bird gym & playstands” vs. “bird perches” (yes, these are real categories). We can also use these evaluation sheets to understand how specific product data fields perform when used in categorization. Sometimes additional fields provided by vendors, such as vendor SKU IDs, are not actually useful for categorizing. We can split test this data to understand the value of different combinations of product data fields.
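One simple way to set up such a split test is to enumerate every combination of fields and build a separate model input for each. The sketch below only covers that enumeration step; the field names, the " | " joiner, and the sample product are assumptions, and training and scoring a model per variant is left out.

```python
from itertools import combinations

FIELDS = ["title", "description", "vendor_sku"]  # hypothetical field names


def field_subsets(fields):
    """Yield every non-empty combination of fields, smallest first."""
    for size in range(1, len(fields) + 1):
        for subset in combinations(fields, size):
            yield subset


def build_input(product: dict, subset) -> str:
    """Join the chosen fields into one text input, skipping empty values."""
    return " | ".join(str(product[f]) for f in subset if product.get(f))


# Made-up product record, for illustration only.
product = {
    "title": "Natural wood bird perch",
    "description": "Fits most cages",
    "vendor_sku": "AB-1029",
}
variants = {subset: build_input(product, subset) for subset in field_subsets(FIELDS)}
```

Three fields give seven variants. Comparing accuracy across them would show, for example, whether adding the vendor SKU helps or just adds noise.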

In our evaluation sheets customers can click through to understand data at each level of the taxonomy.
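As a rough sketch of the per-level breakdown, the function below computes accuracy at each taxonomy depth from pairs of true and predicted paths. It assumes a prediction counts as correct at depth k only when its first k levels all match, and that a product is scored only at depths its true category actually reaches. That scoring rule and the sample data are illustrative assumptions, not a description of how our evaluation sheets are built.

```python
from collections import defaultdict

SEPARATOR = " > "


def accuracy_by_level(pairs):
    """pairs: iterable of (true_path, predicted_path). Returns {depth: accuracy}."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for true_path, pred_path in pairs:
        true_parts = true_path.split(SEPARATOR)
        pred_parts = pred_path.split(SEPARATOR)
        for depth in range(1, len(true_parts) + 1):
            total[depth] += 1
            # Correct at this depth only if the whole prefix matches.
            if pred_parts[:depth] == true_parts[:depth]:
                correct[depth] += 1
    return {depth: correct[depth] / total[depth] for depth in sorted(total)}


# Made-up evaluation pairs, for illustration only.
pairs = [
    ("Animals & Pet Supplies > Pet Supplies > Pet Food Containers",
     "Animals & Pet Supplies > Pet Supplies > Pet Food Containers"),
    ("Animals & Pet Supplies > Pet Supplies > Pet Food Containers",
     "Animals & Pet Supplies > Pet Supplies > Pet Bowls, Feeders & Waterers"),
    ("Baby & Toddler", "Baby & Toddler"),
]
result = accuracy_by_level(pairs)
```

Here the model is right on every top-level and second-level category but confuses the two sibling categories at level 3, so accuracy drops from 1.0 to 0.5 at the deepest level.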

We also review word-level semantic metrics to understand how strongly specific words in the product data correlate to categories. We just released a brand new feature that gives product data specialists more insight into these metrics on a per-category basis, so they can make better decisions about how products should be categorized and how the taxonomy should be structured.
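One basic word-level metric is the share of a word's occurrences that fall in each category. The sketch below computes that from titles alone. It is an assumed, simplified stand-in for the metrics described above, and the product data is made up.

```python
from collections import Counter, defaultdict


def word_category_share(products):
    """products: iterable of (title, category). Returns {word: {category: share}}."""
    counts = defaultdict(Counter)
    for title, category in products:
        # Count each word once per product so long titles don't dominate.
        for word in set(title.lower().split()):
            counts[word][category] += 1
    return {
        word: {cat: n / sum(cats.values()) for cat, n in cats.items()}
        for word, cats in counts.items()
    }


# Made-up products, for illustration only.
products = [
    ("Wooden bird perch", "Bird Perches"),
    ("Rope bird perch", "Bird Perches"),
    ("Bird gym with ladder", "Bird Gyms & Playstands"),
    ("Wooden playstand", "Bird Gyms & Playstands"),
]
shares = word_category_share(products)
```

In this toy data "perch" appears only in one category, while "bird" and "wooden" are spread across both and so carry little signal for telling the two apart.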

Results

We achieved 97.623% accuracy on the top-level Google Product Taxonomy categories, and 93.61% on the 6th level of the taxonomy. We reached these results using only training iterations on the initial dataset. Once the customer rolled out the product categorization API, the models continued to improve each month with our fine-tuning.

This is pretty amazing considering how little data these bottom-level categories were trained on. The average training dataset size for each category at the bottom level was only 80 records, and these categories are incredibly similar once you reach this point. The accuracy growth as we move up the levels of the taxonomy supports the expectation that the bottom-most level will improve over time: accuracy increases at each level as the number of products per category grows.
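To check the link between training volume and accuracy on your own data, a per-category report along these lines can help. This is a minimal sketch: the cut-off of 80 simply reuses the average quoted above as a hypothetical threshold, and all records are made up.

```python
from collections import Counter, defaultdict

LOW_DATA_CUTOFF = 80  # hypothetical threshold for flagging thin categories


def category_report(train_labels, eval_pairs):
    """train_labels: category of each training record.
    eval_pairs: (true_category, predicted_category) on held-out data.
    Returns one row per evaluated category, lowest accuracy first."""
    support = Counter(train_labels)
    hits, totals = defaultdict(int), defaultdict(int)
    for true_cat, pred_cat in eval_pairs:
        totals[true_cat] += 1
        hits[true_cat] += int(true_cat == pred_cat)
    rows = [
        {
            "category": cat,
            "train_records": support[cat],
            "accuracy": hits[cat] / totals[cat],
            "low_data": support[cat] < LOW_DATA_CUTOFF,
        }
        for cat in totals
    ]
    return sorted(rows, key=lambda r: r["accuracy"])


# Made-up training labels and evaluation pairs, for illustration only.
train_labels = ["Bird Perches"] * 40 + ["Pet Food Containers"] * 300
eval_pairs = [
    ("Bird Perches", "Bird Perches"),
    ("Bird Perches", "Bird Gyms & Playstands"),
    ("Pet Food Containers", "Pet Food Containers"),
    ("Pet Food Containers", "Pet Food Containers"),
]
report = category_report(train_labels, eval_pairs)
```

Sorting by accuracy puts the weakest categories first, and the `low_data` flag shows at a glance whether more training records are the likely fix.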

Results on known difficult categories

Some product categories in the Google Product Taxonomy (GPT) are known to be much more difficult, either because other categories are very similar or because they have very little training data. The “Mature” category is known to be difficult for both of these reasons: most customers don’t have much training data for it, and a few of its categories are very similar to other top-level categories. Our average accuracy on this category at the bottom-most level is 89.4%, with a top-level “Mature” category accuracy of 91%. “Watercraft fuel systems”, which has multiple very close categories below it, achieved 99.58%.

Because the Google Product Taxonomy contains both of these categories, it can be confusing for a model:

Animals & Pet Supplies > Pet Supplies > Pet Food Containers
Animals & Pet Supplies > Pet Supplies > Pet Bowls, Feeders & Waterers

We achieved 96.% and 97.44% on these categories.

We deployed our upgraded infrastructure to support up to 50 million product records per day. We also have built-in database storage, so we can store processed products with their product categorization for future fine-tuning and metrics.
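For a sense of scale, 50 million records per day works out to a little under 600 records per second if the load is spread evenly. The short calculation below shows the arithmetic; the batch size is a hypothetical value, not a Pumice.ai setting.

```python
RECORDS_PER_DAY = 50_000_000     # capacity figure quoted above
SECONDS_PER_DAY = 24 * 60 * 60   # 86,400 seconds

# Sustained rate if records arrive evenly over the day.
records_per_second = RECORDS_PER_DAY / SECONDS_PER_DAY

BATCH_SIZE = 500  # hypothetical number of records per API batch
batches_per_second = records_per_second / BATCH_SIZE

print(f"{records_per_second:.1f} records/s, {batches_per_second:.2f} batches/s")
```

Real traffic is rarely even, so bulk catalog uploads would push the peak rate well above this average.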

Additional benefits of categorization

On top of completely automating the task of onboarding new products and catalogs to their marketplace:

On site search functionality greatly improves. Nobody wants to click into category pages and see unrelated products or products that don’t fit their “idea” of what products make sense. It’s a poor customer experience that causes marketplaces to lose customers and becomes harder to control as you scale.
Stronger product hierarchy with products being categorized to deeper leaves more effectively. Customers don’t want to sift through top level categories if they know exactly what they’re looking for.
Easier backend inventory management.
Better understanding of analytics surrounding best performing categories and products.
Creating a standardized schema for product attributes becomes easier. When making decisions about what attributes to support such as “size, color, shape” for products it makes more sense to do it at a category level and fully evaluate how many attributes make sense across your catalog at that level. We talk about this a bunch in our new feature.

Want your own fine-tuned product categorization model?

Pumice.ai lets you start automatically categorizing your products, with a production version of the model, in weeks not months. You can use any taxonomy, any product fields, and anywhere from millions of records of training data to none. Our data scientists work directly with you to clean up the data, prepare it for training, and then do everything for you. Sit back and let us get your models ready. We’d love to deploy a test model for you to use. Hundreds of ecommerce companies, marketplaces, and analytics platforms already use Pumice for their categorization. Reach out at Pumice.ai or via the “Get In Touch” page for more information, and provide “GCP” in your message for free implementation.
