As the marketing copy generation space was heating up, Keap.com was looking to build an AI product that lets users of their platform create a huge variety of marketing “plays” that use generative AI to write the copy that moves customers through automated workflows such as follow-up sequences, lead magnet sequences, new customer welcomes, and much more.
Keap.com wanted these features to be different from the standard ones you see in other AI marketing copy products. They wanted the features to be niched into specific workflows instead of generic “email generators” or “social media post” generators that use simple GPT-3 models. These features are specific to plays and workflows, and they generate niched pieces of the puzzle such as product benefits, upsell offers, product-focused testimonial requests, and more. The output is “plays” made up of these niched features and models that automate unique marketing funnels. As we’ve talked about before in articles like our CTA generator, niching these models makes for more accurate generations and narrows the gap between what a customer thinks the correct output should be for a given step and what is actually produced.
Width.ai built 17 generative AI pipelines for use in the Keap.com marketing copy generation product. These are used throughout the plays and are the backbone of these workflows. We focused on building highly creative models that generate jaw-dropping, unique copy that looks and performs much better than the output of generic AI copy generators.
The pipelines focus on understanding the goal output for a given input set through fine-tuning and dynamic data that gives the pipeline an idea of relevant outputs with a high level of uniqueness and variance. This powerful combination lets general tuning cover a wide range of variance for our task while still generating extremely unique outputs for common use cases.
This concept makes it much easier to differentiate between variables, which leads to outputs with more variety. The generated copy is completely different for an age variable set to 50-75 than for one set to 18-34. This is the level of variability your outputs can have when refining generative AI to focus on specific domains (like this article).
We applied state-of-the-art knowledge of how to combine log probabilities and token-level semantic relevance to maximize the variation and “uniqueness” of outputs while following a learned, relationship-based structured format. This build came from SOTA research at the time plus existing knowledge we had of how to maximize a similar workflow.
We built data management pipelines that allow you to easily incorporate real in-context user runs of your product into the ML model training process. Using actual user data is a great way to start steering your generative AI outputs towards what your customers consider a good result.
The prompt engineering and design experts at Width.ai handcrafted powerful prompts for each of these generative AI systems. We focused on building prompts that cover as much data variance as possible before ever relying on fine-tuning our model. The goal of this process is to prove that the system can rely on the prompt structure to generate high quality responses on edge cases that fine-tuning is unable to cover.
We’ve built a number of custom AI marketing copy generators for different companies with various model use cases. The above architecture outline comes from our most recent work in this space, focused on generating CTAs in a pipeline that can auto-optimize over time based on conversion metrics. This is an extension of our standard generative AI architecture that allows customers to get super refined outputs for their marketing copy generations. Let’s look at a quick overview of the parts we build to end up with a production pipeline for AI marketing copy generation.
Which inputs we choose and how we structure them for the models is a huge part of how we reach accurate marketing copy generations that users love. These LLMs are pretty much just a language API that tries to complete a task based on prompt instructions and any other knowledge that is provided (prompt examples/variables/fine-tuning). The work done upstream to build the prompt that contains our inputs is what really determines output quality in production generative AI systems.
We spend a ton of time choosing input variables such as product name, marketing piece type, and copy tone that maximize the quality of the outputs for the specific marketing copy generation operation. With those variables we then focus on how we structure our prompts to best leverage the key fields that guide the model the most towards a correct output. This is also the step in the process where we can handle any extraction of text from URLs or images for additional fields.
We built a custom module focused on transforming these raw text inputs into a structured format for our GPT-3 prompt. This structured format is built during the early stages of our prompt engineering where we use concepts like “Hard prompts made easy” to build powerful prompts that guide our model towards a deep understanding of the variables we just provided. This module just focuses on the input variable part of our prompt.
One of the key preprocessing steps covered here is our ability to create “tags” that wrap around specific lines of text in the input variables. These tags give the underlying LLM a bit more contextual information about which keywords or phrases are extra meaningful to the outputs.
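To make the idea concrete, here is a minimal sketch of what an input-structuring step with tags could look like. It is not our production module: the field names, the `<key>` tag syntax, and the labeled-line layout are all assumptions chosen for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class CopyRequest:
    """Raw user inputs for one generation (field names are illustrative)."""
    product_name: str
    piece_type: str
    tone: str
    description: str
    key_phrases: list[str] = field(default_factory=list)


def tag_key_phrases(text: str, phrases: list[str]) -> str:
    """Wrap each key phrase in a hypothetical <key>...</key> tag."""
    for phrase in phrases:
        text = text.replace(phrase, f"<key>{phrase}</key>")
    return text


def build_input_block(req: CopyRequest) -> str:
    """Render only the input-variable part of the prompt as labeled lines."""
    lines = [
        f"Product name: {req.product_name}",
        f"Marketing piece type: {req.piece_type}",
        f"Tone: {req.tone}",
        f"Description: {tag_key_phrases(req.description, req.key_phrases)}",
    ]
    return "\n".join(lines)


request = CopyRequest(
    product_name="Acme CRM",
    piece_type="welcome email",
    tone="friendly",
    description="Automates follow up for small teams",
    key_phrases=["follow up"],
)
print(build_input_block(request))
```

The simple string replacement assumes key phrases do not overlap. A real preprocessing step would need to handle overlapping or repeated phrases more carefully.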
In low-data environments, or environments trying to create a very specific and niched output such as a “per industry” or “per user” output, prompt design and iterating on the quality of the prompt is the most important part of our process. This is what allows us to generate outputs with high variance between different variables and copy that follows the examples we provided much more closely. This is the difference between generating high quality marketing copy that specifically fits your use case and the generalized copy you see from the leading products.
Token sequencing is a huge part of any large language model. We even spend time optimizing the order of our variables in the prompt to move more important variables closer to the instructions. We can use the natural bias towards tokens at the beginning or end of the prompt to our advantage and put more emphasis on specific variables.
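As a rough illustration of variable ordering, the sketch below places the instruction first and sorts variables so the most important ones sit directly after it. The importance scores are hypothetical numbers made up for this example; in practice they would come from testing which ordering produces the best outputs.

```python
def build_ordered_prompt(
    instruction: str,
    variables: dict[str, str],
    importance: dict[str, float],
) -> str:
    """Place higher-importance variables closest to the instruction.

    `importance` is a hypothetical per-variable score; variables
    without a score default to 0.0 and end up last.
    """
    ranked = sorted(
        variables.items(),
        key=lambda item: importance.get(item[0], 0.0),
        reverse=True,
    )
    body = "\n".join(f"{name}: {value}" for name, value in ranked)
    return f"{instruction}\n{body}"


prompt = build_ordered_prompt(
    "Write a CTA.",
    {"tone": "bold", "product name": "Acme CRM", "audience age": "18-34"},
    {"product name": 0.9, "audience age": 0.6, "tone": 0.3},
)
print(prompt)
```

If your instructions sit at the end of the prompt instead, the same idea applies with the sort order reversed.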
Dynamic prompt optimization frameworks are trained algorithms we build to construct the prompt at runtime for each unique input, maximizing the model's understanding of our task. This allows us to go from static prompts that struggle with edge cases and general data variance to a more flexible system that covers wide data variance much better. These algorithms work upstream of the actual LLM and build the prompt for us.
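A heavily simplified way to picture runtime prompt building is choosing the few-shot examples that best match each incoming request. The sketch below swaps the trained algorithm for plain token-overlap (Jaccard) similarity, so treat it as a stand-in rather than our framework. The example pool, the `Input:`/`Output:` layout, and the function names are assumptions.

```python
import re


def tokens(text: str) -> set[str]:
    """Lowercase word tokens used for a crude similarity score."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of tokens the two sets have in common."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_examples(query: str, pool: list[dict[str, str]], k: int = 2) -> list[dict[str, str]]:
    """Pick the k pool examples whose inputs look most like the query."""
    query_tokens = tokens(query)
    ranked = sorted(
        pool,
        key=lambda ex: jaccard(query_tokens, tokens(ex["input"])),
        reverse=True,
    )
    return ranked[:k]


def build_dynamic_prompt(instruction: str, query: str, pool: list[dict[str, str]], k: int = 2) -> str:
    """Assemble instruction, selected examples, and the new input."""
    parts = [instruction]
    for ex in select_examples(query, pool, k):
        parts.append(f"Input: {ex['input']}\nOutput: {ex['output']}")
    parts.append(f"Input: {query}\nOutput:")
    return "\n\n".join(parts)


example_pool = [
    {"input": "yoga studio welcome email", "output": "Welcome to the mat!"},
    {"input": "accounting software upsell offer", "output": "Upgrade and close the books faster."},
    {"input": "yoga retreat upsell offer", "output": "Add a weekend retreat."},
]
print(build_dynamic_prompt("Write marketing copy.", "yoga studio upsell offer", example_pool, k=1))
```

A trained selector would replace `jaccard` with a learned relevance score, but the surrounding flow of rank, select, and assemble stays the same.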
Here are a few common issues you run into with prompt-based LLMs that prompt optimization fixes:
With users running the AI marketing copy generator, you can store information about the variables they provided and the outputs that were generated to make further optimization and training easier. By storing data such as the inputs, prompts, and generated outputs, you can quickly turn it into a fine-tuning dataset for GPT-3 or other LLMs you might want to leverage.
Because the format for fine-tuning GPT-3 is a single prompt and completion pair, turning our stored prompts and user-generated outputs into a training set is much easier. Fine-tuning lets you show these task-agnostic models how to specialize in your specific task and generate outputs that your users consider high quality, instead of what the base model considers high quality based on your prompt instructions.
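As a small sketch of that conversion, the snippet below turns stored runs into prompt/completion records and serializes them as JSON Lines, the one-record-per-line layout used for GPT-3 fine-tuning files. The `rating` field and the `min_rating` cutoff are hypothetical stand-ins for whatever signal tells you a user considered an output good.

```python
import json


def to_finetune_records(runs: list[dict], min_rating: int = 4) -> list[dict[str, str]]:
    """Keep well-rated runs and map them to prompt/completion pairs.

    `rating` is a hypothetical user-feedback field; runs without it
    are treated as unrated and skipped.
    """
    records = []
    for run in runs:
        if run.get("rating", 0) < min_rating:
            continue
        records.append({
            "prompt": run["prompt"],
            "completion": run["output"].strip(),
        })
    return records


def to_jsonl(records: list[dict[str, str]]) -> str:
    """Serialize records as JSON Lines, one training example per line."""
    return "\n".join(json.dumps(record, ensure_ascii=False) for record in records)


stored_runs = [
    {"prompt": "Product: Acme CRM\nCTA:", "output": "Start your free trial today ", "rating": 5},
    {"prompt": "Product: Acme CRM\nCTA:", "output": "Click here", "rating": 2},
]
print(to_jsonl(to_finetune_records(stored_runs)))
```

Filtering on user feedback before training is what steers the fine-tuned model towards outputs your users rate highly rather than everything the base model produced.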
The model parameters available for large language models are one of the best ways to control the level of creativity and the boundaries of your copy generations. These parameters essentially take the most deterministic copy generation available and let you decide how far you want it to venture from the highest-probability output.
Temperature is the parameter most commonly adjusted to control how much creativity you allow your model to use. Temperature decides how often lower-probability tokens are used in the output. Lower-probability tokens are generally more random and more “unique” relative to the baseline output that GPT-3 considers correct.
GPT-3 predicts the most probable next token in a sequence and, as we can see above, we can review the tokens that had a lower probability. Leveraging these tokens instead of the most probable one is a great way to increase uniqueness in your generated copy and ensure your copy is different for each user.
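To show how temperature shifts weight towards lower-probability tokens, here is a small self-contained sketch of temperature-scaled sampling over a handful of candidate tokens. The tokens and probabilities are made up for illustration and are not real GPT-3 output.

```python
import math
import random


def apply_temperature(logprobs: dict[str, float], temperature: float) -> dict[str, float]:
    """Divide log probabilities by the temperature and renormalize."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = {tok: lp / temperature for tok, lp in logprobs.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(value - peak) for tok, value in scaled.items()}
    total = sum(weights.values())
    return {tok: weight / total for tok, weight in weights.items()}


def sample_token(logprobs: dict[str, float], temperature: float, rng: random.Random) -> str:
    """Draw one token from the temperature-adjusted distribution."""
    probs = apply_temperature(logprobs, temperature)
    return rng.choices(list(probs), weights=list(probs.values()), k=1)[0]


# Made-up next-token candidates with their log probabilities.
candidates = {
    "today": math.log(0.7),
    "now": math.log(0.2),
    "instantly": math.log(0.1),
}

for temperature in (0.5, 1.0, 1.5):
    probs = apply_temperature(candidates, temperature)
    print(temperature, {tok: round(p, 3) for tok, p in probs.items()})

print(sample_token(candidates, 1.5, random.Random(0)))
```

Below a temperature of 1 the most likely token takes an even larger share of the probability. Above 1 the distribution flattens, so the less likely tokens are sampled more often, which is where the extra variety in the copy comes from.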
Width.ai builds custom NLP and CV software for everyone from Fortune 50 companies to pre-funding startups. We’re the leading experts in GPT-3 and other LLMs and have been building production-level products with these exact tools since 2020. Schedule a call today through our “Contact Us” page to learn more about how we can build your own marketing copy generator.