Dialogflow was a popular technology for several years because it enabled businesses to upgrade from limited, rule-based chatbots to chatbots capable of natural language processing (NLP) to extract useful structured data from user conversations. Its NLP capabilities weren't great but they were good for that time.
But we now have much more powerful large language models (LLMs) capable of function-calling, agents, and much more.
Many businesses and developers will be thinking about replacing Dialogflow with LLMs. We believe Dialogflow still provides several benefits to businesses and recommend a hybrid approach where LLMs are integrated into Dialogflow agents instead of replacing them.
In this article, we'll explain why you should continue with Dialogflow chatbots in 2024 but also show how modern LLMs can turbocharge their capabilities.
Dialogflow is still useful because of its versatile integration and deployment capabilities. Every Dialogflow chatbot supports the following channels and integrations right out of the box:
All these features are time-consuming to implement from scratch. Dialogflow's built-in support for them, backed by a proven track record and performance, is a compelling reason to continue with Dialogflow.
But Dialogflow has its shortcomings. They include:
We'll explain these issues and solutions for them in more detail in the sections below.
A common problem with Dialogflow is that it's quite limited for use cases that involve complex queries. For example, in fields like medical insurance, conversations may take many unexpected turns and users may provide unexpected inputs that are specific to that domain, such as names of medical conditions and prescription drugs.
For example, a user may start with insurance plan benefits, then switch to questions about coverage of their medical conditions, list the drugs they've been prescribed to check for coverage, ask about copay conditions, and inquire about pre-existing exclusions.
Dialogflow's intent-based framework is just not flexible enough to handle such unpredictable context switches. In fields like medical insurance or online shopping, the information that must be given to users is too vast to handle via Dialogflow parameters and the possible turns a conversation may take are too many to fit into Dialogflow intents. The trigger-based intent identification that Dialogflow does, as illustrated below, is not suitable for such domains.
In contrast, LLMs trained for instruction following and human alignment — like GPT-4 or Mistral — are extremely good at handling diverse turns in conversations. They're able to organically use all the previous information given by the user without having to convert it into structured information. However, switching to LLMs means losing all the benefits of Dialogflow we outlined earlier.
Luckily, it's possible to keep all the integration and deployment benefits of Dialogflow and combine them with LLMs for handling complex conversations.
For our clients in complex domains like financial services and e-commerce, we integrate LLMs into Dialogflow as outlined below.
First, apart from the default welcome intent and the default fallback intent, we don't configure any intents at all.
Instead, we configure a webhook URL for the Dialogflow agent as shown below:
This URL points to a custom webhook service hosted on our client's server. We can also implement it using cloud functions. Either way, the webhook enables us to get notified by Dialogflow whenever it executes an intent and allows us to modify or override the configured response as well.
We then set up the default fallback intent to call this webhook as shown below.
This way, every user input immediately reaches our webhook service to process and respond as we please. It's in the webhook that we plug in the LLMs and their reasoning capabilities.
LLMs provide us the flexibility we need to have informative multi-turn conversations with users and fulfill their exact information needs.
With normal Dialogflow chatbots, the tendency is to handle simple straightforward conversations but once the conversation starts involving more criteria and complex information needs, users are either switched to a live agent or asked to call customer service. The ratio of end-to-end fulfillment automation may only be around 15% and the remaining 85% of conversations are either handed over to live agents or possibly result in user abandonment.
In contrast, generative AI techniques like retrieval-augmented generation (RAG) enable our chatbots to handle about 70-80% of conversations end-to-end and switch to live agents only 20-30% of the time. This is an enormous improvement in workforce productivity, customer experience, and user retention.
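To make the RAG idea concrete, here is a toy sketch of the retrieve-then-prompt step. It ranks passages by simple word overlap purely for illustration; production systems typically use embeddings and a vector index instead, and the function names and prompt wording are our own assumptions.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(query: str, passage: str) -> int:
    """Count how many query tokens also occur in the passage."""
    q, p = Counter(tokenize(query)), Counter(tokenize(passage))
    return sum(min(q[word], p[word]) for word in q)


def retrieve(query: str, passages: List[str], k: int = 2) -> List[str]:
    """Return the k passages with the highest overlap score."""
    return sorted(passages, key=lambda p: score(query, p), reverse=True)[:k]


def build_prompt(query: str, passages: List[str], k: int = 2) -> str:
    """Assemble an LLM prompt grounded in the retrieved passages."""
    context = "\n".join(f"- {p}" for p in retrieve(query, passages, k))
    return (
        "Answer the question using only the context below.\n"
        f"Context:\n{context}\n"
        f"Question: {query}"
    )
```

The prompt returned by `build_prompt` is what gets sent to the LLM, so its answer stays grounded in the business's own documents rather than in the model's general knowledge.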
LLMs enable the following:
In the next section, we demonstrate one such LLM-based Dialogflow chatbot.
We've already seen the complexity of a field like medical insurance, for example. When we try to service such a business using Dialogflow, we quickly run into several difficult problems:
In reality, Dialogflow by itself — even its built-in machine learning features — isn't cut out for such domains. Dialogflow is closer to old-school phone-based interactive voice response systems except that instead of nine number keys, a slightly larger set of spoken dialogues is available.
Our RAG-based LLM architecture shown below is much more powerful.
Its responses are based on information available in your:
For example, insurance information from reputed insurers often looks like this, with anywhere from 30-500+ pages of documents to go through:
Instead of forcing users to go through all this complex information, LLM-based chatbots can easily provide all the required information to users looking for medical insurance plans. A chatbot capable of such conversations is useful for a variety of users like:
In the example below, notice how the LLM is able to handle a wide-ranging conversation with multiple turns and satisfy a user's information need.
First, the user asks if a medical condition is covered:
Then the user switches to asking about coverage for a drug that's been prescribed for another condition. The LLM crunches through a 300-page document about covered prescription drugs in seconds and gives them the exact information they need:
At this point, the user probably favors going with this plan. But they have some more important criteria for coverage as shown below:
Again, the LLM provides extremely detailed information for their query, including coverage limits, copayment terms, allowed procedures, and so on.
This case study demonstrates the ability of LLMs to reduce the chances of user frustration and abandonment by answering their complex questions in great detail.
In the following sections, we explain some more techniques we frequently use to create Dialogflow-based agents quickly.
In Dialogflow ES, intents are the primary means of configuring a chatbot's behavior. The screenshot below shows the intents bundled with the predefined online shopping chatbot.
Every user message is mapped to one of these intents by matching it against each intent's training phrases. The intent that matches best is then executed. This involves:
If Dialogflow can't find a suitable intent, it sends the query to the default fallback intent.
Creating a large number of intents using the Dialogflow GUI is time-consuming:
We can greatly streamline the above process and make it layperson-friendly by automating the intent creation using an LLM.
First, a developer can describe the chatbot's behavior in plain English. The LLM then rapidly generates the JSON configuration data required to create intents using Dialogflow APIs.
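As a rough sketch of this step, the snippet below converts an LLM's JSON output into a dictionary shaped like a Dialogflow ES intent resource (`displayName`, `trainingPhrases`, `messages`). The input schema with `name`, `phrases`, and `response` keys is a hypothetical format we would ask the LLM to follow, not something Dialogflow defines.

```python
import json


def build_intent(llm_output: str) -> dict:
    """Convert LLM-generated JSON into a Dialogflow ES intent payload.

    Assumes the LLM was instructed to reply with a JSON object containing
    "name", "phrases", and "response" keys (a hypothetical schema).
    """
    spec = json.loads(llm_output)
    return {
        "displayName": spec["name"],
        "trainingPhrases": [
            # Each training phrase is an EXAMPLE made of one or more text parts.
            {"type": "EXAMPLE", "parts": [{"text": phrase}]}
            for phrase in spec["phrases"]
        ],
        "messages": [{"text": {"text": [spec["response"]]}}],
    }
```

The resulting dictionary can then be submitted to the intent-creation endpoint of the Dialogflow API; that call is omitted here because it requires project credentials.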
Next, the developer can quickly generate a large set of training phrases using LLMs. Here, LLMs help in three ways.
LLMs can generate a wide set of conversational variants. Since LLMs are extensively trained on internet datasets, the number of available semantic variants for a conversation can be extremely large.
LLMs can be instructed to generate the training phrases in multiple languages. In the example below, a developer who doesn't know German has generated product search trigger phrases in German, enabling a business to easily expand their customer service features to a new market:
Dialogflow training phrases can be annotated with parameters and entity types to help the Dialogflow engine convert unstructured conversations into neat structured data that can be easily processed. The annotated training phrases for a pre-built online shopping intent are shown below:
Since LLMs understand coding-related data formats like JSON, they can even be instructed to generate entity annotations along with the training phrases. In the earlier generated phrases, the LLM has already done this partially by using "[product]" and "[produkt]" as placeholders. From there, it's just a matter of converting that placeholder, as well as other placeholders provided by the developer (like SKUs, dates, and times), into annotated parts of training phrases.
This helps create a larger and more varied number of training phrases and helps make the chatbot smarter and more robust. Your business can reduce the percentage of cases you have to hand over to live agents.
In addition, we use custom Python code to generate annotated training phrases using Dialogflow APIs:
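The snippet below is a minimal sketch of the placeholder-to-annotation conversion, not the code we run in production. It produces training-phrase parts in the shape used by the Dialogflow ES intent resource (`text`, `entityType`, `alias`, `userDefined`); the `@product` entity type and the sample values are assumptions made for illustration.

```python
import re

# Matches placeholders such as [product] or [produkt] in generated phrases.
PLACEHOLDER = re.compile(r"\[(\w+)\]")


def annotate(template: str, values: dict, entity_types: dict) -> dict:
    """Split a templated phrase into plain and entity-annotated parts."""
    parts = []
    pos = 0
    for match in PLACEHOLDER.finditer(template):
        if match.start() > pos:
            # Plain text preceding the placeholder.
            parts.append({"text": template[pos:match.start()]})
        name = match.group(1)
        parts.append({
            "text": values[name],               # concrete sample value
            "entityType": entity_types[name],   # e.g. "@product" (assumed)
            "alias": name,                      # parameter name in the intent
            "userDefined": True,
        })
        pos = match.end()
    if pos < len(template):
        # Plain text after the last placeholder.
        parts.append({"text": template[pos:]})
    return {"type": "EXAMPLE", "parts": parts}
```

For example, `annotate("Do you have [product] in stock?", {"product": "socks"}, {"product": "@product"})` yields three parts, with "socks" annotated as the `product` parameter.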
The example below shows the behavior of Dialogflow's built-in online shopping agent and its "product.search" intent when LLMs and webhooks are not used:
The agent can identify that "shoes" are the product, as seen above. This is also evident from the Dialogflow console:
However, stock Dialogflow's lack of robustness quickly becomes evident when the user simply switches from "shoes" to "socks", as shown below:
The intent fails to identify "socks" as the product parameter as shown in the development console below:
Instead of relying only on Dialogflow, we now enhance the intent with our webhook that uses an LLM to identify the product in the query as shown below:
Once the webhook intercepts the query and passes it to an LLM, the chatbot can easily identify the products that it previously stumbled on:
It can even understand multiple products in the query as shown below:
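The product-extraction step inside the webhook might look like the sketch below. The prompt wording, the function names, and the reply texts are illustrative assumptions; `llm` is any callable that takes a prompt string and returns the model's text output.

```python
import json
from typing import Callable, List

# Hypothetical prompt asking the LLM for machine-readable output.
PROMPT = (
    "List every product mentioned in the shopper's message as a JSON array "
    "of strings. Reply with the JSON array only.\nMessage: {query}"
)


def extract_products(query: str, llm: Callable[[str], str]) -> List[str]:
    """Ask the LLM for products and parse its reply defensively."""
    raw = llm(PROMPT.format(query=query))
    try:
        items = json.loads(raw)
    except json.JSONDecodeError:
        return []  # The LLM did not return valid JSON.
    if not isinstance(items, list):
        return []
    return [str(item).strip().lower() for item in items if str(item).strip()]


def product_search_response(body: dict, llm: Callable[[str], str]) -> dict:
    """Build a Dialogflow ES webhook response for a product search query."""
    query = body["queryResult"]["queryText"]
    products = extract_products(query, llm)
    if not products:
        return {"fulfillmentText": "Which product are you looking for?"}
    return {"fulfillmentText": "Searching for: " + ", ".join(products)}
```

Parsing defensively matters because LLM output is not guaranteed to be valid JSON; falling back to a clarifying question keeps the conversation going instead of failing the request.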
The last technique we explain here is how to convert conversational flow diagrams prepared by conversation designers directly into Dialogflow agents using LLMs and image recognition.
The diagram below is a conversational flow diagram showing conversational states and transitions between them:
We use multimodal vision-language models like GPT-4V to convert these states and transitions directly into Dialogflow CX pages and routes.
We prompt GPT-4V with this instruction: "This is an image of a conversational flow. All the text boxes are conversational states. All the arrows are transitions or routes between them. Extract all the text in the text boxes and list them."
It generates the following list of conversational states:
We then ask it to convert the blue and red connections into a list of transitions between these conversational states using this prompt: "In the image, identify all the connections between the textboxes shown by the blue and red arrows. If they are one-way arrows, list them using the format 'From state => To state'. If they are double-ended arrows, list them using 'From state <=> To state'."
The LLM converts them into a format that is easy to process. We then use the Google Cloud Dialogflow CX APIs to convert them into page and route objects.
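A minimal sketch of the parsing step is shown below. It reads lines in the "From state => To state" and "From state <=> To state" formats requested in the prompt and builds a page-to-targets mapping. The function names and example state names are hypothetical, and the actual Dialogflow CX API calls that create pages and routes are omitted.

```python
import re
from typing import Dict, Iterable, List, Tuple

# Captures "source", the arrow ("=>" or "<=>"), and "destination".
ARROW = re.compile(r"^\s*(.+?)\s*(<=>|=>)\s*(.+?)\s*$")


def parse_transitions(lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Parse LLM-listed transitions into directed (source, target) edges."""
    edges = []
    for line in lines:
        match = ARROW.match(line)
        if not match:
            continue  # Skip headings or commentary in the LLM output.
        src, arrow, dst = match.groups()
        edges.append((src, dst))
        if arrow == "<=>":
            # A double-ended arrow means a route in both directions.
            edges.append((dst, src))
    return edges


def to_pages(edges: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group edges by source page: page name -> list of target pages."""
    pages: Dict[str, List[str]] = {}
    for src, dst in edges:
        pages.setdefault(src, [])
        pages.setdefault(dst, [])
        if dst not in pages[src]:
            pages[src].append(dst)
    return pages
```

Each key of the resulting mapping corresponds to one CX page to create, and each entry in its list corresponds to one route from that page.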
In this article, we demonstrated how we combine Dialogflow with LLMs to develop chatbots better and faster. This hybrid approach addresses the shortcomings of both technologies: LLMs and modern conversational AI make Dialogflow smarter, while Dialogflow's versatile integrations and deployment options let us deploy B2B and B2C chatbots quickly on multiple channels.
Contact us for high-quality Dialogflow chatbots for your business!