Why is RAG not enough?

Last year, I built a knowledge bot POC that had this written on the landing page.

Wombot landing page

At the time, I was aware that function (tool) calling could address these limitations. However, since RAG was commoditizing quickly, we halted the exploration and looked at off-the-shelf solutions instead.

Fast forward a year: I attended Microsoft’s AI Tour [1] to see what they’re up to, and was glad to see that the knowledge bot ecosystem now has well-defined solutions for all of these limitations.

Before looking at the solutions, let’s understand these limitations.

What do most knowledge bots do?

How will such a bot answer the following question?

Find me all the workplace themed LEED certified projects in Sydney done by design principal John Doe.

How does it work?

  1. We take the private company documents, split their text into chunks and generate numerical vectors for these chunks using an encoder model.
  2. These vectors are indexed in a database that supports vector search (a dedicated vector database, or a relational database with a vector extension).
  3. The user’s question is also turned into a numerical vector using the same encoder model.
  4. A similarity search is performed in the database to find the vectors that are semantically nearest to the numerical vector from the question.
  5. From these similar vectors, we retrieve the top K vectors and the chunks associated with them.
  6. These chunks along with the original question, context, and conversation history are passed to an LLM for inference.
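The steps above can be sketched in a few lines of Python. This is only a minimal sketch under loud assumptions: `encode` is a toy bag-of-words counter standing in for a real encoder model, the three documents are made up, and a plain in-memory list stands in for the database. The final LLM call is left out; the sketch stops at assembling the prompt.

```python
import math
import re
from collections import Counter

def chunk_text(text, size=20):
    """Step 1: split a document into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def encode(text):
    """Toy stand-in for an encoder model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(question, index, k=2):
    """Steps 3 to 5: encode the question and return the top K nearest chunks."""
    q = encode(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [chunk for chunk, _ in ranked[:k]]

# Hypothetical project documents
documents = [
    "Project A is a LEED certified workplace fit-out in Sydney led by John Doe.",
    "Project B is a residential tower in Melbourne with LEED certification.",
    "Project C is a hospital extension in Perth.",
]

# Steps 1 and 2: chunk, encode, and "index" (here, just a list of pairs)
index = [(chunk, encode(chunk)) for doc in documents for chunk in chunk_text(doc)]

question = "LEED certified workplace projects in Sydney by John Doe"
top_chunks = retrieve(question, index, k=2)

# Step 6: the chunks and the question become the prompt passed to the LLM
prompt = "Answer using only this context:\n" + "\n".join(top_chunks) + "\n\nQuestion: " + question
print(prompt)
```

Note that the second chunk returned is the residential Melbourne project: it shares a few words with the question, so it ranks second even though it does not meet the criteria.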

Assuming the numerical vectors were generated from project information documents, the chunks returned by the semantic search may resemble the following.

Sample semantic search results

While all of them are semantically relevant, not all of them are helpful.

Chunk 3 is only a partial match. While not led by John Doe, this project demonstrates similar sustainability practices in workplace design.

Chunk 5 is a distractor. Although certified under LEED, this project is a residential development and not categorized as workplace.

With web search enabled, the bot will also try to find the information in blogs, the company website, public project directories, green project databases, etc. The results may resemble the following.

Sample web search results

Here, Green design weekly is an example of an irrelevant web result.

Limitations of RAG

When these chunks and web results are passed to an LLM, it will be able to give you a list of projects. However, we cannot be confident that the bot found all the projects that meet the criteria in the question, because it only ever sees the top K chunks.

What if, instead of asking for the names of all the projects, the user wanted to know the total number of projects that meet the criteria in the question? In this case too, while the bot may fetch similar information, we cannot have confidence in the number it reports.

Lastly, imagine asking this bot the following question:

What are the average man hours spent on workplace projects in 2025 Q1 at company X?

Since the question involves fetching the relevant data and then performing some computation on top of it, the bot will most likely not be able to answer this question.

As more and more businesses are experiencing these limitations, the ecosystem has come up with some solutions.

Advanced Search Strategies

Simple queries

Keyword search

Image source: Microsoft AI tour presentation

Think of an inverted index as a hash map where each word is mapped to an array of the IDs of the documents it appears in.

inverted_index = {
    "25": [1, 3, 5],
    "foot": [0, 1, 2, 3, 5],
    "hose": [5],
    # ...
}

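To make the idea concrete, here is a minimal sketch of building such an index and querying it. The three product descriptions are made up for illustration, and ranking by the raw number of matched words is a simplification; production search engines typically use a scoring function such as BM25 instead.

```python
import re
from collections import defaultdict

def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each word to the sorted list of document IDs it appears in."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for word in set(tokenize(text)):
            index[word].append(doc_id)
    return index

def keyword_search(index, query):
    """Return document IDs ranked by how many query words they contain."""
    hits = defaultdict(int)
    for word in tokenize(query):
        for doc_id in index.get(word, []):
            hits[doc_id] += 1
    # Most matches first; ties broken by document ID
    return sorted(hits, key=lambda d: (-hits[d], d))

# Hypothetical product descriptions
docs = [
    "garden hose 25 foot",
    "50 foot extension cord",
    "25 foot measuring tape",
]

index = build_inverted_index(docs)
results = keyword_search(index, "25 foot hose")
print(index["foot"])  # document IDs containing "foot"
print(results)        # document IDs, best match first
```

The lookup never scans the documents themselves: each query word is a single dictionary access, which is why keyword search stays fast on large collections.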
Vector search

Image source: Microsoft AI tour presentation

Reciprocal Rank Fusion (RRF)

Merge the results from the keyword search and the vector search, and rank the merged results based on their relative ranks in each list. Documents that rank highly in both lists rise to the top.
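A minimal sketch of the fusion step follows. Each document gets a score of 1 / (k + rank) from every list it appears in, and the scores are summed. The constant `k = 60` is a commonly used default and is an assumption here, as are the document IDs and the two input rankings.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking.

    Each document scores 1 / (k + rank) per list it appears in,
    with rank starting at 1 for the best result.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by document ID
    return sorted(scores, key=lambda d: (-scores[d], d))

# Hypothetical results from the two searches, best match first
keyword_ranking = ["doc_a", "doc_b", "doc_c"]
vector_ranking = ["doc_c", "doc_a", "doc_d"]

fused = reciprocal_rank_fusion([keyword_ranking, vector_ranking])
print(fused)
```

Because only ranks are used, the keyword scores and the vector similarities never need to be put on a common scale, which is the main appeal of this approach.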

Reciprocal Rank Fusion

Image source: Microsoft AI tour presentation

Re-Ranking

Re-Ranking

Image source: Microsoft AI tour presentation

It has been observed that for most queries, the hybrid approach outperforms either vector search alone or keyword search alone.

Combined result

Image source: Microsoft AI tour presentation

But hybrid search is not always enough. How do you handle more complex types of queries?

Relational Queries

Some queries involve relations between entities. For example,

Relational query

Image source: Microsoft AI tour presentation

Here, we can use the LLM to generate a graph query or an SQL query to run against our graph database or relational database, respectively.

Modern LLMs are getting increasingly better at translating a user query into an SQL or graph query, given the context and the database schemas.
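As a rough illustration, the sketch below runs the kind of SQL an LLM might generate for the earlier John Doe question against an in-memory SQLite database. Everything here is hypothetical: the `projects` table, its columns, and its rows are invented, and the query is hard-coded where a real system would ask the LLM to produce it from the question and the schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE projects (
        name TEXT, theme TEXT, city TEXT,
        leed_certified INTEGER, design_principal TEXT, man_hours REAL
    )
""")
conn.executemany(
    "INSERT INTO projects VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("Project A", "workplace", "Sydney", 1, "John Doe", 1200),
        ("Project B", "workplace", "Sydney", 1, "John Doe", 800),
        ("Project C", "residential", "Sydney", 1, "John Doe", 500),
        ("Project D", "workplace", "Sydney", 1, "Jane Roe", 900),
    ],
)

# Stand-in for the LLM output: in a real system this string would be generated
generated_sql = """
    SELECT name FROM projects
    WHERE theme = ? AND city = ? AND leed_certified = 1 AND design_principal = ?
    ORDER BY name
"""
names = [row[0] for row in conn.execute(generated_sql, ("workplace", "Sydney", "John Doe"))]
count = len(names)

# Aggregations are computed by the database, not estimated by the LLM
avg_hours = conn.execute(
    "SELECT AVG(man_hours) FROM projects WHERE theme = ?", ("workplace",)
).fetchone()[0]

print(names, count, avg_hours)
```

Unlike top K retrieval, the query returns every matching row, so both the list and the count are exact, and computations such as averages come straight from the data. In practice, generated SQL should be validated and run with read-only permissions before it touches a real database.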

Using this approach assumes that the company is at least at stage 1 [2] of the data maturity journey.

Graph query

Image source: Microsoft AI tour presentation

Multiple Queries in One

Sometimes a single question bundles multiple queries. For example,

Multiple queries

Image source: Microsoft AI tour presentation

This is where search starts becoming agentic and the idea of query planning comes into play.

Query planning [3]

Query planning 1
Query planning 2
Query planning 3

Image source: Microsoft AI tour presentation

Managed Search Solution

It is interesting to see that Azure AI Search offers all of this as a managed service [4], where you are only responsible for:

  1. Bringing your data
  2. Building your clients

The rest of the heavy lifting, involving the steps shown below, is handled by the service.

Managed search

Image source: Microsoft AI tour presentation

Making Search Cheaper

Prompt engineering is used to add a character (personality) to a chat bot. Here, a guideline in the form of text is provided to the chat bot in terms of

But one major downside of this practice is that you are paying for the additional tokens in all the conversations. Such costs can add up quickly when users are having multiple conversations throughout the day across the organization.
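A back-of-the-envelope calculation shows how this adds up. All the numbers below are hypothetical, including the price, and the sketch assumes the guideline text is resent with every turn and is not discounted by any prompt caching.

```python
def monthly_prompt_overhead(system_prompt_tokens, conversations_per_day,
                            turns_per_conversation, price_per_1k_input_tokens,
                            days=30):
    """Return (extra input tokens, extra cost) spent on the guideline text per month."""
    tokens = system_prompt_tokens * conversations_per_day * turns_per_conversation * days
    cost = tokens / 1000 * price_per_1k_input_tokens
    return tokens, cost

tokens, cost = monthly_prompt_overhead(
    system_prompt_tokens=1_500,       # hypothetical length of the guideline text
    conversations_per_day=2_000,      # hypothetical usage across the organization
    turns_per_conversation=5,         # hypothetical average number of turns
    price_per_1k_input_tokens=0.01,   # hypothetical price, not a real quote
)
print(f"{tokens:,} extra input tokens, costing about {cost:,.2f} per month")
```

With these made-up inputs, the guideline text alone accounts for 450 million input tokens a month before a single user question is counted.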

How can fine tuning make search cheaper?

Fine tuning cost

Image source: Microsoft AI tour presentation

What is fine tuning?

Fine tuning explanation

Image source: Microsoft AI tour presentation

Fine tuning can also be useful for adding domain-specific knowledge, which is often private, such as project proposals (bids), and for task-specific optimization such as Grasshopper or Dynamo workflow generation.

Fine tuning domain knowledge

Image source: Microsoft AI tour presentation

There are several fine tuning techniques to choose from.

Fine tuning techniques

Image source: Microsoft AI tour presentation

Increasingly, organizations are incorporating fine tuning into their workflows.

Fine tuning workflow 1
Fine tuning workflow 2

Image source: Microsoft AI tour presentation

Here is an example of getting to-the-point answers with improved accuracy using RAFT (Retrieval Augmented Fine Tuning).

RAFT accuracy
RAFT answers

Image source: Microsoft AI tour presentation

Is RAFT for everyone?

While RAFT is touted to improve the experience and accuracy of knowledge retrieval, organizations must recognize the initial investment, in both human hours and money, required to incorporate fine tuning into their workflows.

RAFT investment

Image source: Microsoft AI tour presentation

It’s a cyclical exploratory process where you start with data, choose a technique, fine tune a model, and evaluate the results. This may take several iterations and may involve changing the fine tuning techniques and hyperparameters multiple times.

RAFT process

Image source: Microsoft AI tour presentation

Closing thoughts

With more and more of the knowledge bot toolchain being commoditized every year, where should we focus our efforts?


  1. https://aitour.microsoft.com/flow/microsoft/toronto26/sessioncatalog/page/sessioncatalog

  2. https://pipeline2insights.substack.com/i/150965082/stage-starting-with-data

  3. https://learn.microsoft.com/en-us/azure/search/agentic-retrieval-overview#:~:text=to%2Dagent%20workflows.-,Here%27s%20what%20it%20does,-%3A

  4. https://learn.microsoft.com/en-us/azure/search/search-what-is-azure-search