
Why ‘RAGs to riches’ doesn’t work - structuring data instead of dumping embeddings

Nov 10, 2025

Developers are starting to realise that even after optimising embeddings, chunking logic, reranking and models, RAG (Retrieval-Augmented Generation) falls short in many real-world applications.


The promise of RAG

Feeding an LLM's prompt with query-specific, company-specific data retrieved via the RAG (Retrieval-Augmented Generation) pattern is the go-to approach for developers who want AI systems that understand their company context and answer from specific documents, not just generic web or training data. Basic RAG systems are very straightforward to implement, whether you build from ready-made components or integrate an off-the-shelf solution. For many product demonstrations and simple applications the result looks very convincing - the AI's answers clearly draw on the documents stored in the RAG database.

In more complex applications you need to tune the RAG setup: tweaking how the data is chunked, how embeddings are calculated, how retrieved documents are reranked, and so on. This makes the responses more relevant and helps the model find better sources to quote from. Since RAG systems are a relatively recent invention (the first paper appeared in 2020), all parts of the pipeline are under active development, and entire communities (e.g., Reddit's /rag community) are dedicated to learning what works and what does not.
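To make the pipeline concrete, here is a deliberately minimal sketch of the RAG flow described above: embed chunks at storage time, retrieve the closest ones to a query, and stuff them into a prompt. The bag-of-words "embedding" and the `ToyRAG` class are illustrative stand-ins, not any particular library's API - real systems use learned dense embeddings and a vector database.

```python
import math
from collections import Counter

def embed(text):
    """Toy embedding: a bag-of-words vector (real systems use learned dense vectors)."""
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyRAG:
    def __init__(self):
        self.store = []  # (embedding, chunk) pairs

    def add(self, chunk):
        """Storage is cheap: just embed and append."""
        self.store.append((embed(chunk), chunk))

    def retrieve(self, query, k=2):
        """Retrieval does the real work: rank every chunk against the query."""
        q = embed(query)
        ranked = sorted(self.store, key=lambda pair: cosine(q, pair[0]), reverse=True)
        return [chunk for _, chunk in ranked[:k]]

    def build_prompt(self, query, k=2):
        context = "\n".join(self.retrieve(query, k))
        return f"Answer using only this context:\n{context}\n\nQuestion: {query}"

rag = ToyRAG()
rag.add("Our refund policy allows returns within 30 days.")
rag.add("Shipping to Finland takes 3 to 5 business days.")
rag.add("Support is available on weekdays from 9 to 17.")
print(rag.retrieve("how long does shipping take", k=1))
# → ['Shipping to Finland takes 3 to 5 business days.']
```

Even this toy version shows the shape of the trade-off: nothing is organised at storage time, so every query pays the full cost of similarity search and prompt assembly.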

The limitations of RAG

However, what many developers are starting to realise is that the fundamental paradigm of a RAG model limits its usefulness in many applications. To humanise what a RAG system does, think of an analyst sitting in a crowded office cubicle filled with thousands of two-sided post-it notes. The front side carries a short code (the embedding) describing the longer back side. Every time you ask a question, the analyst scurries around to find as many relevant post-it notes as possible, reads the back sides through, and answers based on that. The analyst has a great scheme for locating post-it notes at run time by matching what you asked against the short codes. However, every time you phrase something a bit differently, they look for different short codes and consequently come back with different notes. And you never know whether they read all the right notes or just picked a few obvious ones. Maybe a key insight sat on one of the back sides, but it was never read, so the answer rests on mediocre sources only.

Now, you could try to overcome this by increasing the context length of the model and feeding it more tokens, being more generous about which notes the analyst retrieves. In some applications (say, analysing 100+ pages of interview notes) you could even skip the RAG part and ask the analyst to read all the notes through each time you ask something. What tends to happen then is that the unnecessary facts clog the analyst's brain as they try to work out which of the data is actually relevant. The quality of the answers degrades as the token count grows, and many LLMs seem to overweight the first and last tokens of the context. And the answer is still a black box: you do not know whether the analyst used the most crucial insights to shape it.
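The "ask it a bit differently, get different notes" failure mode is easy to reproduce. In this contrived sketch, word overlap stands in for embedding similarity, and the notes and queries are invented for illustration: two paraphrases of essentially the same question about cost concerns surface two different notes.

```python
from collections import Counter

def overlap(query, note):
    """Toy relevance score: shared word count (a stand-in for embedding similarity)."""
    q, n = Counter(query.lower().split()), Counter(note.lower().split())
    return sum(min(q[w], n[w]) for w in q)

notes = [
    "Respondents worry the fee increase hurts small firms.",
    "Several answers flag costs rising for smaller companies.",
    "One response supports the change without reservations.",
]

def top_note(query):
    """Return the single best-matching note, like a k=1 retrieval."""
    return max(notes, key=lambda n: overlap(query, n))

# Two phrasings of the same underlying question retrieve different notes:
print(top_note("what do respondents say about the fee increase"))  # note 1
print(top_note("which answers mention rising costs"))              # note 2
```

Real embeddings are far better than word overlap, but the structural problem remains: which notes get read depends on how the question happens to be phrased, and the notes that were skipped leave no trace in the answer.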

The shortcomings of RAG models are most evident in situations where

  • You need to identify all emerging themes from the documents, with a degree of certainty that the list is comprehensive
  • You need two-way transparency and traceability - what the themes are (and how we know it) and which themes each document relates to
  • You need to understand outliers, nuanced opinions and “minority reports”, not just summarise
  • You need stable answers instead of retrieving them afresh with each analysis
  • You need to work with the data - e.g., add and merge categories, or zoom in and out - instead of interacting through a simplistic chat only

These situations are common in many lines of knowledge work. If you are paid to produce a comprehensive analysis, having RAG models as your go-to tool can be frustrating: customers want transparency and certainty, yet you are not able to deliver them.

The birth of Skimle

It was in a situation like this that the idea of Skimle was born. We had to analyse over a hundred public consultation responses to a proposed change in Finnish law, identify common themes and summarise them by topic. Each response was typically a 5 to 10 page freely structured document containing the opinion of a company, individual, interest group, or government body. The ministry first tried a RAG setup that seemed to have “solved” the problem by letting experts ask questions of the data. However, contrasting the summaries against the actual data showed that crucial bits of feedback were missing and some opinions were misrepresented. Each time, the AI analyst would gladly accept the mistake and propose another run, but it was evident the system lacked the trustworthiness and robustness needed for serious analyses like these, where experts were expected to provide a comprehensive and transparent analysis.

To continue the analyst metaphor, Skimle's approach is to sort the post-it notes into piles per category at storage time. The algorithm draws inspiration from grounded theory and thematic analysis in the social sciences: it analyses each part of a document to understand what it describes, then creates categories and sub-categories across the entire dataset. This is done with hundreds of micro-AI calls that follow the same workflow academic researchers use, but automate it.

Once the post-it notes are stored in sensible piles (and, if needed, copied to multiple piles when they pertain to multiple categories), the analyst's work is much easier. They can pull out the pile related to, e.g., market competition and know that it contains all the relevant data - and only the relevant data - for that category. They can show the full set of notes, discuss its applicability, merge piles with other piles, look for differences by segment and so on - and do this with speed, because the heavy lifting was done at storage time, not at retrieval time. You get full two-way transparency: where each insight landed, and which insights form each category.
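The pile-sorting idea can be sketched in a few lines. This is not Skimle's actual implementation: the keyword rules below are a hypothetical stand-in for the micro-AI classification calls, and the category names are invented. The point is the shape of the design - classification happens once at ingest, a chunk may land in several piles, and retrieval becomes a plain lookup of a complete pile.

```python
from collections import defaultdict

# Hypothetical category rules standing in for the micro-AI classification calls.
CATEGORY_KEYWORDS = {
    "market competition": ["competition", "competitor", "market share"],
    "compliance cost": ["cost", "fee", "administrative burden"],
}

def categorise(chunk):
    """Assign a chunk to every matching category (a chunk may land in several piles)."""
    text = chunk.lower()
    hits = [cat for cat, words in CATEGORY_KEYWORDS.items()
            if any(w in text for w in words)]
    return hits or ["uncategorised"]

piles = defaultdict(list)

def ingest(chunk):
    """The heavy lifting happens once, at storage time."""
    for cat in categorise(chunk):
        piles[cat].append(chunk)

for chunk in [
    "The fee raises compliance cost for small firms.",
    "New entrants face tougher market competition and higher fees.",
    "We support the proposal overall.",
]:
    ingest(chunk)

# Retrieval is now a simple lookup: the whole pile, nothing missed.
print(sorted(piles))
# → ['compliance cost', 'market competition', 'uncategorised']
print(piles["compliance cost"])
```

Because every chunk is classified exactly once and every pile is exhaustive for its category, the two-way transparency falls out for free: you can list which categories a chunk landed in, and which chunks make up a category.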

After putting Skimle together we discovered that the same solution works across applications, including

  • Academic research, where it helps with initial categorisation and coding and then gives the researcher the tools for theory building
  • Due diligence, where expert calls and data room contents can be brought together into one workbench of qualitative data
  • Interview notes organising, where Skimle helps identify the themes and find the relevant quotes
  • Open text analysis in HR studies, where discovering the categories allows analysing responses beyond the “word cloud” we see too often
  • Market and user research, where deep and nuanced insights per category can be worth their weight in gold
  • … and so on!

Closing thoughts

The difference between the RAG approach (as commonly understood) and the Skimle approach has parallels in other choices in software engineering. Do you optimise for easy storage (like a linked list), or do you structure your data up front to make retrieval easy (like a tree)? Do you go for synchronous solutions where retrieval and processing happen at runtime, after receiving the user query, or do you break the work into asynchronous steps where the heavy lifting is done offline so that individual user queries run fast?
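The storage-versus-retrieval trade-off can be shown with a classic, deliberately tiny example (the data is invented): answering "which records mention X" either by scanning everything on every read, or by building an inverted index once at write time.

```python
from collections import defaultdict

records = ["alpha beta", "beta gamma", "gamma delta"]

# Write-optimised: store records as-is; every query re-reads everything.
def scan(word):
    return [i for i, rec in enumerate(records) if word in rec.split()]

# Read-optimised: pay once at storage time to build an inverted index.
index = defaultdict(set)
for i, rec in enumerate(records):
    for word in rec.split():
        index[word].add(i)

def lookup(word):
    return sorted(index[word])  # per-query cost is now a cheap lookup

# Both agree on the answer; they differ in when the work is done.
assert lookup("beta") == scan("beta") == [0, 1]
```

RAG sits on the scan side of this divide (rank everything at query time); structuring the data at ingest sits on the index side, which is why repeated queries stay fast and stable.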

Of course it's horses for courses, but we keep seeing people force a RAG or one-shot LLM solution onto use cases where it simply cannot be trusted to produce accurate and comprehensive results that are transparent and stable over time.

Trust your parents: if your room is a mess, you won't find things!

You can read more about the method here and try Skimle for free. Let us know if it helped you get more value from your data!

Olli & Henri from the Skimle team