The analysis challenge
You've conducted 15 stakeholder interviews for a strategic project. You have 200 pages of transcripts sitting in a folder. Your deadline is next week, and the project leadership expects insights that will shape a million-dollar decision. Where do you even start?
Scenarios like this played out dozens of times during my 18 years at McKinsey. The pressure was real - executives don't want to hear "I think maybe customers care about speed" backed by vague impressions. They want: "73% of interviewed customers explicitly mentioned speed as their primary pain point - and it was not just overall speed, but especially these three nuances..." That level of rigour requires a systematic approach to analysis.
Let me share my approach for turning messy interview data into credible insights, honed over many projects. The same steps apply across different situations, but always with variations. If you are doing academic research, you need to go much deeper and come back with more descriptive findings than if you are using the data to build a business narrative - but in all cases you need the rigour, because otherwise you are just producing slop and wasting everyone's time.
Step 1: Immerse yourself in the data
Before you touch a highlighter or create a single code, you need to truly understand what you have. This means reading through all your transcripts or notes at least once - ideally twice - without trying to analyze anything yet.
What this looks like in practice:
- Block out dedicated time
- Read each transcript start to finish
- Take high-level notes on initial impressions
- Notice recurring phrases, surprising statements, or strong emotions
- Resist the urge to start categorising yet
On one project, I had interviewed people about a failed joint venture launch. The first read-through left me thinking the main concern was IT systems development and integration. But on the second pass, I noticed subtle references to "culture clash" and "different ways of working" in nearly every interview. Had I jumped straight to coding after the first read, I would have missed the actual story in the data.
The mistake most people make here is skipping straight to Step 2. They want efficiency, so they read and code simultaneously. But this approach causes you to miss the forest for the trees. Your insights will be superficial because you don't yet understand the full landscape of what people said. Slow down to speed up.
Step 2: Develop your initial categorisation scheme
Now that you understand the terrain, it's time to create structure. Coding (also called "tagging" or "labeling") means identifying meaningful units of text and assigning them to categories.
Start with these principles:
- Create codes that answer your research questions
- Use descriptive names (e.g., "pricing concerns" not "issue_3")
- Start with 10-15 broad codes, not 50 narrow ones
- Allow codes to emerge from the data, not just from your assumptions
- Document what each code means (you'll forget in 3 days)
Aim for the "Goldilocks" level of granularity. Your categories need to be broad enough to capture themes, but specific enough to be actionable. "Feedback" would be too vague. "Complaints about the export button being blue" would be too narrow.
Note that your categorisation scheme should stay fluid during the later steps. You start with something, and once you notice a category has become too large you can split it. Conversely, you might realise at the end that some categories are too small to merit their own label and should be merged into something bigger.
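If you keep your transcripts and notes in plain files, even a tiny script can serve as the codebook. Below is a minimal sketch in Python - purely illustrative, not a Skimle feature, and the code names are hypothetical - showing a dictionary that maps each code to its written definition, plus a helper that stops you from silently redefining a code.

```python
# Minimal codebook sketch: each code gets a short written definition,
# so "what did I mean by this?" still has an answer three days later.
codebook = {
    "pricing concerns": "Comments about price level, value for money, or billing model",
    "implementation timeline": "How long a change would realistically take",
    "technical constraints": "System or data limitations that block or slow an idea",
    "governance challenges": "Approval processes, board cycles, decision rights",
}

def add_code(name: str, definition: str) -> None:
    """Register a new code; fail loudly instead of silently overwriting one."""
    if name in codebook:
        raise ValueError(f"Code '{name}' already defined as: {codebook[name]}")
    codebook[name] = definition
```

Writing the definitions down is the point; the dictionary is just one cheap way to force yourself to do it.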
In academic research this process is called thematic analysis: a method for identifying, analyzing, and reporting patterns (themes) within data. Learn more about how to do thematic analysis with AI.
Step 3: Code systematically across all interviews
This is the heavy lifting. Go through each transcript and tag relevant passages with your codes. A single quote can have multiple codes - that's not just OK, it's expected.
The systematic approach:
- Work interview by interview (finish one before moving to the next)
- Tag at the paragraph or thought level, not line by line
- Keep your codebook updated as codes evolve - meaning you might need to go back to earlier interviews to recode if you decide to change something.
- Use a second opinion if something is unclear - the coding stage often reveals passages you don't fully understand but which could hold valuable insights.
Let me give you a concrete example. In a cost-efficiency project, we were trying to understand the feasibility of various savings ideas across 30 different interviews to get a sense of the real options on the table.
One executive commented on a specific idea: “The proposed timeline of 6 months is completely unrealistic. Our systems take 18-24 months to update, and we'd need board approval which only happens quarterly. This means the system will not be part of the solution for savings.”
I coded this passage with:
- Implementation timeline (the core issue)
- Technical constraints (the systems reference)
- Governance challenges (board approval process)
That's three codes for one paragraph - and that's appropriate because this quote speaks to multiple themes we were tracking.
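To make the multi-code idea concrete in data terms, here is one way a coded passage could be stored - a hypothetical Python structure, not the author's actual workflow - where a single quote simply carries a list of codes:

```python
# One coded passage: a single quote can carry several codes.
passage = {
    "interview_id": "exec_07",  # hypothetical identifier
    "quote": (
        "The proposed timeline of 6 months is completely unrealistic. "
        "Our systems take 18-24 months to update, and we'd need board "
        "approval which only happens quarterly."
    ),
    "codes": [
        "implementation timeline",   # the core issue
        "technical constraints",     # the systems reference
        "governance challenges",     # board approval process
    ],
}
```

Whatever tool you use, the structure is the same: quote, source, and one or more codes attached to it.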
A dangerous pitfall here is confirmation bias. You have a hypothesis going in ("I bet they'll say X"), and suddenly you see X everywhere while missing Y and Z. Combat this by forcing yourself to code each paragraph on its own merits - you will notice the interviewees said much more than your biased mind remembers!
Step 4: Identify themes and synthesise findings
You now have 200 pages of transcripts covered in coloured highlights or tags. Now comes the synthesis - the step that transforms codes into insights.
The process:
- Pull together all quotes for each code
- Look for patterns within each code (Do all CFOs say one thing while all COOs say another? Do concerns vary by company size?)
- Count frequency (How many people mentioned this unprompted?)
- Assess intensity (Did they mention it casually or passionately?)
- Look for outliers (Why did 2 people have the opposite view?)
- Synthesise into theme statements
Let’s take a product pricing issue. Assume you had 47 passages coded as “pricing concerns”. A simplistic summary of the data would be “many current and prospective customers are concerned about the price”. True, but not really helpful. Doing this step rigorously would mean digging deeper into these concerns, and uncovering real themes:
Theme 1: "Too expensive for small teams" (mentioned by 8/20 interviewees)
- Representative quote: "At $99/month, this only makes sense if you have 10+ people. For our team of 3, it's not worth it."
- Pattern: All from companies with fewer than 50 employees
Theme 2: "Unclear value for the price" (mentioned by 12/20 interviewees)
- Representative quote: "I'm not against paying $99, but I can't explain to my boss what we get for that versus the free tier."
- Pattern: Strongest among users in months 1-2 (before seeing the full value)
Theme 3: "Pricing creates internal politics" (mentioned by 6/20 interviewees)
- Representative quote: "The per-seat pricing means I have to justify every team member I add. It creates friction."
- Pattern: Only mentioned by enterprise customers (500+ employees)
See the difference? "People mentioned pricing" is a code. "Small teams find it too expensive, newer users don't see the value, and enterprise teams face political friction from per-seat pricing" is an actionable insight.
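If your coded passages live in structures like the one sketched in Step 3, pulling quotes together and counting mentions takes only a few lines of Python. This is an illustrative sketch under that assumption, not a prescribed tool; note that it counts distinct interviewees per code, so one person quoted five times still counts once.

```python
from collections import defaultdict

def quotes_by_code(passages):
    """Group coded passages so every quote for a given code sits in one place."""
    grouped = defaultdict(list)
    for p in passages:
        for code in p["codes"]:
            grouped[code].append(p["quote"])
    return grouped

def mention_counts(passages):
    """Count distinct interviewees per code (people who mentioned it, not passages)."""
    people = defaultdict(set)
    for p in passages:
        for code in p["codes"]:
            people[code].add(p["interview_id"])
    return {code: len(ids) for code, ids in people.items()}
```

The mechanical part ends there: the per-theme patterns (role, company size, tenure) still come from reading the grouped quotes yourself.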
Key output: Create a synthesis document with:
- Each major theme as a heading
- Supporting evidence (frequency counts, quotes, patterns)
- Your interpretation (what it means, why it matters)
- Anomalies or contradictions (you build credibility by acknowledging complexity and nuance)
Step 5: Validate and pressure-test your findings
You think you've found the key themes. But before you present to stakeholders, pressure-test your analysis. This step separates rigorous research from wishful thinking.
Validation techniques:
- Negative case analysis: Actively look for quotes that contradict your themes. Can you explain why those exist?
- Peer review: Have a colleague read your synthesis and 2-3 raw transcripts. Do they arrive at similar conclusions?
- Member checking: Go back to 2-3 interviewees and say "Here's what we heard across all interviews - does this ring true?"
- Triangulation with quantitative data: Do your findings align with quantitative data, if you have any? If 60% of interview subjects say "we need feature X" but only 5% of survey respondents ranked it highly, something's off.
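As a toy illustration of that triangulation check - assuming you can express both sources as simple shares, and noting that the 30-point threshold below is an arbitrary flag, not a statistical test:

```python
def triangulation_gaps(interview_share, survey_share, threshold=0.30):
    """Flag themes where interview prominence and survey data diverge sharply.

    Both arguments map theme -> share of respondents (0.0-1.0).
    """
    flags = {}
    for theme, i_share in interview_share.items():
        gap = i_share - survey_share.get(theme, 0.0)
        if abs(gap) >= threshold:
            flags[theme] = round(gap, 2)
    return flags

# 60% of interviewees asked for feature X, but only 5% of survey
# respondents ranked it highly - a 55-point gap worth investigating.
print(triangulation_gaps({"feature X": 0.60}, {"feature X": 0.05}))
# -> {'feature X': 0.55}
```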
In the joint venture project mentioned earlier, I initially concluded "culture clash is the main concern." But when pressure-testing, I realized:
- Only senior executives mentioned culture (8/15 interviews)
- Middle managers barely mentioned it (2/15 interviews)
- When I went back and looked at what middle managers did emphasise, it was concrete process issues
My revised finding: "Senior leaders see a major culture issue, but on-the-ground managers point to practical process challenges. This perception gap itself may be a risk." That's a much richer, more honest finding than my initial take.
If all your themes confirm exactly your initial hypotheses, you are either the world's best modern oracle, the thing you studied was trivial … or you have not fully listened to the data and need to go back!
What I wish existed 10 years ago
During my consulting years I knew the right approach (Steps 1 to 5), but frustratingly often we could not follow it due to lack of time and lack of proper tools. We would try to load quotes into Excel and use tables, but the formatting would kill us. We would use Word with colour coding and comment boxes, but get lost between documents. Some teams used dedicated older tools like NVivo, ATLAS.ti or MAXQDA, but they were expensive and had a steep learning curve.
This meant even talented consultants ended up with messy Word docs and "gut feel" synthesis. It worked... barely. But it wasn't defensible, it wasn't transparent, and it took forever. This also partly explains why "hard numbers" and modelling were the more popular work stream - numerical data is easy to analyse and present nicely.
But I always believed in the power of qualitative analysis. Numbers only tell you about things that can be counted, and they often look backwards. True insights are found in the nuances of human understanding.
When AI tools started to come out, I noticed a lot of people hoping they would provide a silver bullet. Upload all your documents to a chat or a dedicated RAG-based tool and start asking questions, hoping that "AI magic" would produce the insights. But chatting with an AI is not analysis, and RAG often doesn't work! If you've tried simple AI tools for analysis you will have seen how the insights are superficial, crucial facts are omitted and responses are hallucinated. If you slightly change your question, the answer changes a lot. If you point out the mistakes, the AI will gladly acknowledge them and apologise... but in the end you are not doing proper analysis, because the AI is not putting in the real thinking and structure needed.
This is why we built Skimle. The goal was simple: make Steps 2-4 of this framework fast and transparent. Upload your transcripts, let AI suggest initial categorisation, refine categories as you go, and instantly see all quotes organized by category as a basis for summarising the themes. What used to take hours of manual highlighting now takes 2 hours of reviewing and refining AI-generated codes, and the interface allows you to take control of the full dataset in an intuitive way. Skimle is the spreadsheet tool for qualitative analysis.
The analysis still requires human judgment - Steps 1 and 5 cannot be automated. But the mechanical work of coding and organizing is something AI should handle to give you more time for honing your insights.
If you want to try an AI-assisted approach to interview analysis that follows best practices, try Skimle for free. It's built for researchers, consultants, and anyone who needs defensible insights from qualitative data.
Olli from the Skimle team
