Choosing the right method: how rapid testing reshaped WhisperPad’s core interaction

A note before the post: this is a write-up of how I used a specific UX research method, RITE, while building WhisperPad, a local-first voice-to-text app for macOS. It was run rapidly and informally during a graduate project. There are no formal session reports behind it, and that was a deliberate choice for the context. What follows is a practitioner’s account of what the method surfaced and how it changed the product.

The method

RITE stands for Rapid Iterative Testing and Evaluation. It was formalized at Microsoft Games, and its defining trait is not simply “test often.” It is this: when you observe a problem and the fix is obvious, you change the design immediately, sometimes after a single participant, and the next participant tests the new version. You do not wait for a full round of sessions to finish before acting.

I chose RITE for WhisperPad because the situation called for it. I had a working product, a tight core interaction, and limited time. I was not trying to validate a finished design. I was trying to find the friction I could not see myself, because I used the app every day and had gone blind to it. RITE is built for exactly that: fast loops, decisive changes, the design moving between participants.

I ran sessions with six participants, classmates from my program, a mix of technical and non-technical backgrounds. Sessions were short, roughly fifteen minutes, walking each person through onboarding, settings, and the core dictation flow. Two of those participants kept using the app afterward and became ongoing sources of feedback, which matters later in this post.

One discipline I held throughout: I did not act on everything I heard. I acted when a problem was clearly a problem, when something obvious to me as a daily user was clearly not obvious to the person in front of me. Chasing every comment is how you iterate yourself in circles. RITE works because the changes are decisive, and decisive requires judgment about which signals are real.

The chain that started with a missing signal

The most useful thing RITE surfaced was not a single finding. It was a chain, one problem leading to a fix that created the next problem, and I think the chain is more honest about how design actually works than a simple before-and-after would be.

The first version of WhisperPad gave almost no feedback that transcription was happening. The only signal was the small microphone icon macOS shows in the menu bar, which turns orange whenever any app has the mic open. But that is the generic system indicator, not something WhisperPad controls, and on a laptop screen it is very easy to miss. From the app’s point of view, you spoke into a void and text eventually appeared.

A participant named the gap directly. They said it would help to see the transcription happening, to watch the words appear in the text field as they spoke. That made sense. So I built it: live transcription, text inserted as you talk.

The first build of that feature was janky. Honestly it was close to unusable. I spent time tightening it until it was good enough to put in front of someone, and then I tested it.

That is where the next participant revealed a new and real problem. As they spoke, they started reading the text appearing on screen, their own words from a second or two earlier. And it derailed them. They slowed their speech to match what they were reading. They lost their train of thought. When they saw a small transcription error, it distracted them further. The lag made it worse: the words on screen never matched where they were in their thought, so the screen and the speaker were always slightly out of sync.

Live transcription, the thing a user had asked for, was actively breaking the core experience for the next user.

The principle: only default when you are 100% sure

Here is the decision that I think matters most, and it is a principle I now apply everywhere: I did not revert the feature, and I did not make it the default. I made it a toggle.

My rule is simple. A new feature only becomes the default behavior when I am 100% certain it improves the experience. Until then, it ships as an option the user can turn on. I was not certain live transcription was an improvement; the evidence was mixed. So it stayed available for the people who wanted it and stayed off by default for everyone else.

Back to the gap, not the failed solution

The failed feature did not make the original problem go away. Users still needed to know the app was working. Live transcription was a bad solution to a real gap, and the discipline is to separate those two things: throw out the solution, keep the problem.

So I went back to the gap and designed a third option. While the app is listening, it types the word “listening” and animates an ellipsis after it, adding one dot at a time and then clearing them: a small breathing animation, dot, dot, dot, clear, repeat, for as long as you are speaking. When you end capture with the keyboard shortcut, the indicator switches to “transcribing” with the same animation. Then it pastes your finished text.

The point is that it gives you a clear signal the system is working without ever showing you your own words mid-thought. There is nothing to read along with, nothing to derail you. Just confirmation.

That tested noticeably better. It was not distracting, and it closed the feedback gap.
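
The frame sequence of that indicator is simple enough to sketch in a few lines. The Python below is an illustration only, not WhisperPad’s code: the function name `indicator_frames` and the `max_dots` parameter are hypothetical, and the sketch only produces the strings to display, leaving out how the app types and erases them at the cursor.

```python
from itertools import islice


def indicator_frames(label, max_dots=3):
    """Yield the label with 0, 1, ..., max_dots trailing dots, then repeat."""
    dots = 0
    while True:
        yield label + "." * dots
        # After the last dot the count wraps to zero, which is the "clear" step.
        dots = (dots + 1) % (max_dots + 1)


# First five frames of the "listening" phase.
for frame in islice(indicator_frames("listening"), 5):
    print(frame)
```

The same generator covers the second phase by passing “transcribing” as the label, which matches the way the indicator reuses one animation for both states.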

The real conclusion: the right answer was “let them choose”

Even with the listening indicator, testing did not converge on one perfect default. Some users liked the indicator. Others, including me, preferred silent mode, no on-screen feedback at all, just speak and the text appears when it is ready.

This is the part of the study I find most interesting, and it is the part a case study can sometimes hide. RITE did not hand me one correct design. It revealed that the correct design was different for different people, and that the right response was to stop hunting for a single universal default and instead give users genuine control over the mode they work in. WhisperPad ended up with three: silent, listening indicator, and live transcription. Each one exists because testing showed a real group of users it served, and each non-default mode exists as a toggle because I was not 100% sure any single one belonged to everyone.

The method revealed a real gap and, in a very short amount of time, helped me confirm which of my solutions worked and which did not.

A second example: pacing the onboarding

The same principle, cognitive load is the thing to design against, showed up in onboarding.

Early on, WhisperPad’s first-run setup was just the settings page. I opened it for the user and let them read through the options on their own. That was fine when there were only a few settings. But as the app grew, the first screen crossed a threshold, more than five distinct choices on a single page, and watching people meet that page, it was clearly too much at once. The typical response was to ignore it completely and then be confused about the behavior they encountered in the app later.

So I replaced the open-settings approach with an onboarding wizard: one decision per screen. Same settings, same choices, but paced. Each screen introduces one feature, explains it, and asks for one selection. The point is breathing room, giving the user space to understand each thing before the next one arrives, instead of facing the whole configuration surface at once.

The thread continued: solving it again under a constraint

One coda, because the story did not stop when the RITE testing did.

WhisperPad now ships in two versions. The direct-download version is my main branch, and it can paste transcribed text directly where your cursor is. The Mac App Store version cannot do that, because the accessibility permissions that automatic pasting requires are not something Apple will approve for the App Store, so that version places your text on the clipboard for you to paste yourself.

During review, Apple flagged that the App Store build felt slow or unclear, that it was hard to tell what was happening. That is the exact gap from the start of this post, returning in a new form. So I designed the signal again, this time as a small dedicated window: it says “listening,” shows a pulsing microphone icon, and tells the user precisely what to do next, double-press Control again to stop. When they do, the window switches to “transcribing” until the text is ready, then prompts them to paste.

I liked that solution enough that I brought it back to the direct version as another optional mode. In the App Store build it is on by default, because there it solves a problem the app would otherwise have. In the direct build it is a toggle, because, by now you know the rule, I was not 100% sure every user wanted it.
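
The status window’s flow reads naturally as a small state machine: listening, then transcribing, then a prompt to paste. The Python sketch below is one way to model it, under assumptions: the phase and event names are hypothetical, the message strings are illustrative wording rather than the app’s actual copy, and it assumes the same shortcut starts and stops capture.

```python
from enum import Enum, auto


class Phase(Enum):
    IDLE = auto()
    LISTENING = auto()
    TRANSCRIBING = auto()
    READY_TO_PASTE = auto()


# Hypothetical event names: "shortcut" is the double-press of Control,
# "text_ready" fires when transcription finishes, "dismissed" closes the window.
TRANSITIONS = {
    (Phase.IDLE, "shortcut"): Phase.LISTENING,
    (Phase.LISTENING, "shortcut"): Phase.TRANSCRIBING,
    (Phase.TRANSCRIBING, "text_ready"): Phase.READY_TO_PASTE,
    (Phase.READY_TO_PASTE, "dismissed"): Phase.IDLE,
}

# Illustrative wording only: each visible phase tells the user what happens next.
MESSAGES = {
    Phase.LISTENING: "Listening. Double-press Control again to stop.",
    Phase.TRANSCRIBING: "Transcribing.",
    Phase.READY_TO_PASTE: "Ready. Paste your text.",
}


def next_phase(phase, event):
    """Return the phase after an event; unknown events leave the phase unchanged."""
    return TRANSITIONS.get((phase, event), phase)


phase = Phase.IDLE
for event in ("shortcut", "shortcut", "text_ready"):
    phase = next_phase(phase, event)
    print(MESSAGES[phase])
```

Keeping the message next to the phase is the design point: every state the user can see carries its own instruction, so the window never leaves them guessing.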

What I would take from this

A method is not a ritual. RITE was the right choice for WhisperPad because the conditions matched what RITE is for: a working product, a need to find friction I had gone blind to, and the ability to change the design fast between sessions. The same project used longitudinal check-ins with my two ongoing users for the slower questions a fifteen-minute session cannot reach. Different question, different method.

And the most portable thing I took from it is the toggle principle. Testing rarely hands you one universal answer. More often it shows you that users genuinely differ. When that happens, the move is not to pick a winner and impose it. It is to default to what you are certain of, and to let the user choose the rest.

From Insight to Interface: Building a UX Research Tool with AI Collaboration

As a UX researcher and strategist, I’m always thinking about how to make decision-making easier, faster, and more grounded in evidence — not assumptions. Some of the more frequent questions I get while working are:

“How big of a sample size do we need? Why does it have to be that big? How confident are we in the results?”

These questions are deceptively simple — and incredibly important. So I set out to build a tool that could help answer them quickly and visually.

But rather than mock something up and pass it off to an engineer, I wanted to try something different:

Could I collaborate with an AI (ChatGPT) to build, iterate, and deploy a live web app myself — no dev team required?

The Tool: A Sample Size Confidence Estimator

The result is a live tool that:

  • Shows how much your survey might cost depending on your per-response rate
  • Calculates your estimated confidence level based on your current sample and population size
  • Compares your sample size to the required sizes for 85%, 90%, and 95% confidence
  • Offers clear, plain-language explanations of margin of error and confidence intervals

You can try it live here:

 Sample Size Confidence Estimator
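
For readers curious about the math, here is a minimal Python sketch of the kind of calculation a tool like this performs. It is not the tool’s actual code. It assumes the standard formula for estimating a proportion (often attributed to Cochran) with an optional finite population correction, a worst-case proportion of p = 0.5, and a default 5% margin of error; all function and parameter names are hypothetical.

```python
from math import ceil, sqrt
from statistics import NormalDist


def required_sample_size(confidence, margin=0.05, population=None, p=0.5):
    """Responses needed for a given confidence level and margin of error."""
    # Two-sided z-score, roughly 1.96 for 95% confidence.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    n0 = z ** 2 * p * (1 - p) / margin ** 2  # size for a very large population
    if population is not None:
        # Finite population correction lowers the requirement.
        n0 = n0 / (1 + (n0 - 1) / population)
    return ceil(n0)


def estimated_confidence(sample, margin=0.05, population=None, p=0.5):
    """Confidence level a given sample supports at a fixed margin of error."""
    n0 = sample
    if population is not None:
        # Undo the finite population correction (requires sample < population).
        n0 = sample * (population - 1) / (population - sample)
    z = margin * sqrt(n0 / (p * (1 - p)))
    return 2 * NormalDist().cdf(z) - 1


def survey_cost(sample, cost_per_response):
    """Total cost at a flat per-response rate."""
    return sample * cost_per_response


# Required sizes at the three benchmark levels for a population of 10,000.
for level in (0.85, 0.90, 0.95):
    print(level, required_sample_size(level, population=10_000))
```

Under these assumptions, 95% confidence at a 5% margin needs 385 responses for a very large population and 370 for a population of 10,000, which is the kind of gap the finite population correction is meant to surface.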

The UX Behind the Interface

This wasn’t just a code experiment — it was driven by UX strategy:

  • I identified a recurring pain point from stakeholders and researchers
  • I mapped out what users need to know — and when — to feel confident in their decisions
  • I focused on reducing cognitive load, with simple sliders, contextual tooltips, and visual benchmarks
  • I designed for progressive disclosure: advanced options like finite population correction are there if you need them, but stay out of the way otherwise

Building With AI (and Not Just Asking for Code)

What made this project special was how I used AI as a creative collaborator, not just a code monkey:

  • I asked questions like “How would we calculate confidence based on margin of error?” and “How should the slider respond at different ranges?”
  • I got help debugging build issues, refactoring the interface, and improving clarity
  • I used the AI to make statistical methods more accessible, turning abstract math into readable, plain-language explanations

This wasn’t a prompt-and-go project. It was an ongoing conversation with the model — much like pair programming or rubber duck debugging — and it let me iterate fast.

Why This Matters

I’m proud of this tool not just because it works — but because it’s a small example of what happens when UX, research, strategy, and modern tools all come together. It reflects how I like to work:

  • Grounded in real user needs
  • Thinking from both the researcher’s and stakeholder’s perspective
  • Open to experimentation and technology — even if it means learning by doing

Whether you’re a fellow researcher, designer, or strategist: I hope this tool helps you get better answers, faster.

And if you’re a hiring manager or collaborator? Let’s talk about what we could build together next.


Understanding the Customer Support Experience

Image of a telephone support office with rows of desks

Understanding the Support Experience

M. Fraser, Sr. UX Researcher

J. Swartz, Assoc UX Researcher

R. Zelaya, CX Researcher

Agents, as a function of CARE, were seen as a pain point in the following ways:

  • Financially
  • Time taken to resolve cases was higher than average
  • tNPS scores were unsatisfactory

One suggested solution was to redesign the agent dashboard. The potential outcomes were a list of features, possible designs and improvements, and ways of understanding the process. Since we had very little knowledge of the day-to-day needs and work of our care agents, we designed a research project to get at this knowledge.

Initial exploration

Reviewing the care rep performance data uncovered an unexpectedly long amount of time spent both during and after care calls. We had expected the average call to last 5-10 minutes, but the average call lasted 12 minutes or more. In addition, resolution rates were lower than expected, possibly meaning customers were calling repeatedly to resolve their issue, which would lead to higher than expected call volume.

Methods:

We used a primary and secondary methodology to identify impacts on time.

The primary methodology was contextual in nature. We shadowed and interviewed agents while they answered calls. We set out to understand their normal workflow and the pain points associated with the process. We investigated the tools, dashboards, and systems used to handle calls. The end goal was to quantify product issues and customer-facing issues.

Care agent interview
A follow-up interview after a call shadowing session

The second was a cafe study. We chatted with agents one-on-one over coffee about their day-to-day experiences, specifically asking about the pros and cons of working at the call center. This was done to understand the motives behind their process and work life. We also asked about their daily lives outside of work: commuting, living situation, and so on. We took them out of the work environment to give them a space to vent about their jobs and lives without a manager’s presence influencing their answers or speech. We wanted to understand their lives in order to streamline their workspace and make dashboards and other tools easier for them to use.

Results

Affinity mapping revealed a few themes.

First, we noticed there were organizational processes which increased customer effort. For example, our partners in Costa Rica would hold a required managers’ meeting each morning for 15-20 minutes. However, managers were a required part of the business process for refund approval. If a customer was unlucky enough to call during this time, they would have to wait longer for help with a simple business process.

We also learned about frustrations with the tools the agents were equipped with. For instance, for security reasons, agents weren’t allowed to download or save files. So if a customer wanted a report of their usage, a supervisor or manager had to download the file and email it to the customer. The restriction was also problematic because agents couldn’t save notes from their calls for follow-ups. In short, agents were denied simple procedural behaviors that would allow them to do their job quickly, creating a longer time on task.

Image of an employee going through security

Additionally, we noticed the work culture at the partner site led to distrust. Contributing factors included regular security checks, the aforementioned security restrictions, and a lack of mobile devices (needed to troubleshoot mobile applications). We recommended renegotiating the contract with our nearshore partners, or returning to internal customer care teams.

Outcomes

Customer care operations are moving back in-house. This affords greater control over tools and processes, so agents and customers are set up for a lower-effort experience.

Agents are now given the same permissions as supervisors, and are trained to use their own judgment for refunds and other issues that would have required supervisor intervention.

Finally, a culture of trust is being fostered. Agents are now a part of our company and not external resources hired through a third-party contract. This creates a sense of belonging and trust in our representatives, who are on the front lines of our customer experience.

Napses Mobile

Napses was an awesome experience, and during my time with them I was driven to fully explore my curiosity about UX. When I first joined them, it was the summer after I had graduated college. We were all feeling a little disappointed by the technology that was being used as a CMS for classrooms during our last years at school and felt that there could be a better way of managing a college course.