What makes Explorbot so fast

#AI testing

Update Jun 22, 2026

Explorbot is an open-source AI agent that explores web applications on its own. The first question people ask when they watch it run is always the same: how does it think so fast? They expect AI tools to be slow, hungry, and expensive. Explorbot is none of those things.

The short answer is that Explorbot uses two AI models with very different jobs, served by inference providers built for speed rather than training. The long answer is what the rest of this post is about.

Why testing needs speed

Testing a web application generates a lot of raw data. On every step, the agent reads HTML snapshots, ARIA trees, and sometimes screenshots. That’s the bulk of the work: perception, not reasoning. The actual decision (which element to interact with next) is comparatively simple. Click a button, fill a field, check whether a modal appeared. Nothing close to the complexity of generating code or reasoning through a multi-step problem.

A model that spends 30 seconds deciding which button to click is not useful here. Take a test with 30 steps: even if 20% of them lead somewhere wrong, a fast model recovers quickly. It explores a path, hits a dead end, backtracks, and tries the next option, all within the same time a slow “thinking” model would spend deliberating on step one.

The occasional wrong click is not a failure. It’s how exploratory testing works. Speed to recover matters more than certainty on each step.
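
To put rough numbers on that trade-off, here is a minimal sketch. The 30 steps, the 20% wrong-turn rate, and the 30-second slow model come from the text above. The 1-second latency for the fast model, and the assumption that each wrong turn costs exactly one extra model call, are illustrative guesses rather than measurements.

```typescript
// Rough wall-clock estimate for a test run where some steps are wrong turns.
// Assumption (not from Explorbot): each wrong turn costs one extra model call to recover.
function runSeconds(steps: number, wrongRate: number, secondsPerCall: number): number {
  const wrongTurns = Math.round(steps * wrongRate);
  const totalCalls = steps + wrongTurns; // original steps plus one retry per wrong turn
  return totalCalls * secondsPerCall;
}

// Fast model: 1 s per call (assumed), 20% of steps go wrong.
const fastRun = runSeconds(30, 0.2, 1);
// Slow "thinking" model: 30 s per call, generously assumed to never go wrong.
const slowRun = runSeconds(30, 0, 30);

console.log(`fast: ${fastRun}s, slow: ${slowRun}s`);
```

Under these assumptions the fast model finishes the whole run, wrong turns included, in 36 seconds, while the slow model needs 900 seconds even with a perfect record.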

The threshold that changes how the agent feels

On fast AI providers, the time for a model to respond is comparable to the time a browser takes to react to a click. A Playwright click (locate the element, fire the click, wait for the state change) takes roughly the same time as a single AI request.

One AI request, one browser interaction

Cross that threshold and the agent stops feeling like an AI tool. It starts feeling like another tab in your devtools, doing the same work at the same speed.

A quick primer on inference providers

Not all AI providers are built the same. OpenAI and Anthropic train their own models and run them on conventional GPU clusters built for training. Others specialise in inference: the act of running a model that someone else has trained, as fast and cheaply as possible.

Groq, Cerebras, and SambaNova are inference providers. They built custom silicon specifically for serving models, not training them. Different hardware, different performance profile.

Groq is the main provider behind Explorbot’s speed. Groq builds their own chips (not GPUs) designed from the ground up for running large language models. Groq doesn’t develop models; it serves open-weight models from Meta, OpenAI, and others at speeds GPU-based providers struggle to match. The result is consistently high throughput at some of the lowest prices in the industry.

When OpenAI released gpt-oss under Apache 2.0 in August 2025, Groq was among the first to serve it. For the second model slot, OpenRouter fills the gap: a meta-provider with one API key that routes to the fastest available provider across the market.

Recommended models for Explorbot

Throughput below is measured in TPS (tokens per second), the rate at which a model produces output. Higher TPS means the agent waits less between steps.

Config key	Model	Provider	Throughput	Pricing
`model`	gpt-oss-20b	Groq	1000+ TPS	$0.03 input / $0.14 output per 1M tokens
`agenticModel`	Minimax M2.5	OpenRouter :nitro	200+ TPS	$0.26 input / $1.00 output per 1M tokens
`visionModel`	Llama 4 Scout	Groq	500+ TPS	$0.11 input / $0.34 output per 1M tokens

gpt-oss-20b is the default model: a 21-billion-parameter mixture-of-experts model with only 3.6 billion active parameters per token. Small enough to run on a single GPU, cheap to serve at scale. Researcher, Navigator, and Tester use it on every iteration to read full HTML and ARIA snapshots.
Minimax M2.5 is the agentic model, used by Pilot and Planner. It makes high-level decisions based on compact action logs, not raw HTML. OpenRouter’s `:nitro` routing dispatches requests to the fastest available provider in real time.
Llama 4 Scout is the vision model. It handles visual checks: when something is genuinely visual (did the toast appear, is this element highlighted), a screenshot is faster and more reliable than parsing the DOM.
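
As a rough illustration of what those throughput figures mean per step, the sketch below converts TPS into generation time. The TPS values are the lower bounds from the table; the 200-token response size is an assumed figure, and the calculation ignores network latency and time-to-first-token.

```typescript
// Time spent generating a response at a given throughput (tokens per second).
// Ignores network latency and time-to-first-token.
function generationSeconds(outputTokens: number, tps: number): number {
  return outputTokens / tps;
}

const responseTokens = 200; // assumed size of one response, for illustration only

// Lower-bound throughput figures from the table above.
const throughput: Record<string, number> = {
  "gpt-oss-20b": 1000,
  "Minimax M2.5": 200,
  "Llama 4 Scout": 500,
};

for (const [model, tps] of Object.entries(throughput)) {
  console.log(`${model}: ${generationSeconds(responseTokens, tps).toFixed(2)}s`);
}
```

For a response of that size, generation takes about 0.2 seconds on gpt-oss-20b, 0.4 seconds on Llama 4 Scout, and 1 second on Minimax M2.5.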

The real split: cheap and fast, vs expensive and smart

Most AI testing tools pick one model and feed it everything. Explorbot does not.

Explorbot uses two tiers of fast model, paired by job:

	Cheap and fast	Expensive and smart
Model	gpt-oss-20b	Minimax M2.5
Input price	$0.03 / 1M tokens	$0.26 / 1M tokens
Output price	$0.14 / 1M tokens	$1.00 / 1M tokens
Throughput	1000+ TPS	200+ TPS
Job	Reads HTML and executes fast repetitive tasks	Makes decisions, explores flows, and plans actions

Both columns are fast. The split is not speed. It is **cost per token against reasoning quality**, applied to the right context size.

Why HTML lives in the cheap column

HTML snapshots are big. A modest application page produces five to twenty thousand tokens of cleaned HTML once you include the structure, attributes, and ARIA roles needed for reliable selectors. Multiply that by thirty iterations in a test, and the agent is sending hundreds of thousands of input tokens through whichever model reads it.

You cannot do that on the expensive tier. At $0.26 per million input tokens it adds up fast. At $0.03 per million it is a rounding error.

So Researcher, Navigator, and Tester (the agents that read the page on every step) all run on `gpt-oss-20b`. Token-hungry by nature, and the cheap tier keeps that cost negligible.
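
The arithmetic can be sketched as follows, using the input prices from the table above and the upper end of the page-size range (20,000 tokens per step over 30 iterations). The function name is illustrative, and output tokens are left out to keep the example small.

```typescript
// Input cost in USD for reading HTML snapshots across one test run.
function inputCostUsd(tokensPerStep: number, steps: number, usdPerMillion: number): number {
  const totalTokens = tokensPerStep * steps;
  return (totalTokens / 1_000_000) * usdPerMillion;
}

const cheapTier = inputCostUsd(20_000, 30, 0.03);     // gpt-oss-20b input price
const expensiveTier = inputCostUsd(20_000, 30, 0.26); // Minimax M2.5 input price

console.log(`cheap: $${cheapTier.toFixed(3)}, expensive: $${expensiveTier.toFixed(3)}`);
```

Under those assumptions the cheap tier comes to roughly $0.018 per test run and the expensive tier to roughly $0.156: small in isolation, but a nearly ninefold difference that repeats on every run.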

Why decisions live in the expensive column

Pilot, Planner, and the other decision-making agents never read the full HTML. They read a researcher-built page summary plus the last few tool executions. A few hundred tokens of context, not twenty thousand.

That tiny context is what makes the expensive smart tier affordable here. The per-token price is higher, but it applies to a fraction of the tokens. Pilot’s reviews are some of the most important calls Explorbot makes, and Minimax M2.5 reasons through them well.
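
One way to picture that compact context is the sketch below. The `ToolExecution` shape, the `buildDecisionContext` function, and the choice of three recent entries are hypothetical and only illustrate the idea; they are not Explorbot's actual internals.

```typescript
// Hypothetical record of one tool call made by a lower-level agent.
interface ToolExecution {
  tool: string;   // e.g. "click", "fill"
  target: string; // human-readable element description
  result: string; // short outcome, e.g. "ok" or an error message
}

// Build a small prompt from a page summary and the most recent tool executions,
// instead of passing the full HTML snapshot.
function buildDecisionContext(
  pageSummary: string,
  log: ToolExecution[],
  lastN = 3,
): string {
  // slice(-0) would return the whole log, so handle non-positive values explicitly.
  const recent = lastN > 0 ? log.slice(-lastN) : [];
  const lines = recent.map((e) => `- ${e.tool}(${e.target}) -> ${e.result}`);
  return [`Page: ${pageSummary}`, "Recent actions:", ...lines].join("\n");
}

const context = buildDecisionContext("Login form with email and password fields", [
  { tool: "open", target: "/login", result: "ok" },
  { tool: "fill", target: "Email", result: "ok" },
  { tool: "fill", target: "Password", result: "ok" },
  { tool: "click", target: "Sign in", result: "error: button disabled" },
]);

console.log(context);
```

The oldest entry falls out of the window, so the decision model sees a summary line plus three short action lines rather than thousands of tokens of markup.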

Screenshots are also fast

A common assumption is that vision-capable models are slow. They were once. Today, **Llama 4 Scout on Groq runs above 500 tokens per second** with multimodal input. Explorbot uses it for visual checks (did the toast appear, is this button highlighted), and the screenshot path is not the bottleneck people assume.

Explorbot still defaults to HTML for the main loop because text is more reliable for selectors and structure. But vision is a tool, not a tier to avoid.

Recommended setup

```js
{
  model: groq('openai/gpt-oss-20b'), // tester, researcher, navigator
  visionModel: groq('meta-llama/llama-4-scout-17b-16e-instruct'), // see() tool
  agenticModel: openrouter('minimax/minimax-m2.5:nitro'), // pilot, planner, decisions
}
```
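
How the three slots map onto agents can be summarised in a small lookup. The agent names and config keys come from this post; the `AgentRole` type and the `slotFor` helper are hypothetical, written only to make the routing explicit.

```typescript
type ConfigKey = "model" | "agenticModel" | "visionModel";
type AgentRole = "tester" | "researcher" | "navigator" | "pilot" | "planner" | "see";

// Which config slot serves which agent or tool, as described in this post.
const slotByRole: Record<AgentRole, ConfigKey> = {
  tester: "model",
  researcher: "model",
  navigator: "model",
  pilot: "agenticModel",
  planner: "agenticModel",
  see: "visionModel", // the see() tool used for visual checks
};

// Hypothetical helper: look up the slot for a given role.
function slotFor(role: AgentRole): ConfigKey {
  return slotByRole[role];
}

console.log(slotFor("researcher"));
console.log(slotFor("pilot"));
console.log(slotFor("see"));
```

Typing the table as `Record<AgentRole, ConfigKey>` means the compiler flags any role left without a slot.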

Why not Playwright-MCP and Cursor

Playwright-MCP lets you drive browsers from an AI IDE like Cursor. But MCP consumes roughly 4x more tokens than direct CLI approaches for the same task, because the accessibility snapshots are expensive. More importantly, MCP workflows are interactive: you guide the agent through chat. Explorbot is autonomous. It runs overnight without you. Different tools, different jobs.

Takeaways

The interesting part of Explorbot isn’t the model list. It’s the principle behind it: match the model to the shape of the work, not to the prestige of the brand.

Perception is bulk work on huge context. It wants a cheap, fast model. Decisions are surgical work on tiny context. They want a smarter, slightly slower one. Once you see that split, the model choice becomes obvious, and the bill stops looking scary.

The same principle holds outside testing. Any AI workflow that mixes “read this giant document” with “decide what to do about it” can be split the same way. We just happen to do it for browsers.

Test the claims

Everything above is a claim: that gpt-oss-20b is fast enough, that Minimax M2.5 reasons well enough, that $1 an hour covers a real testing run. None of it is hard to verify. The install takes a minute.

```bash
npm i explorbot
npx explorbot init
```

Set model: openai/gpt-oss-20b on Groq, point your agentic slot at Minimax M2.5 on OpenRouter, and run it against your own app.

GitHub: github.com/testomatio/explorbot
Docs: docs.testomat.io
Sync results with Testomat.io for unified test reporting

Michael Bodnarchuk

Passionate developer and test automation enthusiast. Michael believes testing should be easy and fun, so he created the Codeception (PHP) and CodeceptJS (NodeJS) frameworks for easy BDD-style tests. Full-time open-source contributor since 2013, tech consultant, corporate trainer, and conference speaker. Currently serves as CTO and chief developer of the testomat.io test management tool. He also enjoys kayaking, hiking, and playing Heroes 3.

Frequently asked questions

Can I use my coding subscription (Anthropic/OpenAI/GitHub Copilot)?

No. Anthropic, OpenAI, and GitHub restrict subscription plans from being used programmatically. As of April 2026, Anthropic requires commercial usage to be on API key billing, separate from personal subscriptions. OpenAI and GitHub have similar limits and weekly caps. More importantly, the models in those subscriptions are slow. Opus and gpt-5 (around 200ms time-to-first-token) are nowhere near the throughput of gpt-oss-20b on Groq (1000+ TPS) or Minimax on OpenRouter (200+ TPS). You’d spend more on overages while reading HTML slowly.

Why not just Claude Opus or Haiku?

Opus 4.7 costs $5 input and $25 output per million tokens. Read 20,000 tokens of HTML and output a decision, and you’re spending about $0.10 on input alone per iteration, or roughly $3 across a 30-step test. Haiku 4.5 is cheaper ($1 input, $5 output) but doesn’t have the reasoning for Pilot’s supervision. This split, cheap fast models on HTML-hungry agents and smarter affordable models on small-context decisions, is more cost-efficient than either alone.

Can we use our own models?

Yes. If you host your own models (via vLLM, Ollama, or a Kubernetes cluster), you can point Explorbot at any inference endpoint via Vercel AI SDK. Speed and pricing depend entirely on your infrastructure. For most teams, outsourcing to OpenRouter or Groq is simpler and more cost-effective than maintaining dedicated GPU capacity.
