Methodology

Anyone can generate a score. Few can defend one.

A judgment about a person's capability is only worth as much as the reasoning behind it. This is the reasoning behind Biziga.

The hard problem

Measuring capability is harder than measuring knowledge.

A test can tell you what someone knows. It struggles to tell you what they'll do — whether they'll see the real problem, weigh the tradeoffs, and act well under pressure when the answer isn't in front of them.

That gap is where most assessment fails, and where hiring and development quietly go wrong. Closing it takes more than a clever question. It takes a way to observe judgment in motion — and a disciplined way to evaluate what you observe.

The principle

The framework is the referee. The AI is the instrument.

Most AI evaluation asks a model to form an opinion — and then asks you to trust it. We don't. Before a single simulation is scored, the standard already exists: a defined model of what good judgment looks like for the role, written down, fixed in advance.

The AI's job is narrow and accountable: observe what the learner did, and apply that standard to it. It doesn't decide what good looks like. It measures against a definition that was set before anyone pressed start.

What we measure

The capabilities that actually predict performance.

Every role is evaluated across a defined set of competencies — not personality traits or test scores, but the observable dimensions of how work actually gets done.

How they think

Whether they frame the right problem, reason through ambiguity, and arrive at sound decisions when the path isn't obvious.

How they act

Whether they prioritize well, move at the right moments, and follow through under pressure rather than freezing or flailing.

How they work with others

Whether they communicate with clarity, navigate the human dynamics of a workplace, and bring people with them.

How they hold up

Whether their judgment holds steady as stakes rise, information remains incomplete, and the easy option isn't the right one.

Each dimension breaks down further into the specific, role-tuned competencies a simulation evaluates. We'll walk you through the full model in a demo.

How it stays honest

A score is only useful if it can be trusted twice.

Evidence before judgment

Every rating traces back to specific moments in the simulation — what was done, said, or missed. No score floats free of the work that earned it.

The same standard, every time

Two learners who perform the same way receive the same evaluation. The standard doesn't drift with mood, model, or the order people happened to take the simulation in.

No inflation

A score means the same thing across people and over time. It reflects how much was actually demonstrated — and is honest about how confident the evidence allows us to be.

Calibrated to difficulty

Handling a hard situation well counts for more than coasting through an easy one. The standard accounts for what was actually being asked.

Why it holds up

Built to survive the question "how do you know?"

A capability score is only as good as the conversation it can survive — with a hiring panel, an L&D lead, a board. So we built the methodology to answer the only question that matters: how do you know?

Because the standard is defined in advance, the evidence is on record, and the reasoning is traceable, every Biziga result comes with its own defense. Not a number to take on faith. A judgment you can stand behind.

See the full framework, applied to your roles.

In a demo, we'll show you exactly how a role is modeled and scored — end to end.

Book a demo →