Incomplete by construction

Verifiable routing as a bounded observer

Compitum is a self-published research artifact for LLM routing: learned SPD geometry, feasibility-first constraints, and Lyapunov-style bounded updates, with each structural claim pinned to a runnable falsification test.

Paul Carver Tiffany III, independent researcher, Compitum research program

Contribution map

What the artifact claims

The paper is deliberately narrower than a benchmark victory. It offers a concrete routing construction, states the assumptions under which the energy behaves like a Lyapunov functional, and makes the claims executable.

1. Geometric selection

Requests and models are compared in a learned SPD metric, so fit is anisotropic rather than raw Euclidean proximity.

2. Feasibility first

Region, policy, and capability constraints filter the action set before utility optimization, so infeasible models cannot win.

3. Runnable stability claims

Line search, trust-radius control, and update strides are mapped to property-based tests with named falsification criteria.
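To make the first item concrete, here is a minimal sketch of distance under an SPD metric of the form M = LL^T + delta I. The factor `L`, the value of `delta`, and the two model embeddings are made-up illustration values, and the function names are hypothetical rather than taken from the Compitum codebase.

```python
import math

def spd_metric(L, delta):
    """Return M = L L^T + delta * I as a list of rows (L is n-by-r)."""
    n, r = len(L), len(L[0])
    return [
        [sum(L[i][k] * L[j][k] for k in range(r)) + (delta if i == j else 0.0)
         for j in range(n)]
        for i in range(n)
    ]

def metric_distance(x, y, M):
    """Return sqrt((x - y)^T M (x - y))."""
    d = [a - b for a, b in zip(x, y)]
    n = len(d)
    return math.sqrt(sum(d[i] * M[i][j] * d[j] for i in range(n) for j in range(n)))

# Illustrative values only: the metric stretches axis 0 and shrinks axis 1.
L = [[2.0, 0.0], [0.0, 0.5]]
M = spd_metric(L, delta=1e-3)

request = [0.0, 0.0]
model_a = [1.0, 0.0]   # Euclidean distance 1 from the request
model_b = [0.0, 1.0]   # Euclidean distance 1 from the request

d_a = metric_distance(request, model_a, M)
d_b = metric_distance(request, model_b, M)
```

Both models sit at Euclidean distance 1 from the request, yet under this metric `model_b` is closer than `model_a`. That direction-dependence is what "anisotropic" means here.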

“All of us is limited. None of us has all the answers. Come develop with us.”

Pipeline

The router as a bounded observer

Compitum never sees true quality directly. It acts from coarse predictors, hard constraints, and bounded updates, then emits a certificate that can be audited.

Request

Embed prompt as a finite state vector x.

Chart

Score each model through M = LL^T + delta I.

Energy

Minimize an energy that trades off predicted quality against latency, cost, distance, and evidence.

Constraints

Reject infeasible actions before selecting the argmin.

Certificate

Return route, diagnostics, drift status, and falsifiable traces.
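The Energy, Constraints, and Certificate stages above can be sketched as follows. This is an illustration under stated assumptions, not the Compitum implementation: the `Candidate` fields, the weights, and the certificate keys are hypothetical, the distance is supplied as a precomputed number, and the evidence term is omitted because its form is not given here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Candidate:
    name: str
    quality: float     # predicted quality, higher is better
    latency_s: float   # predicted latency in seconds
    cost_usd: float    # predicted cost in dollars
    distance: float    # metric distance from the request
    region_ok: bool    # hard constraint: region / policy
    capable: bool      # hard constraint: capability

# Hypothetical tradeoff weights, chosen only for the example.
WEIGHTS = {"latency": 0.05, "cost": 2.0, "distance": 0.1}

def energy(c):
    """Lower is better: negative quality plus weighted penalties."""
    return (-c.quality
            + WEIGHTS["latency"] * c.latency_s
            + WEIGHTS["cost"] * c.cost_usd
            + WEIGHTS["distance"] * c.distance)

def route(candidates):
    """Filter infeasible candidates first, then take the energy argmin."""
    feasible = [c for c in candidates if c.region_ok and c.capable]
    rejected = [c.name for c in candidates if not (c.region_ok and c.capable)]
    if not feasible:
        return {"route": None, "rejected": rejected, "energies": {}}
    energies = {c.name: energy(c) for c in feasible}
    # Ties are broken by name so the result is deterministic.
    best = min(feasible, key=lambda c: (energies[c.name], c.name))
    return {"route": best.name, "rejected": rejected, "energies": energies}

candidates = [
    Candidate("big", 0.99, 1.0, 0.01, 0.1, region_ok=False, capable=True),
    Candidate("mid", 0.85, 2.0, 0.02, 0.5, region_ok=True, capable=True),
    Candidate("small", 0.60, 0.5, 0.005, 1.0, region_ok=True, capable=True),
]
certificate = route(candidates)
```

In this example `big` has the lowest energy of the three but violates the region constraint, so it is removed before the argmin and `mid` is routed. Filtering before optimizing is what keeps an infeasible model from winning on score alone.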

Figures

Visual summary

The diagrams below are inline SVG with text alternatives, so they survive static hosting, print cleanly, and remain inspectable by assistive technology.

[Figure 1 diagram: "Lyapunov-style non-increase. Local surrogate view: line-searched metric energy plus bounded controller energy." Two panels. In the first, surrogate energy steps down from high to low at accepted metric update strides. In the second, the controller term damps toward zero inside trust-radius bounds.]
Figure 1. The stability claim is intentionally local and test-matched: metric updates are accepted only when the surrogate energy does not increase, while the controller energy is bounded and decays under zero drift.
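The acceptance rule in the caption can be illustrated with a backtracking line search that keeps a step only if the surrogate energy does not increase. This is a sketch under assumptions: the quadratic `surrogate_energy`, its gradient, and the step-halving schedule are placeholders chosen for the example, not the energy or schedule used in Compitum.

```python
def surrogate_energy(x):
    """Toy surrogate: an ill-conditioned quadratic bowl."""
    return 10.0 * x[0] ** 2 + x[1] ** 2

def gradient(x):
    """Gradient of the toy surrogate above."""
    return [20.0 * x[0], 2.0 * x[1]]

def accept_or_reject(x, step=1.0, shrink=0.5, max_halvings=20):
    """Halve the step until the energy does not increase.

    Returns (new_x, accepted). If no tried step qualifies, the current
    point is kept, so the energy can never go up.
    """
    e0 = surrogate_energy(x)
    g = gradient(x)
    for _ in range(max_halvings):
        candidate = [xi - step * gi for xi, gi in zip(x, g)]
        if surrogate_energy(candidate) <= e0:
            return candidate, True
        step *= shrink
    return x, False

x = [1.0, 1.0]
trace = [surrogate_energy(x)]
for _ in range(5):
    x, accepted = accept_or_reject(x)
    trace.append(surrogate_energy(x))
```

By construction every entry of `trace` is no larger than the one before it. This is a local, per-step guarantee and says nothing about convergence to a global minimum.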
[Figure 2 chart: "Honest empirical reading. Synthetic case A, utility-per-dollar; all shown strategies have zero violations." Bar chart, vertical axis 0.00 to 1.05. Compitum 0.96, fixed best 0.82, greedy 0.82, UCB1 0.84, Thompson 0.82. Tradeoff callout: Compitum p95 latency is about 6.1 s; trivial baselines are sub-ms to ms.]
Figure 2. The positive empirical claim is narrow: higher utility-per-dollar than common fixed/greedy/bandit baselines at zero violations, with a real latency cost. The cost-linear baseline is cheaper but optimizes a different tradeoff and has lower total utility.

Falsification harness

P1-P8 are executable claims

A passing suite does not prove the framework correct. It shows only that these named properties held on every generated case. That smaller claim is the point.

git clone https://github.com/PaulTiffany/compitum.git
cd compitum
pip install -e .
HYPOTHESIS_PROFILE=ci pytest -q tests/invariants
Claim map for readers and future model-training crawlers.
| ID | Prediction | Falsified if | Test family |
|----|------------|--------------|-------------|
| P1 | Learned metric M is strictly SPD. | Any eigenvalue is non-positive for delta > 0. | test_invariants_metric |
| P2 | Metric distance obeys symmetry, triangle inequality, and ray monotonicity. | Any axiom fails beyond tolerance. | metric_triangle, metric_ray |
| P3 | Metric line-search updates do not increase surrogate energy. | Accepted step raises E. | test_invariants_lg |
| P4 | Controller energy decays under zero drift. | V_ctrl rises when error proxy is zero. | test_invariants_control_sy |
| P5 | Learning is isolated between update strides. | Metric factor changes before stride boundary. | test_invariants_control_sy |
| P6 | Routed distance proxy does not increase over updates. | Final proxy exceeds initial proxy. | test_invariants_srmf_lyapunov |
| P7 | Selection is feasibility-monotone and dual slack is near zero at boundary. | Infeasible model selected or boundary slack is large. | test_invariants_constraints |
| P8 | Routing is deterministic; paraphrase behavior is bounded and explainable. | Identical input yields different route or flip budget is exceeded. | router_determinism, test_paraphrase_* |
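To show what an executable claim of this kind looks like, here is a standalone randomized check in the spirit of P2. It is a sketch, not a copy of the repository's tests: it uses the standard-library `random` module rather than a property-testing framework, the function names and tolerance are hypothetical, and "ray monotonicity" is interpreted here as the distance from x to a point partway along the segment toward y not exceeding the distance from x to y.

```python
import math
import random

def dist(x, y, L, delta):
    """Distance under M = L L^T + delta * I, computed from the factor L (n-by-r).

    Uses d^T M d = ||L^T d||^2 + delta * ||d||^2, so M is never formed.
    """
    d = [a - b for a, b in zip(x, y)]
    proj = [sum(L[i][k] * d[i] for i in range(len(d))) for k in range(len(L[0]))]
    return math.sqrt(sum(p * p for p in proj) + delta * sum(v * v for v in d))

def check_metric_axioms(trials=200, n=4, r=2, delta=1e-3, tol=1e-9, seed=0):
    """Return True if no sampled case violates an axiom beyond tol."""
    rng = random.Random(seed)

    def vec():
        return [rng.uniform(-1.0, 1.0) for _ in range(n)]

    for _ in range(trials):
        L = [[rng.uniform(-1.0, 1.0) for _ in range(r)] for _ in range(n)]
        x, y, z = vec(), vec(), vec()
        d_xy = dist(x, y, L, delta)
        # Symmetry: d(x, y) == d(y, x).
        if abs(d_xy - dist(y, x, L, delta)) > tol:
            return False
        # Triangle inequality: d(x, z) <= d(x, y) + d(y, z).
        if dist(x, z, L, delta) > d_xy + dist(y, z, L, delta) + tol:
            return False
        # Ray monotonicity (as interpreted above): the midpoint is no farther than y.
        mid = [a + 0.5 * (b - a) for a, b in zip(x, y)]
        if dist(x, mid, L, delta) > d_xy + tol:
            return False
    return True

all_axioms_held = check_metric_axioms()
```

A `True` result means only that no counterexample appeared among the sampled cases, which matches the limited reading of a passing suite described above.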

Scope

Honest boundaries

The paper is strongest where it refuses to overclaim. The artifact does not assert a global stability theorem, a general benchmark win, or a universal potential function.

Positive result: on the synthetic routing benchmark, the geometry can buy utility-per-dollar at zero violations when latency is acceptable.

Negative result: on flat engineered-feature materials data, the curved-metric advantage disappears. The method should not manufacture curvature where the chart is wrong.

Claimed

A concrete construction, local surrogate non-increase, an executable falsification harness, and a reproducible empirical reading.

Not claimed

Global asymptotic convergence, safety by stability alone, or a universal benchmark victory.