CLI Agent Readiness Evaluation

Make your CLI legible to agents.

CLIARE measures a released command-line binary as a black box, records runtime evidence, and emits the command shape that agent harnesses need before they start guessing.

Open-source Rust CLI. Runtime evidence first. Shape files for maintainers, security reviewers, and agent harnesses.

Illustration of an agent using command-line hands to reach services.
CLIs are becoming the hands agents use to reach code hosts, cloud systems, internal platforms, and local developer workflows.

The shift

Agents can explore terminals. That does not make every CLI design acceptable.

A human can read docs, infer intent, and tolerate drift. An agent harness needs stronger evidence: which commands exist, which operands are required, which outputs parse, and which probes change state.

Discovery Help

Canonical command help should work on every real command path.

Invocation Args

Usage lines should expose required, optional, and variadic operands.

Consumption JSON

Machine-readable output should be advertised, executable, and parseable.

Safety Proof

Help, version, and diagnostic probes should avoid hidden writes.

Two use cases

One measurement, two products.

For maintainers

Turn agent-readiness into an implementation queue.

CLIARE tells maintainers where agents lack evidence: missing help, unclear grammar, weak diagnostics, precondition blockers, unvalidated output modes, and unsafe discovery behavior.

For harnesses

Load a shape file before executing and learning.

Harnesses can consume command-index.json, shape.json, and generated guidance to route through commands with known confidence, preconditions, and output contracts.

Product story

From CLI drift to evidence-backed command shape.

Illustration of docs, help, and runtime drift.

Surfaces drift

Docs, help text, and the released binary can stop matching as a CLI grows.

Illustration of an agent burning tokens in a help loop.

Agents pay the tax

Without a shape, agents rediscover commands by trying help, flags, errors, and retries.

Illustration of CLIARE probing a released CLI.

CLIARE records evidence

Probes run under bounded contexts, then produce durable evidence and command artifacts.

How it works

Measure the binary the agent will actually use.

CLIARE does not trust framework internals or stale docs. Help text is evidence, not truth. Runtime confirmation is stronger evidence.

01

Probe safely

Run help, version, invalid-input, output-mode, and discovered command probes within a bounded traversal budget.

02

Record evidence

Capture exit status, bounded stdout/stderr, timing, side-effect deltas, runtime context, and probe intent.

03

Infer shape

Extract command rows, flags, operands, output contracts, diagnostics, and preconditions from runtime observations.

04

Emit artifacts

Write command indexes, scorecards, issue ledgers, summaries, persona packets, and generated agent guidance.

Install

Start with the release installer or Cargo.

Latest release binary

curl -fsSL https://github.com/modiqo/cliare/releases/latest/download/install.sh | sh
cliare metadata --format text

Crates.io

cargo install cliare
cliare metadata --format text

Measure a CLI

cliare measure mycli \
  --out .cliare/mycli \
  --profile standard \
  --refresh

cliare summary --out .cliare/mycli

Prepare a harness shape

cliare describe .cliare/mycli --write
cliare report harness --out .cliare/mycli --write
cliare surface query "list projects" --out .cliare/mycli

Outputs

The shape file is not another doc. It is evidence with provenance.

Products distributed as CLIs can ship these artifacts beside each release so agents can assess the command surface, choose the right operation, and invoke it with the output contract their inference path needs.

Harness map

command-index.json

Command paths, summaries, runtime states, confidence, preconditions, output contracts, and suitability.

Use case: a harness routes user intent to a command without rediscovering the CLI through trial runs.

Inferred model

shape.json

Claims about commands, flags, positionals, output modes, shape gaps, and evidence references.

Use case: a CLI vendor packages a versioned shape file so agents can inspect supported operations before invocation.

Proof log

evidence.jsonl

The runtime record of scheduled probes and completed target processes.

Use case: maintainers and auditors trace each shape claim back to the command output that produced it.

Score

scorecard.json

Maintainer readiness, harness shape confidence, subscores, findings, model id, and model hash.

Use case: release pipelines gate on regressions in discovery, grammar, output parsing, recovery, and safety.

Decoder

condition-dictionary.csv

Plain-English definitions and examples for condition labels used in reports and summaries.

Use case: reports stay understandable when a developer or agent sees labels like precondition-blocked or candidate-only.

Agent guidance

AGENT_SKILL.md

Generated guidance an agent can read when inspecting CLIARE artifacts.

Use case: an agent gets a local review workflow for reading the evidence, shape, score, and harness packet together.

Adoption path

Measure once, then use the evidence where agents make decisions.

Illustration of maintainers using CLIARE in CI.

Maintainer CI

Keep command help, diagnostics, output contracts, and discovery safety from regressing between releases.

Illustration of a harness using a command index.

Harness routing

Give an agent a measured command surface before it chooses a command path or constructs arguments.

Illustration comparing agent skills and command indexes.

Skills plus indexes

Let skills teach workflow and policy while the command index maps what the runtime actually supports.

Agents should not be forced to discover your command surface by trial and error. Give them evidence for the navigation capabilities you expect them to use.

Deeper model

Read why CLIARE exists.

The research note explains the terminal-agent evidence, the scoring boundary, and the current deterministic algorithm.