Analysing a grammar

Checking for issues

ts-bnf-tool check runs all static checks on a grammar file and exits with a non-zero status if any issue is found. This makes it easy to wire into a CI pipeline:

ts-bnf-tool check json.bnf
echo $?   # 0 if clean, 1 if warnings only, 2 if any errors

Checks performed:

Check	Severity	Example diagnostic
Undefined rule references	warning	`warning: undefined rule reference 'foo'`
Undefined `%axiom` rule	error	`error: %axiom references undefined rule 'foo' (line 1)`
Duplicate `%axiom`	error	`error: %axiom declared more than once (line 2)`
Undefined `%conflicts` rules	warning	`warning: %conflicts references undefined rule 'foo'`
Undefined `%inline` rules	warning	`warning: %inline references undefined rule 'foo'`
Undefined `%supertypes` rules	warning	`warning: %supertypes references undefined rule 'foo'`
Undefined `%extras` rules	warning	`warning: %extras references undefined rule 'foo'`
Unreferenced rule	warning	`warning: rule 'foo' is never referenced (line 4)`

Pass --json to get diagnostics as a JSON object on stdout instead of plain text on stderr. Exit codes are not affected:

ts-bnf-tool check --json json.bnf

{"diagnostics":[{"severity":"warning","message":"rule 'unused' is never referenced (line 3)"}]}
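For CI scripts that want more than the exit code, the JSON document can be tallied by severity. The following is a minimal Python sketch that assumes only the output shape shown above: a top-level "diagnostics" list whose entries carry "severity" and "message". The helper names `tally` and `expected_exit_code` are hypothetical, and the mapping to exit codes mirrors the convention documented above rather than the tool's source.

```python
import json
from collections import Counter

# Sample output copied from the example above.
SAMPLE = """{"diagnostics":[{"severity":"warning","message":"rule 'unused' is never referenced (line 3)"}]}"""


def tally(raw: str) -> Counter:
    """Count diagnostics per severity in `check --json` output."""
    doc = json.loads(raw)
    return Counter(d["severity"] for d in doc.get("diagnostics", []))


def expected_exit_code(counts: Counter) -> int:
    """Mirror the documented convention: 0 clean, 1 warnings only, 2 any errors."""
    if counts["error"]:
        return 2
    if counts["warning"]:
        return 1
    return 0


if __name__ == "__main__":
    counts = tally(SAMPLE)
    print(dict(counts), expected_exit_code(counts))
```

In a pipeline, the string passed to `tally` would come from the stdout of `ts-bnf-tool check --json json.bnf`.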

Left recursion

Left-recursive rules are not flagged by check. Tree-sitter is a GLR parser generator: left recursion is fully supported and is the idiomatic style for binary and postfix expression rules.

# OK — directly left-recursive, idiomatic for binary operators
expr -> expr '+' term | term ;

Left recursion is still a grammar property worth knowing about — for instance, a left-recursive rule may need a %prec annotation or a %conflicts entry to resolve ambiguity. The check --summary block reports how many rules are directly or mutually left-recursive (see Summarising grammar shape below).

What actually makes tree-sitter generate fail is unresolved ambiguity — for example expr -> expr '+' expr | 'n' with no precedence annotation. Ahead-of-time detection of such conflicts is planned separately (#31).
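To make the direct/mutual distinction concrete, here is a small Python sketch that classifies left-recursive rules in a toy grammar. It is not the tool's implementation. The dictionary representation, the function name `classify_left_recursion`, and the simplifying assumption that no symbol can match the empty string are all hypothetical. A rule that is directly left-recursive is reported as direct only, which may differ from how the tool counts.

```python
def classify_left_recursion(grammar):
    """Return (direct, mutual) sets of left-recursive rule names.

    `grammar` maps a rule name to a list of alternatives; each alternative
    is a list of symbols. A symbol is a rule reference if it is a key of
    `grammar`, otherwise a terminal. Assumes no nullable symbols, so only
    the first symbol of each alternative matters.
    """
    # Edges: rule -> rules that can appear leftmost after one expansion.
    leftmost = {
        name: {alt[0] for alt in alts if alt and alt[0] in grammar}
        for name, alts in grammar.items()
    }

    def reaches_self(start):
        # Depth-first search: can `start` reach itself via leftmost edges?
        stack, seen = list(leftmost[start]), set()
        while stack:
            cur = stack.pop()
            if cur == start:
                return True
            if cur not in seen:
                seen.add(cur)
                stack.extend(leftmost[cur])
        return False

    direct = {n for n in grammar if n in leftmost[n]}
    mutual = {n for n in grammar if n not in direct and reaches_self(n)}
    return direct, mutual


# Hypothetical grammar: `expr` is directly left-recursive, `a` and `b` mutually.
EXAMPLE = {
    "expr": [["expr", "'+'", "term"], ["term"]],
    "term": [["'n'"]],
    "a": [["b", "'x'"]],
    "b": [["a", "'y'"], ["'z'"]],
}

if __name__ == "__main__":
    direct, mutual = classify_left_recursion(EXAMPLE)
    print(sorted(direct), sorted(mutual))
```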

Unreferenced rules

A rule that is defined but never referenced by any other rule (and is not the root rule) is reported as a warning. The root is either the rule named by %axiom, or — when %axiom is absent — the first-declared rule:

root   -> item+ ;
item   -> /[a-z]+/ ;
unused -> 'x' ;   # never referenced

warning: rule 'unused' is never referenced (line 3)
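The rule above can be sketched in a few lines of Python. This is an illustration of the described behaviour, not the tool's code: the pre-parsed `(name, line, references)` representation and the function name are hypothetical, and ignoring self-references is an assumption drawn from the phrase "by any other rule".

```python
def unreferenced_rules(rules, axiom=None):
    """Return warnings for rules that no other rule references.

    `rules` is a list of (name, line, references) tuples in declaration
    order, where `references` is the set of rule names used in the body.
    The root is `axiom` if given, otherwise the first-declared rule.
    """
    if not rules:
        return []
    root = axiom if axiom is not None else rules[0][0]
    referenced = set()
    for name, _line, refs in rules:
        # Assumption: a rule that mentions only itself is still unreferenced.
        referenced |= refs - {name}
    return [
        f"warning: rule '{name}' is never referenced (line {line})"
        for name, line, _refs in rules
        if name != root and name not in referenced
    ]


# The three-rule grammar from the example above, already parsed by hand.
RULES = [
    ("root", 1, {"item"}),
    ("item", 2, set()),
    ("unused", 3, set()),
]

if __name__ == "__main__":
    for warning in unreferenced_rules(RULES):
        print(warning)
```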

Summarising grammar shape

check --summary appends a compact metrics block to stdout after the run. Diagnostics still go to stderr, so the two streams can be captured independently in shell pipelines.

ts-bnf-tool check --summary json.bnf

Rules            6  (leaf: 2, unreachable: 0)
Terminals       12  (literals: 10, patterns: 2, unique values)
Undefined refs   0
Left-recursive   0  (direct: 0, mutual: 0)
FIRST sets      min 1  max 7  avg 2

Each row measures a different aspect of the grammar:

Row	What it tells you
Rules	Total named productions. leaf = rules whose body contains no rule references (only terminals). unreachable = rules never reached from the root, which `check` also flags as warnings.
Terminals	Unique terminal values across all rule bodies, split into string literals and regex patterns. See the note on uniqueness below.
Undefined refs	Rule names used in bodies but never defined — `check` flags these as warnings too.
Left-recursive	Rules involved in left recursion, split into direct (`a → a …`) and mutual (`a → b …`, `b → a …`). Informational only: left recursion is idiomatic tree-sitter style, not a defect.
FIRST sets	Size statistics (min / max / average) of the FIRST set of each rule, that is, the set of terminals that can open a derivation. A large max or high average means many different tokens can start a rule; treat it as a rough hint to look for alternatives whose FIRST sets overlap, not as proof of ambiguity.

Terminal uniqueness is measured by raw source text, not by what the lexer matches. 'x' and "x" are counted as two distinct literals even though they match the same character. The count reflects how many distinct token patterns the grammar author wrote, which is a useful proxy for lexer complexity.
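Counting by raw source text amounts to putting every terminal, exactly as written, into a set. A minimal Python sketch follows. The function name is hypothetical, and telling patterns from literals by a leading slash is an assumption made for the example.

```python
def count_unique_terminals(terminals):
    """Count distinct terminals by raw source text.

    `terminals` lists every terminal occurrence as written in the grammar,
    quotes and slashes included. Anything starting with "/" is treated as
    a regex pattern, everything else as a string literal.
    """
    unique = set(terminals)
    patterns = {t for t in unique if t.startswith("/")}
    literals = unique - patterns
    return {"literals": len(literals), "patterns": len(patterns)}


# 'x' and "x" differ as source text, so they count as two literals.
OCCURRENCES = ["'x'", '"x"', "'x'", "/[a-z]+/", "/[a-z]+/"]

if __name__ == "__main__":
    print(count_unique_terminals(OCCURRENCES))
```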

Using `--summary` with `--json`

Combining --json and --summary adds a "summary" key to the JSON output alongside "diagnostics", making both machine-readable in a single pass:

ts-bnf-tool check --json --summary json.bnf | jq .summary.rules

The full "summary" object shape:

{
  "rules": 6,
  "leaf_rules": 2,
  "unreachable_rules": 0,
  "unique_literals": 10,
  "unique_patterns": 2,
  "undefined_refs": 0,
  "left_recursive_direct": 0,
  "left_recursive_mutual": 0,
  "first_sets": { "min": 1, "max": 7, "avg": 2.0 }
}

first_sets is null when the grammar has no productions.
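One possible use of the combined output is a CI gate that fails on specific metrics rather than on any warning. The Python sketch below assumes only the key names documented above. The function name `gate` and the `max_first_set` threshold are hypothetical, and the sample is trimmed to the keys the gate reads.

```python
import json

# Trimmed sample in the documented shape; only the keys read below are kept.
SAMPLE = """
{
  "diagnostics": [],
  "summary": {
    "undefined_refs": 0,
    "unreachable_rules": 0,
    "first_sets": {"min": 1, "max": 7}
  }
}
"""


def gate(raw, max_first_set=None):
    """Return a list of human-readable problems found in the summary."""
    summary = json.loads(raw)["summary"]
    problems = []
    for key, label in (
        ("undefined_refs", "undefined reference(s)"),
        ("unreachable_rules", "unreachable rule(s)"),
    ):
        if summary[key] > 0:
            problems.append(f"{summary[key]} {label}")
    # JSON null becomes None, which is what an empty grammar produces.
    first_sets = summary["first_sets"]
    if max_first_set is not None and first_sets is not None:
        if first_sets["max"] > max_first_set:
            problems.append(
                f"largest FIRST set has {first_sets['max']} terminals "
                f"(limit {max_first_set})"
            )
    return problems


if __name__ == "__main__":
    print(gate(SAMPLE, max_first_set=5))
```

In practice `raw` would be the stdout of `ts-bnf-tool check --json --summary json.bnf`, and a non-empty result would translate into a failing CI step.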

`check` options

  --json      Emit output as a JSON object instead of plain text
  --summary   Append a grammar metrics block after diagnostics

Inspecting FIRST sets

ts-bnf-tool firsts prints the FIRST set of each rule — the set of terminals that can appear as the very first token of any string the rule can derive. This is useful for understanding LL(1) feasibility: if two alternatives in a choice(…) share a terminal, a single token of look-ahead cannot tell them apart.

ts-bnf-tool firsts json.bnf

array: '['
number: /\-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?/
object: '{'
pair: '"'
string: '"'
value: '"', '[', 'false', 'null', 'true', '{', /\-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?/
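For readers unfamiliar with how such sets are derived, the usual approach is a fixed-point iteration: keep propagating leading terminals through rule references until nothing changes. The Python sketch below illustrates that textbook technique, not the tool's implementation. Since json.bnf is not reproduced here, `JSON_GRAMMAR` is a hand-written guess at its shape, and the sketch assumes no rule can match the empty string.

```python
def first_sets(grammar):
    """Compute FIRST sets by fixed-point iteration.

    `grammar` maps a rule name to a list of alternatives, each a non-empty
    list of symbols. Assumes no symbol is nullable, so only the leading
    symbol of each alternative contributes.
    """
    first = {name: set() for name in grammar}
    changed = True
    while changed:
        changed = False
        for name, alts in grammar.items():
            for alt in alts:
                head = alt[0]
                # A rule reference contributes its FIRST set so far;
                # a terminal contributes itself.
                new = first[head] if head in grammar else {head}
                if not new <= first[name]:
                    first[name] |= new
                    changed = True
    return first


NUMBER = r"/\-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?/"

# Assumed grammar shape, for illustration only.
JSON_GRAMMAR = {
    "value": [["string"], ["number"], ["object"], ["array"],
              ["'true'"], ["'false'"], ["'null'"]],
    "object": [["'{'", "'}'"], ["'{'", "pair", "'}'"]],
    "pair": [["string", "':'", "value"]],
    "array": [["'['", "']'"], ["'['", "value", "']'"]],
    "string": [["'\"'", "/[^\"]*/", "'\"'"]],
    "number": [[NUMBER]],
}

if __name__ == "__main__":
    first = first_sets(JSON_GRAMMAR)
    for name in sorted(first):
        print(f"{name}: {', '.join(sorted(first[name]))}")
```

The min / max / average reported by `check --summary` are statistics over the sizes of these sets.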

Pass --json to get a JSON object instead, suitable for editor plugins or other tooling that consumes structured output:

ts-bnf-tool firsts --json json.bnf

{
  "array":  ["'['"],
  "number": ["/\\-?[0-9]+(\\.[0-9]+)?([eE][+-]?[0-9]+)?/"],
  "object": ["'{'"],
  "pair":   ["'\"'"],
  "string": ["'\"'"],
  "value":  ["'\"'", "'['", "'false'", "'null'", "'true'", "'{'", "/\\-?[0-9]+(\\.[0-9]+)?([eE][+-]?[0-9]+)?/"]
}
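The JSON form makes the look-ahead check described earlier easy to script. The Python sketch below loads the sample output and reports pairs of alternatives whose FIRST sets intersect. The function name and the lists of alternatives are hypothetical: the `firsts` output does not say which rules are alternatives of the same choice, so that information has to be supplied by the caller.

```python
import json
from itertools import combinations

# Sample output copied from the example above (raw string keeps the escapes).
FIRSTS_JSON = r"""
{
  "array":  ["'['"],
  "number": ["/\\-?[0-9]+(\\.[0-9]+)?([eE][+-]?[0-9]+)?/"],
  "object": ["'{'"],
  "pair":   ["'\"'"],
  "string": ["'\"'"],
  "value":  ["'\"'", "'['", "'false'", "'null'", "'true'", "'{'", "/\\-?[0-9]+(\\.[0-9]+)?([eE][+-]?[0-9]+)?/"]
}
"""


def overlapping_alternatives(firsts, alternatives):
    """Yield (rule_a, rule_b, shared) for alternatives whose FIRST sets intersect."""
    for a, b in combinations(alternatives, 2):
        shared = set(firsts[a]) & set(firsts[b])
        if shared:
            yield a, b, shared


if __name__ == "__main__":
    firsts = json.loads(FIRSTS_JSON)
    # Assumed alternatives of `value`: their FIRST sets are disjoint.
    print(list(overlapping_alternatives(firsts, ["string", "number", "object", "array"])))
    # A hypothetical choice between `pair` and `string` would clash on the quote.
    print(list(overlapping_alternatives(firsts, ["pair", "string"])))
```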

`firsts` options

  -n, --no-check   Skip static checks and suppress all warnings
  --json           Emit output as JSON instead of plain text
