Analysing a grammar
Checking for issues
ts-bnf-tool check runs all static checks on a grammar file and exits with a
non-zero status if any issue is found. This makes it easy to wire into a CI
pipeline:
ts-bnf-tool check json.bnf
echo $? # 0 if clean, 1 if warnings only, 2 if any errors
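A CI wrapper usually needs to turn those three exit codes into a pass/fail decision. The following is a minimal sketch of one way to do that in Python; `classify_exit`, `should_fail_build`, `run_check` and the `warnings_are_errors` switch are hypothetical names chosen for illustration, not part of ts-bnf-tool.

```python
import subprocess

# Exit-code meanings as documented above: 0 clean, 1 warnings only, 2 errors.
EXIT_MEANINGS = {0: "clean", 1: "warnings", 2: "errors"}


def classify_exit(code: int) -> str:
    """Map a check exit status to a label; anything else is 'unknown'."""
    return EXIT_MEANINGS.get(code, "unknown")


def should_fail_build(code: int, warnings_are_errors: bool = False) -> bool:
    """Hypothetical CI policy: always fail on errors, optionally on warnings."""
    label = classify_exit(code)
    if label == "clean":
        return False
    if label == "warnings":
        return warnings_are_errors
    return True  # errors, or an unexpected status such as 127


def run_check(grammar_path: str) -> int:
    """Run the checker and return its exit status (needs ts-bnf-tool on PATH)."""
    return subprocess.run(["ts-bnf-tool", "check", grammar_path]).returncode
```

Treating unknown statuses as failures is a deliberate choice in this sketch: a missing binary or a crash should not let a build pass silently.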
Checks performed:
| Check | Severity | Example diagnostic |
|---|---|---|
| Undefined rule references | warning | warning: undefined rule reference 'foo' |
| Undefined %axiom rule | error | error: %axiom references undefined rule 'foo' (line 1) |
| Duplicate %axiom | error | error: %axiom declared more than once (line 2) |
| Undefined %conflicts rules | warning | warning: %conflicts references undefined rule 'foo' |
| Undefined %inline rules | warning | warning: %inline references undefined rule 'foo' |
| Undefined %supertypes rules | warning | warning: %supertypes references undefined rule 'foo' |
| Undefined %extras rules | warning | warning: %extras references undefined rule 'foo' |
| Unreferenced rule | warning | warning: rule 'foo' is never referenced (line 4) |
Pass --json to get diagnostics as a JSON object on stdout instead of plain
text on stderr. Exit codes are not affected:
ts-bnf-tool check --json json.bnf
{"diagnostics":[{"severity":"warning","message":"rule 'unused' is never referenced (line 3)"}]}
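As a rough illustration of consuming that object, the sketch below tallies diagnostics by severity. It assumes only the two fields visible in the sample above (`severity` and `message`); the helper name `count_by_severity` is made up for this example.

```python
import json
from collections import Counter

# Sample copied from the output shown above.
raw = '''{"diagnostics":[{"severity":"warning","message":"rule 'unused' is never referenced (line 3)"}]}'''


def count_by_severity(payload: str) -> Counter:
    """Count diagnostics per severity label in a check --json payload."""
    doc = json.loads(payload)
    return Counter(d["severity"] for d in doc.get("diagnostics", []))


counts = count_by_severity(raw)
# A Counter returns 0 for labels that never occurred.
print(counts["warning"], counts["error"])  # prints: 1 0
```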
Left-recursion
Left-recursive rules are not flagged by check. Tree-sitter is a GLR
parser generator: left recursion is fully supported and is the idiomatic
style for binary and postfix expression rules.
# OK — directly left-recursive, idiomatic for binary operators
expr -> expr '+' term | term ;
Left recursion is still a grammar property worth knowing about — for
instance, a left-recursive rule may need a %prec annotation or a
%conflicts entry to resolve ambiguity. The check --summary block
reports how many rules are directly or mutually left-recursive (see
Summarising grammar shape below).
What actually makes tree-sitter generate fail is unresolved ambiguity
— for example expr -> expr '+' expr | 'n' with no precedence annotation.
Ahead-of-time detection of such conflicts is planned separately
(#31).
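To make the direct/mutual distinction concrete, here is a small sketch that classifies rules in a toy grammar. It is not how ts-bnf-tool is implemented: the dict-of-alternatives representation and the function names are assumptions for this example, and it only looks at the leftmost symbol of each alternative (nullable prefixes are ignored).

```python
def leftmost_edges(grammar):
    """For each rule, the set of rules that appear first in some alternative."""
    return {
        name: {alt[0] for alt in alts if alt and alt[0] in grammar}
        for name, alts in grammar.items()
    }


def find_left_recursion(grammar):
    """Return (direct, mutual) sets of left-recursive rule names."""
    edges = leftmost_edges(grammar)
    direct = {rule for rule, outs in edges.items() if rule in outs}
    mutual = set()
    for start in grammar:
        # Walk leftmost edges, ignoring the self-loop on the start rule,
        # and see whether some other rule leads back to it.
        stack = [n for n in edges[start] if n != start]
        seen = set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if start in edges[node]:
                mutual.add(start)
                break
            stack.extend(edges[node])
    return direct, mutual


# Rule names are bare strings; terminals keep their quotes.
grammar = {
    "expr": [["expr", "'+'", "term"], ["term"]],  # directly left-recursive
    "term": [["'n'"]],
    "a": [["b", "'x'"]],                          # a and b are mutually
    "b": [["a", "'y'"], ["'z'"]],                 # left-recursive
}
direct, mutual = find_left_recursion(grammar)
print(sorted(direct), sorted(mutual))  # prints: ['expr'] ['a', 'b']
```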
Unreferenced rules
A rule that is defined but never referenced by any other rule (and is not the
root rule) is reported as a warning. The root is either the rule named by
%axiom, or — when %axiom is absent — the first-declared rule:
root -> item+ ;
item -> /[a-z]+/ ;
unused -> 'x' ; # never referenced
warning: rule 'unused' is never referenced (line 3)
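The rule above can be sketched in a few lines of Python. This is an illustration under assumptions, not the tool's code: the grammar is modelled as an ordered dict of alternatives (repetition operators such as `+` are dropped), and `unreferenced_rules` and its `axiom` parameter are hypothetical names.

```python
def unreferenced_rules(grammar, axiom=None):
    """List rules that no *other* rule references, excluding the root."""
    if not grammar:
        return []
    # The root is the axiom if given, otherwise the first-declared rule.
    root = axiom if axiom is not None else next(iter(grammar))
    referenced = set()
    for name, alts in grammar.items():
        for alt in alts:
            # A rule referencing itself does not count as being referenced.
            referenced.update(s for s in alt if s in grammar and s != name)
    return [n for n in grammar if n != root and n not in referenced]


# The example grammar from above, with item+ simplified to item.
grammar = {
    "root": [["item"]],
    "item": [["/[a-z]+/"]],
    "unused": [["'x'"]],
}
print(unreferenced_rules(grammar))                # prints: ['unused']
print(unreferenced_rules(grammar, axiom="item"))  # prints: ['root', 'unused']
```

The second call shows why the choice of root matters: once `item` is named as the axiom, `root` itself is no longer referenced by anything.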
Summarising grammar shape
check --summary appends a compact metrics block to stdout after the run.
Diagnostics still go to stderr, so the two streams can be captured independently
in shell pipelines.
ts-bnf-tool check --summary json.bnf
Rules 6 (leaf: 2, unreachable: 0)
Terminals 12 (literals: 10, patterns: 2, unique values)
Undefined refs 0
Left-recursive 0 (direct: 0, mutual: 0)
FIRST sets min 1 max 7 avg 2
Each row measures a different aspect of the grammar:
| Row | What it tells you |
|---|---|
| Rules | Total named productions. leaf = rules whose body contains no rule references (only terminals). unreachable = rules never reached from the root, which check also flags as warnings. |
| Terminals | Unique terminal values across all rule bodies, split into string literals and regex patterns. See the note on uniqueness below. |
| Undefined refs | Rule names used in bodies but never defined — check flags these as warnings too. |
| Left-recursive | Rules involved in left-recursion, split into direct (a → a …) and mutual (a → b …, b → a …). Informational only — left recursion is idiomatic tree-sitter style, not a defect. |
| FIRST sets | Size statistics (min / max / average) of the FIRST set of each rule, that is, the set of terminals that can open a derivation. A large max or high average means many tokens can start a rule, so overlapping alternatives are more likely and worth inspecting; it does not by itself prove ambiguity. |
Terminal uniqueness is measured by raw source text, not by what the lexer matches.
'x' and "x" are counted as two distinct literals even though they match the same character. The count reflects how many distinct token patterns the grammar author wrote, which is a useful proxy for lexer complexity.
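A tiny sketch of what counting by raw source text means in practice. It assumes terminals have already been extracted as source strings and that patterns are the ones delimited by slashes; `count_terminals` is a hypothetical helper, not a ts-bnf-tool function.

```python
def count_terminals(terminals):
    """Return (unique literals, unique patterns), compared by raw text."""
    unique = set(terminals)
    patterns = {t for t in unique if t.startswith("/")}
    literals = unique - patterns
    return len(literals), len(patterns)


# 'x' and "x" differ as source text, so they count as two literals;
# the repeated 'x' is counted once.
print(count_terminals(["'x'", '"x"', "'x'", "/[a-z]+/"]))  # prints: (2, 1)
```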
Using --summary with --json
Combining --json and --summary adds a "summary" key to the JSON output
alongside "diagnostics", making both machine-readable in a single pass:
ts-bnf-tool check --json --summary json.bnf | jq .summary.rules
The full "summary" object shape:
{
"rules": 6,
"leaf_rules": 2,
"unreachable_rules": 0,
"unique_literals": 10,
"unique_patterns": 2,
"undefined_refs": 0,
"left_recursive_direct": 0,
"left_recursive_mutual": 0,
"first_sets": { "min": 1, "max": 7, "avg": 2.0 }
}
first_sets is null when the grammar has no productions.
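One possible use of the "summary" object is a CI gate on grammar shape. The sketch below is an assumption-laden example: `summary_problems` and its `max_unreachable` threshold are invented for illustration, and the sample payload includes only a subset of the documented keys.

```python
import json

# Subset of the documented summary keys, wrapped as check --json --summary
# would emit it alongside "diagnostics".
SAMPLE = json.dumps({
    "diagnostics": [],
    "summary": {
        "rules": 6,
        "unreachable_rules": 0,
        "undefined_refs": 0,
        "first_sets": {"min": 1, "max": 7},
    },
})


def summary_problems(payload: str, max_unreachable: int = 0) -> list:
    """Return human-readable problems found in the summary block."""
    doc = json.loads(payload)
    if "summary" not in doc:
        raise ValueError("no summary key; was --summary passed?")
    summary = doc["summary"]
    problems = []
    if summary.get("undefined_refs", 0) > 0:
        problems.append(f"{summary['undefined_refs']} undefined reference(s)")
    if summary.get("unreachable_rules", 0) > max_unreachable:
        problems.append(f"{summary['unreachable_rules']} unreachable rule(s)")
    # first_sets is null (None in Python) when there are no productions.
    if summary.get("first_sets") is None:
        problems.append("grammar has no productions")
    return problems


print(summary_problems(SAMPLE))  # prints: []
```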
check options
--json Emit output as a JSON object instead of plain text
--summary Append a grammar metrics block after diagnostics
Inspecting FIRST sets
ts-bnf-tool firsts prints the FIRST set of each rule — the set of terminals
that can appear as the very first token of any string the rule can derive. This
is useful for understanding LL(1) feasibility: if the FIRST sets of two
alternatives in a choice(…) share a terminal, a single token of look-ahead
cannot tell them apart.
ts-bnf-tool firsts json.bnf
array: '['
number: /\-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?/
object: '{'
pair: '"'
string: '"'
value: '"', '[', 'false', 'null', 'true', '{', /\-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?/
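For readers who want to see where such sets come from, the following is a textbook fixed-point computation of FIRST sets. It is a sketch under assumptions, not the tool's implementation: the grammar is a dict of alternatives, an empty alternative stands for the empty string, and the example grammars are simplified stand-ins rather than the real json.bnf.

```python
def first_sets(grammar):
    """Compute FIRST sets by iterating until nothing changes."""
    first = {name: set() for name in grammar}
    nullable = set()  # rules that can derive the empty string
    changed = True
    while changed:
        changed = False
        for name, alts in grammar.items():
            for alt in alts:
                all_nullable = True
                for sym in alt:
                    if sym in grammar:  # rule reference
                        new = first[sym] - first[name]
                        if new:
                            first[name] |= new
                            changed = True
                        if sym not in nullable:
                            all_nullable = False
                            break
                    else:  # terminal: it opens the derivation, stop here
                        if sym not in first[name]:
                            first[name].add(sym)
                            changed = True
                        all_nullable = False
                        break
                if all_nullable and name not in nullable:
                    nullable.add(name)
                    changed = True
    return first


# Simplified JSON-like fragment (not the real json.bnf).
json_like = {
    "value": [["string"], ["array"], ["'true'"]],
    "array": [["'['", "value", "']'"]],
    "string": [["'\"'"]],
}
print(sorted(first_sets(json_like)["value"]))

# A nullable prefix: b may be empty, so 'x' can also start a.
with_empty = {"a": [["b", "'x'"]], "b": [[], ["'y'"]]}
print(sorted(first_sets(with_empty)["a"]))  # prints: ["'x'", "'y'"]
```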
Pass --json to get a JSON object instead, suitable for editor plugins or
other tooling that consumes structured output:
ts-bnf-tool firsts --json json.bnf
{
"array": ["'['"],
"number": ["/\\-?[0-9]+(\\.[0-9]+)?([eE][+-]?[0-9]+)?/"],
"object": ["'{'"],
"pair": ["'\"'"],
"string": ["'\"'"],
"value": ["'\"'", "'['", "'false'", "'null'", "'true'", "'{'", "/\\-?[0-9]+(\\.[0-9]+)?([eE][+-]?[0-9]+)?/"]
}
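Tooling that consumes this object could, for instance, check whether a set of alternatives is distinguishable with one token of look-ahead. The sketch below assumes the rule-to-list shape shown above; `ll1_conflicts` is a hypothetical helper, and the `firsts` dict is a hand-copied subset of the sample output.

```python
from itertools import combinations

# Subset of the firsts --json output above, written as a Python dict.
firsts = {
    "array": ["'['"],
    "number": [r"/\-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?/"],
    "object": ["'{'"],
    "pair": ["'\"'"],
    "string": ["'\"'"],
}


def ll1_conflicts(firsts, alternatives):
    """Map each pair of alternatives to the terminals their FIRST sets share."""
    conflicts = {}
    for a, b in combinations(alternatives, 2):
        shared = set(firsts[a]) & set(firsts[b])
        if shared:
            conflicts[(a, b)] = sorted(shared)
    return conflicts


# These four alternatives start with different tokens: no conflicts.
print(ll1_conflicts(firsts, ["string", "array", "object", "number"]))  # prints: {}

# pair and string both open with a double quote, so they would clash
# if they ever appeared as alternatives of the same choice.
print(ll1_conflicts(firsts, ["pair", "string"]))
```

An overlap only matters when the two rules are alternatives of the same choice; `pair` and `string` sharing a first token is harmless as long as they never compete at the same position.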
firsts options
-n, --no-check Skip static checks and suppress all warnings
--json Emit output as JSON instead of plain text