End-to-end: from .bnf to a working parser
Step 1 — write the grammar
Save your grammar to a .bnf file, say json.bnf:
%extras /\s/
value -> object | array | string | number | 'true' | 'false' | 'null' ;
object -> '{' (pair (',' pair)*)? '}' ;
pair -> key: string ':' val: value ;
array -> '[' (value (',' value)*)? ']' ;
string -> << '"' /([^"\\]|\\.)*/ '"' >> ;
number -> /\-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?/ ;
Step 2 — preview the output
Pass --rules-only to print only the converted rule bodies, without the
grammar.js boilerplate. This is handy while iterating on the grammar:
ts-bnf-tool convert --rules-only json.bnf
value -> choice($.object, $.array, $.string, $.number, 'true', 'false', 'null')
object -> seq('{', optional(seq($.pair, repeat(seq(',', $.pair)))), '}')
pair -> seq(field('key', $.string), ':', field('val', $.value))
array -> seq('[', optional(seq($.value, repeat(seq(',', $.value)))), ']')
string -> token(seq('"', /([^"\\]|\\.)*/, '"'))
number -> /\-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?/
Step 3 — generate a full grammar.js
Without any extra flags, ts-bnf-tool convert prints a complete grammar.js
to stdout. Redirect it to a file:
ts-bnf-tool convert json.bnf > grammar.js
You can also read from stdin by passing - as the filename, which is useful
in pipelines:
cat json.bnf | ts-bnf-tool convert - > grammar.js
When reading from stdin there is no filename stem to derive a name from, so
the grammar name defaults to grammar; use --name to set a specific name:
cat json.bnf | ts-bnf-tool convert --name json - > grammar.js
Step 4 — generate a ready-to-use tree-sitter project
--generate writes grammar.js and a skeleton queries/highlights.scm to a
directory, then runs tree-sitter generate to produce the C parser:
ts-bnf-tool convert --generate json.bnf
# creates ./json/grammar.js, ./json/queries/highlights.scm, and ./json/src/parser.c
Override the output directory and grammar name:
ts-bnf-tool convert --generate --output-dir ~/parsers/json --name json json.bnf
The resulting directory is a complete tree-sitter language package, ready for
tree-sitter parse, editor integration, or publishing as an npm package.
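A quick smoke test of the generated package might look like the sketch below. It assumes that ./json was produced by the --generate command above and that the tree-sitter CLI is on PATH; the sample document and the temporary file are illustrative only. Depending on the tree-sitter CLI version, the package may need extra configuration before parse works.

```shell
# Write a small JSON document that exercises every rule in json.bnf:
# object, pair, array, string, number, and the three literal keywords.
sample=$(mktemp)
printf '%s\n' '{"name": "demo", "tags": ["a", "b"], "n": -1.5e3, "ok": true, "off": false, "none": null}' > "$sample"

# Assumption: ./json was created by `ts-bnf-tool convert --generate json.bnf`.
# The parse step is skipped when the CLI or the directory is missing.
if command -v tree-sitter >/dev/null 2>&1 && [ -f json/grammar.js ]; then
    (cd json && tree-sitter parse "$sample")
fi
```

If the grammar is correct, the printed syntax tree should contain no ERROR nodes.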
convert options
--name <NAME> Grammar name (default: filename stem, or grammar when reading from stdin)
--rules-only Print rule bodies only, without grammar.js boilerplate
--generate Write grammar.js to a directory and run tree-sitter generate
--output-dir <DIR> Output directory for --generate (default: ./<name>)
--no-header Suppress the generated-file comment at the top of grammar.js
-n, --no-check Skip static checks; suppress all warnings and convert unconditionally
--strict Treat warnings as errors (conflicts with --no-check)
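In a build script, --strict pairs well with writing to a temporary file first, so that a failed conversion does not overwrite a working grammar.js. The sketch below is one way to do that; safe_convert is a hypothetical helper, not part of ts-bnf-tool, and it assumes the tool exits with a non-zero status when --strict turns a warning into an error.

```shell
# safe_convert OUT CMD [ARGS...]
# Runs CMD with its stdout captured in a temporary file next to OUT.
# OUT is replaced only when CMD exits with status 0; otherwise the
# temporary file is removed and CMD's exit status is returned.
safe_convert() {
    out=$1
    shift
    tmp="$out.tmp.$$"
    if "$@" > "$tmp"; then
        mv "$tmp" "$out"
    else
        status=$?
        rm -f "$tmp"
        return "$status"
    fi
}

# Usage. Assumption: ts-bnf-tool is on PATH and exits non-zero on errors.
if command -v ts-bnf-tool >/dev/null 2>&1 && [ -f json.bnf ]; then
    safe_convert grammar.js ts-bnf-tool convert --strict json.bnf
fi
```

Redirecting straight to grammar.js would truncate the file before the conversion runs, which is why the helper goes through a temporary file.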
Step 5 — refine the highlights skeleton
The generated queries/highlights.scm is a starting point based on naming
conventions. Open it and replace each ; TODO: @??? placeholder with the
appropriate capture name, or delete the entry if the rule does not need
highlighting.
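To check whether any placeholders are left after editing, a plain grep is enough. The sketch below uses a hypothetical helper, list_todos, and assumes only that each unresolved entry still contains the literal text TODO: @???.

```shell
# list_todos FILE
# Prints every line of FILE that still contains a "TODO: @???" placeholder,
# prefixed with its line number. Returns 1 while placeholders remain and
# 0 once none are found.
list_todos() {
    if grep -n -F 'TODO: @???' "$1"; then
        return 1
    fi
    return 0
}

# Usage: run from the grammar directory once the skeleton exists.
if [ -f queries/highlights.scm ]; then
    list_todos queries/highlights.scm || echo "placeholders remain in queries/highlights.scm"
fi
```

The non-zero return value makes the helper usable as a pre-commit or CI check.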
You can also generate or regenerate the skeleton at any time with the
highlights subcommand:
ts-bnf-tool highlights json.bnf -o queries/highlights.scm
Use --no-todos to emit only the rules that were automatically classified,
leaving the unknowns out entirely:
ts-bnf-tool highlights --no-todos json.bnf
Example output for the JSON grammar:
; Generated by ts-bnf-tool v0.3.0 — edit as needed.
(string) @string
(number) @number
Rules whose bodies contain no terminals (purely structural rules) are omitted.
Recognised rules get a capture name based on their name; unrecognised rules
get a ; TODO: @??? placeholder for human review. The heuristics applied are:
| Rule name pattern | Capture |
|---|---|
| `comment`, `*_comment` | `@comment` |
| `string`, `char`, `*_string`, `string_*` | `@string` |
| `number`, `integer`, `float`, `*_literal` | `@number` |
| `keyword_*`, common keyword names (`if`, `else`, `return`, …) | `@keyword` |
| `operator`, `*_op`, `*_operator` | `@operator` |
| `identifier`, `name`, `*_identifier`, `*_name` | `@variable` |
| `boolean` | `@boolean` |
| `null`, `nil`, `none`, `undefined`, `void` | `@constant.builtin` |
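The name patterns in the table map naturally onto shell glob patterns, so the classification can be approximated with a case statement. This is an illustrative sketch, not the tool's implementation: guess_capture is a hypothetical function, the order in which overlapping patterns are tried is an assumption, only the three keyword names listed in the table are included, and the rule about omitting purely structural rules is not modelled.

```shell
# guess_capture NAME
# Prints the capture suggested by the table for rule NAME, or nothing when
# no pattern matches. The first matching branch wins (assumed precedence).
guess_capture() {
    case "$1" in
        comment|*_comment)                    echo '@comment' ;;
        string|char|*_string|string_*)        echo '@string' ;;
        number|integer|float|*_literal)       echo '@number' ;;
        keyword_*|if|else|return)             echo '@keyword' ;;
        operator|*_op|*_operator)             echo '@operator' ;;
        identifier|name|*_identifier|*_name)  echo '@variable' ;;
        boolean)                              echo '@boolean' ;;
        null|nil|none|undefined|void)         echo '@constant.builtin' ;;
    esac
}

# Classify the rule names from json.bnf.
for rule in value object pair array string number; do
    capture=$(guess_capture "$rule")
    if [ -n "$capture" ]; then
        echo "$rule: $capture"
    else
        echo "$rule: unclassified"
    fi
done
```

Names such as string_literal match more than one row of the table, so the branch order decides the result in this sketch.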
highlights options
-o <FILE> Write output to this file instead of stdout
--no-todos Suppress `; TODO: @???` placeholder entries