
End-to-end: from .bnf to a working parser

Step 1 — write the grammar

Save your grammar to a .bnf file, say json.bnf:

%extras /\s/

value   -> object | array | string | number | 'true' | 'false' | 'null' ;
object  -> '{' (pair (',' pair)*)? '}' ;
pair    -> key: string ':' val: value ;
array   -> '[' (value (',' value)*)? ']' ;
string  -> << '"' /([^"\\]|\\.)*/ '"' >> ;
number  -> /\-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?/ ;
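tree-sitter grammars are written in JavaScript, and regex terminals end up as JavaScript regex literals in grammar.js. That means you can sanity-check simple patterns with plain Node.js before converting. The sketch below copies the number and string patterns from json.bnf. The ^...$ anchors and the sample inputs are illustrative additions, not part of the grammar. tree-sitter compiles regexes itself and supports only a subset of JavaScript regex features, so treat Node's RegExp as an approximation that works well for patterns this simple.

```javascript
// Terminal patterns copied from json.bnf; ^...$ anchors are added here so
// each test must match the whole sample, not just a prefix of it.
const NUMBER = /^\-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?$/;
const STRING = /^"([^"\\]|\\.)*"$/;

// Illustrative samples: valid numbers, malformed ones, and a leading-zero case.
for (const s of ['0', '-12.5e3', '1.', '.5', '007']) {
  console.log(`number ${JSON.stringify(s)}: ${NUMBER.test(s)}`);
}

// A plain string, one with escaped quotes, and an unterminated one.
for (const s of ['"plain"', '"say \\"hi\\""', '"unterminated']) {
  console.log(`string ${s}: ${STRING.test(s)}`);
}
```

Running it reports true for 0, -12.5e3, and 007 and false for 1. and .5. The two complete strings match and the unterminated one does not. Note that 007 is accepted, although strict JSON rejects leading zeros.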

Step 2 — preview the output

Use --rules-only to print just the rule bodies, without the grammar.js boilerplate. This is handy when iterating on a grammar:

ts-bnf-tool convert --rules-only json.bnf
value  -> choice($.object, $.array, $.string, $.number, 'true', 'false', 'null')
object -> seq('{', optional(seq($.pair, repeat(seq(',', $.pair)))), '}')
pair   -> seq(field('key', $.string), ':', field('val', $.value))
array  -> seq('[', optional(seq($.value, repeat(seq(',', $.value)))), ']')
string -> token(seq('"', /([^"\\]|\\.)*/, '"'))
number -> /\-?[0-9]+(\.[0-9]+)?([eE][+-]?[0-9]+)?/
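Each line above is a mechanical rewrite of the BNF. Alternatives become choice, adjacent items become seq, ? becomes optional, * becomes repeat, a key: label becomes field, << >> becomes token, and rule references become $.name. The sketch below renders that mapping from a small AST. The node constructors (ref, lit, re, and so on) and the toDsl function are hypothetical, chosen for illustration. They say nothing about how ts-bnf-tool is built internally.

```javascript
// Hypothetical AST node constructors for a parsed BNF rule body.
const ref = (name) => ({ kind: 'ref', name });
const lit = (text) => ({ kind: 'lit', text }); // no quote escaping: sketch only
const re = (source) => ({ kind: 're', source });
const seq = (...items) => ({ kind: 'seq', items });
const alt = (...items) => ({ kind: 'alt', items });
const opt = (item) => ({ kind: 'opt', item });
const star = (item) => ({ kind: 'star', item });
const field = (label, item) => ({ kind: 'field', label, item });
const tok = (item) => ({ kind: 'tok', item });

// Render a node as tree-sitter DSL source text, recursing into children.
function toDsl(node) {
  switch (node.kind) {
    case 'ref':   return `$.${node.name}`;
    case 'lit':   return `'${node.text}'`;
    case 're':    return `/${node.source}/`;
    case 'seq':   return `seq(${node.items.map(toDsl).join(', ')})`;
    case 'alt':   return `choice(${node.items.map(toDsl).join(', ')})`;
    case 'opt':   return `optional(${toDsl(node.item)})`;
    case 'star':  return `repeat(${toDsl(node.item)})`;
    case 'field': return `field('${node.label}', ${toDsl(node.item)})`;
    case 'tok':   return `token(${toDsl(node.item)})`;
    default: throw new Error(`unknown node kind: ${node.kind}`);
  }
}

// object -> '{' (pair (',' pair)*)? '}' ;
const object = seq(lit('{'), opt(seq(ref('pair'), star(seq(lit(','), ref('pair'))))), lit('}'));
console.log(`object -> ${toDsl(object)}`);
```

Running it prints the same object line that --rules-only shows above.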

Step 3 — generate a full grammar.js

Without any extra flags, ts-bnf-tool convert prints a complete grammar.js to stdout. Redirect it to a file:

ts-bnf-tool convert json.bnf > grammar.js

You can also read from stdin by passing - as the filename, which is useful in pipelines:

cat json.bnf | ts-bnf-tool convert - > grammar.js

When reading from stdin there is no filename stem to derive a name from, so the grammar name defaults to grammar. Use --name to set a specific name:

cat json.bnf | ts-bnf-tool convert --name json - > grammar.js

Step 4 — generate a ready-to-use tree-sitter project

--generate writes grammar.js and a skeleton queries/highlights.scm to a directory, then runs tree-sitter generate to produce the C parser:

ts-bnf-tool convert --generate json.bnf
# creates ./json/grammar.js, ./json/queries/highlights.scm, and ./json/src/parser.c

Override the output directory and grammar name:

ts-bnf-tool convert --generate --output-dir ~/parsers/json --name json json.bnf

The resulting directory is a complete tree-sitter language package, ready for tree-sitter parse, editor integration, or publishing as an npm package.
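If you script the --generate step, for example in CI, a quick check that the three documented outputs exist can help catch a failed tree-sitter generate early. The following is a minimal Node.js sketch. The missingOutputs name and the command-line handling are assumptions for illustration. The file list comes from the comment in the --generate example above.

```javascript
const fs = require('node:fs');
const path = require('node:path');

// Files that --generate is documented to produce inside the output directory.
const EXPECTED = ['grammar.js', path.join('queries', 'highlights.scm'), path.join('src', 'parser.c')];

// Return the expected files missing under dir (an empty array means all are present).
function missingOutputs(dir) {
  return EXPECTED.filter((rel) => !fs.existsSync(path.join(dir, rel)));
}

// Usage: node check-outputs.js [dir]   (defaults to ./json)
const dir = process.argv[2] ?? './json';
const missing = missingOutputs(dir);
console.log(missing.length === 0
  ? `${dir}: all generated files present`
  : `${dir}: missing ${missing.join(', ')}`);
```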

convert options

  --name <NAME>          Grammar name (default: filename stem, or grammar when reading stdin)
  --rules-only           Print rule bodies only, without grammar.js boilerplate
  --generate             Write grammar.js to a directory and run tree-sitter generate
  --output-dir <DIR>     Output directory for --generate (default: ./<name>)
  --no-header            Suppress the generated-file comment at the top of grammar.js
  -n, --no-check         Skip static checks; suppress all warnings and convert unconditionally
  --strict               Treat warnings as errors (conflicts with --no-check)
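The defaults above depend on each other. The name comes from --name if given, otherwise from the filename stem, or grammar when reading stdin. The output directory is then derived from that name. The function below restates those documented rules as code, assuming they combine as shown. resolveConvertOptions and its argument shape are hypothetical and are not part of ts-bnf-tool.

```javascript
const path = require('node:path');

// Hypothetical restatement of the documented convert defaults; not the tool's source.
function resolveConvertOptions({ file, name, outputDir, strict = false, noCheck = false }) {
  // Documented: --strict conflicts with --no-check.
  if (strict && noCheck) {
    throw new Error('--strict conflicts with --no-check');
  }
  // --name wins; otherwise stdin ('-') gives "grammar" and a path gives its stem.
  const resolvedName = name ?? (file === '-' ? 'grammar' : path.basename(file, path.extname(file)));
  // --output-dir defaults to ./<name> (only relevant with --generate).
  const resolvedDir = outputDir ?? `./${resolvedName}`;
  return { name: resolvedName, outputDir: resolvedDir };
}

console.log(resolveConvertOptions({ file: 'json.bnf' }));
```

For json.bnf this yields the name json and the output directory ./json, matching the layout shown in Step 4.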

Step 5 — refine the highlights skeleton

The generated queries/highlights.scm is a starting point based on naming conventions. Open it and replace every ; TODO: @??? line with the appropriate capture name, or delete it if the rule does not need highlighting.
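Before committing the file, it can help to confirm that no placeholders remain. The sketch below assumes only that each placeholder line contains the literal text ; TODO: @???. It returns the 1-based line numbers that still need attention. The remainingTodos name and the check-todos.js script name are hypothetical.

```javascript
const fs = require('node:fs');

// Return 1-based line numbers that still contain the placeholder marker.
function remainingTodos(text) {
  const hits = [];
  text.split(/\r?\n/).forEach((line, i) => {
    if (line.includes('; TODO: @???')) hits.push(i + 1);
  });
  return hits;
}

// Usage: node check-todos.js queries/highlights.scm
const file = process.argv[2];
if (file) {
  const hits = remainingTodos(fs.readFileSync(file, 'utf8'));
  if (hits.length > 0) {
    console.error(`${file}: unresolved placeholders on lines ${hits.join(', ')}`);
    process.exitCode = 1; // non-zero exit so a CI step fails
  }
}
```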

You can also generate or regenerate the skeleton at any time with the highlights subcommand:

ts-bnf-tool highlights json.bnf -o queries/highlights.scm

Use --no-todos to emit only the rules that were automatically classified, leaving the unknowns out entirely:

ts-bnf-tool highlights --no-todos json.bnf

Example output of the --no-todos command above for the JSON grammar:

; Generated by ts-bnf-tool v0.3.0 — edit as needed.
(string) @string
(number) @number

Rules whose bodies contain no terminals (purely structural rules) are omitted. Recognised rules get a capture name based on their name; unrecognised rules get a ; TODO: @??? placeholder for human review. The heuristics applied are:

  Rule name pattern                                      Capture
  comment, *_comment                                     @comment
  string, char, *_string, string_*                       @string
  number, integer, float, *_literal                      @number
  keyword_*, common keyword names (if, else, return, …)  @keyword
  operator, *_op, *_operator                             @operator
  identifier, name, *_identifier, *_name                 @variable
  boolean                                                @boolean
  null, nil, none, undefined, void                       @constant.builtin
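The table can be read as an ordered list of name tests. The sketch below encodes it that way. It is a hypothetical reconstruction, not ts-bnf-tool's source, and it makes two assumptions. First, the first matching row wins, because the table does not say how overlaps such as string_literal are resolved. Second, the keyword set contains only the three names the table lists, since the rest of that list is elided. The sketch also leaves out the terminal check that drops purely structural rules.

```javascript
// Hypothetical sketch of the rule-name heuristics; first matching row wins.
const KEYWORDS = new Set(['if', 'else', 'return']); // the documented list continues with "…"

const HEURISTICS = [
  [(n) => n === 'comment' || n.endsWith('_comment'), '@comment'],
  [(n) => ['string', 'char'].includes(n) || n.endsWith('_string') || n.startsWith('string_'), '@string'],
  [(n) => ['number', 'integer', 'float'].includes(n) || n.endsWith('_literal'), '@number'],
  [(n) => n.startsWith('keyword_') || KEYWORDS.has(n), '@keyword'],
  [(n) => n === 'operator' || n.endsWith('_op') || n.endsWith('_operator'), '@operator'],
  [(n) => ['identifier', 'name'].includes(n) || n.endsWith('_identifier') || n.endsWith('_name'), '@variable'],
  [(n) => n === 'boolean', '@boolean'],
  [(n) => ['null', 'nil', 'none', 'undefined', 'void'].includes(n), '@constant.builtin'],
];

// Return the capture for a rule name, or null when a human must decide (a TODO entry).
function classify(ruleName) {
  for (const [matches, capture] of HEURISTICS) {
    if (matches(ruleName)) return capture;
  }
  return null;
}

// Mimic --no-todos: keep only rules that were classified, one query line each.
function highlightLines(ruleNames) {
  return ruleNames
    .map((name) => [name, classify(name)])
    .filter(([, capture]) => capture !== null)
    .map(([name, capture]) => `(${name}) ${capture}`);
}

console.log(highlightLines(['value', 'object', 'pair', 'array', 'string', 'number']).join('\n'));
```

For the six JSON rule names it keeps only string and number.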

highlights options

  -o <FILE>    Write output to this file instead of stdout
  --no-todos   Suppress `; TODO: @???` placeholder entries
