simdjsont — JSON at SIMD speed
simdjsont binds simdjson — the SIMD-accelerated parser — and puts a typed codec layer on top: validate, extract by JSON pointer, decode into your own types, encode back, and stream NDJSON, with a zero-copy path for hot loops. Every snippet below is cut from two programs under examples/ that run in the test suite.
Codecs describe both directions
(* A codec describes both directions at once: how to decode a JSON
object into a [user] and (via [~enc]) how to encode one back. *)
let user_codec =
  let open Simdjsont.Decode in
  Obj.field (fun id name active -> { id; name; active })
  |> Obj.mem "id" int ~enc:(fun u -> u.id)
  |> Obj.mem "name" string ~enc:(fun u -> u.name)
  |> Obj.mem "active" bool ~enc:(fun u -> u.active)
  |> Obj.finish

A codec is one value that both decodes and encodes. Obj.field names the constructor; each Obj.mem declares a member with its type and, via ~enc, how to read it back out of the record; Obj.finish seals it. Field names, order and types live in one place, so a JSON contract cannot drift from its decoder.
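To make the "one value, both directions" idea concrete, here is a minimal toy model using only the standard library. It is not simdjsont's implementation: the `json` AST, the `codec` record, and the `member` helper are all hypothetical names introduced for illustration. The sketch only shows why pairing a decoder with an encoder in one value keeps the two sides in step.

```ocaml
(* Toy model, NOT simdjsont's representation: a codec pairs a decoder
   with an encoder over a tiny JSON AST. *)
type json =
  | Bool of bool
  | Int of int
  | String of string
  | Object of (string * json) list

type 'a codec = {
  dec : json -> ('a, string) result;
  enc : 'a -> json;
}

let int_c =
  { dec = (function Int i -> Ok i | _ -> Error "expected int");
    enc = (fun i -> Int i) }

let string_c =
  { dec = (function String s -> Ok s | _ -> Error "expected string");
    enc = (fun s -> String s) }

let bool_c =
  { dec = (function Bool b -> Ok b | _ -> Error "expected bool");
    enc = (fun b -> Bool b) }

type user = { id : int; name : string; active : bool }

(* Look up a member by key and decode it with that member's codec. *)
let member key c fields =
  match List.assoc_opt key fields with
  | Some v -> c.dec v
  | None -> Error ("missing member: " ^ key)

(* Both directions are written side by side in a single value. *)
let user_c =
  let ( let* ) = Result.bind in
  { dec = (function
      | Object fields ->
        let* id = member "id" int_c fields in
        let* name = member "name" string_c fields in
        let* active = member "active" bool_c fields in
        Ok { id; name; active }
      | _ -> Error "expected object");
    enc = (fun u ->
      Object [ "id", int_c.enc u.id;
               "name", string_c.enc u.name;
               "active", bool_c.enc u.active ]) }

let () =
  let u = { id = 1; name = "Alice"; active = true } in
  (* Encoding then decoding returns the original value. *)
  assert (user_c.dec (user_c.enc u) = Ok u)
```

In this toy the member names still appear once per direction; a builder in the style of Obj.mem goes one step further by declaring each member a single time.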
Validate, extract, decode, encode
Four operations, all on the same document, in ascending cost.
Validation
(* [check] confirms a document parses, returning the parser's own
diagnostic on failure — cheaper than a full decode. *)
(match Simdjsont.Validate.check json with
 | Ok () -> Printf.printf "valid: %s\n" json
 | Error e -> die ("validate: " ^ e));
(match Simdjsont.Validate.check "{broken" with
 | Ok () -> die "expected \"{broken\" to be rejected"
 | Error e -> Printf.printf "rejected \"{broken\": %s\n" e);

check confirms that a document parses and returns simdjson's own diagnostic on failure, which is cheaper than decoding when you only need a yes/no answer. is_valid is the bool-returning variant.
Pointer extraction
(* Pull a single value out by JSON pointer (RFC 6901) without
decoding the whole document — handy to route on a discriminator. *)
(match Simdjsont.Extract.string json ~pointer:"/name" with
 | Ok name -> Printf.printf "pointer /name -> %s\n" name
 | Error e -> die ("extract: " ^ e));

Extract pulls one value out by JSON pointer (RFC 6901) without materializing the rest, which is useful for reading a discriminator field and deciding which full codec to run. Extract has string, int, float, bool, is_null and a codec-typed at.
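For readers unfamiliar with the pointer syntax, the sketch below splits an RFC 6901 pointer into its reference tokens using only the standard library. It illustrates the syntax and is not how simdjsont or simdjson resolve pointers; `unescape` and `tokens` are hypothetical helper names. The two escapes come from the RFC: `~1` stands for `/` and `~0` stands for `~`.

```ocaml
(* Unescape one reference token in a single pass:
   "~0" becomes "~" and "~1" becomes "/"; any other "~" is invalid. *)
let unescape token =
  let n = String.length token in
  let b = Buffer.create n in
  let rec go i =
    if i >= n then Ok (Buffer.contents b)
    else if token.[i] <> '~' then (
      Buffer.add_char b token.[i];
      go (i + 1))
    else if i + 1 < n && token.[i + 1] = '0' then (
      Buffer.add_char b '~';
      go (i + 2))
    else if i + 1 < n && token.[i + 1] = '1' then (
      Buffer.add_char b '/';
      go (i + 2))
    else Error ("bad escape in token: " ^ token)
  in
  go 0

(* Split a pointer into unescaped tokens. The empty pointer addresses
   the whole document; any other pointer must start with '/'. *)
let tokens pointer =
  if pointer = "" then Ok []
  else if pointer.[0] <> '/' then
    Error "pointer must be empty or start with '/'"
  else
    String.sub pointer 1 (String.length pointer - 1)
    |> String.split_on_char '/'
    |> List.fold_left
         (fun acc t ->
           match acc, unescape t with
           | Ok ts, Ok t -> Ok (t :: ts)
           | (Error _ as e), _ -> e
           | _, Error e -> Error e)
         (Ok [])
    |> Result.map List.rev
```

With these helpers, `tokens "/name"` yields `Ok ["name"]`, and a key that itself contains a slash, such as `a/b`, is addressed as `/a~1b`.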
Decode and encode
(* Decode parses straight into the record; the same codec encodes a
value back to a JSON string. *)
let alice =
  match Simdjsont.Decode.decode_string user_codec json with
  | Ok u ->
    Printf.printf "decoded: id=%d name=%s active=%b\n" u.id u.name u.active;
    u
  | Error e -> die ("decode: " ^ e)
in
let off_duty = { alice with active = false } in
Printf.printf "encoded: %s\n" (Simdjsont.Encode.to_string user_codec off_duty);

decode_string parses into the record; Encode.to_string serializes a value back. Both sides come from the one codec, so a round trip is guaranteed to line up.
The combinator vocabulary
Codecs nest, so a schema of any shape is a value built from smaller values: null, bool, int, float, string, list, array, optional, map, and the Obj builder with mem and opt_mem.
Lists, nesting, optional members
(* Codecs nest: an [address] codec becomes a member of [person]. opt_mem
handles a member that may be absent; list builds a homogeneous array;
the whole thing is still one value describing both directions. *)
let address_codec =
  let open Simdjsont.Decode in
  Obj.field (fun city zip -> { city; zip })
  |> Obj.mem "city" string ~enc:(fun a -> a.city)
  |> Obj.mem "zip" string ~enc:(fun a -> a.zip)
  |> Obj.finish

let person_codec =
  let open Simdjsont.Decode in
  Obj.field (fun name age address tags -> { name; age; address; tags })
  |> Obj.mem "name" string ~enc:(fun p -> p.name)
  |> Obj.mem "age" int ~enc:(fun p -> p.age)
  |> Obj.opt_mem "address" address_codec ~enc:(fun p -> p.address)
  |> Obj.mem "tags" (list string) ~enc:(fun p -> p.tags)
  |> Obj.finish

An address codec becomes a member of person; list string decodes the tags array; opt_mem maps a member that may be absent to an option. map adapts a codec through a pair of functions when the JSON and OCaml shapes differ.
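The text does not show the signature of map, so the following is only a sketch of the general technique: adapting a codec through a pair of conversion functions, one per direction. The `codec` record over raw strings, the labelled `~dec` and `~enc` arguments, and the `user_id` wrapper are all assumptions made for illustration, not simdjsont's API.

```ocaml
(* Toy codec over raw strings; NOT simdjsont's representation. *)
type 'a codec = {
  dec : string -> ('a, string) result;
  enc : 'a -> string;
}

let int_c =
  { dec = (fun s ->
      match int_of_string_opt s with
      | Some i -> Ok i
      | None -> Error ("not an int: " ^ s));
    enc = string_of_int }

(* Hypothetical [map]: [~dec] converts after decoding, [~enc] converts
   before encoding, so the wrapped codec still covers both directions. *)
let map ~dec ~enc c =
  { dec = (fun s -> Result.map dec (c.dec s));
    enc = (fun v -> c.enc (enc v)) }

(* The wire shape is a bare int; the OCaml shape is a distinct type. *)
type user_id = User_id of int

let user_id_c =
  map ~dec:(fun i -> User_id i) ~enc:(fun (User_id i) -> i) int_c
```

Here `user_id_c.dec "42"` produces `Ok (User_id 42)` and `user_id_c.enc (User_id 7)` produces `"7"`, while errors from the underlying codec pass through unchanged.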
Missing members decode cleanly
(match Simdjsont.Decode.decode_string person_codec json with
 | Ok p ->
   Printf.printf "%s, %d, %d tags, address: %s\n" p.name p.age
     (List.length p.tags)
     (match p.address with Some a -> a.city | None -> "(none)")
 | Error e -> die ("decode: " ^ e));
(* A document that omits the optional member still decodes. *)
(match
   Simdjsont.Decode.decode_string person_codec
     {|{"name":"Bo","age":20,"tags":[]}|}
 with
 | Ok p -> Printf.printf "%s has no address: %b\n" p.name (p.address = None)
 | Error e -> die ("decode: " ^ e));

$ dune exec examples/simdjsont_nested/main.exe

Streaming and zero-copy
Two paths matter under load: many small documents, and large ones you decode once.
NDJSON streams
(* Decode newline-delimited JSON into a lazy sequence of typed
results — one Error per bad line, not a failed batch. *)
let ndjson =
  {|{"id": 2, "name": "Bob", "active": true}
{"id": 3, "name": "Carol", "active": false}
|}
in
print_endline "ndjson stream:";
Simdjsont.Ndjson.decode_string_seq user_codec ndjson
|> Seq.iter (function
     | Ok u -> Printf.printf "  user #%d %s (active=%b)\n" u.id u.name u.active
     | Error e -> die ("ndjson: " ^ e))

decode_string_seq turns newline-delimited JSON into a lazy Seq of typed results: one Error element per bad line, never a failed batch. It fits log pipelines and bulk imports where the consumer streams.
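The shape of "one result per line, produced lazily" can be sketched with the standard library alone. This is not simdjsont's implementation: `lines`, `decode_lines`, and `parse_line` are hypothetical names, and `parse_line` is a stand-in that accepts bare integers rather than JSON, so that the example stays self-contained.

```ocaml
(* Lazily split [input] on '\n'; each line is cut out only when the
   consumer asks for the next element. *)
let lines input =
  let n = String.length input in
  let rec from i () =
    if i >= n then Seq.Nil
    else
      let j =
        match String.index_from_opt input i '\n' with
        | Some j -> j
        | None -> n
      in
      Seq.Cons (String.sub input i (j - i), from (j + 1))
  in
  from 0

(* Stand-in for a real decoder: each line must be a bare integer. *)
let parse_line line =
  match int_of_string_opt (String.trim line) with
  | Some v -> Ok v
  | None -> Error ("bad line: " ^ line)

(* One result per non-blank line; a bad line becomes an [Error]
   element and the lines after it are still decoded. *)
let decode_lines parse input =
  lines input
  |> Seq.filter (fun l -> String.trim l <> "")
  |> Seq.map parse
```

A consumer can then choose its own policy per element, for example logging an Error and continuing, instead of losing the whole batch to a single malformed line.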
Bigstring, no string round-trip
(* On a hot path, decode straight from the buffer the socket filled:
no intermediate OCaml string is allocated. *)
let buf = Simdjsont.bigstring_of_string json in
(match
   Simdjsont.decode_bigstring person_codec buf ~len:(String.length json)
 with
 | Ok p -> Printf.printf "zero-copy decode: %s\n" p.name
 | Error e -> die ("bigstring: " ^ e))

decode_bigstring works directly on the buffer the socket filled, so no intermediate OCaml string is allocated; this is what an araara JSON endpoint uses to decode a request body on a hot path. Encode has the matching to_bigstring. The same codec layer also handles CBOR.
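Why the call takes an explicit ~len can be seen in a small standard-library sketch: a reusable buffer is usually larger than the payload it currently holds, so the reader must be told how many bytes are valid. The helper names `create`, `fill`, and `count_byte` are hypothetical, and `fill` merely simulates a socket read; none of this is simdjsont's API.

```ocaml
type bigstring =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Allocate a reusable buffer that lives outside the OCaml heap. *)
let create n : bigstring =
  Bigarray.Array1.create Bigarray.char Bigarray.c_layout n

(* Simulate a socket read: copy [s] into the front of [buf] and
   return the number of valid bytes. *)
let fill (buf : bigstring) s =
  String.iteri (fun i c -> Bigarray.Array1.set buf i c) s;
  String.length s

(* Scan only the first [len] bytes in place; no string is built. *)
let count_byte (buf : bigstring) ~len c =
  let n = ref 0 in
  for i = 0 to len - 1 do
    if Bigarray.Array1.get buf i = c then incr n
  done;
  !n

let () =
  let buf = create 64 in
  let len = fill buf {|{"a":[1,2,3]}|} in
  (* The buffer holds 64 bytes, but only [len] of them are payload. *)
  Printf.printf "payload bytes: %d, commas: %d\n" len
    (count_byte buf ~len ',')
```

The same buffer can be refilled for the next request, which is the allocation pattern a hot path typically wants.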
At the boundary
simdjsont sits at araara's edge: an API module decodes a request body into a trusted domain type or maps the error to a 400, and the context behind it never sees raw JSON. The web/API parity convention pairs each JSON endpoint with an HTML one over the same context.
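The boundary pattern described above can be sketched as follows. Everything here is illustrative rather than araara's code: the `response` record, `decode_user`, `greet`, and `handle` are hypothetical names, and `decode_user` parses a simple "id,name" string as a stand-in for a codec-based JSON decode so that the example needs no dependencies.

```ocaml
type user = { id : int; name : string }
type response = { status : int; body : string }

(* Hypothetical decoder standing in for a codec-based decode. *)
let decode_user raw =
  match String.split_on_char ',' raw with
  | [ id; name ] -> (
      match int_of_string_opt id with
      | Some id -> Ok { id; name }
      | None -> Error "id must be an integer")
  | _ -> Error "expected exactly two fields"

(* Domain logic: only ever receives a validated [user]. *)
let greet u = Printf.sprintf "hello %s (#%d)" u.name u.id

(* The boundary: decode first, then either run the domain logic or
   answer 400 with the decoder's message. *)
let handle raw =
  match decode_user raw with
  | Ok u -> { status = 200; body = greet u }
  | Error e -> { status = 400; body = e }
```

Because `greet` takes a `user` rather than a raw string, the type checker enforces that undecoded input cannot reach it.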