Simdjsont

High-level API.

This module provides convenience functions for:

  • validating JSON (Validate)
  • extracting values by JSON pointer (Extract)
  • encoding/decoding typed values via codecs (Codec, Decode, Encode)
  • decoding NDJSON / JSON-Lines streams (Ndjson)

Strings and bigstrings

Every consume/produce entry point comes in two flavours: one that works on OCaml strings, and a _bigstring sibling that works on Raw.buffer. Raw.buffer is the standard bigstring type, (char, int8_unsigned_elt, c_layout) Bigarray.Array1.t, shared with Bigstringaf, Lwt_bytes and Core.Bigstring. The bigstring path avoids copying through a string: parsing is zero-copy when the buffer already has the padding bytes the parser requires (see Raw.ensure_padded), and encode_bigstring produces a parse-ready buffer that can be fed straight back into decode_bigstring.
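
To make the buffer layout concrete, here is a stdlib-only sketch of a padded bigstring: the data bytes first, then spare bytes the parser may read past the end. It does not call simdjsont. The names bigstring, assumed_padding and padded_of_string are hypothetical, and the value 64 is an assumption for illustration only; the real count is whatever Simdjsont.padding reports.

```ocaml
(* Stdlib-only sketch: this does not call simdjsont. *)
type bigstring =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

(* Hypothetical figure for illustration; read the real one from
   Simdjsont.padding. *)
let assumed_padding = 64

(* Copy [s] into a fresh buffer, leaving [assumed_padding] zero bytes
   after the data. Returns the buffer and the number of data bytes. *)
let padded_of_string (s : string) : bigstring * int =
  let len = String.length s in
  let buf =
    Bigarray.Array1.create Bigarray.char Bigarray.c_layout
      (len + assumed_padding)
  in
  Bigarray.Array1.fill buf '\000';
  String.iteri (fun i c -> Bigarray.Array1.set buf i c) s;
  (buf, len)
```

In practice Simdjsont.bigstring_of_string and Simdjsont.create_bigstring, documented further down, should be preferred over a hand-rolled helper like this one.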

For low-level access to the underlying simdjson parser and elements, see Raw.

Example: typed records

type user = { id : int; name : string; email : string option }

let user_codec =
  let open Simdjsont.Codec in
  Obj.field (fun id name email -> { id; name; email })
  |> Obj.mem "id" int ~enc:(fun u -> u.id)
  |> Obj.mem "name" string ~enc:(fun u -> u.name)
  |> Obj.opt_mem "email" string ~enc:(fun u -> u.email)
  |> Obj.finish

let decoded =
  Simdjsont.decode user_codec
    {|{"id":1,"name":"Ada","email":"ada@example.test"}|}

let encoded =
  Simdjsont.encode user_codec
    { id = 1; name = "Ada"; email = Some "ada@example.test" }

Choosing an API

Use Validate to check whether bytes are JSON, Extract to read a few values by JSON pointer, Decode / Encode for typed application data, Ndjson for streams of documents, Cbor for JSON-compatible CBOR, and Raw only when you need low-level simdjson access.

module Json : sig ... end

Dynamic JSON value representation used by this library.

module Codec : sig ... end

Codecs used to decode and encode typed OCaml values.

module Raw : sig ... end

Low-level bindings: parsers, elements, arrays, objects, JSON pointers, and streaming. Prefer the high-level modules unless you need parser lifetime control or byte offsets.

module Validate : sig ... end

JSON validity checks.

module Extract : sig ... end

Extract values from a JSON string using a JSON pointer.

module Decode : sig ... end

Codecs and decoding functions.

module Encode : sig ... end

Encoding using a codec.

NDJSON / JSON streams

Newline-delimited / concatenated JSON documents (NDJSON, JSON Lines), decoded lazily through a codec. This wraps the lower-level Raw.Stream. Note: this is not JSON-LD (W3C linked data).

let events = "1\n2\n3\n" in
Simdjsont.Ndjson.decode_string_seq Simdjsont.Codec.int events
|> Seq.iter (function
  | Ok n -> Printf.printf "event=%d\n" n
  | Error msg -> Printf.eprintf "bad event: %s\n" msg)

module Ndjson : sig ... end

val padding : int

Number of trailing padding bytes the parser requires; re-export of Raw.padding.

val create_bigstring : int -> Raw.buffer

Allocate a parse-ready bigstring with room for the given number of data bytes (padding is added automatically); re-export of Raw.create_buffer.

val bigstring_of_string : string -> Raw.buffer

Copy a string into a freshly padded bigstring; re-export of Raw.buffer_of_string.

val validate : string -> bool

validate json is a convenience alias for Validate.is_valid.

val validate_bigstring : Raw.buffer -> len:int -> bool

validate_bigstring buf ~len is a convenience alias for Validate.is_valid_bigstring.
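
As a rough illustration of the two validation flavours, the sketch below checks the same document once as a string and once as a bigstring. It uses only the signatures listed on this page and assumes the library is linked as simdjsont. It also assumes that ~len counts the data bytes and excludes the padding, which this page does not state explicitly.

```ocaml
let json = {|{"id":1}|}

(* String flavour. *)
let ok_string : bool = Simdjsont.validate json

(* Bigstring flavour: copy into a padded buffer, then validate.
   Assumption: ~len is the number of data bytes, padding excluded. *)
let ok_bigstring : bool =
  let buf = Simdjsont.bigstring_of_string json in
  Simdjsont.validate_bigstring buf ~len:(String.length json)

let () =
  Printf.printf "string=%b bigstring=%b\n" ok_string ok_bigstring
```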

val decode : 'a Codec.t -> string -> ('a, string) result

decode codec json is a convenience alias for Codec.decode_string.
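
Because decode returns a result, callers usually pattern-match on it. The following is a minimal sketch of that pattern using Simdjsont.Codec.int, the codec that appears in the NDJSON example on this page. The helper name parse_port is hypothetical, and the sketch assumes that a bare number is accepted as a complete JSON document.

```ocaml
(* Hypothetical helper: decode a JSON number, logging any failure. *)
let parse_port (json : string) : int option =
  match Simdjsont.decode Simdjsont.Codec.int json with
  | Ok n -> Some n
  | Error msg ->
    prerr_endline ("decode failed: " ^ msg);
    None

let () =
  match parse_port "8080" with
  | Some p -> Printf.printf "port=%d\n" p
  | None -> print_endline "no port"
```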

val decode_bigstring : 
  'a Codec.t ->
  Raw.buffer ->
  len:int ->
  ('a, string) result

decode_bigstring codec buf ~len is a convenience alias for Codec.decode_bigstring.

val encode : 'a Codec.t -> 'a -> string

encode codec value is a convenience alias for Codec.encode_string.

val encode_bigstring : 'a Codec.t -> 'a -> Raw.buffer * int

encode_bigstring codec value is a convenience alias for Codec.encode_to_bigstring.
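
The bigstring encoder and decoder are meant to compose without an intermediate string. The sketch below round-trips an integer through them, using only the signatures listed above. It assumes the int in the returned pair is the number of encoded data bytes, which is what decode_bigstring expects as ~len; the function name roundtrip is hypothetical.

```ocaml
(* Encode to a parse-ready buffer, then decode straight from it.
   Assumption: the second component of the pair is the data length. *)
let roundtrip (n : int) : (int, string) result =
  let buf, len = Simdjsont.encode_bigstring Simdjsont.Codec.int n in
  Simdjsont.decode_bigstring Simdjsont.Codec.int buf ~len

let () =
  match roundtrip 42 with
  | Ok n -> Printf.printf "round-tripped %d\n" n
  | Error msg -> Printf.eprintf "round trip failed: %s\n" msg
```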

CBOR

CBOR (Concise Binary Object Representation, RFC 8949) support for the JSON-compatible subset of CBOR, using the same codec infrastructure as JSON. This means a record codec can be shared by JSON and CBOR:

type point = { x : int; y : int }

let point =
  let open Simdjsont.Codec in
  Obj.field (fun x y -> { x; y })
  |> Obj.mem "x" int ~enc:(fun p -> p.x)
  |> Obj.mem "y" int ~enc:(fun p -> p.y)
  |> Obj.finish

let bytes = Simdjsont.Cbor.encode_string point { x = 10; y = 20 }
let decoded = Simdjsont.Cbor.decode_string point bytes

Supported CBOR types:

  • Integers (major types 0, 1) including 64-bit
  • Floats (major type 7) including half-precision (16-bit) and single/double precision
  • Byte strings (major type 2), represented as OCaml strings because the public JSON-compatible model has no separate bytes constructor
  • Text strings (major type 3) including indefinite-length
  • Arrays (major type 4) including indefinite-length
  • Maps (major type 5) with text-string or byte-string keys, including indefinite-length maps
  • Tags (major type 6), decoded by ignoring the tag number and decoding the tagged value
  • Simple values: true, false, null, and undefined (decoded as null)

Not supported as distinct semantic values:

  • Semantic interpretation of tags such as dates, URIs, encoded CBOR, or bignums; tags are currently transparent wrappers
  • Integer map keys
  • Arbitrary-precision integers, that is, values outside the range of the 64-bit integer codecs
  • CBOR simple values other than booleans, null, and undefined

module Cbor : sig ... end
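
The major type numbers in the lists above come from the first byte of each CBOR data item: RFC 8949 puts the major type in the top three bits and the "additional information" in the low five. The stdlib-only sketch below splits that byte. It is not part of simdjsont, and the names split_initial_byte and major_type_name are hypothetical.

```ocaml
(* Split a CBOR initial byte into (major type, additional information),
   following the layout in RFC 8949, section 3. *)
let split_initial_byte (c : char) : int * int =
  let b = Char.code c in
  (b lsr 5, b land 0x1f)

(* Human-readable name for each of the eight major types. *)
let major_type_name = function
  | 0 -> "unsigned integer"
  | 1 -> "negative integer"
  | 2 -> "byte string"
  | 3 -> "text string"
  | 4 -> "array"
  | 5 -> "map"
  | 6 -> "tag"
  | 7 -> "simple value or float"
  | _ -> invalid_arg "major_type_name"

let () =
  (* 0xa2 starts a definite-length map with two entries. *)
  let major, info = split_initial_byte '\xa2' in
  Printf.printf "%s, additional info %d\n" (major_type_name major) info
```

Additional information 31 on a string, array or map marks the indefinite-length form mentioned above, and within major type 7 the values 20 to 23 are false, true, null and undefined.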