Simdjsont.Raw

Low-level bindings: parsers, elements, arrays, objects, JSON pointers, and streaming. Prefer the high-level modules unless you need parser lifetime control or byte offsets.

Low-level bindings to simdjson.

Prefer Simdjsont.Decode, Simdjsont.Extract, and Simdjsont.Ndjson for application code. This module is for callers that need direct access to simdjson's parsed document views, object/array iteration, byte offsets, or streaming state.

Lifetime: Elements, arrays, objects, and iterators reference data owned by the parser. They are only valid while the parser is alive and has not been reused for another parse. Do not store an element, array_, or object_ beyond that lifetime.

Example: parse and access a field

let parser = create_parser () in
match parse_string parser json with
| Error e -> prerr_endline e.message
| Ok root -> (
    match at_pointer root "/user/name" with
    | Error e -> prerr_endline e.message
    | Ok name -> print_endline (string_exn name))

Example: iterate arrays and objects

let parser = create_parser () in
match parse_string parser json with
| Error e -> prerr_endline e.message
| Ok root -> (
    match at_pointer root "/users" with
    | Error e -> prerr_endline e.message
    | Ok users ->
        (* array_exn and object_exn raise Parse_error on a type mismatch. *)
        array_iter
          (fun user ->
            match object_find (object_exn user) "name" with
            | Ok name -> print_endline (string_exn name)
            | Error _ -> ())
          (array_exn users))

Example: stream concatenated JSON documents

let parser = create_parser () in
let input = buffer_of_string "1\n2\n3\n" in
match Stream.create parser input ~len:6 with
| Error e -> prerr_endline e.message
| Ok stream ->
    Stream.to_seq stream
    |> Seq.iter (function
      | Ok (elt, offset) ->
          Printf.printf "doc at %d: %s\n" offset (element_to_string elt)
      | Error (e, offset) ->
          Printf.eprintf "error at %d: %s\n" offset e.message)

type parser

Parser instance used by the underlying simdjson library.

type element

JSON value obtained from parsing.

See the lifetime note at the top of this module.

type array_

A JSON array view. It aliases parser-owned storage; see the module lifetime note.

type object_

A JSON object view. It aliases parser-owned storage; see the module lifetime note.

type array_iter

Low-level iterator over an array. Most callers should use array_iter, array_fold, array_to_seq, or array_to_list.

type object_iter

Low-level iterator over an object. Most callers should use object_iter, object_fold, object_to_seq, or object_to_list.

type buffer =
  (char, Bigarray.int8_unsigned_elt, Bigarray.c_layout) Bigarray.Array1.t

Input buffer used for parsing.

The padding constant gives the number of extra trailing bytes that the underlying parser requires.

type element_type = 
  | Array (* JSON array. *)
  | Object (* JSON object. *)
  | Int64 (* Signed integer. *)
  | Uint64 (* Unsigned integer. *)
  | Double (* Floating-point number. *)
  | String (* JSON string. *)
  | Bool (* JSON boolean. *)
  | Null (* JSON null. *)

type error_code = 
  | Success
  | Capacity
  | Memalloc
  | Tape_error
  | Depth_error
  | String_error
  | T_atom_error
  | F_atom_error
  | N_atom_error
  | Number_error
  | Bigint_error
  | Utf8_error
  | Uninitialized
  | Empty
  | Unescaped_chars
  | Unclosed_string
  | Unsupported_architecture
  | Incorrect_type
  | Number_out_of_range
  | Index_out_of_bounds
  | No_such_field
  | Io_error
  | Invalid_json_pointer
  | Invalid_uri_fragment
  | Unexpected_error

type error = {
  code : error_code;
  message : string;
}

Error value returned by parsing and accessor functions.

code is the structured simdjson error code. message is a human-readable diagnostic suitable for logs or Error results.
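As a hedged illustration of using the two fields together, the sketch below branches on code for the cases a caller might want to handle and keeps message for everything else. The helper names classify and lookup, and the grouping of codes into categories, are assumptions made for this example and are not part of the library.

```ocaml
open Simdjsont.Raw

(* Hypothetical helper: group error codes into coarse categories. *)
let classify (e : error) =
  match e.code with
  | No_such_field | Index_out_of_bounds -> `Missing
  | Incorrect_type | Number_out_of_range -> `Wrong_shape
  | _ -> `Other e.message

(* Look up [key] in a JSON object and return the member as compact JSON.
   The result is an OCaml string, so it outlives the parser. *)
let lookup json key =
  let parser = create_parser () in
  match parse_string parser json with
  | Error e -> Error (classify e)
  | Ok root -> (
      match get_object root with
      | Error e -> Error (classify e)
      | Ok obj -> (
          match object_find obj key with
          | Error e -> Error (classify e)
          | Ok v -> Ok (element_to_string v)))
```

Matching on code keeps the control flow independent of the wording of message, which is better reserved for logs.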

exception Parse_error of error

Exception raised by _exn accessors.
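A minimal sketch of catching Parse_error around an _exn accessor follows. The names int_or_default and demo are hypothetical; only the accessors and the exception come from this module.

```ocaml
open Simdjsont.Raw

(* Fall back to [default] when [elt] is not a signed integer. *)
let int_or_default ~default elt =
  try int64_exn elt with Parse_error _ -> default

let demo () =
  let parser = create_parser () in
  match parse_string parser "{\"n\": 7, \"s\": \"seven\"}" with
  | Error e -> failwith e.message
  | Ok root ->
      let obj = object_exn root in
      let field key =
        match object_find obj key with
        | Ok v -> int_or_default ~default:0L v
        | Error _ -> 0L
      in
      (* "n" holds an integer; "s" holds a string, so the default is used. *)
      (field "n", field "s")
```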

val padding : int

Number of padding bytes required at the end of the input buffer.

val create_buffer : int -> buffer

create_buffer len creates a bigstring with room for len data bytes plus the trailing parser padding. Because buffer_length of the returned buffer includes the padding, pass the original data length, not buffer_length, as ~len when parsing.

val buffer_of_string : string -> buffer

buffer_of_string s copies s into a newly allocated padded buffer.

This is convenient and safe for parsing string input. If your data is already in a bigstring, use ensure_padded to avoid a copy when possible.

val buffer_length : buffer -> int

buffer_length buf returns the bigstring capacity, including any parser padding. It is not necessarily the logical JSON document length.

val buffer_blit_string : 
  string ->
  src_pos:int ->
  buffer ->
  dst_pos:int ->
  len:int ->
  unit

buffer_blit_string s ~src_pos buf ~dst_pos ~len copies len bytes from s into buf. This is useful when filling a padded buffer incrementally from a larger input source.
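The buffer functions above can be combined roughly as follows. This is a sketch that fills a padded buffer from a single string; the input string and the name parsed_n are illustrative only.

```ocaml
open Simdjsont.Raw

let json = "{\"n\": 42}"
let len = String.length json

(* Room for [len] data bytes plus the parser padding. *)
let buf = create_buffer len

(* Copy the whole string to the start of the buffer. *)
let () = buffer_blit_string json ~src_pos:0 buf ~dst_pos:0 ~len

let parsed_n =
  let parser = create_parser () in
  (* Pass the data length, not [buffer_length buf], as [~len]. *)
  Result.bind (parse parser buf ~len) (fun root ->
      Result.bind (at_pointer root "/n") get_int64)
```

When the input arrives in chunks, the same pattern applies with a running dst_pos that advances by each chunk's length.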

val ensure_padded : buffer -> len:int -> buffer

ensure_padded buf ~len returns a buffer holding the first len bytes of buf with at least padding bytes of trailing slack, as required by the parser. If buf already has the capacity it is returned unchanged (zero-copy); otherwise the data is copied into a freshly padded buffer.
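As a sketch of the bigstring path, the example below builds an exactly sized bigstring (standing in for data handed over by another library) and pads it before parsing. The helpers unpadded_of_string and parse_bigstring are hypothetical names, and parse_bigstring assumes the whole bigstring is the document.

```ocaml
open Simdjsont.Raw

(* A bigstring sized exactly to its contents, with no trailing padding. *)
let unpadded_of_string s : buffer =
  let len = String.length s in
  let ba = Bigarray.Array1.create Bigarray.char Bigarray.c_layout len in
  String.iteri (fun i c -> Bigarray.Array1.set ba i c) s;
  ba

(* Treat the entire bigstring as one JSON document. *)
let parse_bigstring parser (data : buffer) =
  let len = Bigarray.Array1.dim data in
  (* May return [data] itself or a padded copy, depending on capacity. *)
  let padded = ensure_padded data ~len in
  parse parser padded ~len
```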

val error_message : error_code -> string

Convert an error_code to a message string.

val create_parser : unit -> parser

create_parser () creates a parser, which also owns the documents it parses.

A parser can be reused for multiple parses, but doing so invalidates any elements, arrays, or objects returned by previous parses.
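One way to reuse a parser safely is to copy the needed values out as ordinary OCaml values before the next parse. The sketch below does this with get_string; the function name names_from_two_documents and the sample documents are illustrative.

```ocaml
open Simdjsont.Raw

let names_from_two_documents () =
  let parser = create_parser () in
  let name_of json =
    match parse_string parser json with
    | Error e -> Error e
    | Ok root ->
        (* Copy the value out as an OCaml string now: [root] and anything
           reached from it become invalid on the next parse. *)
        Result.bind (at_pointer root "/name") get_string
  in
  let first = name_of "{\"name\": \"ada\"}" in
  let second = name_of "{\"name\": \"grace\"}" in
  (first, second)
```

Neither result holds an element, so both remain valid after the second parse.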

val free_parser : parser -> unit

free_parser parser releases parser resources eagerly.

Parser values are also reclaimed by the OCaml GC, so most code does not need to call this manually. Never use elements, arrays, or objects obtained from parser after freeing it.
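For code that does want eager release, a scoped helper is one possible pattern. The sketch below uses Fun.protect from the standard library; with_parser and count_items are hypothetical names, and the callback is assumed to return plain OCaml values rather than elements.

```ocaml
open Simdjsont.Raw

(* Run [f] with a fresh parser and free it afterwards, even if [f] raises.
   [f] must not return elements, arrays, or objects. *)
let with_parser f =
  let parser = create_parser () in
  Fun.protect ~finally:(fun () -> free_parser parser) (fun () -> f parser)

let count_items json =
  with_parser (fun parser ->
      match parse_string parser json with
      | Error e -> Error e.message
      | Ok root -> (
          match get_array root with
          | Error e -> Error e.message
          | Ok arr -> Ok (array_length arr)))
```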

val parse : parser -> buffer -> len:int -> (element, error) result

parse parser buf ~len parses the first len bytes from buf. buf must have the trailing parser padding; use ensure_padded if that is not known.

val parse_string : parser -> string -> (element, error) result

parse_string parser json parses json. It is the easiest raw entry point, but it copies through simdjson's string path; use parse for bigstring input.

val parse_file : parser -> string -> (element, error) result

parse_file parser path parses a JSON document from path.

val element_type : element -> element_type

element_type elt returns the runtime JSON type of elt. Use it before selecting a typed accessor when the input shape is dynamic.
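A sketch of dispatching on element_type before choosing an accessor is shown below. The summarize function is a hypothetical helper; it assumes the Unsigned module from the integers library, which the get_uint64 signature already refers to.

```ocaml
open Simdjsont.Raw

(* Produce a short human-readable description of any element. *)
let summarize elt =
  match element_type elt with
  | Null -> "null"
  | Bool -> string_of_bool (bool_exn elt)
  | Int64 -> Int64.to_string (int64_exn elt)
  | Uint64 -> Unsigned.UInt64.to_string (uint64_exn elt)
  | Double -> format_double (float_exn elt)
  (* %S uses OCaml string escaping, which is fine for diagnostics. *)
  | String -> Printf.sprintf "%S" (string_exn elt)
  | Array -> Printf.sprintf "array(%d)" (array_length (array_exn elt))
  | Object -> Printf.sprintf "object(%d)" (object_length (object_exn elt))
```

Because each branch calls the accessor that matches the reported type, the _exn variants are not expected to raise here.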

val get_bool : element -> (bool, error) result

Access a boolean value.

val get_int64 : element -> (int64, error) result

Access a signed 64-bit integer value.

val get_uint64 : element -> (Unsigned.UInt64.t, error) result

Access an unsigned 64-bit integer value.

val get_double : element -> (float, error) result

Access a floating-point value.

val get_string : element -> (string, error) result

Access a string value.

val get_array : element -> (array_, error) result

Access an array value.

val get_object : element -> (object_, error) result

Access an object value.
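The result-returning accessors compose with Result.bind. The sketch below decodes a small record using a locally defined let* operator; the point type and point_of_element are assumptions made for the example.

```ocaml
open Simdjsont.Raw

(* Local binding operator so each step short-circuits on the first error. *)
let ( let* ) = Result.bind

type point = { x : float; y : float }

let point_of_element elt =
  let* obj = get_object elt in
  let* x_elt = object_find obj "x" in
  let* y_elt = object_find obj "y" in
  let* x = get_double x_elt in
  let* y = get_double y_elt in
  Ok { x; y }
```

The returned record contains only floats, so it stays valid after the parser is reused or freed.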

val bool_exn : element -> bool

Like get_bool, but raises Parse_error on error.

val int64_exn : element -> int64

Like get_int64, but raises Parse_error on error.

val uint64_exn : element -> Unsigned.UInt64.t

Like get_uint64, but raises Parse_error on error.

val float_exn : element -> float

Like get_double, but raises Parse_error on error.

val string_exn : element -> string

Like get_string, but raises Parse_error on error.

val array_exn : element -> array_

Like get_array, but raises Parse_error on error.

val object_exn : element -> object_

Like get_object, but raises Parse_error on error.

val array_length : array_ -> int

Return the length of an array.

val array_to_seq : array_ -> element Seq.t

Convert an array to a sequence.

val array_to_list : array_ -> element list

Convert an array to a list.

val array_iter : (element -> unit) -> array_ -> unit

Iterate over an array.

val array_fold : ('a -> element -> 'a) -> 'a -> array_ -> 'a

Fold over an array.
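As a brief sketch of the array traversal functions, the helpers below sum an array of integers with array_fold and collect the string members of a mixed array with array_to_seq. Both helper names are hypothetical.

```ocaml
open Simdjsont.Raw

(* Sum an array of integers; raises Parse_error on a non-integer member. *)
let sum_ints arr =
  array_fold (fun acc elt -> Int64.add acc (int64_exn elt)) 0L arr

(* Keep only the members that are strings, skipping everything else. *)
let strings_of arr =
  array_to_seq arr
  |> Seq.filter_map (fun elt -> Result.to_option (get_string elt))
  |> List.of_seq
```

The sequence is consumed immediately by List.of_seq, so no element outlives the parse it came from.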

val object_length : object_ -> int

Return the number of members in an object.

val object_find : object_ -> string -> (element, error) result

Find a member by key.

val object_find_opt : object_ -> string -> element option

Find a member by key, returning None if missing.

val object_to_seq : object_ -> (string * element) Seq.t

Convert an object to a sequence of key/value pairs.

val object_to_list : object_ -> (string * element) list

Convert an object to a list of key/value pairs.

val object_iter : (string -> element -> unit) -> object_ -> unit

Iterate over an object.

val object_fold : ('a -> string -> element -> 'a) -> 'a -> object_ -> 'a

Fold over an object.
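The object functions can be used in a similar way. The sketch below collects keys with object_fold and reads an optional string member with object_find_opt; keys and string_member_or are hypothetical helpers.

```ocaml
open Simdjsont.Raw

(* Collect the keys of an object in the order the fold visits them. *)
let keys obj = List.rev (object_fold (fun acc key _ -> key :: acc) [] obj)

(* Read a string member, using [default] when the key is missing or the
   value is not a string. *)
let string_member_or ~default obj key =
  match object_find_opt obj key with
  | None -> default
  | Some elt -> Result.value (get_string elt) ~default
```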

val at_pointer : element -> string -> (element, error) result

at_pointer root pointer navigates within root using JSON Pointer syntax (RFC 6901), for example "/users/0/name". The returned element aliases the same parser-owned storage as root.
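A short sketch of pointer navigation with an array index follows. The document literal and the name_at helper are illustrative only.

```ocaml
open Simdjsont.Raw

let doc = "{\"users\": [{\"name\": \"ada\"}, {\"name\": \"grace\"}]}"

(* Fetch the name of the user at [index], copying it out as a string. *)
let name_at parser index =
  match parse_string parser doc with
  | Error e -> Error e
  | Ok root ->
      let pointer = Printf.sprintf "/users/%d/name" index in
      Result.bind (at_pointer root pointer) get_string
```

A pointer that does not resolve, such as an index past the end of the array, yields an Error rather than raising.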

val element_to_string : element -> string

element_to_string elt serializes an element back to compact JSON. This is useful for diagnostics or for handing a sub-tree to an API that expects a JSON string.
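For instance, a sub-tree can be extracted as a standalone JSON string and handed to other code. In this sketch, extract_subtree is a hypothetical helper built from at_pointer and element_to_string.

```ocaml
open Simdjsont.Raw

(* Return the sub-tree at [pointer] as a JSON string. The string is an
   ordinary OCaml value and does not depend on the parser afterwards. *)
let extract_subtree json pointer =
  let parser = create_parser () in
  match parse_string parser json with
  | Error e -> Error e
  | Ok root -> Result.map element_to_string (at_pointer root pointer)
```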

val format_double : float -> string

Format a float as a JSON number.

module Stream : sig ... end

Parse multiple JSON documents from a buffer.