This is a comparison of data serialization formats, various ways to convert complex objects to sequences of bits. It does not include markup languages used exclusively as document file formats.
Overview
Name | Creator-maintainer | Based on | Standardized? | Specification | Binary? | Human-readable? | Supports references?[e] | Schema-IDL? | Standard APIs | Supports zero-copy operations |
---|---|---|---|---|---|---|---|---|---|---|
Apache Avro | Apache Software Foundation | — | No | Apache Avro™ Specification | Yes | Partial[g] | — | Built-in | C, C#, C++, Java, PHP, Python, Ruby | — |
Apache Parquet | Apache Software Foundation | — | No | Apache Parquet | Yes | No | No | — | Java, Python, C++ | No |
Apache Thrift | Facebook (creator) Apache (maintainer) | — | No | Original whitepaper | Yes | Partial[c] | No | Built-in | C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi and other languages[1] | — |
ASN.1 | ISO, IEC, ITU-T | — | Yes | ISO/IEC 8824 / ITU-T X.680 (syntax) and ISO/IEC 8825 / ITU-T X.690 (encoding rules) series. X.680, X.681, and X.683 define syntax and semantics. | BER, DER, PER, OER, or custom via ECN | XER, JER, GSER, or custom via ECN | Yes[f] | Built-in | — | OER |
Bencode | Bram Cohen (creator) BitTorrent, Inc. (maintainer) | — | De facto as BEP | Part of BitTorrent protocol specification | Except numbers and delimiters, being ASCII | No | No | No | No | No |
BSON | MongoDB | JSON | No | BSON Specification | Yes | No | No | No | No | No |
Cap'n Proto | Kenton Varda | — | No | Cap'n Proto Encoding Spec | Yes | Partial[h] | No | Yes | No | Yes |
CBOR | Carsten Bormann, P. Hoffman | MessagePack[2] | Yes | RFC 8949 | Yes | No | Yes, through tagging | CDDL | FIDO2 | No |
Comma-separated values (CSV) | RFC author: Yakov Shafranovich | — | Myriad informal variants | RFC 4180 (among others) | No | Yes | No | No | No | No |
Common Data Representation (CDR) | Object Management Group | — | Yes | General Inter-ORB Protocol | Yes | No | Yes | Yes | Ada, C, C++, Java, Cobol, Lisp, Python, Ruby, Smalltalk | — |
D-Bus Message Protocol | freedesktop.org | — | Yes | D-Bus Specification | Yes | No | No | Partial (Signature strings) | Yes | — |
Efficient XML Interchange (EXI) | W3C | XML, Efficient XML | Yes | Efficient XML Interchange (EXI) Format 1.0 | Yes | XML | XPointer, XPath | XML Schema | DOM, SAX, StAX, XQuery, XPath | — |
Extensible Data Notation (edn) | Rich Hickey / Clojure community | Clojure | Yes | Official edn spec | No | Yes | No | No | Clojure, Ruby, Go, C++, Javascript, Java, CLR, ObjC, Python[3] | No |
FlatBuffers | Google | — | No | Flatbuffers GitHub | Yes | Apache Arrow | Partial (internal to the buffer) | Yes | C++, Java, C#, Go, Python, Rust, JavaScript, PHP, C, Dart, Lua, TypeScript | Yes |
Fast Infoset | ISO, IEC, ITU-T | XML | Yes | ITU-T X.891 and ISO/IEC 24824-1:2007 | Yes | No | XPointer, XPath | XML schema | DOM, SAX, XQuery, XPath | — |
FHIR | Health Level 7 | REST basics | Yes | Fast Healthcare Interoperability Resources | Yes | Yes | Yes | Yes | Hapi for FHIR[4] JSON, XML, Turtle | No |
Ion | Amazon | JSON | No | The Amazon Ion Specification | Yes | Yes | No | Ion schema | C, C#, Go, Java, JavaScript, Python, Rust | — |
Java serialization | Oracle Corporation | — | Yes | Java Object Serialization | Yes | No | Yes | No | Yes | — |
JSON | Douglas Crockford | JavaScript syntax | Yes | STD 90/RFC 8259 (ancillary: RFC 6901, RFC 6902), ECMA-404, ISO/IEC 21778:2017 | No, but see BSON, Smile, UBJSON | Yes | JSON Pointer (RFC 6901), or alternately, JSONPath, JPath, JSPON, json:select(); and JSON-LD | Partial (JSON Schema Proposal, ASN.1 with JER, Kwalify, Rx, JSON-LD) | Partial (Clarinet, JSONQuery / RQL, JSONPath), JSON-LD | No |
MessagePack | Sadayuki Furuhashi | JSON (loosely) | No | MessagePack format specification | Yes | No | No | No | No | Yes |
Netstrings | Dan Bernstein | — | No | netstrings.txt | Except ASCII delimiters | Yes | No | No | No | Yes |
OGDL | Rolf Veen | ? | No | Specification | Binary specification | Yes | Path specification | Schema WD | — | |
OPC-UA Binary | OPC Foundation | — | No | opcfoundation.org | Yes | No | Yes | No | No | — |
OpenDDL | Eric Lengyel | C, PHP | No | OpenDDL.org | No | Yes | Yes | No | OpenDDL library | — |
PHP serialization format | PHP Group | — | Yes | No | Yes | Yes | Yes | No | Yes | — |
Pickle (Python) | Guido van Rossum | Python | De facto as PEPs | PEP 3154 – Pickle protocol version 4 | Yes | No | Yes[5] | No | Yes | No |
Property list | NeXT (creator) Apple (maintainer) | ? | Partial | Public DTD for XML format | Yes[a] | Yes[b] | No | ? | Cocoa, CoreFoundation, OpenStep, GnuStep | No |
Protocol Buffers (protobuf) | Google | — | No | Developer Guide: Encoding, proto2 specification, and proto3 specification | Yes | Yes[d] | No | Built-in | C++, Java, C#, Python, Go, Ruby, Objective-C, C, Dart, Perl, PHP, R, Rust, Scala, Swift, Julia, Erlang, D, Haskell, ActionScript, Delphi, Elixir, Elm, GopherJS, Haxe, JavaScript, Kotlin, Lua, Matlab, Mercury, OCaml, Prolog, Solidity, TypeScript, Vala, Visual Basic | No |
S-expressions | John McCarthy (original) Ron Rivest (internet draft) | Lisp, Netstrings | Largely de facto | "S-Expressions" Internet Draft | Yes, canonical representation | Yes, advanced transport representation | No | No | — | |
Smile | Tatu Saloranta | JSON | No | Smile Format Specification | Yes | No | Yes | Partial (JSON Schema Proposal, other JSON schemas/IDLs) | Partial (via JSON APIs implemented with Smile backend, on Jackson, Python) | — |
SOAP | W3C | XML | Yes | W3C Recommendations: SOAP/1.1 SOAP/1.2 | Partial (Efficient XML Interchange, Binary XML, Fast Infoset, MTOM, XSD base64 data) | Yes | Built-in id/ref, XPointer, XPath | WSDL, XML schema | DOM, SAX, XQuery, XPath | — |
Structured Data eXchange Formats | Max Wildgrube | — | Yes | RFC 3072 | Yes | No | No | No | — | |
UBJSON | The Buzz Media, LLC | JSON, BSON | No | ubjson.org | Yes | No | No | No | No | — |
eXternal Data Representation (XDR) | Sun Microsystems (creator) IETF (maintainer) | — | Yes | STD 67/RFC 4506 | Yes | No | Yes | Yes | Yes | — |
XML | W3C | SGML | Yes | W3C Recommendations: 1.0 (Fifth Edition) 1.1 (Second Edition) | Partial (Efficient XML Interchange, Binary XML, Fast Infoset, XSD base64 data) | Yes | XPointer, XPath | XML schema, RELAX NG | DOM, SAX, XQuery, XPath | — |
XML-RPC | Dave Winer[6] | XML | No | XML-RPC Specification | No | Yes | No | No | No | No |
YAML | Clark Evans, Ingy döt Net, and Oren Ben-Kiki | C, Java, Perl, Python, Ruby, Email, HTML, MIME, URI, XML, SAX, SOAP, JSON[7] | No | Version 1.2 | No | Yes | Yes | Partial (Kwalify, Rx, built-in language type-defs) | No | No |
- ^ [a] The current default format is binary.
- ^ [b] The "classic" format is plain text, and an XML format is also supported.
- ^ [c] Theoretically possible due to abstraction, but no implementation is included.
- ^ [d] The primary format is binary, but text and JSON formats are available.[8][9]
- ^ [e] Means that generic tools/libraries know how to encode, decode, and dereference a reference to another piece of data in the same document. A tool may require the IDL file, but no more. Excludes custom, non-standardized referencing techniques.
- ^ [f] ASN.1 has X.681 (Information Object System), X.682 (Constraints), and X.683 (Parameterization) that allow for the precise specification of open types where the types of values can be identified by integers, by OIDs, etc. OIDs are a standard format for globally unique identifiers, as well as a standard notation ("absolute reference") for referencing a component of a value. For example, PKIX uses such notation in RFC 5912. With such notation (constraints on parameterized types using information object sets), generic ASN.1 tools/libraries can automatically encode/decode/resolve references within a document.
- ^ [g] The primary format is binary, but a JSON encoder is available.[10]
- ^ [h] The primary format is binary, but a text format is available.
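The note on reference support above counts a format as supporting references when generic tools can dereference a pointer to another piece of data in the same document. For JSON, the table lists JSON Pointer (RFC 6901) as one such mechanism. As a rough illustration, the following Python sketch resolves a pointer against already-parsed JSON. The function name `resolve_pointer` and the sample document are made up for this example, and the sketch omits the error handling a real library would need.

```python
import json

def resolve_pointer(document, pointer):
    """Resolve an RFC 6901 JSON Pointer against parsed JSON data."""
    if pointer == "":
        return document  # the empty pointer refers to the whole document
    if not pointer.startswith("/"):
        raise ValueError("a non-empty JSON Pointer must start with '/'")
    current = document
    for token in pointer[1:].split("/"):
        # Unescape "~1" before "~0", in the order RFC 6901 prescribes.
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]  # array elements are addressed by index
        else:
            current = current[token]       # object members are addressed by name
    return current

doc = json.loads('{"42": true, "A to Z": [1, 2, 3]}')
print(resolve_pointer(doc, "/A to Z/1"))  # second element of the "A to Z" array
```

The pointer is a plain string, so it can be stored inside the document itself, which is what makes it usable as an in-document reference.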
Syntax comparison of human-readable formats
Format | Null | Boolean true | Boolean false | Integer | Floating-point | String | Array | Associative array/Object |
---|---|---|---|---|---|---|---|---|
ASN.1 (XML Encoding Rules) |
<foo />
|
<foo>true</foo>
|
<foo>false</foo>
|
<foo>685230</foo>
|
<foo>6.8523015e+5</foo>
|
<foo>A to Z</foo>
|
<SeqOfUnrelatedDatatypes>
<isMarried>true</isMarried>
<hobby />
<velocity>-42.1e7</velocity>
<bookname>A to Z</bookname>
<bookname>We said, "no".</bookname>
</SeqOfUnrelatedDatatypes>
|
An object (the key is a field name):
<person>
<isMarried>true</isMarried>
<hobby />
<height>1.85</height>
<name>Bob Peterson</name>
</person>
A data mapping (the key is a data value):
<competition>
<measurement>
<name>John</name>
<height>3.14</height>
</measurement>
<measurement>
<name>Jane</name>
<height>2.718</height>
</measurement>
</competition>
|
CSV[b] | null[a] (or an empty element in the row)[a] |
1[a] true[a]
|
0[a] false[a]
|
685230 -685230[a]
|
6.8523015e+5[a]
|
A to Z "We said, ""no""."
|
true,,-42.1e7,"A to Z"
|
42,1 A to Z,1,2,3 |
edn | nil
|
true
|
false
|
685230 -685230
|
6.8523015e+5
|
"A to Z" , "A \"up to\" Z"
|
[true nil -42.1e7 "A to Z"]
|
{:kw 1, "42" true, "A to Z" [1 2 3]}
|
Ion |
|
true
|
false
|
685230 -685230 0xA74AE 0b10100111010010101110
|
6.8523015e5
|
"A to Z" '''
|
[true, null, -42.1e7, "A to Z"]
|
{'42': true, 'A to Z': [1, 2, 3]}
|
Netstrings[c] | 0:,[a] 4:null,[a]
|
1:1,[a] 4:true,[a]
|
1:0,[a] 5:false,[a]
|
6:685230,[a]
|
9:6.8523e+5,[a]
|
6:A to Z,
|
29:4:true,0:,7:-42.1e7,6:A to Z,,
|
41:9:2:42,1:1,,25:6:A to Z,12:1:1,1:2,1:3,,,,[a]
|
JSON | null
|
true
|
false
|
685230 -685230
|
6.8523015e+5
|
"A to Z"
|
[true, null, -42.1e7, "A to Z"]
|
{"42": true, "A to Z": [1, 2, 3]}
|
OGDL | null[a]
|
true[a]
|
false[a]
|
685230[a]
|
6.8523015e+5[a]
|
"A to Z" 'A to Z' NoSpaces
|
true null -42.1e7 "A to Z"
|
42 true "A to Z" 1 2 3 42 true "A to Z", (1, 2, 3) |
OpenDDL | ref {null}
|
bool {true}
|
bool {false}
|
int32 {685230} int32 {0x74AE} int32 {0b111010010101110}
|
float {6.8523015e+5}
|
string {"A to Z"}
|
Homogeneous array:
int32 {1, 2, 3, 4, 5}
Heterogeneous array:
array { bool {true} ref {null} float {-42.1e7} string {"A to Z"} } |
dict { value (key = "42") {bool {true}} value (key = "A to Z") {int32 {1, 2, 3}} } |
PHP serialization format | N;
|
b:1;
|
b:0;
|
i:685230; i:-685230;
|
d:685230.15;[d] d:INF; d:-INF; d:NAN;
|
s:6:"A to Z";
|
a:4:{i:0;b:1;i:1;N;i:2;d:-421000000;i:3;s:6:"A to Z";}
|
Associative array:
a:2:{i:42;b:1;s:6:"A to Z";a:3:{i:0;i:1;i:1;i:2;i:2;i:3;}}
Object:
O:8:"stdClass":2:{s:4:"John";d:3.14;s:4:"Jane";d:2.718;}[d]
|
Pickle (Python) | N.
|
I01\n.
|
I00\n.
|
I685230\n.
|
F685230.15\n.
|
S'A to Z'\n.
|
(lI01\na(laF-421000000.0\naS'A to Z'\na.
|
(dI42\nI01\nsS'A to Z'\n(lI1\naI2\naI3\nas.
|
Property list (plain text format)[11] |
— | <*BY>
|
<*BN>
|
<*I685230>
|
<*R6.8523015e+5>
|
"A to Z"
|
( <*BY>, <*R-42.1e7>, "A to Z" )
|
{ "42" = <*BY>; "A to Z" = ( <*I1>, <*I2>, <*I3> ); } |
Property list (XML format)[12] |
— | <true />
|
<false />
|
<integer>685230</integer>
|
<real>6.8523015e+5</real>
|
<string>A to Z</string>
|
<array>
<true />
<real>-42.1e7</real>
<string>A to Z</string>
</array>
|
<dict>
<key>42</key>
<true />
<key>A to Z</key>
<array>
<integer>1</integer>
<integer>2</integer>
<integer>3</integer>
</array>
</dict>
|
Protocol Buffers | — | true
|
false
|
685230 -685230
|
20.0855369
|
"A to Z"
|
field1: "value1"
field1: "value2"
field1: "value3"
anotherfield { foo: 123 bar: 456 }
anotherfield { foo: 222 bar: 333 } |
thing1: "blahblah"
thing2: 18923743
thing3: -44
thing4 {
submessage_field1: "foo"
submessage_field2: false
}
enumeratedThing: SomeEnumeratedValue
thing5: 123.456
[extensionFieldFoo]: "etc"
[extensionFieldThatIsAnEnum]: EnumValue
|
S-expressions | NIL nil
|
T #t[f] true
|
NIL #f[f] false
|
685230
|
6.8523015e+5
|
abc "abc" #616263# 3:abc {MzphYmM=} |YWJj|
|
(T NIL -42.1e7 "A to Z")
|
((42 T) ("A to Z" (1 2 3)))
|
YAML | ~ null Null NULL [13]
|
y Y yes Yes YES on On ON true True TRUE [14]
|
n N no No NO off Off OFF false False FALSE [14]
|
685230 +685_230 -685230 02472256 0x_0A_74_AE 0b1010_0111_0100_1010_1110 190:20:30 [15]
|
6.8523015e+5 685.230_15e+03 685_230.15 190:20:30.15 .inf -.inf .Inf .INF .NaN .nan .NAN [16]
|
A to Z "A to Z" 'A to Z'
|
[y, ~, -42.1e7, "A to Z"]
- y
-
- -42.1e7
- A to Z |
{"John":3.14, "Jane":2.718}
42: y
A to Z: [1, 2, 3] |
XML[e] and SOAP | <null />[a]
|
true
|
false
|
685230
|
6.8523015e+5
|
A to Z
|
<item>true</item>
<item xsi:nil="true"/>
<item>-42.1e7</item>
<item>A to Z</item>
|
<map>
<entry key="42">true</entry>
<entry key="A to Z">
<item val="1"/>
<item val="2"/>
<item val="3"/>
</entry>
</map>
|
XML-RPC | <value><boolean>1</boolean></value>
|
<value><boolean>0</boolean></value>
|
<value><int>685230</int></value>
|
<value><double>6.8523015e+5</double></value>
|
<value><string>A to Z</string></value>
|
<value><array>
<data>
<value><boolean>1</boolean></value>
<value><double>-42.1e7</double></value>
<value><string>A to Z</string></value>
</data>
</array></value>
|
<value><struct>
<member>
<name>42</name>
<value><boolean>1</boolean></value>
</member>
<member>
<name>A to Z</name>
<value>
<array>
<data>
<value><int>1</int></value>
<value><int>2</int></value>
<value><int>3</int></value>
</data>
</array>
</value>
</member>
</struct></value>
|
- ^ [a] Omitted XML elements are commonly decoded by XML data binding tools as NULLs. Shown here is another possible encoding; XML schema does not define an encoding for this datatype.
- ^ [b] The RFC CSV specification only deals with delimiters, newlines, and quote characters; it does not directly deal with serializing programming data structures.
- ^ [c] The netstrings specification only deals with nested byte strings; anything else is outside the scope of the specification.
- ^ [d] PHP will unserialize any floating-point number correctly, but will serialize them to their full decimal expansion. For example, 3.14 will be serialized to 3.140000000000000124344978758017532527446746826171875.
- ^ [e] XML data bindings and SOAP serialization tools provide type-safe XML serialization of programming data structures into XML. Shown are XML values that can be placed in XML elements and attributes.
- ^ [f] This syntax is not compatible with the Internet-Draft, but is used by some dialects of Lisp.
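The long expansion quoted in the PHP note above is the exact decimal value of the IEEE 754 double closest to 3.14, so it is not specific to PHP. One way to see this, sketched here in Python rather than PHP, is to convert the float to `decimal.Decimal`, which preserves the stored binary value exactly.

```python
from decimal import Decimal

# Decimal(float) converts the binary double exactly, with no rounding,
# so it exposes every digit of the stored value.
exact = Decimal(3.14)
print(exact)  # 3.140000000000000124344978758017532527446746826171875

# Parsing the long expansion back yields the same double, which is why
# output of this kind still round-trips correctly.
assert float(str(exact)) == 3.14
```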
Comparison of binary formats
Format | Null | Booleans | Integer | Floating-point | String | Array | Associative array/object |
---|---|---|---|---|---|---|---|
ASN.1 (BER, PER or OER encoding) |
NULL type | BOOLEAN:
|
INTEGER:
|
REAL:
|
Multiple valid types (VisibleString, PrintableString, GeneralString, UniversalString, UTF8String) | Data specifications SET OF (unordered) and SEQUENCE OF (guaranteed order) | User definable type |
BSON | \x0A (1 byte) |
True: \x08\x01 False: \x08\x00 (2 bytes) |
int32: 32-bit little-endian 2's complement or int64: 64-bit little-endian 2's complement | Double: little-endian binary64 | UTF-8-encoded, preceded by int32-encoded string length in bytes | BSON embedded document with numeric keys | BSON embedded document |
Concise Binary Object Representation (CBOR) | \xf6 (1 byte) |
(1 byte) |
|
|
|
|
|
Efficient XML Interchange (EXI)[a] (Unpreserved lexical values format) |
xsi:nil is not allowed in binary context. | 1–2 bit integer interpreted as boolean. | Boolean sign, plus arbitrary length 7-bit octets, parsed until most-significant bit is 0, in little-endian. The schema can set the zero-point to any arbitrary number. Unsigned skips the boolean flag. |
|
Length prefixed integer-encoded Unicode. Integers may represent enumerations or string table entries instead. | Length prefixed set of items. | Not in protocol. |
FlatBuffers | Encoded as absence of field in parent object |
(1 byte) |
Little-endian 2's complement signed and unsigned 8/16/32/64 bits | UTF-8-encoded, preceded by 32-bit integer length of string in bytes | Vectors of any other type, preceded by 32-bit integer length of number of elements | Tables (schema defined types) or Vectors sorted by key (maps / dictionaries) | |
Ion[18] | \x0f [b]
|
|
|
|
|
\xbx Arbitrary length and overhead. Length in octets.
|
|
MessagePack | \xc0
|
|
|
Typecode (1 byte) + IEEE single/double |
encoding is unspecified[19] |
|
|
Netstrings[c] | Not in protocol. | Not in protocol. | Not in protocol. | Not in protocol. | Length-encoded as an ASCII string + ':' + data + ',' Length counts only octets between ':' and ',' |
Not in protocol. | Not in protocol. |
OGDL Binary | |||||||
Property list (binary format) |
|||||||
Protocol Buffers |
|
UTF-8-encoded, preceded by varint-encoded integer length of string in bytes | Repeated value with the same tag or, for varint-encoded integers only, values packed contiguously and prefixed by tag and total byte length | — | |||
Smile | \x21
|
|
|
IEEE single/double, BigDecimal
|
Length-prefixed "short" Strings (up to 64 bytes), marker-terminated "long" Strings and (optional) back-references | Arbitrary-length heterogeneous arrays with end-marker | Arbitrary-length key/value pairs with end-marker |
Structured Data eXchange Formats (SDXF) | Big-endian signed 24-bit or 32-bit integer | Big-endian IEEE double | Either UTF-8 or ISO 8859-1 encoded | List of elements with identical ID and size, preceded by array header with int16 length | Chunks can contain other chunks to arbitrary depth. | ||
Thrift |
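The Protocol Buffers row describes strings as UTF-8 bytes preceded by a varint-encoded byte length. The Python sketch below illustrates that layout under simplifying assumptions: it handles non-negative integers only and leaves out the field tag that precedes each value in a real Protocol Buffers message. The function names `encode_varint` and `encode_length_prefixed` are invented for this example.

```python
def encode_varint(value):
    """Encode a non-negative integer as a base-128 varint."""
    if value < 0:
        raise ValueError("this sketch handles non-negative integers only")
    out = bytearray()
    while True:
        low7 = value & 0x7F
        value >>= 7
        if value:
            out.append(low7 | 0x80)  # more groups follow: set the continuation bit
        else:
            out.append(low7)         # last group: continuation bit clear
            return bytes(out)

def encode_length_prefixed(text):
    """Return the varint byte length followed by the UTF-8 bytes of text."""
    payload = text.encode("utf-8")
    return encode_varint(len(payload)) + payload

print(encode_varint(685230).hex())        # aee929
print(encode_length_prefixed("A to Z"))   # b'\x06A to Z'
```

Small values occupy a single byte, which is the main reason for using a variable-length length prefix instead of a fixed 32-bit one.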
- ^ [a] Any XML-based representation can be compressed with, or generated directly as, EXI ("Efficient XML Interchange (EXI) Format 1.0 (Second Edition)"[17]), which is a "schema-informed" (as opposed to schema-required or schema-less) binary compression standard for XML.
- ^ [b] All basic Ion types have a null variant, as its 0xXf tag. Any tag beginning with 0x0X other than 0x0f defines ignored padding.
- ^ [c] Interpretation of Netstrings is entirely application- or schema-dependent.
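The netstring framing described in the table (ASCII length, ':', data, ',') is small enough to sketch in a few lines of Python. The helper names `netstring` and `encode_list` are hypothetical. Representing a list as a netstring whose payload is a run of netstrings is only the convention used in this article's examples, not something the netstrings specification defines.

```python
def netstring(data):
    """Wrap a byte string as <length>:<data>, with the length in ASCII decimal."""
    return str(len(data)).encode("ascii") + b":" + data + b","

def encode_list(items):
    """Encode a list of byte strings as a netstring of concatenated netstrings."""
    return netstring(b"".join(netstring(item) for item in items))

print(netstring(b"A to Z"))
# b'6:A to Z,'
print(encode_list([b"true", b"", b"-42.1e7", b"A to Z"]))
# b'29:4:true,0:,7:-42.1e7,6:A to Z,,'
```

Because the length counts only the octets between ':' and ',', a decoder can skip over a nested value without parsing its contents.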
References
- ^ Apache Thrift
- ^ Bormann, Carsten (2018-12-26). "CBOR relationship with msgpack". GitHub. Retrieved 2023-08-14.
- ^ "Implementations". GitHub.
- ^ "HAPI FHIR - The Open Source FHIR API for Java". hapifhir.io.
- ^ cpython/Lib/pickle.py
- ^ "A Brief History of SOAP". www.xml.com.
- ^ Ben-Kiki, Oren; Evans, Clark; Net, Ingy döt (2009-10-01). "YAML Ain't Markup Language (YAML) Version 1.2". The Official YAML Web Site. Retrieved 2012-02-10.
- ^ "text_format.h - Protocol Buffers". Google Developers.
- ^ "JSON Mapping - Protocol Buffers". Google Developers.
- ^ "Avro Json Format".
- ^ "NSPropertyListSerialization class documentation". www.gnustep.org. Archived from the original on 2011-05-19. Retrieved 2009-10-28.
- ^ "Documentation Archive". developer.apple.com.
- ^ Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-01-18). "Null Language-Independent Type for YAML Version 1.1". YAML.org. Retrieved 2009-09-12.
- ^ a b Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-01-18). "Boolean Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. Retrieved 2009-09-12.
- ^ Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-02-11). "Integer Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. Retrieved 2009-09-12.
- ^ Oren Ben-Kiki; Clark Evans; Brian Ingerson (2005-01-18). "Floating-Point Language-Independent Type for YAML Version 1.1". YAML.org. Clark C. Evans. Retrieved 2009-09-12.
- ^ "Efficient Extensible Interchange".
- ^ Ion Binary Encoding
- ^ "MessagePack is an extremely efficient object serialization library. It's like JSON, but very fast and small.: msgpack/msgpack". 2 April 2019 – via GitHub.