JSound 2.0

JSound

The complete reference

Edition specification version 2.0.8 for JSound 2.0

Daniela Florescu

Edited by

Ghislain Fourny

ETH Zurich
2018

Abstract

This document is a description of JSound, the JSON schema definition language. It describes how to declare constraints on the structure of JSON documents, and how to process instances of the TYSON syntax for typed values.
1. Introduction
1.1. Requirements
2. Concepts
2.1. JSON
2.2. TYSON
2.3. Candidate instance
2.4. Annotated instance
2.5. Schema document
2.6. Schema set
2.7. Meta schema
2.8. Type
2.9. Type names
2.10. Lexical and value space
2.11. Type hierarchy
2.12. Compact syntax
3. Schema Documents
3.1. Scope
3.2. Schema document properties
3.3. Type names and references to types
3.4. Recursive definitions
3.5. Examples
3.6. Types
3.7. Derived type properties
3.8. Type hierarchy
4. Atomic Types
4.1. Scope
4.2. Examples
4.3. Builtin Atomic Types
4.4. Atomic facets
4.5. Mapping to a few languages
4.6. Validation
5. Object Types
5.1. Scope
5.2. Examples
5.3. Builtin Object Type
5.4. Object Facets
5.4.1. content
5.4.2. closed
5.5. Validation
6. Array Types
6.1. Scope
6.2. Examples
6.3. Builtin Array Type
6.4. Array facets
6.4.1. content
6.4.2. minLength
6.4.3. maxLength
6.5. Validation
7. Union Types
7.1. Scope
7.2. Examples
7.3. Union facets
7.3.1. content
7.4. Validation
8. Processing
8.1. Schema document soundness (aka compilation)
8.2. Validation
8.2.1. Validating a candidate instance
8.2.2. Mandatory validation against instance annotation
8.3. Annotation
8.3.1. Annotating a candidate instance
9. Shortcuts (Compact JSound)
9.1. Avoid verbosity
9.2. General mapping
9.3. Object type simple syntax
9.4. Array type simple syntax
9.5. Union simple syntax
9.6. Examples
10. Schema of Schemas
11. Error codes
A. Revision History
Index

Chapter 1. Introduction

Over the past decade, the need for more flexible and scalable databases has greatly increased. The NoSQL universe brings many new ideas on how to build both scalable data storage and scalable computing infrastructures.
XML and JSON are probably the two most popular data formats that emerged. While XML reached a level of maturity that gives it an enterprise-ready status, JSON databases are still in their early stages. Scalable data stores, such as MongoDB, ElasticSearch, Cosmos DB (DocumentDB), CouchDB and Couchbase, are already available. One fundamental piece for a full-fledged JSON database is a way to make sure that the data stored is consistent and sound. This is where schemas come into play.
Many lessons can be learned from 40 years of relational database history and 15 years of XML. The goal of this document is to introduce a schema language, JSound, which is much simpler than XML Schema and JSON Schema, just as JSON syntax is much simpler than XML syntax. At the same time, JSound brings many lessons from the design of XML Schema, building on the shoulders of a giant. A comparison between JSound and JSON Schema is available here.

1.1. Requirements

The JSound schema definition language is based on the following requirements:
  • JSound MUST be independent of any host query language or host programming language. It MUST be compatible with any host language (Python, Java, Scala, JavaScript, JSONiq...).
  • JSound MUST largely reuse the XML Schema paradigm, in order to build on the shoulders of giants.
  • While the schema definition language is greatly inspired by XML Schema, it MUST avoid its complexity. It must be simpler and more readable.
  • In particular, JSound MUST avoid the XML Schema model of restriction/extension of structured types. Instead, it MUST support a different, very simple model of subtyping based on classical object-oriented inheritance. In JSound, a subtype's value space MUST always be a subset of its base type's value space, with no exceptions.
  • A candidate instance for validation MUST be well-formed TYSON. This includes in particular JSON documents.
  • A schema document must be a well-formed JSON document in the sense that it parses against the JSON grammar.
  • A schema document MUST be valid in the sense that there is a JSound metaschema document against which all schema documents (including itself) are valid.
  • A validation engine MUST be able, if the validation is successful, to output a TYSON instance with all values annotated with the types according to the schema.
  • JSound must support a few additional atomic types widely acknowledged to be useful (dates, times, binaries, durations...) and that JSON does not support. These types must be compatible for use in TYSON syntax. These types must also be fully compatible with XML Schema atomic types for interoperability between data stores and exchange formats regardless of the syntax used.
  • JSound MUST support an alternate, simplified syntax. It should be possible, given a TYSON document, to turn it into a schema using that simplified syntax, against which it is valid with minimal changes.

Chapter 2. Concepts

2.1. JSON

JSON is a very simple, widely established syntax for storing and exchanging objects, arrays and atomic values that are strings, numbers, booleans and nulls.
{
  "name" : "Cooper",
  "first" : "Sheldon",
  "middle" : "Lee",
  "multiplication" : "MITOSIS",
  "birthdate" : "1980-02-26",
  "picture" : "0123456789abcdef",
  "friends" : [ 1, 2, 4, 5 ]
}

2.2. TYSON

TYSON, defined in a specification of its own, is a superset of JSON that supports type annotations in a language-agnostic way.
Object, array and atomic types can be added and defined by users at will, and used as annotations.
Validation of types beyond TYSON builtin types (object, array, string, integer, decimal, double, boolean, null) is beyond the scope of TYSON. This is where JSound comes into play. JSound allows users to define their own types, and perform validation and annotation of TYSON documents.
("person") {
  "name" : "Cooper",
  "first" : "Sheldon",
  "middle" : "Lee",
  "multiplication" : ("cell-division-kind") "MITOSIS",
  "birthdate" : ("date") "1980-02-26",
  "picture" : ("hexBinary") "0123456789abcdef",
  "friends" : ("ids") [ 1, 2, 4, 5 ]
}

2.3. Candidate instance

This is a TYSON value. A candidate instance may or may not be valid against a type.
A candidate instance may be in particular a JSON value or even, often, a JSON object.
In the TYSON model, values can be objects, arrays, or atomics.
  • An object has an unordered list of field/value pairs, and is annotated with an object type (most often implicitly the builtin "object" type).
  • An array has an ordered list of values, and is annotated with an array type (most often implicitly the builtin "array" type).
  • An atomic has a value annotated with an atomic type.
Typically, the candidate instance will have been freshly parsed. In the case that it is JSON, it will only have atomics of type string, integer, decimal, double, boolean and null, as well as "raw" object and array types.

2.4. Annotated instance

An annotated instance is a TYSON value. It is obtained from a valid candidate instance by annotating all values with the types, as specified in the schemas. If the type annotations in the candidate instance are already identical to those expected by the schema, it is identical to the candidate instance.

2.5. Schema document

A schema document is a JSON object which defines types against which candidate instances are being validated. A schema document is also a candidate instance and must be valid against the meta schema type. A schema document must also fulfill additional consistency constraints not captured by the meta schema.

2.6. Schema set

A schema set is a set of schemas that may refer to each other's type names. Schema sets may not have colliding type names.
How to assemble schema documents into consistent schema sets is out of scope of this specification. In particular, additional specifications may introduce naming conventions, such as the use of namespaces, to prevent collisions.

2.7. Meta schema

A meta schema is a JSON object that defines the type against which all schema documents are valid, including itself.

2.8. Type

A schema document defines types, which may or may not be anonymous. A candidate instance may or may not be valid against a type.
A candidate instance can be annotated against a type, which results in an annotated instance.
There are four kinds of types: atomic, array, object and union. Types are represented with objects that are nested in a schema document. Named types can also be referred to with their names.

2.9. Type names

Type names are strings. Some strings are reserved by TYSON. Some others are reserved by JSound.
The type names defined and reserved by TYSON are: object, array, integer, decimal, double, string, boolean, null.
The type names defined and reserved by JSound are: value, atomic, anyURI, base64Binary, hexBinary, date, dateTime, time, dateTimeStamp, duration.
All above types are called builtin types.

2.10. Lexical and value space

The lexical space of an atomic type is the set of all (string) literals used to serialize this type in some syntax. For example, lexical values of the type boolean are "true" and "false".
The value space of an atomic type is the set of all (logical) typed values. For example, typed values of the type boolean are the mathematical concepts of trueness and falseness (boolean algebra).
An atomic type also includes a mapping from lexical to typed values, as well as a canonical mapping from a typed value to a "default" literal.
These definitions are consistent with the lexical and value spaces of XML Schema atomic types.
JSound focuses on validating lexical values, that is, literals. Converting these literals to typed values for further processing is left to the host language, based on its own specific type system.
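The distinction between the lexical space, the value space and the two mappings can be illustrated with a minimal Python sketch for boolean. It assumes that the lexical space consists only of the two literals named above; the function names parse_boolean and canonical_boolean are hypothetical and not part of JSound.

```python
# Lexical mapping for boolean: literal (lexical space) -> typed value (value space).
# Assumption: only the two literals named in the text are in the lexical space.
LEXICAL_TO_VALUE = {"true": True, "false": False}


def parse_boolean(literal: str) -> bool:
    """Map a literal to its typed value, rejecting literals outside the lexical space."""
    if literal not in LEXICAL_TO_VALUE:
        raise ValueError(f"{literal!r} is not in the lexical space of boolean")
    return LEXICAL_TO_VALUE[literal]


def canonical_boolean(value: bool) -> str:
    """Canonical mapping from a typed value back to its default literal."""
    return "true" if value else "false"
```

A JSound validator only needs the membership test on the literal; the conversion performed by parse_boolean is what the host language would do afterwards.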

2.11. Type hierarchy

JSound types are organized in a subtype hierarchy. There is a number of builtin types, and a schema document can define new derived types that restrict the value space of their base types.
JSound's type hierarchy is very strict about value spaces. The value space of a derived type is always a subset of the value space of its base type. To restrict its base type's value space, a derived type can provide new facets, or make existing facets more restrictive.
The subtype relationship is defined as the reflexive and transitive closure of the derived type relationship.
The supertype relationship is defined as the reflexive and transitive closure of the base type relationship.
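As an illustration, the subtype relationship can be computed by walking the base type chain. The sketch below is in Python and assumes a hypothetical lookup table BASE_TYPE that maps each user-defined type name to the name of its base type; the two entries mirror the "digits" and "few-digits" examples of Chapter 4. It relies on the absence of cycles in the base type relation.

```python
# Hypothetical table: derived type name -> base type name.
BASE_TYPE = {"digits": "integer", "few-digits": "digits"}


def is_subtype(candidate, ancestor, base_type=BASE_TYPE):
    """Reflexive and transitive closure of the derived type relationship."""
    current = candidate
    while current is not None:
        if current == ancestor:  # reflexive case, or reached through the chain
            return True
        current = base_type.get(current)  # None once the table has no entry
    return False
```

The supertype relationship is the same test with the arguments swapped.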

2.12. Compact syntax

JSound also has a very compact syntax making schemas look similar to actual candidate instances. This syntax is only syntactic sugar for the actual, more verbose JSound schema syntax. The design goal was that 80% of the schemas used in real life can be expressed with that compact syntax for high productivity, and the more verbose syntax can be used for more complex user needs.

Chapter 3. Schema Documents

3.1. Scope

Schema documents define multiple types.

3.2. Schema document properties

Schema documents are (serialized) JSON objects which have the following properties:
  • metadata (JSON object): contains general schema properties such as name (string/anyURI), previous (string/anyURI), date (string/date), authors (array of strings).
  • types (JSON array containing objects representing types): the types defined in this document. What these objects look like is explained in subsequent sections.

3.3. Type names and references to types

Type names are strings. Type names are used to (optionally) name types, or to refer to another type as a base type.
Types can be referred to by their type name across an entire schema set.
It is forbidden to define types with builtin names (either TYSON or JSound). Otherwise, a static error JDST0013 is raised.
When a type name cannot be resolved, a static error JDST0002 is raised.
If two types in the same schema set have the same name, a static error JDST0014 is raised. It is thus the responsibility of the user not to assemble a schema set from schemas that use colliding type names. Grouping type names in namespaces, as well as machinery to include namespaces in type names, is out of scope of this specification, but conventions may be added at a later point in time.
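The three static checks described in this section can be sketched in Python as follows. This is a simplified illustration, not a complete implementation: the function name check_type_names and the symbolic labels it returns are hypothetical (they stand for the three static errors described above), and only references made through baseType are resolved, not those nested in facets.

```python
# Builtin type names reserved by TYSON and by JSound (section 2.9).
BUILTIN_NAMES = {
    "object", "array", "integer", "decimal", "double", "string", "boolean", "null",
    "value", "atomic", "anyURI", "base64Binary", "hexBinary", "date", "dateTime",
    "time", "dateTimeStamp", "duration",
}


def check_type_names(schema_set):
    """Return (label, type name) pairs for a schema set given as a list of documents."""
    problems = []
    seen = set()
    for document in schema_set:
        for type_ in document.get("types", []):
            name = type_.get("name")
            if name is None:
                continue  # anonymous types have no name to check
            if name in BUILTIN_NAMES:
                problems.append(("builtin-name", name))
            elif name in seen:
                problems.append(("duplicate-name", name))
            seen.add(name)
    known = BUILTIN_NAMES | seen
    for document in schema_set:
        for type_ in document.get("types", []):
            base = type_.get("baseType")
            if base is not None and base not in known:
                problems.append(("unresolved-name", base))
    return problems
```

Names are collected over the whole schema set before any reference is resolved, so a type may refer to a type defined later or in another document.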

3.4. Recursive definitions

Recursive definitions are allowed in general, for example, the type of a value in a pair of an object type may be identical to that object type to allow arbitrary nesting.
However, there are a few restrictions in order to avoid cycles in the type hierarchy: a type may not be derived from itself, directly or indirectly, and a union type may not be in the transitive closure of its own membership relationship.
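The first restriction amounts to detecting a cycle in the base type relation. A minimal Python sketch follows; the function name find_base_type_cycle and the dictionary representation (type name mapped to base type name) are assumptions made for illustration. The check on union membership would follow the same pattern over a different relation and is not shown.

```python
def find_base_type_cycle(base_type):
    """Return a type name that is derived from itself, or None if there is no cycle.

    base_type maps each user-defined type name to the name of its base type.
    """
    for start in base_type:
        visited = set()
        current = start
        # Follow the chain until it leaves the user-defined types (builtin reached).
        while current in base_type:
            if current in visited:
                return current
            visited.add(current)
            current = base_type[current]
    return None
```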

3.5. Examples

This schema document defines two atomic types with the names "small-number" and "big-number".
{
  "types" : [
    {
      "name" : "small-number",
      "kind" : "atomic",
      "baseType" : "integer",
      "enumeration" : [ 1, 2, 4, 8 ]
    },
    {
      "name" : "big-number",
      "kind" : "atomic",
      "baseType" : "integer",
      "enumeration" : [ 1000, 2000, 4000, 8000 ]
    }
  ]
}
This schema document defines one object type named "small-and-big".
{
  "types" : [
    {
      "name" : "small-and-big",
      "kind" : "object",
      "content" : [
        {
          "name" : "small",
          "type" : "small-number",
          "required" : true
        },
        {
          "name" : "big",
          "type" : "big-number"
        }
      ]
    }
  ]
}
Given this set of two schema documents, the following JSON object:
{
  "small" : 4
}
is valid against the type named "small-and-big".
This JSON object is not valid against the type "small-and-big", because the value associated with "big" is not in the value space of the atomic type "big-number".
{
  "small" : 4,
  "big" : 3
}

3.6. Types

There are four kinds of types: atomic, object, array and union.
Types are either builtin or derived.
The topmost type is builtin and is named "value".
The topmost object type is builtin and is named "object".
The topmost array type is builtin and is named "array".
The topmost atomic type is builtin and is named "atomic". There are also further builtin atomic types such as "date", "time", "hexBinary", etc.
Derived types are always defined by restricting the value space of a base type by means of facets. They have a JSON object representation.
Derived object types may be derived from any other object type. Derived array types may be derived from any other array type. Derived union types are always directly derived from "value". Derived atomic types may be derived from any other atomic type except "atomic".

3.7. Derived type properties

A derived type has the following properties:
  • kind (string): the kind of the type. One of "atomic", "object", "array", "union".
  • name (string): a string containing the name (as defined above) of this type.
  • baseType (string): a string containing the name of the base type.
  • metadata (value): free content (documentation, comments, ...).
  • various facet properties. Which facets are available depends on the kind of the type.
There are the following constraints on these properties:
  • An error JDST0001 is raised if kind is missing.
  • An error JDST0003 is raised if any other value is encountered for kind.
  • If name is used in a top-level type, it must match the corresponding object field, otherwise an error JDST0004 is raised.
  • baseType must refer to a known type - builtin, in the same schema document or in another schema document in the schema set. Otherwise, a static error JDST0002 is raised.
  • Recursion is not allowed through the baseType property, i.e., no cycles in the base type relation are allowed. If a cycle is detected, then error JDST0018 is raised.
  • If kind is "object", baseType must be the name of an existing object type. If absent, it is by default "object".
  • If kind is "array", baseType must be the name of an existing array type. If absent, it is by default "array".
  • If kind is "union", baseType must be "value" if provided. If absent, it is by default "value".
  • If kind is "atomic", baseType must be the name of an existing atomic type, but not "atomic". It cannot be absent.
If kind and baseType are not consistent as specified above, a static error JDST0007 is raised.
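The defaulting and consistency rules for kind and baseType can be sketched in Python as follows. The function name resolve_base_type and the lookup table kind_of (known type name mapped to its kind) are hypothetical. Raising JDST0007 for an atomic type without baseType is an assumption drawn from the last sentence above.

```python
# Default base type per kind; atomic has no default.
DEFAULT_BASE = {"object": "object", "array": "array", "union": "value"}


def resolve_base_type(type_, kind_of):
    """Return the effective base type name, or raise ValueError carrying an error code."""
    kind = type_.get("kind")
    if kind is None:
        raise ValueError("JDST0001")
    if kind not in ("atomic", "object", "array", "union"):
        raise ValueError("JDST0003")
    base = type_.get("baseType", DEFAULT_BASE.get(kind))
    if base is None:
        raise ValueError("JDST0007")  # assumption: atomic types must name a base type
    if base not in kind_of:
        raise ValueError("JDST0002")
    if kind == "union":
        consistent = base == "value"
    elif kind == "atomic":
        consistent = kind_of[base] == "atomic" and base != "atomic"
    else:
        consistent = kind_of[base] == kind
    if not consistent:
        raise ValueError("JDST0007")
    return base
```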
Here is an example of an invalid schema document, because it does not fulfill many of the above constraints.
{
  "types" : [
    {
      "name" : "type1",
      "kind" : "atomic",
      "baseType" : "object", (: base type MUST also be an atomic type :)
      "maxInclusive" : 4
    },
    {
      "name" : "object1",
      "kind" : "object",
      "baseType" : "type1", (: base type MUST be "object":)
      "content" : {}
    },
    {
      "name" : "object2",
      "kind" : "object",
      "baseType" : "object1" (: base type MUST be "object":)
    }
  ]
}
A derived type inherits its base type's facets. It can set a new facet if its base type does not already have it, or it can redefine a facet. However, a facet can only be redefined if its new value is more restrictive than in its base type; otherwise, a static error JDST0005 is raised.
Some facets, like constraints, are accumulative facets. This means that redefining them is done by adding new constraints to the already present ones, without raising JDST0007.
There are two facets common to all types:
  • enumeration (array): Constrains a value space to a specified set of values. If one of these values is not valid against the type at hand, then JDST0006 is raised.
  • constraints (array of strings): implementation-defined constraints defined in an implementation-defined host language, the choice of which is orthogonal to this specification.
{
  "types" : [
    {
      "name" : "two-objects",
      "kind" : "object",
      (: "baseType" : "object" is implicit :)
      "enumeration" : [ { "foo" : "bar" }, {} ] (: only these two objects :)
    },
    {
      "name" : "uniform-array",
      "kind" : "array",
      (: "baseType" : "array" is implicit :)
      "constraints" : [ "every $i in 1 to size($$) satisfies deep-equals($$($i), $$(1))" ]  (: all members must be the same :)
    }
  ]
}
The following JSON object is valid against type "two-objects".
{ "foo" : "bar" }
The following JSON array is valid against type "uniform-array".
[ 42, 42, 42 ]

3.8. Type hierarchy

The entire type hierarchy is shown below. "value", "atomic" and user union types are not instantiable and cannot be used as type annotations in TYSON syntax. An instantiable subtype must be used instead.

Chapter 4. Atomic Types

4.1. Scope

Atomic types match atomics (TYSON leaf values: strings, numbers, booleans, nulls, etc).
Atomic types, except the topmost "atomic", have a lexical space (a set of literals denoting the values), a value space (a set of actual values), and a lexical mapping which maps the former into the latter. This is consistent with the atomic type requirements drawn out in the TYSON specification. These types may thus all be used as TYSON type annotations.
An atomic type can be either the topmost atomic, or a primitive builtin type, or a builtin type derived from a primitive type, or a user-defined type derived from any other atomic type (except atomic).
A derived atomic type can be defined by restricting the value space of another (non-topmost) atomic type by specifying atomic facets. A restriction can also be made with the general facet enumeration.

4.2. Examples

Given the following schema document:
{
  "types" : [
    {
      "name" : "foo-and-bar",
      "kind" : "atomic",
      "baseType" : "string",
      "enumeration" : [ "foo", "bar" ]
    },
    {
      "name" : "digits",
      "kind" : "atomic",
      "baseType" : "integer",
      "minInclusive" : 1,
      "maxExclusive" : 10
    },
    {
      "name" : "few-digits",
      "kind" : "atomic",
      "baseType" : "digits",
      "enumeration" : [ 4, 6 ]
    }
  ]
}
The strings "foo" and "bar" are valid against the type named "foo-and-bar". The string "foobar" and the array [ "foo", "bar" ] are not.
The atomics (integers) 2 and 7 are valid against the type named "digits". The string "2", the integer 0 and the array [ "foo", "bar" ] are not.
The integer 4 is valid against the type named "few-digits". The integer 2, the integer 0 and the array [ "foo", "bar" ] are not.

4.3. Builtin Atomic Types

A number of builtin atomic types are predefined. Most of them have counterparts in XML Schema 1.1, because they are very useful also in JSON (for example: dates, times, ...). In particular, they have the same value space, the same lexical space, the same lexical mapping and (for primitive types) the same associated set of atomic facets.
Some of these builtin types are primitive and marked as such below. Others are derived from another builtin type.
There is also a special builtin type atomic, which is a supertype of all primitive types and, by transitivity, of all atomic types.
The lexical space of dateTime as defined in XML Schema 1.1 is a superset of the date representation defined in ECMAScript. In addition, JSound extends the lexical representation of respectively date, time, dateTime defined above, to allow the format defined in RFC 2822 (nonterminals date, time, date-time respectively). This is because many JavaScript implementations do so.

4.4. Atomic facets

Restriction is done using the general facets, or the following atomic facets (they must be available for the base type).
These facets are defined in XML Schema 1.1. For convenience, the summary from the XML Schema 1.1 specification is provided below. Which primitive type has which facets is defined in XML Schema 1.1 as well.
The following atomic facets are available for the primitive types string, anyURI, base64Binary, hexBinary:
  • length (integer): Constraining a value space to values with a specific number of units of length, where units of length varies depending on the base type.
  • minLength (integer): Constraining a value space to values with at least a specific number of units of length, where units of length varies depending on the base type.
  • maxLength (integer): Constraining a value space to values with at most a specific number of units of length, where units of length varies depending on the base type.
The following atomic facets are available for the primitive types date, dateTime, time, duration, decimal, double:
  • maxInclusive (atomic): Constraining a value space to values with a specific inclusive upper bound.
  • maxExclusive (atomic): Constraining a value space to values with a specific exclusive upper bound.
  • minExclusive (atomic): Constraining a value space to values with a specific exclusive lower bound.
  • minInclusive (atomic): Constraining a value space to values with a specific inclusive lower bound.
The following atomic facets are available for the primitive type decimal:
  • totalDigits (integer): Restricting the magnitude and arithmetic precision of values in the value spaces of decimal and datatypes derived from it.
  • fractionDigits (integer): Placing an upper limit on the arithmetic precision of decimal values.
The following atomic facets are available for the primitive types date, dateTime, time:
  • explicitTimezone ("required", "prohibited" or "optional"): Requiring or prohibiting the time zone offset in date/time datatypes.
All above facets are non-accumulative. Facets are all inherited by a derived type from its base type, but can be overridden. The error JDST0007 is raised if they are redefined in a less restrictive way.
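What "less restrictive" means for the bound and length facets can be illustrated with a short Python sketch. The function name is_more_restrictive is hypothetical. The sketch compares a facet only with its own previous value; it does not cover explicitTimezone or the interplay between inclusive and exclusive bounds, and it assumes that facet values of the same type are comparable in the host language.

```python
UPPER_BOUNDS = ("maxInclusive", "maxExclusive", "maxLength", "totalDigits", "fractionDigits")
LOWER_BOUNDS = ("minInclusive", "minExclusive", "minLength")


def is_more_restrictive(facet, base_value, new_value):
    """True if redefining facet from base_value to new_value does not widen the value space."""
    if facet in UPPER_BOUNDS:
        return new_value <= base_value  # an upper bound may only go down
    if facet in LOWER_BOUNDS:
        return new_value >= base_value  # a lower bound may only go up
    if facet == "length":
        return new_value == base_value  # an exact length cannot change
    raise ValueError(f"unsupported facet: {facet}")
```

A schema compiler would raise the static error named above whenever this function returns False for a redefined facet.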

4.5. Mapping to a few languages

This non-normative section shows how builtin types can be mapped to existing languages (Python, JavaScript). Asterisks (*) indicate an impedance mismatch where the mapping is not perfect.

Table 4.1. Mapping of atomic types

JSound          Python            JavaScript
string          str               String
boolean         bool              Boolean
null            any type          Null
double          float             Number
decimal         float*            Number*
integer         int               Number*
anyURI          str               String
base64Binary    bytes             Uint8Array
hexBinary       bytes             Uint8Array
date            date              N/A
dateTime        datetime          N/A
time            time              N/A
dateTimeStamp   aware datetime    Date
duration        timedelta         N/A
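As a non-normative illustration of the Python column of Table 4.1, the sketch below converts literals into Python values using only the standard library. The names CONVERTERS and to_python are hypothetical. Two caveats apply: decimal is mapped to decimal.Decimal rather than to the float listed in the table, as one way around the mismatch marked with an asterisk, and the standard library parsers only approximate the lexical spaces of XML Schema 1.1 (they do not accept every literal that XML Schema allows).

```python
import base64
from datetime import date, datetime, time
from decimal import Decimal

# JSound builtin type name -> function turning a literal (str) into a Python value.
CONVERTERS = {
    "string": str,
    "anyURI": str,
    "integer": int,
    "double": float,
    "decimal": Decimal,  # design choice of this sketch, see the note above
    "hexBinary": bytes.fromhex,
    "base64Binary": base64.b64decode,
    "date": date.fromisoformat,
    "time": time.fromisoformat,
    "dateTime": datetime.fromisoformat,
}


def to_python(type_name, literal):
    """Convert a literal of the given builtin type into a Python value."""
    return CONVERTERS[type_name](literal)
```

For instance, to_python("date", "1980-02-26") yields a datetime.date object for the birthdate used in the TYSON example of Chapter 2.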

4.6. Validation

Validation of a candidate instance I against an atomic type T is defined as follows.
If I is not an atomic value, then I is not valid against T.
Otherwise, if T is "atomic", then I is valid against T.
Otherwise, if T is a primitive atomic type, then I is valid against T if its lexical representation is in the lexical space of this primitive atomic type.
Otherwise, I is valid against T if it is valid against all its facets (which means that the lexical representation is in the lexical space of the derived type). Note that, by design, this also implies that it is valid against the base type of T.
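The procedure above can be sketched in Python for types derived from integer. This is an illustration under simplifying assumptions: it works on already parsed values rather than on literals, it only handles the bound facets and enumeration, and the table TYPES and the function is_valid_integer are hypothetical. The two entries reproduce the "digits" and "few-digits" types from section 4.2.

```python
TYPES = {
    "digits": {"kind": "atomic", "baseType": "integer",
               "minInclusive": 1, "maxExclusive": 10},
    "few-digits": {"kind": "atomic", "baseType": "digits",
                   "enumeration": [4, 6]},
}


def is_valid_integer(instance, type_name, types=TYPES):
    """Validate a parsed value against a type derived (possibly indirectly) from integer."""
    # bool is a subclass of int in Python, so it is excluded explicitly.
    if isinstance(instance, bool) or not isinstance(instance, int):
        return False
    # Check the facets of the type, then those inherited from each base type.
    while type_name != "integer":
        facets = types[type_name]
        if "enumeration" in facets and instance not in facets["enumeration"]:
            return False
        if "minInclusive" in facets and instance < facets["minInclusive"]:
            return False
        if "minExclusive" in facets and instance <= facets["minExclusive"]:
            return False
        if "maxInclusive" in facets and instance > facets["maxInclusive"]:
            return False
        if "maxExclusive" in facets and instance >= facets["maxExclusive"]:
            return False
        type_name = facets["baseType"]
    return True
```

Because every facet along the chain is checked, a value accepted for "few-digits" is necessarily accepted for "digits" and for integer, which matches the note above on validity against the base type.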

Chapter 5. Object Types

5.1. Scope

Object types match objects.
An object type can be defined by restricting the value space of "object" by specifying a layout (type of the values, required or not, ...). A restriction can also be made with the general type facets "enumeration" and "constraints".
Any object type may be used as a TYSON type annotation.

5.2. Examples

Given the following object types:
{
  "types" : [
    {
      "name" : "only-foo",
      "kind" : "object",
      "content" : [
        {
          "name" : "foo",
          "type" : "string",
          "required" : true 
        }
      ],
      "closed" : true
    },
    {
      "name" : "foo-bar-and-arrays",
      "kind" : "object",
      "content" : [
        {
          "name" : "foo",
          "type" : "string",
          "required" : true
        },
        {
          "name" : "bar",
          "type" : "boolean"
        }
      ]
    }
  ]
}
The objects { "foo" : "bar" } and { "foo" : "foo" } are valid against the type "only-foo" because the foo pairs are present, and are strings.
The object {} is not because the foo pair is missing.
The object { "foo" : "bar", "bar" : "foo" } is not because no other pair than "foo" is allowed (closed object type).
The objects { "foo" : "bar" } and { "foo" : "bar", "bar" : true, "foobar" : [ 3.14 ] } are valid against the type "foo-bar-and-arrays" because the foo pairs are strings, bar is not required and the object type is open.
The objects {} and { "bar" : "foo" } and { "foo" : "bar", "bar" : "foo" } are not because the foo pair is missing or the bar pair is not a boolean.

5.3. Builtin Object Type

There is one topmost, builtin object type named object, against which all objects are valid.
This topmost type has its content facet as the empty array, and its closed facet as false.

5.4. Object Facets

Restriction is done using the general facets, or the following object facets.

5.4.1. content

The content facet is an array of objects containing the layout definition. By default, it is the empty array.
Each member in content is called a field descriptor. Each field descriptor has the following properties.
  • name (string): the field name - required (JDST0008 if absent).
  • type (string or object) - required (JDST0008 if absent): the name of a type (a string) or the type itself (an object) that the value must match. JDST0002 is raised if it is a name that cannot be resolved.
  • required (boolean) - optional: indicates that the pair is required. Default is false.
  • default (value) - optional: indicates a default value to be taken if the value is missing in the serialized instance. required is then ignored. Adding the default value to a valid instance is performed as part of the annotation process.
  • unique (boolean) - optional: indicates that the field value must be unique within the parent array, if the object type is used as content of an array type. Validity against unique is handled as part of array validation (see the appropriate section). Default is false.
If the closed facet of the base type is true, it is not allowed to add field descriptors for fields not present in the content facet of base type, or of its transitive closure, otherwise JDST0010 is raised.
The content facet is cumulative, in that the baseType's field descriptors are automatically inherited if absent in the derived type, and the values of the field descriptors are inherited as well if the field descriptor is present in both the base type and the derived type, but a field descriptor value is absent in the derived type.
If a field descriptor in the content facet is overridden, it must be more restrictive, i.e., its type must be a subtype of the type associated to this field by the closest super type which does so. Also, required, if true, cannot be set back to false. If any of these two constraints is not met, JDST0011 is raised.
The value of unique cannot be changed when redefining a field descriptor of the base type.
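The cumulative behaviour of the content facet can be sketched in Python as a merge along the base type chain. The function name effective_content and the table representation (type name mapped to its JSON object) are hypothetical. The sketch only computes the inherited descriptors; it does not perform the checks that lead to JDST0010 or JDST0011.

```python
def effective_content(type_name, types):
    """Merge field descriptors from the topmost user-defined type down to type_name."""
    chain = []
    while type_name != "object":
        chain.append(types[type_name])
        type_name = types[type_name].get("baseType", "object")
    merged = {}
    for type_ in reversed(chain):  # base types first, derived types last
        for descriptor in type_.get("content", []):
            # Properties present in the derived descriptor override inherited ones;
            # properties absent in the derived descriptor are kept from the base.
            merged.setdefault(descriptor["name"], {}).update(descriptor)
    return list(merged.values())
```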
An object $o is valid against the content facet if the following conditions are met:
  • For each field descriptor $v such that $v.required is true and $v.default is absent, there must be a pair named $v.name in $o.
  • For each field descriptor $v in the content facet, if $o.($v.name) exists, then $o.($v.name) must be valid against the type $v.type.

5.4.2. closed

The closed facet is a boolean. It specifies whether pairs not specified in the content facet are to be refused. The default is the same as the baseType, in particular, the default is false if baseType is "object".
If the closed facet of the base type is true, it cannot be set back to false, otherwise JDST0009 is raised.
All objects are valid against the closed facet if it is set to false. If it is set to true, an object $o is valid against the closed facet if all its fields have corresponding field descriptors in the content facet, or in the content facet of any super type.

5.5. Validation

Validation of a candidate instance I against an object type T is defined as follows.
If I is not an object, then I is not valid against T.
If I is an object, it is valid against T if it is valid against both of the facets "content" and "closed" of T. Note that, by design, this also implies that it is valid against the base type of T.
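Object validation against the two facets can be sketched in Python as follows. The function names are hypothetical, the content argument is assumed to already include inherited field descriptors, and the validation of field values is delegated to a caller-supplied function is_valid; the helper is_valid_builtin only covers the two builtin types needed for the examples of section 5.2.

```python
def is_valid_object(instance, content, closed, is_valid):
    """Validate a parsed object against the content and closed facets."""
    if not isinstance(instance, dict):
        return False
    declared = set()
    for descriptor in content:
        name = descriptor["name"]
        declared.add(name)
        if name in instance:
            if not is_valid(instance[name], descriptor["type"]):
                return False
        elif descriptor.get("required", False) and "default" not in descriptor:
            return False  # a required pair without default is missing
    if closed and not set(instance) <= declared:
        return False  # a closed type refuses pairs without field descriptor
    return True


def is_valid_builtin(value, type_name):
    """Toy validator for the two builtin types used in the examples."""
    return isinstance(value, {"string": str, "boolean": bool}[type_name])
```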

Chapter 6. Array Types

6.1. Scope

Array types match arrays.
An array type can be defined by restricting the value space of an array base type by specifying a layout (type of the members) or size bounds. A restriction can also be made with the general type facet enumeration.
Any array type may be used as a TYSON type annotation.

6.2. Examples

{
  "types" : [
    {
      "name" : "strings",
      "kind" : "array",
      "content" : "string"
    },
    {
      "name" : "less-than-five-members",
      "kind" : "array",
      "content" : "string",
      "maxLength" : 5
    },
    {
      "name" : "all-less-than-ten",
      "kind" : "array",
      "content" : "integer"
    }
  ]
}
[ "foo", "bar" ] is valid against the type named "strings" but not [ 1, 2, "foo" ].
[ "foo", "bar" ] is valid against the type named "less-than-five-members" but not [ "foo", "foo", "foo", "foo", "foo", "foo" ].
[ 1, 3, 5 ] is valid against the type named "all-less-than-ten" but not [ 1, 3, 72, null ].

6.3. Builtin Array Type

There is one topmost, builtin array type named array, against which all arrays are valid.

6.4. Array facets

Restriction is done using the general facet enumeration, or the following three array facets: content, minLength and maxLength.
These three facets are, by default, inherited from the base type but can be overridden. JDST0007 is raised if one of the facets is redefined by the derived type in a less restrictive way than in its base type.

6.4.1. content

The content facet is a string or an object: the name of a type (a string) or the type itself (an object) that all members must match.
The content facet of the "array" type is absent and all arrays are valid against it.

6.4.2. minLength

The minLength facet is an integer. It constrains the number of members in the instance with a lower bound.
The minLength facet of the "array" type is 0 and all arrays are valid against it.

6.4.3. maxLength

The maxLength facet is an integer. It constrains the number of members in the instance with an upper bound.
The maxLength facet of the "array" type is positive infinity and all arrays are valid against it.

6.5. Validation

Validation of a candidate instance I against an array type T is defined as follows.
If I is not an array, then I is not valid against T.
If I is an array, it is valid against T if it is valid against all three facets "content", "minLength" and "maxLength" of T. Note that, by design, this also implies that it is valid against the base type of T.
Furthermore, if the content of the array type is an object type and the "unique" field descriptor is used in the content of that object type, then across all object children of the candidate instance, all compound values associated with fields marked as unique must appear at most once. Structured values are recursively compared. Atomic values are compared in their value space.
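The validation rules above can be sketched in a few lines. This is an illustration under stated assumptions: BUILTIN_CHECKS is a hypothetical stand-in for real type resolution and only knows two builtin names, the content facet is assumed to be a type name, and uniqueness is approximated by comparing canonical JSON serializations rather than values in their value space (so 1 and 1.0 are treated as different here).

```python
import json
import math

# Hypothetical membership tests for two builtin type names.
BUILTIN_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
}

def is_valid_array(instance, array_type):
    """Check an instance against the content, minLength and maxLength facets."""
    if not isinstance(instance, list):
        return False
    if len(instance) < array_type.get("minLength", 0):
        return False
    if len(instance) > array_type.get("maxLength", math.inf):
        return False
    content = array_type.get("content")
    if content is not None:
        return all(BUILTIN_CHECKS[content](member) for member in instance)
    return True

def unique_fields_hold(members, unique_fields):
    """Check that each field marked as unique has no repeated value."""
    for field in unique_fields:
        seen = set()
        for member in members:
            if not isinstance(member, dict) or field not in member:
                continue
            # Serializing with sorted keys compares structured values recursively.
            key = json.dumps(member[field], sort_keys=True)
            if key in seen:
                return False
            seen.add(key)
    return True
```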

Chapter 7. Union Types

7.1. Scope

The value space of a union type is the union of the value spaces of all its member types.
All union types have directly the topmost "value" as their base type and restrict the value space by specifying the content facet. General facets cannot be used. "value" can be considered a builtin union type with "object", "array" and "atomic" as its member types.
A union type MAY NOT be used as a type annotation in TYSON, otherwise an error JDST0012 is raised. A member type of the union type must be used instead.
A consequence of the restrictive way in which union types are defined is that a union type will always be expressible as a flat union of non-union types.
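The flattening property can be illustrated with a short sketch. The function name flatten_union and the representation of a schema set as a dictionary from type names to verbose type objects are assumptions made for this example; names that are not found in the dictionary are treated as builtin non-union types, and the membership relation is assumed to be free of cycles.

```python
def flatten_union(type_ref, types):
    """Return the flat list of non-union member types reachable from type_ref."""
    if isinstance(type_ref, str):
        definition = types.get(type_ref)
        if definition is None or definition.get("kind") != "union":
            return [type_ref]  # builtin or user-defined non-union type
        members = definition["content"]
    elif type_ref.get("kind") == "union":
        members = type_ref["content"]  # anonymous union type
    else:
        return [type_ref]  # anonymous non-union type
    flat = []
    for member in members:
        for leaf in flatten_union(member, types):
            if leaf not in flat:
                flat.append(leaf)
    return flat
```

For instance, a union of "string" and another union of "integer" and "string" flattens to the two non-union types "string" and "integer".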

7.2. Examples

{
  "types" : [
    {
      "name" : "string-or-integer-array",
      "kind" : "union",
      "content" : [ "string", { "kind" : "array", "content" : "integer" } ]
    },
    {
      "name" : "just-two",
      "kind" : "union",
      "content" : [ "string", { "kind" : "array", "content" : "integer" } ]
    }
  ]
}
"foo", "bar" and [ 1, 2, 3 ] are valid against the type named "string-or-integer-array" but 3.14 and true are not.
"foo" and [ 1, 2, 3, 4 ] are valid against the type named "just-two" but [ null ] and 3.14 are not.

7.3. Union facets

The specification of member types is done using one (compulsory) content facet, and optionally general facets.
Union types can be derived from another union type by overriding the content facet. JDST0017 is raised if it does not hold that all members of the overriding content facet are subtypes (directly or indirectly) of a member of the content facet of the base type.
The content facet of the topmost "value" type is implicitly [ "object", "array", "atomic" ]: its value space contains all possible JSON/TYSON values.

7.3.1. content

The content facet is an array of strings or objects. Each member in the array is the name of a type (string) or the member type itself (an object). A value is valid against this facet if it is valid against any of the types in this list.
It is forbidden for a union type to appear in its own member list. The same is also forbidden transitively, i.e., a union type cannot appear in its transitive membership relation. If such a cycle is detected, then error JDST0018 is raised.
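One possible way to detect such a cycle is a depth-first traversal of the membership relation. The sketch below is only an illustration: the function name has_union_cycle is hypothetical, the schema set is assumed to be a dictionary from type names to verbose type objects, and only members given by name are followed (anonymous member types are ignored).

```python
def has_union_cycle(types):
    """Return True if a union type appears in its own transitive membership."""
    def named_members(name):
        definition = types.get(name)
        if definition is None or definition.get("kind") != "union":
            return []
        return [m for m in definition["content"] if isinstance(m, str)]

    visiting, done = set(), set()

    def visit(name):
        if name in done:
            return False
        if name in visiting:
            return True  # reached a type that is still being explored
        visiting.add(name)
        found = any(visit(member) for member in named_members(name))
        visiting.discard(name)
        done.add(name)
        return found

    return any(visit(name) for name in types)
```

An implementation would raise JDST0018 when this function returns True.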

7.4. Validation

Validation of a candidate instance I against a union type T is defined as follows.
I is valid against T if it is valid against at least one member type (in the content facet of T).
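A minimal sketch of this rule follows. It is not a complete validator: BUILTIN_CHECKS is a hypothetical stand-in for type resolution that only knows three builtin names, and only anonymous union and array types are handled.

```python
# Hypothetical membership tests for a few builtin type names.
BUILTIN_CHECKS = {
    "string": lambda v: isinstance(v, str),
    "integer": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "null": lambda v: v is None,
}

def is_valid(instance, type_ref):
    """Validate an instance against a type name or an anonymous type object."""
    if isinstance(type_ref, str):
        return BUILTIN_CHECKS[type_ref](instance)
    kind = type_ref["kind"]
    if kind == "union":
        # Valid as soon as one member type accepts the instance.
        return any(is_valid(instance, member) for member in type_ref["content"])
    if kind == "array":
        if not isinstance(instance, list):
            return False
        content = type_ref.get("content")
        return content is None or all(is_valid(m, content) for m in instance)
    raise NotImplementedError(kind)
```

With a union of "string" and an array of "integer", the values "foo" and [ 1, 2, 3 ] are accepted, while 3.14, true and [ null ] are rejected.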

Chapter 8. Processing

This section introduces the processing mechanism involving JSound schemas: validation and annotation.

8.1. Schema document soundness (aka compilation)

When a schema document is provided, it must be checked for consistency. For example, a type may not override its base type's facets in a less restrictive way. If an inconsistency is discovered in a schema document, a static error is raised.

8.2. Validation

A candidate instance can be validated against a type. The validation action takes a candidate instance and a type (typically, a schema set and the name of a type defined in one of its schema documents). It results in a boolean that describes whether the candidate instance is valid against the type given its definition in the schema set. If the candidate instance is not valid, false is returned. The validation errors can be obtained in an implementation-defined way.
The exact machinery for providing schema sets and selecting a type to validate against is left out of scope.

8.2.1. Validating a candidate instance

The validation process is defined recursively in each one of the sections presenting atomic, object, array and union types.

8.2.2. Mandatory validation against instance annotation

If I has a TYSON type annotation U, it must also be validated against U, in addition to being valid against T. Otherwise, overall validation fails and an error JDST0015 is thrown. This means that if an object has an annotation U that differs from the expected schema type T, it must be validated twice for the validation step to succeed. Implementations can cache validity flags to avoid exponential complexity when this propagates recursively.
If the type U annotating an object in TYSON is not present in the schema set, an error JDST0016 is thrown.
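The double check can be sketched as follows. Everything about the representation is an assumption made for illustration: the annotation is passed as a separate type name rather than parsed from TYSON, the schema set is a dictionary from type names to types, is_valid is a caller-supplied validation function, and JSoundError is a hypothetical exception class carrying the error code. Caching of validity flags is not shown.

```python
class JSoundError(Exception):
    """Hypothetical exception carrying a JSound error code."""
    def __init__(self, code, message):
        super().__init__(f"{code}: {message}")
        self.code = code

def validate(value, expected_type, schema_set, is_valid, annotation=None):
    """Validate against the expected type T and, if present, the annotation U."""
    if annotation is not None:
        if annotation not in schema_set:
            raise JSoundError("JDST0016", "unknown annotating type " + annotation)
        if not is_valid(value, schema_set[annotation]):
            raise JSoundError("JDST0015", "instance is not valid against " + annotation)
    return is_valid(value, expected_type)
```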

8.3. Annotation

Annotation is the action of passing a valid candidate instance through a type (identified with a name in a schema set) and recursively annotating an object or an array with the current schema type as well as determining against which schema type the object values or array members will be annotated.
Like the validation action, the annotation action takes a candidate instance and a type (typically, a schema set and the name of a type defined in one of its schema documents), and results in a set of annotations on the instance that describe the types which the nested instances match. The annotation process also populates the instance with default values whenever values are missing and the type definition specifies default values.
When a type cannot be found that matches a given nested candidate instance, an error is raised. Annotating an invalid instance raises an error as well.

8.3.1. Annotating a candidate instance

Annotation of a candidate instance I against a type T is defined as follows, recursively.
If I is not valid against T, an error JDST0017 is thrown.
If T is an anonymous type, in which case T has a named base type, then the annotation of I against T is the annotation of I against the base type of T.
If T is a union type, then the annotation of I against T is the annotation of I against the first member type of T against which it is valid.
If I is an atomic value (and thus T is an atomic type), it bears a type annotation (implicit or explicit) U against which it is valid. If U is a subtype of T, then I keeps its annotation U. Otherwise, I is annotated with T. In any case, the lexical value is unchanged.
If I is an object O (and thus T is an object type), it bears a type annotation (implicit or explicit) U against which it is valid. If U is a subtype of T, then O keeps its annotation U. Otherwise, O is annotated with T. Furthermore, the object value of I is recursively (that is, each child value) annotated according to the expected types in the schema definition of the chosen annotation (either T or U). In addition, when a field is missing, and the target type (T or U) specifies a default value for this field, then the annotated object is extended with this field and default value.
If I is an array A (and thus T is an array type), it bears a type annotation (implicit or explicit) U against which it is valid. If U is a subtype of T, then A keeps its annotation U. Otherwise, A is annotated with T. Furthermore, A is recursively (that is, each member) annotated according to the expected types in the schema definition of the chosen annotation (either T or U).
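Two steps of this procedure lend themselves to a short sketch: choosing between the existing annotation U and the schema type T, and extending an object with default values. The names below are hypothetical, the type hierarchy is assumed to be given as a dictionary base_of from a type name to the name of its base type, and a missing annotation is represented by None.

```python
def is_subtype(u, t, base_of):
    """Walk the baseType chain of u, which is assumed to be free of cycles."""
    while u is not None:
        if u == t:
            return True
        u = base_of.get(u)
    return False

def choose_annotation(u, t, base_of):
    """Keep the annotation U if it is a subtype of T, otherwise use T."""
    if u is not None and is_subtype(u, t, base_of):
        return u
    return t

def apply_defaults(obj, object_type):
    """Return a copy of obj extended with defaults for missing fields."""
    result = dict(obj)
    for field in object_type.get("content", []):
        if "default" in field and field["name"] not in result:
            result[field["name"]] = field["default"]
    return result
```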

Chapter 9. Shortcuts (Compact JSound)

9.1. Avoid verbosity

JSound as such can be very verbose. For this reason, an alternate, very simple syntax is introduced. It cannot be mixed with the verbose syntax, meaning that these are two different file formats. Special characters (!, ?, =, @ and |) are forbidden in simple syntax except in the way specified. If type names or values contain these characters, the verbose syntax must be used.
A schema expressed in the simple syntax JSound-Compact mirrors the instance layout.
Note that the simple syntax covers only a subset of the features of JSound; this subset was designed to cover 80% of the cases. If features are needed that are not covered by the simple syntax, then the verbose syntax MUST be used.

9.2. General mapping

This chapter describes a mapping M from any type T expressed in the simple syntax to its verbose syntax M(T), recursively.
A JSound-Compact schema is an object mapping type names to types expressed in the simple syntax.
{
  "type1" : <T>,
  "type2" : <U>
}
is recursively mapped to:
{
  "types" : [
    { "name" : "type1",  <M(T)> },
    { "name" : "type2",  <M(U)> }
  ]
}

9.3. Object type simple syntax

An object type can be expressed as a layout in simple syntax.
The layout of an object type in simple syntax matches that of an instance, in that the names of the data fields are also fields in the simple syntax, recursively mapped to the simple syntax of the field descriptors.
An exclamation mark can be used at the end of a key to mark this key as required.
An equal sign can be used to give a default value.
For arrays of objects, @ is used as a suffix to a field to set its field descriptor $unique to true.
{
  "field1" : <T>,
  "field2!" : <U>,
  "field3@" : "<t>=<v>"
}
is recursively mapped to:
{
  "kind" : "object",
  "content" : [
    { "name" : "field1", "type" : <M(T)> },
    { "name" : "field2", "type" : <M(U)>, "required" : true },
    { "name" : "field3", "type" : "<t>", "default" : "<v>", "unique" : true }
  ]
}
Below is a concrete example.
{
  "my-object" : {
    "foo" : "string=foobar",
    "bar" : {
      "foobar!" : "boolean"
    }
  }
}
is a shortcut for the verbose:
{
  "types" : [
    {
      "name" : "my-object",
      "kind" : "object",
      "content" : [
        {
          "name" : "foo",
          "type" : "string",
          "default" : "foobar"
        },
        {
          "name" : "bar",
          "type" : {
            "kind" : "object",
            "content" : [
              {
                "name" : "foobar",
                "type" : "boolean",
                "required" : true
              }
            ]
          }
        }
      ]
    }
  ]
}
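The mapping of object layouts can be sketched as a pair of mutually recursive functions. This is an illustration rather than a reference implementation: the names map_type and map_field are hypothetical, a key is assumed to carry at most one trailing marker (! or @), array layouts are handled as a single-member list, and union strings using | or ? are returned unchanged.

```python
def map_type(compact):
    """Map a type in simple syntax to its verbose form (objects, arrays, names)."""
    if isinstance(compact, dict):
        fields = [map_field(key, value) for key, value in compact.items()]
        return {"kind": "object", "content": fields}
    if isinstance(compact, list):
        return {"kind": "array", "content": map_type(compact[0])}
    return compact  # a type name is kept as is

def map_field(key, value):
    """Map one key/value pair of an object layout to a field descriptor."""
    flags = {}
    if key.endswith("!"):
        key = key[:-1]
        flags["required"] = True
    elif key.endswith("@"):
        key = key[:-1]
        flags["unique"] = True
    if isinstance(value, str) and "=" in value:
        # "<t>=<v>" gives the type name <t> and the default value <v>.
        value, default = value.split("=", 1)
        flags["default"] = default
    return {"name": key, "type": map_type(value), **flags}
```

Applying map_type to the layout of "my-object" above yields the "kind" and "content" fields of the verbose type.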

9.4. Array type simple syntax

An array type can be expressed as an array recursively containing a type in simple syntax.
[ <T> ]
is recursively mapped to:
{
  "kind" : "array",
  "content" : <M(T)>
}
Below is a concrete example.
{
  "my-array" : [ "date" ],
  "my-array-of-objects" : [ { "my-key@" : "string", "foo" : "integer" } ]
}
is a shortcut for
{
  "types" : [
    {
      "name" : "my-array",
      "kind" : "array",
      "content" : "date"
    },
    {
      "name" : "my-array-of-objects",
      "kind" : "array",
      "content" : {
        "kind" : "object",
        "content" : [
          {
            "name" : "my-key",
            "type" : "string",
            "unique" : true
          },
          {
            "name" : "foo",
            "type" : "integer"
          }
        ]
      }
    }
  ]
}

9.5. Union simple syntax

Unions can be expressed as strings using the | symbol. The question mark can be used as a shortcut for "|null". The union simple syntax only works with type names. Type descriptors in simple syntax cannot be nested. More complex unions must use the verbose syntax.
If the | and ? are not used in a string in the simple syntax, the string is simply kept as is, as it is then a type name.
"<t>|<u>|<v>?"
is recursively mapped to:
{
  "kind" : "union",
  "content" : [ "<t>", "<u>", "<v>", "null" ]
}
Below is a concrete example.
{
  "my-union" : "string|integer",
  "string-or-null" : "string?"
}
is a shortcut for
{
  "types" : [
    {
      "name" : "my-union",
      "kind" : "union",
      "content" : [ "string", "integer" ]
    },
    {
      "name" : "string-or-null",
      "kind" : "union",
      "content" : [ "string", "null" ]
    }
  ]
}
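The string mapping for unions is small enough to sketch directly. The function name map_type_string is hypothetical, and the sketch assumes that a question mark, when present, is the last character of the string.

```python
def map_type_string(text):
    """Map a simple-syntax type string to a type name or a verbose union type."""
    if "|" not in text and "?" not in text:
        return text  # a plain type name is kept as is
    nullable = text.endswith("?")
    if nullable:
        text = text[:-1]
    members = text.split("|")
    if nullable:
        members.append("null")  # "?" is a shortcut for "|null"
    return {"kind": "union", "content": members}
```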

9.6. Examples

{
  "mytype" : {
    "foo" : "string",
    "bar" : [ "boolean" ],
    "foobar" : {
      "foo!" : "date",
      "bar@" : "hexBinary?"
    }
  }
}
is a shortcut for:
{
  "types" : [
    {
      "name" : "mytype",
      "kind" : "object",
      "content" : [
        {
          "name" : "foo",
          "type" : "string"
        },
        {
          "name" : "bar",
          "type" : {
            "kind" : "array",
            "content" : "boolean"
          }
        },
        {
          "name" : "foobar",
          "type" : {
            "kind" : "object",
            "content" : [
              {
                "name" : "foo",
                "type" : "date",
                "required" : true
              },
              {
                "name" : "bar",
                "type" : {
                  "kind" : "union",
                  "content" : [ "hexBinary", "null" ]
                },
                "unique" : true
              }
            ]
          }
        }
      ]
    }
  ]
}

Chapter 10. Schema of Schemas

 
{
  "types" : [

    {
      "name" : "schema",
      "kind" : "object",
      "content" : [
        {
          "name" : "metadata",
          "type" : {
            "kind" : "object",
            "content" : [
              {
                "name" : "name",
                "type" : "string"
              },
              {
                "name" : "previous",
                "type" : "string"
              },
              {
                "name" : "date",
                "type" : "string"
              },
              {
                "name" : "authors",
                "type" : "array"
              }
            ]
          }
        },
        {
          "name" : "types",
          "type" : {
            "kind" : "array",
            "content" : "type"
          },
          "required" : true
        }
      ]
    },
    
    {
      "name" : "atomic-type-name",
      "kind" : "atomic",
      "baseType" : "string",
      "enumeration" : [ "atomic" ]
    },

    {
      "name" : "object-type-name",
      "kind" : "atomic",
      "baseType" : "string",
      "enumeration" : [ "object" ]
    },

    {
      "name" : "array-type-name",
      "kind" : "atomic",
      "baseType" : "string",
      "enumeration" : [ "array" ]
    },

    {
      "name" : "union-type-name",
      "kind" : "atomic",
      "baseType" : "string",
      "enumeration" : [ "union" ]
    },
    
    {
      "name" : "type",
      "kind" : "object",
      "content" : [
        { "name" : "name",  "type" : "string", "required" : true, "unique" : true },
        { "name" : "kind", "type" : "string", "required" : true },
        { "name" : "baseType", "type" : "string" }
      ]
    },

    {
      "name" : "atomic-type",
      "kind" : "object",
      "baseType" : "type",
      "content" : [
        { "name" : "kind", "type" : "atomic-type-name", "required" : true },
        { "name" : "baseType", "type" : "string", "required" : true },
        { "name" : "pattern", "type" : "string" },
        { "name" : "length", "type" : "integer" },
        { "name" : "minLength", "type" : "integer" },
        { "name" : "maxLength", "type" : "integer" },
        { "name" : "totalDigits", "type" : "integer" },
        { "name" : "fractionDigits", "type" : "integer" },
        { "name" : "maxInclusive", "type" : "atomic" },
        { "name" : "maxExclusive", "type" : "atomic" },
        { "name" : "minExclusive", "type" : "atomic" },
        { "name" : "minInclusive", "type" : "atomic" },
        { "name" : "enumeration", 
          "type" : {
            "kind" : "array",
            "content" : "atomic"
          }
        },
        { "name" : "explicitTimezone",
          "type" : {
            "kind" : "atomic",
            "baseType" : "string",
            "enumeration" : [ "required", "prohibited", "optional" ]
          }
        }
      ]
    },
    
    {
      "name" : "object-type",
      "kind" : "object",
      "baseType" : "type",
      "content" : [
        { "name" : "kind", "type" : "object-type-name" },
        { "name" : "content", "type" : { "kind" : "array", "content" : "field-descriptor" } },
        { "name" : "value-type", "type" : "type-or-reference" },
        { "name" : "closed", "type" : "boolean" },
        { "name" : "enumeration",
          "type" : {
            "kind" : "array",
            "content" : [ "atomic" ]
          }
        }
      ]
    },

    {
      "name" : "field-descriptor",
      "kind" : "object",
      "baseType" : "type",
      "content" : [
        { "name" : "name",  "type" : "string", "required" : true, "unique" : true },
        { "name" : "type",  "type" : "type-or-reference", "required" : true },
        { "name" : "required", "type" : "boolean" },
        { "name" : "default", "type" : "value" }
      ]
    },
    
    {
      "name" : "array-type",
      "kind" : "object",
      "baseType" : "type",
      "content" : [
        { "name" : "kind",  "type" : "array-type-name" },
        { "name" : "content",  "type" : "array-content-descriptor" },
        { "name" : "minLength",  "type" : "integer" },
        { "name" : "maxLength",  "type" : "integer" },
        { "name" : "enumeration", 
          "type" : {
            "kind" : "array",
            "content" : [ "atomic" ]
          }
        }
      ]
    },

    {
      "name" : "array-content-descriptor",
      "kind" : "array",
      "content" : "type-or-reference",
      "minLength" : 1,
      "maxLength" : 1
    },

    {
      "name" : "union-type",
      "kind" : "object",
      "content" : [
        { "name" : "kind", "type" : "union-type-name" },
        { "name" : "content", "type" : "type-or-reference" }
      ]
    },

    {
      "name" : "type-or-reference",
      "kind" : "union",
      "content" : [ "string", "type" ]
    }
  ]
}

Chapter 11. Error codes

This is a summary of all error codes.

Table 11.1. Error codes

Code      Reason
JDST0001  Field $kind is missing in a type object.
JDST0002  $baseType or $type cannot be resolved.
JDST0003  Invalid value for field $kind in a type object.
JDST0004  $name of an object is inconsistent with the field it is associated with in the schema document.
JDST0005  Facet is less restrictive than that of base type.
JDST0006  A value in $enumeration is not in the type value space.
JDST0007  Inconsistent use of $kind and $baseType.
JDST0008  Field $type or $name is missing in object content.
JDST0009  The $closed facet cannot be set back to false if it is true for the base type.
JDST0010  If the $closed facet of the base type is true, then no new fields can be added in the derived type's object content.
JDST0011  The $required facet associated with a field cannot be set back to false if it is true for the base type.
JDST0012  A union type cannot be used as a type annotation in TYSON.
JDST0013  Builtin types are reserved and cannot be overridden or redefined.
JDST0014  Two types may not have the same name within an assembled schema set.
JDST0015  A candidate instance is not valid against its annotating type.
JDST0016  A candidate instance is annotated with an unknown type.
JDST0017  Annotation cannot be done if the candidate instance is invalid against the requested type.
JDST0018  An unallowed cycle was detected either in the baseType relation or in the union membership relation.

Appendix A. Revision History

Revision History
Revision 2-8    Fri Nov 2, 2018    Ghislain Fourny
Changed annotation of union types to take the first type against which the instance is valid, even if it is anonymous.
Various editorial changes that do not modify the semantics of JSound.
Revision 2-7    Tue Oct 16, 2018    Ghislain Fourny
Allow recursive definitions.
Add definitions for subtype and supertype, based on derived type and base type.
Forbid cycles in the baseType relation.
Forbid cycles in the union type membership relation.
Adapt annotation to make the schema type win if no subtype relationship is detected.
Removed abstract property.
Revision 2-6    Fri Sep 28, 2018    Ghislain Fourny
Clean up object types
Clean up array types
Clean up atomic types
Clean up union types
Allow derivation of union types
Revision 2-5    Fri Sep 21, 2018    Ghislain Fourny
Switch to new layout with no dynamic fields
Add abstract property for types
Revision 2-4    Fri Sep 14, 2018    Ghislain Fourny
Clarify annotation machinery
Add table with atomic type mappings
Fine-tuned error codes
Revision 2-3    Fri Sep 7, 2018    Ghislain Fourny
Forbid adding new fields if the base type of an object is closed
Add illustrations to validation and annotation
Add picture of type hierarchy
Remove regular expression facet (pattern)
General cleanup
Reorganize the requirements
Add validation subsection in all type sections
Revision 2-2    Mon Aug 6, 2018    Ghislain Fourny
Add $unique field descriptor
Add validation semantics of arrays
Add @ shortcut
Revision 2-1    Tue Jul 24, 2018    Ghislain Fourny
No need to escape special characters any more.
Removed a few more references to namespaces.
Changed $open to $closed and made open default.
Changed $optional to $required and made non-required default.
Rewrote schema of schemas in verbose syntax.
Added !, ?, = and | shortcuts

Index