For the vast majority of schema authors, we hope that these changes are minimally disruptive.
The change most likely to be frustrating is that format is no longer treated as a validation assertion by default (although it is still possible for an application or user to configure a validator to treat it as one). We decided this was acceptable because many schema authors are already extremely frustrated by its inconsistent behavior.
For implementors, there is a lot more to consider, and further guidance on implementation topics will be forthcoming.
- Incompatible Changes
- Semi-incompatible Changes
- Annotations, Errors, and Outputs
- Keyword Changes
For a basic list of changes to each document, see their change logs:
Incompatible Changes
- By default, format is no longer an assertion. This has been done because the inconsistent implementation of format as an assertion has been an endless source of surprising problems for schema authors. The default behavior will now be predictable, if not ideal. There are several ways to turn on assertion functionality, as explained below. However, we recommend doing semantic validation in the application layer.
- Plain name fragments are no longer defined with $id, but instead with the new keyword $anchor (which has a different syntax). $id cannot contain a fragment anymore (except possibly an empty fragment, although that is discouraged).
- In cases where multiple URIs could be used for the same schema, some are now discouraged. These are believed to have rarely been used, as the behavior involved was fairly confusing and not well explained until the updated version of draft-07 (draft-handrews-json-schema-01). If this doesn’t mean much to you, you are probably safe.
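The move from plain-name fragments in $id to $anchor can be illustrated with a small migration helper. This is a minimal sketch, not part of any implementation: the function name `migrate_plain_name_ids` and the example.com URI are hypothetical, it only handles $id values that consist solely of a fragment such as "#item", and it walks every nested object, so instance data inside keywords such as enum or const would be rewritten as well.

```python
from typing import Any


def migrate_plain_name_ids(node: Any) -> Any:
    """Return a copy of a draft-07 schema with plain-name "$id" values moved to "$anchor"."""
    if isinstance(node, list):
        return [migrate_plain_name_ids(item) for item in node]
    if not isinstance(node, dict):
        return node

    migrated = {key: migrate_plain_name_ids(value) for key, value in node.items()}
    identifier = migrated.get("$id")
    if isinstance(identifier, str) and identifier.startswith("#") and len(identifier) > 1:
        # draft-07 wrote the plain name with its leading "#"; "$anchor" omits it.
        del migrated["$id"]
        migrated["$anchor"] = identifier[1:]
    return migrated


# Hypothetical draft-07 schema used only for illustration.
draft07 = {
    "$id": "https://example.com/root.json",
    "definitions": {"item": {"$id": "#item", "type": "string"}},
}
migrated = migrate_plain_name_ids(draft07)
```

The root $id is left alone because it is a full URI without a fragment; only the nested "#item" becomes "$anchor": "item".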
Semi-incompatible Changes
The old syntax for these keywords is not an error (and the default meta-schema still validates them), so implementations can offer a compatibility mode. However, migrating to the new keywords is straightforward and should be preferred.
dependencies has been split into
Annotations, Errors, and Outputs
Annotation keywords such as default have always been a part of JSON Schema, but without any guidance on how to make use of them. This draft formalizes how implementations can make annotation information available to applications.
Similarly, there has not previously been guidance on what constitutes useful error reporting when validation fails.
To solve both of these problems, we now recommend that implementations support one or more of the standardized output formats.
Keyword Changes
All keywords have now been organized into vocabularies, with the Core and Validation specifications containing multiple vocabularies. In this process, some keywords have moved from Validation into Core.
Core Vocabulary
||renamed||Note that the standard meta-schema still reserves
||changed||Only URI-references without fragments are allowed; see
||new||Used for extending recursive schemas such as meta-schemas|
||changed||Other keywords are now allowed alongside of it|
||new||Has effects only in meta-schemas, and is used to control what keywords an implementation must or can support in order to process a schema using that meta-schema|
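The last row above describes the keyword that the published 2019-09 Core specification names $vocabulary: a map from vocabulary URI to a boolean saying whether support is required. The following sketch shows one way an implementation might act on it. The function `check_vocabularies`, the `SUPPORTED_VOCABULARIES` set, and the example.com meta-schema are assumptions made for illustration, not an actual implementation's API.

```python
from typing import Dict, List, Set

# Vocabularies this hypothetical implementation claims to understand.
SUPPORTED_VOCABULARIES: Set[str] = {
    "https://json-schema.org/draft/2019-09/vocab/core",
    "https://json-schema.org/draft/2019-09/vocab/applicator",
    "https://json-schema.org/draft/2019-09/vocab/validation",
}


def check_vocabularies(meta_schema: Dict, supported: Set[str] = SUPPORTED_VOCABULARIES) -> List[str]:
    """Return unsupported optional vocabularies; raise if a required one is unsupported."""
    ignored = []
    for uri, required in meta_schema.get("$vocabulary", {}).items():
        if uri in supported:
            continue
        if required:
            # A value of true means the schema must not be processed without support.
            raise ValueError(f"required vocabulary not supported: {uri}")
        ignored.append(uri)
    return ignored


# Hypothetical meta-schema that lists the format vocabulary as optional.
meta_schema = {
    "$id": "https://example.com/meta/relaxed",
    "$vocabulary": {
        "https://json-schema.org/draft/2019-09/vocab/core": True,
        "https://json-schema.org/draft/2019-09/vocab/format": False,
    },
}
ignored = check_vocabularies(meta_schema)
```

Changing the format entry to true would make the same call raise, which corresponds to refusing to process any schema that uses that meta-schema.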
Applicator Vocabulary
These keywords were formerly found in the Validation Specification.
||split||This is the schema form of
The other applicator vocabulary keywords are
Validation Vocabulary
||split||This is the string array form of
||new||Assertion for controlling how many times a subschema must be matched within an array|
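The split of dependencies into a schema form and a string array form, mentioned in the Applicator and Validation tables, is mechanical enough to sketch. In the published 2019-09 specifications the two replacement keywords are named dependentSchemas and dependentRequired. The helper `split_dependencies` below is hypothetical and only rewrites the top level of a schema.

```python
from typing import Any, Dict


def split_dependencies(schema: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a schema with "dependencies" split into the two 2019-09 keywords."""
    result = {key: value for key, value in schema.items() if key != "dependencies"}
    required: Dict[str, Any] = {}
    schemas: Dict[str, Any] = {}
    for prop, dependency in schema.get("dependencies", {}).items():
        if isinstance(dependency, list):
            # String array form: names of properties that must also be present.
            required[prop] = dependency
        else:
            # Schema form: an object or boolean schema applied to the whole instance.
            schemas[prop] = dependency
    if required:
        result["dependentRequired"] = required
    if schemas:
        result["dependentSchemas"] = schemas
    return result


# Hypothetical draft-07 schema mixing both forms.
old_schema = {
    "type": "object",
    "dependencies": {
        "credit_card": ["billing_address"],
        "shipping": {"required": ["shipping_address"]},
    },
}
new_schema = split_dependencies(old_schema)
```

A compatibility mode could apply the same split at load time instead of rewriting stored schemas.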
Format Vocabulary
The format keyword has always been problematic due to its optional nature. There has never been a way to ensure that the implementation processing your schema supports format at all, or if it does, to what degree it validates each type of format. In theory, since each format references a standard specification, if a format is supported, it should behave consistently. In practice, this is not the case.
There are two ways for an application to validate formats: It can rely on a JSON Schema implementation to validate them (which may or may not have the expected results), or it can note where the format keyword has been used and perform its own validation based on that. This second approach is supported by treating format as an annotation keyword and supporting the basic, detailed, or verbose output formats.
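The second approach can be sketched as follows, assuming the implementation returns annotations in a flat list shaped like the basic output format (units with keywordLocation, instanceLocation, and annotation). The sample output is hand-written rather than produced by a real validator, and `FORMAT_CHECKERS`, `resolve_pointer`, and `check_formats` are hypothetical names. The checkers use only the Python standard library and are more lenient than a strict reading of the relevant RFCs (for example, `uuid.UUID` also accepts strings without hyphens).

```python
import ipaddress
import uuid
from typing import Any, Callable, Dict, List


def _parses(parser: Callable[[str], Any]) -> Callable[[str], bool]:
    """Turn a parser that raises ValueError into a boolean check."""
    def check(value: str) -> bool:
        try:
            parser(value)
        except ValueError:
            return False
        return True
    return check


# Application-chosen checks; formats not listed here are simply skipped.
FORMAT_CHECKERS: Dict[str, Callable[[str], bool]] = {
    "ipv4": _parses(ipaddress.IPv4Address),
    "uuid": _parses(uuid.UUID),
}


def resolve_pointer(document: Any, pointer: str) -> Any:
    """Resolve a JSON Pointer such as "/items/0/id" against a parsed document."""
    current = document
    for token in pointer.split("/")[1:]:
        token = token.replace("~1", "/").replace("~0", "~")
        current = current[int(token)] if isinstance(current, list) else current[token]
    return current


def check_formats(instance: Any, output: Dict[str, Any]) -> List[str]:
    """Return instance locations whose value fails the application's own format check."""
    failures = []
    for unit in output.get("annotations", []):
        if not unit["keywordLocation"].endswith("/format"):
            continue
        checker = FORMAT_CHECKERS.get(unit["annotation"])
        value = resolve_pointer(instance, unit["instanceLocation"])
        if checker and isinstance(value, str) and not checker(value):
            failures.append(unit["instanceLocation"])
    return failures


instance = {"id": "not-a-uuid", "host": "192.0.2.1"}
output = {
    "valid": True,
    "annotations": [
        {"keywordLocation": "/properties/id/format", "instanceLocation": "/id", "annotation": "uuid"},
        {"keywordLocation": "/properties/host/format", "instanceLocation": "/host", "annotation": "ipv4"},
        {"keywordLocation": "/title", "instanceLocation": "", "annotation": "Example"},
    ],
}
failures = check_formats(instance, output)
```

Keeping the checks in the application means their strictness is under the application's control, regardless of which JSON Schema implementation produced the annotations.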
To impose some predictability on this system, the behavior has changed in several ways, as illustrated below. The key difference here is that format validation is now predictably off by default, but can be configured to be turned on. In draft-07, it was on (but possibly unimplemented) by default and could be configured to be turned off.
In the following charts, the “supported” column refers to whether and (for 2019-09) to what degree the implementation claims to support the format keyword. The “configuration” column refers to whether some non-default behavior for format is configured (in a configuration file, through a command-line option, or by some other means).
Summary of draft-07 behavior
| supported | configuration | outcome |
|---|---|---|
| yes | default (on) | inconsistently validated |
Obviously, each implementation will behave consistently from schema to schema, although some formats may be supported more thoroughly than others despite the wording in the specification. However, complex formats are, in practice, supported to different degrees in each implementation, if they are supported at all.
Summary of 2019-09 behavior
The goal with this draft is to make the default behavior predictable, with the inconsistent behavior as an opt-in feature. This is not entirely satisfactory, but we feel that it is a good first step to reduce the number of complaints seen around surprising results. This way, there should at least be fewer surprises.
“best effort” validation is a fairly weak requirement, which matches how things work in practice today. Simple formats are probably fully validated, while complex formats may be minimally validated or not validated at all.
“full syntax” validation means that you can expect a reasonably thorough syntactic validation, probably corresponding to whatever commonly available libraries can do in the implementation language. For formats such as IP addresses and dates, this is expected to be complete validation. For more complex formats such as email addresses, support will probably still vary significantly. It’s unclear how many implementations have ever provided this level of support.
An outcome of vocabulary error means that the implementation will refuse to process the schema as it cannot satisfy the vocabulary requirement.
| supported | configuration | vocabulary | outcome |
|---|---|---|---|
| best effort | default (off) | false | not validated |
| best effort | default (off) | true | vocabulary error |
| best effort | on | false | best effort validation |
| best effort | on | true | vocabulary error |
| full syntax | default (off) | false | not validated |
| full syntax | default (off) | true | full syntax validation |
| full syntax | on | false | full syntax validation |
| full syntax | on | true | full syntax validation |
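The rows above can be read as a lookup from three inputs to an outcome. The sketch below encodes exactly those eight rows and nothing more. It assumes the true/false column states whether the format vocabulary is required by the meta-schema, and the name `format_outcome` is hypothetical.

```python
from typing import Dict, Tuple

# Key: (support level, format configured on, vocabulary required) -> outcome.
_OUTCOMES: Dict[Tuple[str, bool, bool], str] = {
    ("best effort", False, False): "not validated",
    ("best effort", False, True): "vocabulary error",
    ("best effort", True, False): "best effort validation",
    ("best effort", True, True): "vocabulary error",
    ("full syntax", False, False): "not validated",
    ("full syntax", False, True): "full syntax validation",
    ("full syntax", True, False): "full syntax validation",
    ("full syntax", True, True): "full syntax validation",
}


def format_outcome(supported: str, configured_on: bool, vocabulary_required: bool) -> str:
    """Look up how format behaves for one combination listed in the table."""
    try:
        return _OUTCOMES[(supported, configured_on, vocabulary_required)]
    except KeyError:
        # Combinations not present in the table are deliberately not guessed.
        raise ValueError(f"combination not covered by the table: {supported!r}") from None


default_outcome = format_outcome("best effort", False, False)
strict_outcome = format_outcome("best effort", True, True)
```

Seen this way, a best-effort implementation never satisfies a required format vocabulary, while a full-syntax implementation validates whenever either the configuration or the vocabulary asks for it.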
Note that, given that almost no draft-07 or earlier implementations have offered strict and complete validation of every single format, it seems unlikely that any implementations will support option 3 in practice.
Additionally, two new formats were added, and a specification reference was updated:
||added||The duration format is from the ISO 8601 ABNF as given in Appendix A of RFC 3339|
||updated||Use RFC 1123 instead of RFC 1034; this allows for a leading digit|
||added||A string instance is valid against this attribute if it is a valid string representation of a UUID, according to RFC 4122|
Content Vocabulary
These keywords are now specified purely as annotations, and never assertions. Some guidance is provided around how an implementation can optionally offer further automatic processing of this information outside of the validation process.
||updated||Encodings from RFC 4648 are now allowed, and take precedence over RFC 2045 when there is a difference|
||added||Schema for use with the decoded content string; note that it is not automatically applied as not all content media types can be understood in advance|
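Because these keywords are annotations only, any decoding happens in the application after validation. The sketch below shows one way that step might look, using the keyword names contentEncoding and contentMediaType from the published 2019-09 Validation specification. The function `decode_content` is hypothetical and handles only base64 and application/json.

```python
import base64
import json
from typing import Any, Dict


def decode_content(value: str, annotations: Dict[str, str]) -> Any:
    """Decode a string instance according to its content annotations."""
    encoding = annotations.get("contentEncoding")
    media_type = annotations.get("contentMediaType")

    raw = value.encode("utf-8")
    if encoding == "base64":
        # validate=True rejects characters outside the base64 alphabet.
        raw = base64.b64decode(value, validate=True)
    elif encoding is not None:
        raise ValueError(f"unsupported contentEncoding: {encoding}")

    if media_type == "application/json":
        return json.loads(raw)
    # Unknown media types are returned as bytes for the caller to interpret.
    return raw


encoded = base64.b64encode(json.dumps({"a": 1}).encode("utf-8")).decode("ascii")
decoded = decode_content(
    encoded,
    {"contentEncoding": "base64", "contentMediaType": "application/json"},
)
```

The decoded document is what an application would then validate against the schema given for the decoded content, as a separate step that it chooses to run.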
Meta-Data Vocabulary
||added||Used to indicate that a field is deprecated in some application-specific manner|
Hyper-Schema Vocabulary
||changed||Can now be an array of values instead of just a string|