When writing computer programs of even moderate complexity, it's commonly accepted that "structuring" the program into reusable functions is better than copying-and-pasting duplicate bits of code everywhere they are used. Likewise in JSON Schema, for anything but the most trivial schema, it's really useful to structure the schema into parts that can be reused in a number of places. This chapter will present the tools available for reusing and structuring schemas as well as some practical examples that use those tools.

Schema Identification

Like any other code, schemas are easier to maintain if they can be broken down into logical units that reference each other as necessary. In order to reference a schema, we need a way to identify a schema. Schema documents are identified by non-relative URIs.

Schema documents are not required to have an identifier, but you will need one if you want to reference one schema from another. In this documentation, we will refer to schemas with no identifier as "anonymous schemas".

In the following sections we will see how the "identifier" for a schema is determined.

URI terminology can sometimes be unintuitive. In this document, the following definitions are used.

URI [[1]](https://datatracker.ietf.org/doc/html/rfc3986#section-3) or non-relative URI: A full URI containing a scheme (https). It may contain a URI fragment (#foo). Sometimes this document will use "non-relative URI" to make it extra clear that relative URIs are not allowed.
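As a rough illustration of the definition above, the following sketch checks whether a string is a non-relative URI by looking for a scheme. The helper name is_non_relative_uri is hypothetical, and the check is deliberately simple: it relies on Python's standard urlsplit and does not validate the rest of the URI.

```python
from urllib.parse import urlsplit


def is_non_relative_uri(uri: str) -> bool:
    """Return True when the string carries a scheme such as "https".

    Hypothetical helper for illustration only; a fragment (#foo) may be
    present or absent and does not affect the result.
    """
    return bool(urlsplit(uri).scheme)


# A full URI with a scheme (and an optional fragment) is non-relative.
print(is_non_relative_uri("https://example.com/schemas/address#foo"))
# A path-only or fragment-only reference has no scheme, so it is relative.
print(is_non_relative_uri("/schemas/address"))
print(is_non_relative_uri("#foo"))
```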

Draft-specific info: In Draft 4, $id is just id (without the dollar sign).

This is analogous to the <base> tag in HTML.

When the $id keyword appears in a subschema, it means something slightly different. See the bundling section for more.

Let's assume the URIs https://example.com/schema/address and https://example.com/schema/billing-address both identify the following schema.

    {
      "$id": "/schemas/address",
      "type": "object",
      "properties": {
        "street_address": { "type": "string" },
        "city": { "type": "string" },
        "state": { "type": "string" }
      },
      "required": ["street_address", "city", "state"]
    }

No matter which of the two URIs is used to retrieve this schema, the base URI will be https://example.com/schemas/address, which is the result of the $id URI-reference resolving against the Retrieval URI.

However, using a relative reference when setting a base URI can be problematic. For example, we couldn't use this schema as an anonymous schema because there would be no Retrieval URI and you can't resolve a relative reference against nothing.

For this and other reasons, it's recommended that you always use an absolute URI when declaring a base URI with $id.

The base URI of the following schema will always be https://example.com/schemas/address no matter what the Retrieval URI was or if it's used as an anonymous schema.

    {
      "$id": "https://example.com/schemas/address",
      "type": "object",
      "properties": {
        "street_address": { "type": "string" },
        "city": { "type": "string" },
        "state": { "type": "string" }
      },
      "required": ["street_address", "city", "state"]
    }

JSON Pointer

In addition to identifying a schema document, you can also identify subschemas. The most common way to do that is to use a JSON Pointer in the URI fragment that points to the subschema.

A JSON Pointer describes a slash-separated path to traverse the keys in the objects in the document. Therefore, /properties/street_address means:

1) find the value of the key properties
2) within that object, find the value of the key street_address

The URI https://example.com/schemas/address#/properties/street_address identifies the street_address subschema in the following schema.

    {
      "$id": "https://example.com/schemas/address",
      "type": "object",
      "properties": {
        "street_address": { "type": "string" },
        "city": { "type": "string" },
        "state": { "type": "string" }
      },
      "required": ["street_address", "city", "state"]
    }

$anchor

A less common way to identify a subschema is to create a named anchor in the schema using the $anchor keyword and using that name in the URI fragment. Anchors must start with a letter followed by any number of letters, digits, -, _, :, or . (period).

Draft-specific info: In Draft 4, you declare an anchor the same way you do in Draft 6-7 except that $id is just id (without the dollar sign).
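The anchor naming rule stated above can be checked with a short regular expression. This is only a sketch of the rule as this document words it: the helper name is_valid_anchor_name is hypothetical, and "letter" is assumed to mean an ASCII letter.

```python
import re

# A letter first, then any number of letters, digits, "-", "_", ":" or ".".
# Assumption: "letters" means ASCII letters A-Z and a-z.
ANCHOR_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_:.-]*")


def is_valid_anchor_name(name: str) -> bool:
    """Return True when the whole name follows the naming rule."""
    return ANCHOR_NAME.fullmatch(name) is not None


print(is_valid_anchor_name("street_address"))  # follows the rule
print(is_valid_anchor_name("1st_address"))     # starts with a digit
```

Checking names up front like this is one way to avoid relying on behavior that differs between implementations.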

If a named anchor is defined that doesn't follow these naming rules, then behavior is undefined. Your anchors might work in some implementations, but not others.

The URI https://example.com/schemas/address#street_address identifies the subschema that declares the street_address anchor in the following schema.

    {
      "$id": "https://example.com/schemas/address",
      "type": "object",
      "properties": {
        "street_address": {
          "$anchor": "street_address",
          "type": "string"
        },
        "city": { "type": "string" },
        "state": { "type": "string" }
      },
      "required": ["street_address", "city", "state"]
    }

$ref

A schema can reference another schema using the $ref keyword. The value of $ref is a URI-reference that is resolved against the schema's Base URI. When evaluating a $ref, an implementation uses the resolved identifier to retrieve the referenced schema and applies that schema to the instance.

Draft-specific info: In Draft 4-7, $ref behaves a little differently. When an object contains a $ref property, the object is considered a reference, not a schema. Therefore, any other properties you put in that object will not be treated as JSON Schema keywords and will be ignored by the validator. $ref can only be used where a schema is expected.

For this example, let's say we want to define a customer record, where each customer may have both a shipping and a billing address. Addresses are always the same (they have a street address, city and state), so we don't want to duplicate that part of the schema everywhere we want to store an address. Not only would that make the schema more verbose, but it would also make updating it in the future more difficult. If our imaginary company were to start doing international business in the future and we wanted to add a country field to all the addresses, it would be better to do this in a single place rather than everywhere that addresses are used.
    {
      "$id": "https://example.com/schemas/customer",
      "type": "object",
      "properties": {
        "first_name": { "type": "string" },
        "last_name": { "type": "string" },
        "shipping_address": { "$ref": "/schemas/address" },
        "billing_address": { "$ref": "/schemas/address" }
      },
      "required": ["first_name", "last_name", "shipping_address", "billing_address"]
    }

The URI-references in $ref resolve against the schema's Base URI (https://example.com/schemas/customer), which results in https://example.com/schemas/address. The implementation retrieves that schema and uses it to evaluate the "shipping_address" and "billing_address" properties.

When using $ref in an anonymous schema, relative references may not be resolvable. Let's assume this example is used as an anonymous schema.

    {
      "type": "object",
      "properties": {
        "first_name": { "type": "string" },
        "last_name": { "type": "string" },
        "shipping_address": { "$ref": "https://example.com/schemas/address" },
        "billing_address": { "$ref": "/schemas/address" }
      },
      "required": ["first_name", "last_name", "shipping_address", "billing_address"]
    }

The $ref at /properties/shipping_address can resolve just fine without a non-relative base URI to resolve against, but the $ref at /properties/billing_address can't resolve to a non-relative URI and therefore can't be used to retrieve the address schema.

$defs

Sometimes we have small subschemas that are only intended for use in the current schema and it doesn't make sense to define them as separate schemas. Although we can identify any subschema using JSON Pointers or named anchors, the $defs keyword gives us a standardized place to keep subschemas intended for reuse in the current schema document.

Let's extend the previous customer schema example to use a common schema for the name properties.
It doesn't make sense to define a new schema for this and it will only be used in this schema, so it's a good candidate for using $defs.

    {
      "$id": "https://example.com/schemas/customer",
      "type": "object",
      "properties": {
        "first_name": { "$ref": "#/$defs/name" },
        "last_name": { "$ref": "#/$defs/name" },
        "shipping_address": { "$ref": "/schemas/address" },
        "billing_address": { "$ref": "/schemas/address" }
      },
      "required": ["first_name", "last_name", "shipping_address", "billing_address"],
      "$defs": {
        "name": { "type": "string" }
      }
    }

$defs isn't just good for avoiding duplication. It can also be useful for writing schemas that are easier to read and maintain. Complex parts of the schema can be defined in $defs with descriptive names and referenced where they're needed. This allows readers of the schema to more quickly and easily understand the schema at a high level before diving into the more complex parts.

It's possible to reference an external subschema, but generally you want to limit a $ref to referencing either an external schema or an internal subschema defined in $defs.

Recursion

The $ref keyword may be used to create recursive schemas that refer to themselves. For example, you might have a person schema that has an array of children, each of which is also a person instance.

    {
      "type": "object",
      "properties": {
        "name": { "type": "string" },
        "children": {
          "type": "array",
          "items": { "$ref": "#" }
        }
      }
    }

A snippet of the British royal family tree data (compliant to schema):

    {
      "name": "Elizabeth",
      "children": [
        {
          "name": "Charles",
          "children": [
            {
              "name": "William",
              "children": [
                { "name": "George" },
                { "name": "Charlotte" }
              ]
            },
            {
              "name": "Harry"
            }
          ]
        }
      ]
    }

Above, we created a schema that refers to itself, effectively creating a "loop" in the validator, which is both allowed and useful. Note, however, that a $ref referring to another $ref could cause an infinite loop in the resolver, and is explicitly disallowed.
    {
      "$defs": {
        "alice": { "$ref": "#/$defs/bob" },
        "bob": { "$ref": "#/$defs/alice" }
      }
    }

Extending Recursive Schemas

New in draft 2019-09. Documentation coming soon.

Bundling

Working with multiple schema documents is convenient for development, but it's often more convenient for distribution to bundle all of your schemas into a single schema document. This can be done using the $id keyword in a subschema. When $id is used in a subschema, it indicates an embedded schema. The identifier for the embedded schema is the value of $id resolved against the Base URI of the schema it appears in. A schema document that includes embedded schemas is called a Compound Schema Document. Each schema with an $id in a Compound Schema Document is called a Schema Resource.

Draft-specific info: In Draft 4, $id is just id (without the dollar sign).
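To make the bundling rules more concrete, here is a minimal sketch of how an implementation might find the Schema Resources in a Compound Schema Document and resolve a $ref against them. The function name index_schema_resources and the example bundle are assumptions made for illustration; they are not part of JSON Schema or of any particular library. The walk is also naive: it treats every object with a string $id as a schema, which a real implementation would not do for values under keywords such as const or enum.

```python
from urllib.parse import urljoin


def index_schema_resources(node, base_uri="", index=None):
    """Map each resolved $id in a compound document to its schema.

    Hypothetical helper: an embedded $id is resolved against the Base URI
    of the schema it appears in, and then becomes the Base URI for
    everything nested below it.
    """
    if index is None:
        index = {}
    if isinstance(node, dict):
        schema_id = node.get("$id")
        if isinstance(schema_id, str):
            base_uri = urljoin(base_uri, schema_id)
            index[base_uri] = node
        for value in node.values():
            index_schema_resources(value, base_uri, index)
    elif isinstance(node, list):
        for item in node:
            index_schema_resources(item, base_uri, index)
    return index


# Illustrative bundle: the address schema is embedded under $defs.
bundle = {
    "$id": "https://example.com/schemas/customer",
    "type": "object",
    "properties": {
        "shipping_address": {"$ref": "/schemas/address"},
    },
    "$defs": {
        "address": {
            "$id": "/schemas/address",
            "type": "object",
            "properties": {"city": {"type": "string"}},
        }
    },
}

resources = index_schema_resources(bundle)

# Resolve the $ref against the Base URI of the schema that contains it.
ref = bundle["properties"]["shipping_address"]["$ref"]
target_uri = urljoin(bundle["$id"], ref)
print(target_uri)
print(resources[target_uri] is bundle["$defs"]["address"])
```

Because the embedded schema is looked up by its resolved identifier rather than by its location under $defs, the same $ref would keep working if the address schema were later moved back into its own document.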