Table Schema
A simple format to declare a schema for tabular data.

Authors Paul Walsh
Rufus Pollock
Version 1.0.0-rc.1
Last Updated 30 January 2017
Created 12 November 2012

Abstract

Table Schema is a simple, language and implementation agnostic, way to declare a schema for tabular data. The Table Schema specification has been designed to be expressible in JSON.

Goals

Table Schema shares the design philosophy of all Frictionless Data Specifications, being:

  • Requirements that are driven by simplicity
  • Extensibility and customisation by design
  • Metadata that is human-editable and machine-usable
  • Reuse of existing standard formats for data
  • Language-, technology- and infrastructure-agnostic

Changelog

See the Changelog for information.

Language

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119.

Specification

Table Schema is a simple language- and implementation-agnostic way to declare a schema for tabular data. Table Schema is well suited for use cases around handling and validating tabular data in text formats such as CSV, but its utility extends well beyond this core usage, towards a range of application a where data benefits from a portable schema format.

In its simplest form, Table Schema is an object with a fields property as an array, and each item in the array is a field descriptor, with information on the expected type for data in each field. In this basis, Table Schema also supports declaring more complex shapes for data via format and constraints options per field, as well as more advanced data structures with handling of primaryKey and foreignKeys constructs.

While field descriptors can support complex types via type, format and constraints, The only required property for a field is name, and the absence of the additional modifiers indicates that the field is a string of the default format (meaning, any format), and values are not required (constraints.required == false).

As a foreign key often needs to be able to reference other data resources, The referenced resources need to be in a specific structure in order to work. Therefore, foreign keys only work across valid Tabular Data Resource objects. The special case of reference.resource as an empty string is a self-reference, for foreign keys to self.

Table Schema is heavily based on similar specifications and implementations, in particular XML Schema, GoogleBigQuery, JSON Schema, DSPL, HTML5 Forms, and Elasticsearch.

See below for examples, and a full description of the properties found on Table Schema.

Examples

Example 1

A minimal Table Schema looks as follows.

{
  "fields": [
    {
      "name": "code",
      "type": "string"
    },
    {
      "name": "parent",
      "type": "string"
    }
  ]
}

Example 2

A slightly expanded Table Schema with a primary key and a foreign key looks as follows.

{
  "fields": [
    {
      "name": "code",
      "type": "string",
      "constraints": {
        "required": true
      }
    },
    {
      "name": "parent",
      "type": "string",
      "constraints": {
        "required": true
      }
    }
  ],
  "primaryKey": [
    "code",
    "parent"
  ],
  "foreignKeys": [
    {
      "fields": [
        "parent"
      ],
      "reference": {
        "resource": "",
        "fields": [
          "code"
        ]
      }
    }
  ]
}

Example 3

A complex Table Schema using many features of the specification looks as follows.

{
  "fields": [
    {
      "name": "code",
      "type": "string",
      "constraints": {
        "required": true,
        "minLength": 4,
        "maxLength": 4
      }
    },
    {
      "name": "parent",
      "type": "string",
      "constraints": {
        "required": true,
        "minLength": 4,
        "maxLength": 4
      }
    },
    {
      "name": "title",
      "type": "string",
      "constraints": {
        "required": true
      }
    },
    {
      "name": "description",
      "type": "string"
    },
    {
      "name": "contact",
      "type": "string",
      "format": "email",
      "constraints": {
        "unique": true
      }
    },
    {
      "name": "employees",
      "type": "integer",
      "constraints": {
        "required": true
      }
    },
    {
      "name": "rating",
      "type": "number"
    }
  ],
  "primaryKey": [
    "code",
    "parent"
  ],
  "foreignKeys": {
    "fields": [
      "parent"
    ],
    "reference": {
      "resource": "",
      "fields": [
        "code"
      ]
    }
  },
  "missingValues": [
    "", "-", "None"
  ]
}

Descriptor

A valid Table Schema descriptor is an object conforming with the formal reference outlined in Properties, and, and the following more general requirements.

Form

The descriptor MUST be valid JSON, as described in RFC 4627, and SHOULD be in one of the following forms:

  1. A file named tableschema.json.
  2. An object, either on its own or nested in another data structure.

Media type

The media type for Table Schema descriptors as MUST be application/vnd.tableschema+json. This media type is registered with IANA).

URIs

Several properties are defined as URI-formatted strings, which are to be considered as a subset of the formal URI specification described in RFC 3986. The additional constraints imposed are as follows:

  1. The only supported schemes are http and https. Absence of a scheme indicates either a POSIX path or a JSON Pointer (see below).
  2. URLs, indicated by http or https, MUST be fully qualified.
  3. POSIX paths, are supported for referencing local files, with the security restraint that they MUST be relative siblings or children of the descriptor. Absolute paths (/) and relative parent paths (../) MUST NOT be used, and implementations SHOULD NOT support these path types.
  4. JSON Pointers are supported as a general referencing mechanism to other properties in the same descriptor, and therefore MUST start with the pound symbol (#).

Properties

This section presents a complete description of required and optional properties for a Table Schema descriptor.

Adherence to the specification does not imply that additional, non-specified properties cannot be used: a descriptor MAY include any number of properties in additional to those described as required and optional fields.

Required properties

A Table Schema descriptor MUST include the following properties.

fields

An `array` of Table Schema Field objects.

Examples
"fields": [ { "name": "my-field-name" } ]
"fields": [ { "name": "my-field-name", "type": "number" }, { "name": "my-field-name-2", "type": "string", "format": "email" } ]
Items

Each item in the array is a **Table Schema Field** object. The name property is **required**.

A minimal example of Table Schema Field looks like:

{ "name": "my-name", "type": "string" }

All specified Table Schema Field properties are as follows:

name

An identifier string. Lower case characters with '.', '_', '-' and '/' are allowed.

title

A human-readable title.

description

A text description. Markdown is encouraged.

type

The primitive type for this field.

string

The field contains strings, that is, sequences of characters.

number

The field contains numbers of any kind including decimals.

integer

The field contains integers - that is whole numbers.

date

The field contains temporal date values.

time

The field contains temporal time values.

datetime

The field contains temporal datetime values.

year

A calendar year, being an integer with 4 digits. Equivalent to [gYear in XML Schema](https://www.w3.org/TR/xmlschema-2/#gYear)

yearmonth

A calendar year month, being an integer with 1 or 2 digits. Equivalent to [gYearMonth in XML Schema](https://www.w3.org/TR/xmlschema-2/#gYearMonth)

boolean

The field contains boolean (true/false) data.

object

The field contains data which can be parsed as a valid JSON object.

geopoint

The field contains data describing a geographic point.

geojson

The field contains a JSON object according to GeoJSON or TopoJSON spec.

array

The field contains data that can be parsed as a JSON array.

duration

The field contains a duration of time.

any

Any value is accepted, including values that are not captured by the type/format/constraint requirements of the specification.

string

The field contains strings, that is, sequences of characters.

number

The field contains numbers of any kind including decimals.

integer

The field contains integers - that is whole numbers.

date

The field contains temporal date values.

time

The field contains temporal time values.

datetime

The field contains temporal datetime values.

year

A calendar year, being an integer with 4 digits. Equivalent to [gYear in XML Schema](https://www.w3.org/TR/xmlschema-2/#gYear)

yearmonth

A calendar year month, being an integer with 1 or 2 digits. Equivalent to [gYearMonth in XML Schema](https://www.w3.org/TR/xmlschema-2/#gYearMonth)

boolean

The field contains boolean (true/false) data.

object

The field contains data which can be parsed as a valid JSON object.

geopoint

The field contains data describing a geographic point.

geojson

The field contains a JSON object according to GeoJSON or TopoJSON spec.

array

The field contains data that can be parsed as a JSON array.

duration

The field contains a duration of time.

any

Any value is accepted, including values that are not captured by the type/format/constraint requirements of the specification.

rdfType

The RDF Type for this field.

constraints

Table Schema Field Constraints

examples

Optional properties

A Table Schema descriptor SHOULD include the following properties.

primaryKey

A primary key is an array of field names, whose values `MUST` uniquely identify each row in the table.

Each string in the `primaryKey` array `MUST` be unique, and `MUST` match a field name in the associated table. It is acceptable to have an array with a single value, indicating that the value of a single field is the primary key.
Examples
"primaryKey": [ "name" ]
"primaryKey": [ "first", "last_name" ]
Items

Each item in the array is a **** string. The property is **required**.

foreignKeys

Examples
"foreignKeys": [ { "fields": "state", "reference": { "resource": "the-resource", "fields": "state_id" } } ]
"foreignKeys": [ { "fields": "state", "reference": { "resource": "__self__", "fields": "id" } } ]
Items

Each item in the array is a **Table Schema Foreign Key** object. The fields, reference properties are **required**.

All specified Table Schema Foreign Key properties are as follows:

fields

Fields that make up the primary key.

Items

Each item in the array is a **** string. The property is **required**.

reference

missingValues

Values that when encountered in the source, should be considered as `null`, 'not present', or 'blank' values.

Many datasets arrive with missing data values, either because a value was not collected or it never existed. Missing values may be indicated simply by the value being empty in other cases a special value may have been used e.g. `-`, `NaN`, `0`, `-9999` etc. The `missingValues` property provides a way to indicate that these values should be interpreted as equivalent to null. `missingValues` are strings rather than being the data type of the particular field. This allows for comparison prior to casting and for fields to have missing value which are not of their type, for example a `number` field to have missing values indicated by `-`. The default value of `missingValue` for a non-string type field is the empty string `''`. For string type fields there is no default for `missingValue` (for string fields the empty string `''` is a valid value and need not indicate null).
Examples
"missingValues": [ "-", "NaN", "" ]
Items

Each item in the array is a **** string. The property is **required**.