Tabular Data Resource
A simple format to describe tabular data with a schema and metadata.

Authors Paul Walsh
Rufus Pollock
Version 1.0-rc.1
Last Updated 30 January 2017
Created 15 December 2017

Abstract

Tabular Data Resource is a simple container format used to describe and package a tabular data source with a schema that describes it, and additional metadata about that data source. By providing a minimum set of required properties and a range of optional properties, the format enables a simple contract for data interoperability that is governed by minimalism.

Goals

Tabular Data Resource shares the design philosophy of all Frictionless Data Specifications, being:

  • Requirements that are driven by simplicity
  • Extensibility and customisation by design
  • Metadata that is human-editable and machine-usable
  • Reuse of existing standard formats for data
  • Language-, technology- and infrastructure-agnostic

Changelog

See the Changelog for information.

Language

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119.

Specification

A Tabular Data Resource is a simple container format that describes and packages a data source with additional information about that source.

At a minimum, a Tabular Data Resource requires a name property, and one of the path or data properties. name provides a human- and machine-readable identifier for the Tabular Data Resource. data provides the data source inlined directly into the descriptor. path is a URI, or an array of URIs, pointing to one or more files on a file system or reachable over HTTP.
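
The minimum described above can be checked mechanically. The following Python sketch is illustrative only: the function name check_minimal_descriptor is hypothetical, and it assumes that "one of" means exactly one of path or data.

```python
def check_minimal_descriptor(descriptor):
    """Return a list of problems; an empty list means the minimum is met."""
    if not isinstance(descriptor, dict):
        return ["descriptor must be a JSON object"]

    problems = []

    # name is the human- and machine-readable identifier.
    name = descriptor.get("name")
    if not isinstance(name, str) or not name:
        problems.append("'name' must be a non-empty string")

    # Assumption: "one of the path or data properties" is read as exactly one.
    if ("path" in descriptor) == ("data" in descriptor):
        problems.append("exactly one of 'path' or 'data' is expected")

    # path is a single URI string or an array of URI strings.
    path = descriptor.get("path")
    if path is not None:
        paths = path if isinstance(path, list) else [path]
        if not paths or not all(isinstance(p, str) for p in paths):
            problems.append("'path' must be a URI string or a non-empty array of URI strings")

    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a descriptor at once.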

A range of other properties can be declared to provide a richer set of metadata.

CSV file requirements

CSV files in the wild come in a bewildering array of formats. There is a standard for CSV files described in RFC 4180, but unfortunately this standard does not reflect reality. In Tabular Data Resource, CSV files MUST follow RFC 4180 with the following important exceptions allowed:

  • Files are encoded as UTF-8 by default, or they must be encoded according to the encoding property of the Tabular Data Resource (the RFC requires 7-bit ASCII encoding)
  • Dialect conformance SHOULD be declared on the dialect property of the Tabular Data Resource, which is a CSV Dialect descriptor
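
As an illustration of the two exceptions above, the sketch below decodes CSV bytes using the encoding property (falling back to UTF-8) and applies the delimiter and quoteChar keys of the dialect property. It uses Python's standard csv module; the function name read_rows is hypothetical, and the fallback values "," and '"' are assumptions of this sketch rather than values stated in this document.

```python
import csv
import io


def read_rows(raw_bytes, descriptor):
    """Decode CSV bytes and parse them according to the descriptor."""
    # UTF-8 is the default unless the descriptor declares another encoding.
    encoding = descriptor.get("encoding", "utf-8")
    dialect = descriptor.get("dialect", {})

    text = raw_bytes.decode(encoding)
    reader = csv.reader(
        io.StringIO(text, newline=""),
        # Assumed fallbacks when the dialect does not declare these keys.
        delimiter=dialect.get("delimiter", ","),
        quotechar=dialect.get("quoteChar", '"'),
    )
    return list(reader)
```

A real implementation would also need to handle the remaining CSV Dialect keys, which this sketch ignores.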

Examples

Example 1

A minimal Tabular Data Resource looks as follows.

# with data and a schema accessible via the local filesystem
{
  "name": "resource-name",
  "path": "resource-path.csv",
  "schema": "tableschema.json"
}

# with data accessible via http
{
  "name": "resource-name",
  "path": "http://example.com/resource-path.csv",
  "schema": "http://example.com/tableschema.json"
}

Example 2

A minimal Tabular Data Resource example using the `data` property to inline data looks as follows.

{
  "name": "resource-name",
  "data": [
    {
      "id": 1,
      "first_name": "Louise"
    },
    {
      "id": 2,
      "first_name": "Julia"
    }
  ],
  "schema": {
    "fields": [
      {
        "name": "id",
        "type": "integer"
      },
      {
        "name": "first_name",
        "type": "string"
      }
    ],
    "primaryKey": "id"
  }
}

Example 3

A comprehensive Tabular Data Resource example with all required, recommended and optional properties looks as follows.

{
  "name": "solar-system",
  "path": "http://example.com/solar-system.csv",
  "title": "The Solar System",
  "description": "My favourite data about the solar system.",
  "format": "csv",
  "mediatype": "text/csv",
  "encoding": "utf-8",
  "bytes": 1,
  "hash": "",
  "schema": {
    "fields": [
      {
        "name": "id",
        "type": "integer"
      },
      {
        "name": "name",
        "type": "string"
      },
      {
        "name": "description",
        "type": "string"
      }
    ],
    "primaryKey": "id"
  },
  "sources": "",
  "licenses": ""
}

Descriptor

A valid Tabular Data Resource descriptor is an object conforming with the formal reference outlined in Properties, and with the following more general requirements.

Form

The descriptor MUST be valid JSON, as described in RFC 4627, and SHOULD be in one of the following forms:

  1. A file named dataresource.json.
  2. An object, either on its own or nested in another data structure.

Media type

The media type for Tabular Data Resource descriptors MUST be application/vnd.dataresource+json. This media type is registered with IANA.

URIs

Several properties are defined as URI-formatted strings, which are to be considered as a subset of the formal URI specification described in RFC 3986. The additional constraints imposed are as follows:

  1. The only supported schemes are http and https. Absence of a scheme indicates either a POSIX path or a JSON Pointer (see below).
  2. URLs, indicated by http or https, MUST be fully qualified.
  3. POSIX paths are supported for referencing local files, with the security restriction that they MUST be relative siblings or children of the descriptor. Absolute paths (/) and relative parent paths (../) MUST NOT be used, and implementations SHOULD NOT support these path types.
  4. JSON Pointers are supported as a general referencing mechanism to other properties in the same descriptor, and therefore MUST start with the pound symbol (#).
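
The four constraints above could be applied along the following lines. This Python sketch is not a normative algorithm: the function name classify_uri and its return labels are hypothetical, and rejecting any ".." path segment (not only a leading one) is a deliberately strict assumption.

```python
from urllib.parse import urlsplit


def classify_uri(value):
    """Classify a URI-formatted string, raising ValueError if it is not allowed."""
    if not value:
        raise ValueError("empty value")

    # Rule 4: JSON Pointers start with the pound symbol.
    if value.startswith("#"):
        return "json-pointer"

    parts = urlsplit(value)

    # Rules 1 and 2: only http and https, and the URL must be fully qualified.
    if parts.scheme in ("http", "https"):
        if not parts.netloc:
            raise ValueError("URL must be fully qualified")
        return "url"
    if parts.scheme:
        raise ValueError(f"unsupported scheme: {parts.scheme}")

    # Rule 3: no absolute paths and no parent paths.
    if value.startswith("/"):
        raise ValueError("absolute paths are not allowed")
    if ".." in value.split("/"):  # assumption: reject ".." anywhere in the path
        raise ValueError("parent path segments are not allowed")
    return "posix-path"
```

For example, classify_uri("data/file.csv") returns "posix-path", while classify_uri("../file.csv") raises ValueError.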

Properties

This section presents a complete description of required and optional properties for a Tabular Data Resource descriptor.

Adherence to the specification does not imply that additional, non-specified properties cannot be used: a descriptor MAY include any number of properties in addition to those described as required and optional fields.

Required properties

A Tabular Data Resource descriptor MUST include the following properties.

profile

The profile of this descriptor.

Every Package and Resource descriptor has a profile. The default profile, if none is declared, is `default`. The namespace for the profile is the type of descriptor, so, `default` for a Package descriptor is not the same as `default` for a Resource descriptor.
Examples
"profile": "tabular"
"profile": "fiscal"
"profile": "http://example.com/my-profiles-json-schema.json"

name

An identifier string. Lower case characters with '.', '_', '-' and '/' are allowed.

This is ideally a URL-usable and human-readable name. The name SHOULD be invariant, meaning it SHOULD NOT change when its parent descriptor is updated.
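
A check for the character rule above might look like the following Python sketch. The function name is_valid_name is hypothetical, and the sketch assumes that digits are acceptable alongside lower case letters, which the sentence above does not state explicitly.

```python
import re

# Assumption: digits are allowed in addition to lower case letters,
# '.', '_', '-' and '/'.
NAME_PATTERN = re.compile(r"[a-z0-9._/-]+")


def is_valid_name(name):
    """Return True if name uses only the allowed characters."""
    return isinstance(name, str) and NAME_PATTERN.fullmatch(name) is not None
```
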
Examples
"name": "my-nice-name"

data

A reference to the data for this resource. `data` MUST be an array of valid URIs.

The dereferenced value of each referenced data source in the `data` array MUST be commensurate with a native, dereferenced representation of the data the resource describes. For example, in a *Tabular* Data Resource, this means that the dereferenced value of `data` MUST be an array.
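
When an entry in `data` is a JSON Pointer (a string starting with #), dereferencing means looking up another property in the same descriptor. The Python sketch below follows the RFC 6901 token rules for the URI fragment form; the function name resolve_pointer and the inline_tables property used in the usage note are hypothetical.

```python
from urllib.parse import unquote


def resolve_pointer(descriptor, pointer):
    """Dereference a '#'-prefixed JSON Pointer against a descriptor."""
    if not pointer.startswith("#"):
        raise ValueError("pointer must start with '#'")

    # The fragment form is percent-encoded, so decode it first.
    fragment = unquote(pointer[1:])
    if fragment == "":
        return descriptor
    if not fragment.startswith("/"):
        raise ValueError("pointer path must start with '/'")

    current = descriptor
    for token in fragment[1:].split("/"):
        # RFC 6901 escapes: '~1' is '/', '~0' is '~' (in that order).
        token = token.replace("~1", "/").replace("~0", "~")
        if isinstance(current, list):
            current = current[int(token)]
        else:
            current = current[token]
    return current
```

Given a descriptor with a hypothetical extra property such as {"inline_tables": {"my-data": [{"id": 1}]}}, the pointer "#/inline_tables/my-data" resolves to the array of rows, which satisfies the requirement that the dereferenced value be an array.
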
Examples
"data": [ "file.csv", "file2.csv" ]
"data": [ "http://example.com/file.csv", "http://example.com/file2.csv" ]
"data": [ "#/data/my-data", "#/data/my-data2" ]
Items

Each item in the Data array is a **URI** string. The property is **required**.

A minimal example of URI looks like:

"uri": "file.csv"

schema

A Table Schema for this resource, compliant with the [Table Schema](/tableschema/) specification.

Optional properties

A Tabular Data Resource descriptor SHOULD include the following properties.

title

A human-readable title.

Examples
"title": "My Package Title"

description

A text description. Markdown is encouraged.

Examples
"description": "# My Package description\nAll about my package."

homepage

The home on the web that is related to this data package.

Examples
"homepage": { "name": "My Web Page", "uri": "http://example.com/" }

sources

The raw sources for this resource.

Examples
"sources": [ { "name": "World Bank and OECD", "uri": "http://data.worldbank.org/indicator/NY.GDP.MKTP.CD" } ]
Items

Each item in the Sources array is a **Source** object. The property is **required**.

All specified Source properties are as follows:

name

An identifier string. Lower case characters with '.', '_', '-' and '/' are allowed.

uri

A URI (with some restrictions), being a fully qualified HTTP address, a relative POSIX path, or a JSON Pointer.

email

An email address.

licenses

The license(s) under which the resource is published.

This property is not legally binding and does not guarantee that the package is licensed under the terms defined herein.
Examples
"licenses": [ { "name": "ODC-PDDL-1.0", "uri": "http://opendatacommons.org/licenses/pddl/" } ]
Items

Each item in the Licenses array is a **License** object. The property is **required**.

All specified License properties are as follows:

name

An identifier string. Lower case characters with '.', '_', '-' and '/' are allowed.

uri

A URI (with some restrictions), being a fully qualified HTTP address, a relative POSIX path, or a JSON Pointer.

title

A human-readable title.

dialect

The CSV dialect descriptor.

Examples
"dialect": { "delimiter": ";" }
"dialect": { "delimiter": "\t", "quoteChar": "'" }

format

The file format of this resource.

`csv`, `xls`, `json` are examples of common formats.
Examples
"format": "xls"

mediatype

The media type of this resource. Can be any valid media type listed with [IANA](https://www.iana.org/assignments/media-types/media-types.xhtml).

Examples
"mediatype": "text/csv"

encoding

The file encoding of this resource.

Examples
"encoding": "utf-8"

bytes

The size of this resource in bytes.

Examples
"bytes": 2082

hash

The MD5 hash of this resource. Indicate other hashing algorithms with the {algorithm}:{hash} format.
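
The two forms of the hash value can be handled as in the following Python sketch, which uses the standard hashlib module. The function names parse_hash and matches_hash are hypothetical, and the sketch assumes that digests are hex-encoded and that the algorithm prefix, once lowercased, is a name hashlib recognises.

```python
import hashlib


def parse_hash(value):
    """Split a hash value into (algorithm, digest), defaulting to MD5."""
    algorithm, separator, digest = value.rpartition(":")
    if not separator:
        # No "{algorithm}:" prefix, so the value is a bare MD5 digest.
        algorithm = "md5"
    return algorithm.lower(), digest.lower()


def matches_hash(content, value):
    """Return True if the bytes in content match the declared hash value."""
    algorithm, expected = parse_hash(value)
    # Assumption: the lowercased prefix is a valid hashlib algorithm name.
    return hashlib.new(algorithm, content).hexdigest() == expected
```

For large files, a real implementation would feed the file to the hash object in chunks rather than reading it into memory at once.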

Examples
"hash": "d25c9c77f588f5dc32059d2da1136c02"
"hash": "SHA256:5262f12512590031bbcc9a430452bfd75c2791ad6771320bb4b5728bfb78c4d0"