Tabular Data Resource

A simple format to describe a single tabular data resource such as a CSV file. It supports both metadata, such as author and title, and a schema that describes the data, for example the types of the fields/columns in the data.

Authors: Paul Walsh, Rufus Pollock
Media Type: application/vnd.dataresource+json
JSON Schema (for spec): specs.frictionlessdata.io/schemas/tabular-data-resource.json
Version: 1.0-rc.2
Last Updated: 2 May 2017
Created: 15 December 2017

Language

The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119.

Introduction

A Tabular Data Resource is a type of Data Resource specialized for describing tabular data like CSV files or spreadsheets.

Tabular Data Resource extends Data Resource in the following key ways:

  • The schema property MUST follow the Table Schema specification.
  • It adds a new dialect property to describe the CSV dialect. This property follows the CSV Dialect specification.

Examples

A minimal Tabular Data Resource looks as follows.

// with data and a schema accessible via the local filesystem
{
  "profile": "tabular-data-resource",
  "name": "resource-name",
  "path": [ "resource-path.csv" ],
  "schema": "tableschema.json"
}

// with data accessible via http
{
  "profile": "tabular-data-resource",
  "name": "resource-name",
  "path": [ "http://example.com/resource-path.csv" ],
  "schema": "http://example.com/tableschema.json"
}

A minimal Tabular Data Resource example using the data property to inline data looks as follows.

{
  "profile": "tabular-data-resource",
  "name": "resource-name",
  "data": [
    {
      "id": 1,
      "first_name": "Louise"
    },
    {
      "id": 2,
      "first_name": "Julia"
    }
  ],
  "schema": {
    "fields": [
      {
        "name": "id",
        "type": "integer"
      },
      {
        "name": "first_name",
        "type": "string"
      }
    ],
    "primaryKey": "id"
  }
}

A comprehensive Tabular Data Resource example with all required, recommended and optional properties looks as follows.

{
  "profile": "tabular-data-resource",
  "name": "solar-system",
  "path": "http://example.com/solar-system.csv",
  "title": "The Solar System",
  "description": "My favourite data about the solar system.",
  "format": "csv",
  "mediatype": "text/csv",
  "encoding": "utf-8",
  "bytes": 1,
  "hash": "",
  "schema": {
    "fields": [
      {
        "name": "id",
        "type": "integer"
      },
      {
        "name": "name",
        "type": "string"
      },
      {
        "name": "description",
        "type": "string"
      }
    ],
    "primaryKey": "id"
  },
  "dialect": {
    "delimiter": ",",
    "doubleQuote": true
  },
  "sources": "",
  "licenses": ""
}

Specification

A Tabular Data Resource MUST be a Data Resource, that is, it MUST conform to the Data Resource specification.

In addition:

  • The Data Resource schema property MUST follow the Table Schema specification
  • There MUST be a profile property with the value tabular-data-resource
  • The data the Data Resource describes MUST:
    • If non-inline: be a CSV file
    • If inline: be "JSON tabular data", that is, an array of data rows where each row is an array or an object (see below)
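
The additional requirements above can be checked mechanically. The following is a minimal sketch in Python, not a full validator. The function name `check_tabular_resource` is hypothetical, and the sketch assumes that non-inline data is referenced through `path` and inline data through `data`, as in the examples above. It does not validate against the Data Resource or Table Schema specifications, and it does not open the file behind `path`.

```python
def check_tabular_resource(descriptor):
    """Return a list of problems found in a descriptor (hypothetical helper).

    Only the extra Tabular Data Resource rules are checked here; conformance
    to Data Resource and Table Schema is out of scope for this sketch.
    """
    problems = []

    # The profile property MUST have the value "tabular-data-resource".
    if descriptor.get("profile") != "tabular-data-resource":
        problems.append("profile must be 'tabular-data-resource'")

    if "data" in descriptor:
        # Inline data MUST be an array of rows, each an array or an object.
        rows = descriptor["data"]
        if not isinstance(rows, list):
            problems.append("inline data must be an array of rows")
        elif not all(isinstance(row, (list, dict)) for row in rows):
            problems.append("each inline row must be an array or an object")
    elif "path" not in descriptor:
        # Assumption: a resource points at its data via 'path' or 'data'.
        problems.append("descriptor has neither 'path' nor inline 'data'")

    return problems
```

An empty list means none of the checked rules were violated; it does not mean the descriptor is fully valid.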

CSV file requirements

CSV files in the wild come in a bewildering array of formats. There is a standard for CSV files described in RFC 4180, but unfortunately this standard does not reflect reality. In Tabular Data Resource, CSV files MUST follow RFC 4180 with the following important exceptions allowed:

File encoding

Files MUST:

  • EITHER be encoded as UTF-8 (the default)
  • OR the Tabular Data Resource MUST include an encoding property and the files MUST follow that encoding

NB: the RFC requires 7-bit ASCII encoding.
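
As an illustration of the encoding rule, a consumer might decode the raw bytes with the declared encoding and fall back to UTF-8 when none is given. This is a sketch only: `read_csv_bytes` is a hypothetical name, and it assumes the value of `encoding` is a name that Python's codec registry recognizes.

```python
import csv
import io


def read_csv_bytes(raw, descriptor):
    """Decode CSV bytes using the descriptor's encoding (default UTF-8).

    Hypothetical helper; returns the parsed rows as lists of strings.
    """
    encoding = descriptor.get("encoding", "utf-8")
    text = raw.decode(encoding)
    # newline="" leaves line endings untouched so the csv module can
    # handle both LF and CRLF terminators itself.
    return list(csv.reader(io.StringIO(text, newline="")))
```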

CSV Dialect

The line terminator character MUST be LF or CRLF (the RFC allows CRLF only).

If the CSV differs from this or the RFC in any other way regarding dialect (e.g. line terminators, quote characters, field delimiters), the Tabular Data Resource MUST contain a dialect property describing its dialect. The dialect property MUST follow the CSV Dialect specification.
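
To show how a dialect property might be consumed, the sketch below maps the two dialect keys used in the comprehensive example above (`delimiter` and `doubleQuote`) onto Python's standard `csv` module. The function name `reader_for` is hypothetical, the defaults are assumptions chosen to match RFC 4180 behaviour, and any other CSV Dialect properties are deliberately ignored here.

```python
import csv
import io


def reader_for(text, descriptor):
    """Build a csv.reader configured from the descriptor's dialect.

    Hypothetical helper: only 'delimiter' and 'doubleQuote' are honoured.
    """
    dialect = descriptor.get("dialect", {})
    return csv.reader(
        io.StringIO(text, newline=""),
        # Assumed defaults: comma-delimited, quotes escaped by doubling.
        delimiter=dialect.get("delimiter", ","),
        doublequote=dialect.get("doubleQuote", True),
    )
```

For example, a semicolon-delimited file would be described with `"dialect": {"delimiter": ";"}` and read with `list(reader_for(text, descriptor))`.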

JSON Tabular Data

JSON Tabular Data MUST be an array where each item in the array MUST be:

  • EITHER: an array where each entry in the array is the value for that cell in the table
  • OR: an object where each key corresponds to a column header and each value is the cell value for that row under that header

Row Arrays

[
  [ "A", "B", "C" ],
  [ 1, 2, 3 ],
  [ 4, 5, 6 ]
]

Row Objects

[
  { "A": 1, "B": 2, "C": 3 },
  { "A": 4, "B": 5, "C": 6 }
]
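
The two forms carry the same information, so a consumer may want to normalize them. The sketch below converts either form into a list of row objects. The name `rows_as_objects` is hypothetical, and the sketch assumes that in the row-array form the first row holds the headers, as in the Row Arrays example above.

```python
def rows_as_objects(data):
    """Normalize JSON tabular data into a list of row objects.

    Hypothetical helper. Assumes that, for row arrays, the first row
    contains the column headers.
    """
    if not data:
        return []
    if isinstance(data[0], dict):
        # Already row objects: return shallow copies.
        return [dict(row) for row in data]
    # Row arrays: pair each following row with the header row.
    header, *body = data
    return [dict(zip(header, row)) for row in body]
```

Applied to the Row Arrays example, this produces the same rows as the Row Objects example.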

Changelog

See the Changelog for information.

Implementations

The following implementations are available for tabular-data-resource:

See the implementation page for further information on writing an implementation of a Frictionless Data specification.