# Data Package Identifier
Data Package Identifiers are small JSON objects or strings that identify a Data Package (and, usually, its location).
Author(s) | Rufus Pollock |
---|---|
Created | 17 August 2014 |
Updated | 02 November 2014 |
JSON Schema | data-package.json |
Version | 1 |
# Language
The key words MUST, MUST NOT, REQUIRED, SHALL, SHALL NOT, SHOULD, SHOULD NOT, RECOMMENDED, MAY, and OPTIONAL in this document are to be interpreted as described in RFC 2119.
# Introduction
Data Package Identifiers are a simple way to identify a Data Package (and its location) using a string or small JSON object.
They exist because applications consistently need a way to identify a Data Package. For example, command line tools and libraries frequently take a Data Package Identifier as an argument.
For example, DataHub's data-cli tool has commands like:

    # gdp is a Data Package identifier
    data info gdp

    # https://github.com/datasets/gold-prices is a Data Package identifier
    data install https://github.com/datasets/gold-prices
# Identifier Object Structure
The object structure looks like:

    {
      // URL to base of the Data Package
      // This URL should *always* have a trailing slash ('/')
      url: ...
      // URL to datapackage.json
      dataPackageJsonUrl: ...
      // name of the Data Package
      name: ...
      // version of the Data Package
      version: ...
      // if parsed from an Identifier String, this is the original string
      original: ...
    }
The object can be parsed from (and, less importantly, serialized to) a simple string. Such Identifier Strings (also called spec strings) are frequently used, for example on the command line, to identify a Data Package.
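As a minimal sketch, the structure above could be modelled in Python as a dataclass. The class name `DataPackageIdentifier`, the empty-string default for `version`, and the name `mydatapackage` are assumptions made for illustration; only the field names and the trailing-slash rule come from this document.

```python
import json
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class DataPackageIdentifier:
    """Illustrative container mirroring the Identifier Object fields."""

    url: str  # URL to the base of the Data Package
    dataPackageJsonUrl: str  # URL to datapackage.json
    name: str  # name of the Data Package
    version: str = ""  # assumption: left empty when no version is known
    original: Optional[str] = None  # original Identifier String, if any

    def __post_init__(self) -> None:
        # The base URL should always have a trailing slash.
        if not self.url.endswith("/"):
            self.url += "/"


identifier = DataPackageIdentifier(
    url="http://mywebsite.com/mydatapackage",
    dataPackageJsonUrl="http://mywebsite.com/mydatapackage/datapackage.json",
    name="mydatapackage",  # hypothetical name, taken from the URL path
    original="http://mywebsite.com/mydatapackage/",
)

# Convert to the small JSON object form.
as_json = json.dumps(asdict(identifier), indent=2)
print(as_json)
```

The field names are kept in camelCase so that the serialized JSON keys match the structure shown above.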
# Identifier String
An Identifier String is a single string (rather than a JSON object) that points to a Data Package. An Identifier String can be, in decreasing order of explicitness:
- A URL that points directly to the datapackage.json (no resolution needed):
  http://mywebsite.com/mydatapackage/datapackage.json
- A URL that points directly to the Data Package (that is, the directory containing the datapackage.json):
  http://mywebsite.com/mydatapackage/
  resolves to:
  http://mywebsite.com/mydatapackage/datapackage.json
- A GitHub URL:
  http://github.com/datasets/gold-prices
  resolves to:
  https://raw.githubusercontent.com/datasets/gold-prices/master/datapackage.json
- The name of a dataset in the Core Datasets registry:
  gold-prices
  resolves to:
  https://datahub.io/core/gold-prices/datapackage.json
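The four resolution rules above can be sketched as a small Python function. This is an illustration rather than a reference implementation: the function name `parse_identifier_string`, the way `name` is derived from the last path segment, the empty `version`, and treating any non-HTTP(S) string as a registry name are all assumptions. The registry base URL and the GitHub `master` branch are taken from the examples in this document.

```python
from urllib.parse import urlparse

# Base URL of the Core Datasets registry, as used in the example above.
CORE_REGISTRY_BASE = "https://datahub.io/core/"
DESCRIPTOR = "datapackage.json"


def parse_identifier_string(spec: str) -> dict:
    """Resolve an Identifier String into an Identifier Object (as a dict)."""
    parsed = urlparse(spec)
    segments = [part for part in parsed.path.split("/") if part]

    if parsed.scheme not in ("http", "https"):
        # Assumption: anything that is not an HTTP(S) URL is a registry name.
        base = CORE_REGISTRY_BASE + spec + "/"
        name = spec
    elif parsed.netloc in ("github.com", "www.github.com") and len(segments) == 2:
        # GitHub repository URL: use the raw file host and the master branch.
        owner, repo = segments
        base = f"https://raw.githubusercontent.com/{owner}/{repo}/master/"
        name = repo
    else:
        if spec.endswith(DESCRIPTOR):
            # Direct link to datapackage.json: strip the file name.
            base = spec[: -len(DESCRIPTOR)]
        else:
            # Link to the package directory: ensure a trailing slash.
            base = spec if spec.endswith("/") else spec + "/"
        # Assumption: the name is the last path segment of the base URL.
        name = urlparse(base).path.rstrip("/").split("/")[-1]

    return {
        "url": base,
        "dataPackageJsonUrl": base + DESCRIPTOR,
        "name": name,
        "version": "",  # assumption: not derivable from the string alone
        "original": spec,
    }


for example in (
    "http://mywebsite.com/mydatapackage/datapackage.json",
    "http://mywebsite.com/mydatapackage/",
    "http://github.com/datasets/gold-prices",
    "gold-prices",
):
    print(example, "->", parse_identifier_string(example)["dataPackageJsonUrl"])
```

The branches are checked from the least explicit form (a bare name) through GitHub URLs to plain URLs, so that a GitHub repository URL is not mistaken for a package directory.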