Data Package Identifier

Data Package Identifiers are a simple way to identify a Data Package (and its location) using a string or small JSON object.

They exist because applications consistently need a way to identify a Data Package. Command line tools and libraries, in particular, frequently take a Data Package Identifier as an argument.

For example, dpm (the Data Package Manager) has commands like:

# gdp is a Data Package identifier
dpm info gdp

# is a Data Package identifier
dpm install

Identifier Object Structure

The object structure looks like:

{% highlight javascript %}
{
  // URL to base of the Data Package
  // This URL should always have a trailing slash ('/')
  url: ...
  // URL to datapackage.json
  dataPackageJsonUrl: ...
  // name of the Data Package
  name: ...
  // version of the Data Package
  version: ...
  // if parsed from an Identifier String this is the original
  // specString
  original: ...
}
{% endhighlight %}

The object can be parsed from (and, less importantly, serialized to) a simple string, called an Identifier String. Identifier Strings will frequently be used, e.g. on the command line, to identify a Data Package.
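As an illustration of the object structure, here is a minimal sketch that builds an identifier object from a base URL and serializes it back to a string. The function names `makeIdentifier` and `stringifyIdentifier` and the `example.org` URL are hypothetical, and serializing to the datapackage.json URL is an assumption (it is simply the most explicit string form); none of this is prescribed by the specification.

```javascript
// Hypothetical helper: build an identifier object from a base URL.
function makeIdentifier(baseUrl, name, version, original) {
  // The base URL should always have a trailing slash ('/').
  const url = baseUrl.endsWith('/') ? baseUrl : baseUrl + '/';
  return {
    url: url,
    // datapackage.json sits directly under the base URL.
    dataPackageJsonUrl: url + 'datapackage.json',
    name: name,
    version: version,
    original: original
  };
}

// Assumed serialization: emit the URL of the datapackage.json.
function stringifyIdentifier(identifier) {
  return identifier.dataPackageJsonUrl;
}

// Usage with a placeholder URL (example.org is not a real registry).
const exampleId = makeIdentifier('https://example.org/data/gdp', 'gdp');
console.log(exampleId.url);
console.log(stringifyIdentifier(exampleId));
```

Normalizing the trailing slash in one place means `dataPackageJsonUrl` can be derived by plain concatenation.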

Identifier String

An Identifier String is a single string (rather than a JSON object) that points to a Data Package. An Identifier String can be, in decreasing order of explicitness:

  • A URL that points directly to the datapackage.json (no resolution needed):
  • A URL that points directly to the Data Package (that is, the directory containing the datapackage.json):

    resolves to:
  • A GitHub URL:

    resolves to:
  • The name of a dataset in the Core Datasets registry:


    resolves to:
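The four forms above could be resolved along the following lines. This is a sketch under stated assumptions, not the reference behaviour: `parseIdentifierString` and its options are hypothetical names, the registry base URL must be supplied by the caller (the `example.org` value is a placeholder), the GitHub branch defaults to `master` as an assumption, the name is guessed from the last path segment, and the version is left empty because it cannot be read from the string.

```javascript
// Ensure a URL ends with a trailing slash.
function withTrailingSlash(u) {
  return u.endsWith('/') ? u : u + '/';
}

// Guess a package name from the last path segment of a base URL (assumption).
function lastSegment(baseUrl) {
  const parts = new URL(baseUrl).pathname.split('/').filter(Boolean);
  return parts.length ? parts[parts.length - 1] : '';
}

function buildIdentifier(baseUrl, name, original) {
  return {
    url: baseUrl,
    dataPackageJsonUrl: baseUrl + 'datapackage.json',
    name: name,
    version: '', // not derivable from the string in this sketch
    original: original
  };
}

// Hypothetical parser covering the four forms, most explicit first.
function parseIdentifierString(spec, options) {
  const opts = options || {};

  // Bare name: resolve against a caller-supplied registry base URL.
  if (!/^https?:\/\//.test(spec)) {
    if (!opts.registryBaseUrl) {
      throw new Error('registryBaseUrl is required to resolve a bare name');
    }
    const registryBase = withTrailingSlash(opts.registryBaseUrl) + spec + '/';
    return buildIdentifier(registryBase, spec, spec);
  }

  const parsed = new URL(spec);

  // GitHub repository URL: map to the raw file host (branch is an assumption).
  if (parsed.hostname === 'github.com') {
    const [owner, repo] = parsed.pathname.split('/').filter(Boolean);
    if (!owner || !repo) {
      throw new Error('expected a GitHub URL of the form /owner/repo');
    }
    const branch = opts.githubBranch || 'master';
    const rawBase = 'https://raw.githubusercontent.com/' +
      owner + '/' + repo + '/' + branch + '/';
    return buildIdentifier(rawBase, repo, spec);
  }

  // URL pointing directly at datapackage.json: no resolution needed.
  if (parsed.pathname.endsWith('/datapackage.json')) {
    const directBase = spec.slice(0, spec.lastIndexOf('datapackage.json'));
    return buildIdentifier(directBase, lastSegment(directBase), spec);
  }

  // URL pointing at the directory containing datapackage.json.
  const dirBase = withTrailingSlash(spec);
  return buildIdentifier(dirBase, lastSegment(dirBase), spec);
}

// Usage with placeholder URLs.
const viaUrl = parseIdentifierString('https://example.org/data/gdp/datapackage.json');
console.log(viaUrl.url, viaUrl.name);
const viaName = parseIdentifierString('gdp', { registryBaseUrl: 'https://example.org/registry' });
console.log(viaName.dataPackageJsonUrl);
```

Checking the forms from most to least explicit mirrors the ordering of the list, so a full URL is never mistaken for a registry name.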