Version: 5.0.0

Frictionless Specifications for InterMine

Introduction

What are Frictionless Specifications?

At the core of Frictionless is a set of patterns for describing data, including Data Package (for datasets), Data Resource (for files), and Table Schema (for tables). For more information about the project as a whole, visit frictionlessdata.io.

What's a data package?

A Data Package is a simple container format used to describe and package a collection of data (a dataset).

Data Package for InterMine

InterMine allows users to query diverse data sources through its webapps. InterMine's new data package helps users interpret query results: it describes the primary keys, the data types of attributes/columns, and the descriptions and ontology links of attributes, among other things. For a sample InterMine Data Package, see the "Sample data package" section at the end of this page.

Frictionless Specifications used in InterMine Data Package

InterMine uses the Tabular Data Package and Tabular Data Resource specifications because InterMine's biological data is tabular.
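As a rough illustration of what using these two specifications means for a consumer, the sketch below checks that a descriptor declares the two profile values this page marks as FIXED. The function name `profile_problems` and the `example` descriptor are hypothetical; this is not part of InterMine or of any Frictionless library.

```python
EXPECTED_PACKAGE_PROFILE = "tabular-data-package"
EXPECTED_RESOURCE_PROFILE = "tabular-data-resource"


def profile_problems(descriptor):
    """Return a list of problems; an empty list means both profiles match."""
    problems = []
    if descriptor.get("profile") != EXPECTED_PACKAGE_PROFILE:
        problems.append(f"package profile is {descriptor.get('profile')!r}")
    for index, resource in enumerate(descriptor.get("resources", [])):
        if resource.get("profile") != EXPECTED_RESOURCE_PROFILE:
            problems.append(
                f"resource {index} profile is {resource.get('profile')!r}")
    return problems


# Hypothetical, minimal descriptor used only to exercise the check.
example = {
    "profile": "tabular-data-package",
    "resources": [{"profile": "tabular-data-resource"}],
}
print(profile_problems(example))
```

An empty list means the outer and inner profiles are the expected ones; anything else is reported as a short message per mismatch.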

InterMine's Data Package

How to export?

When you export query results, a new Frictionless Data Package option is available. Select it to export the data package along with the results.

Note that the data package is exported in a zip file together with the query results.
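Once the zip file has been downloaded, the descriptor can be read without unpacking the archive. The sketch below assumes the descriptor inside the archive is named `datapackage.json`, which is the usual Frictionless convention but is not confirmed on this page; the helper name `read_descriptor` and the stand-in archive are hypothetical.

```python
import io
import json
import zipfile


def read_descriptor(zip_bytes, descriptor_name="datapackage.json"):
    """Return the parsed descriptor from an exported zip archive.

    descriptor_name is an assumption; check the file listing of your own
    export if the lookup fails.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        # Match on the base name so a descriptor in a subfolder is found too.
        matches = [name for name in archive.namelist()
                   if name.rsplit("/", 1)[-1] == descriptor_name]
        if not matches:
            raise FileNotFoundError(
                f"{descriptor_name} not found in {archive.namelist()}")
        with archive.open(matches[0]) as handle:
            return json.load(handle)


# Build a tiny stand-in archive in memory so the sketch runs without a real export.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("datapackage.json", json.dumps(
        {"profile": "tabular-data-package", "name": "flymine@v51"}))
    archive.writestr("results.tsv", "primaryIdentifier\tprimaryAccession\n")

descriptor = read_descriptor(buffer.getvalue())
print(descriptor["name"])
```

For a real export, pass the bytes of the downloaded file instead, for example `read_descriptor(open(path, "rb").read())`.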

Description of InterMine's Data Package Fields

Some of the fields in the data package are standard fields defined by the Frictionless specifications. These are marked with the keyword FIXED in the third column; for all other fields, the third column gives an example.

Key	Description	Value/Example
profile [outer level]	specifies that the specification used is tabular data package	tabular-data-package [FIXED]
name [outer level]	describes the name and version of the mine	flymine@v51
profile [inner level]	specifies that the resource used is tabular data resource	tabular-data-resource [FIXED]
name [inner level]	the name of the resource, depends on the mineName	flymine-query-data-resource
path	exports the top 10 rows of results of query	example below
format	format of the query results file	csv/tsv/json/xml
schema	describes fields of query results and primary/candidate keys	example below
fields	an array of objects describing all the fields in query results	example below
name [in fields]	name of the field/column header	firstAuthor
type [in fields]	type of the field/column header	String/Integer/etc.
class path [in fields]	class path of attribute/field	Protein > Organism . Name
class ontology link [in fields]	ontology link for the class of attribute	http://semanticscience.org/resource/SIO_010043
attribute ontology link [in fields]	ontology link for the attribute	http://edamontology.org/data_2909
primaryKey	an array of candidate keys	[primaryIdentifier, primaryAccession]
sources	an array of objects each describing a data source	example below
title [in sources]	name/title of data source	GenomeNet
url [in sources]	url of the data source	http://www.genome.jp/en/
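To show how the keys in the table nest inside one another, the sketch below walks `resources`, then `schema`, then `fields`, and prints one line per column. The embedded descriptor is a trimmed copy of two field entries from the sample on this page; the function name `summarize_fields` and the placeholder text for a missing ontology link are hypothetical.

```python
import json

# Trimmed copy of two field entries from the sample data package on this page.
DESCRIPTOR = json.loads("""
{
  "resources": [{
    "schema": {
      "fields": [
        {"name": "primaryIdentifier", "type": "String",
         "class path": "Protein > DB identifier",
         "class ontology link": "http://semanticscience.org/resource/SIO_010043",
         "attribute ontology link": "http://semanticscience.org/resource/SIO_000675"},
        {"name": "year", "type": "Integer",
         "class path": "Protein > Publications > Year",
         "class ontology link": "http://semanticscience.org/resource/SIO_000087",
         "attribute ontology link": null}
      ]
    }
  }]
}
""")


def summarize_fields(descriptor):
    """Return one (name, type, class path, attribute link) tuple per field."""
    summary = []
    for resource in descriptor.get("resources", []):
        for field in resource.get("schema", {}).get("fields", []):
            summary.append((
                field["name"],
                field["type"],
                field.get("class path") or "(no class path)",
                # A JSON null becomes None in Python; show a placeholder instead.
                field.get("attribute ontology link") or "(no ontology link)",
            ))
    return summary


for row in summarize_fields(DESCRIPTOR):
    print(" | ".join(row))
```

Note that several keys contain spaces ("class path", "attribute ontology link"), so they have to be read with dictionary lookups rather than attribute-style access.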

Sample data package

{
  "profile" : "tabular-data-package",
  "name" : "biotestmine@v31",
  "resources" : [ {
    "profile" : "tabular-data-resource",
    "name" : "intermine-query-data-resource",
    "path" : "http://localhost:8080/biotestmine/service/query/results?query=%3Cquery+name%3D%22%22+model%3D%22genomic%22+view%3D%22Protein.primaryIdentifier+Protein.primaryAccession+Protein.organism.name+Protein.publications.firstAuthor+Protein.publications.title+Protein.publications.year+Protein.publications.journal+Protein.publications.volume+Protein.publications.pages+Protein.publications.pubMedId%22+longDescription%3D%22%22+sortOrder%3D%22Protein.primaryIdentifier+asc%22%3E%3Cconstraint+path%3D%22Protein.organism.name%22+op%3D%22%3D%22+value%3D%22Plasmodium+falciparum+3D7%22%2F%3E%3C%2Fquery%3E&format=tab",
    "format" : "tsv",
    "schema" : {
      "fields" : [ {
        "name" : "primaryIdentifier",
        "type" : "String",
        "class path" : "Protein > DB identifier",
        "class ontology link" : "http://semanticscience.org/resource/SIO_010043",
        "attribute ontology link" : "http://semanticscience.org/resource/SIO_000675"
      }, {
        "name" : "primaryAccession",
        "type" : "String",
        "class path" : "Protein > Primary Accession",
        "class ontology link" : "http://semanticscience.org/resource/SIO_010043",
        "attribute ontology link" : "http://edamontology.org/data_2907"
      }, {
        "name" : "name",
        "type" : "String",
        "class path" : "Protein > Organism . Name",
        "class ontology link" : "http://semanticscience.org/resource/SIO_010000",
        "attribute ontology link" : "http://edamontology.org/data_2909"
      }, {
        "name" : "firstAuthor",
        "type" : "String",
        "class path" : "Protein > Publications > First Author",
        "class ontology link" : "http://semanticscience.org/resource/SIO_000087",
        "attribute ontology link" : "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C42781"
      }, {
        "name" : "title",
        "type" : "String",
        "class path" : "Protein > Publications > Title",
        "class ontology link" : "http://semanticscience.org/resource/SIO_000087",
        "attribute ontology link" : "http://semanticscience.org/resource/SIO_000185"
      }, {
        "name" : "year",
        "type" : "Integer",
        "class path" : "Protein > Publications > Year",
        "class ontology link" : "http://semanticscience.org/resource/SIO_000087",
        "attribute ontology link" : null
      }, {
        "name" : "journal",
        "type" : "String",
        "class path" : "Protein > Publications > Journal",
        "class ontology link" : "http://semanticscience.org/resource/SIO_000087",
        "attribute ontology link" : "http://semanticscience.org/resource/SIO_000160"
      }, {
        "name" : "volume",
        "type" : "String",
        "class path" : "Protein > Publications > Volume",
        "class ontology link" : "http://semanticscience.org/resource/SIO_000087",
        "attribute ontology link" : null
      }, {
        "name" : "pages",
        "type" : "String",
        "class path" : "Protein > Publications > Pages",
        "class ontology link" : "http://semanticscience.org/resource/SIO_000087",
        "attribute ontology link" : null
      }, {
        "name" : "pubMedId",
        "type" : "String",
        "class path" : "Protein > Publications > PubMed ID",
        "class ontology link" : "http://semanticscience.org/resource/SIO_000087",
        "attribute ontology link" : "http://edamontology.org/data_1187"
      } ],
      "primaryKey" : [ "primaryIdentifier", "secondaryIdentifier", "primaryAccession" ]
    }
  } ],
  "sources" : [ {
    "title" : "GenomeNet",
    "url" : "http://www.genome.jp/en/"
  } ]
}
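In the sample above, `primaryKey` lists `secondaryIdentifier`, although no entry in `fields` has that name. This page does not say whether that is intentional, so a consumer that relies on the keys may want to detect such entries before using them. The sketch below is one way to do that; the function name `keys_missing_from_fields` is hypothetical, and the field names and key list are copied from the sample.

```python
def keys_missing_from_fields(schema):
    """Return primaryKey entries that have no matching field name."""
    field_names = {field["name"] for field in schema.get("fields", [])}
    primary_key = schema.get("primaryKey", [])
    # Table Schema allows a single string as well as an array of strings.
    if isinstance(primary_key, str):
        primary_key = [primary_key]
    return [key for key in primary_key if key not in field_names]


# Field names and primaryKey copied from the sample data package above.
sample_schema = {
    "fields": [{"name": name} for name in (
        "primaryIdentifier", "primaryAccession", "name", "firstAuthor",
        "title", "year", "journal", "volume", "pages", "pubMedId")],
    "primaryKey": ["primaryIdentifier", "secondaryIdentifier",
                   "primaryAccession"],
}
print(keys_missing_from_fields(sample_schema))  # ['secondaryIdentifier']
```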