
Frictionless Specifications for InterMine

Introduction

What are Frictionless Specifications?

At the core of Frictionless is a set of patterns for describing data, including Data Package (for datasets), Data Resource (for files), and Table Schema (for tables). For more information about the project as a whole, visit frictionlessdata.io.

What's a data package?

A Data Package is a simple container format used to describe and package a collection of data (a dataset).

Data Package for InterMine

InterMine lets users query diverse data sources through its web apps. InterMine's new data package helps users understand query results more easily: among other things, it describes the primary keys, the data types of attributes (columns), and the descriptions and ontology links of attributes. A sample InterMine Data Package is shown at the end of this page.
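To illustrate what the descriptor exposes, here is a minimal Python sketch that lists each column's name, data type, and attribute ontology link. It parses a trimmed excerpt (two fields) of the sample descriptor at the end of this page; `summarize_fields` is a hypothetical helper, not part of InterMine.

```python
import json

# Trimmed excerpt of the sample descriptor on this page (two fields only),
# embedded so the example is self-contained.
DESCRIPTOR_JSON = """
{
  "profile": "tabular-data-package",
  "name": "biotestmine@v31",
  "resources": [{
    "profile": "tabular-data-resource",
    "name": "intermine-query-data-resource",
    "format": "tsv",
    "schema": {
      "fields": [
        {"name": "primaryIdentifier", "type": "String",
         "class path": "Protein > DB identifier",
         "class ontology link": "http://semanticscience.org/resource/SIO_010043",
         "attribute ontology link": "http://semanticscience.org/resource/SIO_000675"},
        {"name": "year", "type": "Integer",
         "class path": "Protein > Publications > Year",
         "class ontology link": "http://semanticscience.org/resource/SIO_000087",
         "attribute ontology link": null}
      ]
    }
  }]
}
"""


def summarize_fields(descriptor: dict) -> list:
    """Return (column name, type, attribute ontology link or '-') per field."""
    rows = []
    for resource in descriptor.get("resources", []):
        for field in resource.get("schema", {}).get("fields", []):
            # JSON null becomes None; show '-' when no ontology link exists.
            link = field.get("attribute ontology link") or "-"
            rows.append((field["name"], field["type"], link))
    return rows


descriptor = json.loads(DESCRIPTOR_JSON)
summary = summarize_fields(descriptor)
for name, ftype, link in summary:
    print(f"{name:20} {ftype:8} {link}")
```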

Frictionless Specifications used in InterMine Data Package

InterMine uses the Tabular Data Package and Tabular Data Resource specifications because InterMine's biological data is tabular.
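As a rough sketch of how these two specifications nest, the Python snippet below builds a minimal Tabular Data Package containing a single Tabular Data Resource, using the fixed profile values and the name formats from the field table on this page. `make_descriptor` and the placeholder URL are hypothetical; the `{mine}-query-data-resource` naming pattern is an assumption (the sample package uses `intermine-query-data-resource`), and this is not InterMine's exporter.

```python
import json


def make_descriptor(mine: str, version: str, fields: list, primary_key: list,
                    path: str, fmt: str = "tsv") -> dict:
    """Build a minimal Tabular Data Package holding one Tabular Data Resource.

    Hypothetical helper for illustration; not InterMine's exporter.
    """
    return {
        "profile": "tabular-data-package",          # FIXED value
        "name": f"{mine}@{version}",                # e.g. "flymine@v51"
        "resources": [{
            "profile": "tabular-data-resource",     # FIXED value
            "name": f"{mine}-query-data-resource",  # assumed naming pattern
            "path": path,                           # URL of the query results
            "format": fmt,
            "schema": {"fields": fields, "primaryKey": primary_key},
        }],
    }


pkg = make_descriptor(
    "flymine", "v51",
    fields=[{"name": "primaryIdentifier", "type": "String"}],
    primary_key=["primaryIdentifier"],
    path="https://example.org/query/results",  # hypothetical placeholder URL
)
print(json.dumps(pkg, indent=2))
```

The package is the outer container and the resource is nested in its `resources` array, which is why the table distinguishes "outer level" and "inner level" keys with the same names.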

InterMine's Data Package

How to export?

When exporting query results, you can choose the new Frictionless Data Package option to export the data package along with the results.

If you choose this option, the data package is exported in a zip file together with the query results.
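Once the zip file is downloaded, the descriptor can be read with the Python standard library alone. This is a minimal sketch: the member name `datapackage.json` is an assumption (it is the conventional Frictionless descriptor file name), and the file names in the demo archive are hypothetical, so check `namelist()` on a real export.

```python
import io
import json
import zipfile


def read_data_package(zip_bytes: bytes, member: str = "datapackage.json") -> dict:
    """Load the data package descriptor from an exported zip archive.

    The default member name is an assumption; adjust it to the real archive.
    """
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        print("archive contains:", archive.namelist())
        with archive.open(member) as fh:
            return json.load(fh)  # json accepts the UTF-8 bytes read here


# Demo: an in-memory archive shaped like an export (file names are hypothetical).
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("results.tsv", "primaryIdentifier\tyear\nP1\t2001\n")
    archive.writestr("datapackage.json",
                     json.dumps({"profile": "tabular-data-package"}))

package = read_data_package(buffer.getvalue())
print(package["profile"])
```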

Description of InterMine's Data Package Fields

Some of the fields in the data package are standard fields defined by the Frictionless specifications and always take the same value; these are marked with the keyword FIXED in the third column. For all other fields, the third column gives an example.

Key | Description | Value/Example
--- | --- | ---
profile [outer level] | specifies that the specification used is Tabular Data Package | tabular-data-package [FIXED]
name [outer level] | name and version of the mine | flymine@v51
profile [inner level] | specifies that the resource is a Tabular Data Resource | tabular-data-resource [FIXED]
name [inner level] | name of the resource; depends on the mine name | flymine-query-data-resource
path | URL of the query that exports the top 10 rows of the query results | see sample below
format | format of the query results file | csv/tsv/json/xml
schema | describes the fields of the query results and the primary/candidate keys | see sample below
fields | array of objects describing all the fields in the query results | see sample below
name [in fields] | name of the field (column header) | firstAuthor
type [in fields] | data type of the field (column) | String/Integer/etc.
class path [in fields] | class path of the attribute (field) | Protein > Organism . Name
class ontology link [in fields] | ontology link for the class of the attribute | http://semanticscience.org/resource/SIO_010043
attribute ontology link [in fields] | ontology link for the attribute | http://edamontology.org/data_2909
primaryKey | array of candidate keys | [primaryIdentifier, primaryAccession]
sources | array of objects, each describing a data source | see sample below
title [in sources] | name/title of the data source | GenomeNet
url [in sources] | URL of the data source | http://www.genome.jp/en/
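Two properties from the table can be checked mechanically: the FIXED profile values, and the Table Schema rule that every `primaryKey` entry must name a field listed in `fields`. The Python sketch below is a hand-rolled check with a hypothetical `check_descriptor` helper; it is not the official Frictionless validator.

```python
def check_descriptor(descriptor: dict) -> list:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if descriptor.get("profile") != "tabular-data-package":
        problems.append("outer profile is not tabular-data-package")
    for resource in descriptor.get("resources", []):
        if resource.get("profile") != "tabular-data-resource":
            problems.append(f"resource {resource.get('name')!r}: "
                            "profile is not tabular-data-resource")
        schema = resource.get("schema", {})
        field_names = {field["name"] for field in schema.get("fields", [])}
        keys = schema.get("primaryKey", [])
        if isinstance(keys, str):  # Table Schema also allows a single string
            keys = [keys]
        for key in keys:
            if key not in field_names:
                problems.append(f"primaryKey entry {key!r} is not a field name")
    return problems


# Trimmed copy of the sample below: fields reduced to names, primaryKey unchanged.
sample = {
    "profile": "tabular-data-package",
    "resources": [{
        "profile": "tabular-data-resource",
        "name": "intermine-query-data-resource",
        "schema": {
            "fields": [{"name": "primaryIdentifier"}, {"name": "primaryAccession"}],
            "primaryKey": ["primaryIdentifier", "secondaryIdentifier",
                           "primaryAccession"],
        },
    }],
}
print(check_descriptor(sample))
```

Applied to the sample package below, the key check flags `secondaryIdentifier`, which appears in `primaryKey` but has no entry in `fields`.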

Sample data package

{
  "profile" : "tabular-data-package",
  "name" : "biotestmine@v31",
  "resources" : [ {
    "profile" : "tabular-data-resource",
    "name" : "intermine-query-data-resource",
    "path" : "http://localhost:8080/biotestmine/service/query/results?query=%3Cquery+name%3D%22%22+model%3D%22genomic%22+view%3D%22Protein.primaryIdentifier+Protein.primaryAccession+Protein.organism.name+Protein.publications.firstAuthor+Protein.publications.title+Protein.publications.year+Protein.publications.journal+Protein.publications.volume+Protein.publications.pages+Protein.publications.pubMedId%22+longDescription%3D%22%22+sortOrder%3D%22Protein.primaryIdentifier+asc%22%3E%3Cconstraint+path%3D%22Protein.organism.name%22+op%3D%22%3D%22+value%3D%22Plasmodium+falciparum+3D7%22%2F%3E%3C%2Fquery%3E&format=tab",
    "format" : "tsv",
    "schema" : {
      "fields" : [ {
        "name" : "primaryIdentifier",
        "type" : "String",
        "class path" : "Protein > DB identifier",
        "class ontology link" : "http://semanticscience.org/resource/SIO_010043",
        "attribute ontology link" : "http://semanticscience.org/resource/SIO_000675"
      }, {
        "name" : "primaryAccession",
        "type" : "String",
        "class path" : "Protein > Primary Accession",
        "class ontology link" : "http://semanticscience.org/resource/SIO_010043",
        "attribute ontology link" : "http://edamontology.org/data_2907"
      }, {
        "name" : "name",
        "type" : "String",
        "class path" : "Protein > Organism . Name",
        "class ontology link" : "http://semanticscience.org/resource/SIO_010000",
        "attribute ontology link" : "http://edamontology.org/data_2909"
      }, {
        "name" : "firstAuthor",
        "type" : "String",
        "class path" : "Protein > Publications > First Author",
        "class ontology link" : "http://semanticscience.org/resource/SIO_000087",
        "attribute ontology link" : "http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#C42781"
      }, {
        "name" : "title",
        "type" : "String",
        "class path" : "Protein > Publications > Title",
        "class ontology link" : "http://semanticscience.org/resource/SIO_000087",
        "attribute ontology link" : "http://semanticscience.org/resource/SIO_000185"
      }, {
        "name" : "year",
        "type" : "Integer",
        "class path" : "Protein > Publications > Year",
        "class ontology link" : "http://semanticscience.org/resource/SIO_000087",
        "attribute ontology link" : null
      }, {
        "name" : "journal",
        "type" : "String",
        "class path" : "Protein > Publications > Journal",
        "class ontology link" : "http://semanticscience.org/resource/SIO_000087",
        "attribute ontology link" : "http://semanticscience.org/resource/SIO_000160"
      }, {
        "name" : "volume",
        "type" : "String",
        "class path" : "Protein > Publications > Volume",
        "class ontology link" : "http://semanticscience.org/resource/SIO_000087",
        "attribute ontology link" : null
      }, {
        "name" : "pages",
        "type" : "String",
        "class path" : "Protein > Publications > Pages",
        "class ontology link" : "http://semanticscience.org/resource/SIO_000087",
        "attribute ontology link" : null
      }, {
        "name" : "pubMedId",
        "type" : "String",
        "class path" : "Protein > Publications > PubMed ID",
        "class ontology link" : "http://semanticscience.org/resource/SIO_000087",
        "attribute ontology link" : "http://edamontology.org/data_1187"
      } ],
      "primaryKey" : [ "primaryIdentifier", "secondaryIdentifier", "primaryAccession" ]
    }
  } ],
  "sources" : [ {
    "title" : "GenomeNet",
    "url" : "http://www.genome.jp/en/"
  } ]
}
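The resource `path` in the sample is a web-service URL with the InterMine query embedded as URL-encoded XML in its `query` parameter. The Python sketch below decodes that parameter with the standard library and recovers the query's view paths; their last segments match the `name` values in `fields`. `query_view` is a hypothetical helper written for illustration.

```python
import xml.etree.ElementTree as ET
from urllib.parse import parse_qs, urlparse

# The "path" value from the sample data package, split for readability.
SAMPLE_PATH = (
    "http://localhost:8080/biotestmine/service/query/results"
    "?query=%3Cquery+name%3D%22%22+model%3D%22genomic%22+view%3D%22"
    "Protein.primaryIdentifier+Protein.primaryAccession+Protein.organism.name+"
    "Protein.publications.firstAuthor+Protein.publications.title+"
    "Protein.publications.year+Protein.publications.journal+"
    "Protein.publications.volume+Protein.publications.pages+"
    "Protein.publications.pubMedId%22+longDescription%3D%22%22+"
    "sortOrder%3D%22Protein.primaryIdentifier+asc%22%3E"
    "%3Cconstraint+path%3D%22Protein.organism.name%22+op%3D%22%3D%22+"
    "value%3D%22Plasmodium+falciparum+3D7%22%2F%3E%3C%2Fquery%3E"
    "&format=tab"
)


def query_view(path: str) -> list:
    """Return the view paths of the query XML embedded in a results URL."""
    params = parse_qs(urlparse(path).query)   # decodes %XX escapes and '+'
    query = ET.fromstring(params["query"][0])  # the <query ...> element
    return query.get("view", "").split()


view = query_view(SAMPLE_PATH)
columns = [p.rsplit(".", 1)[-1] for p in view]  # last path segment = column
print(columns)
```

The URL also carries `format=tab`, which corresponds to the resource's `"format" : "tsv"`.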