Frictionless Specifications for InterMine
#What are Frictionless Specifications?
At the core of Frictionless is a set of patterns for describing data including Data Package (for datasets), Data Resource (for files) and Table Schema (for tables). For more info about the project as a whole, please visit frictionlessdata.io.
#What's a data package?
A Data Package is a simple container format used to describe and package a collection of data (a dataset).
#Data Package for InterMine
InterMine allows users to query diverse data sources through its web apps. InterMine's new data package helps users understand query results more easily. It describes the primary keys, the data types of attributes/columns, and the descriptions and ontology links of attributes, among other things. For a sample InterMine Data Package, click here.
#Frictionless Specifications used in InterMine Data Package
InterMine uses the Tabular Data Package and Tabular Data Resource specifications because InterMine's biological data is tabular.
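To make the two profiles concrete, here is a minimal sketch of how they might appear in a descriptor, written as a Python dictionary. The `profile` and `name` values come from the field table in this document, and `resources` is the standard Frictionless key for the list of resources. The helper `uses_tabular_profiles` is a hypothetical name introduced only for illustration and is not part of InterMine.

```python
# Sketch of the outer (package) and inner (resource) levels of a descriptor.
# Values are taken from the field table in this document.
descriptor = {
    "profile": "tabular-data-package",   # outer level
    "name": "flymine@v51",
    "resources": [
        {
            "profile": "tabular-data-resource",   # inner level
            "name": "flymine-query-data-resource",
        }
    ],
}


def uses_tabular_profiles(package):
    """Return True if the package and all of its resources use the tabular profiles.

    Hypothetical helper, for illustration only.
    """
    if package.get("profile") != "tabular-data-package":
        return False
    resources = package.get("resources", [])
    # A package with no resources is treated as not matching.
    return bool(resources) and all(
        resource.get("profile") == "tabular-data-resource" for resource in resources
    )


print(uses_tabular_profiles(descriptor))
```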
#InterMine's Data Package
#How to export?
When you export query results, a new option for Frictionless Data Package is available. Use it to export the data package along with the results.
Please note that the data package is exported in a zip file together with the query results.
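Once you have the zip file, the descriptor can be read with the Python standard library. The sketch below assumes the descriptor inside the archive is named `datapackage.json`, which is the Frictionless convention; the actual file names in an InterMine export may differ. The function names and the `results.csv` entry are hypothetical. The example builds a small archive in memory so that it runs without a real export.

```python
import io
import json
import zipfile


def load_descriptor(zip_source, descriptor_name="datapackage.json"):
    """Read and parse the descriptor from an exported zip.

    descriptor_name is an assumption based on the Frictionless convention.
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(descriptor_name) as handle:
            return json.load(handle)


def build_demo_export():
    """Create an in-memory zip standing in for a real export (illustration only)."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            "datapackage.json",
            json.dumps({"profile": "tabular-data-package", "name": "flymine@v51"}),
        )
        archive.writestr("results.csv", "firstAuthor\n")  # hypothetical results file
    buffer.seek(0)
    return buffer


descriptor = load_descriptor(build_demo_export())
print(descriptor["name"])
```

With a real export, pass the path of the downloaded zip file to `load_descriptor` instead of the in-memory buffer.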
#Description of InterMine's Data Package Fields
Some of the fields in the data package are standard fields defined by the Frictionless specifications. These are marked with the keyword FIXED in the third column; for the other fields, the third column gives an example.
|profile [outer level]||specifies that the specification used is tabular data package||tabular-data-package [FIXED]|
|name [outer level]||describes the name and version of the mine||flymine@v51|
|profile [inner level]||specifies that the resource used is tabular data resource||tabular-data-resource [FIXED]|
|name [inner level]||the name of the resource, depends on the mineName||flymine-query-data-resource|
|path||exports the top 10 rows of results of query||example below|
|format||format of the query results file||csv/tsv/json/xml|
|schema||describes fields of query results and primary/candidate keys||example below|
|fields||an array of objects describing all the fields in query results||example below|
|name [in fields]||name of the field/column header||firstAuthor|
|type [in fields]||data type of the field/column||String/Integer/etc.|
|class path [in fields]||class path of attribute/field||Protein > Organism . Name|
|class ontology link [in fields]||ontology link for the class of attribute||http://semanticscience.org/resource/SIO_010043|
|attribute ontology link [in fields]||ontology link for the attribute||http://edamontology.org/data_2909|
|primaryKey||an array of candidate keys||[primaryIdentifier, primaryAccession]|
|sources||an array of objects each describing a data source||example below|
|title [in sources]||name/title of data source||GenomeNet|
|url [in sources]||url of the data source||http://www.genome.jp/en/|
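The fields in the table can be pictured together as one resource. The sketch below is an assumption-laden illustration, not the exact output of InterMine. The keys `class path`, `class ontology link` and `attribute ontology link` are copied from the table labels and may be spelled differently in a real export. The `path` value and the helpers `field_names` and `missing_primary_keys` are hypothetical. The ontology links and class path are the table's examples, attached here to an arbitrary field purely for illustration.

```python
# Sketch of one resource assembled from the table above (values are illustrative).
resource = {
    "profile": "tabular-data-resource",
    "name": "flymine-query-data-resource",
    "path": "results.csv",  # hypothetical file name
    "format": "csv",
    "schema": {
        "fields": [
            {
                "name": "primaryIdentifier",
                "type": "String",
                # The three keys below use the table labels; real spelling may differ.
                "class path": "Protein > Organism . Name",
                "class ontology link": "http://semanticscience.org/resource/SIO_010043",
                "attribute ontology link": "http://edamontology.org/data_2909",
            },
            {"name": "primaryAccession", "type": "String"},
        ],
        "primaryKey": ["primaryIdentifier", "primaryAccession"],
    },
    "sources": [{"title": "GenomeNet", "url": "http://www.genome.jp/en/"}],
}


def field_names(res):
    """Return the column names declared in the resource schema."""
    return [field["name"] for field in res["schema"]["fields"]]


def missing_primary_keys(res):
    """Return primaryKey entries that do not match any declared field."""
    declared = set(field_names(res))
    return [key for key in res["schema"].get("primaryKey", []) if key not in declared]


print(field_names(resource))
print(missing_primary_keys(resource))
```

A check such as `missing_primary_keys` is one simple way to confirm that every key listed in `primaryKey` refers to a column that is actually present in the exported results.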