Data Licences
You are using InterMine to integrate several data sets into a single database so that your end users can query them easily. It's important to make it very clear to your users how the data in your mine is licensed and how it can be re-used.
New DataSet.licence field
In InterMine 4.0, we've added licence to the DataSet model as a text field. Its value is meant to be a URL that points to a standard data licence, e.g. https://creativecommons.org/licenses/by/4.0/
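Because the field is plain text, the model itself does not stop free text from being stored in it. The following sketch is not part of InterMine. It shows one way a data parser could reject values that are not absolute http(s) URLs before it writes them. The class name `LicenceUrlCheck` is hypothetical.

```java
import java.net.URI;
import java.net.URISyntaxException;

/** Hypothetical helper: checks that a DataSet.licence value looks like an absolute http(s) URL. */
final class LicenceUrlCheck {

    private LicenceUrlCheck() { }

    /** Returns true only for absolute http or https URLs that include a host. */
    static boolean isLicenceUrl(String value) {
        if (value == null || value.trim().isEmpty()) {
            return false;
        }
        try {
            URI uri = new URI(value.trim());
            String scheme = uri.getScheme();
            return uri.isAbsolute()
                    && ("http".equalsIgnoreCase(scheme) || "https".equalsIgnoreCase(scheme))
                    && uri.getHost() != null;
        } catch (URISyntaxException e) {
            // Free text such as "Free for academic use" contains spaces and fails to parse.
            return false;
        }
    }
}
```

This check only covers syntax. A value like `creativecommons.org/licenses/by/4.0/` without a scheme is rejected, but the check cannot tell whether a well-formed URL actually points to a standard licence.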
How is this information being used?
The licence can be displayed prominently on the report page and in query results. We'll also use the licences in the RDF generation.
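This section does not say how the RDF generation represents the licence. As an illustration only, the Dublin Core term `dcterms:license` (`http://purl.org/dc/terms/license`) is a common way to attach a licence to a dataset in RDF. The sketch below emits a single N-Triples statement using that term. The class name and the data set IRI used in the usage note are placeholders, not InterMine identifiers, and this should not be read as InterMine's actual output.

```java
/** Hypothetical sketch: one N-Triples statement linking a data set to its licence URL. */
final class LicenceTriple {

    // Dublin Core Terms "license" property.
    private static final String DCT_LICENSE = "http://purl.org/dc/terms/license";

    private LicenceTriple() { }

    /** Builds "<dataSet> <dcterms:license> <licence> ." in N-Triples syntax. */
    static String toNTriple(String dataSetIri, String licenceUrl) {
        requireIri(dataSetIri);
        requireIri(licenceUrl);
        return "<" + dataSetIri + "> <" + DCT_LICENSE + "> <" + licenceUrl + "> .";
    }

    // Minimal guard against characters that would break an IRI in angle brackets;
    // this is not a full IRI validator.
    private static void requireIri(String iri) {
        if (iri == null || iri.isEmpty()
                || iri.chars().anyMatch(c -> c == '<' || c == '>' || c == '"' || c == ' ')) {
            throw new IllegalArgumentException("Not usable as an N-Triples IRI: " + iri);
        }
    }
}
```

For example, `LicenceTriple.toNTriple("http://example.org/dataset/1", "https://creativecommons.org/licenses/by/4.0/")` yields `<http://example.org/dataset/1> <http://purl.org/dc/terms/license> <https://creativecommons.org/licenses/by/4.0/> .`. This only works because the licence is stored as a URL. A free-text licence could not be used as the object of the triple in this way.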
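Why does it have to be a URL to a standard data licence?
The contents of DataSet.licence should be a URL that points to a standard data licence, such as one of the Creative Commons licences. Standard licences have stable URLs and well-understood terms, so your users can tell exactly how the data may be re-used.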
Why can't I put a URL to the fair use policy?
If you put in a URL to the data source's fair use policy, for example, that URL might change. Fair use policies are also sometimes vague, contradictory or just hard to understand. It's better to use only standard data licences.
Why can't I put a short snippet about the fair use policy for these data?
If you summarise the fair use policy, there is a danger that you get it wrong, or that the data policy changes later.
Providing no information about the data licence is better than providing bad information about it.
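Taken together, these answers amount to a simple rule: set the licence only when it is a recognised standard licence, and otherwise leave it empty. Here is a minimal sketch of that rule, assuming a hand-maintained allowlist. The three Creative Commons URLs and the class name are illustrative choices, not a list that InterMine ships.

```java
import java.util.Optional;
import java.util.Set;

/** Hypothetical sketch: accept only known standard licence URLs, otherwise store nothing. */
final class StandardLicences {

    // Illustrative allowlist; a real mine would maintain its own list of standard licences.
    private static final Set<String> KNOWN = Set.of(
            "https://creativecommons.org/licenses/by/4.0/",
            "https://creativecommons.org/licenses/by-sa/4.0/",
            "https://creativecommons.org/publicdomain/zero/1.0/");

    private StandardLicences() { }

    /** Returns the trimmed URL if it is a known standard licence, else empty (no licence set). */
    static Optional<String> licenceOrNothing(String candidate) {
        if (candidate == null) {
            return Optional.empty();
        }
        String trimmed = candidate.trim();
        return KNOWN.contains(trimmed) ? Optional.of(trimmed) : Optional.empty();
    }
}
```

Returning an empty `Optional` for a fair-use URL or a summary snippet reflects the point above: an absent licence is preferable to a misleading one.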
How do I add a licence to my InterMine?
If you want to add a licence to the data sets in your mine, update the data source that loads each data set.
Core data sources
InterMine core data parsers parse either a standard file type, e.g. FASTA or GFF, or a file type specific to one data source, e.g. OMIM or UniProt.
Standard file types
To set the data licence, add the licence information to the project XML file. For example:
FASTA
NB: The prefix has to match the type of the data source.
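A mismatched prefix is easy to miss by eye in a large project XML file. The following sketch scans the file with the JDK's DOM parser and reports sources whose licence property prefix differs from the source's `type`. It assumes, as the note about prefixes suggests, that the property is named `<type>.licence` (for example `fasta.licence`) and sits inside the `<source>` element. Both the property naming and the class name are assumptions, not confirmed InterMine conventions.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical check: licence property prefixes must match each source's type. */
final class LicencePrefixCheck {

    private LicencePrefixCheck() { }

    /** Returns the names of sources with a "*.licence" property whose prefix is not the source type. */
    static List<String> mismatchedSources(String projectXml) {
        Document doc;
        try {
            DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
            doc = builder.parse(new ByteArrayInputStream(projectXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse project XML", e);
        }
        List<String> mismatched = new ArrayList<>();
        NodeList sources = doc.getElementsByTagName("source");
        for (int i = 0; i < sources.getLength(); i++) {
            Element source = (Element) sources.item(i);
            String type = source.getAttribute("type");
            NodeList properties = source.getElementsByTagName("property");
            for (int j = 0; j < properties.getLength(); j++) {
                String name = ((Element) properties.item(j)).getAttribute("name");
                // Assumed convention: a prefixed licence property is named "<type>.licence".
                if (name.endsWith(".licence") && !name.equals(type + ".licence")) {
                    mismatched.add(source.getAttribute("name"));
                }
            }
        }
        return mismatched;
    }
}
```

Unprefixed properties named plain `licence` are not flagged by this sketch, because the prefix rule above is stated only for prefixed properties.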
OBO
All others
We've updated all the other InterMine core data sources with the correct data licence, where one exists. This requires no action from you: use the library as normal, and the data parser will populate the DataSet.licence field.
However, not every core data source has a data licence. Only about a third of the data sets that InterMine's core libraries load have a data licence. The rest have only text about fair use. We hope that this number will rise as data licences become more popular and visible.
Your data sources
DataSet now has a licence field, so you will want to set this field in your data parser.
Here is an example using the Java API:
If you are using the BioFileConverter, you can use the constructor like so:
This will update the data set licence field for you.
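The following self-contained sketch illustrates the pattern in a parser: create one DataSet item per data set, take the licence from a constructor argument, and set the licence attribute only when a licence was supplied. So that it compiles without InterMine on the classpath, it uses two hypothetical stand-in classes, `SketchItem` and `SketchConverter`. A real parser would use InterMine's own item and converter classes instead, and you should check their names and signatures against the InterMine API. The `name` attribute is likewise an assumption made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical stand-in for a data item: a class name plus string attributes. */
final class SketchItem {
    final String className;
    final Map<String, String> attributes = new LinkedHashMap<>();

    SketchItem(String className) {
        this.className = className;
    }

    void setAttribute(String name, String value) {
        attributes.put(name, value);
    }
}

/** Hypothetical converter showing where the licence is set on the DataSet item. */
final class SketchConverter {
    private final String dataSetTitle;
    private final String licence; // may be null when the source has no standard licence

    SketchConverter(String dataSetTitle, String licence) {
        this.dataSetTitle = dataSetTitle;
        this.licence = licence;
    }

    /** Creates the DataSet item, setting "licence" only when a licence URL was supplied. */
    SketchItem createDataSet() {
        SketchItem dataSet = new SketchItem("DataSet");
        dataSet.setAttribute("name", dataSetTitle);
        if (licence != null) {
            dataSet.setAttribute("licence", licence);
        }
        return dataSet;
    }
}
```

Passing the licence through the constructor keeps it next to the other data set details, which is the same idea as the BioFileConverter constructor described above. Leaving the attribute unset when there is no licence follows the rule that no information is better than bad information.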
None of my data sources have data licences
We found that only a minority of data sets have a licence: of the 26 core data set types that InterMine supports, only 9 have a data licence, although 14 have some text about fair use.
Please see our blog posts for more details.