Homologue Data Sources Overview
InterMine comes with several data converters for homologue data, including TreeFam, PANTHER, OrthoDB and HomoloGene. Follow the instructions below to include these datasets in your InterMine.
- Ensembl Compara
By default, bio-InterMine stores the MOD identifier (e.g. MGI:XXX or ZDB-GENE-XXX) in the primaryIdentifier field. This is tricky for homologue data because some sources use Ensembl identifiers instead, and Ensembl identifiers belong in the Gene.crossReferences collection.
To solve this problem, each homologue source uses the NCBI identifier resolver. This resolver takes the Ensembl ID and replaces it with the corresponding MOD identifier.
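As a rough illustration of what such a resolver does, the Python sketch below builds an Ensembl-to-MOD lookup from a tab-separated file with a `dbXrefs` column of pipe-separated `Database:identifier` entries, in the style of NCBI's gene_info file. The reduced column layout, the function names `build_ensembl_to_mod` and `resolve`, and the sample identifiers are all assumptions made for illustration; this is not InterMine's own resolver code.

```python
import csv
import io

# Invented sample rows with a reduced, gene_info-like column layout (assumption).
SAMPLE = (
    "#tax_id\tGeneID\tSymbol\tdbXrefs\n"
    "10090\t1\tGeneA\tMGI:MGI:12345|Ensembl:ENSMUSG99999999991\n"
    "10090\t2\tGeneB\t-\n"
    "10090\t3\tGeneC\tEnsembl:ENSMUSG99999999992\n"
)


def build_ensembl_to_mod(rows, mod_prefix):
    """Map each Ensembl ID to the single MOD identifier found in the same row."""
    mapping = {}
    ambiguous = set()
    for row in rows:
        field = row["dbXrefs"]
        xrefs = [] if field == "-" else field.split("|")
        # "MGI:MGI:12345" -> "MGI:12345"; "Ensembl:ENSMUSG..." -> "ENSMUSG..."
        ensembl_ids = [x.split(":", 1)[1] for x in xrefs if x.startswith("Ensembl:")]
        mod_ids = [x.split(":", 1)[1] for x in xrefs if x.startswith(mod_prefix + ":")]
        if len(mod_ids) != 1:
            continue  # sketch-level choice: skip rows with no or several MOD IDs
        for ensembl_id in ensembl_ids:
            if mapping.get(ensembl_id, mod_ids[0]) != mod_ids[0]:
                ambiguous.add(ensembl_id)  # same Ensembl ID, different MOD IDs
            mapping[ensembl_id] = mod_ids[0]
    for ensembl_id in ambiguous:
        del mapping[ensembl_id]
    return mapping


def resolve(mapping, ensembl_id):
    """Return the MOD identifier for an Ensembl ID, or None if unresolved."""
    return mapping.get(ensembl_id)


rows = list(csv.DictReader(io.StringIO(SAMPLE), delimiter="\t"))
mapping = build_ensembl_to_mod(rows, "MGI")
print(resolve(mapping, "ENSMUSG99999999991"))
```

In this sketch, identifiers that do not resolve to exactly one MOD identifier are left out of the lookup, so `resolve` returns `None` for them rather than guessing.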
How to use an ID resolver
Download the identifier file -
Unzip the file to
Warning: Make sure the permissions on the file allow the build process to read it.
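One way to check readability before starting a build is a small Python script such as the sketch below. The helper name `check_readable` is hypothetical, and the demonstration uses a temporary file rather than the real identifier file; `os.access` only tests the current user, so the check would need to run as the same user that runs the build.

```python
import os
import tempfile


def check_readable(path):
    """Return a list of problems that would stop the current user reading path."""
    problems = []
    if not os.path.isfile(path):
        problems.append(f"{path} does not exist or is not a regular file")
    elif not os.access(path, os.R_OK):
        problems.append(f"{path} is not readable by the current user")
    return problems


# Demonstration on a temporary file standing in for the identifier file.
with tempfile.NamedTemporaryFile(delete=False) as handle:
    demo_path = handle.name

readable_problems = check_readable(demo_path)
os.remove(demo_path)
missing_problems = check_readable(demo_path)  # file no longer exists

for problem in readable_problems + missing_problems:
    print(problem)
```

An empty list means the file exists and the current user can read it.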
See ID Resolvers for details on how ID resolvers work in InterMine.
Warning: The Entrez identifiers file appears to contain only the sequence identifier for worm genes, not the WBGene identifier.
Alternatively, you can load identifier sources.
Here are the download scripts we use at InterMine:
We use WormMart but are happy to hear of a better source for worm identifiers.
Here are the project XML entries used by FlyMine: