Frequently Asked Questions
Below please find our most commonly asked questions. We also have a list of the most common errors and their fixes.
If you still don't find what you need, please contact us.
- Data warehouse
- My data takes too long to load into the database. How long should it …
- My data isn't loading, I'm getting an error, etc
- Where can I find a listing of all the existing data formats that can be …
- What other models do we have besides the genomic model and how …
- How do you define a primary key for a model?
- When we define a new model (e.g., MY-NEW_model.xml), in which directory …
- Once a new model is defined, how do we include it in InterMine and use it?
- How and where can I set information for an organism?
- Since FASTA sequences can either be in nucleotide or protein, is there a …
- Besides 'protein', what other values can be assigned to …
- There are several post processing tasks listed, what do they do?
- Do we have an ant build-all target that does build-db, integrate all the …
- What database schema is used for InterMine?
- Webapp
- How do I make templates and lists show up on the templates/lists page?
- How do I make a public template or list show up on the homepage?
- How can I set which fields are links/used to create lists on the results …
- How can I customise how data is displayed on the report page?
- How can I add my own logo and change the colour scheme?
- If I rebuild a mine, all user profiles and their saved info (queries, …
- Where can I set the list of default templates?
- My quick search doesn't work. What arguments does the quick search …
- How can I customise the data categories on the main page?
- How can I customise the data categories on the report page?
- Where can I set the password for the superuser.account?
- IQL
See also: GettingStarted, InterMineOverview, FlyMineFAQ
Data warehouse
My data takes too long to load into the database. How long should it take? How can I make it faster?
There are improvements you can make. The main one concerns "ignoreDuplicates=true": setting it switches off a lot of performance enhancements that are not compatible with it, and makes the build run much slower. So if possible, make sure that there are no duplicated objects within any single data source, and then switch off "ignoreDuplicates". It is fine for objects to be duplicated across data sources, because those objects will be merged, but each object must appear only once within each data source. The new release branch will contain code that reports any duplicated objects it sees, and which objects they are.
As far as Postgres settings go, we have a set of settings that seem to serve us pretty well. We would recommend version 8.3 of PostgreSQL, as it contains features that help quite a bit. Some of the settings we change are:
- shared_buffers: Set to around 150MB
- temp_buffers: Set to around 80MB
- work_mem: Set to around 1500MB
- maintenance_work_mem: Set to around 500MB
- default_statistics_target: Set to around 250
- random_page_cost: Set to around 2.0, rather than 4.0
- effective_cache_size: Set to about 2/3 the amount of RAM in the computer
These settings should be adjusted to the amount of RAM in the computer; work_mem, for example, shouldn't be more than about a third of the RAM.
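As a rough illustration of that scaling rule, the shell sketch below prints postgresql.conf lines for a given amount of RAM. The function name pg_settings is made up for this example, and the only values it derives are the two ratios mentioned above (work_mem capped at a third of the RAM, effective_cache_size at two thirds). The other values are simply the fixed figures from the list, so treat the output as a starting point rather than tuned settings.

```shell
#!/bin/sh
# Hypothetical helper: print suggested postgresql.conf lines.
# Usage: pg_settings <total RAM in MB>
pg_settings() {
    ram_mb=$1

    # work_mem: start from the 1500MB suggested above, but never use
    # more than about a third of the RAM.
    work_mem=1500
    cap=$((ram_mb / 3))
    if [ "$work_mem" -gt "$cap" ]; then
        work_mem=$cap
    fi

    # effective_cache_size: about two thirds of the RAM.
    cache=$((ram_mb * 2 / 3))

    echo "shared_buffers = 150MB"
    echo "temp_buffers = 80MB"
    echo "work_mem = ${work_mem}MB"
    echo "maintenance_work_mem = 500MB"
    echo "default_statistics_target = 250"
    echo "random_page_cost = 2.0"
    echo "effective_cache_size = ${cache}MB"
}

# Example: a machine with 8GB of RAM.
pg_settings 8192
```

On a machine with less RAM the work_mem line shrinks accordingly, while the fixed values stay as listed.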
My data isn't loading, I'm getting an error, etc
If you need help, please contact us. We're always happy to talk to InterMine users!
It's usually most helpful if you send us the detailed error message. Try running Ant with the verbose flag:
ant -verbose build-db
You can also check the logs. The error messages should be in intermine.log in the directory you are currently in, eg /dbmodel or /integrate.
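If the log is long, a small filter can pull out the relevant lines before you send them to us. This is only a sketch: show_errors is a made-up helper name, and it assumes that the interesting lines contain the words "error" or "exception", which may not hold for every failure.

```shell
#!/bin/sh
# Hypothetical helper: list the most recent log lines that mention an
# error or exception, prefixed with their line numbers.
# Usage: show_errors [logfile] [max lines]
show_errors() {
    log=${1:-intermine.log}
    max=${2:-20}
    # -n adds line numbers, -i ignores case, -E enables the alternation.
    grep -n -i -E 'error|exception' "$log" | tail -n "$max"
}

# Example (run from the directory where ant was started, e.g. dbmodel):
#   show_errors intermine.log 50
```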
See: TroubleshootingTips
Where can I find a listing of all the existing data formats that can be loaded into InterMine?
BioSources gives an overview of the data formats we already have parsers for. Each format is loaded by a 'source'; see bio/sources. Many of these can easily be re-used for other organisms and data files. There isn't a document yet that lists the properties that each source takes, but you can see how they are used in the FlyMine project.xml.
What other models do we have besides the genomic model and how would I use them?
Currently all biological mines call their model "genomic". That's a bit confusing but it's necessary because the model name is used to create the Java package name and we need to have the same package in all mines so that we can reuse code.
We do have one non-"genomic" model that might be useful for you to look at. It's called "testmodel" and as expected it's used for testing. It's defined in this file: testmodel_model.xml
Unlike the biological mines, we define the testmodel in that one file, rather than having many additions.xml files.
How do you define a primary key for a model?
Currently, if you're building on the "genomic" model (i.e. you have
<property name="target.model" value="genomic"/>
in your project.xml), all primary keys are defined in the file: genomic_keyDefs.properties
We realise that having all keys in one place isn't very scalable but it's the only solution we have at the moment.
You would need to add a line like:
Staff.key_identifier=identifier
or:
Staff.key_name=name
(or both) to that file.
There can be multiple primary keys for each class (examples with many keys are Gene and Protein) so each source must configure which key to use when merging.
See: PrimaryKeys
When we define a new model (e.g., MY-NEW_model.xml), in which directory should we put it? In bio/sources/MY-NEW?
Do you mean a new source? If so, then bio/sources/MY-NEW is correct.
When you say "define a new model" do you mean that you would like a complete new data model (ie. without Gene, Protein etc. but with your classes) or you would like to add to/modify the existing model?
Starting from scratch will take a lot of work. All of the mines we work on are based on the model in bio/core/core.xml and bio/core/genomic_additions.xml which define basic classes like "Organism" and "Chromosome". We recommend that you build on those to make your model.
All of the mines call their model by the same name "genomic", which is specified in the project.xml using the target.model property. We suggest you name your model "genomic" too because a lot of code (eg. in the bio/sources directory) expects the Java package for the generated model code to be org.intermine.model.bio.
See: GettingStarted, AnatomyOfASource
Once a new model is defined, how do we include it in InterMine and use it?
- Include your new source in project.xml.
- Update your additions file to include any new classes.
- In <MINE_NAME>/dbmodel, run this command:
ant build-db
Running build-db will destroy any existing data loaded in the production database and re-create all the tables.
See: SourceHowto
How and where can I set information for an organism?
There is a source called entrez-organism. This looks for all organism taxon ids in the database and contacts the NCBI web service to fill in the rest of the information. This is why we just use taxon ids in all sources.
Just run the source last and it should get filled in.
See: BioSources
Since FASTA sequences can either be in nucleotide or protein, is there a way that I can set this?
Yes, there is a property that can be passed to the fasta source - fasta.sequenceType. The default is dna, but it can be set to protein. Here's an example:
<source name="flybase-dmel-translation-fasta" type="fasta">
  <property name="fasta.taxonId" value="7227"/>
  <property name="fasta.className" value="org.flymine.model.genomic.Translation"/>
  <property name="fasta.classAttribute" value="organismDbId"/>
  <property name="fasta.includes" value="dmel-all-translation-*.fasta"/>
  <property name="fasta.sequenceType" value="protein"/>
  <property name="src.data.dir" location="/shared/data/flybase/dmel/release_5_1/fasta"/>
</source>
Besides 'protein', what other values can be assigned to fasta.sequenceType?
The InterMine fasta loader uses the fileToBiojava() method in the BioJava SeqIOTools package. It looks like the options are "dna", "rna" or "protein".
There are several post processing tasks listed, what do they do?
See: PostProcessing
Do we have an ant build-all target that does build-db, integrates all the data sources, does build-db-userprofile, creates the war file, removes the war file, and deploys the war file?
Sorry, there's no target that does all that. Probably a small script would do the trick for you.
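A minimal sketch of such a script is shown here. It only strings together the ant targets mentioned elsewhere in this FAQ; the function names, the DRY_RUN and RESET_USERPROFILE switches and the INTEGRATE_ARGS variable are all invented for this example, and running the webapp targets from the webapp directory is an assumption. The arguments for the integrate step are not covered here, so they are passed through INTEGRATE_ARGS unchanged. Because build-db-userprofile discards saved user data, the sketch only runs it when asked to.

```shell
#!/bin/sh
# Run one command inside a directory, or just print it when DRY_RUN=1.
run_in() {
    dir=$1
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "(cd $dir && $*)"
    else
        (cd "$dir" && "$@")
    fi
}

# Hypothetical wrapper. Usage: build_all <mine directory>
build_all() {
    mine_dir=$1

    # Recreate the production database tables (destroys existing data).
    run_in "$mine_dir/dbmodel" ant build-db || return 1

    # Load the data sources. INTEGRATE_ARGS is left unquoted on purpose
    # so that several arguments are split into separate words.
    run_in "$mine_dir/integrate" ant ${INTEGRATE_ARGS:-} || return 1

    # Only reset the user profile database when explicitly requested.
    if [ "${RESET_USERPROFILE:-0}" = "1" ]; then
        run_in "$mine_dir/webapp" ant build-db-userprofile || return 1
    fi

    # Rebuild the WAR file, remove the old webapp and release the new one.
    run_in "$mine_dir/webapp" ant default remove-webapp release-webapp
}
```

Running `DRY_RUN=1 build_all /path/to/mymine` first prints the steps without executing anything, which is a cheap way to check the order before a long build.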
What database schema is used for InterMine?
We don't have a diagram of our database schema - we design the model at the object level and the database schema is automatically generated.
Webapp
How do I make templates and lists show up on the templates/lists page?
- Log into your site's super user account.
- Tag the template or list as "im:public".
See: Tagging
How do I make a public template or list show up on the homepage?
- Log into your site's super user account.
- Tag the template or list as "im:frontpage".
See: Tagging
How can I set which fields are links/used to create lists on the results page?
Add them to the class_keys file.
See: WebappConfig
How can I customise how data is displayed on the report page?
Make a custom jsp page.
See: LongDisplayers
How can I add my own logo and change the colour scheme?
See: WebappAppearance
If I rebuild a mine, all user profiles and their saved info (queries, lists, etc.) associated with that mine are deleted. Is this the case? If so, how can I save the profiles and their info and import them into the newly rebuilt mine?
No, all the data will be saved unless you run ant build-db-userprofile in the webapp directory. However, saved lists work with internal ids, which change when the mine is rebuilt. To solve this, write the userprofile to XML before the rebuild and re-import it afterwards:
- While you still have your old build, run ant write-userprofile-xml in webapp and copy the userprofile.xml file somewhere.
- When the new build is ready, copy userprofile.xml back into the build directory.
- Run ant read-userprofile-xml to read it back in; this should run queries that update the lists to the new ids.
I would check this works before you risk losing a userprofile database. Of course, if you only have a couple of lists you can just re-import them from the webapp.
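The three steps could be wrapped in a pair of shell functions along these lines. This is a sketch under assumptions: the function names are invented, and the location of userprofile.xml inside the mine is passed in as a parameter rather than guessed, since it is not spelled out here. Set DRY_RUN=1 to print the commands first, in keeping with the advice to check that this works before relying on it.

```shell
#!/bin/sh
# Run one command inside a directory, or just print it when DRY_RUN=1.
run_in() {
    dir=$1
    shift
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "(cd $dir && $*)"
    else
        (cd "$dir" && "$@")
    fi
}

# Copy a file, or just print the copy command when DRY_RUN=1.
copy_file() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "cp $1 $2"
    else
        cp "$1" "$2"
    fi
}

# Before the rebuild: write the profiles to XML and keep a copy elsewhere.
# Arguments: <webapp dir> <path of userprofile.xml in the mine> <backup path>
export_profiles() {
    run_in "$1" ant write-userprofile-xml || return 1
    copy_file "$2" "$3"
}

# After the rebuild: put the XML back and read it in again.
# Arguments: same as export_profiles.
import_profiles() {
    copy_file "$3" "$2" || return 1
    run_in "$1" ant read-userprofile-xml
}
```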
Where can I set the list of default templates?
Update the default-template-queries.xml file.
See: User Profile
My quick search doesn't work. What arguments does the quick search expect?
At the moment the quick search is configured to run a particular template query. We use a query called A_IdentifierSynonym_Object; this is configured in webapp/resources/web.properties.
All biological feature classes in the model have a collection of Synonym objects to represent alternative identifiers. We also create synonyms for each object identifier, e.g. for a Gene with identifier 'eve' we also create a Synonym with value 'eve'. This means we can just search through the synonym table to find any feature type.
How can I customise the data categories on the main page?
- Customise your categories in aspects.xml
- Run ant default remove-webapp release-webapp to update the customised categories. The "default" target forces a re-build of the WAR file before releasing.
See: DataCategories
How can I customise the data categories on the report page?
- Fields
  - Tag classes with the im:aspect:Genomics tag. When this class appears on a report page, it will be displayed underneath the Genomics data category.
- Templates
  - Tag templates with the im:public and im:aspect:Genomics tags.
  - Update the identifier template constraint to be the correct field.
See: DataCategories
Where can I set the password for the superuser.account?
There is a property for setting an initial password, along with the property for setting the superuser's email address:
superuser.account=some@email.address
superuser.initialPassword=somepassword
This should only be used to set an initial password - obviously the fact that the password is stored in a properties file makes it insecure, so you should change the password as soon as possible. If you don't set that property, then a random password will be generated (but since it is stored in hashed form and the original is thrown away, there is no way to find out what that is).
If you do not know the superuser's password, you can, as with any other account, use the "Forgotten password" facility on the login page. This will send you an email containing a link to your webapp, where you can change the password.
IQL
My fields have been renamed to 'intermine_from' and 'intermine_to' in the database. Why?
"from" is a reserved word in IQL. You need to surround it with double quotes, in a rather bizarre manner, like this:
SELECT object."from" FROM object
See: QueryPackage
Is the order important in WHERE clause in IQL?
No. "name = 'Fred' AND age = 5" is the same as "5 = age AND name = 'Fred'".
See: IQL
My query is taking too long. How long should queries take? How can I make the queries faster?
The database sits on top of Postgres, and the methods by which Postgres answers queries are deep magic that can cause all sorts of unexpected timing phenomena.
Adding a constraint to reduce the number of results can make the query slower, because Postgres may have to read just as much data from the database and then do more work to filter the results by the constraint. On the other hand, an extra constraint could also make the query faster, if it allows the database to make use of an index or choose a faster algorithm.
If there are a lot of rows in the results, then it is worth trying to use a large batch size on the Results object, if you are running this from Java, by calling Results.setBatchSize(10000) or so. I believe 1000 is the default batch size, so 10000 should help a bit.
Lastly, make sure the database has been analysed properly (which should be done automatically as part of the build process).
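If you suspect the statistics are stale, you can re-run the analysis by hand. A minimal sketch, assuming the psql client is installed and that you can connect to the production database without extra options; analyse_db, the DRY_RUN switch and the database name in the usage comment are placeholders:

```shell
#!/bin/sh
# Hypothetical helper: refresh the planner statistics for every table
# in the given database. Set DRY_RUN=1 to print the command instead.
# Usage: analyse_db <database name>
analyse_db() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "psql -d $1 -c 'ANALYZE;'"
    else
        psql -d "$1" -c 'ANALYZE;'
    fi
}

# Example:
#   analyse_db mymine-production
```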
