InterMine 0.93
In InterMine 0.93 we have done a large amount of internal refactoring to make the core functionality more robust. In addition, there are several bug fixes, validation improvements and a new 'hints' help system.
Web interface
- New hints system displaying short help text at the top of pages. Hints are included by default, but you can easily add your own; see Hints.
- Configuration errors in webconfig-model.xml are now logged instead of throwing an exception on deployment.
- The QueryBuilder now uses inner joins as the default when adding new fields to the view list.
- Fixes to help text in the QueryBuilder
- Display informative error message if the superuser in the database doesn't match the superuser in the properties file.
- Removed the option to filter lists by tag on the template form; it made the interface clumsy and wasn't very useful.
- Enrichment widgets: fixed a rounding error and added a link to the documentation on each widget.
- Use better MIME types for GFF3 and FASTA export. Previously, export links opened in a browser window for many users.
  - Export now uses text/tab-separated-values for GFF3 and chemical/x-fasta for FASTA.
- Fixed a bug that allowed lists to be saved with empty names if only whitespace characters were entered.
- Fixed a bad interaction between saved list names and the anti-spam JavaScript for the contact form.
- Link-outs can post identifiers as multipart/form-data encoded data, as required for Reactome SkyPainter.
- Option to include a link to the contact form just below the header ...
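To illustrate the empty-name fix listed above, a check of this kind could look like the following sketch. It is written in Python for brevity and is not InterMine's own code; `normalise_list_name` is a hypothetical helper name.

```python
def normalise_list_name(raw_name: str) -> str:
    """Return the trimmed list name, or raise ValueError if nothing is left.

    Hypothetical helper: a name made only of whitespace is treated
    the same way as an empty name.
    """
    name = raw_name.strip()
    if not name:
        raise ValueError("List name must contain at least one non-whitespace character")
    return name
```

For example, `normalise_list_name("  my genes ")` returns `"my genes"`, while a name consisting only of spaces or tabs is rejected.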
Bio/sources
new
See BioSources for details on how to add these sources to your mine.
- KEGG homologues
- miRanda - miRBase targets from the Sanger Institute.
updated
- UniProt converter
  - Improved memory management so the converter can load larger files and runs faster.
  - Restored the option to load whole UniProt XML files.
- BioGRID
  - If organisms are specified in project.xml, process interactions with ONLY those organisms.
  - If no organism is specified in project.xml, process all files in the data directory.
- OrthologueEvidence now has an evidence code and a publication collection.
- BioPAX
  - No longer throws an exception if bad file names are found.
  - Synonyms now merge correctly.
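The BioGRID organism rule above can be sketched roughly as follows. This is an illustrative Python sketch, not the converter's code. It assumes that "ONLY those organisms" means both participants of an interaction must belong to a configured organism, and the tuple layout and function name are hypothetical.

```python
from typing import Iterable, List, Optional, Set, Tuple

# Hypothetical record: (taxon id of participant A, taxon id of participant B, interaction id)
Interaction = Tuple[str, str, str]


def select_interactions(
    interactions: Iterable[Interaction],
    organisms: Optional[Set[str]],
) -> List[Interaction]:
    """Keep everything when no organisms are configured; otherwise keep
    only interactions whose two participants are both in the configured set
    (an assumed reading of the rule)."""
    if not organisms:
        return list(interactions)
    return [i for i in interactions if i[0] in organisms and i[1] in organisms]
```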
Performance
- Fixed several bugs that were causing slow export of very large data sets. Export should now correctly create temporary precomputed tables to stage the data to be exported, which should significantly improve export speed.
- Better JavaScript and reduced HTML markup have made the list analysis page a bit faster to load.
Web services
- Embedded templates now use custom column descriptions as column headers if they have been configured.
- Option to fetch the release version of a Mine from the web service; see the web service documentation.
Web properties config
- OrthologueConverter - Adds links to other InterMines on list analysis pages. See OrthologueConverter for details.
- Meta tag properties can now be configured in web.properties. See WebProperties for more information. Example configuration:
    # in web.properties, populates meta tags
    meta.keywords = microarray, bioinformatics, drosophila, genomics
    meta.description = Integrated queryable database for Drosophila and Anopheles genomics
- Mines can now link to Reactome's SkyPainter. An example entry from WebProperties:
    attributelink.reactomeListProt.Protein.7227.primaryIdentifier.list.url=http://www.reactome.org/cgi-bin/skypainter2?QUERY=<<attributeValue>>&SUBMIT=1&DB=gk_current
    attributelink.reactomeListProt.Protein.7227.primaryIdentifier.list.text=Reactome
    attributelink.reactomeListProt.Protein.7227.primaryIdentifier.list.usePost=true
    attributelink.reactomeListProt.Protein.7227.primaryIdentifier.list.enctype=multipart/form-data
    attributelink.reactomeListProt.Protein.7227.primaryIdentifier.list.delimiter=NEWLINE
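As a rough illustration of what the SkyPainter entry above implies, the sketch below shows how form fields for a POST could be derived from such a URL. It is a Python sketch under stated assumptions, not InterMine's implementation: it assumes the `NEWLINE` delimiter maps to `"\n"`, that `<<attributeValue>>` is replaced by the joined identifiers, and that the query parameters of the configured URL become form fields when `usePost` is true. The function name and the identifiers in the usage note are hypothetical.

```python
from typing import Dict, List
from urllib.parse import parse_qsl, urlsplit

PLACEHOLDER = "<<attributeValue>>"
# Assumption: the symbolic delimiter name NEWLINE stands for a line feed.
DELIMITERS = {"NEWLINE": "\n"}


def build_post_fields(url: str, identifiers: List[str], delimiter_name: str) -> Dict[str, str]:
    """Turn the query string of a link-out URL into form fields,
    substituting the joined identifier list for the placeholder."""
    delimiter = DELIMITERS.get(delimiter_name, delimiter_name)
    joined = delimiter.join(identifiers)
    query = urlsplit(url).query
    return {key: value.replace(PLACEHOLDER, joined) for key, value in parse_qsl(query)}
```

Called with the URL from the example entry and two made-up identifiers, this yields a `QUERY` field holding one identifier per line alongside the unchanged `SUBMIT` and `DB` fields.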
InterMine 0.92
InterMine 0.92 contains the following changes. The upgrade from 0.91 should require no changes to existing mines and sources; see UpgradeInterMine.
Data loading performance
- Fixed a bug in query caching during data loading, which should improve loading speed.
- Run database analyse less frequently, as it can be very slow on large databases.
Error messages
- We have tried to make some of the most commonly encountered error messages better describe the problem and possible solutions. If you spot an error message we could make better, please let us know!
Mine configuration
- Improved validation of the priorities file:
  - validation is done before data loading starts
  - checks that specific classes and field names exist
  - checks that configured source names exist in project.xml
- Simplified system for configuring integration keys: you can now specify data integration keys entirely in the sourcename_keys.properties file in a source, without first defining the key in genome_keyDefs.properties.
  - More information available here: PrimaryKeys.
  - Note that the old system still works, so you don't need to update anything you have already set up.
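The priorities checks listed above amount to validating configuration up front, before any data is loaded. A minimal Python sketch of that idea follows. It is not InterMine's validator: the function name and the in-memory shapes (a dict of `Class.field` keys to source names, a dict of class names to field names, a set of project source names) are assumptions made for illustration.

```python
from typing import Dict, List, Set


def validate_priorities(
    priorities: Dict[str, List[str]],
    model_fields: Dict[str, Set[str]],
    project_sources: Set[str],
) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for key, sources in priorities.items():
        class_name, _, field_name = key.partition(".")
        if class_name not in model_fields:
            errors.append(f"{key}: class '{class_name}' is not in the model")
        elif field_name not in model_fields[class_name]:
            errors.append(f"{key}: class '{class_name}' has no field '{field_name}'")
        for source in sources:
            if source not in project_sources:
                errors.append(f"{key}: source '{source}' is not in project.xml")
    return errors
```

Collecting every problem in one pass, rather than stopping at the first, lets all configuration mistakes be reported before a long load begins.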
New widgets
- A feature length widget that can be used for any genome feature. It draws a normal distribution of lengths for features in the list and for all features of that type for the selected organism.
- Pathway enrichment widget - enrichment of KEGG/Reactome pathways.
- Enrichment widgets now have an option to view all entries analysed by the widget in a results table.
Bio parsers
- Added a new source bio/sources/biopax to parse BioPAX format files.
- FileConverter now has option to parse pipe delimited files.
- Fetch GO annotation directly from UniProt XML files: set <property name="creatego" value="true"/> in project.xml.
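For readers unfamiliar with the pipe-delimited format mentioned above, the sketch below shows what parsing such a file involves. It uses Python's standard `csv` module purely as an illustration; it does not show the FileConverter option itself, and `read_pipe_delimited` is a hypothetical name.

```python
import csv
import io
from typing import List


def read_pipe_delimited(text: str) -> List[List[str]]:
    """Split each line of the input into fields on the '|' character."""
    return [row for row in csv.reader(io.StringIO(text), delimiter="|")]
```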
Web application
- Fix concurrency issue when saving queries to a user profile - previously had a small chance of entering an infinite loop.
- A new option to export data as a gzipped file.
- Performance:
  - Consider existing precomputed tables when attempting to precompute, to boost export speed.
  - Fixed a bug that prevented clean-up of temporary precomputed tables.
- Fix problem in QueryBuilder when using subclasses and outer joins in a query.
- Added an option to configure the delimiter used when linking out with a list of ids.
- Improved results table headers: narrower columns with abbreviated headers, and a tooltip showing the full header.
- Better wrapping of long text on report pages, no longer breaks within words.
- Can configure results table column summaries to treat some numbers as strings - e.g. for Publication.year it is better to see the counts per year rather than mean and standard deviation.
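The difference between the two kinds of column summary described above can be shown with a small Python sketch. It is only an illustration of the idea, not the web application's code; `summarise` and its `treat_as_string` flag are hypothetical names.

```python
from collections import Counter
from statistics import mean, stdev
from typing import Dict, List, Union


def summarise(values: List[int], treat_as_string: bool = False) -> Dict[str, Union[int, float]]:
    """Summarise a numeric column either as counts per distinct value
    or as mean and standard deviation."""
    if treat_as_string:
        # Count occurrences of each value, as for a text column.
        return dict(Counter(str(v) for v in values))
    return {"mean": mean(values), "stdev": stdev(values)}
```

For a column such as Publication.year, `summarise([2008, 2008, 2009], treat_as_string=True)` gives a count for each year, which is usually more informative than an average year.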
InterMine 0.91
InterMine 0.91 contains the following changes. The upgrade from 0.90 should require no changes to existing mines and sources; see UpgradeInterMine.
QueryBuilder
There are several improvements to advanced features of the QueryBuilder:
- The Query Overview in the top right pane is now more consistently ordered.
- Constraint Logic: queries with multiple sets of OR constraints now work correctly; previously, subsequent OR constraints were ignored.
- Constraint Logic: removing constraints no longer resets all operations from OR to AND, and no longer removes brackets.
- ‘Loop queries’ and outer joins now work correctly together; loop constraints are not permitted across outer joins.
- New constraints added to the query always default to inner joins.
- Paths are no longer added to the Query Overview when you cancel adding a constraint.
- A space is no longer a valid separator in LOOKUP constraints (in the QueryBuilder and templates), e.g. 'Drosophila melanogaster' is now treated as a single value. Comma is still used as a delimiter.
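The new LOOKUP separator behaviour can be pictured with the short Python sketch below. It illustrates the rule only and is not the QueryBuilder's code; `split_lookup_values` is a hypothetical name, and trimming whitespace around each value is an assumption.

```python
from typing import List


def split_lookup_values(value: str) -> List[str]:
    """Split a LOOKUP value on commas only, so spaces inside a value are kept."""
    return [part.strip() for part in value.split(",") if part.strip()]
```

With this rule `'Drosophila melanogaster'` stays a single value, while `'zen, eve'` is split into two.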
Export
Several bugs affecting edge cases in export have been fixed.
- Fixed a bug where export failed in some cases with complex queries.
- Exported column headings now use the description if this has been specified in a Template Query – to match the results table behaviour.
- The column description for the root node is no longer lost when exporting.
- Rows are no longer duplicated when exporting a query with multiple outer joined collections that each contain only one row.
- Removing the column the original query was sorted by no longer gives an error when exporting.
User passwords
- Passwords are now hashed in the userprofile database and are no longer sent as reminders by e-mail. There is a new password reset system that sends the user a link containing a unique token; this link can be used to reset the password.
- Passwords in existing userprofile databases will become hashed when upgrading between releases with write-userprofile-xml and read-userprofile-xml.
- When a new userprofile database is created, the superuser password will now be hashed. There is therefore a new superuser.initialPassword property in the xxxmine.properties file to set the password to an initial known value. The password should then be changed.
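The general pattern described above, storing only a hash and issuing a one-time token for resets, can be sketched as follows. This Python sketch is an assumption-laden illustration: these notes do not say which hash function InterMine uses, so PBKDF2 with SHA-256 is simply one reasonable choice, and all function names and the iteration count are hypothetical.

```python
import hashlib
import hmac
import secrets
from typing import Optional, Tuple

ITERATIONS = 100_000  # illustrative work factor, not taken from InterMine


def hash_password(password: str, salt: Optional[bytes] = None) -> Tuple[bytes, bytes]:
    """Return (salt, digest); only these are stored, never the password."""
    salt = salt or secrets.token_bytes(16)
    digest = hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, ITERATIONS)
    return salt, digest


def verify_password(password: str, salt: bytes, digest: bytes) -> bool:
    """Re-hash the candidate with the stored salt and compare in constant time."""
    _, candidate = hash_password(password, salt)
    return hmac.compare_digest(candidate, digest)


def new_reset_token() -> str:
    """Generate an unguessable token to embed in a password reset link."""
    return secrets.token_urlsafe(32)
```

Because the stored value cannot be turned back into the password, a reminder e-mail is impossible by design, which is why a reset link is needed instead.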
Ontology model change
- There is now a parents collection on the OntologyTerm class (and its subclasses) containing all parents of the term and the term itself. The collection is automatically filled in for any obo source. This makes it easier to create queries that search for genes that have a particular term applied or any child of that term, e.g. by searching for Gene.goAnnotation.ontologyTerm.parents.name = 'DNA binding'.
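Conceptually, filling the parents collection means computing every ancestor of a term and adding the term itself. The Python sketch below shows that computation on a toy ontology. It is not the obo loader's code; `parents_including_self` and the `direct_parents` mapping are hypothetical, and the example terms in the usage note are for illustration only.

```python
from typing import Dict, List, Set


def parents_including_self(term: str, direct_parents: Dict[str, List[str]]) -> Set[str]:
    """Collect the term plus all of its ancestors by walking parent links."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for parent in direct_parents.get(current, ()):
            if parent not in seen:  # a term can be reachable via several paths
                seen.add(parent)
                stack.append(parent)
    return seen
```

Given `{"sequence-specific DNA binding": ["DNA binding"], "DNA binding": ["binding"]}`, the collection for the most specific term contains all three names, so a constraint on `parents.name = 'DNA binding'` also matches genes annotated with the child term.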
bio/sources
- Added a new parser for TreeFam orthologue data in bio/sources/treefam.
- Updated bio/sources/go-annotation with a new, faster parser. The go-annotation source now uses less memory and no longer requires that the gene association file is ordered by the gene/protein identifier.
- The make_source script can now create sources to load .obo files.
Data integration priorities
It is now possible to specify data integration priorities in a more flexible manner. Previously you could set a priority only for the class that declared the field (e.g. BioEntity.primaryIdentifier), so it wasn't possible to have different priority configurations for different subclasses (i.e. Gene.primaryIdentifier and Protein.primaryIdentifier had to be the same).
This is now fixed. You can specify priorities for Gene.primaryIdentifier and Protein.primaryIdentifier separately. The only restriction is that you can't override a priority set by a superclass, i.e. you can't set both LocatedSequenceFeature.primaryIdentifier and Exon.primaryIdentifier because Exon is a subclass of LocatedSequenceFeature.
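The restriction on overriding a superclass priority can be expressed as a simple check, sketched below in Python. This is an illustration rather than InterMine's validation code: `find_superclass_conflicts` is a hypothetical name, and the `superclasses` mapping (each class to all of its ancestors) is an assumed input.

```python
from typing import Dict, Iterable, List, Set, Tuple


def find_superclass_conflicts(
    priority_keys: Iterable[str],
    superclasses: Dict[str, List[str]],
) -> List[Tuple[str, str]]:
    """Return (superclass key, subclass key) pairs where the same field
    has a priority on both a class and one of its ancestors."""
    configured: Dict[str, Set[str]] = {}
    for key in priority_keys:
        class_name, _, field_name = key.partition(".")
        configured.setdefault(field_name, set()).add(class_name)

    conflicts = []
    for field_name, classes in configured.items():
        for class_name in sorted(classes):
            for ancestor in superclasses.get(class_name, ()):
                if ancestor in classes:
                    conflicts.append((f"{ancestor}.{field_name}", f"{class_name}.{field_name}"))
    return conflicts
```

Sibling classes such as Gene and Protein never conflict under this check, because neither is an ancestor of the other.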
GFF3 export
- Fixed a bug where exporting as GFF3 placed features on the negative strand if no strand was set in the feature location. A '.' is now written instead, in accordance with the standard.
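The corrected behaviour corresponds to a mapping like the one in this Python sketch. It is illustrative only; `gff3_strand` is a hypothetical name, and the accepted input encodings (1 and -1, as numbers or strings) are an assumption about how strand might be stored.

```python
from typing import Optional, Union


def gff3_strand(strand: Optional[Union[int, str]]) -> str:
    """Map a stored strand value to the GFF3 strand column."""
    if strand in (1, "1", "+"):
        return "+"
    if strand in (-1, "-1", "-"):
        return "-"
    # No strand recorded: GFF3 uses '.' rather than defaulting to a strand.
    return "."
```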
Data loading performance
- Data loading speed should be slightly improved. We have fixed a bug where the Model object, which has a complex hashCode() method, was used as a HashMap key when the model name was sufficient.
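The cost being avoided here can be demonstrated with a Python analogue of the Java situation, using `__hash__` in place of `hashCode()` and a dict in place of a HashMap. The `Model` class below is a toy stand-in written for this sketch, not InterMine's class; it counts hash calls to make the overhead visible.

```python
class Model:
    """Toy stand-in for an object whose hash depends on its whole content."""

    def __init__(self, name, class_names):
        self.name = name
        self.class_names = tuple(class_names)
        self.hash_calls = 0  # instrumentation for the demonstration

    def __eq__(self, other):
        return (
            isinstance(other, Model)
            and (self.name, self.class_names) == (other.name, other.class_names)
        )

    def __hash__(self):
        # Recomputed on every dict operation that uses the object as a key.
        self.hash_calls += 1
        return hash((self.name, self.class_names))


model = Model("genomic", ["Gene", "Protein"])
cache_by_model = {model: "cached value"}      # hashes the whole model
cache_by_name = {model.name: "cached value"}  # hashes only a short string
```

Each lookup in `cache_by_model` invokes the expensive hash again, whereas lookups in `cache_by_name` never touch it, which is the essence of keying by model name.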
