Keyword Search
InterMine uses Solr for its keyword search index.
By default the index includes the text fields of all objects in the database. Each object in the database becomes a document in the index with text attributes attached. You can configure classes to ignore, such as locations and scores, that don't provide text information. You can also add related information to an object; for example, you can configure a Gene's entry to include its synonyms, pathways and GO terms.
fields in the results
- determined by WebConfigModel
type
- class of object
score
- determined by the Lucene search, from 0 to 1
lists
- Users can make lists from search results, but only if all results are of the same type.
To inspect the index directly: http://localhost:8983/solr/
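The index can also be queried over HTTP through Solr's standard select handler. The sketch below is a minimal illustration rather than InterMine code. The core name `mymine-search` is a hypothetical placeholder (it follows the `[mine]-search` naming used in the Solr paths later on this page), and the host and port are assumed to be the local defaults shown above.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

SOLR_BASE = "http://localhost:8983/solr"  # same address as above
CORE = "mymine-search"                    # hypothetical core name; substitute your own


def build_select_url(term, rows=10, base=SOLR_BASE, core=CORE):
    """Build a URL for Solr's standard /select handler."""
    params = urlencode({
        "q": term,        # the keyword to search for
        "rows": rows,     # maximum number of documents to return
        "fl": "*,score",  # all stored fields plus the relevance score
        "wt": "json",     # ask for a JSON response
    })
    return f"{base}/{core}/select?{params}"


def search(term, rows=10):
    """Run the query and return the list of matching documents."""
    with urlopen(build_select_url(term, rows)) as response:
        body = json.load(response)
    return body["response"]["docs"]
```

Calling `search("REVOLUTA")` requires a running Solr instance with a populated core; each returned document is a dictionary of the stored fields plus a `score` entry.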
Config file
The config file is located at MINE_NAME/dbmodel/resources/keyword_search.properties
index.temp.directory
- directory for search index
index.references.<CLASS_NAME>
- eg. index.references.Gene
- index these objects' references in addition to the normal indexing
- eg. if Gene.pathways is indexed, then when users search for pathways, the associated genes are also returned as search results
index.ignore
- do not index these classes
index.ignore.fields
- do not index these fields
- eg. index.ignore.fields = SNP.type SNP.alleles
facets
- Will appear as filters on the left panel in the search results
- choose single for references, multi for collections
- Note: you must index any references used as facets (see index.references above).
index.boost.<CLASS_NAME>
- weight objects of this class more heavily than other objects
search.debug
- debug setting; keep it off, as it is used only for testing
index.optimize
- Boolean, defaults to false.
- If set to true, reorganises the index so that chunks are placed together in storage, which might improve search time (similar to defragmenting a hard disk). Requires free storage space as large as the index, and takes additional time.
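Before rebuilding the index, it can help to check how the values in keyword_search.properties will be read. The sketch below is a standalone illustration, not InterMine's own parser. It assumes simple `key = value` lines and space-separated values, as in the index.ignore.fields example above. The class names given for index.ignore are hypothetical.

```python
def parse_properties(text):
    """Parse simple 'key = value' lines, skipping blank lines and # comments."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def ignored_fields(props):
    """Group the space-separated Class.field entries by class name."""
    grouped = {}
    for entry in props.get("index.ignore.fields", "").split():
        class_name, _, field = entry.partition(".")
        grouped.setdefault(class_name, []).append(field)
    return grouped


# Hypothetical file contents, for illustration only.
EXAMPLE = """
# classes that carry no useful text (assumed names)
index.ignore = Location Score
index.ignore.fields = SNP.type SNP.alleles
index.optimize = false
"""

props = parse_properties(EXAMPLE)
print(ignored_fields(props))
```

Grouping the ignored fields by class makes it easy to spot a misspelled class name or a field listed under the wrong class.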
Search Index
You can rebuild the search index by running this command in your mine:
You will then need to re-release your webapp.
Solr
See Solr for details on how to install Solr.
Solr Partial String Match Configuration
In its default configuration, Solr will not match partial search terms. For example, a gene named REVOLUTA will be returned in the search results for the search term "REVOLUTA" but not for the search term "REV". To have Solr return partial string matches, you must edit its configuration on the Solr server. To do this:
- ADD the following to /var/solr/data/[mine]-search/conf/managed-schema. (This example implements it for hits against Gene.primaryIdentifier and Gene.secondaryIdentifier.)
- REMOVE the gene_primaryidentifier and gene_secondaryidentifier field definitions from the earlier part of the file. They look like this:
OR, simply UPDATE the existing field definitions, replacing their parameters with: type="text_ngram" indexed="true" stored="true".
- RESTART Solr to load the new config, e.g. under System V:
- REBUILD the search index using the Solr-related postprocesses:
Your keyword search will now return results on partial matches for the attributes that you configured in Solr (Gene.primaryIdentifier and Gene.secondaryIdentifier in this example).
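To see why an n-gram field type makes "REV" match REVOLUTA, the sketch below mimics the general idea in plain Python. It is a conceptual illustration only, not Solr's implementation. The gram sizes (3 to 15) are assumed values, and the real behaviour depends on how the text_ngram field type is defined in your schema. For instance, an edge n-gram analyzer would match prefixes only.

```python
def ngrams(text, min_size=3, max_size=15):
    """Return every lowercased substring of length min_size..max_size."""
    text = text.lower()
    grams = set()
    for size in range(min_size, min(max_size, len(text)) + 1):
        for start in range(len(text) - size + 1):
            grams.add(text[start:start + size])
    return grams


def matches(query, indexed_value, min_size=3, max_size=15):
    """True if the lowercased query equals one of the indexed n-grams."""
    return query.lower() in ngrams(indexed_value, min_size, max_size)


print(matches("REV", "REVOLUTA"))  # partial term, long enough to be a gram
print(matches("RE", "REVOLUTA"))   # shorter than the assumed minimum gram size
```

Because every gram is stored as its own token at index time, a partial term only needs to equal one of those tokens. This is also why the index must be rebuilt after the schema change.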