Query performance (precomputed tables)
InterMine can make use of precomputed tables (analagous to materialised views) for faster execution of queries. These can represent any SQL query (or InterMine query) and can automatically be substituted into incoming queries by our own cost-based query optimiser. For example, a precompute that joins three tables could be used in a larger query that includes that join thus reducing the total number of tables joined in the query. Template queries can be precomputed completely so that for any value entered in an editable constraint, the query will be executed from a single database table.
#
Template queries#
WebappAs the superuser, when you create a new template or edit an existing one there is a 'precompute' link on the MyMine saved templates list. Clicking this will create a precomputed table for just this query. It can take some time to create the tables and requests aren't put in a queue so it is not a good idea to click many of these links at once. The 'precompute' link will change to 'precomputed' once there is a precomputed table created.
#
Command linePrecomputing template queries makes sure that public templates will always run quickly. You can precompute all templates saved as the superuser in your userprofile database from the command line. This checks each template first to see if it is already precomputed.
#
Manual specification of queriesYou can specify any IQL query to precompute in the file MINE_NAME/dbmodel/resources/genomic_precompute.properties
. These allow you to design queries you think are likely to be created commonly or be parts of larger queries. It is the place to put queries that will be used in list upload and widgets to ensure they run fast.
Here is an example query:
You can also specify the classes involved:
#
Dropping precomputed tablesTo drop all precomputed tables in a database:
#
Size of precomputed tablesYou can see the names and sizes of all precomputed tables by running this SQL query in psql:
Note that this only lists the table sizes, there may be many indexes associated with each table which may also be large. To see the size of all tables and indexes in the database use:
#
Template SummariesAfter the templates are precomputed, they are "summarised". This means any dropdowns for the templates will be updated to only include valid values for that specific templates. Here's how it's done:
- All editable constraints are dropped, non-editable constraints are kept
- Valid values (summaries) for dropdowns are recalculated
For example, if you have a template with an option to select a chromosome, all chromosomes in the database will be displayed. However if you have a non-editable constraint setting the value of the organism to be human, only the human chromosomes will be displayed after summarisation.
#
FAQs#
How do you know what to put in the precomputes file?This is what we did for FlyMine:
- Common joins to be done, e.g. Gene to protein
- Widgets - see what queries the widgets are running, add those queries
- Problem areas being reported, certain queries being slower than expected, e.g. interaction queries
These three things, along with precomputing templates, seem to work best.
Ideally we would have some sort of query profiling and would be able to tell where precomputing helps.
#
How do you tell if what you put in there is actually helping?When the query is logged, it gives the execution time as well:
> bag tables: 0 ms, generate: 1 ms, optimise: 0 ms, ms, estimate: 9 ms, execute: 61 ms, convert results: 7 ms, extra queries: 0 ms, total: 78 ms, rows: 806
This lets you compare query speeds. You can tell the query used a precomputed table by checking the logs for the prefix precomp_
#
Were all these queries (in the flymine file) created by hand?No. We ran all of our analysis tools on the list analysis page, e.g GO enrichment widget and captured the queries being run via the logs.
#
PostgreSQL is not using my precomputed table when running a query. Help!You must restart Tomcat after you have created all of the precomputed tables or else your new tables won't be used
PostgreSQL uses EXPLAIN to decide which query is fastest. If using your table isn't going to be faster, it won't use it. PostgreSQL may be wrong, but that's how it decides which table to use. See
http://www.postgresql.org/docs/9.2/static/using-explain.html for
details.
#
A Log EntryThe LOG records three queries:
- the IQL (InterMine Query Language) query
- the generated SQL query
- the optimised query <-- this is where you will see your precomputed tables used
#
IQL#
Generated SQL#
Optimised sqlbag tables: 0 ms, generate: 1 ms, optimise: 0 ms, ms, estimate: 14 ms, execute: 11 ms, convert results: 0 ms, extra queries: 27 ms, total: 53 ms, rows: 1
Note the FROM
clause now includes precomp_45503
. You can query for this name in the database:
You can also run IQL queries directly in the console: