intermine package
Subpackages
Submodules
intermine.constraints module
-
class intermine.constraints.BinaryConstraint(path, op, value, code='A')
Bases: intermine.constraints.CodedConstraint
These constraints assert a relationship between the value represented by the path (which must represent a value, ie. an Attribute) and another value - ie. the operator takes two parameters.
In all cases the ‘left’ side of the relationship is the path, and the ‘right’ side is the supplied value.
Valid operators are:
- = (equal to)
- != (not equal to)
- < (less than)
- > (greater than)
- <= (less than or equal to)
- >= (greater than or equal to)
- LIKE (same as equal to, but with implied wildcards)
- CONTAINS (same as equal to, but with implied wildcards)
- NOT LIKE (same as not equal to, but with implied wildcards)
-
OPS = {'!=', '>=', 'LIKE', 'NOT LIKE', '>', '=', 'CONTAINS', '<=', '<'}
-
class intermine.constraints.CodedConstraint(path, op, code='A')
Bases: intermine.constraints.Constraint, intermine.constraints.LogicNode
Constraints that have codes are the principal logical filters on queries, and need to be referred to individually (hence the codes). Each one embodies a logical operation, and so holds a reference to an operator.
This class is not meant to be instantiated directly, but instead inherited from to supply default behaviour.
-
OPS = set()
-
-
class intermine.constraints.Constraint(path)
Bases: intermine.pathfeatures.PathFeature
All constraints inherit from this class, which simply defines the type of element for the purposes of serialisation.
-
child_type = 'constraint'
-
-
class intermine.constraints.ConstraintFactory
Bases: object
A constraint factory is responsible for finding an appropriate constraint class for the given arguments and instantiating the constraint.
-
CONSTRAINT_CLASSES = {<class 'intermine.constraints.SubClassConstraint'>, <class 'intermine.constraints.BinaryConstraint'>, <class 'intermine.constraints.ListConstraint'>, <class 'intermine.constraints.UnaryConstraint'>, <class 'intermine.constraints.LoopConstraint'>, <class 'intermine.constraints.TernaryConstraint'>, <class 'intermine.constraints.RangeConstraint'>, <class 'intermine.constraints.IsaConstraint'>, <class 'intermine.constraints.MultiConstraint'>}
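One way to picture "finding an appropriate constraint class for the given arguments" is a factory that tries each candidate class in turn and keeps the first one that accepts the arguments. The following is a minimal sketch under that assumption, not the library's implementation; `UnarySketch`, `BinarySketch`, `FactorySketch` and `make_constraint` are hypothetical names used only for illustration.

```python
class UnarySketch:
    """Hypothetical stand-in for a constraint that takes only a path and an operator."""
    OPS = {"IS NULL", "IS NOT NULL"}

    def __init__(self, path, op, code="A"):
        if op not in self.OPS:
            raise TypeError("%r is not a valid operator for %s" % (op, type(self).__name__))
        self.path, self.op, self.code = path, op, code


class BinarySketch:
    """Hypothetical stand-in for a constraint that compares a path with a value."""
    OPS = {"=", "!=", "<", ">", "<=", ">=", "LIKE", "NOT LIKE", "CONTAINS"}

    def __init__(self, path, op, value, code="A"):
        if op not in self.OPS:
            raise TypeError("%r is not a valid operator for %s" % (op, type(self).__name__))
        self.path, self.op, self.value, self.code = path, op, value, code


class FactorySketch:
    # A tuple (rather than a set) keeps the lookup order deterministic in this sketch.
    CONSTRAINT_CLASSES = (UnarySketch, BinarySketch)

    def make_constraint(self, *args, **kwargs):
        """Return an instance of the first class whose constructor accepts the arguments."""
        for cls in self.CONSTRAINT_CLASSES:
            try:
                return cls(*args, **kwargs)
            except TypeError:
                continue  # wrong arity or wrong operator: try the next class
        raise TypeError("No constraint class matches the arguments %r" % (args,))


factory = FactorySketch()
by_value = factory.make_constraint("Gene.symbol", "=", "eve")
is_null = factory.make_constraint("Gene.symbol", "IS NULL")
```

Here the operator sets do the dispatching: an argument list only fits a class whose `OPS` contains the operator and whose constructor has the right number of parameters.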
-
-
exception intermine.constraints.EmptyLogicError
Bases: ValueError
An error representing the fact that the logic string to be parsed was empty.
-
class intermine.constraints.IsaConstraint(path, op, values, code='A')
Bases: intermine.constraints.MultiConstraint
These constraints require that the value of the path they constrain should be an instance of one of the classes provided.
Valid operators:
- ISA : The value is an instance of one of the provided classes.
For example:
SequenceFeature ISA [Exon, Intron]
-
OPS = {'ISA'}
-
class intermine.constraints.ListConstraint(path, op, list_name, code='A')
Bases: intermine.constraints.CodedConstraint
These constraints assert a membership relationship between the object represented by the path (it must always be an object, ie. a Reference or a Class) and a List. Lists are collections of objects in the database which are stored in InterMine datawarehouses. These lists must be set up before the query is run, either manually in the webapp or by using the webservice API list upload feature.
Valid operators are:
- IN
- NOT IN
-
OPS = {'IN', 'NOT IN'}
-
class intermine.constraints.LogicGroup(left, op, right, parent=None)
Bases: intermine.constraints.LogicNode
A logic group is a logic node with two child nodes, which are either connected by AND or by OR logic.
-
LEGAL_OPS = frozenset({'AND', 'OR'})
-
-
class intermine.constraints.LogicNode
Bases: object
Objects which can be represented as nodes in the AST of a constraint logic graph should inherit from this class, which defines methods for overloading built-in operations.
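To illustrate what "overloading built-in operations" can look like for logic nodes, here is a small sketch in which `&` and `|` build a two-child group joined by AND or OR. It is an illustration only; `NodeSketch`, `CodeSketch` and `GroupSketch` are hypothetical names, and the real classes may overload different or additional operators.

```python
class NodeSketch:
    """Hypothetical base class: combining two nodes yields a group node."""

    def __and__(self, other):
        return GroupSketch(self, "AND", other)

    def __or__(self, other):
        return GroupSketch(self, "OR", other)


class CodeSketch(NodeSketch):
    """A leaf node standing in for a coded constraint such as A or B."""

    def __init__(self, code):
        self.code = code

    def __str__(self):
        return self.code


class GroupSketch(NodeSketch):
    """A node with exactly two children joined by AND or OR."""
    LEGAL_OPS = frozenset({"AND", "OR"})

    def __init__(self, left, op, right):
        if op not in self.LEGAL_OPS:
            raise ValueError("%r is not a legal logical operator" % (op,))
        self.left, self.op, self.right = left, op, right

    def __str__(self):
        return "(%s %s %s)" % (self.left, self.op, self.right)


a, b, c = CodeSketch("A"), CodeSketch("B"), CodeSketch("C")
logic = a & (b | c)
print(logic)  # prints "(A AND (B OR C))"
```

Because groups are themselves nodes, expressions nest to any depth, which is what lets constraint objects be combined directly instead of writing a logic string.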
-
exception intermine.constraints.LogicParseError(message, cause=None)
Bases: intermine.util.ReadableException
An error representing problems in parsing constraint logic.
-
class intermine.constraints.LogicParser(query)
Bases: object
Instances of this class are used to parse logic strings into abstract syntax trees, and then logic groups. This aims to provide robust parsing of logic strings, with the ability to identify syntax errors in such strings.
-
check_syntax(infix_tokens)
Syntax is checked before parsing so that problems can be reported with more informative error messages.
This checks for:
- correct operator positions (cannot put two codes next to each other without intervening operators)
- correct grouping (all brackets are matched, and contain valid expressions)
@param infix_tokens: The input parsed into a list of tokens.
@type infix_tokens: iterable
@raise LogicParseError: if there is a problem.
-
get_constraint(code)
This method fetches the constraint from the parent query with the matching code.
@see: intermine.query.Query.get_constraint
@rtype: intermine.constraints.CodedConstraint
-
get_priority(op)
Operators have a specific precedence, from highest to lowest:
- ()
- AND
- OR
This method returns an integer which can be used to compare operator priorities.
@rtype: int
-
infix_to_postfix(infix_tokens)
Take in a sequence of infix tokens and return the same tokens reordered into a postfix sequence.
@param infix_tokens: The list of tokens
@type infix_tokens: iterable
@rtype: list
-
ops = {'OR': 'OR', '||': 'OR', ')': ')', 'AND': 'AND', '&': 'AND', '|': 'OR', '(': '(', '&&': 'AND'}
-
parse(logic_str)
Takes a string such as “A and B or C and D”, and parses it into a structure which represents this logic as a binary abstract syntax tree. The above string would parse to “(A and B) or (C and D)”, as AND binds more tightly than OR.
Note that only singly rooted trees are parsed.
@param logic_str: The logic definition as a string
@type logic_str: string
@rtype: LogicGroup
@raise LogicParseError: if there is a syntax error in the logic
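The precedence rules and the infix-to-postfix step described above can be sketched with the classic shunting-yard approach. This is an illustrative sketch only, not the parser's actual code: `tokenize`, `PRIORITY` and `postfix_to_tree` are hypothetical helpers, and the tree is represented with plain tuples rather than LogicGroup objects.

```python
# Synonyms are normalised to AND / OR, mirroring the ops table documented above.
OPS = {"AND": "AND", "&": "AND", "&&": "AND",
       "OR": "OR", "|": "OR", "||": "OR", "(": "(", ")": ")"}
PRIORITY = {"AND": 2, "OR": 1}  # higher number binds more tightly


def tokenize(logic_str):
    """Split a logic string into normalised tokens (hypothetical helper)."""
    spaced = logic_str.replace("(", " ( ").replace(")", " ) ")
    return [OPS.get(tok.upper(), tok.upper()) for tok in spaced.split()]


def infix_to_postfix(tokens):
    """Reorder infix tokens into postfix order using an operator stack."""
    output, stack = [], []
    for tok in tokens:
        if tok == "(":
            stack.append(tok)
        elif tok == ")":
            while stack and stack[-1] != "(":
                output.append(stack.pop())
            if not stack:
                raise ValueError("unmatched closing bracket")
            stack.pop()  # discard the "("
        elif tok in PRIORITY:
            while stack and stack[-1] != "(" and PRIORITY[stack[-1]] >= PRIORITY[tok]:
                output.append(stack.pop())
            stack.append(tok)
        else:
            output.append(tok)  # a constraint code such as A
    while stack:
        op = stack.pop()
        if op == "(":
            raise ValueError("unmatched opening bracket")
        output.append(op)
    return output


def postfix_to_tree(postfix):
    """Fold a postfix sequence into nested (left, op, right) tuples."""
    stack = []
    for tok in postfix:
        if tok in PRIORITY:
            right, left = stack.pop(), stack.pop()
            stack.append((left, tok, right))
        else:
            stack.append(tok)
    if len(stack) != 1:
        raise ValueError("logic does not form a singly rooted tree")
    return stack[0]


postfix = infix_to_postfix(tokenize("A and B or C and D"))
tree = postfix_to_tree(postfix)
```

With the example string from the text, `postfix` is `['A', 'B', 'AND', 'C', 'D', 'AND', 'OR']` and `tree` groups as (A AND B) OR (C AND D), matching the stated precedence.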
-
-
class intermine.constraints.LoopConstraint(path, op, loopPath, code='A')
Bases: intermine.constraints.CodedConstraint
These constraints assert that two paths refer to the same object.
Valid operators:
- IS
- IS NOT
The operators IS and IS NOT map to the ops “=” and “!=” when they are used in XML serialisation.
-
OPS = {'IS', 'IS NOT'}
-
SERIALISED_OPS = {'IS': '=', 'IS NOT': '!='}
-
class intermine.constraints.MultiConstraint(path, op, values, code='A')
Bases: intermine.constraints.CodedConstraint
These constraints require the value they constrain to be either a member of a set of values, or not a member.
Valid operators:
- ONE OF
- NONE OF
These constraints are similar in use to List constraints, with the following differences:
- The list in this case is a defined set of values that is passed along with the query itself, rather than anything stored independently on a server.
- The object of the constraint is the value of an attribute, rather than an object’s identity.
-
OPS = {'NONE OF', 'ONE OF'}
-
class intermine.constraints.RangeConstraint(path, op, values, code='A')
Bases: intermine.constraints.MultiConstraint
These constraints require that the value of the path they constrain should lie in relationship to the set of values passed according to the specific operator.
Valid operators:
- OVERLAPS : The value overlaps at least one of the given ranges
- WITHIN : The value is wholly within the given set of ranges
- CONTAINS : The value contains all the given ranges
- DOES NOT CONTAIN : The value does not contain all the given ranges
- OUTSIDE : Some part is outside the given set of ranges
- DOES NOT OVERLAP : The value does not overlap with any of the ranges
For example:
4 WITHIN [1..5, 20..25] => True
The format of the ranges depends on the value being constrained and what range parsers have been configured on the target server. A common range parser for biological mines is the one for Locations:
Gene.chromosomeLocation OVERLAPS [2X:54321..67890, 3R:12345..456789]
-
OPS = {'OUTSIDE', 'OVERLAPS', 'DOES NOT OVERLAP', 'CONTAINS', 'WITHIN', 'DOES NOT CONTAIN'}
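The operator descriptions can be made concrete with plain integer intervals. The sketch below is one possible reading only: range constraints are evaluated by the server's configured range parsers, so the exact semantics may differ. It assumes inclusive `(start, end)` tuples, and it assumes WITHIN means "wholly inside at least one range", with OUTSIDE as its negation; `check` and the helper functions are hypothetical names.

```python
def overlaps(value, rng):
    """True if two inclusive (start, end) intervals share at least one point."""
    return value[0] <= rng[1] and rng[0] <= value[1]


def within(value, rng):
    """True if value lies entirely inside rng."""
    return rng[0] <= value[0] and value[1] <= rng[1]


def check(value, op, ranges):
    """Evaluate one range operator for a value against a list of ranges (sketch)."""
    if op == "OVERLAPS":
        return any(overlaps(value, r) for r in ranges)
    if op == "DOES NOT OVERLAP":
        return not any(overlaps(value, r) for r in ranges)
    if op == "WITHIN":  # assumption: inside at least one of the ranges
        return any(within(value, r) for r in ranges)
    if op == "OUTSIDE":  # assumption: the negation of WITHIN
        return not any(within(value, r) for r in ranges)
    if op == "CONTAINS":
        return all(within(r, value) for r in ranges)
    if op == "DOES NOT CONTAIN":
        return not all(within(r, value) for r in ranges)
    raise ValueError("unknown range operator: %r" % (op,))


ranges = [(1, 5), (20, 25)]
point = (4, 4)  # the single value 4, written as a zero-length interval
result = check(point, "WITHIN", ranges)  # the example from the text
```

Under these assumptions the documented example, 4 WITHIN [1..5, 20..25], evaluates to True.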
-
class intermine.constraints.SubClassConstraint(path, subclass)
Bases: intermine.constraints.Constraint
If an object has a reference X to another object of type A, and type B extends type A, then any object of type B may be the value of the reference X. If you only want to see X’s which are B’s, this may be achieved with subclass constraints, which allow the type of an object to be limited to one of the subclasses (at any depth) of the class type required by the attribute.
These constraints do not use operators. Since they cannot be conditional (eg. “A is a B or A is a C” would not be possible in an InterMine query), they do not have codes and cannot be referenced in logic expressions.
-
class intermine.constraints.TemplateBinaryConstraint(*a, **d)
Bases: intermine.constraints.BinaryConstraint, intermine.constraints.TemplateConstraint
-
class intermine.constraints.TemplateConstraint(editable=True, optional='locked')
Bases: object
Constraints on templates can also be designated as “on”, “off” or “locked”, which refers to whether they are active or not. Inactive constraints are still configured, but behave as if absent for the purpose of results. In addition, template constraints can be editable or not. Only values for editable constraints can be provided when requesting results, and only constraints that can participate in logic expressions can be editable.
-
OPTIONAL_OFF = 'off'
-
OPTIONAL_ON = 'on'
-
REQUIRED = 'locked'
-
required
True if a value must be provided for this constraint.
@rtype: bool
-
separate_arg_sets(args)
dict -> (dict, dict)
Splits a dictionary of arguments into two separate dictionaries, one with arguments for the main constraint, and one with arguments for the template portion of the behaviour
-
switched_off
True if this constraint is currently inactive.
@rtype: bool
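The dict -> (dict, dict) split described for separate_arg_sets can be pictured as partitioning the argument dictionary by key. The following is a sketch under the assumption that the template-specific keys are `editable` and `optional` (taken from the constructor signature above); the real method may treat more keys specially or convert values.

```python
# Assumed template-only keys, based on TemplateConstraint(editable=True, optional='locked').
TEMPLATE_KEYS = {"editable", "optional"}


def separate_arg_sets(args):
    """Split args into (constraint arguments, template arguments) without mutating args."""
    constraint_args = {k: v for k, v in args.items() if k not in TEMPLATE_KEYS}
    template_args = {k: v for k, v in args.items() if k in TEMPLATE_KEYS}
    return constraint_args, template_args


constraint_args, template_args = separate_arg_sets({
    "path": "Gene.symbol",
    "op": "=",
    "value": "eve",
    "editable": True,
    "optional": "on",
})
```

After the split, `constraint_args` holds what a plain constraint constructor would need and `template_args` holds the editable/optional state.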
-
-
class intermine.constraints.TemplateConstraintFactory
Bases: intermine.constraints.ConstraintFactory
A constraint factory is responsible for finding an appropriate constraint class for the given arguments and instantiating the constraint. TemplateConstraintFactories make constraints with the extra set of TemplateConstraint qualities.
-
CONSTRAINT_CLASSES = {<class 'intermine.constraints.TemplateListConstraint'>, <class 'intermine.constraints.TemplateIsaConstraint'>, <class 'intermine.constraints.TemplateUnaryConstraint'>, <class 'intermine.constraints.TemplateSubClassConstraint'>, <class 'intermine.constraints.TemplateBinaryConstraint'>, <class 'intermine.constraints.TemplateTernaryConstraint'>, <class 'intermine.constraints.TemplateRangeConstraint'>, <class 'intermine.constraints.TemplateLoopConstraint'>, <class 'intermine.constraints.TemplateMultiConstraint'>}
-
-
class intermine.constraints.TemplateIsaConstraint(*a, **d)
Bases: intermine.constraints.IsaConstraint, intermine.constraints.TemplateConstraint
-
class intermine.constraints.TemplateListConstraint(*a, **d)
Bases: intermine.constraints.ListConstraint, intermine.constraints.TemplateConstraint
-
class intermine.constraints.TemplateLoopConstraint(*a, **d)
Bases: intermine.constraints.LoopConstraint, intermine.constraints.TemplateConstraint
-
class intermine.constraints.TemplateMultiConstraint(*a, **d)
Bases: intermine.constraints.MultiConstraint, intermine.constraints.TemplateConstraint
-
class intermine.constraints.TemplateRangeConstraint(*a, **d)
Bases: intermine.constraints.RangeConstraint, intermine.constraints.TemplateConstraint
-
class intermine.constraints.TemplateSubClassConstraint(*a, **d)
Bases: intermine.constraints.SubClassConstraint, intermine.constraints.TemplateConstraint
-
class intermine.constraints.TemplateTernaryConstraint(*a, **d)
Bases: intermine.constraints.TernaryConstraint, intermine.constraints.TemplateConstraint
-
class intermine.constraints.TemplateUnaryConstraint(*a, **d)
Bases: intermine.constraints.UnaryConstraint, intermine.constraints.TemplateConstraint
-
class intermine.constraints.TernaryConstraint(path, op, value, extra_value=None, code='A')
Bases: intermine.constraints.BinaryConstraint
These constraints request a wide-ranging search for matching fields over all aspects of an object, including up to coercion from related classes.
Valid operators:
- LOOKUP
To aid disambiguation, Ternary constraints accept an extra_value as well as the main value.
-
OPS = {'LOOKUP'}
-
class intermine.constraints.UnaryConstraint(path, op, code='A')
Bases: intermine.constraints.CodedConstraint
These constraints are simple assertions about the object/value referred to by the path. The set of valid operators is:
- IS NULL
- IS NOT NULL
-
OPS = {'IS NOT NULL', 'IS NULL'}
intermine.errors module
-
exception intermine.errors.ServiceError(message, cause=None)
Bases: intermine.util.ReadableException
Errors in the creation and use of the Service object
intermine.idresolution module
-
class intermine.idresolution.Job(service, uid)
Bases: object
Users can submit requests to resolve sets of IDs to objects in the data-store. These jobs begin in a PENDING state, and transition through RUNNING to either SUCCESS or ERROR.
Upon completion, the results of the job may be fetched, and the job may be deleted on the server.
-
INITIAL_BACKOFF = 0.05
-
INITIAL_DECAY = 1.25
-
MAX_BACKOFF = 60
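The three constants suggest that a job's status is polled with an increasing delay that is capped at a maximum. How Job actually uses them is not stated here, so the following is only a sketch of that pattern; `backoff_delays` and `poll` are hypothetical names, and `fetch_status` and `sleep` are injected so the example runs without a server.

```python
import itertools

INITIAL_BACKOFF = 0.05  # seconds to wait before the second status check
INITIAL_DECAY = 1.25    # multiplier applied to the delay after each check
MAX_BACKOFF = 60        # upper bound on the delay, in seconds


def backoff_delays(initial=INITIAL_BACKOFF, decay=INITIAL_DECAY, maximum=MAX_BACKOFF):
    """Yield an endless series of delays that grows geometrically up to maximum."""
    delay = initial
    while True:
        yield delay
        delay = min(maximum, delay * decay)


def poll(fetch_status, sleep, max_polls=1000):
    """Check the status until it is SUCCESS or ERROR, sleeping between checks."""
    for attempt, delay in enumerate(backoff_delays()):
        status = fetch_status()
        if status in ("SUCCESS", "ERROR"):
            return status
        if attempt + 1 >= max_polls:
            raise TimeoutError("job did not complete after %d polls" % max_polls)
        sleep(delay)


# Simulated job that passes through the documented states.
statuses = iter(["PENDING", "RUNNING", "SUCCESS"])
slept = []
final_status = poll(lambda: next(statuses), slept.append)
first_delays = list(itertools.islice(backoff_delays(), 3))
```

In real use `sleep` would be `time.sleep` and `fetch_status` would ask the server; passing them in keeps the sketch testable.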
-
intermine.model module
-
class intermine.model.Attribute(name, type_name, class_origin)
Bases: intermine.model.Field
The Attribute class inherits all the behaviour of L{intermine.model.Field}
-
fieldtype
-
-
class intermine.model.Class(name, parents, model, interface=True)
Bases: object
These objects refer to the table objects in the InterMine ORM layer.
>>> service = Service("http://www.flymine.org/query/service")
>>> model = service.model
>>>
>>> if "Gene" in model.classes:
...     gene_cd = model.get_class("Gene")
...     print("Gene has", len(gene_cd.fields), "fields")
...     for field in gene_cd.fields:
...         print(" - ", field.name)
Each class can have attributes (columns) of various types, and can have references to other classes (tables), on either a one-to-one (references) or one-to-many (collections) basis.
Classes should not be instantiated by hand, but rather used as part of the model they belong to.
-
attributes
@rtype: list(L{Attribute})
-
collections
@rtype: list(L{Collection})
-
fields
The fields are returned sorted by name. Fields include all Attributes, References and Collections.
@rtype: list(L{Field})
-
get_field(name)
The standard way of retrieving a field
@raise ModelError: if the Class does not have such a field
@rtype: subclass of L{intermine.model.Field}
-
isa(other)
This method validates statements about inheritance. Returns true if the “other” is, or is within the ancestry of, this class.
Other can be passed as a name (str), or as the class object itself
@rtype: boolean
-
references
@rtype: list(L{Reference})
-
-
class intermine.model.Collection(name, type_name, class_origin, reverse_ref=None)
Bases: intermine.model.Reference
Collections have all the same behaviour and properties as References
-
fieldtype
-
-
class intermine.model.Column(path, model, subclasses={}, query=None, parent=None)
Bases: object
Column objects allow constraints to be constructed in something close to a declarative style
-
class intermine.model.ComposedClass(parts, model)
Bases: intermine.model.Class
These objects are structural unions of two or more different data-types.
-
field_dict
The combined field dictionary of all parts
-
has_id
-
name
-
parent_classes
The flattened list of parent classes, with the parts
-
parents
-
-
class intermine.model.Field(name, type_name, class_origin)
Bases: object
The base class for attributes, references and collections. All columns in DB tables are represented by fields
>>> service = Service("http://www.flymine.org/query/service")
>>> model = service.model
>>> cd = model.get_class("Gene")
>>> print("Gene has", len(cd.fields), "fields")
Gene has 45 fields
>>> for field in cd.fields:
...     print(" -", field)
 - CDSs is a group of CDS objects, which link back to this as gene
 - GLEANRsymbol is a String
 - UTRs is a group of UTR objects, which link back to this as gene
 - alleles is a group of Allele objects, which link back to this as gene
 - chromosome is a Chromosome
 - chromosomeLocation is a Location
 - clones is a group of CDNAClone objects, which link back to this as gene
 - crossReferences is a group of CrossReference objects, which link back to this as subject
 - cytoLocation is a String
 - dataSets is a group of DataSet objects, which link back to this as bioEntities
 - downstreamIntergenicRegion is a IntergenicRegion
 - exons is a group of Exon objects, which link back to this as gene
 - flankingRegions is a group of GeneFlankingRegion objects, which link back to this as gene
 - goAnnotation is a group of GOAnnotation objects
 - homologues is a group of Homologue objects, which link back to this as gene
 - id is a Integer
 - interactions is a group of Interaction objects, which link back to this as gene
 - length is a Integer
 ...
@see: L{Attribute}
@see: L{Reference}
@see: L{Collection}
-
fieldtype
-
-
class intermine.model.Model(source, service=None)
Bases: object
An abstraction of the database schema
>>> service = Service("http://www.flymine.org/query/service")
>>> model = service.model
>>> model.get_class("Gene")
<intermine.model.Class: Gene>
This class represents the data model - ie. an abstraction of the database schema. It can be used to introspect what data is available and how it is inter-related
-
LOG = <logging.Logger object>
-
NUMERIC_TYPES = frozenset({'int', 'Float', 'Double', 'long', 'float', 'double', 'Long', 'short', 'Short', 'Integer'})
-
get_class(name)
>>> model = Model("http://www.flymine.org/query/service/model")
>>> model.get_class("Gene")
<intermine.model.Class: Gene>
>>> model.get_class("Gene.proteins")
<intermine.model.Class: Protein>
This is the recommended way of retrieving a class from the model. As well as handling class names, you can also pass in a path such as “Gene.proteins” and get the corresponding class back (<intermine.model.Class: Protein>)
@raise ModelError: if the class name refers to a non-existent object
@rtype: L{intermine.model.Class}
-
make_path(path, subclasses={})
>>> path = model.make_path("Gene.organism.name")
>>> path
<intermine.model.Path: Gene.organism.name>
This is the recommended manner of constructing path objects.
@type path: str
@type subclasses: dict
@raise PathParseError: if there is a problem parsing the path string
@rtype: L{intermine.model.Path}
-
parse_model(source)
The XML can be provided as a file, URL or string. This method is called during instantiation - it does not need to be called directly.
@param source: the model.xml, as a local file, string, or URL
@raise ModelParseError: if there is a problem parsing the source
-
parse_path_string(path_string, subclasses={})
>>> parts = Model.parse_path_string(string)
This method is used when making paths from a model, and when validating path strings. It probably won’t need to be called directly.
@see: L{intermine.model.Model.make_path}
@see: L{intermine.model.Model.validate_path}
@see: L{intermine.model.Path}
-
to_ancestry(cd)
>>> classes = Model.to_ancestry(cd)
Returns the class’ parents, and all the class’ parents’ parents
@rtype: list(L{intermine.model.Class})
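Collecting "the parents, and all the parents' parents" is a recursive walk up the inheritance graph, and an isa-style check can be built on top of it. The sketch below uses a tiny stand-in class and is not the library's code; `ClassSketch` and the three-class hierarchy are hypothetical.

```python
class ClassSketch:
    """Hypothetical stand-in for a model class: a name plus direct parents."""

    def __init__(self, name, parents=()):
        self.name = name
        self.parents = list(parents)


def to_ancestry(cd):
    """Return the direct parents of cd followed by their ancestors, without duplicates."""
    ancestry = []
    for parent in cd.parents:
        for candidate in [parent] + to_ancestry(parent):
            if candidate not in ancestry:
                ancestry.append(candidate)
    return ancestry


def isa(cd, other):
    """True if other (a name or a class) is cd itself or one of its ancestors."""
    name = other if isinstance(other, str) else other.name
    return cd.name == name or any(a.name == name for a in to_ancestry(cd))


bio_entity = ClassSketch("BioEntity")
sequence_feature = ClassSketch("SequenceFeature", [bio_entity])
gene = ClassSketch("Gene", [sequence_feature])

ancestor_names = [c.name for c in to_ancestry(gene)]
```

For this hierarchy `ancestor_names` is `['SequenceFeature', 'BioEntity']`, so `isa(gene, "BioEntity")` holds while `isa(bio_entity, "Gene")` does not.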
-
to_classes(classnames)
>>> classes = model.to_classes(["Gene", "Protein", "Organism"])
This simply maps from a list of strings to a list of classes in the calling model.
@raise ModelError: if the list of class names includes ones that don’t exist
@rtype: list(L{intermine.model.Class})
-
validate_path(path_string, subclasses={})
>>> try:
...     model.validate_path("Gene.symbol")
...     print("path is valid")
... except PathParseError:
...     print("path is invalid")
path is valid
When you don’t need to interrogate relationships between paths, simply using this method to validate a path string is enough. It guarantees that there is a descriptor for each section of the string, with the appropriate relationships
@raise PathParseError: if there is a problem parsing the path string
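The guarantee that "there is a descriptor for each section of the string" can be pictured as walking the dotted path one step at a time through the model. The sketch below does this against a toy dictionary model; `TOY_MODEL`, `PathErrorSketch` and this `validate_path` function are hypothetical and far simpler than the real model, which also handles subclasses.

```python
# Toy model: class name -> {field name -> type name}. A type that is not itself
# a key of the model (for example "String") is treated as an attribute type.
TOY_MODEL = {
    "Gene": {"symbol": "String", "length": "Integer",
             "organism": "Organism", "proteins": "Protein"},
    "Organism": {"name": "String", "taxonId": "Integer"},
    "Protein": {"name": "String"},
}


class PathErrorSketch(Exception):
    """Raised when a dotted path cannot be resolved against the toy model."""


def validate_path(path_string, model=TOY_MODEL):
    """Resolve each section of a dotted path, returning the type found at each step."""
    parts = path_string.split(".")
    if parts[0] not in model:
        raise PathErrorSketch("unknown root class %r" % parts[0])
    current = parts[0]
    descriptors = [current]
    for part in parts[1:]:
        if current not in model:
            raise PathErrorSketch("%r is an attribute, so %r cannot follow it" % (current, part))
        if part not in model[current]:
            raise PathErrorSketch("%s has no field named %r" % (current, part))
        current = model[current][part]
        descriptors.append(current)
    return descriptors


resolved = validate_path("Gene.organism.name")
```

Here `resolved` is `['Gene', 'Organism', 'String']`: the path starts at a class, follows a reference, and ends on an attribute.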
-
-
exception intermine.model.ModelParseError(message, source, cause=None)
Bases: intermine.model.ModelError
-
class intermine.model.Path(path, model, subclasses={})
Bases: object
A path represents a connection between records and fields
>>> service = Service("http://www.flymine.org/query/service")
>>> model = service.model
>>> path = model.make_path("Gene.organism.name")
>>> path.is_attribute()
True
>>> path2 = model.make_path("Gene.proteins")
>>> path2.is_attribute()
False
>>> path2.is_reference()
True
>>> path2.get_class()
<intermine.model.Class: Protein>
This class is used for performing validation on dotted path strings. The simple act of parsing it into existence will validate the path to some extent, but there are additional methods for verifying certain relationships as well
-
append(*elements)
>>> p1 = Path("Gene.exons", model)
>>> p2 = p1.append("name")
>>> print(p2)
Gene.exons.name
This is the inverse of prefix.
-
end
The descriptor for the last part of the string.
@rtype: L{model.Class} or L{model.Field}
-
end_class
Return the class object for this path, if it refers to a class or a reference. Attribute paths return None
@rtype: L{model.Class}
-
get_class()
Return the class object for this path, if it refers to a class or a reference. Attribute paths return None
@rtype: L{model.Class}
-
is_attribute()
Return true if the path refers to an attribute, eg: Gene.length
@rtype: boolean
-
is_reference()
Return true if the path is a reference, eg: Gene.organism or Gene.proteins. Note: Collections are ALSO references
@rtype: boolean
-
prefix()
>>> p1 = Path("Gene.exons.name", model)
>>> p2 = p1.prefix()
>>> print(p2)
Gene.exons
-
root
The descriptor for the first part of the string. This should always be a class descriptor.
@rtype: L{intermine.model.Class}
-
-
exception intermine.model.PathParseError(message, cause=None)
Bases: intermine.model.ModelError
-
class intermine.model.Reference(name, type_name, class_origin, reverse_ref=None)
Bases: intermine.model.Field
In addition to the behaviour and properties of Field, references may also have a reverse reference, if the other record points back to this one as well. All references will also have their type upgraded to a type_class during parsing.
-
fieldtype
-
intermine.pathfeatures module
-
class intermine.pathfeatures.Join(path, style='OUTER')
Bases: intermine.pathfeatures.PathFeature
-
INNER = 'INNER'
-
OUTER = 'OUTER'
-
child_type = 'join'
-
valid_join_styles = ['OUTER', 'INNER']
-
-
class intermine.pathfeatures.PathDescription(path, description)
Bases: intermine.pathfeatures.PathFeature
-
child_type = 'pathDescription'
-
-
class intermine.pathfeatures.SortOrder(path, order)
Bases: intermine.pathfeatures.PathFeature
-
ASC = 'asc'
-
DESC = 'desc'
-
DIRECTIONS = frozenset({'asc', 'desc'})
-
-
class intermine.pathfeatures.SortOrderList(*sos)
Bases: object
This class exists to hold the sort order information for a query. It handles appending elements, and the stringification of the sort order.
intermine.query module
-
exception intermine.query.ConstraintError(message, cause=None)
Bases: intermine.query.QueryError
-
class intermine.query.Query(model, service=None, validate=True, root=None)
Bases: object
Objects of this class have properties that model the attributes of the query, and methods for performing the request.
Example:
>>> service = Service("http://www.flymine.org/query/service")
>>> query = service.new_query()
>>>
>>> query.add_view("Gene.symbol", "Gene.pathways.name", "Gene.proteins.symbol")
>>> query.add_sort_order("Gene.pathways.name")
>>>
>>> query.add_constraint("Gene", "LOOKUP", "eve")
>>> query.add_constraint("Gene.pathways.name", "=", "Phosphate*")
>>>
>>> query.set_logic("A or B")
>>>
>>> for row in query.rows():
...     handle_row(row)
OR, using an SQL style DSL:
>>> s = Service("www.flymine.org/query")
>>> query = s.query("Gene").\
...     select("*", "pathways.*").\
...     where("symbol", "=", "H").\
...     outerjoin("pathways").\
...     order_by("symbol")
>>> for row in query.rows(start=10, size=5):
...     handle_row(row)
OR, for a more SQL-alchemy, ORM style:
>>> for gene in s.query(s.model.Gene).filter(s.model.Gene.symbol == ["zen", "H", "eve"]).add_columns(s.model.Gene.alleles):
...     handle(gene)
Query objects represent structured requests for information over the database housed at the datawarehouse whose webservice you are querying. They utilise some of the concepts of relational databases, within an object-related ORM context. If you don’t know what that means, don’t worry: you don’t need to write SQL, and the queries will be fast.
To make things slightly more familiar to those with knowledge of SQL, some syntactical sugar is provided to make constructing queries a bit more recognisable.
The data model represents tables in the databases as classes, with records within tables as instances of that class. The columns of the database are the fields of that object:
The Gene table - showing two records/objects
+---------------------------------------------------+
| id  | symbol  | length | cyto-location | organism |
+----------------------------------------+----------+
| 01  | eve     | 1539   | 46C10-46C10   |  01      |
+----------------------------------------+----------+
| 02  | zen     | 1331   | 84A5-84A5     |  01      |
+----------------------------------------+----------+
...

The organism table - showing one record/object
+----------------------------------+
| id  | name            | taxon id |
+----------------------------------+
| 01  | D. melanogaster | 7227     |
+----------------------------------+
Columns that contain a meaningful value are known as ‘attributes’ (in the tables above, that is everything except the id columns). The other columns (such as “organism” in the gene table) are ones that reference records of other tables (ie. other objects), and are called references. You can refer to any field in any class, that has a connection, however tenuous, with a table, by using dotted path notation:
Gene.organism.name -> the name column in the organism table, referenced by a record in the gene table
These paths, and the connections between records and tables they represent, are the basis for the structure of InterMine queries.
- A query has two principal sets of properties:
- its view: the set of output columns
- its constraints: the set of rules for what to include
A query must have at least one output column in its view, but constraints are optional - if you don’t include any, you will get back every record from the table (every object of that type)
In addition, the query must be coherent: if you have information about an organism, and you want a list of genes, then the “Gene” table should be the basis for your query, and as such the Gene class, which represents this table, should be the root of all the paths that appear in it:
So, to take a simple example:
I have an organism name, and I want a list of genes:
The view is the list of things I want to know about those genes:
>>> query.add_view("Gene.name")
>>> query.add_view("Gene.length")
>>> query.add_view("Gene.proteins.sequence.length")
Note I can freely mix attributes and references, as long as every view ends in an attribute (a meaningful value). As a short-cut I can also write:
>>> query.add_views("Gene.name", "Gene.length", "Gene.proteins.sequence.length")
or:
>>> query.add_views("Gene.name Gene.length Gene.proteins.sequence.length")
They are all equivalent. You can also use common SQL style shortcuts such as “*” for all attribute fields:
>>> query.add_views("Gene.*")
You can also use “select” as a synonym for “add_view”.
Now I can add my constraints. As mentioned, I have information about an organism, so:
>>> query.add_constraint("Gene.organism.name", "=", "D. melanogaster")
(note, here I can use “where” as a synonym for “add_constraint”)
If I run this query, I will get literally millions of results - it needs to be filtered further:
>>> query.add_constraint("Gene.proteins.sequence.length", "<", 500)
If that doesn’t restrict things enough I can add more filters:
>>> query.add_constraint("Gene.symbol", "ONE OF", ["eve", "zen", "h"])
Now I am guaranteed to get only information on genes I am interested in.
Note, though, that because I have included the link (or “join”) from Gene -> Protein, this, by default, means that I only want genes that have protein information associated with them. If in fact I want information on all genes, and just want to know the protein information if it is available, then I can specify that with:
>>> query.add_join("Gene.proteins", "OUTER")
And if perhaps my query is not as simple as a strict cumulative filter, but I want all D. mel genes that EITHER have a short protein sequence OR come from one of my favourite genes (as unlikely as that sounds), I can specify the logic for that too:
>>> query.set_logic("A and (B or C)")
Each letter refers to one of the constraints - the codes are assigned in the order you add the constraints. If you want to be absolutely certain about the constraints you mean, you can use the constraint objects themselves:
>>> gene_is_eve = query.add_constraint("Gene.symbol", "=", "eve")
>>> gene_is_zen = query.add_constraint("Gene.symbol", "=", "zen")
>>>
>>> query.set_logic(gene_is_eve | gene_is_zen)
By default the logic is a straight cumulative filter (ie: A and B and C and D and ...)
Putting it all together:
>>> query.add_view("Gene.name", "Gene.length", "Gene.proteins.sequence.length")
>>> query.add_constraint("Gene.organism.name", "=", "D. melanogaster")
>>> query.add_constraint("Gene.proteins.sequence.length", "<", 500)
>>> query.add_constraint("Gene.symbol", "ONE OF", ["eve", "zen", "h"])
>>> query.add_join("Gene.proteins", "OUTER")
>>> query.set_logic("A and (B or C)")
This can be made more concise and readable with a little DSL sugar:
>>> query = service.query("Gene")
>>> query.select("name", "length", "proteins.sequence.length").\
...     where('organism.name', '=', 'D. melanogaster').\
...     where("proteins.sequence.length", "<", 500).\
...     where('symbol', 'ONE OF', ['eve', 'h', 'zen']).\
...     outerjoin('proteins').\
...     set_logic("A and (B or C)")
And the query is defined.
Calling “.rows()” on a query will return an iterator of rows, where each row is a ResultRow object, which can be treated as both a list and a dictionary.
This means you can refer to columns by name:
>>> for row in query.rows():
...     print("name is %s" % (row["name"]))
...     print("length is %d" % (row["length"]))
As well as using list indices:
>>> for row in query.rows():
...     print("The first column is %s" % (row[0]))
Iterating over a row iterates over the cell values as a list:
>>> for row in query.rows():
...     for column in row:
...         do_something(column)
Here each row will have a gene name, a gene length, and a sequence length, eg:
>>> print(row.to_l())
["even skipped", "1359", "376"]
To make that clearer, you can ask for a dictionary instead of a list:
>>> for row in query.rows():
...     print(row.to_d())
{"Gene.name":"even skipped","Gene.length":"1359","Gene.proteins.sequence.length":"376"}
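The dual list/dictionary behaviour described for rows can be sketched with a small class that stores the view paths alongside the cell values. This is an illustration only, not the ResultRow implementation; `RowSketch` is a hypothetical name, and the short-name lookup (accepting `"name"` for `"Gene.name"`) is an assumption based on the examples above.

```python
class RowSketch:
    """Hypothetical row that can be indexed by position, full path, or short name."""

    def __init__(self, views, data):
        self.views = list(views)
        self.data = list(data)

    def _index(self, key):
        if isinstance(key, int):
            return key
        if key in self.views:
            return self.views.index(key)
        # Assumed convenience: allow the path without its root class, e.g. "name".
        root = self.views[0].split(".")[0]
        full_path = root + "." + key
        if full_path in self.views:
            return self.views.index(full_path)
        raise KeyError(key)

    def __getitem__(self, key):
        return self.data[self._index(key)]

    def __iter__(self):
        return iter(self.data)

    def __len__(self):
        return len(self.data)

    def to_l(self):
        """Return the cell values as a plain list."""
        return list(self.data)

    def to_d(self):
        """Return the cell values as a dictionary keyed by view path."""
        return dict(zip(self.views, self.data))


row = RowSketch(
    ["Gene.name", "Gene.length", "Gene.proteins.sequence.length"],
    ["even skipped", 1359, 376],
)
```

With this row, `row[0]`, `row["name"]` and `row["Gene.name"]` all return the same cell, and `list(row)` yields the three cell values in view order.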
If you just want the raw results, for printing to a file, or for piping to another program, you can request the results in one of these formats: ‘json’, ‘rr’, ‘tsv’, ‘jsonobjects’, ‘jsonrows’, ‘list’, ‘dict’, ‘csv’
>>> for row in query.results("<format name>", size=<size>):
...     print(row)
Results can also be processed on a record by record basis. If you have a query that has output columns of “Gene.symbol”, “Gene.pathways.name” and “Gene.proteins.proteinDomains.primaryIdentifier”, then processing it by records will return one object per gene, and that gene will have a property named “pathways” which contains objects which have a name property. Likewise there will be a proteins property which holds a list of proteinDomains which all have a primaryIdentifier property, and so on. This allows a more object orientated approach to database records, familiar to users of other ORMs.
This is the format used when you choose to iterate over a query directly, or can be explicitly chosen by invoking L{intermine.query.Query.results}:
>>> for gene in query:
...     print(gene.name, [pathway.name for pathway in gene.pathways])
The structure of the object and the information it contains depends entirely on the output columns selected. A value may be None because the database holds no value for it, but any valid property of an object (according to the data model) will also be None if it was not selected for output. Attempts to access invalid properties (such as gene.favourite_colour) will cause exceptions to be thrown.
Not that you have to actually write any of this! The webapp will happily generate the code for any query (and template) you can build in it. A good way to get started is to use the webapp to generate your code, and then run it as scripts to speed up your queries. You can always tinker with and edit the scripts you download.
To get generated queries, look for the “python” link at the bottom of query-builder and template form pages. It looks a bit like this:
.  +=====================================+=============
   |                                     |
   |    Perl  |  Python  |  Java [Help]  |
   |                                     |
   +==============================================
-
LEADING_OP_PATTERN
= re.compile('^\\s*(and|or)\\s*', re.IGNORECASE)¶
-
LOGIC_SPLIT_PATTERN
= re.compile('\\s*(?:and|or|\\(|\\))\\s*', re.IGNORECASE)¶
-
ORPHANED_OP_PATTERN
= re.compile('(?:\\(\\s*(?:and|or)\\s*|\\s*(?:and|or)\\s*\\))', re.IGNORECASE)¶
-
SO_SPLIT_PATTERN
= re.compile('\\s*(asc|desc)\\s*', re.IGNORECASE)¶
-
TRAILING_OP_PATTERN
= re.compile('\\s*(and|or)\\s*$', re.IGNORECASE)¶
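The class-level patterns listed above are easier to read with a concrete input. The sketch below copies the patterns from the listing and applies them to a logic string and a sort-order string; how the Query class uses them internally is not documented here, so the splitting and trimming steps are an assumption made purely for illustration.

```python
import re

# Patterns copied from the attribute listing above.
LEADING_OP_PATTERN = re.compile('^\\s*(and|or)\\s*', re.IGNORECASE)
TRAILING_OP_PATTERN = re.compile('\\s*(and|or)\\s*$', re.IGNORECASE)
LOGIC_SPLIT_PATTERN = re.compile('\\s*(?:and|or|\\(|\\))\\s*', re.IGNORECASE)
SO_SPLIT_PATTERN = re.compile('\\s*(asc|desc)\\s*', re.IGNORECASE)

# Splitting on operators and brackets leaves only the constraint codes.
# Empty strings appear between adjacent delimiters, so they are filtered out.
codes = [t for t in LOGIC_SPLIT_PATTERN.split("A and (B or C)") if t]

# Stray operators at either end of an expression can be trimmed off.
trimmed = TRAILING_OP_PATTERN.sub("", LEADING_OP_PATTERN.sub("", "and A or B and "))

# The direction is a capturing group, so split() keeps it in the output.
sort_parts = [t for t in SO_SPLIT_PATTERN.split("Gene.organism.name asc Gene.name desc") if t]

print(codes)
print(trimmed)
print(sort_parts)
```

Only the standard re module is used, so this can be run without a connection to any webservice.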
-
add_constraint
(*args, **kwargs)[source]¶ example:
query.add_constraint("Gene.symbol", "=", "zen")
This method will try to make a constraint from the arguments given, trying each of the classes it knows of in turn to see if they accept the arguments. This allows you to add constraints of different types without having to know or care what their classes or implementation details are. All constraints derive from intermine.constraints.Constraint, and they all have a path attribute, but are otherwise diverse.
Before adding the constraint to the query, this method will also try to check that the constraint is valid by calling Query.verify_constraint_paths()
@see: L{intermine.constraints}
@rtype: L{intermine.constraints.Constraint}
-
add_join
(*args, **kwargs)[source]¶ example:
query.add_join("Gene.proteins", "OUTER")
A join statement determines whether a reference should restrict the result set to only those objects for which the reference exists. For example, if one had a query with the view:
"Gene.name", "Gene.proteins.name"
Then in the normal case (that of an INNER join), we would only get Genes that also have at least one protein that they reference. Simply by asking for this output column you are placing a restriction on the information you get back.
If in fact you wanted all genes, regardless of whether they had proteins associated with them or not, but if they did you would rather like to know _what_ proteins, then you need to specify this reference to be an OUTER join:
query.add_join("Gene.proteins", "OUTER")
Now you will get many more rows of results, some of which will have “null” values where the protein name would have been.
This method will also attempt to validate the join by calling Query.verify_join_paths(). Joins must have a valid path, the style can be either INNER or OUTER (defaults to OUTER, as the user does not need to specify inner joins, since all references start out as inner joins), and the path must be a reference.
@raise ModelError: if the path is invalid
@raise TypeError: if the join style is invalid
@rtype: L{intermine.pathfeatures.Join}
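The difference between the two join styles can be seen without a server. The following is a minimal sketch over made-up data (the gene and protein names are hypothetical, and the rows() helper is not part of the client library); it only mimics the row-level effect described above.

```python
# Toy data: one gene with two proteins, one gene with none.
genes = [
    {"name": "even skipped", "proteins": [{"name": "EVE_A"}, {"name": "EVE_B"}]},
    {"name": "orphan gene", "proteins": []},
]


def rows(genes, style="INNER"):
    """Flatten genes into (gene name, protein name) rows for a join style."""
    out = []
    for gene in genes:
        if gene["proteins"]:
            out.extend((gene["name"], p["name"]) for p in gene["proteins"])
        elif style == "OUTER":
            # No protein to report, but the gene is kept with a null cell.
            out.append((gene["name"], None))
    return out


inner_rows = rows(genes, "INNER")
outer_rows = rows(genes, "OUTER")
print(inner_rows)
print(outer_rows)
```

With the INNER style the gene without proteins disappears from the output; with OUTER it is kept and its protein column is None.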
-
add_path_description
(*args, **kwargs)[source]¶ example:
query.add_path_description("Gene.proteins.proteinDomains", "Protein Domain")
This allows you to alias the components of long paths to improve the way they display column headers in a variety of circumstances. In the above example, if the view included the unwieldy path “Gene.proteins.proteinDomains.primaryIdentifier”, it would (depending on the mine) be displayed as “Protein Domain > DB Identifier”. These settings are taken into account by the webservice when generating column headers for flat-file results with the columnheaders parameter given, and always supplied when requesting jsontable results.
@rtype: L{intermine.pathfeatures.PathDescription}
-
add_sort_order
(path, direction='asc')[source]¶ example:
Query.add_sort_order("Gene.name", "DESC")
This method adds a sort order to the query. A query can have multiple sort orders, which are assessed in sequence.
If a query has two sort-orders, for example, the first being “Gene.organism.name asc”, and the second being “Gene.name desc”, you would have the list of genes grouped by organism, with the lists within those groupings in reverse alphabetical order by gene name.
This method will try to validate the sort order by calling validate_sort_order()
Also available as Query.order_by
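The effect of several sort orders assessed in sequence can be reproduced locally. This is a sketch over invented rows, not a description of how the server sorts; it relies only on the fact that Python's sort is stable.

```python
# Hypothetical rows keyed by view path.
rows = [
    {"Gene.organism.name": "D. melanogaster", "Gene.name": "even skipped"},
    {"Gene.organism.name": "D. simulans", "Gene.name": "hairy"},
    {"Gene.organism.name": "D. melanogaster", "Gene.name": "zerknullt"},
    {"Gene.organism.name": "D. simulans", "Gene.name": "bicoid"},
]

sort_orders = [("Gene.organism.name", "asc"), ("Gene.name", "desc")]

# Because the sort is stable, applying the orders from last to first gives
# the same result as comparing on each sort order in sequence.
ordered = list(rows)
for path, direction in reversed(sort_orders):
    ordered.sort(key=lambda r: r[path], reverse=(direction.lower() == "desc"))

names = [r["Gene.name"] for r in ordered]
print(names)
```

The genes end up grouped by organism, and within each organism they run in reverse alphabetical order by name.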
-
add_view
(*paths)[source]¶ example:
query.add_view("Gene.name Gene.organism.name")
This is the main method for adding views to the list of output columns. As well as appending views, it will also split a single, space or comma delimited string into multiple paths, and flatten out lists, or any combination. It will also immediately try to validate the views.
Output columns must be valid paths according to the data model, and they must represent attributes of tables
- Also available as:
- add_views
- add_column
- add_columns
- add_to_select
@see: intermine.model.Model @see: intermine.model.Path @see: intermine.model.Attribute
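The argument handling described for add_view (splitting delimited strings and flattening lists) might look roughly like the sketch below. flatten_paths is a hypothetical helper written for this example; it is not the library's implementation and it performs no validation against the data model.

```python
import re


def flatten_paths(*paths):
    """Hypothetical helper: expand strings and nested lists into a flat list."""
    flat = []
    for item in paths:
        if isinstance(item, str):
            # One string may hold several paths separated by spaces or commas.
            flat.extend(p for p in re.split(r"[,\s]+", item) if p)
        else:
            # Lists and tuples are flattened recursively.
            flat.extend(flatten_paths(*item))
    return flat


views = flatten_paths(
    "Gene.name Gene.organism.name",
    ["Gene.length", "Gene.symbol,Gene.id"],
)
print(views)
```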
-
children
()[source]¶ This method is used during the serialisation of queries to xml. It is unlikely you will need access to this as a whole. Consider using “path_descriptions”, “joins”, “constraints” instead
@see: Query.path_descriptions @see: Query.joins @see: Query.constraints
@return: the child element of this query @rtype: list
-
clone
()[source]¶ This method will produce a clone that is independent, and can be altered without affecting the original, but starts off with the exact same state as it.
The only shared elements should be the model and the service, which are shared by all queries that refer to the same webservice.
@return: same class as caller
-
coded_constraints
¶ Query.coded_constraints S{->} list(intermine.constraints.CodedConstraint)
This returns an up to date list of the constraints that can be used in a logic expression. The only kind of constraint that this excludes, at present, is SubClassConstraints
@rtype: list(L{intermine.constraints.CodedConstraint})
-
constraints
¶ Query.constraints S{->} list(intermine.constraints.Constraint)
Constraints are returned in the order of their code (normally the order they were added to the query) and with any subclass constraints at the end.
@rtype: list(Constraint)
-
count
()[source]¶ Obtain the number of rows a particular query will return, without having to fetch and parse all the actual data. This method makes a request to the server to report the count for the query, and is sugar for a results call.
Also available as Query.size
@rtype: int @raise WebserviceError: if the request is unsuccessful.
-
first
(row='jsonobjects', start=0, **kw)[source]¶ Return the first result, or None if the results are empty
-
classmethod
from_xml
(xml, *args, **kwargs)[source]¶ This method is used to instantiate serialised queries. It is used by intermine.webservice.Service objects to instantiate Template objects and it can be used to read in queries you have saved to a file.
@param xml: The xml as a file name, url, or string
@raise QueryParseError: if the query cannot be parsed @raise ModelError: if the query has illegal paths in it @raise ConstraintError: if the constraints don’t make sense
@rtype: L{Query}
-
get_constraint
(code)[source]¶ Returns the constraint with the given code, if it exists. If no such constraint exists, it throws a ConstraintError.
@return: the constraint corresponding to the given code @rtype: L{intermine.constraints.CodedConstraint}
-
get_default_sort_order
()[source]¶ This method is called to determine the sort order if none is specified
@raise QueryError: if the view is empty
@rtype: L{intermine.pathfeatures.SortOrderList}
-
get_list_append_uri
()[source]¶ Query.get_list_append_uri() -> str
This method is used internally when performing list operations on queries.
@rtype: str
-
get_list_upload_uri
()[source]¶ Query.get_list_upload_uri() -> str
This method is used internally when performing list operations on queries.
@rtype: str
-
get_logic
()[source]¶ This returns the up to date logic expression. The default value is the representation of all coded constraints and’ed together.
If the logic is empty and there are no constraints, returns an empty string.
The LogicGroup object stringifies to a string that can be parsed to obtain itself (eg: “A and (B or C or D)”).
@rtype: L{intermine.constraints.LogicGroup}
-
get_results_list
(*args, **kwargs)[source]¶ This method is a shortcut so that you do not have to do a list comprehension yourself on the iterator that is normally returned. If you have a very large result set (and these can get up to hundreds of thousands of rows pretty easily) you will not want to have the whole list in memory at once, but there may be other circumstances when you might want to keep the whole list in one place.
It takes all the same arguments and parameters as Query.results
Also available as Query.all
@see: L{intermine.query.Query.results}
-
get_results_path
()[source]¶ Query.get_results_path() -> str
Internally, this just calls a constant property in intermine.service.Service
@rtype: str
-
get_sort_order
()[source]¶ This method returns the sort order if set, otherwise it returns the default sort order
@raise QueryError: if the view is empty
@rtype: L{intermine.pathfeatures.SortOrderList}
-
get_subclass_dict
()[source]¶ This method returns a mapping of classes used by the model for assessing whether certain paths are valid. For instance, if you subclass MicroArrayResult to be FlyAtlasResult, you can refer to the .presentCall attributes of fly atlas results. MicroArrayResults do not have this attribute, and a path such as:
Gene.microArrayResult.presentCall
would be marked as invalid unless the dictionary is provided.
Users most likely will not need to ever call this method.
@rtype: dict(string, string)
-
make_list_constraint
(path, op)[source]¶ Implementation of trait that allows use of these objects in list constraints
-
results
(row='object', start=0, size=None, summary_path=None)[source]¶ Usage:
>>> query = service.model.Gene.select("symbol", "length")
>>> total = 0
>>> for gene in query.results():
...     print(gene.symbol)                # handle strings
...     total += gene.length              # handle numbers
>>> for row in query.results(row="rr"):
...     print(row["symbol"])              # handle strings by dict index
...     total += row["length"]            # handle numbers by dict index
...     print(row["Gene.symbol"])         # handle strings by full dict index
...     total += row["Gene.length"]       # handle numbers by full dict index
...     print(row[0])                     # handle strings by list index
...     total += row[1]                   # handle numbers by list index
>>> for d in query.results(row="dict"):
...     print(d["Gene.symbol"])           # handle strings
...     total += d["Gene.length"]         # handle numbers
>>> for l in query.results(row="list"):
...     print(l[0])                       # handle strings
...     total += l[1]                     # handle numbers
>>> import csv
>>> csv_reader = csv.reader(query.results(row="csv"), delimiter=",", quotechar='"')
>>> for row in csv_reader:
...     print(row[0])                     # handle strings
...     total += int(row[1])              # handle numbers
>>> tsv_reader = csv.reader(query.results(row="tsv"), delimiter="\t")
>>> for row in tsv_reader:
...     print(row[0])                     # handle strings
...     total += int(row[1])              # handle numbers
This is the general method that allows access to any of the available result formats. The example above shows the ways these differ in terms of accessing fields of the rows, as well as dealing with different data types. Results can either be retrieved as typed values (jsonobjects, rr [‘ResultRows’], dict, list), or as lists of strings (csv, tsv) which then require further parsing. The default format for this method is “object”, where information is grouped by its relationships. The other main format is “rr”, which stands for ‘ResultRows’, and can be accessed directly through the L{rows} method.
Note that when requesting object based results (the default), if your query contains any kind of collection, it is highly likely that start and size won’t do what you think, as they operate only on the underlying rows used to build up the returned objects. If you want rows back, you are recommended to use the simpler rows method.
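The caveat about start and size can be illustrated with plain data. The sketch below assumes, purely for illustration, that objects are built by grouping consecutive rows for the same gene; the rows and pathway names are invented and to_objects is not a library function.

```python
from itertools import groupby

# Rows as a flat result set: one row per (gene, pathway) pair.
rows = [
    ("eve", "pathway 1"), ("eve", "pathway 2"), ("eve", "pathway 3"),
    ("zen", "pathway 1"), ("zen", "pathway 4"),
]


def to_objects(rows):
    """Group consecutive rows for the same gene into one object-like dict."""
    return [
        {"symbol": symbol, "pathways": [pathway for _, pathway in group]}
        for symbol, group in groupby(rows, key=lambda r: r[0])
    ]


page = rows[0:4]             # start=0, size=4 is applied to the rows ...
objects = to_objects(page)   # ... so only two objects come back
print(len(objects), objects[1]["pathways"])
```

Asking for four results yields two objects rather than four, and the second object is missing one of its pathways because the page boundary cut through its rows.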
If no views have been specified, all attributes of the root class are selected for output.
@param row: The format for each result. One of “object”, “rr”, “dict”, “list”, “tsv”, “csv”, “jsonrows”, “jsonobjects”
@type row: string
@param start: the index of the first result to return (default = 0)
@type start: int
@param size: The maximum number of results to return (default = all)
@type size: int
@param summary_path: A column name to optionally summarise. Specifying a path will force “jsonrows” format, and return an iterator over a list of dictionaries. Use this when you are interested in processing a summary in order of greatest count to smallest.
@type summary_path: str or L{intermine.model.Path}
@rtype: L{intermine.webservice.ResultIterator}
@raise WebserviceError: if the request is unsuccessful
-
rows
(start=0, size=None)[source]¶ This is a shortcut for results(“rr”)
Usage:
>>> for row in query.rows(start=10, size=10):
...     print(row["proteins.name"])
@param start: the index of the first result to return (default = 0)
@type start: int
@param size: The maximum number of results to return (default = all)
@type size: int
@rtype: iterable<intermine.webservice.ResultRow>
-
select
(*paths)[source]¶ example:
query.select("*", "proteins.name")
This method is intended to provide an API familiar to those with experience of SQL or other ORM layers. This method, in contrast to other view manipulation methods, replaces the selection of output columns, rather than appending to it.
Note that any sort orders that are no longer in the view will be removed.
@param paths: The output columns to add
-
set_logic
(value)[source]¶ example:
Query.set_logic("A and (B or C)")
This sets the logic to the appropriate value. If the value is already a LogicGroup, it is accepted, otherwise the string is tokenised and parsed.
The logic is then validated with a call to validate_logic()
@raise LogicParseError: if there is a syntax error in the logic
-
summarise
(summary_path, **kwargs)[source]¶ Usage:
>>> query = service.select("Gene.*", "organism.*").where("Gene", "IN", "my-list")
>>> print(query.summarise("length")["average"])
12345.67890
>>> print(query.summarise("organism.name")["Drosophila simulans"])
98
This method allows you to get statistics summarising the information from just one column of a query. For numerical columns you get a dictionary with four keys (‘average’, ‘stdev’, ‘max’, ‘min’), and for non-numerical columns you get a dictionary where each item is a key and the values are the number of occurrences of this value in the column.
Any key word arguments will be passed to the underlying results call - so you can limit the result size to the top 100 items by passing “size = 100” as part of the call.
@see: L{intermine.query.Query.results}
@param summary_path: The column to summarise (either in long or short form) @type summary_path: str or L{intermine.model.Path}
@rtype: dict
This method is sugar for particular combinations of calls to L{results}.
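The two shapes of summary dictionary can be mimicked locally. In the library the summary comes from the webservice; summarise_column below is a hypothetical stand-in, and the use of the population standard deviation is an assumption, since the text does not say which variant the server reports.

```python
from collections import Counter
from statistics import mean, pstdev


def summarise_column(values):
    """Hypothetical local stand-in for the summary the webservice returns."""
    if all(isinstance(v, (int, float)) for v in values):
        # Numeric column: four fixed keys.
        return {
            "average": mean(values),
            "stdev": pstdev(values),  # assumption: population stdev
            "max": max(values),
            "min": min(values),
        }
    # Non-numeric column: each distinct item maps to its occurrence count.
    return dict(Counter(values))


lengths = summarise_column([1000, 2000, 3000])
organisms = summarise_column(
    ["Drosophila simulans", "Drosophila melanogaster", "Drosophila simulans"]
)
print(lengths["average"], organisms["Drosophila simulans"])
```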
-
to_Node
()[source]¶ This is an intermediate step in the creation of the xml serialised version of the query. You probably won’t need to call this directly.
@rtype: xml.minidom.Node
-
to_formatted_xml
()[source]¶ This method serialises the current state of the query to an xml string, suitable for storing, or sending over the internet to the webservice, only more readably.
@return: the serialised xml string @rtype: string
-
to_query_params
()[source]¶ The query is responsible for producing its own query parameters. These consist simply of:
- query: the xml representation of the query
@rtype: dict
-
to_xml
()[source]¶ This method serialises the current state of the query to an xml string, suitable for storing, or sending over the internet to the webservice.
@return: the serialised xml string @rtype: string
-
validate_logic
(logic=None)[source]¶ Attempts to validate the logic by checking that every coded_constraint is included at least once
@raise QueryError: if not every coded constraint is represented
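The check described here, together with the default logic of all codes and'ed together, can be sketched as follows. The helper names are hypothetical, and whereas the real method raises QueryError, this sketch simply returns the codes that are missing.

```python
import re

# Same expression as the LOGIC_SPLIT_PATTERN attribute of the Query class.
LOGIC_SPLIT_PATTERN = re.compile('\\s*(?:and|or|\\(|\\))\\s*', re.IGNORECASE)


def default_logic(codes):
    """All coded constraints and'ed together, e.g. 'A and B and C'."""
    return " and ".join(codes)


def missing_codes(logic, codes):
    """Return the constraint codes that never appear in the logic string."""
    used = {t for t in LOGIC_SPLIT_PATTERN.split(logic) if t}
    return sorted(set(codes) - used)


codes = ["A", "B", "C"]
print(default_logic(codes))
print(missing_codes("A and (B or C)", codes))
print(missing_codes("A or B", codes))
```

A logic string that leaves out a coded constraint, such as "A or B" for three constraints, is the situation the validation is meant to catch.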
-
validate_sort_order
(*so_elems)[source]¶ - Checks that the sort order paths are:
- valid paths
- in the view
@raise QueryError: if the sort order is not in the view @raise ModelError: if the path is invalid
-
verify
()[source]¶ Invalid queries will fail to run, and it is not always obvious why. The validation routine checks to see that the query will not cause errors on execution, and tries to provide informative error messages.
This method is called immediately after a query is fully deserialised.
@raise ModelError: if the paths are invalid @raise QueryError: if there are errors in query construction @raise ConstraintError: if there are errors in constraint construction
-
verify_constraint_paths
(cons=None)[source]¶ This method will check the path attribute of each constraint. In addition it will:
- Check that BinaryConstraints and MultiConstraints have an Attribute as their path
- Check that TernaryConstraints have a Reference as theirs
- Check that SubClassConstraints have a correct subclass relationship
- Check that LoopConstraints have a valid loopPath, of a compatible type
- Check that ListConstraints refer to an object
- Don’t even try to check RangeConstraints: these have variable semantics
@param cons: The constraints to check (defaults to all constraints on the query)
@raise ModelError: if the paths are not valid @raise ConstraintError: if the constraints do not satisfy the above rules
-
verify_join_paths
(joins=None)[source]¶ Joins must have valid paths, and they must refer to references.
@raise ModelError: if the paths are invalid @raise QueryError: if the paths are not references
-
verify_pd_paths
(pds=None)[source]¶ Checks for consistency with the data model
@raise ModelError: if the paths are invalid
-
exception
intermine.query.
QueryParseError
(message, cause=None)[source]¶ Bases:
intermine.query.QueryError
-
class
intermine.query.
Template
(*args, **kwargs)[source]¶ Bases:
intermine.query.Query
Templates are ways of saving queries and allowing others to run them simply. They are the main interface to querying in the webapp
example:
service = Service("http://www.flymine.org/query/service")
template = service.get_template("Gene_Pathways")
for row in template.results(A={"value":"eve"}):
    process_row(row)
...
A template is a subclass of query that comes predefined. They are typically retrieved from the webservice and run by specifying the values for their existing constraints. They are a concise and powerful way of running queries in the webapp.
Being subclasses of query, everything is true of them that is true of a query. They are just less work, as you don’t have to design each one. Also, you can store your own templates in the web-app, and then access them as a private webservice method, from anywhere, making them a kind of query in the cloud - for this you will need to authenticate by providing log in details to the service.
The most significant difference is how constraint values are specified for each set of results.
@see: L{Template.results}
-
count
(**con_values)[source]¶ Obtain the number of rows a particular query will return, without having to fetch and parse all the actual data. This method makes a request to the server to report the count for the query, and is sugar for a results call.
@rtype: int @raise WebserviceError: if the request is unsuccessful.
-
editable_constraints
¶ Template.editable_constraints -> list(intermine.constraints.Constraint)
Templates have a concept of editable constraints, which is a way of hiding complexity from users. An underlying query may have five constraints, but only expose the one that is actually interesting. This property returns this subset of constraints that have the editable flag set to true.
-
get_adjusted_template
(con_values)[source]¶ Template.get_adjusted_template(con_values) S{->} Template
When templates are run, they are first cloned, and their values are changed to those desired. This leaves the original template unchanged so it can be run again with different values. This method does the cloning and changing of constraint values
@raise ConstraintError: if the constraint values specify values for a non-editable constraint.
@rtype: L{Template}
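The clone-then-modify behaviour can be sketched with toy objects. ToyConstraint and adjusted are hypothetical names used only for this illustration, and ValueError stands in for the ConstraintError that the real method raises.

```python
import copy


class ToyConstraint:
    """Hypothetical stand-in for a coded template constraint."""

    def __init__(self, code, op, value, editable):
        self.code, self.op, self.value, self.editable = code, op, value, editable


def adjusted(constraints, con_values):
    """Return a modified deep copy, leaving the original constraints untouched."""
    clone = {c.code: c for c in copy.deepcopy(constraints)}
    for code, changes in con_values.items():
        if not clone[code].editable:
            # The real method raises ConstraintError here.
            raise ValueError("constraint %s is not editable" % code)
        for attr, new_value in changes.items():
            setattr(clone[code], attr, new_value)
    return list(clone.values())


original = [
    ToyConstraint("A", "=", "zen", True),
    ToyConstraint("B", "=", "D. melanogaster", False),
]
changed = adjusted(original, {"A": {"value": "eve"}})
print(original[0].value, changed[0].value)
```

Because the copy is deep, the original template can be run again later with different values.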
-
get_results_list
(row='object', start=0, size=None, **con_values)[source]¶ This method performs the same as the method of the same name in Query, and it shares the semantics of Template.results().
@see: L{intermine.query.Query.get_results_list} @see: L{intermine.query.Template.results}
@rtype: list
-
get_results_path
()[source]¶ Template.get_results_path() S{->} str
Internally, this just calls a constant property in intermine.service.Service
This overrides the method of the same name in Query
@return: the path to the REST resource @rtype: string
-
get_row_list
(start=0, size=None, **con_values)[source]¶ Return a list of the rows returned by this query
-
results
(row='object', start=0, size=None, **con_values)[source]¶ This method returns the same values with the same options as the method of the same name in Query (see intermine.query.Query). The main difference is in the arguments.
The template result methods also accept a key-word pair set of arguments that are used to supply values to the editable constraints. eg:
template.results( A = {"value": "eve"}, B = {"op": ">", "value": 5000} )
The keys should be codes for editable constraints (you can inspect these with Template.editable_constraints) and the values should be a dictionary of constraint properties to replace. You can replace the values for “op” (operator) and “value”, as well as “extra_value” and “values” in the case of ternary and multi constraints respectively.
@rtype: L{intermine.webservice.ResultIterator}
-
rows
(start=0, size=None, **con_values)[source]¶ Get an iterator over the rows returned by this query
-
to_query_params
()[source]¶ Template.to_query_params() -> dict(string, string)
Overrides the method of the same name in query to provide the parameters needed by the templates results service. These are slightly more complex:
name: The template’s name
- for each constraint: (where [i] is an integer incremented for each constraint)
- constraint[i]: the path
- op[i]: the operator
- value[i]: the value
- code[i]: the code
- extra[i]: the extra value for ternary constraints (optional)
@rtype: dict
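The parameter layout listed above could be assembled along these lines. This is a sketch only: template_params is a hypothetical function, the constraint dictionaries are invented, and numbering from 1 is an assumption, since the text says only that [i] is incremented for each constraint.

```python
def template_params(name, constraints):
    """Hypothetical sketch: build the flat parameter dict for a template request."""
    params = {"name": name}
    # Assumption: numbering starts at 1.
    for i, con in enumerate(constraints, start=1):
        params["constraint%d" % i] = con["path"]
        params["op%d" % i] = con["op"]
        params["value%d" % i] = con["value"]
        params["code%d" % i] = con["code"]
        if "extra" in con:
            # Only ternary constraints carry an extra value.
            params["extra%d" % i] = con["extra"]
    return params


params = template_params("Gene_Pathways", [
    {"path": "Gene", "op": "LOOKUP", "value": "eve", "code": "A",
     "extra": "D. melanogaster"},
    {"path": "Gene.length", "op": ">", "value": "5000", "code": "B"},
])
print(sorted(params))
```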
-
intermine.registry module¶
-
intermine.registry.
getData
(mine)[source]¶ example:
>>> from intermine import registry
>>> registry.getData('flymine')
Name: Affymetrix array: Drosophila1
Name: Affymetrix array: Drosophila2
Name: Affymetrix array: GeneChip Drosophila Genome 2.0 Array
Name: Affymetrix array: GeneChip Drosophila Genome Array
Name: Anoph-Expr data set
Name: BDGP cDNA clone data set.....
-
intermine.registry.
getInfo
(mine)[source]¶ example:
>>> from intermine import registry
>>> registry.getInfo('flymine')
Description: An integrated database for Drosophila genomics
URL: http://www.flymine.org/flymine
API Version: 25
Release Version: 45.1 2017 August
InterMine Version: 1.8.5
Organisms: D. melanogaster
Neighbours: MODs
intermine.results module¶
-
class
intermine.results.
EnrichmentLine
(*args, **kwargs)[source]¶ Bases:
collections.UserDict
These objects operate as dictionaries as well as objects with predefined properties.
-
class
intermine.results.
FlatFileIterator
(connection, parser)[source]¶ Bases:
object
This iterator can be used as the sub iterator in a ResultIterator
-
class
intermine.results.
InterMineURLOpener
(credentials=None, token=None)[source]¶ Bases:
object
Provides user agent and authentication headers, and handling of errors
-
JSON
= 'application/json'¶
-
PLAIN_TEXT
= 'text/plain'¶
-
USER_AGENT
= "InterMine-Client-1.11.0/python-sys.version_info(major=3, minor=5, micro=2, releaselevel='final', serial=0)"¶
-
http_error_400
(url, fp, errcode, errmsg, headers, data=None)[source]¶ 400 errors indicate that something about our request was incorrect
@raise WebserviceError: in all circumstances
-
http_error_401
(url, fp, errcode, errmsg, headers, data=None)[source]¶ 401 errors indicate we don’t have sufficient permission for the resource we requested - usually a list or a template
@raise WebserviceError: in all circumstances
-
http_error_403
(url, fp, errcode, errmsg, headers, data=None)[source]¶ 403 errors indicate we don’t have sufficient permission for the resource we requested - usually a list or a template
@raise WebserviceError: in all circumstances
-
http_error_404
(url, fp, errcode, errmsg, headers, data=None)[source]¶ 404 errors indicate that the requested resource does not exist - usually a template that is no longer available.
@raise WebserviceError: in all circumstances
-
http_error_500
(url, fp, errcode, errmsg, headers, data=None)[source]¶ 500 errors indicate that the server borked during the request - ie: it wasn’t our fault.
@raise WebserviceError: in all circumstances
-
-
class
intermine.results.
JSONIterator
(connection, parser)[source]¶ Bases:
object
This iterator can be used as the sub iterator in a ResultIterator
-
LOG
= <logging.Logger object>¶
-
check_return_status
()[source]¶ The footer contains information as to whether the result set was successfully transferred in its entirety. This method makes sure we don’t silently accept an incomplete result set.
@raise WebserviceError: if the footer indicates there was an error
-
-
class
intermine.results.
ResultIterator
(service, path, params, rowformat, view, cld=None)[source]¶ Bases:
object
These objects handle the iteration over results in the formats requested by the user. They are responsible for generating an appropriate parser, connecting the parser to the results, and delegating iteration appropriately.
-
JSON_FORMATS
= frozenset({'jsonobjects', 'jsonrows', 'json'})¶
-
PARSED_FORMATS
= frozenset({'rr', 'dict', 'list'})¶
-
ROW_FORMATS
= frozenset({'dict', 'jsonrows', 'tsv', 'rr', 'count', 'csv', 'jsonobjects', 'json', 'list'})¶
-
STRING_FORMATS
= frozenset({'count', 'csv', 'tsv'})¶
-
-
class
intermine.results.
ResultObject
(data, cld, view=[])[source]¶ Bases:
object
These objects are backed by a row of data and the class descriptor that describes the object. They allow access in standard object style:
>>> for gene in query.results():
...     print(gene.symbol)
...     print([pathway.name for pathway in gene.pathways])
All objects will have “id” and “type” properties. The type refers to the actual type of this object: if it is a subclass of the one requested, the subclass name will be returned. The “id” refers to the internal database id of the object, and is a guarantor of object identity.
-
id
¶ Return the internal DB identifier of this object. Or None if this is not an InterMine object
-
-
class
intermine.results.
ResultRow
(data, views)[source]¶ Bases:
object
ResultRows provide access to the fields of the row through index lookup. However, for convenience both list indexes and dictionary keys can be used. So the following all work:
>>> # Assuming the view is "Gene.symbol", "Gene.organism.name":
>>> row[0] == row["symbol"] == row["Gene.symbol"] == row(0) == row("symbol")
True
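To show how a single object can answer to all of these access styles, here is a minimal sketch. ToyRow is a hypothetical class written for this example and makes no claim about how ResultRow is actually implemented.

```python
class ToyRow:
    """Hypothetical row supporting list-style, dict-style and call-style access."""

    def __init__(self, data, views):
        self.data, self.views = list(data), list(views)
        root = views[0].split(".")[0]  # e.g. "Gene"
        self.index = {}
        for i, view in enumerate(views):
            self.index[view] = i                  # full path
            self.index[view[len(root) + 1:]] = i  # path without the root class

    def __getitem__(self, key):
        if isinstance(key, int):
            return self.data[key]
        return self.data[self.index[key]]

    # row(0) and row("symbol") behave like row[0] and row["symbol"].
    __call__ = __getitem__

    def __iter__(self):
        return iter(self.data)


row = ToyRow(["eve", "D. melanogaster"], ["Gene.symbol", "Gene.organism.name"])
print(row[0] == row["symbol"] == row["Gene.symbol"] == row(0) == row("symbol"))
```

Iterating over the row yields the cell values in view order, matching the list-like behaviour described for rows elsewhere in this documentation.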
-
class
intermine.results.
TableResultRow
(data, views)[source]¶ Bases:
intermine.results.ResultRow
A class for parsing results from the jsonrows data format.
intermine.util module¶
intermine.webservice module¶
-
class
intermine.webservice.
Registry
(registry_url='http://www.intermine.org/registry')[source]¶ Bases:
collections.abc.MutableMapping
Registries are web-services that mines can automatically register themselves with, and thus enable service discovery by clients.
example:
from intermine.webservice import Registry

# Connect to the default registry service
# at www.intermine.org/registry
registry = Registry()

# Find all the available mines:
for name, mine in registry.items():
    print(name, mine.version)

# Dict-like interface for accessing mines.
flymine = registry["flymine"]

# The mine object is a Service
for gene in flymine.select("Gene.*").results():
    process(gene)
This class is meant to aid with interoperation between mines by allowing them to discover one-another, and allow users to always have correct connection information.
-
MINES_PATH
= '/mines.json'¶
-
-
class
intermine.webservice.
Service
(root, username=None, password=None, token=None, prefetch_depth=1, prefetch_id_only=False)[source]¶ Bases:
object
The intermine.webservice.Service class is the main interface for the user. It will provide access to queries and templates, as well as doing the background task of fetching the data model, and actually requesting the query results.
example:
from intermine.webservice import Service
service = Service("http://www.flymine.org/query/service")

template = service.get_template("Gene_Pathways")
for row in template.results(A={"value":"zen"}):
    do_something_with(row)
...

query = service.new_query()
query.add_view("Gene.symbol", "Gene.pathway.name")
query.add_constraint("Gene", "LOOKUP", "zen")
for row in query.results():
    do_something_with(row)
...

new_list = service.create_list("some/file/with.ids", "Gene")
list_on_server = service.get_list("On server")
in_both = new_list & list_on_server
in_both.name = "Intersection of these lists"
for row in in_both:
    do_something_with(row)
...
- The three methods the user will be most concerned with are:
- L{Service.new_query}: constructs a new query to query a service with
- L{Service.get_template}: gets a template from the service
- L{ListManager.create_list}: creates a new list on the service
For list management information, see L{ListManager}.
X{Query} is the term for an arbitrarily complex structured request for data from the webservice. The user is responsible for specifying the structure that determines what records are returned, and what information about each record is provided.
X{Template} is the term for a predefined “Query”, ie: one that has been written and saved on the webservice you will access. The definition of the query is already done, but the user may want to specify the values of the constraints that exist on the template. Templates are accessed by name, and while you can easily introspect templates, it is assumed you know what they do when you use them
X{List} is a saved result set containing a set of objects previously identified in the database. Lists can be created and managed using this client library.
@see: L{intermine.query}
-
IDS_PATH
= '/ids'¶
-
LIST_APPENDING_PATH
= '/lists/append'¶
-
LIST_CREATION_PATH
= '/lists'¶
-
LIST_ENRICHMENT_PATH
= '/list/enrichment'¶
-
LIST_MANAGER_METHODS
= frozenset({'get_all_lists', 'create_list', 'get_list_count', 'get_all_list_names', 'delete_lists', 'l', 'get_list'})¶
-
LIST_PATH
= '/lists'¶
-
LIST_RENAME_PATH
= '/lists/rename'¶
-
LIST_TAG_PATH
= '/list/tags'¶
-
MODEL_PATH
= '/model'¶
-
QUERY_LIST_APPEND_PATH
= '/query/append/tolist'¶
-
QUERY_LIST_UPLOAD_PATH
= '/query/tolist'¶
-
QUERY_PATH
= '/query/results'¶
-
RELEASE_PATH
= '/version/release'¶
-
SAVEDQUERY_PATH
= '/savedqueries/xml'¶
-
SCHEME
= 'http://'¶
-
SEARCH_PATH
= '/search'¶
-
SERVICE_RESOLUTION_PATH
= '/check/'¶
-
TEMPLATEQUERY_PATH
= '/template/results'¶
-
TEMPLATES_PATH
= '/templates'¶
-
USERS_PATH
= '/users'¶
-
VERSION_PATH
= '/version/ws'¶
-
WIDGETS_PATH
= '/widgets'¶
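The constants above are resource paths relative to the service root. As a minimal sketch of how such a path could be combined with a root URL, assuming plain string concatenation (the endpoint_url helper is hypothetical and not part of the library; only the constant values are taken from this page):

```python
# Hypothetical helper: joins a service root and one of the resource paths
# listed above. The joining rule (strip any trailing slash from the root,
# then concatenate) is an assumption, not the library's documented behaviour.
QUERY_PATH = "/query/results"
MODEL_PATH = "/model"


def endpoint_url(root, path):
    """Return the full URL for a resource path under the given service root."""
    return root.rstrip("/") + path


root = "http://www.flymine.org/query/service"
query_url = endpoint_url(root, QUERY_PATH)
# A trailing slash on the root does not produce a double slash.
model_url = endpoint_url(root + "/", MODEL_PATH)
```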
-
deregister
(deregistration_token)[source]¶ @param deregistration_token A token to prove you really want to do this
@return string All the user’s data.
-
get_results
(path, params, rowformat, view, cld=None)[source]¶ This method is called internally by the query objects when they are asked for results. You will not normally need to call it directly.
@param path: The resource path (eg: “/query/results”) @type path: string
@param params: The query parameters for this request as a dictionary @type params: dict
@param rowformat: One of “rr”, “object”, “count”, “dict”, “list”, “tsv”, “csv”, “jsonrows”, “jsonobjects” @type rowformat: string
@param view: The output columns @type view: list
@raise WebserviceError: for failed requests
@return: L{intermine.webservice.ResultIterator}
-
get_template
(name)[source]¶ Tries to retrieve a template of the given name from the webservice. If you are trying to fetch a private template (i.e. one you made yourself that is not available to others) then you may need to authenticate.
@see: L{intermine.webservice.Service.__init__}
@param name: the template’s name @type name: string
@raise ServiceError: if the template does not exist
@raise QueryParseError: if the template cannot be parsed
@return: L{intermine.query.Template}
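A minimal sketch of fetching a template and supplying constraint values in one step. The run_template helper is hypothetical; it relies only on get_template and on the template.results(A={"value": ...}) call pattern shown in the module example, and it assumes the whole result set fits comfortably in memory:

```python
def run_template(service, name, **constraint_values):
    """Fetch a template by name and return its result rows as a list.

    `service` is expected to be an intermine.webservice.Service, or any
    object with a compatible get_template method. Each keyword argument is
    a constraint code mapped to a dict of overrides, e.g. A={"value": "zen"}.
    """
    template = service.get_template(name)
    # results() yields rows lazily; list() collects them all at once.
    return list(template.results(**constraint_values))
```

For example, run_template(service, "Gene_Pathways", A={"value": "zen"}) would collect the rows that the loop in the module example iterates over.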
-
list_manager
()[source]¶ This method is primarily useful as a context manager when creating temporary lists, since on context exit all temporary lists will be cleaned up:
with service.list_manager() as manager:
    temp_a = manager.create_list(file_a, "Gene")
    temp_b = manager.create_list(file_b, "Gene")
    for gene in (temp_a & temp_b):
        print(gene.primaryIdentifier, "is in both")
@rtype: ListManager
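Because temporary lists are cleaned up when the context exits, anything needed from them has to be read inside the with block. A minimal sketch of that pattern follows; the identifiers_in_both helper and its list_type default are hypothetical, and only list_manager, create_list, the & operator and primaryIdentifier come from the example above:

```python
def identifiers_in_both(service, file_a, file_b, list_type="Gene"):
    """Return the primaryIdentifier of every object present in both uploads."""
    with service.list_manager() as manager:
        temp_a = manager.create_list(file_a, list_type)
        temp_b = manager.create_list(file_b, list_type)
        # Build the plain Python list before leaving the block: the
        # temporary lists no longer exist once the context has exited.
        return [item.primaryIdentifier for item in (temp_a & temp_b)]
```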
-
load_query
(xml, root=None)[source]¶ This is the standard method for instantiating new Query objects. Queries require access to the data model, as well as the service itself, so it is easiest to access them through this factory method.
@return: L{intermine.query.Query}
-
model
¶ Service.model S{->} L{intermine.model.Model}
This is used when constructing queries to provide them with information on the structure of the data model they are accessing. You are very unlikely to want to access this object directly.
raises ModelParseError: if the model cannot be read
@rtype: L{intermine.model.Model}
-
new_query
(*columns, **kwargs)¶ As new_query, except that a list of output column expressions is passed instead of a root class.
-
release
¶ Service.release S{->} string
The release is an arbitrary string used to distinguish releases of the data warehouse. It usually changes when the data contained within is updated. Although it is a string, releases usually sort in ascending order of recency (eg: “release-26”, “release-27”, “release-28”). They can also have less machine-readable values (eg: “beta”).
@rtype: string
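Plain string comparison stops matching recency once the number of digits changes (“release-100” sorts before “release-27”). A minimal sketch of a sort key that compares the trailing number instead. The release_sort_key helper is hypothetical, and placing unnumbered releases such as “beta” first is an arbitrary assumption:

```python
import re


def release_sort_key(release):
    """Sort key that orders releases by their trailing number, if any.

    Releases without a trailing number (eg: "beta") sort before numbered
    ones; that placement is an assumption made for this sketch.
    """
    match = re.search(r"(\d+)$", release)
    if match is None:
        return (0, 0, release)
    return (1, int(match.group(1)), release)


releases = ["release-27", "release-9", "beta", "release-100"]
ordered = sorted(releases, key=release_sort_key)
```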
-
resolve_ids
(data_type, identifiers, extra='', case_sensitive=False, wildcards=False)[source]¶ Request that a set of identifiers be resolved to objects in the data store.
@param data_type: The type of these identifiers (eg. ‘Gene’) @type data_type: String
@param identifiers: The ids to resolve (eg. [‘eve’, ‘zen’, ‘pparg’]) @type identifiers: iterable of string
@param extra: A disambiguating value (eg. “Drosophila melanogaster”) @type extra: String
@param case_sensitive: Whether to treat IDs case sensitively. @type case_sensitive: Boolean
@param wildcards: Whether or not to interpret wildcards (eg: “eve*”) @type wildcards: Boolean
@return: {idresolution.Job} The job.
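A minimal sketch of preparing identifiers before submitting them, using only the resolve_ids parameters documented above. The submit_id_resolution helper and its cleaning rules (strip whitespace, drop blanks and duplicates, enable wildcards only when a “*” is present) are assumptions made for illustration, not library behaviour:

```python
def submit_id_resolution(service, raw_identifiers, data_type="Gene", organism=""):
    """Clean a collection of identifiers and submit them for resolution.

    Returns whatever service.resolve_ids returns (an id-resolution job).
    """
    # Strip whitespace, drop blanks, and de-duplicate while keeping order.
    seen = set()
    identifiers = []
    for raw in raw_identifiers:
        ident = raw.strip()
        if ident and ident not in seen:
            seen.add(ident)
            identifiers.append(ident)

    # Only ask the service to interpret wildcards when one is present.
    use_wildcards = any("*" in ident for ident in identifiers)

    return service.resolve_ids(
        data_type,
        identifiers,
        extra=organism,
        case_sensitive=False,
        wildcards=use_wildcards,
    )
```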
-
search
(term, **facets)[source]¶ This search method performs a search of all objects indexed by the service endpoint, returning results and facets for those results.
@param term The search term @param facets The facets to search by (eg: Organism = ‘H. sapiens’)
@return (list, dict) The results, and a dictionary of faceting information.
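Since search returns a pair, the result is normally unpacked into two names. A minimal sketch, where the search_summary helper is hypothetical and the structure of each result and of the facet dictionary is deliberately left unexamined:

```python
def search_summary(service, term, **facets):
    """Run a search and return (number of results, facet dictionary)."""
    # search() returns a (list, dict) pair: the hits and the facet data.
    results, facet_info = service.search(term, **facets)
    return len(results), facet_info
```

Facets are passed as keyword arguments, for instance search_summary(service, "zen", Organism="H. sapiens").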
-
select
(*columns, **kwargs)[source]¶ As new_query, except that a list of output column expressions is passed instead of a root class.
-
templates
¶ Service.templates S{->} dict(intermine.query.Template|string)
For efficiency’s sake, Templates are not parsed until they are required, and until then they are stored as XML strings. In most cases you should use L{Service.get_template} instead.
You can, however, use this property to test whether a template exists:
if name in service.templates:
    template = service.get_template(name)
@rtype: dict
-
version
¶ The version specifies what capabilities a specific webservice provides. The most recent version is 3.
may raise ServiceError: if the version cannot be fetched
@rtype: int
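Because the version is an integer, a capability check can be a simple comparison. A minimal sketch follows; the require_version helper and its choice of RuntimeError are hypothetical, and which features need which version is not stated on this page:

```python
def require_version(service, minimum):
    """Return the service version, raising RuntimeError if it is too old."""
    version = service.version  # an int, per the @rtype above
    if version < minimum:
        raise RuntimeError(
            "Service version %d is older than the required %d" % (version, minimum)
        )
    return version
```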
-
widgets
¶ The set of widgets available to a service does not change between releases, so they are cached. If you are running a long-running process, you may wish to periodically clear the cache by calling L{Service.flush}, or simply get a new Service object.
@return dict