InterMine Items XML Overview
InterMine Items XML is a generic format that encodes data that matches InterMine class definitions.
The root element is always <items>.
Within <items>, each object is described by a separate <item> element.
Each <item> has an ID with the format <NAMESPACE_SUBID>. For simple cases, the namespace can always be '0'. These IDs signify connections between items within the Items XML file. Once the data is loaded into InterMine, its own serial IDs are used instead and the Items XML IDs disappear.
The child elements of an <item> are each one of the following:
- <attribute> - this has the name of the attribute (matching an attribute defined for the item's class) and a value.
- <reference> - this is a property that refers to some other item by its Items XML ID.
- <collection> - this is a collection of <reference>s.
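The structure described above can be sketched in a few lines of Python using only the standard library. This is a minimal illustration, not an official InterMine tool. The XML attribute names used here ("class", "name", "value", "ref_id") are assumptions not stated in this overview, and the Organism, Protein and Gene classes with their fields are hypothetical model entries with made-up values.

```python
import xml.etree.ElementTree as ET

# The root element is always <items>.
root = ET.Element("items")

# Each object gets its own <item> with an ID of the form NAMESPACE_SUBID.
# The "class", "name", "value" and "ref_id" XML attribute names are assumed.
organism = ET.SubElement(root, "item", {"id": "0_1", "class": "Organism"})
ET.SubElement(organism, "attribute", {"name": "taxonId", "value": "7227"})

protein = ET.SubElement(root, "item", {"id": "0_2", "class": "Protein"})
ET.SubElement(protein, "attribute", {"name": "name", "value": "exampleProtein"})

gene = ET.SubElement(root, "item", {"id": "0_3", "class": "Gene"})
# <attribute>: a named value on the item.
ET.SubElement(gene, "attribute", {"name": "symbol", "value": "exampleGene"})
# <reference>: points at another item by its Items XML ID.
ET.SubElement(gene, "reference", {"name": "organism", "ref_id": "0_1"})
# <collection>: a named group of <reference> elements.
proteins = ET.SubElement(gene, "collection", {"name": "proteins"})
ET.SubElement(proteins, "reference", {"ref_id": "0_2"})

items_xml = ET.tostring(root, encoding="unicode")
print(items_xml)
```

The IDs "0_1", "0_2" and "0_3" only matter inside this file. They tie the reference and collection entries to their target items and are replaced by InterMine's own serial IDs on loading.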
Example scripts used to generate InterMine Items XML can be found at intermine_items_example.pl.
The data formats required for attributes in InterMine Items XML for the most part are fairly obvious and match internal Java types (e.g. strings are UTF-8, doubles are 64-bit IEEE 754 floating point).
One exception is the format required for dates. InterMine allows a date to be expressed in three different ways:
- As the number of seconds since the Unix epoch.
- In the string format 'yyyy-MM-dd HH:mm:ss', assuming UTC.
- In the string format 'yyyy-MM-dd', assuming UTC.
If parsing fails for all these formats then InterMine will throw a RuntimeException.
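The fallback behaviour described above can be sketched as follows. This is not InterMine's actual parser (which is written in Java). The function name parse_item_date, the order in which the formats are tried, and the use of ValueError in place of a Java RuntimeException are all assumptions made for illustration.

```python
from datetime import datetime, timezone


def parse_item_date(text):
    """Hypothetical helper: try each of the three accepted date forms in turn."""
    text = text.strip()

    # 1. Number of seconds since the Unix epoch.
    try:
        return datetime.fromtimestamp(int(text), tz=timezone.utc)
    except ValueError:
        pass

    # 2. 'yyyy-MM-dd HH:mm:ss', then 3. 'yyyy-MM-dd', both assumed to be UTC.
    for fmt in ("%Y-%m-%d %H:%M:%S", "%Y-%m-%d"):
        try:
            return datetime.strptime(text, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            pass

    # All formats failed; the Java code would throw a RuntimeException here.
    raise ValueError(f"Unparseable date: {text!r}")
```

A generator script could run a check like this on date values before writing them out, so that malformed dates are caught before the file is loaded.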
InterMine Items XML can be generated directly in your favourite programming language. Alternatively, a number of language-specific APIs can generate it and automatically handle issues such as Items XML ID allocation and referencing.
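To give a feel for what such an API takes care of, here is a minimal sketch of a helper that allocates IDs and fills in references automatically. ItemFactory and its methods are hypothetical and are not part of any InterMine library. The XML attribute names ("class", "name", "value", "ref_id") and the Organism and Gene classes are likewise assumptions.

```python
import itertools
import xml.etree.ElementTree as ET


class ItemFactory:
    """Hypothetical helper that hands out NAMESPACE_SUBID IDs in sequence."""

    def __init__(self, namespace="0"):
        self.namespace = namespace
        self._sub_ids = itertools.count(1)
        self.root = ET.Element("items")

    def make_item(self, class_name, **attributes):
        # Allocate the next ID so the caller never has to track them by hand.
        item_id = f"{self.namespace}_{next(self._sub_ids)}"
        item = ET.SubElement(self.root, "item", {"id": item_id, "class": class_name})
        for name, value in attributes.items():
            ET.SubElement(item, "attribute", {"name": name, "value": str(value)})
        return item

    def set_reference(self, item, name, target):
        # Look up the target's ID instead of asking the caller to supply it.
        ET.SubElement(item, "reference", {"name": name, "ref_id": target.get("id")})

    def to_xml(self):
        return ET.tostring(self.root, encoding="unicode")


factory = ItemFactory()
organism = factory.make_item("Organism", taxonId=7227)
gene = factory.make_item("Gene", symbol="exampleGene")
factory.set_reference(gene, "organism", organism)
items_xml = factory.to_xml()
```

Because references are made by passing item objects rather than ID strings, a mistyped or dangling ID cannot be written by accident.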