The current indexing uses the following items:
- Subject >> S
- Predicate >> P
- Object >> O
- Meta >> M
There are 6 ordered indexes. Together these indexes make one of the possible sets of indexes which allows any group of statements to be found as a contiguous group within one of the constituent indexes. There are 24 such sets (see discussion on Paul's blog on August 4 and August 5, 2004). The set of indexes chosen for Mulgara is:
- SPOM
- POSM
- OSPM
- MSPO
- MPOS
- MOSP
These indexes have the following properties:
- All elements are treated symmetrically, with no bias reflecting usage patterns.
- Any specification for S, P, O and M can be found as a contiguous group within one index. The specified elements are first grouped, and then the corresponding index whose first elements match that group is used.
e.g. looking for predicates and meta on a specified subject and object (show all relationships in all graphs between two individuals), means that S and O are specified. The index with S and O in the first 2 elements is OSPM. - They are all tree indexes, with ordering according to the letters labelling them.
e.g. SPOM is ordered first by subject, then by predicate, object and finally meta. - Contiguous groups can always be found with a pair of O(log(N)) searches, where N is the size of the data set.
- Any discovered group of statements will be in only one of the 24 possible orderings, where that ordering is specified by the given index.
- Counting the size of a group may have O(log(N)) complexity (current implementation is linear).
In addition to these properties, we have 2 additional properties based on the choice of AVL trees for the tree structure:
- Additions to a known point in the tree takes place in constant time.
- Removals from a known point in the tree takes place in O(log(N)) time.
