Trie Finally

Yep, I know, tries are nothing new.  They are a very good way of storing sorted collections with order-N insertion and lookup times.  There are many implementations of tries available in most of the common languages (and probably the uncommon ones too, but I did not look too closely).  There are, however two problems with the existing solutions:

  1. They all store strings: – the sequence of values used to define the path to the stored value are always the characters of a string converted to integer values.
  2. They all store strings: all anybody ever stores in the trie are strings.  More specifically the same strings as they used to define the path to the storage position.

Now storing strings indexed against their contents is fine – you get an order-N sorted string storage mechanism.  And if that is all that you want then go for it.  But a trie could be so much more.  To prove this I have implemented a variant of the trie where I have broken from tradition in two ways:

  1. You do not have to store strings: the sequence of integers used to define the position in the trie is specified with an Iterator<Integer>.  You can easily derive this iterator from a string (I have a class for that) and use the string as a key, but this is not all you can do.  You could derive the iterator from a date, and yield year, month day, hour, minute, secon, millisecond to quickly sort and retreive by date (I will soon have a class for that).  Or you could define your own mapping as long as you can generate a sequence of integers from it.
  2. You do not have to store strings: Each position in the trie can store an instance of your desired type – trie is generic.
  3. The underlying structure is exposed: (yep, point three of two) By exposing the underlying structure some operations become faster, and the semantics of addition and removal become more flexible.

So if you want to store your Customer instances sorted against surname, or date of birth you can now do so with order-N insertion, lookup and removal in a trie (for name, or for date, or one of each).  You can use the tries to extract all customers with surnames beginning with ‘Smi’, or all customers born in january 1987.

And it is simple – there are really only three methods to think of.

Three methods? that sounds too simple – I exagerate for dramtic purposes – there are several other methods to support query and iteration, but the essential methods are:

  • public IndexTrie<T> putTrie(Iterator<Integer>) : returns the trie at the position specified, creating and inserting a new one if necessary.
  • public IndexTrie<T> getTrie(Iterator<Integer>) : returns the trie at the position specified or null if there is not one there.
  • public void remove() : removes the specified trie and all its children.

So if you are storing a list of Foo at each position you do not need to get the value and if null insert the value paying the navigation cost twice.  You can ‘put’ the trie at the position and modify its contents directly – paying the navigation cost once.

Other methods allow you to iterate the tries or values from any parent (and as we know from all those string implementation this gives us the members with a given prefix).

The code is available under Apache V2 licence.