Trie Trie again

I finally got to refactor the Trie.  It bugged me that the interface was a bit complicated.  Then I realised that the Trie was doing two distinct jobs and that these could be separated.

The first job the trie was doing was creating a hierarchy of elements that stored objects according to a sequence of offsets.  In the case of a string, these offsets were the values of the characters in the string.

The second job was the creation of the sequence of offsets.  In the case of the string this is easy – iterate the characters of the string.  But in other circumstances we may wish to do differently.  For example dates could be represented as a series of integers: year, month, day, hour, minute, second.

Separating these jobs has simplified the interface, and made the code more versatile.  The sequence of indices is conveniently represented by a standard Iterator<Integer>.  For a string this is trivial:

public final class TrieStringIndexer implements Iterator<Integer> {
    private final String string;
    private int index = 0;

    public TrieStringIndexer(final String string) {
        this.string = string;
    }

    @Override
    public final boolean hasNext() {
        return this.index < this.string.length();
    }

    @Override
    public final Integer next() {
        return this.string.charAt(this.index++);
    }
}

Now with a string indexer we can store an object:

IndexTrie trie = new MutableIndexTrie();
Iterator<Integer> indexer = new TrieStringIndexer("abc etc");
trie.put(indexer, new Object());

You noticed the word ‘mutable’ did you?  IndexTrie is an abstract class with two concrete subclasses, one mutable one immutable.  The immutable version will never change existing data, but will return a new trie containing the same data as the original.  Any unchanged structure from the original is kept unchanged in the new trie.  The mutable one is, of course,  faster than the immutable one.

While we are on the subject of performance, I implemented a StringTrie – an ordered collection of strings implementing the Set interface. I compared the performance of this to the performance of TreeSet<String> which is the nearest equivalent standard class.  StringTrie was generally faster – up to five times as fast.

As usual the code is available at https://bitbucket.org/drphill/trie

Leave a Reply

Your email address will not be published. Required fields are marked *