Generic Java Trie

I have been looking at Tries again.  Thats right ‘Trie’.  Wikipedia has an excellent article.  Tries are used to sort data (usually strings) that can be found/retrieved/removed quickly. 

The idea is simple: imagine the list of names in your address book.  One page for all the names starting with ‘A’, one for all the names starting with ‘B’ and so on. 26 pages, and we can jump straight to the right page instead of looking at each name in turn.  But what if there are lots of names on the ‘S’ page? Simple, we create a section on the page for every second letter ‘(S)A’, ‘(S)B’ and so on.  Now when we have a name starting with ‘SM’ we jump to the ‘S’  page, look at the second letter and jump to the ‘M’ section.  Now imagine doing this for every letter – we would have lots of pages each with only one name on it.  That is a ‘trie’.

Why would we want to use a trie?  Well they are very quick to find names, or remove names, and if we work our way through all the pages we get an alphabetic list of all the names.

Why might we not want to use a trie? Well, for one they can be slower when we are adding names as we have a lot more work to do.  And for  small numbers of names finding and removing are not much faster than simple lists.  They also use more memory – though for most modern machines this may not amount to a significant difference.

The slower addition is the main drawback, but as usual things are not as simple.  If you do not want your list of names sorted, or you are happy to sort them each time you get the list, then this slower addition may rule out using tries. But if you want to get the sorted list often then the cost of sorting makes the decision much closer.  My initial experiments in Java showed that adding elements to a normal list and then sorting it took about the same time as adding the same elements to a trie.  Which is not surprising, since the task is about the same in each case.  The difference being that sorting subsequent additions to a list is a lot slower than adding extra elements to a trie.  And again this is obvious.  If I have a list of 10000 names and add 100, a list may have to sort 10100, but a trie only has to find the right place for each of the 100.

I have implemented a Trie  (licenced under GPL3) in Java (git repository) that adopts the java.util.Collection interface.  This means that it can be used interchangeably in your app wherever you have already used another collection.

I have a bunch of things I still want to do, and maybe I will get round to them:

  • Publish some performance figures.  Currently for large sets find/remove are about 1000 times faster than java.util.ArrayList.  But these times and the insertion costs are dependent upon how ‘dense’ the data is.
  • Demonstrate the use of the Trie to store data other than strings, for example Event objects with a comment String and Date, sorted by date component eg year,month,day,hour, minute, second,millisecond.
  • Sort out what to do with elements with duplicate paths – are they the same, or should they be added in a list?

Leave a Reply

Your email address will not be published. Required fields are marked *