Trie Finally

Yep, I know, tries are nothing new.  They are a very good way of storing sorted collections with order-N insertion and lookup times.  There are many implementations of tries available in most of the common languages (and probably the uncommon ones too, but I did not look too closely).  There are, however two problems with the existing solutions:

  1. They all store strings: – the sequence of values used to define the path to the stored value are always the characters of a string converted to integer values.
  2. They all store strings: all anybody ever stores in the trie are strings.  More specifically the same strings as they used to define the path to the storage position.

Now storing strings indexed against their contents is fine – you get an order-N sorted string storage mechanism.  And if that is all that you want then go for it.  But a trie could be so much more.  To prove this I have implemented a variant of the trie where I have broken from tradition in two ways:

  1. You do not have to store strings: the sequence of integers used to define the position in the trie is specified with an Iterator<Integer>.  You can easily derive this iterator from a string (I have a class for that) and use the string as a key, but this is not all you can do.  You could derive the iterator from a date, and yield year, month day, hour, minute, secon, millisecond to quickly sort and retreive by date (I will soon have a class for that).  Or you could define your own mapping as long as you can generate a sequence of integers from it.
  2. You do not have to store strings: Each position in the trie can store an instance of your desired type – trie is generic.
  3. The underlying structure is exposed: (yep, point three of two) By exposing the underlying structure some operations become faster, and the semantics of addition and removal become more flexible.

So if you want to store your Customer instances sorted against surname, or date of birth you can now do so with order-N insertion, lookup and removal in a trie (for name, or for date, or one of each).  You can use the tries to extract all customers with surnames beginning with ‘Smi’, or all customers born in january 1987.

And it is simple – there are really only three methods to think of.

Three methods? that sounds too simple – I exagerate for dramtic purposes – there are several other methods to support query and iteration, but the essential methods are:

  • public IndexTrie<T> putTrie(Iterator<Integer>) : returns the trie at the position specified, creating and inserting a new one if necessary.
  • public IndexTrie<T> getTrie(Iterator<Integer>) : returns the trie at the position specified or null if there is not one there.
  • public void remove() : removes the specified trie and all its children.

So if you are storing a list of Foo at each position you do not need to get the value and if null insert the value paying the navigation cost twice.  You can ‘put’ the trie at the position and modify its contents directly – paying the navigation cost once.

Other methods allow you to iterate the tries or values from any parent (and as we know from all those string implementation this gives us the members with a given prefix).

The code is available under Apache V2 licence.

DataFlow Modelling

DataFlow Modelling is a very old technique, at least in the computing world.  It seems to have fallen out of favour, replaced by more trendy state models or object models.  This is a shame as data flow diagrams are so simple and intutitive that they can be used without explanation, even with people that have not encountered them before.  Try that with a State model or an object model…….

One reason that data flow diagrams are not used so often is that the tools to draw them do not seem to be available readily.  To remedy this I have written a simple DataFlowModeller in Java for all to use.  It requires Java 8 to run.

Some notes here on how to use the tool.  Let us start to define a calculator:

The first task is to create a ‘Context Diagram’. This defines the way the calculator will interact with its environment. The large circle in the middle represents the program.

  • ‘Long press’ on the large circle and when the dialogue pops up choose the name ‘Caculator’.
  • Right click on the empty space and choose ‘New External’, select a name for it (e.g. User)and press ‘ok’. The user of the program is now represented in the design.
  • Click on the user to select it.  See that an arrow appears on the right hand side.  Drag the arrow to the ‘Calculator’ circle until the arrow and the circle glow green. Drop the arrow.
  • Name the data flow ‘Sums’.  You now show the user presenting sums to the calculator.
  • Now create a new external called ‘Screen’
  • Create a flow from the calculator to the screen, and call it results.

If you want to make a flow into a pleasing curve, then select it and drag part of the line.  This will create a ‘way-point’ in the flow that affects the curve of the line.  You can have as may way-points as you like in a flow.  You can remove a way-point by making it unnecessary. Do this by moving it to a position where it is on a straight line between the points either side. It will then disappear .

When you have designed the way that your calculator will interact with the world you can drill down to the detail of what goes on inside it.  To do this double click on the circle.  You will initially see a screen with a collection of arrows.  (These arrow may be sitting on top of each other, so you may need to drag them to different locations).  The arrows are the flows entering and leaving ‘Calculator’.

You can leave this view and go back to the context diagram either by double-clicking the empty background, or by pressing the up arrow in the menu bar.

In the exploded view you can add transforms (things that act on and change data) and data stores (places that the data rests, like files and databases).  You can connect these with flows to the entering and leaving flows.

Any transform you add can be ‘opened’ by double clicking and fleshed out in detail……