Perfect Steganography

Steganography is the practice of concealing a file, message, image, or video within another file, message, image, or video. https://en.wikipedia.org/wiki/Steganography

But perfect steganography?  What would that look like?  If we have a carrier (in which the message is hidden) and a payload (the message hidden in the carrier) then we would want the following chracteristics:

  • Undetectable:  Examinination of the carrier should give no clue as to whether there is a payload hidden in it.
  • Unextractable: Even if the carrier is known to contain a payload it should be impossible to extract the payload from it.
  • Unremarkable: The carrier should be of a common type, so that the presence of the file does not indicate a payload.

I have given much thought to this because I think these goals could be approached very closely using jpg format as a carrier.  Instantly we satify the third requirement – jpgs are everywhere.  Incredible numbers are transferred across the internet every day, and even more reside on phones and tablets and personal computers.  The presence of a jpg on a device would not indicate that there is a payload in the jpg.

The second point can be addressed by using ‘standard’ encryption routines such as AES128.

At present, there is no known practical attack that would allow someone without knowledge of the key to read data encrypted by AES when correctly implemented. https://en.wikipedia.org/wiki/Advanced_Encryption_Standard

So what about the first point?  How can we hide a message in a jpg file so that no-one can tell that it is there?  Well, that is possible because of the very nature of jpgs.

An image on your computer is usually stored as an array of pixels, with three bytes for each pixel (red, green and blue).  This is very easy for computers to manipulate, but takes up a large amount of space.  For example a 20 mega-pixel image would require 60 megabytes.  This is far too big to send across the internet, or to store on your device if you have a large number.  So jpg format was invented to reduce the size of the file that is stored.

Some image formats (eg png) are considered lossless – the data is compressed but nothing is lost.  Unfortunately this does not reduce the size enough.  So jpg reduces the size by throwing data away.  The more data we throw away the smaller the image – the trick is to throw away the least useful data.  It would be no good to keep the top left corner of your photo and throw away the rest – chances are that the bit you like is somewhere near the middle.  No, data is thrown away by reducing the quality of the image all across the image.

You can imagine a knob that you could turn, the more you turn it the more detail is lost, but the smaller the jpg file gets.  You keep turning the knob until you find a compromise that ou are happy with and save the resulting jpg.  If you have two versions of the same image, one with a lot of detail and one with a little detail no-one would assume that there was a message in one and not the other.   It just looks like one image with two different compression settings.

Interestingly, the quality varies across an image.  If you have an expanse of blue sky, the subtle details of shadinbg in it are not really picked up by the human eye.  So the compression algorithm throws away more data.  If you have lots of edges – a traction engine maybe – then the algorithm throws away less data.  So the amount of data thrown away varies across the image.  And here we have the medium for our message.  If we can create a pattern in the amount of data that is thrown away – a little more here, a little less there, we can add a signal to the image.

We need to do this in a subtle manner, or our image will have wildly varying quality which will be visible to the human eye, and quite likely to the computer scanner.  But if we do do this carefully I believe that we can achieve the first of our bullet points.

I have an example of an application that does just this: SmugglerMac.  It is written for the MacOS platform (for reasons that will become obvious in later posts), but could easily be adapted for other platforms.  I hope to write a series of posts explaining its implementation.

OK, so there is more to Tries than I thought

I thought that I had done with tries….. but I was wrong.  I chanced upon a subtle refinement which makes the solution even more general. To save repeating myself I will assume that you have read my previous posts about tries.

Instead of indexing the tries with an Iterator<Integer> I have modified the code to use Iterator<Comparable>.  It still works with Iterator<Integer> as Integer implements Comparable.  The advantages are significant.  For example you could index sentences by the contained words…. it was this question in StackOverflow that prompted me.

Now you can create tries that are indexed by any type at all as long as that type implements comparable.  Ordered lists of playing cards, steps in martial arts routines, key sequences on a keyboard – your imagination is the only limit.  What you get is an order-N insertion/retrieval of a sorted set.

 

Trie Finally

Yep, I know, tries are nothing new.  They are a very good way of storing sorted collections with order-N insertion and lookup times.  There are many implementations of tries available in most of the common languages (and probably the uncommon ones too, but I did not look too closely).  There are, however two problems with the existing solutions:

  1. They all store strings: – the sequence of values used to define the path to the stored value are always the characters of a string converted to integer values.
  2. They all store strings: all anybody ever stores in the trie are strings.  More specifically the same strings as they used to define the path to the storage position.

Now storing strings indexed against their contents is fine – you get an order-N sorted string storage mechanism.  And if that is all that you want then go for it.  But a trie could be so much more.  To prove this I have implemented a variant of the trie where I have broken from tradition in two ways:

  1. You do not have to store strings: the sequence of integers used to define the position in the trie is specified with an Iterator<Integer>.  You can easily derive this iterator from a string (I have a class for that) and use the string as a key, but this is not all you can do.  You could derive the iterator from a date, and yield year, month day, hour, minute, secon, millisecond to quickly sort and retreive by date (I will soon have a class for that).  Or you could define your own mapping as long as you can generate a sequence of integers from it.
  2. You do not have to store strings: Each position in the trie can store an instance of your desired type – trie is generic.
  3. The underlying structure is exposed: (yep, point three of two) By exposing the underlying structure some operations become faster, and the semantics of addition and removal become more flexible.

So if you want to store your Customer instances sorted against surname, or date of birth you can now do so with order-N insertion, lookup and removal in a trie (for name, or for date, or one of each).  You can use the tries to extract all customers with surnames beginning with ‘Smi’, or all customers born in january 1987.

And it is simple – there are really only three methods to think of.

Three methods? that sounds too simple – I exagerate for dramtic purposes – there are several other methods to support query and iteration, but the essential methods are:

  • public IndexTrie<T> putTrie(Iterator<Integer>) : returns the trie at the position specified, creating and inserting a new one if necessary.
  • public IndexTrie<T> getTrie(Iterator<Integer>) : returns the trie at the position specified or null if there is not one there.
  • public void remove() : removes the specified trie and all its children.

So if you are storing a list of Foo at each position you do not need to get the value and if null insert the value paying the navigation cost twice.  You can ‘put’ the trie at the position and modify its contents directly – paying the navigation cost once.

Other methods allow you to iterate the tries or values from any parent (and as we know from all those string implementation this gives us the members with a given prefix).

The code is available under Apache V2 licence.

Trie Trie again

I finally got to refactor the Trie.  It bugged me that the interface was a bit complicated.  Then I realised that the Trie was doing two distinct jobs and that these could be separated.

The first job the trie was doing was creating a hierarchy of elements that stored objects according to a sequence of offsets.  In the case of a string, these offsets were the values of the characters in the string.

The second job was the creation of the sequence of offsets.  In the case of the string this is easy – iterate the characters of the string.  But in other circumstances we may wish to do differently.  For example dates could be represented as a series of integers: year, month, day, hour, minute, second.

Separating these jobs has simplified the interface, and made the code more versatile.  The sequence of indices is conveniently represented by a standard Iterator<Integer>.  For a string this is trivial:

public final class TrieStringIndexer implements Iterator<Integer> {
    private final String string;
    private int index = 0;

    public TrieStringIndexer(final String string) {
        this.string = string;
    }

    @Override
    public final boolean hasNext() {
        return this.index < this.string.length();
    }

    @Override
    public final Integer next() {
        return this.string.charAt(this.index++);
    }
}

Now with a string indexer we can store an object:

IndexTrie trie = new MutableIndexTrie();
Iterator<Integer> indexer = new TrieStringIndexer("abc etc");
trie.put(indexer, new Object());

You noticed the word ‘mutable’ did you?  IndexTrie is an abstract class with two concrete subclasses, one mutable one immutable.  The immutable version will never change existing data, but will return a new trie containing the same data as the original.  Any unchanged structure from the original is kept unchanged in the new trie.  The mutable one is, of course,  faster than the immutable one.

While we are on the subject of performance, I implemented a StringTrie – an ordered collection of strings implementing the Set interface. I compared the performance of this to the performance of TreeSet<String> which is the nearest equivalent standard class.  StringTrie was generally faster – up to five times as fast.

As usual the code is available at https://bitbucket.org/drphill/trie

Trie Again

I have been doing some more thinking (and, of course, coding) about tries. 

I have realised, somewhat belatedly, that the important thing about a trie is the way it uses a series of integers (in my case) to navigate a tree of nodes to reach a point unique to every different series of integers.  What gets stored there is irrelevant to the trie but important to the user.  So a trie is best thought of as an implementor of a map (java.util.Map in my case).  The tricky thing is that the supplied key is not used directly, but that the creator of the map has to provide a means of converting the key to a series of integers.

I have implemented TrieMap in this way. 

public final class TrieMap<K, V> implements Map<K, V> {

So that a TrieMap<K,V> can be used anywhere Map<K,V> is required.  The tricky bit is informing thee TrieMap how to turn instances of K into a series of integers.  This has been done by supplying the constructor with an instance of TriePathIteratorFactory, thus:

public TrieMap(final TriePathIteratorFactory<K> pathIteratorFactory)

Where a TriePathIteratorFactory is responsible solely for generating instances of TriePathIterators:

public interface TriePathIteratorFactory<K> {
     public TriePathIterator<K> pathIterator(K k);
}

And the important part of the TriePathIterator is the generation of the series of integers:

public interface TriePathIterator<K> {

/**
* @return true if the receiver has more elements in it path, false otherwise
*/
public abstract boolean hasNext();

public abstract int nextIndex();

}

Now when the user calls a standard method from java.util.Map that takes a key (K), the TrieMap creates a TriePathIterator instance and uses it to traverse the data structure.

Using the underlying structure from the TrieMap I have implemented a TrieSet – simply a subclass of TrieMap<K,V> where K === V.  The performance exceeds that of the supplied java.util.TreeSet.  I will provide some hard data in another post, but the work-in-progress is available from the git repository on the ‘map’ branch.

Generic Java Trie

I have been looking at Tries again.  Thats right ‘Trie’.  Wikipedia has an excellent article.  Tries are used to sort data (usually strings) that can be found/retrieved/removed quickly. 

The idea is simple: imagine the list of names in your address book.  One page for all the names starting with ‘A’, one for all the names starting with ‘B’ and so on. 26 pages, and we can jump straight to the right page instead of looking at each name in turn.  But what if there are lots of names on the ‘S’ page? Simple, we create a section on the page for every second letter ‘(S)A’, ‘(S)B’ and so on.  Now when we have a name starting with ‘SM’ we jump to the ‘S’  page, look at the second letter and jump to the ‘M’ section.  Now imagine doing this for every letter – we would have lots of pages each with only one name on it.  That is a ‘trie’.

Why would we want to use a trie?  Well they are very quick to find names, or remove names, and if we work our way through all the pages we get an alphabetic list of all the names.

Why might we not want to use a trie? Well, for one they can be slower when we are adding names as we have a lot more work to do.  And for  small numbers of names finding and removing are not much faster than simple lists.  They also use more memory – though for most modern machines this may not amount to a significant difference.

The slower addition is the main drawback, but as usual things are not as simple.  If you do not want your list of names sorted, or you are happy to sort them each time you get the list, then this slower addition may rule out using tries. But if you want to get the sorted list often then the cost of sorting makes the decision much closer.  My initial experiments in Java showed that adding elements to a normal list and then sorting it took about the same time as adding the same elements to a trie.  Which is not surprising, since the task is about the same in each case.  The difference being that sorting subsequent additions to a list is a lot slower than adding extra elements to a trie.  And again this is obvious.  If I have a list of 10000 names and add 100, a list may have to sort 10100, but a trie only has to find the right place for each of the 100.

I have implemented a Trie  (licenced under GPL3) in Java (git repository) that adopts the java.util.Collection interface.  This means that it can be used interchangeably in your app wherever you have already used another collection.

I have a bunch of things I still want to do, and maybe I will get round to them:

  • Publish some performance figures.  Currently for large sets find/remove are about 1000 times faster than java.util.ArrayList.  But these times and the insertion costs are dependent upon how ‘dense’ the data is.
  • Demonstrate the use of the Trie to store data other than strings, for example Event objects with a comment String and Date, sorted by date component eg year,month,day,hour, minute, second,millisecond.
  • Sort out what to do with elements with duplicate paths – are they the same, or should they be added in a list?

State Modeling

State modeling is another very old software design technique.  It aims to model the way behaviour changes as a result of external input.

Now, the word ‘State’ is used in many ways when talking about software, and perhaps needs some definition first.  One way to think of  ‘state’ is as the perceived state of the system as judged from outside, perhaps called aggregate state.  Since facile analogies help here is one.  When the car is in the state ‘engine off’ then no amount of pressing the accelerator will get the car to move.  It does not matter if the handbrake is on, or the wheels are pointing straight ahead.  The handbrake setting and the wheel direction may be part of a complete description of the state of the car, but they matter little to us when deciding what to do next.  The ‘aggregate state’ is ‘engine off’ and there is only one trigger that will get us to the state of ‘engine running’.

Did you see that? I snuck in a new term – ‘trigger’.  A trigger is anything that causes (or could cause) the system to change state.  State modeling is about describing how the system responds to triggers.  Different combinations of trigger and state will have different results.  When a document editor is in the ‘editing’ state it will respond to key presses (‘triggers’) by changing the document,  but when it is in read-only mode it will ignore the same triggers (key presses).  Of course there is a special trigger that moves the editor from ‘read-only’ state to ‘editing’ state, and another that moves it back.  So you can see that triggers can change the state of the system, and the state determines its response to the triggers.

I have written a simple tool to illustrate this:  State Modeler.  It is written in java and requires Java 8 or higher.

The thing that I find most fascinating about state modeling is that, within a limited domain, a state model can be such a complete description that the behaviour can be automatically converted to code.  State Modeler does this, just to prove my contention, and will even generate a small testing class to demonstrate.

I will describe how to use the State Modeler in another post.  Please feel free to try it out before then if you wish.

 

 

DataFlow Modelling

DataFlow Modelling is a very old technique, at least in the computing world.  It seems to have fallen out of favour, replaced by more trendy state models or object models.  This is a shame as data flow diagrams are so simple and intutitive that they can be used without explanation, even with people that have not encountered them before.  Try that with a State model or an object model…….

One reason that data flow diagrams are not used so often is that the tools to draw them do not seem to be available readily.  To remedy this I have written a simple DataFlowModeller in Java for all to use.  It requires Java 8 to run.

Some notes here on how to use the tool.  Let us start to define a calculator:

The first task is to create a ‘Context Diagram’. This defines the way the calculator will interact with its environment. The large circle in the middle represents the program.

  • ‘Long press’ on the large circle and when the dialogue pops up choose the name ‘Caculator’.
  • Right click on the empty space and choose ‘New External’, select a name for it (e.g. User)and press ‘ok’. The user of the program is now represented in the design.
  • Click on the user to select it.  See that an arrow appears on the right hand side.  Drag the arrow to the ‘Calculator’ circle until the arrow and the circle glow green. Drop the arrow.
  • Name the data flow ‘Sums’.  You now show the user presenting sums to the calculator.
  • Now create a new external called ‘Screen’
  • Create a flow from the calculator to the screen, and call it results.

If you want to make a flow into a pleasing curve, then select it and drag part of the line.  This will create a ‘way-point’ in the flow that affects the curve of the line.  You can have as may way-points as you like in a flow.  You can remove a way-point by making it unnecessary. Do this by moving it to a position where it is on a straight line between the points either side. It will then disappear .

When you have designed the way that your calculator will interact with the world you can drill down to the detail of what goes on inside it.  To do this double click on the circle.  You will initially see a screen with a collection of arrows.  (These arrow may be sitting on top of each other, so you may need to drag them to different locations).  The arrows are the flows entering and leaving ‘Calculator’.

You can leave this view and go back to the context diagram either by double-clicking the empty background, or by pressing the up arrow in the menu bar.

In the exploded view you can add transforms (things that act on and change data) and data stores (places that the data rests, like files and databases).  You can connect these with flows to the entering and leaving flows.

Any transform you add can be ‘opened’ by double clicking and fleshed out in detail……

Enjoy