Hi, don't ask me why, but I have a sequence of 113 values between 0 and 18, and I would like to come up with a mechanism that for time steps 1...113 generates the values in that order (i.e. to fit a function). I tried quite a lot of things and in my eyes the best solution would be to somehow "train" a ... random number generator on the data but I don't really have an idea, so help me please. Don't flame btw, I'm doing research in the field for quite some time now.
What's wrong with just storing them in an array and spitting them out?
The fact that I'm working with a couple of bytes here (not more than 10-15). I can't afford to store an array that big
You can't afford to store 113 values, but you need to represent them somehow? Sounds fun =)
Here's an algorithm for you:
- Take a bunch of undergrads, a couple hundred should do;
- Lock them in a room without food or water
- Promise to let them out once they have an algorithm for you.
On a more serious note though, we need more data to give you ideas. Are the numbers in sequence? What's the range of the values? You've specified some kind of range, but is that in integer, or some higher precision? How much cpu power do you have to waste on this? Do they follow any known pattern, or is this pattern what you wish to find? Have you tried simply compressing the values somehow? What's the environment? How critical is it that you get the exact values, or is there a margin of error? Are lives at stake?
Ok, here's the bigger picture. The values are in a sequence, yes, here they are:
7, 3, 0, 9, 3, 15, 9, 6, 9, 17, 11, 17, 15, 8, 17, 7, 0, 17, 9, 11, 17, 11, 17,6, 10, 4, 5, 16, 6, 10, 9, 5, 3, 4, 18, 0, 11, 4, 5, 4, 5, 16, 16, 12, 17, 2, 1, 7,18, 18, 14, 6, 14, 2, 6, 14, 2, 1, 12, 12, 12, 12, 10, 13, 12, 10, 13, 13, 0, 13, 12, 16, 12, 8, 9, 12, 8, 9, 13, 8, 12, 13, 12, 13, 14, 13, 14, 0, 0, 15, 7, 8, 1, 0, 15, 7, 2, 7, 11, 4, 1, 0, 0, 0, 0, 4, 9, 1, 16, 18, 7, 3, 0
I cannot compress these data any further as they themselfs are compression of something. Values are as you can see in range [0;18]. I have all the CPU for wasting, important are patterns in the data and most importantly a pattern capturing all of the values or in the worst case, two halfs. The environment is a small COM file with couple of 100s bytes. No lives are at stakes. Error margin has to be 0, as correcting errors will break the compression.
Forgot to say that I've tried curve fitting software already, but they just try to adapt known functions to the data resulting in high error rates or more Fourier harmonics than wishful. I'm certain that some modified random number generator could generate the sequence for a given seed, but I'm out of ideas and math to construct a model for learning.
This sequence looks like it has a pretty high entropy. Exactly how much smaller are you hoping to make it? Or does every byte count?
One obvious thing to do is to pack the values together as closely as possible. In a 4-byte integer you can store 7 values in the 0-18 range like this:
int packed = value * 1+
value * 19 +
value * 19 * 19
value * 19 * 19 * 19 * 19 * 19 * 19;
value = (packed / 1) % 19;
value = (packed / 19) % 19;
value = (packed / (19 * 19)) % 19;
value = (packed / (19 * 19 * 19 * 19 * 19 * 19)) % 19;
This way your 113 value sequence could be packed into 68 bytes (plus the decompression code).
Hi, Nick, thanks for the quick and rather informative answer. I indeed toyed around with modulo packing and extraction but I never calculated the byte gain of it. Yes the problem is that every byte counts and with low byte counts every single byte has a very high information content, which is hard to compress accurately since the positions of the bytes in the stream must also be monitored. I'm currently planning to base my bachelor thesis on the matter, for I've already done a massive reasearch on the topic. The best way of course would be to find some kind of procedure that generates the sequence in that order, but the question remains what this routine looks like and how simple it is to implement. Your idea is effective and simple and will reduce the byte count with around 10 bytes. While I'm not striving to optimize with a certain count, In my head it should be possible with 25 or so. Finally, the whole idea is just out of interest, no lives depend on it.
And of course Arithmetic Coding should take care of every bit of entropy encoding potential this sequence has.
OMG, I never considered the most obvious thing, BIG THANKS AND MANY HUGS!!! Let's see if it will work!