#### Complimentary Multiply with Carry: A better way to generate pseudo-random Numbers

by

, April 27th, 2011 at 01:01 AM (22415 Views)
Java's "default" API pseudo-random number generator is a Linear congruential generator, but has a host of problems (such as a fairly short period, and series correlation). For normal day-to-day use, these problems aren't an issue, and the simplicity of this generator makes it a good choice to implement and use. However, when you start entering the domain of Monte Carlo simulations or other stochastic methods which require millions or more "uncorrelated" random numbers, the LCG just isn't good enough.

A better method is the complimentary multiply with carry method. Because this method uses multiple seeds, it's able to have extremely large periods. Details in the math also eliminate many of the problems found in LCG generators, improving the quality of the numbers generated. Also, because the basis of the CMWC is essentially a LCG generator (albeit with some modifications), it's computationally very simple and can be implemented to be very quick (though not quite as quick as a LCG).

So for fun I decided to implement one. Unfortunately, Java doesn't have unsigned support. Because 32-bit seeds and carries were used, this has made division and modulo operations... interesting. Using a rather cool collection of shifts, bitwise operators, and basic addition/subtraction I was able to avoid this problem.

final long t = multiplier * seeds[n] + carry; // carry = t / b (done in an unsigned way) final long div32 = t >>> 32; // div32 = t / (b+1) carry = div32 + ((t & 0xFFFFFFFFL) >= 0xFFFFFFFFL - div32 ? 1L : 0L); // seeds[n] = (b-1)-t%b (done in an unsigned way) seeds[n] = 0xFFFFFFFEL - (t & 0xFFFFFFFFL) - (carry - div32 << 32) - carry & 0xFFFFFFFFL; final long result = seeds[n]; n = n + 1 & r - 1; return (int) (result >>> 32 - bits);

Another issue was thread safety. For some odd reason, Java decided to make the default Random class thread safe. However, this really doesn't make much sense because:

1. If you have a situation where you have multiple threads you'll likely want to have separate random number generators for each thread (if they require one).

2. synchronization hurts the performance of the random number generator significantly (I ran tests where synchronization would increase generation time by 300% or more).

3. If you do want to synchronize with multiple threads on a single generator object, external synchronization works just as well as internal synchronization. Also, it doesn't force you to use the same locking mechanism I would have used (a simply "synchronized").

For these reasons, I chose to remove internal synchronization.

Here's the full code:

package random; import java.util.Random; /** * A complimentary multiply with carry pseudo-random number generator. * <p> * This generator uses a multiplier of 4294966362L, a base of 2^32-1, and lags * for r cycles. * <p> * This implementation also specifically removes thread safety built into * Java.util.Random * <p> * Different multipliers a and number of seed values r can be chosen. A few good * choices are:<br> * r=2048, a=1047570<br> * r=2048, a=1030770<br> * r=1024, a=5555698<br> * r=1024, a=5555698<br> * r=512, a=123484214<br> * r=512, a=123462658<br> * r=256, a=987665442<br> * r=256, a=987662290<br> * r=128, a=987688614<br> * r=128, a=987688302<br> * r=64, a=987657110 (default)<br> * r=64, a=987651206<br> * r=32, a=987655878<br> * r=32, a=987655670<br> * r=16, a=987651182<br> * r=16, a=987651178<br> * r=8, a=987651670<br> * r=8, a=987651386<br> * r=4, a=987654978<br> * r=4, a=987654366 * <p> * Theoretically, the period is p-1=a*(2^32-1)^r * * @author Andrew */ public class CMWC extends Random { /** * */ private static final long serialVersionUID = 812400912355006066L; // hard-code b // private long b = 0xFFFFFFFFL; private long carry; private boolean haveNextNextGaussian; private long multiplier; private int n = 0; private double nextNextGaussian; /** * The number of seeds this generator uses */ private int r; private long[] seeds; /** * Creates a CMWC generator with generated seeds and carry, and default * multipliers. These are based off of the current system time. */ public CMWC() { this(new long[] { System.currentTimeMillis() }, System.nanoTime(), 64, 987657110L); } /** * Creates a CMWC with given seeds. While the generator requires r seeds, * any number can be provided. If fewer than r seeds are provided, then the * rest are generated based off of the provided seeds. If more seeds are * provided, only the first 32 seeds are used. * * @param seeds * given seeds to use. Only r will be used. If fewer (or none) * are provided, the remaining seeds are generated * @param carry * The initial carry value to use * @param r * Number lag cycles to use. Must be a power of 2. * @param multiplier * Multiplier to use. Depending on the r value chosen, different * multipliers are better. See class descriptor for good r and * multiplier value combinations. */ public CMWC(long[] seeds, long carry, int r, long multiplier) { setR(r); setMultiplier(multiplier); this.seeds = new long[r]; if (seeds == null || seeds.length == 0) { seeds = new long[] { System.currentTimeMillis() }; } // this.seeds = new long[r]; this.carry = (carry & 0xFFFFFFFFL) % multiplier; for (int i = 0; i < r; ++i) { if (i < seeds.length) { this.seeds[i] = seeds[i] & 0xFFFFFFFFL; } else { // generate a random seed this.seeds[i] = super.nextInt() & 0xFFFFFFFFL; } if (this.seeds[i] == 0xFFFFFFFFL) { this.seeds[i] = 1L; } } } /** * generates the next random number of bits [1-31] at a time. Not thread * safe. */ @Override protected int next(int bits) { final long t = multiplier * seeds[n] + carry; // carry = t / b (done in an unsigned way) final long div32 = t >>> 32; // div32 = t / (b+1) carry = div32 + ((t & 0xFFFFFFFFL) >= 0xFFFFFFFFL - div32 ? 1L : 0L); // seeds[n] = (b-1)-t%b (done in an unsigned way) seeds[n] = 0xFFFFFFFEL - (t & 0xFFFFFFFFL) - (carry - div32 << 32) - carry & 0xFFFFFFFFL; final long result = seeds[n]; n = n + 1 & r - 1; return (int) (result >>> 32 - bits); } /** * Non-thread safe version of super class's nextGaussian */ @Override public double nextGaussian() { if (haveNextNextGaussian) { haveNextNextGaussian = false; return nextNextGaussian; } else { double v1, v2, s; do { v1 = 2 * nextDouble() - 1; // between -1.0 and 1.0 v2 = 2 * nextDouble() - 1; // between -1.0 and 1.0 s = v1 * v1 + v2 * v2; } while (s >= 1 || s == 0); double multiplier = StrictMath.sqrt(-2 * StrictMath.log(s) / s); nextNextGaussian = v2 * multiplier; haveNextNextGaussian = true; return v1 * multiplier; } } private void setMultiplier(long multiplier) { this.multiplier = multiplier; } private void setR(int r) { if (r <= 0) { r = 256; // invalid r, use default r } else { boolean validR = true; long a = r; while (a != 1 && validR) { if (a % 2 != 0) { // not a power of 2, use default r r = 256; validR = false; } a >>>= 1; } } this.r = r; } }

This code is designed to allow you to plug this code in place of the default java.util.Random class. There are a few things to keep in mind:

1. Because the CMWC algorithm requires multiple seeds and a changing carry, the setSeed() method does nothing (well, technically it will still set the seed in the base Random class, but as far as the outside can see it does nothing). Currently there's no way to modify the seeds or carry, though it's not too difficult to add code to change that. Just keep in mind that all seeds must be smaller than b, and the carry must be smaller than the multiplier.

2. Depending on your application, you can create a CMWC generator with varying r and multiplier values. However, it's important that r be a power of 2 (preferably one of those tabulated), and the multiplier value a must create a prime number p=a*b^r+1, where b is a primitive prime of p. If you don't know what that means, just pick any listed pair of r and multiplier values. b is always fixed at 2^32-1.

3. The bigger r is, the bigger your theoretical period is. However, bigger r's also take up more memory. Speed is unrelated to the size of r. The default is r=64.

For those who are interested in the underlying workings/math of the CMWC algorithm, see http://education.wayne.edu/jmasm/vol2_no1.pdf. There is a paper published in that article by G. Marsaglia describing how the generator works, as well as performs some analysis on the generator.

Performance

There are 3 main categories when determining the performance of a random number generator:

1. The "quality" of the random numbers generated

2. The speed of the generator

3. The period of the generator

The period of the CMWC generator is exceptionally large (if you choose good lag periods and multiplier values). It has been proven to have a period ofp-1=a*b^r. If you need a larger period, simply pick a larger lag period, then find a corresponding multiplier (note that the memory footprint will go up with larger lag periods).

The quality of the random number generator can be tested using several randomness tests. To be honest, I'm not too interested in this one, but I did run this generator through the DieHard test and it passed all 15 tests (if I'm interpreting the results right).

The last test is performance. Looking at the actual calculations, the time it takes to generate another random number is independent of the memory footprint of this generator. Also, there's no complex operations such as exponentiation or difficult division/modulo (thanks to tricky bitwise operations), so the performance ought to be quite good. Indeed it is, I created a micro-profiling test rig to determine what the actual performance of just the generator was. To ensure a "fair" fight, I re-implemented Java's internal generator to remove the synchronization/thread safety before comparing the two. The test consisted of generating 0x4000000 random integers. The time it took to generate the numbers is measured usingSystem.currentTimeMillis(). Generator creation times were not included in this time. The tests were also run 50 times each, and the min, max, and mean were computed.

The results were as follow (times in milliseconds):

Linear Congruential Generator (Java's implementation with thread safety removed)

Mean:170.52 Min:140 Max:195

Complimentary Multiply with Carry Generator

Mean:537.18 Min:521 Max:572

The results show that while a complimentary multiply with carry generator is slower, it's only ~3 times slower. That means on average it took my computer 2.54 nanoseconds to generate a random integer using LCG and 8.005 nanoseconds to generate a random integer using CMWC. Not bad at all.

Some uses for CMWC:

The CMWC algorithm can be used for scientific simulations which require lots of random numbers (Monte Carlo methods). I'm not completely positive, but I'm fairly certain that CMWC is not suitable for cryptography. You can also use the CMWC for generating general random numbers (it's performance is very close to java.util.Random, technically faster because java.util.Random is thread-safe and this isn't, though the non-thread safe LCG generator is faster than a CMWC generator).

Happy coding

legal:

You're free to use this code as you want. I take no responsibility if it doesn't work, though I would like feedback if something isn't working correctly. Some credits/thanks would be nice, though if you don't include it in your product, I won't go out of my way to pursue you.