#include #include #include using namespace std; // Compile with the GCC C++ compiler: // // g++ rand31-park-miller-carta.cc -o rand31-pmc // rand31-park-miller-carta.cc // --------------------------- // // Robin Whittle rw@firstpr.com.au 2005 September 21 // // For more information and updates: http://www.firstpr.com.au/dsp/rand31/ // // This code contains two separate implementations of the 1988 Park Miller // "minimal standard" 31 bit PRNG (Pseudo-Random Number Generator): // // rand31pm The conventional Park Miller "Integer Version 1" algorithm // but implemented rather slowly in double-precision floating // point to ensure accuracy and portability. // // This is intended as a reference. // // rand31pmc My own fast integer implementation of David G. Carta's // approach to this PRNG, using only 32 bit integer maths // and no divisions. // // My aim is to show that the two are functionally identical, to explain // Carta's implementation, and to provide code which can be used by others. // // Please grab the rand31pmc class and use it in your own projects. A // separate pair of .c and .h files are available for incorporation into // ANSI C projects. // // When running the C versions of the Carta implementation and the // plain floating point Park Miller aglorithm, on an 800MHz Pentium III with // no compiler optimisation, the Carta version produced 13 million results a // second (61 CPU clock-cycles per result) and the floating point version // was about 1/4 this speed. // // The two classes below implement the same multiplicative linear // congruential PRNG (pseudo-random number generator). This type of // PRNG is not suitable for crytpography and does not produce sequences // long enough or random enough for most simulation purposes. // // Most of the detailed analytical work on PRNGs has gone into those // suitable for crypto and simulation. PRNGs well respected in these // fields - such as the Mersenne Twister for simulation - are slow, tricky, // or require excessive memory compared with a linear congruentual // generator. // // This leaves people who are searching for a fast, well studied, linear // congruential PRNG looking at a varied and often troublesome bunch of // stuff, including algorithms written up in textbooks or incorporated in // libraries, many of which produce poor results. Since it can be hard to // recognise the problems in a PRNG, it is best to use one which is well // studied and respected. // // Unfortunately, linear congruential generators have many bad algorithms // and/or bad implementations. Most publicly available implementations // of the good algorithms involve division. (The major exception I know // of is Ray Gardner's rg_rand.c, which implements the Carta algorithm.) // // // There is a widely used 32 bit integer implementation of Park-Miller's // "minimal standard" PRNG, using Shrage's approach: Park Miller's // "Integer Version 2". This requires a division. I don't have this // implementation here. The "Integer Version 1" implementation exists // to provide the reference sequence of pseudo-random numbers by which we // test the second implementation. // // // The second class "rand31pmc" is functionally identical to, and was // inspired by, a snippet of C code by Ray Gardner which dates from as // early as 1995 (according to Google searches in 2005): // // http://c.snippets.org/snip_lister.php?fname=rg_rand.c // // This cites Carta, and implements Carta's approach, but has no explanation // of how this approach works. Carta's paper contains a far more complex // explanation than the one I give below. // // I don't have contact details for Ray Gardner, or any of the people // mentioned in the references. Eric Raymond mentions he wrote some // hypertext-like code with Ray Gardner in the early 90s, and I think // Ray Gardner may have moderated Usenet comp.lang.c. // // There are two important aims which are achieved both by David Carta's // algorithm, Ray Gardner's code and class "rand31pmc": // // 1 - The code can run with 32 bit math operations - including a 16 x 16 // = 32 bit multiply. // // 2 - There is no need for a division. // // David Carta's 1990 paper gave no code examples. // // Stephen K. Park and Keith W. Miller rejected Carta's algorithm in // 1993 "... the generator as implemented by Carta is ~not~ the minimal // standard; it isn't even a full-period generator. We know of no good // reason to use Carta's generator." // // But here, by compiling and running this program, you can see that // David Carta's algorithm produces identical results to the Park Miller // algorithm. // // Assuming Ray Gardner really did write his code (rather than copy // it from somewhere else), my best guess is that he started off // trying to implement Shrage's approach (as the comments and the // unused Shrage-specific constants suggest) and then made his code // do what David Carta suggested. I guess that since it worked, // he left it at that, without updating the comments and constants. // // My explanation of why Carta's algorithm works is in the code, where it // belongs. Please don't strip the comments out. // // It is reasonable to expect that Carta's approach would run faster // than Shrage's approach in "Integer Version 2" - if only because // there is no division. // // My main interest here is proving that Carta's approach works // properly, because I want to implement it on CPUs such as the dsPIC // which have a slow divide. Most CPUs have a slow divide anyway, as // far as I know, so if you are in a hurry, Carta's approach is // probably the best. // // References: // // Stephen K. Park and Keith W. Miller // Random Number Generators: Good Ones are Hard to Find // Communications of the ACM, Oct 1988, Vol 31 Number 10 1192-1201 // // Like the other two papers, this one is normally only available // from the ACM site via subscription. You should be able to // access this paper electronically or in print at a university // library. You may also be able to find the .PDF wild on the // Net. Search for "p1192-park.pdf". For instance: // // http://www-scf.usc.edu/~csci105/links/p1192-park.pdf // // David F. Carta // Two Fast Implementations of the "Minimal Standard" Random Number Generator // Communications of the ACM, Jan 1990, Vol 33 Number 1 87-88 (p87-carta.pdf) // // George Marsaglia; Stephen J. Sullivan; Stephen K. Park, Keith W. Miller, // Paul K. Stockmeyer // Remarks on Choosing and Implementing Random Number Generators // Communications of the ACM, Jul 1993, Vol 36 Number 7 105-110 (p105-crawford.pdf) // // The following code is public domain. If you use this code, I request that // you keep the comments with it, to save some poor sod from having to figure // out the history behind it. ////////////////////////////////////////////////////////////////////////////// // // rand31pm // // 31 bit Pseudo Random Number Generator based on Park Miller "Integer // Version 1" - but done with double-precision floating point so we are not // concerned with the limits of integer operations. This is not intended for // fast operation - but *maybe* it would be faster than the integer // implementation on some CPUs. // // Methods: // // seedi Set seed with a 31 bit unsigned integer between 1 and // 0x7FFFFFFE inclusive. Don't use 0! // // ranlui Provides the next pseudorandom number as a long unsigned // integer (31 bits). // // ranf Provides the next pseudorandom number as a float between // nearly 0 and nearly 1.0. class rand31pm { // The sole item of state - a 32 bit // integer. long unsigned int seed31; public: // Constructor sets seed31 to 1, // in case no seedi operation is // used. rand31pm() {seed31 = 1;} // Declare methods. void seedi(long unsigned int); long unsigned int ranlui(void); float ranf(void); private: // nextrand() // // Generate next pseudo-random number. // Multiplier constant = 16807 = 7^5. // Park and Miller in 1993 recommend // 48271. #define consta 16807 // #define consta 48271 // Modulus constant = 2^31 - 1 = // 0x7FFFFFFFF. Use .0 to deter compiler // from complaining about a very large // integer constant. #define constm 2147483647.0 long unsigned int nextrand() { double const a = consta; double const m = constm; // This is the linear congrentual // generator: // // Multiply the old seed by constant a // and take the modulus of the result // (the remainder of a division) by // constant m. seed31 = (long)(fmod((seed31 * a), m)); return (seed31); } }; ///////////////////////////////////////// // // Implementations of the methods. // seedi() // // Set the seed from a long unsigned // integer. If zero is used, then // the seed will be set to 1. void rand31pm::seedi(long unsigned int seedin) { if (seedin == 0) seedin = 1; seed31 = seedin; } // ranlui() // // Return next pseudo-random value as // a long unsigned integer. long unsigned int rand31pm::ranlui(void) { return nextrand(); } // ranf() // // Return next pseudo-random value as // a floating point value. float rand31pm::ranf(void) { return (nextrand() / constm); } ////////////////////////////////////////////////////////////////////////////// ////////////////////////////////////////////////////////////////////////////// // // // rand31pmc // // Robin Whittle 2005 September 20 // // 31 bit pseudo-random number generator based on: // // Lehmer (1951) // Lewis, Goodman & Miller (1969) // Park & Miller (1983) // // implemented according to the optimisation suggested by David G. Carta // in 1990 which uses 32 bit math and does not require division. // Park and Miller rejected Carta's approach in 1993. Carta provided no // code examples. Carta's approach produces identical results to Park // and Miller's code. // // Copyright public domain . . . *but*: // // * Please leave the comments intact so inquiring minds have a chance of // * understanding how this implementation works and chasing the // * references to see the strengths and limitations of this particular // * pseudo-random number generator. // // Output is a 31 bit unsigned integer. The range of values output is // 1 to 2,147,483,646 and the seed must be in this range too. The // output sequence repeats in a loop of this length = (2^31 - 2). // // The output stream has some predictable patterns. For instance, after // a very low output, the next one or two outputs will be relatively low // (compared to the 2 billion range) because the multiplier is only 16,807. // Linear congruential generators are not suitable for cryptography or // simulation work (such as Monte Carlo Method), but they are probably // fine for many uses where the output is sound or vision for human // perception. // // The particular generator implemented here: // // New-value = (old-value * 16807) mod 0x7FFFFFFF // // is probably the best studied linear congruentual PRNG. It is not the very // best, but it is far from the worst. // // For the background on this implementation, and the Park Miller // "Minimal Standard" linear congruential PRNG, please see: // // http://www.firstpr.com.au/dsp/rand31/ // // Stephen K. Park and Keith W. Miller // Random Number Generators: Good Ones are Hard to Find // Communications of the ACM, Oct 1988, Vol 31 Number 10 1192-1201 // // David G. Carta // Two Fast Implementations of the "Minimal Standard" Random Number Generator // Communications of the ACM, Jan 1990, Vol 33 Number 1 87-88 // // George Marsaglia; Stephen J. Sullivan; Stephen K. Park, Keith W. Miller, // Paul K. Stockmeyer // Remarks on Choosing and Implementing Random Number Generators // Communications of the ACM, Jul 1993, Vol 36 Number 7 105-110 // // http://random.mat.sbg.ac.at has lots of material on PRNG quality. // // // The sequence of values this PRNG should produce includes: // // Result Number of results after seed of 1 // // 16807 1 // 282475249 2 // 1622650073 3 // 984943658 4 // 1144108930 5 // 470211272 6 // 101027544 7 // 1457850878 8 // 1458777923 9 // 2007237709 10 // // 925166085 9998 // 1484786315 9999 // 1043618065 10000 // 1589873406 10001 // 2010798668 10002 // // 1227283347 1000000 // 1808217256 2000000 // 1140279430 3000000 // 851767375 4000000 // 1885818104 5000000 // // 168075678 99000000 // 1209575029 100000000 // 941596188 101000000 // // 1207672015 2147483643 // 1475608308 2147483644 // 1407677000 2147483645 // 1 2147483646 // 16807 2147483647 // // Carta refers to two registers p (15 bits) and q (31 bits) which // together hold the 46 bit multiplication product: // // | | | | // 4444 4444 3333 3333 3322 2222 2222 1111 1111 11 // 7654 3210 9876 5432 1098 7654 3210 9876 5432 1098 7654 3210 // // q 31 qqq qqqq qqqq qqqq qqqq qqqq qqqq qqqq // p 15 pp pppp pppp pppp p // // The maximum 46 bit result occurs // when the seed is at its highest // allowable value: 0x7FFFFFFE. // // 0x20D37FFF7CB2 // // which splits up like this // // q 31 111 1111 1111 1111 0111 1100 1011 0010 // p 15 10 0000 1101 0011 0 // = 100 0001 1010 0110 // // In hex, these maxiumum values are: // // q 31 7FFF7CB2 = 2^31 - (2 * 16807) // p 15 41A6 = 16807 - 1 // // // The task is to combine the two partial products p and q as if they were // both parts of a 46 bit number, with the final result being modulo: // // 0111 1111 1111 1111 1111 1111 1111 1111 // // when we are actually only doing 32 bits at a time. // // Here I explain David G. Carta's trick - in a different and much simpler // way than he does. // // We need to deal with the p bits "pp pppp pppp pppp p" shown above. // These bits carry weights of bits 45 to 31 in the multiplication product // of the usual Park Miller algorithm. // // David Carta writes that in order to calculate mod(0x7FFFFFFF) of the // complete multiplication product (taking into account the total value // of p and q) we should simply add the bits of p into the bit positions // 14 to 0 of q and then do a mod(0x7FFFFFFF) on the result! // // | | | | // 4444 4444 3333 3333 3322 2222 2222 1111 1111 11 // 7654 3210 9876 5432 1098 7654 3210 9876 5432 1098 7654 3210 // // 31 qqq qqqq qqqq qqqq qqqq qqqq qqqq qqqq // 15 + ppp pppp pppp pppp // = Cxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx // // Highest possible value, // for q, with a value for // p which would allow it: // // 7FFFFFFF 111 1111 1111 1111 1111 1111 1111 1111 // + 41A5 100 0001 1010 0101 // = 8000041A4 1000 0000 0000 0000 0100 0001 1010 0100 // // The result can't be larger than 2 * 0x7FFFFFFF = 0xFFFFFFFE. So when we // do the modulus operation, we will have to subtract either nothing or just // one 0x7FFFFFFF. With this model of addition, the subtraction only // occurs very rarely. // // David Carta's explanation for why this produces the correct answer is too // long to repeat here. Mine is easy to understand. // // Lets define some labels: // // Q = 31 bits 30 to 0. // P = 15 bits 14 to 0. // // If we were doing 46 bit math, the multiplication product (seed * 16807) // would be: // // Q // + (P * 0x80000000) // // Observe that this is the same as: // // Q // + (P * 0x7FFFFFFF) // + (P * 0x00000001) // // However, we don't need or want a 46 bit result. We only want that result // mod(0x7FFFFFFF). Therfore we can ignore the middle line above and use for // our result: // // Q // + P // // This is a lot snappier than using a division, as the Schrage technique // requires. // // Methods: // // seedi Set seed with a 31 bit unsigned integer between 1 and // 0x7FFFFFFE inclusive. Don't use 0! // // ranlui Provides the next pseudorandom number as a long unsigned // integer (31 bits). // // ranf Provides the next pseudorandom number as a float between // nearly 0 and nearly 1.0. class rand31dc { // The sole item of state - a 32 bit // integer. long unsigned int seed31; public: // Constructor sets seed31 to 1, in case // no seedi operation is used. rand31dc() {seed31 = 1;} // Declare methods. void seedi(long unsigned int); long unsigned int ranlui(void); float ranf(void); private: // nextrand() // // Generate next pseudo-random number. // Multiplier constant = 16807 = 7^5. // This is 15 bits. // // Park and Miller in 1993 recommend // 48271, which they say produces a // somewhat better quality of // pseudo-random results. // // 48271 can't be used with the // following implementation of Carta's // algorithm, since it is 16 bits and // would result in bit 31 potentially // being set in lo in the first // multiplication. (A similar problem // occurs later with the higher bits of // hi.) #define consta 16807 // Modulus constant = 2^31 - 1 = // 0x7FFFFFFF. We use this explicitly // in the code, rather than define it // somewhere, because this is a value // which must not be changed and should // always be recognised as a zero // followed by 31 ones. long unsigned int nextrand() { // Two 32 bit integers for holding // parts of the (seed31 * consta) // multiplication product which would // normally need a 46 bit word. // // lo 31 bits 30 - 0 // hi 30 bits 45 - 16 // // These overlap in their value. // // Bit 0 of hi has the same weight in // the result as bit 16 of lo. long unsigned int hi, lo; // lo = 31 bits: // // low 16 bits (15-0) of seed31 // * 15 bit consta lo = consta * (seed31 & 0xFFFF); // hi = 30 bits: // // high 15 bits (30-16) of seed31 // * 15 bit consta hi = consta * (seed31 >> 16); // The new pseudo-random number is the // 46 bit product mod(0x7FFFFFF). Our // task is to calculate it with 32 // bit words and maths, and without // division. // // The first section is easy to // understand. We have a bunch of // bits - bits 14 to 0 of hi - // which overlap with and carry the // same weight as bits 30 to 16 of // lo. // // Add the low 15 bits of hi into // bits 30-16 of lo. lo += (hi & 0x7FFF) << 16; // The result may set bit 31 of lo, but // it will not overflow lo. // // So far, we got some of our total // result in lo. // // The only other part of the result // we need to deal with is bits // 29 to 15 of hi. // // These bits carry weights of bits // 45 to 31 in the value of the // multiplication product of the usual // Park-Miller algorithm. // // David Carta writes that in order // to get the mod(0x7FFFFFF) of the // multiplication product we should // simply add these bits into the // bit positions 14 to 0 of lo. // lo += hi >> 15; // // In order to be able to get away with // this, and to perform the following // simple mod(0x7FFFFFFF) operation, // we need to be sure that the result // of the addition will not exceed: // // 2 * 0x7FFFFFFF = 0xFFFFFFFE // // This is assured as per the diagrams // above. // Note that in the vast majority of // cases, lo will be less than // 0x7FFFFFFFF. if (lo > 0x7FFFFFFF) lo -= 0x7FFFFFFF; // lo contains the new pseudo-random // number. Store it to the seed31 and // return it. return ( seed31 = (long)lo ); } }; ///////////////////////////////////////// // // Implementations of the methods. // seedi() // // Set the seed from a long unsigned // integer. If zero is used, then // the seed will be set to 1. void rand31dc::seedi(long unsigned int seedin) { if (seedin == 0) seedin = 1; seed31 = seedin; } // ranlui() // // Return next pseudo-random value as // a long unsigned integer. long unsigned int rand31dc::ranlui(void) { return nextrand(); } // ranf() // // Return next pseudo-random value as // a floating point value. float rand31dc::ranf(void) { return (nextrand() / 2147483647.0); } ////////////////////////////////////////////////////////////////////////////// // Test the two implementations by // running them through their full // cycle of output values - and // by checking that their 10,000th // value is as defined by Park and Miller. int main () { // Instantiate two random number objects. rand31pm ran1pm; rand31dc ran2dc; // Create two 32 bit unsigned integers. long unsigned int test1, test2; // Seed them both to 1. ran1pm.seedi(1); ran2dc.seedi(1); cout << "\nComparing the output of a traditional Park Miller PRNG with the output of a\n"; cout << "faster version, which implements David G. Carta's optimisation.\n"; cout << "See http://www.firstpr.com.au/dsp/rand31/\n\n"; // Run the loop so it goes just a few // than the full 2^32 - 1 cycles. double loopcount; for(loopcount = 1; loopcount < (double)2147483655.0; loopcount++) { // Get a new value from each // generator. test1 = ran1pm.ranlui(); test2 = ran2dc.ranlui(); // Compare the values and spit the // dummy (Aussie term for spitting // out the "pacifier") if they do // not agree. if( test1 != test2) { cout << test1 << " != " << test2 << " Ooooops . . . loopcount = " << loopcount << "\n"; exit(0); } // Display if values == 1. if(test1 == 1) { cout << test1 << " == " << test2 << " == 1. ------------------------- loopcount = " << loopcount << "\n"; } else if(loopcount == 9997) { cout << "\nFor a = 16807, the 10000th value after a seed of 1 should be 1043618065\n"; cout << "as predicted by Park and Miller, 1988, page 1195.\n\n"; } else // Display: // // The first 10 results. // or The last 15 or so results. // or Results 9998 to 10002. // or Every millionth pair of results. if( (loopcount <= 10) ||(loopcount >= 2147483640.0) ||( (loopcount >= 9998) &&(loopcount <= 10002) ) ||(fmod(loopcount, 1000000) == 0) ) { cout << test1 << " == " << test2 << " loopcount = " << (unsigned)loopcount << "\n"; } } return 0; }