; - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
;
; Park-Miller-Carta Pseudo-Random Number Generator
;
; For Microchip.com dsPIC DSP microcontroller.
;
; Copyright public domain Robin Whittle 2005 Sept 21
;
; 31 bit pseudo-random number generator based on:
;
; Lehmer (1951)
; Lewis, Goodman & Miller (1969)
; Park & Miller (1983)
;
; implemented according to the optimisation suggested by David G. Carta
; in 1990 which uses 32 bit math and does not require division.
; Park and Miller rejected Carta's approach in 1993. Carta provided no
; code examples. Carta's approach produces identical results to Park
; and Miller's code.
;
; Copyright public domain . . . *but*:
;
; * Please leave the comments intact so inquiring minds have a chance of
; * understanding how this implementation works and chasing the
; * references to see the strengths and limitations of this particular
; * pseudo-random number generator.
;
; Output is a 31 bit unsigned integer. The range of values output is
; 1 to 2,147,483,646 and the seed must be in this range too. The
; output sequence repeats in a loop of this length = (2^31 - 2).
;
; The output stream has some predictable patterns. For instance, after
; a very low output, the next one or two outputs will be relatively low
; (compared to the 2 billion range) because the multiplier is only 16,807.
; Linear congruential generators are not suitable for cryptography or
; simulation work (such as Monte Carlo Method), but they are probably
; fine for many uses where the output is sound or vision for human
; perception.
;
; The particular generator implemented here:
;
; New-value = (old-value * 16807) mod 0x7FFFFFFF
;
; is probably the best studied linear congruentual PRNG. It is not the very
; best, but it is far from the worst.
;
; For the background on this implementation, and the Park Miller
; "Minimal Standard" linear congruential PRNG, please see:
;
; http://www.firstpr.com.au/dsp/rand31/
;
; Stephen K. Park and Keith W. Miller
; Random Number Generators: Good Ones are Hard to Find
; Communications of the ACM, Oct 1988, Vol 31 Number 10 1192-1201
;
; David G. Carta
; Two Fast Implementations of the "Minimal Standard" Random Number Generator
; Communications of the ACM, Jan 1990, Vol 33 Number 1 87-88
;
; George Marsaglia; Stephen J. Sullivan; Stephen K. Park, Keith W. Miller,
; Paul K. Stockmeyer
; Remarks on Choosing and Implementing Random Number Generators
; Communications of the ACM, Jul 1993, Vol 36 Number 7 105-110
;
; http://random.mat.sbg.ac.at has lots of material on PRNG quality.
;
;
; The sequence of values this PRNG should produce includes:
;
; Result Number of results after seed of 1
;
; 16807 1
; 282475249 2
; 1622650073 3
; 984943658 4
; 1144108930 5
; 470211272 6
; 101027544 7
; 1457850878 8
; 1458777923 9
; 2007237709 10
;
; 925166085 9998
; 1484786315 9999
; 1043618065 10000
; 1589873406 10001
; 2010798668 10002
;
; 1227283347 1000000
; 1808217256 2000000
; 1140279430 3000000
; 851767375 4000000
; 1885818104 5000000
;
; 168075678 99000000
; 1209575029 100000000
; 941596188 101000000
;
;
; Carta refers to two registers p (15 bits) and q (31 bits) which
; together hold the 46 bit multiplication product:
;
; | | | |
; 4444 4444 3333 3333 3322 2222 2222 1111 1111 11
; 7654 3210 9876 5432 1098 7654 3210 9876 5432 1098 7654 3210
;
; q 31 qqq qqqq qqqq qqqq qqqq qqqq qqqq qqqq
; p 15 pp pppp pppp pppp p
;
; The maximum 46 bit result occurs
; when the seed is at its highest
; allowable value: 0x7FFFFFFE.
;
; 0x20D37FFF7CB2
;
; which splits up like this
;
; q 31 111 1111 1111 1111 0111 1100 1011 0010
; p 15 10 0000 1101 0011 0
; = 100 0001 1010 0110
;
; In hex, these maxiumum values are:
;
; q 31 7FFF7CB2 = 2^31 - (2 * 16807)
; p 15 41A6 = 16807 - 1
;
;
; The task is to combine the two partial products p and q as if they were
; both parts of a 46 bit number, with the final result being modulo:
;
; 0111 1111 1111 1111 1111 1111 1111 1111
;
; when we are actually only doing 32 bits at a time.
;
; Here I explain David G. Carta's trick - in a different and much simpler
; way than he does.
;
; We need to deal with the p bits "pp pppp pppp pppp p" shown above.
; These bits carry weights of bits 45 to 31 in the multiplication product
; of the usual Park Miller algorithm.
;
; David Carta writes that in order to calculate mod(0x7FFFFFFF) of the
; complete multiplication product (taking into account the total value
; of p and q) we should simply add the bits of p into the bit positions
; 14 to 0 of q and then do a mod(0x7FFFFFFF) on the result!
;
; | | | |
; 4444 4444 3333 3333 3322 2222 2222 1111 1111 11
; 7654 3210 9876 5432 1098 7654 3210 9876 5432 1098 7654 3210
;
; 31 qqq qqqq qqqq qqqq qqqq qqqq qqqq qqqq
; 15 + ppp pppp pppp pppp
; = Cxxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx
;
; Highest possible value,
; for q, with a value for
; p which would allow it:
;
; 7FFFFFFF 111 1111 1111 1111 1111 1111 1111 1111
; + 41A5 100 0001 1010 0101
; = 8000041A4 1000 0000 0000 0000 0100 0001 1010 0100
;
; The result can't be larger than 2 * 0x7FFFFFFF = 0xFFFFFFFE. So when we
; do the modulus operation, we will have to subtract either nothing or just
; one 0x7FFFFFFF. With this model of addition, the subtraction only
; occurs very rarely.
;
; David Carta's explanation for why this produces the correct answer is too
; long to repeat here. Mine is easy to understand.
;
; Lets define some labels:
;
; Q = 31 bits 30 to 0.
; P = 15 bits 14 to 0.
;
; If we were doing 46 bit math, the multiplication product (seed * 16807)
; would be:
;
; Q
; + (P * 0x80000000)
;
; Observe that this is the same as:
;
; Q
; + (P * 0x7FFFFFFF)
; + (P * 0x00000001)
;
; However, we don't need or want a 46 bit result. We only want that result
; mod(0x7FFFFFFF). Therfore we can ignore the middle line above and use for
; our result:
;
; Q
; + P
;
; dsPIC-specific details:
;
; Here is how we do the additions, and their carries. This is somewhat
; different from the detail of David Carta's approach, but we get the same
; results.
;
; We generate the partial results lo and hi as shown below, with
; two multiplies. :
;
; lo (31 bits) = seed low (16 bits) * 16807 (15 bits)
; hi (30 bits) = seed high (15 bits) * 16807 (15 bits)
;
; | | | |
; 4444 4444 3333 3333 3322 2222 2222 1111 1111 11
; 7654 3210 9876 5432 1098 7654 3210 9876 5432 1098 7654 3210
;
; / W1 | W0 \
; lo 31 0xxx xxxx xxxx xxxx xxxx xxxx xxxx xxxx
; hi 30 00xx xxxx xxxx xxxx xxxx xxxx xxxx xxxx
; \ W3 | W2 /
;
; 16807 is 0x41A7. The maximum results are for the maximum allowable
; seed value of 0x7FFFFFFE:
;
; lo = 0xFFFE * 0x41A7 = 0x41A67CB2
; hi = 0x7FFF * 0x41A7 = 0x20D33E59
;
; | | | |
; 4444 4444 3333 3333 3322 2222 2222 1111 1111 11
; 7654 3210 9876 5432 1098 7654 3210 9876 5432 1098 7654 3210
;
; lo 31 0100 0001 1010 0110 0111 1100 1011 0010
; hi 30 0010 0000 1101 0011 0011 1110 0101 1001
;
; Note these bits of hi: ^^^^ ^^^^ ^^^^ ^^^^
; could be all 1s, or nearly
; all 1s, when the higher bits
; have lower values than those
; above.
;
; Now consider where the split is between p and q in David Carta's model:
;
; | | | |
; 4444 4444 3333 3333 3322 2222 2222 1111 1111 11
; 7654 3210 9876 5432 1098 7654 3210 9876 5432 1098 7654 3210
;
; / W1 | W0 \
; lo 31 0100 0001 1010 0110 0111 1100 1011 0010
; hi 30 0010 0000 1101 0011 0011 1110 0101 1001
; qqq qqqq qqqq qqqq qqqq qqqq qqqq qqqq
; pp pppp pppp pppp p
; \ W3 | W2 /
;
; According to David Carta's principle, we need to treat the p bits totally
; differently from the q bits. They are beyond the 31 bit range of the
; final result. In our physical implementation, we have the 30 q bits and
; the lowest p bit in register pair "lo". But we also have some q bits in
; register pair "hi" along with the rest of the p bits. We have two tasks:
;
; 1 - Add the q bits in "hi" into register pair "lo", starting at bit 16.
; This can produce an overflow beyond this 32 bit register pair, into
; the Carry flag. We must then treat that Carry bit as the same weight
; as the least significant p bit. (Alternatively, do a mod(0x7FFFFFFF)
; operation on the result, which clears bit 31 and increments the
; register. That increment has the same function as adding 1 to the p
; bits.
;
; 2 - Get the p bits and add them to register pair "lo", starting at bit 0.
;
; Doing it in this order is OK, but we have two potential carry situations.
; My approach does it in the opposite order. First we add the p bits as
; shown above (not including any carry from adding low "hi" bits to high
; "lo" bits - we haven't done that yet) into the low end of "lo". The
; exact way we do this is a bit of a trick. We Shift Left W2, to put its
; bit 15, which is a 'p' bit, into the Carry flag. Then we Rotate Left
; with Carry W3. This means that W3 contains all the 'p' bits, ready
; to add to register pair "lo". W2 is still in a shifted state:
;
; W2 = qqqq qqqq qqqq qqq0
;
; but we will sort that out in a moment.
;
; We add the p bits in W3 to W0. This may cause a carry, which needs to
; be propagated into W1. We do in a moment. Now, we restore W2 to:
;
; W2 = 0qqq qqqq qqqq qqqq
;
; We need to add W2 into W1, and we also have the Carry bit to add - so
;
; W1 = W1 + W2 + Carry bit
;
; Then we are ready to do the mod(0x7FFFFFFF) operation which produces
; the final result.
;
;
; This subroutine, on average uses 18 clock cycles including the 2 cycle
; call and the 3 cycle return. It could be rewritten to be faster as
; inline code, rather than a subroutine, and by using two Wx registers for
; the seed, rather than the two file registers RAND31L and RAND31H.
;
; dsPICs can run at 30 MHz, which means they can create well over a
; million results a second. Power consumption at 30 MHz 5 volts is
; typically 730mW (145 mA for the 64 pin dsPIC30F6012) or 335mW
; (67mA 28 pin dsPIC30F2010).
;
; There must be two words of RAM:
;
; ; 31 bit value, 16 bits in
; ; low word and 15 bits in
; ; high word. Must be
; ; initialised to between 1
; ; and 0x7FFFFFFE inclusive.
; RAND31L: .space 2 ;
; RAND31H: .space 2 ;
;
; This subroutine uses the registers:
;
; W0 = Constant 16807.
; Then lo bits 15 to 0.
; Finally, returns the lower 16 bits of the new
; pseudo-random number.
;
; W1 = First, lo bits 30 to 16.
; Then returns the upper 15 bits of the new
; pseudo-random number.
;
; W3-W2 = hi
;
; W12 = pointer to save a CPU cycle by making RAND31H/L
; available for the multiply operations without
; the two CPU cycles required for copying their
; contents into registers. See below for getting
; rid of the use of this register and so making the
; code require 1 more CPU cycle.
;
RAND31: ;
; W0 = 16807 = 7^5.
mov #16807, W0 ;
; W12 points to RAND31H to save
; a cycle loading the high and low
; parts of the seed. Alternatively
; don't use W12 and instead use:
;
; ; Calculate hi.
; mov RAND31H, W1 ;
; mul.uu W0, W1, W2 ;
; ; Calculate lo.
; mov RAND31L, W1 ;
; mul.uu W0, W1, W0 ;
;
mov #RAND31H, W12 ;
; Calculate hi.
mul.uu W0, [W12--], W2 ;
; Calculate lo.
mul.uu W0, [W12], W0 ;
;
;
; W3-W2 = 00pp pppp pppp pppp pqqq qqqq qqqq qqqq
;
; We need to get W3 to the state of:
;
; 0ppp pppp pppp pppp
;
; Shift left W2 to put its 'p' bit
; into the carry bit.
sl W2, W2 ;
; W2 = qqqq qqqq qqqq qqq0
;
; Now Rotate Left with Carry W3 to get
; that p bit into bit 0.
rlc W3, W3 ;
; W3 = 0ppp pppp pppp pppp
;
; Restore W2 with 0 in bit 15.
lsr W2, W2 ;
; W2 = 0qqq qqqq qqqq qqqq
;
; Add with carry W3 into W0. This may
; set the carry bit. We must propagate
; the carry bit into W1.
addc W0, W3, W0 ;
;
; Propagate the carry into W1 with an
; addc whilst also adding in W2. This
; may set bit 15, but it won't
; overflow into the carry bit.
addc W2, W1, W1 ;
;
; We need to do a mod(0x7FFFFFF) on
; this.
;
; (Note that this is really a test
; for < 0x7FFFFFFF rather than
; <= 0x7FFFFFFF. However, the
; sequence contains numbers only up
; to 0x7FFFFFFE.)
;
; If this propagation of carry sets
; bit 31 of lo (W1 bit 15) then the
; result has exceeded 0x7FFFFFFF and
; we need to subtract 0x7FFFFFFF by
; clearing that bit (like subtracting
; 0x8000000), and incrementing W1-W0.
;
; Detect this condition by the N flag
; which will be set if bit 15 of W1 is
; set. 75% of the time, the N flag
; is clear, so we want to make that
; path of execution as short as .
; possible. The way to do this is
; have one set of exit code here, and
; the same instructions following the
; code which handles the overflow.
;
; 9 CPU cycles so far, not counting
; call instruction.
bra N, RAND31C ;
; Store the result to RAND31H and
; RAND31L. Leave results in W1 - W0
; for the calling code to use.
mov W1, RAND31H ;
mov W0, RAND31L ;
return ;
;
; 75% of the time, the N flag is
; clear, so there is 1 cycle for the
; bra instruction, 2 more for the
; next two instructions and 3 cycles
; for the return.
;
; 25% of the time, the N flag is
; set, so the bra has 2 cycles, plus
; 5 cycles for the first five
; instructions at RAND31C plus 3 for
; the return.
;
; On average, this is:
;
; (0.75 * 6) + (0.25 * 10)
; = 4.5 + 2.5
; = 7 cycles on average.
;
; This is 16 cycles on average, plus
; 2 for the call = 18 CPU cycles
; total subroutine call overhead.
;
RAND31C: ;
; Subtract 0x7FFFFFFF from the result
; by clearing bit 31 (like subtracting
; or adding 0x80000000) and then
; adding 1.
bclr W1, #15 ;
inc W0, W0 ;
addc #0, W1 ;
; Store the result to RAND31H and
; RAND31L. Leave results in W1 - W0
; for the calling code to use.
mov W1, RAND31H ;
mov W0, RAND31L ;
return ;
;
;
; --------------------------------