property that when $|x|$ is large enough,

$$x \cdot \frac{\partial S}{\partial x} > 1. \tag{2.68}$$

In this situation, it is possible to generate samples of $x$ according to $p(x)$ by using Langevin's equation [10]. If $S$ has the above contraction property and satisfies some smoothness requirements [86, 106], which usually turn out to be physically reasonable, a Langevin process $x(t)$ given by the following equation will converge to a stationary state:

$$dx(t) = -\frac{1}{2}\frac{\partial S}{\partial x}\,dt + dw(t), \tag{2.69}$$

where $w$ is a Brownian motion, and the stationary distribution will be $p(x)$. Equation (2.69) follows from the Fokker–Planck equation (see [10], equation 11.3.15) and (2.79) below. What are we to make of $w$? Its properties are as follows.

1. The probability density for $w$ satisfies the heat equation:
$$\frac{\partial p(w,t)}{\partial t} = \frac{1}{2}\,\Delta p(w,t),$$
where $\Delta$ is the Laplace operator in $n$ dimensions, that is,
$$\Delta p = \sum_{i=1}^{n} \frac{\partial^2 p}{\partial w_i^2}.$$

2. The $t = 0$ initial distribution for $p$ is $p(w, t = 0) = \delta(w)$.

3. Its increments $dw$ (when $t \to t + dt$) satisfy the equations
$$E\,dw_i(t) = 0, \qquad E\,dw_i(t)\,dw_j(s) = \delta_{ij}\,\delta(s - t)\,dt\,ds.$$
In small but finite form, these equations are
$$E\,\Delta w_i(t_k) = 0, \tag{2.70}$$
$$E\,\Delta w_i(t_k)\,\Delta w_j(t_l) = h\,\delta_{ij}\,\delta_{kl}, \tag{2.71}$$
where $h$ is the time step $h = \Delta t$.

In these equations, $\delta_{ij} = 1$ if $i = j$ and zero otherwise: the $\delta_{ij}$ are the elements of the identity matrix in $n$ dimensions; and $\delta(x)$ is the Dirac delta function, zero everywhere except $x = 0$, but $\int \delta(x)\,d^n x = 1$. Item (1) says $w(t)$ is a diffusion process, which by item (2) means $w(0)$ starts at the origin and diffuses out with a mean square $E|w|^2 = n \cdot t$. Item (3) says increments $dw$ are only locally correlated.
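The finite-increment rules (2.70) and (2.71) are all one needs to simulate $w$: set $\Delta w = \sqrt{h}\,\xi$ with $\xi$ a vector of independent standard normal draws. The following is a minimal NumPy check of the two moment conditions; the choices of $n$, $h$, and sample count are illustrative assumptions, not values from the text.

```python
import numpy as np

# Illustrative check (not from the text) that increments Delta w = sqrt(h)*xi,
# with xi standard normal, satisfy (2.70)-(2.71):
#   E[Delta w_i] = 0  and  E[Delta w_i Delta w_j] = h * delta_ij.
rng = np.random.default_rng(42)
n, h, samples = 3, 0.01, 200_000      # dimension, time step, sample count

dw = np.sqrt(h) * rng.standard_normal((samples, n))

print("mean       (expect ~0):   ", dw.mean(axis=0))
print("covariance (expect ~h*I):")
print(dw.T @ dw / samples)
```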
There is an intimate connection between parabolic (and elliptic) partial differential equations and stochastic differential equations. Here is a one-dimensional version for the case of a continuous Markov process $y$:

$$p(y|x, t) = \text{transition probability of } x \to y \text{ after time } t. \tag{2.72}$$

The important properties of $p(y|x, t)$ are

$$\int p(y|x, t)\, dy = 1, \tag{2.73}$$

$$p(z|x, t_1 + t_2) = \int p(z|y, t_2)\, p(y|x, t_1)\, dy, \tag{2.74}$$

where (2.73) says that $x$ must go somewhere in time $t$, and (2.74) says that after $t_1$, $x$ must have gone somewhere (i.e. $y$), and if we sum over all possible $y$s, we will get all probabilities. Equation (2.74) is known as the Chapman–Kolmogorov–Smoluchowski equation [10]; its variants extend to quantum probabilities and gave rise to Feynman's formulation of quantum mechanics, see Feynman–Hibbs [50].
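Equation (2.74) can be verified numerically in the simplest case, the Brownian transition density $p(y|x,t) = e^{-(y-x)^2/(2t)}/\sqrt{2\pi t}$. The sketch below is an added illustration; the end points, times, and quadrature grid are arbitrary choices.

```python
import numpy as np

# Numerical check (illustration, not from the text) of Chapman-Kolmogorov
# (2.74) for the Brownian transition density p(y|x,t).
def p(y, x, t):
    return np.exp(-(y - x) ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)

x, z, t1, t2 = 0.3, 1.1, 0.5, 0.7        # arbitrary test points and times

lhs = p(z, x, t1 + t2)                   # direct transition over t1 + t2

# Integrate over all intermediate points y with a simple rectangle rule.
y, dy = np.linspace(-20.0, 20.0, 20001, retstep=True)
rhs = np.sum(p(z, y, t2) * p(y, x, t1)) * dy

print(lhs, rhs)                          # should agree to quadrature accuracy
```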
We would like to find an evolution equation for the transition probability $p(y|x, t)$ as a function of $t$. To do so, let $f(y)$ be an arbitrary smooth function of $y$, say $f \in C^\infty$. Then using (2.74),

$$\int f(y)\,\frac{\partial p(y|x,t)}{\partial t}\, dy = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \int f(y)\,\big( p(y|x, t + \Delta t) - p(y|x, t) \big)\, dy$$
$$= \lim_{\Delta t \to 0} \frac{1}{\Delta t} \int f(y) \left( \int p(y|z, \Delta t)\, p(z|x, t)\, dz - p(y|x, t) \right) dy.$$

For small changes $\Delta t$, most of the contribution to $p(y|z, \Delta t)$ will be near $y \sim z$, so we expand

$$f(y) = f(z) + f_z(z)(y - z) + \tfrac{1}{2} f_{zz}(z)(y - z)^2 + \cdots$$

to get

$$\int f(y)\,\frac{\partial p(y|x,t)}{\partial t}\, dy = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \Bigg[ \iint f(z)\, p(y|z, \Delta t)\, p(z|x, t)\, dz\, dy$$
$$+ \iint \frac{\partial f(z)}{\partial z}\,(y - z)\, p(y|z, \Delta t)\, p(z|x, t)\, dz\, dy$$
$$+ \frac{1}{2} \iint \frac{\partial^2 f(z)}{\partial z^2}\,(y - z)^2\, p(y|z, \Delta t)\, p(z|x, t)\, dz\, dy$$
$$+ O\big((y - z)^3\big) - \int f(y)\, p(y|x, t)\, dy \Bigg]. \tag{2.75}$$

If most of the contributions to the $z \to y$ transition in a small time increment $\Delta t$ are around $y \sim z$, one makes the following assumptions [10]:

$$\int p(y|z, \Delta t)\,(y - z)\, dy = b(z)\,\Delta t + o(\Delta t), \tag{2.76}$$
$$\int p(y|z, \Delta t)\,(y - z)^2\, dy = a(z)\,\Delta t + o(\Delta t), \tag{2.77}$$
$$\int p(y|z, \Delta t)\,(y - z)^m\, dy = o(\Delta t), \quad \text{for } m > 2.$$

Substituting (2.76) and (2.77) into (2.75), and using (2.73) and an integration variable substitution, we get in the $\Delta t \to 0$ limit

$$\int f(z)\,\frac{\partial p(z|x,t)}{\partial t}\, dz = \int p(z|x, t) \left( \frac{\partial f}{\partial z}\, b(z) + \frac{1}{2}\frac{\partial^2 f}{\partial z^2}\, a(z) \right) dz$$
$$= \int f(z) \left( -\frac{\partial}{\partial z}\big[ b(z)\, p(z|x, t) \big] + \frac{1}{2}\frac{\partial^2}{\partial z^2}\big[ a(z)\, p(z|x, t) \big] \right) dz. \tag{2.78}$$

In the second line of (2.78), we integrated by parts and assumed that the large-$|y|$ boundary terms are zero. Since $f(y)$ is perfectly arbitrary, it must be true that

$$\frac{\partial p(y|x,t)}{\partial t} = -\frac{\partial}{\partial y}\big[ b(y)\, p(y|x, t) \big] + \frac{1}{2}\frac{\partial^2}{\partial y^2}\big[ a(y)\, p(y|x, t) \big]. \tag{2.79}$$

Equation (2.79) is the Fokker–Planck equation and is the starting point for many probabilistic representations of solutions for partial differential equations. Generalization of (2.79) to dimensions greater than one is not difficult. The coefficient $b(y)$ becomes a vector (drift coefficients) and $a(y)$ becomes a symmetric matrix (diffusion matrix). In our example (2.69), the coefficients are $b(x) = -\frac{1}{2}\frac{\partial S}{\partial x}$ and $a(x) = I$, the identity matrix. The Fokker–Planck equation for (2.69) is thus

$$\frac{\partial p}{\partial t} = \frac{1}{2}\frac{\partial}{\partial x} \cdot \left( \frac{\partial S}{\partial x}\, p + \frac{\partial p}{\partial x} \right).$$

And the stationary state is reached when $\partial p / \partial t \to 0$, so as $t \to \infty$

$$\frac{\partial S}{\partial x}\, p + \frac{\partial p}{\partial x} = \text{constant} = 0,$$
if $p$ goes to zero as $|x| \to \infty$. Integrating once more,

$$p(x, t \to \infty) = \frac{e^{-S(x)}}{Z},$$

where $1/Z$ is the normalization constant, hence we get (2.68). How, then, do we simulate Equation (2.69)? We only touch on the easiest way here, a simple forward Euler method:

$$\Delta w = \sqrt{h}\,\xi,$$
$$\Delta x = -\frac{h}{2} \left.\frac{\partial S}{\partial x}\right|_{x_k} + \Delta w, \tag{2.80}$$
$$x_{k+1} = x_k + \Delta x.$$

At each time step of size $h$, a new vector $\xi$ of mutually independent, univariate, normally distributed random numbers is generated by the Box–Muller method (see Equation (2.66)), and the increments are simulated by $\Delta w = \sqrt{h}\,\xi$. The most useful parallel form is to compute a sample of $N$ such vectors, $i = 1, \ldots, N$, per time step:

$$\Delta w^{[i]} = \sqrt{h}\,\xi^{[i]},$$
$$\Delta x^{[i]} = -\frac{h}{2} \left.\frac{\partial S}{\partial x}\right|_{x_k^{[i]}} + \Delta w^{[i]},$$
$$x_{k+1}^{[i]} = x_k^{[i]} + \Delta x^{[i]}. \tag{2.81}$$

If $S$ is contracting, that is, (2.68) holds when $|x|$ is large, then $x(t)$ will converge to a stationary process and $Ef$ in (2.67) may be computed by a long time average [118]:

$$Ef \approx \frac{1}{T}\int_0^T f(x(t))\, dt \approx \frac{1}{mN} \sum_{k=1}^{m} \sum_{i=1}^{N} f\big(x_k^{[i]}\big). \tag{2.82}$$

Two features of (2.82) should be noted: (1) the number of time steps chosen is $m$, that is, $T = m \cdot h$; and (2) we can accelerate the convergence by the sample average over $N$. The convergence rate, measured by the variance, is $O(1/mN)$: the statistical error is $O(1/\sqrt{mN})$. Process $x(t)$ will become stationary when $m$ is large enough that $T = h \cdot m$ exceeds the relaxation time of the dynamics: to lowest order, this means that relative to the smallest eigenvalue $\lambda_{\mathrm{small}} > 0$ of the Hessian $[\partial^2 S / \partial x_i \partial x_j]$, we need $T\,\lambda_{\mathrm{small}} \gg 1$.
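In code, the scheme (2.80)-(2.82) is only a few lines. The sketch below is a minimal NumPy illustration, not the book's implementation: the action $S(x) = |x|^2/2$ is an assumed test case (its stationary density $e^{-S}/Z$ is the standard normal), and ordinary normal draws stand in for the Box–Muller generator of (2.66).

```python
import numpy as np

# Sketch of the explicit Euler scheme (2.80)-(2.82), vectorized over an
# ensemble of N independent paths.  S(x) = |x|^2 / 2 is a test choice (not
# from the text); its stationary density e^{-S}/Z is the standard normal,
# so E[f] with f(x) = x_0^2 should converge to 1.
rng = np.random.default_rng(0)
n, N = 4, 256          # dimension of x, ensemble size
h, m = 0.01, 20_000    # time step and number of steps, T = m*h

def grad_S(x):
    return x           # dS/dx for the assumed S(x) = |x|^2 / 2

f = lambda x: x[:, 0] ** 2      # observable, evaluated per ensemble member

x = np.zeros((N, n))            # all paths start at the origin
acc = 0.0
for k in range(m):
    dw = np.sqrt(h) * rng.standard_normal((N, n))   # increments (2.70)-(2.71)
    x += -0.5 * h * grad_S(x) + dw                  # Euler step (2.80)/(2.81)
    acc += f(x).sum()

print("time/ensemble average (2.82):", acc / (m * N))   # ~ E[x_0^2] = 1
```

Note that the inner work per step is embarrassingly parallel over the ensemble index $i$, which is exactly the point of the parallel form (2.81).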
There are several reasons to find (2.82) attractive. If $n$, the size of the vectors $x$ and $w$, is very large, $N = 1$ may be chosen to save memory. The remaining $i = 1$ equation is also very likely to be easy to parallelize by the usual matrix methods for large $n$. Equation (2.80) is an explicit equation: only old data $x_k$ appear on the right-hand side. The downside of such an explicit formulation is that Euler methods can be unstable for a large step size $h$. In that situation, higher order methods, implicit or semi-implicit methods, or smaller time steps can be used [86, 106, 124]. In each of these modifications, parallelism is preserved. To end this digression on Langevin equations (stochastic differential equations), we show examples of two simple simulations. In Figure 2.21, we show 32 simulated paths of a two-dimensional Brownian motion. All paths begin at the origin, and with $w = (w_1, w_2)^T$, the update at each time step is

$$w_{k+1} = w_k + \sqrt{h} \begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix}.$$

The pair $\xi = (\xi_1, \xi_2)$ is generated at each step by (2.66), and $h = 0.01$.

Fig. 2.21. Simulated two-dimensional Brownian motion. There are 32 simulated paths, with time step $h = 0.1$ and 100 steps: that is, the final time is $t = 100h$.
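For reference, here is a short sketch that reproduces the experiment of Figure 2.21. It is an added illustration: matplotlib, the seed, and the caption's $h = 0.1$ are assumptions, and ordinary normal draws stand in for the Box–Muller pairs of (2.66).

```python
import numpy as np
import matplotlib.pyplot as plt

# Sketch of the Fig. 2.21 experiment: 32 two-dimensional Brownian paths,
# all starting at the origin, advanced by w_{k+1} = w_k + sqrt(h) * xi.
rng = np.random.default_rng(1)
paths, steps, h = 32, 100, 0.1

# Cumulative sums of sqrt(h)-scaled normal increments give the paths.
dw = np.sqrt(h) * rng.standard_normal((paths, steps, 2))
w = np.concatenate([np.zeros((paths, 1, 2)), dw.cumsum(axis=1)], axis=1)

for i in range(paths):
    plt.plot(w[i, :, 0], w[i, :, 1], linewidth=0.5)
plt.xlabel("x")
plt.ylabel("y")
plt.show()
```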
Fig. 2.22. Convergence of the optimal control process: histogram (log-scale frequency) of the process values $x(t)$ for $dx = -(x/|x|)\,dt + dw$ at $t = 100$, with $x_0 = 0$. The parameters are $N = 10{,}240$ and time step $h = 0.01$. Integration was by an explicit trapezoidal rule [120].

And finally, in Figure 2.22, we show the distribution of a converged optimal control process. The Langevin equation (2.80) for this one-dimensional example is

$$dx = -\operatorname{sign}(x)\, dt + dw.$$

It may seem astonishing at first to see that the drift term, $-\operatorname{sign}(x)\, dt$, is sufficiently contracting to force the process to converge. But it does converge, and to a symmetric exponential distribution, $p \sim \exp(-2|x|)$. Again, this is the Fokker–Planck equation (2.79), which in this case takes the form

$$\frac{\partial p}{\partial t} = \frac{\partial}{\partial x}\big[\operatorname{sign}(x)\, p\big] + \frac{1}{2}\frac{\partial^2 p}{\partial x^2} \;\longrightarrow\; 0 \quad \text{as } t \to \infty.$$

The evolution $\partial p / \partial t \to 0$ means the distribution becomes stationary (time independent), and $p = e^{-2|x|}$ follows from two integrations. The first is to a constant in terms of the $x = \pm\infty$ boundary values of $p$ and $p'$, which must both be zero if $p$ is normalizable, that is, $\int p(x, t)\, dx = 1$. The second integration constant is determined by the same normalization.
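The claimed limit $p = e^{-2|x|}$ is easy to test numerically. Below is a minimal sketch, assuming a plain forward Euler step (2.80) rather than the explicit trapezoidal rule used for Figure 2.22; the seed and bin choices are arbitrary.

```python
import numpy as np

# Sketch of the Fig. 2.22 experiment: forward Euler for the one-dimensional
# process dx = -sign(x) dt + dw over an ensemble of N samples, followed by a
# spot check of the stationary density p(x) = exp(-2|x|), which is already
# normalized on the whole line.
rng = np.random.default_rng(2)
N, h, m = 10_240, 0.01, 10_000      # ensemble size, step, steps (t = m*h = 100)

x = np.zeros(N)
for _ in range(m):
    x += -np.sign(x) * h + np.sqrt(h) * rng.standard_normal(N)

hist, edges = np.histogram(x, bins=50, range=(-5, 5), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for c, e in zip(centers[::10], hist[::10]):
    print(f"x = {c:5.2f}   empirical {e:.3f}   exact {np.exp(-2*abs(c)):.3f}")
```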
Exercise 2.1 MC integration in one dimension. As we discussed in Section 2.5, MC methods are particularly useful for doing integrations. However, we also pointed out that the statistical error is proportional to $N^{-1/2}$, where $N$ is the sample size. This statistical error can be large, since $N^{-1/2} = 1$ percent when $N = 10^4$. The keyword here is proportional. That is, although the statistical error is $k \cdot N^{-1/2}$ for some constant $k$, this constant can be reduced significantly. In this exercise, we want you to try some variance reduction schemes.

What is to be done? The following integral

$$I_0(x) = \frac{1}{\pi} \int_0^\pi e^{-x \cos \zeta}\, d\zeta$$

is known to have the solution $I_0(x)$, the zeroth order modified Bessel function with argument $x$; see, for example, Abramowitz and Stegun [2], or any comparable table.

1. The method of antithetic variables applies in any given domain of integration where the integrand is monotonic [99]. Notice that for all positive $x$, the integrand $\exp(-x \cos \zeta)$ is monotonically decreasing on $0 \le \zeta \le \pi/2$ and is symmetric around $\zeta = \pi/2$. So the antithetic variates method applies if we integrate only to $\zeta = \pi/2$ and double the result. For $x = 1$, the antithetic variates method is

$$I_0(1) \approx I_+ + I_- = \frac{1}{N} \sum_{i=1}^{N} e^{-\cos(\pi u_i/2)} + \frac{1}{N} \sum_{i=1}^{N} e^{-\cos(\pi (1 - u_i)/2)}.$$

Try the $I_+ + I_-$ antithetic variates method for $I_0(1)$ and compute the variance (e.g. do several runs with different seeds) for both this method and a raw integration of the same sample size $N$, and compare the results; a sketch of this setup appears after the exercise text below.

2. The method of control variates also works well here. The idea is that an integral

$$I = \int_0^1 g(u)\, du$$

can be rewritten for a control variate $\varphi$ as

$$I = \int_0^1 \big( g(u) - \varphi(u) \big)\, du + \int_0^1 \varphi(u)\, du \approx \frac{1}{N} \sum_{i=1}^{N} \big[ g(u_i) - \varphi(u_i) \big] + I_\varphi,$$

where $I_\varphi$ is the same as $I$ except that $g$ is replaced by $\varphi$. The method consists in picking a useful $\varphi$ whose integral $I_\varphi$ is known. A good one for the modified Bessel function is

$$\varphi(u) = 1 - \cos(\pi u/2) + \tfrac{1}{2}\big(\cos(\pi u/2)\big)^2,$$
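For part 1, here is a minimal sketch of the antithetic idea (my own setup, not the solution): it uses the whole-interval substitution $\zeta = \pi u$, so the pairing is $u \leftrightarrow 1 - u$ with $g(u) = e^{-\cos(\pi u)}$, and the half-interval form above is handled analogously. It repeats both estimators over several seeds to compare their spread.

```python
import numpy as np

# Antithetic variates for I_0(1), using the substitution zeta = pi*u over
# the whole interval (an illustrative re-parameterization, not the text's
# half-interval form).  g is monotone on [0,1], so g(u) and g(1-u) are
# negatively correlated and the averaged pair has reduced variance.
def g(u):
    return np.exp(-np.cos(np.pi * u))

# Reference value of I_0(1) by a fine midpoint rule on (1/pi) * int_0^pi.
zeta = (np.arange(100_000) + 0.5) * np.pi / 100_000
ref = np.exp(-np.cos(zeta)).mean()

N, runs = 10_000, 20                     # arbitrary sample size, repeat count
raw, anti = [], []
for seed in range(runs):
    rng = np.random.default_rng(seed)
    u = rng.random(N)
    raw.append(g(u).mean())                          # raw MC estimate
    anti.append(0.5 * (g(u) + g(1.0 - u)).mean())    # antithetic pairs

print("reference I0(1):", ref)                       # ~1.2661
print("raw  mean, std :", np.mean(raw), np.std(raw))
print("anti mean, std :", np.mean(anti), np.std(anti))
```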