property that when $|x|$ is large enough,

$$x \cdot \frac{\partial S}{\partial x} > 1. \tag{2.68}$$

In this situation, it is possible to generate samples of $x$ according to $p(x)$ by using Langevin's equation [10]. If $S$ has the above contraction property and satisfies some smoothness requirements [86, 106], which usually turn out to be physically reasonable, a Langevin process $x(t)$ given by the following equation will converge to a stationary state:

$$dx(t) = -\frac{1}{2}\frac{\partial S}{\partial x}\,dt + dw(t), \tag{2.69}$$

where $w$ is a Brownian motion, and the stationary distribution will be $p(x)$. Equation (2.69) follows from the Fokker–Planck equation (see [10], equation 11.3.15) and (2.79) below. What are we to make of $w$? Its properties are as follows.

1. The probability density for $w$ satisfies the heat equation:
$$\frac{\partial p(w,t)}{\partial t} = \frac{1}{2}\,\Delta p(w,t),$$
where $\Delta$ is the Laplace operator in $n$ dimensions, that is,
$$\Delta p = \sum_{i=1}^{n} \frac{\partial^2 p}{\partial w_i^2}.$$

2. The $t = 0$ initial distribution for $p$ is $p(w, t = 0) = \delta(w)$.

3. Its increments $dw$ (when $t \to t + dt$) satisfy the equations
$$E\,dw_i(t) = 0, \qquad E\,dw_i(t)\,dw_j(s) = \delta_{ij}\,\delta(s - t)\,dt\,ds.$$
In small but finite form, these equations are
$$E\,\Delta w_i(t_k) = 0, \tag{2.70}$$
$$E\,\Delta w_i(t_k)\,\Delta w_j(t_l) = h\,\delta_{ij}\,\delta_{kl}, \tag{2.71}$$
where $h$ is the time step $h = \Delta t$.

In these equations, $\delta_{ij} = 1$ if $i = j$ and zero otherwise: the $\delta_{ij}$ are the elements of the identity matrix in $n$ dimensions; and $\delta(x)$ is the Dirac delta function, zero everywhere except $x = 0$, but $\int \delta(x)\,d^n x = 1$. Item (1) says $w(t)$ is a diffusion process, which by item (2) means $w(0)$ starts at the origin and diffuses out with a mean square $E|w|^2 = n \cdot t$. Item (3) says increments $dw$ are only locally correlated.
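The finite-increment rules (2.70) and (2.71) are all one needs to simulate $w$: set $\Delta w = \sqrt{h}\,\xi$ with $\xi$ a vector of independent standard normal draws. The following is a minimal NumPy check of the two moment conditions; the choices of $n$, $h$, and sample count are illustrative assumptions, not values from the text.

```python
import numpy as np

# Illustrative check (not from the text) that increments Delta w = sqrt(h)*xi,
# with xi standard normal, satisfy (2.70)-(2.71):
#   E[Delta w_i] = 0  and  E[Delta w_i Delta w_j] = h * delta_ij.
rng = np.random.default_rng(42)
n, h, samples = 3, 0.01, 200_000      # dimension, time step, sample count

dw = np.sqrt(h) * rng.standard_normal((samples, n))

print("mean       (expect ~0):   ", dw.mean(axis=0))
print("covariance (expect ~h*I):")
print(dw.T @ dw / samples)
```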
There is an intimate connection between parabolic (and elliptic) partial differential equations and stochastic differential equations. Here is a one-dimensional version for the case of a continuous Markov process $y$:

$$p(y|x, t) = \text{transition probability of } x \to y \text{ after time } t. \tag{2.72}$$

The important properties of $p(y|x, t)$ are

$$\int p(y|x, t)\, dy = 1, \tag{2.73}$$

$$p(z|x, t_1 + t_2) = \int p(z|y, t_2)\, p(y|x, t_1)\, dy, \tag{2.74}$$

where (2.73) says that $x$ must go somewhere in time $t$, and (2.74) says that after $t_1$, $x$ must have gone somewhere (i.e. $y$), and if we sum over all possible $y$s, we will get all probabilities. Equation (2.74) is known as the Chapman–Kolmogorov–Smoluchowski equation [10]; its variants extend to quantum probabilities and gave rise to Feynman's formulation of quantum mechanics, see Feynman–Hibbs [50].
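Equation (2.74) can be verified numerically in the simplest case, the Brownian transition density $p(y|x,t) = e^{-(y-x)^2/(2t)}/\sqrt{2\pi t}$. The sketch below is an added illustration; the end points, times, and quadrature grid are arbitrary choices.

```python
import numpy as np

# Numerical check (illustration, not from the text) of Chapman-Kolmogorov
# (2.74) for the Brownian transition density p(y|x,t).
def p(y, x, t):
    return np.exp(-(y - x) ** 2 / (2.0 * t)) / np.sqrt(2.0 * np.pi * t)

x, z, t1, t2 = 0.3, 1.1, 0.5, 0.7        # arbitrary test points and times

lhs = p(z, x, t1 + t2)                   # direct transition over t1 + t2

# Integrate over all intermediate points y with a simple rectangle rule.
y, dy = np.linspace(-20.0, 20.0, 20001, retstep=True)
rhs = np.sum(p(z, y, t2) * p(y, x, t1)) * dy

print(lhs, rhs)                          # should agree to quadrature accuracy
```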
We would like to find an evolution equation for the transition probability $p(y|x, t)$ as a function of $t$. To do so, let $f(y)$ be an arbitrary smooth function of $y$, say $f \in C^\infty$. Then using (2.74),

$$\int f(y)\,\frac{\partial p(y|x,t)}{\partial t}\, dy = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \int f(y)\,\big( p(y|x, t + \Delta t) - p(y|x, t) \big)\, dy$$
$$= \lim_{\Delta t \to 0} \frac{1}{\Delta t} \int f(y) \left( \int p(y|z, \Delta t)\, p(z|x, t)\, dz - p(y|x, t) \right) dy.$$

For small changes $\Delta t$, most of the contribution to $p(y|z, \Delta t)$ will be near $y \sim z$, so we expand

$$f(y) = f(z) + f_z(z)(y - z) + \tfrac{1}{2} f_{zz}(z)(y - z)^2 + \cdots$$

to get

$$\int f(y)\,\frac{\partial p(y|x,t)}{\partial t}\, dy = \lim_{\Delta t \to 0} \frac{1}{\Delta t} \Bigg[ \iint f(z)\, p(y|z, \Delta t)\, p(z|x, t)\, dz\, dy$$
$$+ \iint \frac{\partial f(z)}{\partial z}\,(y - z)\, p(y|z, \Delta t)\, p(z|x, t)\, dz\, dy$$
$$+ \frac{1}{2} \iint \frac{\partial^2 f(z)}{\partial z^2}\,(y - z)^2\, p(y|z, \Delta t)\, p(z|x, t)\, dz\, dy$$
$$+ O\big((y - z)^3\big) - \int f(y)\, p(y|x, t)\, dy \Bigg]. \tag{2.75}$$

If most of the contributions to the $z \to y$ transition in a small time increment $\Delta t$ are around $y \sim z$, one makes the following assumptions [10]:

$$\int p(y|z, \Delta t)\,(y - z)\, dy = b(z)\,\Delta t + o(\Delta t), \tag{2.76}$$
$$\int p(y|z, \Delta t)\,(y - z)^2\, dy = a(z)\,\Delta t + o(\Delta t), \tag{2.77}$$
$$\int p(y|z, \Delta t)\,(y - z)^m\, dy = o(\Delta t), \quad \text{for } m > 2.$$

Substituting (2.76) and (2.77) into (2.75), and using (2.73) and an integration variable substitution, we get in the $\Delta t \to 0$ limit

$$\int f(z)\,\frac{\partial p(z|x,t)}{\partial t}\, dz = \int p(z|x, t) \left( \frac{\partial f}{\partial z}\, b(z) + \frac{1}{2}\frac{\partial^2 f}{\partial z^2}\, a(z) \right) dz$$
$$= \int f(z) \left( -\frac{\partial}{\partial z}\big[ b(z)\, p(z|x, t) \big] + \frac{1}{2}\frac{\partial^2}{\partial z^2}\big[ a(z)\, p(z|x, t) \big] \right) dz. \tag{2.78}$$

In the second line of (2.78), we integrated by parts and assumed that the large-$|y|$ boundary terms are zero. Since $f(y)$ is perfectly arbitrary, it must be true that

$$\frac{\partial p(y|x,t)}{\partial t} = -\frac{\partial}{\partial y}\big[ b(y)\, p(y|x, t) \big] + \frac{1}{2}\frac{\partial^2}{\partial y^2}\big[ a(y)\, p(y|x, t) \big]. \tag{2.79}$$

Equation (2.79) is the Fokker–Planck equation and is the starting point for many probabilistic representations of solutions for partial differential equations. Generalization of (2.79) to dimensions greater than one is not difficult. The coefficient $b(y)$ becomes a vector (drift coefficients) and $a(y)$ becomes a symmetric matrix (diffusion matrix). In our example (2.69), the coefficients are $b(x) = -\frac{1}{2}\frac{\partial S}{\partial x}$ and $a(x) = I$, the identity matrix. The Fokker–Planck equation for (2.69) is thus

$$\frac{\partial p}{\partial t} = \frac{1}{2}\frac{\partial}{\partial x} \cdot \left( \frac{\partial S}{\partial x}\, p + \frac{\partial p}{\partial x} \right).$$

And the stationary state is reached when $\partial p / \partial t \to 0$, so as $t \to \infty$

$$\frac{\partial S}{\partial x}\, p + \frac{\partial p}{\partial x} = \text{constant} = 0,$$
if $p$ goes to zero as $|x| \to \infty$. Integrating once more,

$$p(x, t \to \infty) = \frac{e^{-S(x)}}{Z},$$

where $1/Z$ is the normalization constant, hence we get (2.68). How, then, do we simulate Equation (2.69)? We only touch on the easiest way here, a simple forward Euler method:

$$\Delta w = \sqrt{h}\,\xi,$$
$$\Delta x = -\frac{h}{2} \left.\frac{\partial S}{\partial x}\right|_{x_k} + \Delta w, \tag{2.80}$$
$$x_{k+1} = x_k + \Delta x.$$

At each time step of size $h$, a new vector $\xi$ of mutually independent, univariate, normally distributed random numbers is generated by the Box–Muller method (see Equation (2.66)), and the increments are simulated by $\Delta w = \sqrt{h}\,\xi$. The most useful parallel form is to compute a sample of $N$ such vectors, $i = 1, \ldots, N$, per time step:

$$\Delta w^{[i]} = \sqrt{h}\,\xi^{[i]},$$
$$\Delta x^{[i]} = -\frac{h}{2} \left.\frac{\partial S}{\partial x}\right|_{x_k^{[i]}} + \Delta w^{[i]},$$
$$x_{k+1}^{[i]} = x_k^{[i]} + \Delta x^{[i]}. \tag{2.81}$$

If $S$ is contracting, that is, (2.68) holds when $|x|$ is large, then $x(t)$ will converge to a stationary process and $Ef$ in (2.67) may be computed by a long time average [118]:

$$Ef \approx \frac{1}{T}\int_0^T f(x(t))\, dt \approx \frac{1}{mN} \sum_{k=1}^{m} \sum_{i=1}^{N} f\big(x_k^{[i]}\big). \tag{2.82}$$

Two features of (2.82) should be noted: (1) the number of time steps chosen is $m$, that is, $T = m \cdot h$; and (2) we can accelerate the convergence by the sample average over $N$. The convergence rate, measured by the variance, is $O(1/mN)$: the statistical error is $O(1/\sqrt{mN})$. Process $x(t)$ will become stationary when $m$ is large enough that $T = h \cdot m$ exceeds the relaxation time of the dynamics: to lowest order, this means that relative to the smallest eigenvalue $\lambda_{\mathrm{small}} > 0$ of the Hessian $[\partial^2 S / \partial x_i \partial x_j]$, we need $T\,\lambda_{\mathrm{small}} \gg 1$.
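In code, the scheme (2.80)-(2.82) is only a few lines. The sketch below is a minimal NumPy illustration, not the book's implementation: the action $S(x) = |x|^2/2$ is an assumed test case (its stationary density $e^{-S}/Z$ is the standard normal), and ordinary normal draws stand in for the Box–Muller generator of (2.66).

```python
import numpy as np

# Sketch of the explicit Euler scheme (2.80)-(2.82), vectorized over an
# ensemble of N independent paths.  S(x) = |x|^2 / 2 is a test choice (not
# from the text); its stationary density e^{-S}/Z is the standard normal,
# so E[f] with f(x) = x_0^2 should converge to 1.
rng = np.random.default_rng(0)
n, N = 4, 256          # dimension of x, ensemble size
h, m = 0.01, 20_000    # time step and number of steps, T = m*h

def grad_S(x):
    return x           # dS/dx for the assumed S(x) = |x|^2 / 2

f = lambda x: x[:, 0] ** 2      # observable, evaluated per ensemble member

x = np.zeros((N, n))            # all paths start at the origin
acc = 0.0
for k in range(m):
    dw = np.sqrt(h) * rng.standard_normal((N, n))   # increments (2.70)-(2.71)
    x += -0.5 * h * grad_S(x) + dw                  # Euler step (2.80)/(2.81)
    acc += f(x).sum()

print("time/ensemble average (2.82):", acc / (m * N))   # ~ E[x_0^2] = 1
```

Note that the inner work per step is embarrassingly parallel over the ensemble index $i$, which is exactly the point of the parallel form (2.81).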
There are several reasons to find (2.82) attractive. If $n$, the size of the vectors $x$ and $w$, is very large, $N = 1$ may be chosen to save memory. The remaining $i = 1$ equation is also very likely to be easy to parallelize by the usual matrix methods for large $n$. Equation (2.80) is an explicit equation: only old data $x_k$ appear on the right-hand side. The downside of such an explicit formulation is that Euler methods can be unstable for a large step size $h$. In that situation, higher order methods, implicit or semi-implicit methods, or smaller time steps can be used [86, 106, 124]. In each of these modifications, parallelism is preserved. To end this digression on Langevin equations (stochastic differential equations), we show examples of two simple simulations. In Figure 2.21, we show 32 simulated paths of a two-dimensional Brownian motion. All paths begin at the origin, and with $w = (w_1, w_2)^T$, the update at each time step is

$$w_{k+1} = w_k + \sqrt{h} \begin{pmatrix} \xi_1 \\ \xi_2 \end{pmatrix}.$$

The pair $\xi = (\xi_1, \xi_2)$ is generated at each step by (2.66), and $h = 0.01$.

Fig. 2.21. Simulated two-dimensional Brownian motion. There are 32 simulated paths, with time step $h = 0.1$ and 100 steps: that is, the final time is $t = 100h$.
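For reference, here is a short sketch that reproduces the experiment of Figure 2.21. It is an added illustration: matplotlib, the seed, and the caption's $h = 0.1$ are assumptions, and ordinary normal draws stand in for the Box–Muller pairs of (2.66).

```python
import numpy as np
import matplotlib.pyplot as plt

# Sketch of the Fig. 2.21 experiment: 32 two-dimensional Brownian paths,
# all starting at the origin, advanced by w_{k+1} = w_k + sqrt(h) * xi.
rng = np.random.default_rng(1)
paths, steps, h = 32, 100, 0.1

# Cumulative sums of sqrt(h)-scaled normal increments give the paths.
dw = np.sqrt(h) * rng.standard_normal((paths, steps, 2))
w = np.concatenate([np.zeros((paths, 1, 2)), dw.cumsum(axis=1)], axis=1)

for i in range(paths):
    plt.plot(w[i, :, 0], w[i, :, 1], linewidth=0.5)
plt.xlabel("x")
plt.ylabel("y")
plt.show()
```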
Fig. 2.22. Convergence of the optimal control process: histogram (log-scale frequency) of the process values $x(t)$ for $dx = -(x/|x|)\,dt + dw$ at $t = 100$, with $x_0 = 0$. The parameters are $N = 10{,}240$ and time step $h = 0.01$. Integration was by an explicit trapezoidal rule [120].

And finally, in Figure 2.22, we show the distribution of a converged optimal control process. The Langevin equation (2.80) for this one-dimensional example is

$$dx = -\operatorname{sign}(x)\, dt + dw.$$

It may seem astonishing at first to see that the drift term, $-\operatorname{sign}(x)\, dt$, is sufficiently contracting to force the process to converge. But it does converge, and to a symmetric exponential distribution, $p \sim \exp(-2|x|)$. Again, this is the Fokker–Planck equation (2.79), which in this case takes the form

$$\frac{\partial p}{\partial t} = \frac{\partial}{\partial x}\big[\operatorname{sign}(x)\, p\big] + \frac{1}{2}\frac{\partial^2 p}{\partial x^2} \;\longrightarrow\; 0 \quad \text{as } t \to \infty.$$

The evolution $\partial p / \partial t \to 0$ means the distribution becomes stationary (time independent), and $p = e^{-2|x|}$ follows from two integrations. The first is to a constant in terms of the $x = \pm\infty$ boundary values of $p$ and $p'$, which must both be zero if $p$ is normalizable, that is, $\int p(x, t)\, dx = 1$. The second integration constant is determined by the same normalization.
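The claimed limit $p = e^{-2|x|}$ is easy to test numerically. Below is a minimal sketch, assuming a plain forward Euler step (2.80) rather than the explicit trapezoidal rule used for Figure 2.22; the seed and bin choices are arbitrary.

```python
import numpy as np

# Sketch of the Fig. 2.22 experiment: forward Euler for the one-dimensional
# process dx = -sign(x) dt + dw over an ensemble of N samples, followed by a
# spot check of the stationary density p(x) = exp(-2|x|), which is already
# normalized on the whole line.
rng = np.random.default_rng(2)
N, h, m = 10_240, 0.01, 10_000      # ensemble size, step, steps (t = m*h = 100)

x = np.zeros(N)
for _ in range(m):
    x += -np.sign(x) * h + np.sqrt(h) * rng.standard_normal(N)

hist, edges = np.histogram(x, bins=50, range=(-5, 5), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for c, e in zip(centers[::10], hist[::10]):
    print(f"x = {c:5.2f}   empirical {e:.3f}   exact {np.exp(-2*abs(c)):.3f}")
```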
Exercise 2.1 MC integration in one dimension. As we discussed in Section 2.5, MC methods are particularly useful for doing integrations. However, we also pointed out that the statistical error is proportional to $N^{-1/2}$, where $N$ is the sample size. This statistical error can be large, since $N^{-1/2} = 1$ percent when $N = 10^4$. The keyword here is proportional. That is, although the statistical error is $k \cdot N^{-1/2}$ for some constant $k$, this constant can be reduced significantly. In this exercise, we want you to try some variance reduction schemes.

What is to be done? The following integral

$$I_0(x) = \frac{1}{\pi} \int_0^\pi e^{-x \cos \zeta}\, d\zeta$$

is known to have the solution $I_0(x)$, the zeroth order modified Bessel function with argument $x$; see, for example, Abramowitz and Stegun [2], or any comparable table.

1. The method of antithetic variables applies in any given domain of integration where the integrand is monotonic [99]. Notice that for all positive $x$, the integrand $\exp(-x \cos \zeta)$ is monotonically decreasing on $0 \le \zeta \le \pi/2$ and is symmetric around $\zeta = \pi/2$. So the antithetic variates method applies if we integrate only to $\zeta = \pi/2$ and double the result. For $x = 1$, the antithetic variates method is

$$I_0(1) \approx I_+ + I_- = \frac{1}{N} \sum_{i=1}^{N} e^{-\cos(\pi u_i/2)} + \frac{1}{N} \sum_{i=1}^{N} e^{-\cos(\pi (1 - u_i)/2)}.$$

Try the $I_+ + I_-$ antithetic variates method for $I_0(1)$ and compute the variance (e.g. do several runs with different seeds) for both this method and a raw integration of the same sample size $N$, and compare the results; a sketch of this setup appears after the exercise text below.

2. The method of control variates also works well here. The idea is that an integral

$$I = \int_0^1 g(u)\, du$$

can be rewritten for a control variate $\varphi$ as

$$I = \int_0^1 \big( g(u) - \varphi(u) \big)\, du + \int_0^1 \varphi(u)\, du \approx \frac{1}{N} \sum_{i=1}^{N} \big[ g(u_i) - \varphi(u_i) \big] + I_\varphi,$$

where $I_\varphi$ is the same as $I$ except that $g$ is replaced by $\varphi$. The method consists in picking a useful $\varphi$ whose integral $I_\varphi$ is known. A good one for the modified Bessel function is

$$\varphi(u) = 1 - \cos(\pi u/2) + \tfrac{1}{2}\big(\cos(\pi u/2)\big)^2,$$
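For part 1, here is a minimal sketch of the antithetic idea (my own setup, not the solution): it uses the whole-interval substitution $\zeta = \pi u$, so the pairing is $u \leftrightarrow 1 - u$ with $g(u) = e^{-\cos(\pi u)}$, and the half-interval form above is handled analogously. It repeats both estimators over several seeds to compare their spread.

```python
import numpy as np

# Antithetic variates for I_0(1), using the substitution zeta = pi*u over
# the whole interval (an illustrative re-parameterization, not the text's
# half-interval form).  g is monotone on [0,1], so g(u) and g(1-u) are
# negatively correlated and the averaged pair has reduced variance.
def g(u):
    return np.exp(-np.cos(np.pi * u))

# Reference value of I_0(1) by a fine midpoint rule on (1/pi) * int_0^pi.
zeta = (np.arange(100_000) + 0.5) * np.pi / 100_000
ref = np.exp(-np.cos(zeta)).mean()

N, runs = 10_000, 20                     # arbitrary sample size, repeat count
raw, anti = [], []
for seed in range(runs):
    rng = np.random.default_rng(seed)
    u = rng.random(N)
    raw.append(g(u).mean())                          # raw MC estimate
    anti.append(0.5 * (g(u) + g(1.0 - u)).mean())    # antithetic pairs

print("reference I0(1):", ref)                       # ~1.2661
print("raw  mean, std :", np.mean(raw), np.std(raw))
print("anti mean, std :", np.mean(anti), np.std(anti))
```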