Reference no: EM133220431
Look up the classical fourth-order Runge Kutta method for solving an ordinary differential equation. Use this method instead of Euler's method to estimate the values. Modify the reduced versions of the serial n-body solver, either the Pthreads or the OpenMP n-body solver, and the MPI n-body solver. How does the output compare to the output using Euler's method? How does the performance of the two methods compare?
Use Pthreads or OpenMP to implement tree search in which there's a shared stack. As we discussed in the text, it would be very inefficient to have all calls to Push and Pop access a shared stack, so the program should also use a local stack for each thread. However, the Push function can occasionally push partial tours onto the shared stack, and the Pop function can pop several tours from the shared stack and push them onto the local stack, if the calling thread has run out of work. Thus, the program will need some additional input arguments:
a. The frequency with which tours are pushed onto the shared stack. This can be an int. For example, if every 10th tour generated by a thread should be pushed onto the shared stack, then the command-line argument would be 10.
b. A blocksize for pushing. There may be less contention if, rather than pushing a single tour onto the shared stack, a block of several tours is pushed
c. A blocksize for popping. If we pop a single tour from the shared stack when we run out of work, we may get too much contention for the shared stack.
How can a thread determine whether the program has terminated?
Implement this design and your termination detection with Pthreads or OpenMP. How do the various input arguments affect its performance? How does the optimal performance of this program compare with the optimal performance of the dynamically load-balanced code we discussed in the text?