What patterns are you seeing in the performance curves

Reference no: EM131506120

Project - OpenCL Array Multiply, Multiply-Add, and Multiply-Reduce

Introduction

There are many problems in scientific computing where you want to do arithmetic on multiple arrays of numbers (matrix manipulation, Fourier transformation, convolution, etc.). This project is in three parts:

1. Multiply two arrays together using OpenCL: D[gid] = A[gid]*B[gid];
Benchmark it against both input array size (i.e., the global work size) and the local work size (i.e., number of work-items per work-group).

2. Multiply two arrays together and add a third using OpenCL: D[gid] = A[gid]*B[gid] + C[gid];
Benchmark it against both input array size (i.e., the global work size) and the local work size (i.e., number of work-items per work-group).

3. Perform the same array multiply as in #1, but this time with a reduction: Sum = summation{ A[:]*B[:] };
Benchmark that against input array size (i.e., the global work size). You can pick a local work size and hold that constant.
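
A CPU reference for the three operations can be handy for checking what the GPU returns. The following is a minimal sketch, not part of first.cpp: the function names are hypothetical, all arrays are assumed to have the same length, and the reduction accumulates in double so the reference itself loses as little precision as possible.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not from first.cpp): part 1, D[i] = A[i] * B[i].
std::vector<float> multiplyRef(const std::vector<float>& a,
                               const std::vector<float>& b) {
    std::vector<float> d(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        d[i] = a[i] * b[i];
    }
    return d;
}

// Hypothetical helper: part 2, D[i] = A[i] * B[i] + C[i].
std::vector<float> multiplyAddRef(const std::vector<float>& a,
                                  const std::vector<float>& b,
                                  const std::vector<float>& c) {
    std::vector<float> d(a.size());
    for (std::size_t i = 0; i < a.size(); ++i) {
        d[i] = a[i] * b[i] + c[i];
    }
    return d;
}

// Hypothetical helper: part 3, Sum = summation{ A[:] * B[:] }.
// Accumulates in double; the GPU result in float may differ slightly.
double multiplyReduceRef(const std::vector<float>& a,
                         const std::vector<float>& b) {
    double sum = 0.0;
    for (std::size_t i = 0; i < a.size(); ++i) {
        sum += static_cast<double>(a[i]) * static_cast<double>(b[i]);
    }
    return sum;
}
```

When comparing against GPU output, a small tolerance is usually more appropriate than exact equality, especially for the reduction, because the GPU adds the products in a different order.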

Requirements:

First, work on the Array Multiply and the Array Multiply-Add portions:

1. Start with the first.cpp and first.cl files. That code already does array multiplication for one particular combination of global work size and local work size.

2. Helpful Hint: The Array Multiply and the Array Multiply-Add can really be the same program. Write one program that creates the 4 arrays. Pass A, B, and C into OpenCL, and return D. Then all you have to do between the Multiply and Multiply-Add tests is change one line in the .cl file.

3. Make this all work for global work sizes in (at least) the range 1K to 8M, and local work sizes in (at least) the range 8 to 512, or up to the maximum work-group size allowed by your system. How you do this is up to you. Use enough values in those ranges to make good graphs.

4. Use performance units that make sense. Jane Parallel used "MegaMultiplies Per Second" and "MegaMultiply-Adds Per Second".

5. Make two graphs:

  1. Multiply and Multiply-Add performance versus Global Work Size, with a series of colored Constant-Local-Work-Size curves
  2. Multiply and Multiply-Add performance versus Local Work Size, with a series of colored Constant-Global-Work-Size curves

6. Your commentary PDF should include:

  1. What machine you ran this on
  2. Show the tables and graphs
  3. What patterns are you seeing in the performance curves?
  4. Why do you think the patterns look this way?
  5. What is the performance difference between doing a Multiply and doing a Multiply-Add?
  6. What does that mean for the proper use of GPU parallel computing?
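
Requirements 3 and 4 leave the choice of sweep values and performance units open. One possible approach, sketched below under stated assumptions, is to step both sizes through powers of two and to report millions of element operations per second. The names SweepPoint, powersOfTwo, makeSweep, and megaOpsPerSecond are hypothetical, and the elapsed time is assumed to come from whatever timer you wrap around the kernel launch (the handout uses OpenMP for timing).

```cpp
#include <cstddef>
#include <vector>

// One (global work size, local work size) combination to benchmark.
struct SweepPoint {
    std::size_t globalSize;
    std::size_t localSize;
};

// Powers of two from lo to hi inclusive.
// Assumes lo > 0 and that lo and hi are themselves powers of two.
std::vector<std::size_t> powersOfTwo(std::size_t lo, std::size_t hi) {
    std::vector<std::size_t> values;
    for (std::size_t n = lo; n <= hi; n *= 2) {
        values.push_back(n);
    }
    return values;
}

// Global sizes 1K..8M crossed with local sizes 8..512, skipping local
// sizes above the device limit. maxWorkGroupSize is assumed to have been
// queried from the device beforehand.
std::vector<SweepPoint> makeSweep(std::size_t maxWorkGroupSize) {
    std::vector<SweepPoint> points;
    for (std::size_t g : powersOfTwo(1024, 8 * 1024 * 1024)) {
        for (std::size_t l : powersOfTwo(8, 512)) {
            // Guard: keep only combinations where the global size divides
            // evenly into work-groups (always true for these powers of two).
            if (l <= maxWorkGroupSize && g % l == 0) {
                points.push_back({g, l});
            }
        }
    }
    return points;
}

// Millions of element operations per second, e.g. "MegaMultiplies Per
// Second" when each element costs one multiply.
double megaOpsPerSecond(std::size_t numElements, double seconds) {
    return static_cast<double>(numElements) / seconds / 1.0e6;
}
```

Powers of two give evenly spaced points on a logarithmic axis, which suits both graphs. Add intermediate values if the curves need more detail.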

Then, write another version of the code that turns it into a Multiply+Reduce application.

7. Note that this will ultimately compute just a single floating point scalar value.

8. Produce the product array on the GPU, and then do the reduction on it from the same kernel.

9. Return an array, the same size as the number of work-groups. Each element of the array will have the sum from all the items in one work-group. Add up the elements of the array yourself.

10. Try at least 3 different local work sizes, more if you want. Use sizes no smaller than 32 and no larger than 256.

11. Vary the size of the input array from 1K to 8M.

12. Plot another graph showing Multiply-reduction performance versus Input Array Size.

13. Use performance units that make sense. Jane Parallel used "MegaMultiply-Reductions Per Second".

14. To your PDF write-up add:

  1. Show this table and graph
  2. What pattern are you seeing in this performance curve?
  3. Why do you think the pattern looks this way?
  4. What does that mean for the proper use of GPU parallel computing?
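
The reduction in requirements 8 and 9 can be emulated on the CPU to see what each work-group should return. This is a sketch only, with hypothetical function names. It assumes the local work size is a power of two and the array length is a multiple of it, and it uses one common pairwise (tree) scheme that the handout does not mandate. In a real kernel the prods array would live in local memory, with a barrier after filling it and after each pass.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical CPU emulation of a per-work-group multiply-reduce.
// Returns one partial sum per work-group, as in requirement 9.
std::vector<float> partialSumsPerGroup(const std::vector<float>& a,
                                       const std::vector<float>& b,
                                       std::size_t localSize) {
    const std::size_t numGroups = a.size() / localSize;
    std::vector<float> partial(numGroups);
    std::vector<float> prods(localSize);  // stands in for local memory

    for (std::size_t g = 0; g < numGroups; ++g) {
        // Each "work-item" t stores its own product.
        for (std::size_t t = 0; t < localSize; ++t) {
            const std::size_t gid = g * localSize + t;
            prods[t] = a[gid] * b[gid];
        }
        // Tree reduction: each pass folds the upper half onto the lower half.
        for (std::size_t stride = localSize / 2; stride > 0; stride /= 2) {
            for (std::size_t t = 0; t < stride; ++t) {
                prods[t] += prods[t + stride];
            }
        }
        partial[g] = prods[0];  // the work-group's sum
    }
    return partial;
}

// Host-side step: add up the per-work-group sums yourself.
float sumPartials(const std::vector<float>& partial) {
    float sum = 0.0f;
    for (float p : partial) {
        sum += p;
    }
    return sum;
}
```

With 8 elements and a local size of 4, this returns two partial sums, one for elements 0 to 3 and one for elements 4 to 7, which the host then adds together.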

Running OpenCL in Visual Studio

First, you will need the following files:

1. cl.h
2. cl_platform.h
3. OpenCL32.lib or OpenCL64.lib

To enable OpenMP, which you need for timing:
Project → Properties → Configuration Properties → C/C++ → Language and then change OpenMP support to "Yes (/openmp)"

To link the library:
Project → Properties → Configuration Properties → Linker → Input → Additional Dependencies → <Edit...>
and then type either OpenCL32.lib or OpenCL64.lib in the box.

To make this easier, an entire Visual Studio solution has been zipped up in the file First.zip

Running OpenCL in Linux

First, you will need the following files:

1. cl.h
2. cl_platform.h
3. libOpenCL.so

If you are on rabbit, compile and link like this:
icpc -o first first.cpp -no-vec /scratch/cuda-7.0/lib64/libOpenCL.so -lm -openmp
or
g++ -o first first.cpp /scratch/cuda-7.0/lib64/libOpenCL.so -lm -fopenmp

If you are on your own system, change the library reference to whatever path your system has the library in.

Attachment:- Prog.rar

Verified Expert

This project is implemented in C++. It demonstrates OpenCL Array Multiply, Multiply-Add, and Multiply-Reduce, using the OpenCL library for GPU computing. The main function loads the kernel file, which implements the operation being tested: Multiply, Multiply-Add, or Multiply-Reduce. For each implementation, I measured execution speed by timing the operation with the OpenMP library. I then plotted graphs showing how Multiply, Multiply-Add, and Multiply-Reduce performance varies with array size and local work size.


Reviews


len1506120

5/24/2017 6:52:23 AM

Please change OpenCL64lib.txt to OpenCL64.lib, etc. I had to change the file name in order to send it over the internet.

Feature                                Points
Multiply table and graphs              20
Multiply-Add table and graphs          20
Multiply and Multiply-Add Commentary   30
Reduction tables and graphs            20
Reduction Commentary                   30
Potential Total                        120
