Assume a GPU architecture that contains 10 SIMD processors

Reference no: EM13166574

Assume a GPU architecture that contains 10 SIMD processors. Each SIMD instruction has a width of 32, and each SIMD processor contains 8 lanes for single-precision arithmetic and load/store instructions, meaning that each non-diverged SIMD instruction can produce 32 results every 4 cycles. Assume a kernel with divergent branches that cause, on average, 80% of threads to be active. Assume that 70% of all SIMD instructions executed are single-precision arithmetic and 20% are load/store. Since not all memory latencies are covered, assume an average SIMD instruction issue rate of 0.85. Assume that the GPU has a clock speed of 1.5 GHz.
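The "32 results every 4 cycles" figure follows from dividing the SIMD width by the number of lanes. A minimal sketch of that arithmetic is given here; the function names are illustrative and not part of the original problem, and the code assumes an instruction occupies the lanes for a whole number of cycles.

```python
def cycles_per_simd_instruction(simd_width: int, lanes: int) -> int:
    """Cycles one SIMD instruction occupies the lanes (ceiling division)."""
    return -(-simd_width // lanes)


def peak_results_per_cycle(simd_width: int, lanes: int) -> float:
    """Results per cycle for one SIMD processor with no divergence or stalls."""
    return simd_width / cycles_per_simd_instruction(simd_width, lanes)


# Values from the problem statement: width 32, 8 lanes.
print(cycles_per_simd_instruction(32, 8))  # 4
print(peak_results_per_cycle(32, 8))       # 8.0
```

With 8 lanes, each processor therefore peaks at 8 results per cycle before divergence, instruction mix, and issue rate are taken into account.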
Questions:
(1) Compute the throughput, in GFLOP/sec, for this kernel on this GPU.

(2) Assume that you have the following choices:

(a) Increasing the number of single-precision lanes to 16

(b) Increasing the number of SIMD processors to 15 (assume this change doesn't affect any other performance metrics and that the code scales to the additional processors)

(c) Adding a cache that will effectively reduce memory latency by 40%, which will increase the instruction issue rate to 0.95

What is the speedup in throughput for each of these improvements?
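One way to approach both questions is to treat throughput as a product of the stated factors. The sketch below is a possible model, not an official solution. It assumes that only the 70% single-precision arithmetic instructions count as floating-point work, that each active thread contributes one FLOP per arithmetic instruction, and that the issue rate scales the result linearly. The function and parameter names are hypothetical.

```python
def throughput_gflops(
    processors: int = 10,
    lanes: int = 8,
    clock_ghz: float = 1.5,
    active_fraction: float = 0.80,  # threads active after divergence
    fp_fraction: float = 0.70,      # single-precision arithmetic share
    issue_rate: float = 0.85,       # average SIMD instruction issue rate
) -> float:
    """Estimated GFLOP/sec under the assumptions stated in the lead-in."""
    # A width-32 instruction on `lanes` lanes yields `lanes` results per cycle.
    results_per_cycle = lanes
    return (
        clock_ghz
        * issue_rate
        * fp_fraction
        * active_fraction
        * processors
        * results_per_cycle
    )


baseline = throughput_gflops()

# Speedup of each proposed change relative to the baseline.
speedups = {
    "16 lanes": throughput_gflops(lanes=16) / baseline,
    "15 SIMD processors": throughput_gflops(processors=15) / baseline,
    "issue rate 0.95": throughput_gflops(issue_rate=0.95) / baseline,
}

print(f"baseline: {baseline:.2f} GFLOP/sec")
for name, s in speedups.items():
    print(f"{name}: speedup {s:.2f}x")
```

Under these assumptions the baseline comes to about 57.12 GFLOP/sec, and the three changes give speedups of roughly 2.0, 1.5, and 1.12 respectively. The doubling for 16 lanes presumes that nothing else (such as memory bandwidth or issue rate) becomes a new bottleneck, which the problem statement does not address.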





All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd