hw7, Data Structure & Algorithms

Assignment Help:
Handout 15
COMP 264: Introduction to Computer Systems (Section 001)
Spring 2013
R. I. Greenberg
Computer Science Department
Loyola University
Water TowerCampus, Lewis Towers 524
820 N. Michigan Ave.
Chicago, Illinois 60611-2147
Assignment #7
Issued 4/12
Due (at class time) 4/22
Best to hand in all solutions through the online submission mechansim of the course web
page. No fancy diagramming is required; you can scan something hand-drawn, or use any
simple online tool of your choosing. Just make sure you end up with a common format
that will be easy to handle, for example .pdf, .jpg, .txt or even .doc or .docx if you must.
Also make sure that your submission name is appropriate to the le type, e.g., 7.pdf or
7.txt, etc. (You also can submit directly on shannon/knuth if you are working there by
copying your le to the directory ~rig/c264hw7sub with lename in the form Email-X.EXT,
where Email is your email address, X is a string of eight or more \random" alphanumeric
characters, and EXT is the le type (i.e. do a command similar to, the following with the
correct replacements for EXT, YOUREMAILADDRESS, and RANDOM:
cp 7.EXT ~rig/c264hw7sub/YOUREMAILADDRESS-RANDOM.EXT
)
HW7-1 (43 points) Suppose we wish to write a procedure that computes the inner
product of two vectors u and v. An abstract version of the function has a CPE of 16{17
with x86-64 and 26{29 with IA32 for integer, single-precision, and double-precision data. By
doing the same sort of transformations we did to transform the abstract program combine1
into the more ecient combine4, we get the following code:
void inner4(vec_ptr u, vec_ptr v, data t *dest) {
long int i;
int length = vec_length(u);
data_t *udata = get_vec_start(u);
data_t *vdata = get_vec_start(v);
data_t sum = (data_t) 0;
for (i = 0; i < length; i++){
sum = sum + udata[i] * vdata[i];
}
*dest = sum;
}
Our measurements show that this function has a CPE of 3.00 for integer and
oating-
point data. For data type float, the x86-64 assembly code for the inner loop is as follows
(udata in %rbx, vdata in %rax, limit in %rcx, i in %rdx, sum in %xmm1):
.L87: ; loop:
movss (%rbx,%rdx,4), %xmm0 ; Get udata[i]
mulss (%rax,%rdx,4), %xmm0 ; Multiply by vdata[i]
addss (%xmm0, %xmm1 ; Add to sum
addq $1, %rdx ; Increment i
cmpq %rcx, %rdx ; Compare i:limit
jl .L87 ; If <, goto loop
Assume that the functional units have the latencies and issue times given in Figure 5.12
(and in the course notes).
A. Diagram how this instruction sequence would be decoded into operations, and show how
the data dependencies between them would create a critical path of operations in the style
of Figures 5.13 (fpb-sequential.ppt) and 5.14 (fpb-
ow.ppt and fpb-
ow-abstract.ppt). (25
points.)
B. For data type float, what lower bound on the CPE is determined by the critical path?
(6 points.)
C. Assuming similar instruction sequences for the integer code as well, what lower bound on
the CPE is determined by the critical path for integer data? (6 points.)
D. Explain how the two
oating-point versions can have CPEs of 3.00 even though the
multiplication operation requires either 4 or 5 cycles. (6 points.)
HW7-2 (27 points) Write a version of the inner product procedure described in the
previous problem that uses four-way loop unrolling. (11 points.)
For x86-64, our measurements of the unrolled version give a CPE of 2.00 for integer data
but still 3.00 for both single and double precision
oating point.
A. Explain why any version of any inner product procedure cannot achieve a CPE less than
2.00. (8 points.)
B. Explain why the performance for
oating-point data did not improve with loop unrolling.
(8 points.)

Related Discussions:- hw7

Balance theorem, Question 1 Discuss the following theorems with respect to...

Question 1 Discuss the following theorems with respect to Splay Trees- Balance Theorem Dynamic Finger Theorem   Question 2 Write a C program for implementation

Explain multidimensional array, Multidimensional array: Multidimensional a...

Multidimensional array: Multidimensional arrays can be defined as "arrays of arrays". For example, a bidimensional array can be imagined as a bidimensional table made of elements,

Define threaded binary tree, Threaded Binary Tree:- By changing the NUL...

Threaded Binary Tree:- By changing the NULL lines in a binary tree to special links known as threads, it is possible to perform traversal, insertion and deletion without using

Total weight of minimum spanning tree, a) Run your program for α = 0.05, 0...

a) Run your program for α = 0.05, 0.5, and 0.95. You can use n = 30, and W = 10. What is impact of increasing value of α on connectivity of G'? To answer this question, for each v

Implementation of stack, In this unit, we have learned how the stacks are i...

In this unit, we have learned how the stacks are implemented using arrays and using liked list. Also, the advantages and disadvantages of using these two schemes were discussed. Fo

Test whether a binary tree is a binary search tree, Q. Write down an algori...

Q. Write down an algorithm to test whether a Binary Tree is a Binary Search Tree.              A n s . The algorithm to check whether a Binary tree is as Binary Search

Draw the process flow diagram, Draw the process flow diagram: Anand   ...

Draw the process flow diagram: Anand   Dairy (AD) sources 150,000 litres of milk daily from large number of local villagers .The milk is collected from 4:00 AM to 6:00 am and

Define min-heap, Define min-heap A min-heap is a complete binary tree i...

Define min-heap A min-heap is a complete binary tree in which each element is less than or equal to its children. All the principal properties of heaps remain valid for min-hea

Insertion of a node into a binary search tree, A binary search tree is cons...

A binary search tree is constructed through the repeated insertion of new nodes in a binary tree structure. Insertion has to maintain the order of the tree. The value to the lef

Arrays and pointers, C compiler does not verify the bounds of arrays. It is...

C compiler does not verify the bounds of arrays. It is your job to do the essential work for checking boundaries wherever required. One of the most common arrays is a string tha

Write Your Message!

Captcha
Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd