Reference no: EM13762000
In this assignment, you will have a chance to implement the hash-join algorithm for a DBMS. You can use C, C++, or Java to write the program. However, your program should include su- cient comments to make it readable. You need to turn in
(1) your program source code;
(2) sample execution outputs;
(3) proof of compilation (e.g., screen snapshots for compilation);
(4) a brief document about your program design and implementation (e.g., high-level program diagram and data/le structures), program usage and experiments
-----------------------------
However, your program should include sufficient comments to make it readable. I need to turn in (a) a brief report/description about your program design and implementation (e.g., high-level program diagram and data/file structures) and program usage; (B) your program source code; (c) proof of compilation (e.g., the screen snapshot of a successful compilation); and (D) sample execution outputs. Please assemble all the above required contents in a single Word or PDF file for your submission.
The program specifications is given as follows. Let R1(a1, a2, a3) and R2(b1, b2, b3, b4) be two relations with all integer attributes. Tuples in these two relations are sequentially stored in two data files, respectively.
• Use the hash-join algorithm to implement a join (equijoin) of R1 and R2. Assume that the hash function is f(k) = k mod N, where N is the number of buckets allowed in your hash structure/table.
• Your program should allow a user to choose the joining attributes from the two relations, i.e., performing R1 ./R1.ai=R2.bj R2 for any chosen pair of ai and bj , where ai is the i-th attribute in R1 (1 = i = 3) and bj is the jth attribute in R2 (1 = j = 4). For example, a user may want to perform R1 ./R1.a2=R2.b3 R2.
• Your program should display the join result and output the selectivity of the join.
• You may request a user to interactively input the necessary parameters, such as the data file names for R1 and R2, the number of tuples in each relation, and the joining attributes (e.g., 1 for the 1st attribute, 3 for the 3rd attribute).
you need to implement the hash-join algorithm for a DBMS. You can use C++ to write the program. However, your program should include sufficient comments to make it readable. I need to turn in (a) a brief report/description about your program design and implementation (e.g., high-level program diagram and data/file structures) and program usage; (B) your program source code; (c) proof of compilation (e.g., the screen snapshot of a successful compilation); and (D) sample execution outputs. Please assemble all the above required contents in a single Word or PDF file for your submission.
The program specifications is given as follows. Let R1(a1, a2, a3) and R2(b1, b2, b3, b4) be two relations with all integer attributes. Tuples in these two relations are sequentially stored in two data files, respectively.
• Use the hash-join algorithm to implement a join (equijoin) of R1 and R2. Assume that the hash function is f(k) = k mod N, where N is the number of buckets allowed in your hash structure/table.
• Your program should allow a user to choose the joining attributes from the two relations, i.e., performing R1 ./R1.ai=R2.bj R2 for any chosen pair of ai and bj , where ai is the i-th attribute in R1 (1 ≤ i ≤ 3) and bj is the jth attribute in R2 (1 ≤ j ≤ 4). For example, a user may want to perform R1 ./R1.a2=R2.b3 R2.
• Your program should display the join result and output the selectivity of the join.
- You may request a user to interactively input the necessary parameters, such as the data file names for R1 and R2, the number of tuples in each relation, and the joining attributes (e.g., 1 for the 1st attribute, 3 for the 3rd attribute).
- Use your program to perform several joins for different relation instances of R1 and R2
: Join Algorithms
• Iteration Join (conceptual) For each r ? R do For each s ? S do if r.C = s.C then output r,s pair 3
• Merge Join (conceptual) (1) if R and S not sorted, sort them (2) i ? 1; j ? 1; While (i = |R|) ? (j = |S|) do if R[i].C = S[j].C then outputTuples else if R[i].C > S[j].C then j ? j+1 else if R[i].C < S[j].C then i ? i+1 4 Output-Tuples (for duplicates) While (R[i].C = S[j].C) ? (i = |R|) do jj ? j; While (R[i].C=S[j].C)?(jj=|S|) do output (R[i], S[j]); jj ? jj+1; i ? i+1 5 Merge Join
• Both R, S sorted by C Memory ….. ….. R S R S 6
• Hash Join (conceptual) – Hash function h, range 1 ? k – Buckets for R: G1, ... Gk – Buckets for S: H1, ... Hk Algorithm (1) Hash R tuples into G1,…,Gk buckets (2) Hash S tuples into H1,…,Hk buckets (3) For i = 1 to k do match tuples in Gi, Hi buckets 7 Hash Join
• Step (1) • Step (2) ... ... Memory buckets G1 G2 Gk R S Gi ... ... memory Hi R H1 H2 G1 G2 G3 8
• Index Join (conceptual) (1) If not, create an index for S.C (2) For each r ? R do X ? index-lookup(S.C, r.C) For each s ? X do output (r,s) 9 Index Join Memory ….. ….. R S
Simple Hash Join
1. for each logical bucket j
2. for each record r in R
3. if r is in bucket j then
4. insert r into the hash table;
5. for each record s in S
6. if s is in bucket j then
7. probe the hash table
Expert says
Shall I use 2 text files and print the joined 2 data file as output.
example..
text file1 contains
12 34 67
13 78 90
text file2 contains
13 89 55
15 66 77
result after hash join,
13 78 90 89 55