Solution-Define the concept of reduction factor

Define the concept of reduction factor

Assignment Help Database Management System

Reference no: EM13336562

Part 1. Concepts and principles

(Each answer should contain approximately 300 words.)

Question 1

Summarize briefly how to make use of indexes such as B+ tree or a hash indexes in selection, projection, and join operations?

Question 2

Explain why two different hash functions are used in hash join, and how to select them.

Question 3

Define the concept of reduction factor, and explain how it is calculated based on what statistics are available in the catalogs.

Question 4

Explain what a typical optimizer does, and what strategies can be used to generate a good plan, and how it can affect the performance of a DBMS.

Question 5

Define relational algebra equivalences, and explain how they can be used for plan generation. You may use pushing selection (ahead of joins) and re-ordering join expressions as examples.

Part 2. Design considerations and calculation

Question 1

Athabasca University has about 32,000 students between the ages of 17 to 60. Consider the AU student relation with the following schema:

Students (stud-id: integer, stud-name: string, gpa: integer, age: real)

For each of the following indexes, list whether the index matches the given selection conditions. If there is a match, list the primary conjuncts.2

1. A B+-tree index on the search key Students.stud-id.

a. σ Students.stud-id < 28,000 (Students)

b. σ Students.stud-id = 28,000 (Students)

2. A hash index on the search key Students.stud-id.

a. σ Students.stud-id < 28,000 (Students)

b. σ Students.stud-id = 28,000 (Students)

3. A B+-tree index on the search key Students.stud-id, Students.age.

a. σ Students.stud-id < 28,000 and Students.age = 21 (Students)

b. σ Students.stud-id = 28,000 and Students.age> 21 (Students)

c. σ Students.stud-id = 28,000 (Students )

d. σ Students.age = 21 (Students )

4. A hash-tree index on the search key Students.stud-id, Students.age .

a. σ Students.stud-id = 28,000 and Students.age = 21 (Students)

b. σ Students.stud-id = 28,000 and Students.age> 21 (Students)

c. σ Students.stud-id = 28,000 (Students )

Question 2

Suppose that you just finished inserting several records into a heap file and now want to sort those records. Assume that the DBMS uses external sort and makes efficient use of the available buffer space when it sorts a file. Here is some potentially useful information about the newly loaded file and the DBMS software available to operate on it:

The number of records in the file is 4500. The sort key for the file is 4 bytes long.

You can assume that rids are 8 bytes long and page ids are 4 bytes long. Each

record is a total of 48 bytes long. The page size is 512 bytes. Each page has 12

bytes of control information on it. Four buffer pages are available.

1. How many sorted subfiles will there be after the initial pass of the sort, and how long will each subfile be?

2. How many passes (including the initial pass just considered) are required to sort this file?

3. What is the total I/O cost for sorting this file?

4. What is the largest file, in terms of the number of records, you can sort with just four buffer pages in two passes? How would your answer change if you had 257 buffer pages?

5. Suppose that you have a B+ tree index with the search key being the same as the desired sort key. Find the cost of using the index to retrieve the records in sorted order for each of the following cases:

• The index uses Alternative (1) for data entries.

• The index uses Alternative (2) and is unclustered. (You can compute the worst case cost in this case.)

• How would the costs of using the index change if the file is the largest that you can sort in two passes of external sort with 257 buffer pages? Give your answer for both clustered and unclustered indexes.3

Question 3

Consider the join R R.a = S.bS , given the following information about the relations to be joined. The cost metric is the number of page I/Os unless otherwise noted, and the cost of writing out the result should be uniformly ignored.

Relation R contains 10,000 tuples and has 10 tuples per page.

Relation S contains 2000 tuples and also has 10 tuples per page.

Attribute b of relation S is the primary key for S.

Both relations are stored as simple heap files.

Neither relation has any indexes built on it.

52 buffer pages are available.

1. What is the cost of joining R and S using a page-oriented simple nested loops join? What is the minimum number of buffer pages required for this cost to remain unchanged?

2. What is the cost of joining R and S using a block-nested loops join? What is the minimum number of buffer pages required for this cost to remain unchanged?

3. What is the cost of joining R and S using a sort-merge join? What is the minimum number of buffer pages required for this cost to remain unchanged?

4. What is the cost of joining R and S using a hash join? What is the minimum number of buffer pages required for this cost to remain unchanged?

5. What would be the lowest possible I/O cost for joining R and S using any join algorithm, and how much buffer space would be needed to achieve this cost? Explain briefly.

6. How many tuples does the join of R and S produce, at most, and how many pages are required to store the result of the join back on disk?

7. Would your answers to any of the previous questions in this exercise change if you were told that R.a is a foreign key that refers to S.b?

Question 4

Consider the following relational schema and SQL query. The schema captures information about employees, departments, and company finances (organized on a per department basis).

Emp (eid:integer, did:integer, sal:integer, hobby:char(20))

Dept (did:integer, dname:char(20), floor:integer, phone:char(10))

Finance (did:integer, budget:real, sales:real, expenses:real)4

Consider the following query:

SELECT D.dname, F.budget

FROM Emp E, Dept D, Finance F

WHERE E.did = D.did AND D.did = F.did AND D.floor = 1

AND E.sal ≥ 59000 AND E.hobby = ‘yodeling'

1. Identify a relational algebra tree (or a relational algebra expression if you prefer) that reflects the order of operations a decent query optimizer would choose.

2. List the join orders (i.e., orders in which pairs of relations can be joined to compute the query result) that a relational query optimizer will consider. (Assume that the optimizer follows the heuristic of never considering plans that require the computation of crossproducts.) Explain briefly how you arrived at your list.

3. Suppose that the following additional information is available: Unclustered B+ tree indexes exist on Emp.did, Emp.sal, Dept.floor,Dept.did, and Finance.did. The system's statistics indicate that employee salaries range from 10,000 to 60,000, employees enjoy 200 different hobbies, and the company owns two floors in the building. There are a total of 50,000 employees and 5,000 departments (each with corresponding financial information) in the database. The DBMS used by the company has just one join method available, index nested loops.

a. For each of the query's base relations (Emp, Dept, and Finance) estimate the number of tuples that would be initially selected from that relation if all of the nonjoin predicates on that relation were applied to it before any join processing begins.

b. Given your answer to the preceding question, which of the join orders considered by the optimizer has the lowest estimated cost?

Part 1. Concepts and principles

(Each answer should contain approximately 300 words.)

Question 1

Define ACID properties, and briefly explain the methods used to deal with them in the DBMS.

Question 2

Why is concurrent execution of transactions important in the DBMS? What is a schedule and how do you determine the conflicts between transactions in a schedule? You may need to explain this using such anomalies as dirty read, unrepeatable read, and blind write.

Question 3

Define Strict 2PL protocol and precedence graph, and explain how to use them to address the conflict serializability problem.

Question 4

Explain the reason of deadlock and discuss how to deal with a deadlock issue in the DBMS.

Question 5

Briefly explain the ARIES recovery algorithm, including the main phases, main principles and key ideas behind the algorithm such as WAL protocol and checkpointing etc.

Part 2. Design considerations and calculation

Question 1

Consider a database with objects X and Y and assume that there are two transactions T1 and T 2. Transaction T 1 reads objects X and Y and then writes object X. Transaction T 2 reads objects X and Y and then writes objects X and Y.

1. Give an example schedule with actions of transactions T1 and T 2 on objects X and Y that results in a write-read conflict.

2. Give an example schedule with actions of transactions T1 and T 2 on objects X and Y that results in a read-write conflict.

3. Give an example schedule with actions of transactions T1 and T 2 on objects X and Y that results in a write-write conflict.

4. For each of the three schedules, show that Strict 2PL disallows the schedule.

Question 2

Answer the following questions: SQL supports four isolation-levels and two access-modes, for a total of eight combinations of isolation-level and access-mode. Each combination implicitly defines a class of transactions; the following questions refer to these eight classes:

1. Consider the four SQL isolation levels. Describe which of the phenomena can occur at each of these isolation levels: dirty read, unrepeatable read, and phantom problem.

2. For each of the four isolation levels, give examples of transactions that could be run safely at that level.

3. Why does the access mode of a transaction matter?

Question 3

Consider the following classes of schedules: serializable, conflict-serializable, viewserializable, recoverable, avoids-cascading-aborts, and strict. For each of the following schedules, state to which of the preceding classes it belongs. If you cannot decide whether a schedule belongs in a certain class based on the listed actions, explain briefly.The actions are listed in the order they are scheduled and prefixed with the transaction name. If a commit or abort is not shown, the schedule is incomplete; assume that abort or commit must follow all the listed actions.

1. T1:R(X), T2:R(X), T1:W(X), T2:W(X)

2. T1:W(X), T2:R(Y), T1:R(Y), T2:R(X)

3. T1:R(X), T2:R(Y), T3:W(X), T2:R(X), T1:R(Y)

4. T1:R(X), T1:R(Y), T1:W(X), T2:R(Y), T3:W(Y), T1:W(X), T2:R(Y)

5. T1:R(X), T2:W(X), T1:W(X), T2:Abort, T1:Commit

6. T1:R(X), T2:W(X), T1:W(X), T2:Commit, T1:Commit

7. T1:W(X), T2:R(X), T1:W(X), T2:Abort, T1:Commit

8. T1:W(X), T2:R(X), T1:W(X), T2:Commit, T1:Commit

9. T1:W(X), T2:R(X), T1:W(X), T2:Commit, T1:Abort

10. T2: R(X), T3:W(X), T3:Commit, T1:W(Y), T1:Commit, T2:R(Y), T2:W(Z), T2:Commit

11. T1:R(X), T2:W(X), T2:Commit, T1:W(X), T1:Commit, T3:R(X), T3:Commit

12. T1:R(X), T2:W(X), T1:W(X), T3:R(X), T1:Commit, T2:Commit, T3:Commit

Question 4

Consider the execution shown in the following figure. In addition, the system crashes during recovery after writing two log records to stable storage and again after writing another two log records.

LSN LOG

00 begin_checkpoint

10 end_checkpoint

20 update: T1 writes P1

30 update: T2 writes P2

40 update: T3 writes P3

50 T2 commit

60 update: T3 writes P2

70 T2 end

80 update: T1 writes P5

90 T3 abort

X CRASH, RESTART

1. What is the value of the LSN stored in the master log record?

2. What is done during Analysis?

3. What is done during Redo?

4. What is done during Undo?

5. Show the log when recovery is complete, including all non-null prevLSN and undonextLSN values in log records

Part 1. Concepts and principles

(Each answer should contain approximately 300 words.)

Question 1

What decision need to be made during physical design, and when should be create clustered indexes?

Question 2

Describe the main idea behind discretionary access control. What are its weakness?

Question 3

Explain fragmentation and replication, and explain their differences in data distribution and updating.

Question 4

Define association rules, and explain how to induct association rules by using frequent itemsets, a Priori Property, and support and confidence measures.

Part 2. Design and tuning considerations

Question 1

Consider the following BCNF relational schema for a portion of a company database (type information is not relevant to this question and is omitted):

Project (pno, proj_name, proj_base_dept, proj_mgr, topic, budget)

Manager (mid, mgr_name, mgr_dept, salary, age, sex)

Note that each project is based in some department, each manager is employed in some department, and the manager of a project need not be employed in the same department (in which the project is based). Suppose you know that the following queries are the five most common queries in the workload for this company and all five are roughly equivalent in frequency and importance:

• List the names, ages, and salaries of managers of a user-specified sex (male or female) working in a given department. You can assume that, while there are many departments, each department contains very few project managers.2

• List the names of all projects with managers whose ages are in a user-specified range (e.g., younger than 30).

• If a department has a manager who manages a project based in this department, then list the department name as output (exclude those departments in the output whose managers always manage some other departments' project, or don't manage any projects at all).

• List the name of the project with the lowest budget.

• For a given project, list the names of all managers in the department in which the project is based. Note: a department may have more than one manager who can manage projects that may or may not belong to the same department, as described in the question above.

These queries occur much more frequently than updates, so you should build whatever indexes you need to speed up these queries. However, you should not build any unnecessary indexes, as updates will occur (and would be slowed down by unnecessary indexes). Given this information, design a physical schema for the company database that will give good performance for the expected workload. In particular, decide which attributes should be indexed and whether each index should be a clustered or an unclustered index. Assume that both B+ trees and hashed indexes are supported by the DBMS, and that both single-and multiple-attribute index keys are permitted.

1. Specify your physical design by identifying the attributes you recommend indexing on, indicating whether each index should be clustered or unclustered and whether it should be a B+ tree or a hashed index.

2. Assume that this workload is to be tuned with an automatic index-tuning wizard. Outline the main steps in the algorithm and the set of candidate configurations considered.

3. Redesign the physical schema assuming the set of important queries is changed to be the following:

• Find the total of the budgets for projects managed by each manager; that is, list proj_mgr and the total of the budgets of projects managed by that manager, for all values of proj_mgr.

• Find the total of the budgets for projects managed by each manager but only for managers who are in a user-specified age range.

• Find the number of male managers.

• Find the average age of managers.

Question 2

For each of the following queries, identify one possible reason why an optimizer might not find a good plan. Rewrite the query so that a good plan is likely to be found. Any available indexes or known constraints are listed before each query; assume that the relation schemas are consistent with the attributes referred to in the query.

Employee (eno, ename, dno, age, sex, sal )

Project (pno, pname, dno, proj_mgr, topic, budget)

Department (dno, dname, mgr_name, address) 3

1. An index is available on the age attribute:

SELECT E.dno

FROM Employee E

WHERE E.age = 20 OR E.age = 10

2. A B+ tree index is available on the age attribute:

SELECT E.dno

FROM Employee E

WHERE E.age<20 AND E.age>10

3. An index is available on the age attribute:

SELECT E.dno

FROM Employee E

WHERE 2*E.age< 20

4. No index is available:

SELECT DISTINCT *

FROM Employee E

5. No index is available:

SELECT AVG (E.sal)

FROM Employee E

GROUP BY E.dno

HAVING E.dno = 22

6. The dno in Employee is a foreign key that refers to Department:

SELECT D.dno

FROM Department D, Employee E

WHERE D.dno = E.dno

Part 3. Exploration and/or research report

In this part, you are asked to explore and/or investigate technical problems of DBM systems. You may either explore the problems without direct relation with a real DBMS, or focus on one of the next DBMS such as

• PostgreSQL( https://www.postgresql.org/ )

• MySQL (https://www.mysql.com/ )

• Oracle (https://www.oracle.com/database/index.html )

• SQL Server ( https://www.microsoft.com/sqlserver )

• DB2 (www.ibm.com/db2 )

Your exploration or research should focus on an in-depth topic about theories, techniques, algorithms, approaches, mechanisms, and implementation of the following techniques:

• File organization and indexes

• Query optimization

• Transaction management

• Distributed database support

• Database security

• Data mining

Your topic could come from a sub-problem of a cutting-edge research problem about these techniques (if you want to investigate and solve a technical problem), or a successful (or a planned) implementation in one of the above DBMSs (if you want to explore an existing DBMS).

Your investigation will be based on recent papers, technical documents, and software packages (open source preferred). You are encouraged to read some papers from the next sources for new techniques in DBMS:

• Communications of the ACM

• ACM SIGMOD (ACM Special Interest Group on Management of Data) Newsletter

• SIGMOD: International Conference on Management of Data

• PODS: Symposium on Principles of Database Systems

• ACM Transactions on Database Systems

• Knowledge and Data Engineering, IEEE Transactions on

• Data & Knowledge Engineering

Reference no: EM13336562

Questions Cloud

Find an expression for the velocity vs distance : Find an expression for the velocity vs. distance.

Briefly how to make use of indexes such as b+ tree : Summarize briefly how to make use of indexes such as B+ tree or a hash indexes in selection, projection, and join operations?

Sketch the base-five pieces and each of the subtraction mode : Sketch the base-five pieces and each of the subtraction models to determine the following differences. Record sketches of your work and record the base-five numeral for each difference.

What is the maximum safe diving depth : A scuba diver can withstand pressures up to 4.2 atm without rish of getting the bends. What is the maximum safe diving depth

Define the concept of reduction factor : Summarize briefly how to make use of indexes such as B+ tree or a hash indexes in selection, projection, and join operations?

Where would you deploy personnel or focus attention : What points overlap? where would you deploy personnel or focus attention?

Determine the angular speed of the merry-go-round : A 6.2-m radius playground merry-go-round with a moment of inertia of 2400 kg·m2 is rotating freely with an angular speed of 1.5 rad/s. What is the angular speed of the merry-go-round right after the two people have stepped on

How much time elapses before it attains its greatest speed : A simple pendulum is made from a 0.60-m-long string and a small ball attached to its free end. how much time elapses before it attains its greatest speed

Karen and wayne need to buy a refrigerator because theirs ju : Karen and Wayne need to buy a refrigerator because theirs just broke

User Account

All Pages

Define the concept of reduction factor

Reference no: EM13336562

Reference no: EM13336562

Questions Cloud

Reviews

Write a Review

Database Management System Questions & Answers

Knowledge and data warehousing

Design a database schema

Entity-relationship diagram

Implement a database of courses and students for a school

Prepare the e-r diagram for the movie database

Design relation schemas for the entire database

Prepare the relational schema for database

Data modeling and normalization

Use cases perform a requirements analysis for the case study

Knowledge and data warehousing

Stack and queue data structure

Practice on topic of normalization

Assured A++ Grade

Academics

Major Subjects

Majors

Get In Touch

TERMS & POLICIES

HELP & SUPPORT