Critically evaluate the SH data warehouse

Assignment Help Database Management System
Reference no: EM132767670

KL7011 Advanced Databases - Northumbria University

Learning outcome 1: Demonstrate deep knowledge of key concepts of data warehousing, data analytics, data standards, and data quality

Learning outcome 2: Appraise, analyse, design, develop and evaluate data warehousing and data analytics solutions using Oracle database system

Learning outcome 3: Develop critical awareness of the responsibilities of database developer with respect to professional, legal, security and ethical issues individually or as part of a team

You are advised to read the guidance for students regarding assessment policies (Northumbria, 2019).

Assessment Background and Scenario

This assessment is based on two scenarios as follows:

a) The Sales History (SH) Data Warehouse scenario. SH is a sample database schema provided by Oracle, which has been extensively used in the Oracle's Data Warehousing Guide (Lane, 2013). The details of this scenario are provided in Appendix 1.

b) UniTel's Customer Churn dataset described in Appendix 2.

Assignment Questions

Part 1: Data Warehousing

This part is based on the Sales History scenario as described in Appendix 1.

You must submit all the SQL queries and any other code that you wrote in answering any of the tasks / questions (e.g., the use of Explain Plan statements for the queries and their outputs using Spooling or other suitable means).

(A) Study the index definitions in sh_idx.sql. Discuss in detail (using cost-based analysis) why these indexes (choose three different ones) are useful for answering queries over the SH2 and DWU versions of the database. You should not run the sh_idx.sql script at all.

(B) Identify three new indexes and justify why they could be useful. Write the SQL code for creating these indexes under your DWU account. Give example queries with cost-based analysis for both DWU account (which will have the new indexes) and SH2 shared schema (which will NOT have any of your new indexes). Alternatively, you may choose to run the same queries on your DWU account before and after creating your proposed three indexes.

(C) Given the two materialized views (MVs) defined in sh_cremv.sql, discuss in detail why these MVs are useful for users of the SH database. You should provide detailed examples of cost based analysis, e.g., using Explain Plan for running sample queries on both SH2 and DWU to illustrate your answer. You should not run the sh_cremv.sql script at all.

(D) Identify three new MVs based on the base tables in the SH schema under your DWU account and justify why they would be useful for the users of your data warehouse. Write the SQL code for creating these MVs. Moreover, run sample queries on both SH2 and DWU to ensure that queries running on DWU will be re-written by Oracle to use your proposed three MVs instead of the base tables used in the sample queries. Note that you must not query your MVs directly in the FROM clause; let the Oracle Query Optimizer re-write the queries and answer them using your proposed MVs.

(E) Prior to the introduction of the special aggregation function CUBE, there was no possibility to express an aggregation over different levels within a single SQL statement without using the set operation UNION ALL. Every different aggregation level needed its own SQL aggregation expression, operating on the exact same data set n times, once for each of the n different aggregation levels. With the introduction of CUBE in the recent database systems, Oracle provided a single SQL command for handling the aggregation over different levels within a single SQL statement, not only improving the runtime of this operation but also reducing the number of internal operations necessary to run the query and reducing the workload on the system.

i. Using CUBE, write an SQL query over the SH schema under your DWU account involving one fact table (SALES or COSTS) and at least two dimension tables and at least 3 grouping attributes. Provide output of successful execution of your query. Provide reasons why your query may be useful for users of the SH data warehouse.

ii. Using set operation UNION ALL (and not CUBE), write an SQL query that produces the same result as the query in (a) above. Provide output of successful execution of your query.

iii. Using EXPLAIN PLAN, provide a detailed discussion analysing costs of evaluating the above queries (i.e. with and without ROLLUP).

Part 2: Data Mining Tasks

This part is based on the UniTel scenario as described in Appendix 2. Moreover, you must use the DMUn Oracle Data Mining Account (where 1 <= n <= 75, e.g., DMU1, DMU2) allocated to your group.

Jessica is the customers relation manager at UniTel. She wants to know the possibility of potential churn of the company's customers based on previous experience, so she may be able take some actions accordingly to retain their customers.

To help Jessica in doing her analysis, we need to investigate what could be a suitable algorithm for solving her problem. The data from last year are used as the training data and the data of February of this year are taken as the testing data to verify the model accuracy. Data of all the columns are used to set up the model. To meet the requirement, many algorithms can be selected.

Oracle Data Mining (ODM) provides the following algorithms for classification:

• Decision Tree
• Naive Bayes
• Generalized Linear Models (GLM)
• Support Vector Machines (SVM)

You are required to perform the following tasks:

1. Using PL/SQL API, SQL-Developer's Data Miner Workflows or RODM (R package for Interfacing ODM), develop at least TWO models based on the above algorithms for the dataset accessible as CUSTOMERCHURN table
2. Evaluate capabilities of the models you have developed
3. Present and discuss your findings and make recommendations for Jessica

Part 3

Critically evaluate the SH data warehouse and the UniTel's Customer Churn data in relation to theory and best practices of data quality and standards.

The report should be concise and comprehensive and in the region of 900-1000 words. You should use Harvard style of citation and referencing by following the guidelines in Pears and Shields (2008).

Harvard referencing.

Attachment:- Advanced Databases.rar

Reference no: EM132767670

Questions Cloud

Develop disaster recovery plan for organization : Develop a disaster recovery plan for organization. There are many different templates available online for you to use as reference and guidance.
What would be the effect on Johnson operating income : Johnson's variable selling expense for its normal line of sneakers is $1.00 per pair. What would be the effect on Johnson's operating income
What are considerations in accepting or rejecting order : The full cost is $25 per dozen. Varden has a normal selling price of $35 per dozen. What are considerations in accepting or rejecting order
Determine the relative product profitabilities : The variable conversion cost is $10 per process hour. The fixed cost is $400,000. Provide an analysis to determine the relative product profitabilities
Critically evaluate the SH data warehouse : Develop critical awareness of the responsibilities of database developer with respect to professional, legal, security and ethical issues individually
What is the total estimated operating income of warehouse : Differential analysis involving opportunity costs - If the proposal is accepted, what is the total estimated operating income of the warehouse for the 14 years
Determine the unit contribution margin for each product : The furnace operation is part of the total process for each of these three products. Determine the unit contribution margin for each product
What is the sunk cost in this situation : The annual variable production costs for the new machine are estimated to be $24,000 per year for eight years. What is the sunk cost in this situation
Problem - Segment Analysis for a Service Company : Problem - Segment Analysis for a Service Company - How does a brokerage company like Schwab define the Investor Services and Advisor Services segments

Reviews

len2767670

1/18/2021 9:45:27 PM

Need to cover all details and do all parts. Make sure need to NEW unique work - Harvard referencing

Write a Review

Database Management System Questions & Answers

  Provide an entity relationship model

Provide an Entity Relationship Model (ERM) that will describe the data structure that will store all data elements.Note:The graphically depicted solution is not included in the required page length.

  Draw an er diagram for the diven situation

Table TakenCourses manages courses info that a student has taken. A course may require pre-requisite courses and the pre-requisite course info is managed in table PreRequisite. draw an ER diagram for this situation.

  Create the database tables, establish keys and constraints

Create the database tables, establish keys and constraints, populate data within the database, and queries to the Assignment Files tab.Submit the presentation including recorded audio speaker notes to the Assignment Files tab Just need it for Supp..

  Intro to programming

For this assignment, you will create a class to describe the product that is being ordered. You will then modify your code to create an instance of this class and utilize it in the ordering process.

  Design database using extended entity relationship diagram

Design the database using Extended Entity Relationship (EERD) Diagram. Develop relational schemas. Relational schemas should be derived using the EERD.

  Differences between the oracle database sysdba and sysoper

What are the differences between the Oracle database SYSDBA and SYSOPER privileges? Explain OS authentication and how that allows you to connect to the database as SYS without providing a password

  Search for a record based on a particular field value

For each of the following queries, which of the listed index choices would you choose to speed up the query? If your database system does not consider index-only plans (i.e., data records are always retrieved even if enough information is available i..

  Store information about musicians who perform on its albums

REQUIREMENTS: YoTown Records has decided to store information about musicians who perform on its albums (as well as other company data) in a database

  Describe a polynomial-time algorithm to test dependency

Describe a polynomial-time algorithm to test dependency-preservation.

  Develop a access database based upon the business scenario

Develop a Microsoft Access database based upon the following business scenario. Be sure to include tables, fields, keys, relationships, and test data in your database. Your final submission will be a Microsoft Access database.

  Develop a fully functional relational database system

Develop a fully functional relational database system, based on an existing system design - Produce technical and user documentation

  Describing the database systems

Summarize your paper by describing the database systems in your workplace, identifying which database systems and architecture they fall under.

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd