Evaluate capabilities of the models you have developed

Assignment Help Database Management System
Reference no: EM132866904

KL7011 Advanced Databases - Northumbria University

1. Demonstrate deep knowledge of key concepts of data warehousing, data analytics, data standards, and data quality

2. Appraise, analyse, design, develop and evaluate data warehousing and data analytics solutions using Oracle database system

3. Develop critical awareness of the responsibilities of database developer with respect to professional, legal, security and ethical issues individually or as part of a team.

Assessment Background and Scenario

This assessment is based on two scenarios as follows:

a) The Sales History (SH) Data Warehouse scenario. SH is a sample database schema provided by Oracle, which has been extensively used in the Oracle's Data Warehousing Guide (Lane, 2013).The details of this scenario are provided in Appendix 1.

b) UniTel's Customer Churn dataset described in Appendix 2.

Assignment Questions

Part 1: Data Warehousing Tasks

This part is based on the Sales History scenario as described in Appendix 1.

You must submit all the SQL queries and any other code that you wrote in answering any of the tasks / questions (e.g., the use of Explain Plan statements for the queries and their outputs using Spooling or other suitable means).

(A) Study the index definitions in sh_idx.sql. Discuss in detail (using cost-based analysis) why these indexes (choose three different ones) are useful for answering queries over the SH2 and DWU versions of the database. You should not run the sh_idx.sql script at all.

(B) Identify three new indexes and justify why they could be useful. Write the SQL code for creating these indexes under your DWUaccount. Give example queries with cost-based analysis for both DWU account (which will have the new indexes) and SH2 shared schema (which will NOT have any of your new indexes). Alternatively, you may choose to run the same queries on your DWU account before and after creating your proposed three indexes.

(C) Given the two materialized views (MVs) defined in sh_cremv.sql, discuss in detail why these MVs are useful for users of the SH database. You should provide detailed examples of cost based analysis, e.g., using Explain Plan for running sample queries on both SH2 and DWU to illustrate your answer. You should not run the sh_cremv.sqlscript at all.

(D) Identify threenew MVs based on the base tables in the SH schema under your DWUaccount and justify why they would be useful for the users of your data warehouse. Write the SQL code for creating these MVs. Moreover, run sample queries on both SH2 and DWU to ensure that queries running on DWU will be re-written by Oracle to use your proposed three MVs instead of the base tables used in the sample queries. Note that you must not query your MVs directly in the FROM clause; let the Oracle Query Optimizer re-write the queries and answer them using your proposed MVs.

(E) Prior to the introduction of the special aggregation function CUBE, there was no possibility to express an aggregation over different levels within a single SQL statement without using the set operation UNION ALL. Every different aggregation level needed its own SQL aggregation expression, operating on the exact same data set n times, once for each of the n different aggregation levels. With the introduction of CUBE in the recent database systems, Oracle provided a single SQL command for handling the aggregation over different levels within a single SQL statement, not only improving the runtime of this operation but also reducing the number of internal operations necessary to run the query and reducing the workload on the system.

i. Using CUBE, write an SQL query over the SH schema under your DWUaccount involving one fact table (SALES or COSTS) and at least two dimension tables and at least 3 grouping attributes. Provide output of successful execution of your query. Provide reasons why your query may be useful for users of the SH data warehouse.

ii. Using set operation UNION ALL (and not CUBE), write an SQL query that produces the same result as the query in (a) above. Provide output of successful execution of your query.

iii. Using EXPLAIN PLAN, provide a detailed discussion analysing costs of evaluating the above queries (i.e. with and without ROLLUP).

Part 2: Data Mining Tasks

This part is based on the UniTelscenario as described in Appendix 2. Moreover, you must use the DMUn Oracle Data Mining Account (where 1 <= n <= 75, e.g., DMU1, DMU2) allocated to your group.

Jessica is the customers relation manager at UniTel. She wants to know the possibility of potential churn of the company's customers based on previous experience, so she may be able take some actions accordingly to retain their customers.

To help Jessica in doing her analysis, we need to investigate what could be a suitable algorithm for solvingher problem. The data from last year are used as the training data and the data of February of this year are taken as the testing data to verify the model accuracy. Data of all the columns are used to set up the model. To meet the requirement, many algorithms can be selected.

Oracle Data Mining (ODM) provides the following algorithms for classification:

• Decision Tree
• Naive Bayes
• Generalized Linear Models (GLM)
• Support Vector Machines (SVM)

You are required to perform the following tasks:

1. Using PL/SQL API, SQL-Developer's Data Miner Workflows or RODM (R package for Interfacing ODM), develop at least TWO models based on the above algorithms for the dataset accessible as CUSTOMERCHURN table
2. Evaluate capabilities of the models you have developed
3. Present and discuss your findings and make recommendations for Jessica

Part 3

Critically evaluate the SH data warehouse and the UniTel's Customer Churn data in relation to theory and best practices of data quality and standards.

The report should be concise and comprehensive and in the region of 900-1000 words. You should use Harvard style of citation and referencing by following the guidelines in Pears and Shields (2008).

Attachment:- Advanced Databases.rar

Reference no: EM132866904

Questions Cloud

What are the stated objectives of actors : What are the stated objectives of your actors? What motivates them, or what has caused them to react? What are their ideological assumption
Do project managers predict stakeholders behavior : Does Johnson & Johnson continually seek to identify project stakeholders? In what ways? Do project managers predict stakeholders behavior? In what ways?
How much skocpol and finegold analysis complicate the idea : Why do they see the Agricultural Adjustment Act (and Administration) as more successful than the National Industrial Recovery Act
Testing sales software : Company W is testing a sales software. Its sales force of 500 people is divided into four regions: Northeast, Southeast, Central, and West.
Evaluate capabilities of the models you have developed : Identify three new indexes and justify why they could be useful. Write the SQL code for creating these indexes under your DWUaccount
Summarize booker stances on issues important : Summarize Booker's stances on issues important to him: including race relations, crime, the crisis of mass incarceration, issues of economic justice
Describe the kinds of knowledge management systems : What are the most important knowledge assets at Dirt Bikes? What functions and employee positions are responsible for creating, distributing
Considering branching out into cold beverages : WidgeCorp is considering branching out into cold beverages.
What is a coalition : What is a Coalition? The term "coalition" is the denotation for a group formed when two or more persons or organizations agree to work together temporarily

Reviews

Write a Review

Database Management System Questions & Answers

  Knowledge and data warehousing

Design a dimensional model for analysing Purchases for Adventure Works Cycles and implement it as cubes using SQL Server Analysis Services. The AdventureWorks OLTP sample database is the data source for you BI analysis.

  Design a database schema

Design a Database schema

  Entity-relationship diagram

Create an entity-relationship diagram and design accompanying table layout using sound relational modeling practices and concepts.

  Implement a database of courses and students for a school

Implement a database of courses and students for a school.

  Prepare the e-r diagram for the movie database

Energy in the home, personal energy use and home energy efficiency and Efficient use of ‘waste' heat and renewable heat sources

  Design relation schemas for the entire database

Design relation schemas for the entire database.

  Prepare the relational schema for database

Prepare the relational schema for database

  Data modeling and normalization

Data Modeling and Normalization

  Use cases perform a requirements analysis for the case study

Use Cases Perform a requirements analysis for the Case Study

  Knowledge and data warehousing

Knowledge and Data Warehousing

  Stack and queue data structure

Identify and explain the differences between a stack and a queue data structure

  Practice on topic of normalization

Practice on topic of Normalization

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd