Knowledge of key concepts of data warehousing

Assignment Help PL-SQL Programming
Reference no: EM132755320

KL7011 Advanced Databases - Northumbria University

LO1: Demonstrate deep knowledge of key concepts of data warehousing, data analytics, data standards, and data quality

Appraise, analyse, design, develop and evaluate data warehousing and data analytics solutions using Oracle database system

Develop critical awareness of the responsibilities of database developer with respect to professional, legal, security and ethical issues individually or as part of a team

Assessment Background and Scenario

This assessment is based on two scenarios as follows:

- The Sales History (SH) Data Warehouse scenario. SH is a sample database schema provided by Oracle, which has been extensively used in the Oracle's Data Warehousing Guide (Lane, 2013). The details of this scenario are provided in Appendix 1.

- UniTel's Customer Churn dataset described in Appendix 2.

Assignment Questions

Part 1: Data Warehousing Tasks

This part is based on the Sales History scenario as described in Appendix 1.

You must submit all the SQL queries and any other code that you wrote in answering any of the tasks / questions (e.g., the use of Explain Plan statements for the queries and their outputs using Spooling or other suitable means).

- Study the index definitions in sh_idx.sql. Discuss in detail (using cost-based analysis) why these indexes (choose three different ones) are useful for answering queries over the SH2 and DWU versions of the database. You should not run the sh_idx.sql script at all.

- Identify three new indexes and justify why they could be useful. Write the SQL code for creating these indexes under your DWU account. Give example queries with cost-based analysis for both DWU account (which will have the new indexes) and SH2 shared schema (which will NOT have any of your new indexes). Alternatively, you may choose to run the same queries on your DWU account before and after creating your proposed three indexes.

- Given the two materialized views (MVs) defined in sh_cremv.sql, discuss in detail why these MVs are useful for users of the SH database. You should provide detailed examples of cost based analysis, e.g., using Explain Plan for running sample queries on both SH2 and DWU to illustrate your answer. You should not run the sh_cremv.sql script at all.

- Identify three new MVs based on the base tables in the SH schema under your DWU account and justify why they would be useful for the users of your data warehouse. Write the SQL code for creating these MVs. Moreover, run sample queries on both SH2 and DWU to ensure that queries running on DWU will be re-written by Oracle to use your proposed three MVs instead of the base tables used in the sample queries. Note that you must not query your MVs directly in the FROM clause; let the Oracle Query Optimizer re-write the queries and answer them using your proposed MVs.

- Prior to the introduction of the special aggregation function CUBE, there was no possibility to express an aggregation over different levels within a single SQL statement without using the set operation UNION ALL. Every different aggregation level needed its own SQL aggregation expression, operating on the exact same data set n times, once for each of the n different aggregation levels. With the introduction of CUBE in the recent database systems, Oracle provided a single SQL command for handling the aggregation over different levels within a single SQL statement, not only improving the runtime of this operation but also reducing the number of internal operations necessary to run the query and reducing the workload on the system.

- Using CUBE, write an SQL query over the SH schema under your DWU account involving one fact table (SALES or COSTS) and at least two dimension tables and at least 3 grouping attributes. Provide output of successful execution of your query. Provide reasons why your query may be useful for users of the SH data warehouse.

- Using set operation UNION ALL (and not CUBE), write an SQL query that produces the same result as the query in (a) above. Provide output of successful execution of your query.

- Using EXPLAIN PLAN, provide a detailed discussion analysing costs of evaluating the above queries (i.e. with and without ROLLUP).

Part 2: Data Mining Tasks

This part is based on the UniTel scenario as described in Appendix 2. Moreover, you must use the DMUn Oracle Data Mining Account (where 1 <= n <= 75, e.g., DMU1, DMU2) allocated to your group.

Jessica is the customers relation manager at UniTel. She wants to know the possibility of potential churn of the company's customers based on previous experience, so she may be able take some actions accordingly to retain their customers.

To help Jessica in doing her analysis, we need to investigate what could be a suitable algorithm for solving her problem. The data from last year are used as the training data and the data of February of this year are taken as the testing data to verify the model accuracy. Data of all the columns are used to set up the model. To meet the requirement, many algorithms can be selected.

Oracle Data Mining (ODM) provides the following algorithms for classification:

- Decision Tree
- Naive Bayes
- Generalized Linear Models (GLM)
- Support Vector Machines (SVM)

You are required to perform the following tasks:

- Using PL/SQL API, SQL-Developer's Data Miner Workflows or RODM (R package for Interfacing ODM), develop at least TWO models based on the above algorithms for the dataset accessible as CUSTOMERCHURN table
- Evaluate capabilities of the models you have developed

- Present and discuss your findings and make recommendations for Jessica

Part 3

Critically evaluate the SH data warehouse and the UniTel's Customer Churn data in relation to theory and best practices of data quality and standards.

The report should be concise and comprehensive and in the region of 900-1000 words. You should use Harvard style of citation and referencing by following the guidelines in Pears and Shields (2008).

Appendix 1

The Sales History (SH) Data Warehouse

Reference no: EM132755320

Questions Cloud

Facilities operations : Lean operations create more value for customers while using fewer resources than traditional operations.
What effect does it have on stockholders equity : A corporation reacquires 60,000 shares of its own $10 par common stock for $3,000,000, recording it at cost. What effect does it have on stockholders' equity
Find what journal entry to record the bond issuance on dec : Find What is the journal entry to record the bond issuance on Dec 31, 20X0? Use 3 decimal point time value of money factors in calculating answers
Discuss the characteristics of the industry and company : Discuss the characteristics of the industry and company (and or product life cycle) that would be the most suitable for each of the budgeting approaches
Knowledge of key concepts of data warehousing : Demonstrate deep knowledge of key concepts of data warehousing, data analytics, data standards, and data quality - Develop critical awareness
What is the effect of the stock dividend : An owner of 2,500 shares of Simmons Company common stock receives a stock dividend of 50 shares. What is the effect of the stock dividend
New manufacturing facility : Analyze the national differences in political economy between these three countries
Make a statement of stockholders equity for the month : Common Stock, $50,000; Additional Paid-In Capital, $2,000,000; Make a statement of stockholders' equity for the month ended June 30
Describe the functional currency for lucky trader ltd : Describe the functional currency for Lucky Trader Ltd. The stock was shipped FOB on the 10th December 2019 when the exchange rate was $1: NAD13.00

Reviews

Write a Review

PL-SQL Programming Questions & Answers

  Question 1write a perl subroutine for temperature

question 1write a perl subroutine for temperature conversion named converttemp. it should be able to handle both

  Design queries using professional principles and standards

Design the below queries using professional principles and standards. A set of SQL Statements that returns all rows and all data for each table in database.

  1 a table scan is reading every record from the table in a

1. a table scan is reading every record from the table in a sequential order to find the data that a query is looking

  Create database of customers table

1) Add 10 customers to the database using the Customers table. Show the table, including the SQL Code.2) Using SQL query, update more than one element in a customer record. Show before and after data, including the SQL code.

  Display customers table and its fields in sql

Display all the LastNames from the Customers table and any associated OrdersIDs from the Orders Table. Show the LastName even if they do not have any associated orders. Order the results by LastName in ascending order.

  List the movies which have more than one director

List the movies which have more than one director. Include the year, title, last and first name of directors. Sort the output by last name within title. (use subquery)

  Write sql queries for the books database

Write SQL queries for the books database (discussed in Section 18.3) that perform each of the following tasks: Select all authors from the Authors table with the columns in the order lastName, firstName and authorID.

  Create the movie theater database based upon the below erm

Write the SQL Data Definition Language (DDL) commands to create the movie theater database based upon the below Entity Relationship Model (ERM). Be sure to include commands to create tables, fields, keys, and relationships.

  Write ten sql select statements to query

Write Ten SQL SELECT statements to query the STUDENT schema you created for practice lab 2. Your Select Statements should run error-free and should be valid.

  Understand the concept of normalization

This assignment has been designed so that you understand the concept of Normalization.

  Write a sql expression to display the status

Write a SQL statement to display the Major of students with no duplications. Do not display student names.

  Write a plsql block and declare a variablenbspvsalnbspof

write a plsql block and declare a variablenbspvsalnbspof the type number. include the following statement in the

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd