Design data pipeline assessment

Reference no: EM133000455

BDA601 Big Data and Analytics - Laureate International Universities

Assessment - Design Data Pipeline

Learning Outcome 1: Explain and evaluate the V's of Big Data (volume, velocity, variety, veracity, valence and value);
Learning Outcome 2: Identify best practices in data collection and storage, including data security and privacy principles; and
Learning Outcome 3: Effectively report and communicate findings to an appropriate audience.

Task Summary
Critically analyse the online retail business case (see below) and write a 1,500-word report that:
a) Identifies various sources of data to build an effective data pipeline;
b) Identifies challenges in integrating the data from the sources and formulates a strategy to address those challenges; and
c) Describes a design for a storage and retrieval system for the data lake that uses commercial and/or open-source big data tools.

Context
A modern data-driven organisation must be able to collect and process large volumes of data and perform analytics on that data at scale. Establishing a data pipeline is therefore an essential first step in building a data-driven organisation. A data pipeline ingests data from various sources, integrates it and stores it in a ‘data lake’, making it available to everyone in the organisation.

This Assessment prepares you to identify potential sources of data, address challenges in integrating data and design an efficient ‘data lake’ using the big data principles, practices and technologies covered in the learning materials.

Case Study

Big Retail is an online retail shop in Adelaide, Australia. Its website, at which its users can explore different products and promotions and place orders, has more than 100,000 visitors per month. During checkout, each customer has three options: 1) to log in to an existing account; 2) to create a new account if they have not already registered; or 3) to check out as a guest. Customers' account information is maintained by both the sales and marketing departments in their separate databases. The sales department also maintains records of the transactions in its database. The information technology (IT) department maintains the website.

Every month, the marketing team releases a catalogue and promotions, which are made available on the website and emailed to the registered customers. The website is static; that is, all the customers see the same content, irrespective of their location, login status or purchase history.

Recently, Big Retail has experienced a significant slump in sales, despite its having a cost advantage over its competitors. A significant reduction in the number of visitors to the website and the conversion rate (i.e., the percentage of visitors who ultimately buy something) has also been observed. To regain its market share and increase its sales, the management team at Big Retail has decided to adopt a data-driven strategy. Specifically, the management team wants to use big data analytics to enable a customised customer experience through targeted campaigns, a recommender system and product association.
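As a minimal illustration of the conversion rate defined above, the following Python sketch computes it from a visitor count and a buyer count. The figure of 100,000 monthly visitors comes from the case study; the buyer count and the function name `conversion_rate` are hypothetical and chosen for illustration only.

```python
def conversion_rate(visitors: int, buyers: int) -> float:
    """Return the percentage of visitors who ultimately bought something."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    if not 0 <= buyers <= visitors:
        raise ValueError("buyers must be between 0 and visitors")
    return 100.0 * buyers / visitors


# 100,000 monthly visitors is from the case study; 2,500 buyers is hypothetical.
rate = conversion_rate(100_000, 2_500)
print(f"Conversion rate: {rate:.1f}%")
```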

The first step in moving towards the data-driven approach is to establish a data pipeline. The essential purpose of the data pipeline is to ingest data from various sources, integrate the data and store the data in a ‘data lake’ that can be readily accessed by both the management team and the data scientists.
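The three stages named above (ingest, integrate, store) can be sketched in Python as follows. This is only a sketch under stated assumptions, not a prescribed design: the source names, the field names, the key-normalisation rule and the use of JSON Lines files in a local directory as a stand-in for the data lake are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def ingest(sources):
    """Collect raw records from each source, tagging each with its origin."""
    for name, records in sources.items():
        for record in records:
            yield {"source": name, **record}


def integrate(records):
    """Normalise field names so that records from all sources line up."""
    for record in records:
        yield {key.strip().lower(): value for key, value in record.items()}


def store(records, lake_dir):
    """Append each record to one JSON Lines file per source; return counts."""
    lake_dir = Path(lake_dir)
    lake_dir.mkdir(parents=True, exist_ok=True)
    counts = {}
    for record in records:
        path = lake_dir / f"{record['source']}.jsonl"
        with path.open("a", encoding="utf-8") as handle:
            handle.write(json.dumps(record) + "\n")
        counts[record["source"]] = counts.get(record["source"], 0) + 1
    return counts


# Hypothetical sample data; the case study does not specify fields or formats.
sources = {
    "sales": [{"Order_ID": 1, "Amount": 59.90}],
    "web_logs": [{"Session_ID": "abc", "Page": "/promotions"}],
}

with tempfile.TemporaryDirectory() as lake:
    counts = store(integrate(ingest(sources)), lake)

print(counts)
```

Writing one file per source keeps the raw origin of every record visible, which is one common way to organise a landing zone; a real design would substitute the storage tools chosen in the report.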

Task Instructions

Critically analyse the above case study and write a 1,500-word report. In your report, ensure that you:

• Identify the potential data sources that align with the objectives of the organisation's data-driven strategy. You should consider both the internal and external data sources. For each data source identified, describe its characteristics. Make reasonable assumptions about the fields and format of the data for each of the sources;

• Identify the challenges that will arise in integrating the data from different sources and that must be resolved before the data are stored in the ‘data lake’. Articulate the steps necessary to address these issues;

• Describe the ‘data lake’ that you designed to store the integrated data and make the data available for efficient retrieval by both the management team and data scientists. The system should be designed using commercial and/or open-source databases, tools and frameworks. Demonstrate how the ‘data lake’ meets the big data storage and retrieval requirements; and

• Provide a schematic of the overall data pipeline. The schematic should clearly depict the data sources, data integration steps, the components of the ‘data lake’ and the interactions among all the entities.
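One integration challenge implied by the case study is that customer account information is held by both the sales and marketing departments in separate databases. The Python sketch below shows one possible way to reconcile such records. It rests on several assumptions that the case study does not state: the field names `email`, `name` and `phone` are hypothetical, email is assumed to be a usable matching key, and the rule that sales values override marketing values is an illustrative choice.

```python
def normalise_email(email: str) -> str:
    """Trim whitespace and lower-case an email so equivalent values match."""
    return email.strip().lower()


def merge_customers(sales_rows, marketing_rows):
    """Merge customer records from two departments, keyed on normalised email.

    Marketing rows are applied first and sales rows second, so a non-empty
    sales value overrides the marketing value for the same field.
    """
    merged = {}
    for origin, rows in (("marketing", marketing_rows), ("sales", sales_rows)):
        for row in rows:
            key = normalise_email(row["email"])
            record = merged.setdefault(key, {"email": key, "origins": []})
            for field, value in row.items():
                # Skip empty values so they never erase existing data.
                if field != "email" and value not in (None, ""):
                    record[field] = value
            record["origins"].append(origin)
    return list(merged.values())


# Hypothetical sample rows for illustration only.
sales = [{"email": "Ana@Example.com ", "name": "Ana Lee", "phone": ""}]
marketing = [
    {"email": "ana@example.com", "name": "A. Lee", "phone": "0400 000 000"},
    {"email": "bo@example.com", "name": "Bo Tran", "phone": None},
]

customers = merge_customers(sales, marketing)
for customer in customers:
    print(customer)
```

Recording the `origins` of each merged record preserves lineage, which helps when the veracity of a field has to be checked against its source system.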

Attachment:- Design Data Pipeline.rar




