Write a Spark program that loads data and analyzes data quality

Reference no: EM131796182

Write a Spark program that loads the data, analyzes data quality, provides a summary report, and reports your findings.

abc is an eCommerce company, and as such our analysts work using the language of eCommerce. Here are some terms that are used in this task description:

- Shopper - an individual using an eCommerce website

- Session - the experience of a shopper on an eCommerce website within a single continuous period of time (if a shopper visits a site multiple times, sessions are split wherever there was a break of at least 30 minutes)

- Conversion - a session which resulted in a purchase (one conversion can have multiple transactions)

- Marketing Strategy - any modification to an eCommerce website that is targeted to a shopper and is executed with the intent that it will increase the likelihood of a purchase

Fields

ssid - session identifier: used to link logs between files. It is a key composed of three values in the following format: user_id:site_id:session_start_time (the session start time is taken from the client side).

st - server timestamp: timestamp of when a web request was recorded on the server side

gr - determines assignment of a session to a control or experiment group

ad - indicates which marketing strategy a shopper was exposed to

Data Assumptions

- A shopper can have more than one session (each session separated by a break of at least 30 minutes)

- Each session should have exactly one session log

- There is one marketing strategy per session

- Each session has a corresponding features log
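The assumptions above translate directly into data quality checks. The following is a minimal plain-Python sketch of that logic, not a Spark implementation; in a Spark job the same counts would typically come from grouped aggregations. The record shapes (dicts with `ssid` and `ad` keys), the function names, and the reading of session_start_time as epoch seconds are assumptions made for illustration, since the task does not specify the file layout.

```python
from collections import Counter, defaultdict

SESSION_GAP_SECONDS = 30 * 60  # sessions are split at breaks of at least 30 minutes


def parse_ssid(ssid):
    """Split 'user_id:site_id:session_start_time' into its three parts.

    Assumption: session_start_time is an integer epoch in seconds, and
    site_id contains no colon (so splitting from the right is safe).
    """
    user_id, site_id, start = ssid.rsplit(":", 2)
    return user_id, site_id, int(start)


def check_quality(session_logs, feature_logs):
    """Return the ssids (or session pairs) that violate each stated assumption."""
    log_counts = Counter(rec["ssid"] for rec in session_logs)
    ads_by_ssid = defaultdict(set)
    for rec in session_logs:
        ads_by_ssid[rec["ssid"]].add(rec["ad"])
    feature_ssids = {rec["ssid"] for rec in feature_logs}

    # Two sessions of one shopper on one site whose start times are less than
    # 30 minutes apart cannot be separated by a 30-minute break.
    starts = defaultdict(set)
    for ssid in log_counts:
        user_id, site_id, start = parse_ssid(ssid)
        starts[(user_id, site_id)].add(start)
    too_close = []
    for key, times in starts.items():
        ordered = sorted(times)
        for prev, cur in zip(ordered, ordered[1:]):
            if cur - prev < SESSION_GAP_SECONDS:
                too_close.append((key, prev, cur))

    return {
        "duplicate_session_logs": sorted(s for s, n in log_counts.items() if n > 1),
        "multiple_ads": sorted(s for s, ads in ads_by_ssid.items() if len(ads) > 1),
        "missing_features": sorted(set(log_counts) - feature_ssids),
        "missing_session_log": sorted(feature_ssids - set(log_counts)),
        "starts_under_30_min_apart": sorted(too_close),
    }
```

Each key of the returned dictionary corresponds to one assumption, so the size of each list can go straight into the data quality report.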

Report format

After loading the data, we expect you to summarize and group it and prepare:

1) a populated table (TSV format) with the following header:

Session start date at hourly granularity, site_id, gr, ad, browser, number of sessions, number of conversions, number of transactions, sum of revenue

Notes:

Each row will contain aggregated data (key being first five columns)

Session start date at hourly granularity: 1464742123 -> 2016-06-01 00:00 (UTC)
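The hourly truncation and the five-column grouping can be sketched in plain Python as follows. This illustrates the aggregation logic only and is not a Spark implementation. The per-session record shape (one dict per session with `ssid`, `gr`, `ad`, `browser`, `transactions`, and `revenue`), the short column names in `HEADER`, and the reading of session_start_time as epoch seconds are assumptions for illustration. A conversion is counted as a session with at least one transaction, following the definition above.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical short column names for the header described in the task.
HEADER = ["session_start_hour", "site_id", "gr", "ad", "browser",
          "sessions", "conversions", "transactions", "revenue"]


def hour_bucket(epoch_seconds):
    """Truncate an epoch timestamp (seconds) to the start of its UTC hour."""
    dt = datetime.fromtimestamp(epoch_seconds, tz=timezone.utc)
    return dt.replace(minute=0, second=0, microsecond=0).strftime("%Y-%m-%d %H:%M")


def summarize(sessions):
    """Aggregate per-session records by the five key columns."""
    totals = defaultdict(lambda: [0, 0, 0, 0.0])
    for s in sessions:
        _user_id, site_id, start = s["ssid"].rsplit(":", 2)
        key = (hour_bucket(int(start)), site_id, s["gr"], s["ad"], s["browser"])
        row = totals[key]
        row[0] += 1                                  # number of sessions
        row[1] += 1 if s["transactions"] > 0 else 0  # number of conversions
        row[2] += s["transactions"]                  # number of transactions
        row[3] += s["revenue"]                       # sum of revenue
    return [list(key) + values for key, values in sorted(totals.items())]


def to_tsv(rows):
    """Render the header and aggregated rows as tab-separated text."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(HEADER)
    writer.writerows(rows)
    return buf.getvalue()
```

For the example in the note, `hour_bucket(1464742123)` gives `2016-06-01 00:00`, since the timestamp falls at 00:48:43 UTC on that date.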

2) a list of means and standard deviations for each feature per every (site_id, ad) pair

Expected outcome

- Source code for Spark program to generate reports
- Report regarding data quality
- Reports with data summary
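For the second report (means and standard deviations for each feature per (site_id, ad) pair), the computation could look like the plain-Python sketch below. The record shape (a dict with `site_id`, `ad`, and a `features` mapping of feature name to numeric value) and the function name are hypothetical. The task does not say whether the sample or population standard deviation is wanted; this sketch assumes the sample version and reports no value when a group has a single observation.

```python
import statistics
from collections import defaultdict


def feature_stats(records):
    """Return {(site_id, ad, feature): (mean, sample_std)}.

    Assumption: each record carries its features as a name -> number mapping.
    sample_std is None when only one value is available for a group.
    """
    values = defaultdict(list)
    for rec in records:
        for name, value in rec["features"].items():
            values[(rec["site_id"], rec["ad"], name)].append(value)

    stats = {}
    for key, vals in values.items():
        mean = statistics.fmean(vals)
        std = statistics.stdev(vals) if len(vals) > 1 else None
        stats[key] = (mean, std)
    return stats
```

Keeping the feature name in the key makes it easy to emit one output line per (site_id, ad, feature) combination.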

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd