Programming for Big Data Assignment

Reference no: EM132587664

Programming for Big Data - Higher Diploma in Data Analytics

Project Description

You are required to carry out a series of analyses on publicly accessible datasets using R, the programming language used in this module, in a programming environment suitable for the task. It is recommended that you use at least two separate datasets. For each chosen dataset, you are required to compile a report of your analysis. Each dataset should have at least 1,000 records (rows). If you are unsure whether a dataset is appropriate, please check with your lecturer. Your report must provide evidence that you are authorized to use the datasets you have chosen.
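The 1,000-row minimum is easy to verify before committing to a dataset. The following is a minimal sketch, assuming the data is already loaded as a data frame. The helper name `meets_min_rows` is hypothetical and not part of the brief, and the built-in `quakes` and `mtcars` datasets only stand in for a dataset you would choose yourself.

```r
# Hypothetical helper: checks a data frame against the brief's 1,000-row minimum.
meets_min_rows <- function(df, min_rows = 1000) {
  stopifnot(is.data.frame(df))
  nrow(df) >= min_rows
}

# Both datasets ship with base R (datasets package).
data(quakes)
data(mtcars)

meets_min_rows(quakes)  # quakes has 1,000 rows, so this is TRUE
meets_min_rows(mtcars)  # mtcars has 32 rows, so this is FALSE
```

A check like this can sit at the top of each R script so that an unsuitable dataset is flagged before any analysis runs.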

The main deliverable is a report that provides significant insights into the datasets that you have chosen to analyse. Your report should provide at least four unique insights based on your data analysis. Examples of insights might include relationships, trends/patterns, correlations, models based on the data, visuals, and statistical analyses.
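As an illustration of what one such insight might look like in R, the sketch below computes a correlation and fits a simple linear model. It assumes the built-in `quakes` dataset as a placeholder for a chosen dataset; the choice of variables (`mag` and `stations`) is purely illustrative and not prescribed by the brief.

```r
# quakes ships with base R; it is used here only as a stand-in dataset.
data(quakes)

# Insight candidate 1: strength of the linear relationship between
# earthquake magnitude and the number of stations that reported it.
r <- cor(quakes$mag, quakes$stations)

# Insight candidate 2: a simple linear model of stations on magnitude.
fit <- lm(stations ~ mag, data = quakes)
r_squared <- summary(fit)$r.squared

# For a one-predictor linear model, R-squared equals the squared correlation.
print(c(correlation = r, r_squared = r_squared))

# A quick visual to accompany the numbers.
plot(quakes$mag, quakes$stations,
     xlab = "Magnitude", ylab = "Stations reporting")
abline(fit)
```

Each insight in the report would normally pair a result like this with a short interpretation in the context of the dataset's domain.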

All deliverables should be compiled into a single project report document for submission, with all programming code included in an appendix. Please submit your report via the Turnitin upload link in Moodle. R scripts and additional files are to be uploaded via a separate link in Moodle. Your project report should discuss the challenges that you encountered while handling your chosen datasets and the means you used to overcome them. The report should be between 2,000 and 2,500 words, not counting R code.

Structure and Rating Grid

• Description of the objective(s) of the analysis with reference to basic domain literature to explain the domain purpose of the analyses

• Description of the underlying dataset including an assessment of the data types present, with an emphasis on the data that is actually used in the analytical processes

• Approach to the analysis, aided by visuals such as diagrams, flowcharts, tables, and pseudocode, where appropriate

• R code demonstrating at least four unique insights. R scripts will be executed as part of the assessment process. It is expected that scripts are fully working, efficient, commented clearly, and do not contain excess code

• Project report structure, presentation, and discussion of challenges
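For the dataset description item in the grid above, the assessment of data types can be produced directly in R. The sketch below builds a small per-column profile. It again assumes the built-in `quakes` dataset as a placeholder, and the `profile` table layout is an illustrative choice rather than a required format.

```r
# quakes ships with base R; replace it with your own data frame.
data(quakes)

# One row per column: its class, number of missing values, and
# number of distinct values.
profile <- data.frame(
  column   = names(quakes),
  type     = vapply(quakes, function(x) class(x)[1], character(1)),
  missing  = vapply(quakes, function(x) sum(is.na(x)), integer(1)),
  distinct = vapply(quakes, function(x) length(unique(x)), integer(1)),
  row.names = NULL
)

print(profile)
```

A table like this can be pasted into the report to support the discussion of which columns are actually used in the analysis.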

Attachment: Diploma in Data Analytics.rar

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd