Fundamental concepts of big data management

Reference no: EM133137009

ITECH2302 Big Data Management

Big Data Management Report

Purpose:
This assignment helps you grasp the fundamental concepts of big data management, the related knowledge and techniques, and the practical software and tools required for developing big data projects.

Requirements: You are required to identify a suitable dataset, provide an analysis of the data, and recommend suitable Big Data Management strategies. This will be written up as a professional report.

Details
You will use the analytical tools taught in this course (including Jupyter notebooks, PySpark and Tableau) to explore, analyse and visualise a dataset of your choosing. An important part of this work is preparing a good-quality report that details your choices, analysis, and recommendations/conclusions, and that is written in an appropriate professional style.

The dataset should be chosen from the following repository:

UC Irvine Machine Learning Repository

The aim is to use your chosen dataset to uncover interesting insights, trends and patterns in the data. Your intended audience is the CEO and middle management of the company that employs you, who have tasked you with this analysis.

Tasks

Data choice. Choose any dataset from the repository that has at least five attributes, and for which the default task is classification. Convert the dataset into a format suitable for loading into your chosen analytics software.

Background information. Write a description of the dataset and project. Provide an overview of what the dataset is about, including from where and how it has been gathered, and for what purpose.

Data description. State how many instances the dataset contains, how many attributes it has and what they are called, and which one is the class attribute. Include details of any missing values and any other relevant characteristics. Use appropriate pandas functions for an initial analysis of the data, for instance descriptive statistics for each attribute, including the range of possible values, and visualise these in a graphical format.
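
As a minimal sketch of this kind of initial description, the example below uses a small made-up DataFrame in place of a real repository dataset. The column names (age, income, region, label) and all values are hypothetical and only illustrate the pandas calls.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for a UCI classification dataset.
df = pd.DataFrame({
    "age": [25, 32, 47, np.nan, 51, 38],
    "income": [32000.0, 45000.0, 81000.0, 52000.0, np.nan, 61000.0],
    "region": ["north", "south", "south", "east", "north", None],
    "label": ["no", "no", "yes", "no", "yes", "yes"],  # class attribute
})

# Number of instances (rows) and attributes (columns).
n_instances, n_attributes = df.shape

# Missing values per attribute.
missing_per_column = df.isna().sum()

# Descriptive statistics (count, mean, std, min, quartiles, max)
# for the numeric attributes.
numeric_summary = df.describe()

# Distribution of the class attribute.
class_counts = df["label"].value_counts()

print(f"{n_instances} instances, {n_attributes} attributes")
print(missing_per_column)
print(numeric_summary)
print(class_counts)
```

With a real dataset, the DataFrame would typically come from a call such as pd.read_csv on the downloaded file instead of being built by hand.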

Initial analysis. You will need to decide which features to include in your dataframe and how to deal with missing values (if any exist). You might need to preprocess the dataset attributes. Useful techniques include removing certain attributes, exploring different ways of discretizing continuous attributes, and replacing missing values. Discretizing is the conversion of numeric attributes into "nominal" ones by binning numeric values into intervals. If you replaced missing values, explain the strategy you used to select the replacement values.
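
The preprocessing steps described here can be sketched as follows. The data, the column names, the choice of median imputation and the bin edges are all assumptions made for illustration; a real analysis should justify its own choices.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data; "id" is an attribute with no predictive value.
df = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6],
    "age": [22.0, 35.0, np.nan, 58.0, 41.0, 67.0],
    "label": ["no", "no", "yes", "yes", "no", "yes"],
})

# Remove an attribute that should not be part of the analysis.
df = df.drop(columns=["id"])

# Replace missing values with the median of the observed values
# (one possible strategy; mean or mode imputation are alternatives).
age_median = df["age"].median()
df["age_filled"] = df["age"].fillna(age_median)

# Discretize the numeric attribute into nominal intervals.
# With the default right=True the bins are (0, 30], (30, 50], (50, 120].
df["age_band"] = pd.cut(
    df["age_filled"],
    bins=[0, 30, 50, 120],
    labels=["young", "middle", "older"],
)

print(df)
```

pd.cut uses fixed bin edges; pd.qcut is an alternative that builds bins with roughly equal numbers of instances.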

GroupBy analysis. Implement various aggregate functions that will provide interesting insights into the data. Use the GroupBy function in pandas to analyse the data.
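
A minimal sketch of a GroupBy analysis, again on made-up data with hypothetical column names, might aggregate a numeric attribute per class and count instances per combination of two nominal attributes:

```python
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "region": ["north", "north", "north", "south", "south", "south"],
    "label": ["yes", "no", "yes", "no", "yes", "no"],
    "income": [50, 30, 70, 40, 60, 20],
})

# Several aggregate functions of income for each class value.
summary = df.groupby("label")["income"].agg(["count", "mean", "min", "max"])

# Number of instances for each (region, label) combination,
# reshaped so that the class values become columns.
counts = df.groupby(["region", "label"]).size().unstack(fill_value=0)

print(summary)
print(counts)
```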

Data visualisation. Choose any data visualisation techniques that will provide helpful insights into the data. This could include plotting chosen variables against each other and displaying them in a line chart, or binning them and using a (stacked) histogram, etc. Use whichever you prefer of matplotlib (matplotlib.pyplot.hist), pandas (pandas.DataFrame.plot), seaborn (seaborn.histplot) and Tableau.
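
As one possible illustration, the sketch below draws a stacked histogram of a numeric attribute split by class using matplotlib. The data, column names and bin edges are hypothetical, and the non-interactive "Agg" backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen; not needed inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical toy data.
df = pd.DataFrame({
    "age": [22, 35, 48, 41, 58, 67],
    "label": ["no", "no", "no", "yes", "yes", "yes"],
})

classes = ["no", "yes"]
# One array of ages per class value.
groups = [df.loc[df["label"] == c, "age"].to_numpy() for c in classes]

fig, ax = plt.subplots()
# With stacked=True the returned counts are cumulative across classes,
# so the last row holds the total height of each bar.
counts, bin_edges, _ = ax.hist(
    groups, bins=[20, 40, 60, 80], stacked=True, label=classes
)
ax.set_xlabel("age")
ax.set_ylabel("number of instances")
ax.set_title("Age distribution by class")
ax.legend(title="label")

# fig.savefig("age_by_class.png")  # uncomment to save the figure for the report
```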

Data mining. Compare and contrast at least two different data mining algorithms on your data, for instance: SVM (support vector machines), neural networks, k-nearest neighbour, Apriori association rules, decision tree induction, etc. For each experiment you run, describe the data you used, that is, whether you used the entire dataset or just a subset of it. You must include screenshots and results from the techniques you employ.
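
A comparison of two algorithms could be sketched as below. This assumes scikit-learn is available, which the brief does not name, and uses scikit-learn's bundled copy of the Iris data (a dataset that originates from the UCI repository) purely as a stand-in. The hyperparameters are arbitrary illustrative choices.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

# The entire dataset is used; cross-validation handles the train/test splits.
X, y = load_iris(return_X_y=True)

models = {
    "decision tree": DecisionTreeClassifier(max_depth=3, random_state=0),
    # k-NN is distance based, so the attributes are standardised first.
    "k-nearest neighbour": make_pipeline(
        StandardScaler(), KNeighborsClassifier(n_neighbors=5)
    ),
}

# Mean accuracy over 5-fold cross-validation for each algorithm.
results = {
    name: cross_val_score(model, X, y, cv=5).mean()
    for name, model in models.items()
}

for name, accuracy in results.items():
    print(f"{name}: mean accuracy {accuracy:.3f}")
```

Evaluating both models with the same cross-validation setup keeps the comparison fair, since each algorithm sees the same data under the same protocol.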

Discussion of findings. Explain your results and include the usefulness of the approaches for the purpose of the analysis. Include any assumptions that you may have made about the analysis. In this discussion you should explain what each algorithm provides to the overall analysis task. Summarize your main findings.

Big Data Management. The data you have used will be very small in comparison with what is considered "big data" in this course. In this section you are to draw conclusions about how the acquisition, storage, and subsequent analysis of the data would differ if this were truly a "big data" dataset. You are to make reference to the concepts learned about the "V's" of big data (velocity, volume, etc.), data warehouses, OLAP, business intelligence, Hadoop/Spark and so on. Explain how this dataset might link to data that would be too difficult or too complex to handle with a traditional SQL database and traditional statistical analysis, and would therefore require Big Data storage and Big Data analytics.

Report writing. Present your work in the form of a big data management report.

Attachment:- Big Data Management Report.rar



All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd