Write a description of the dataset and project

Reference no: EM133531008

Big Data and Analytics

Assignment - Analytics Report

Overview

The purpose of this task is to give students practical experience in writing a data analytics report that draws useful insights, patterns and trends from a chosen dataset, following the tasks set out in this document. The dataset will be chosen from the UC Irvine Machine Learning Repository. This activity gives students the opportunity to show innovation and creativity in applying the WEKA data mining software and in designing useful visualization and data mining solutions, presented as an analytics report.

Project Details

You will use an analytical tool (i.e. WEKA) to explore, analyse and visualise a dataset of your choosing. An important part of this work is preparing a good quality report, which details your choices, content, and analysis, and that is of an appropriate style.
The dataset should be chosen from the following repository:

UC Irvine Machine Learning Repository

The aim is to use your chosen dataset to provide interesting insights, trends and patterns in the data. Your intended audience is the CEO and middle management of the company that employs you, who have tasked you with this analysis.

Task 1 - Data choice. Choose any dataset from the repository that has at least five attributes, and for which the default task is classification. Transform this dataset into the ARFF format required by WEKA.
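As a rough illustration of the ARFF transformation in Task 1, the sketch below converts a small CSV text into ARFF text. It is not part of the assignment and does not replace WEKA's own converters; the function name csv_to_arff, the relation name "toy" and the sample data are all made up for the example. It assumes attribute names and values contain no spaces or commas (ARFF requires quoting in that case) and treats a column as numeric only if every non-missing value parses as a number.

```python
import csv
import io


def is_number(value):
    """Return True if the string can be parsed as a float."""
    try:
        float(value)
        return True
    except ValueError:
        return False


def csv_to_arff(csv_text, relation):
    """Build ARFF text from CSV text whose first row holds attribute names."""
    rows = list(csv.reader(io.StringIO(csv_text)))
    header, data = rows[0], rows[1:]
    lines = ["@relation " + relation, ""]
    for i, name in enumerate(header):
        # Ignore empty cells and "?" when inferring the attribute type.
        present = [r[i] for r in data if r[i] not in ("", "?")]
        if present and all(is_number(v) for v in present):
            lines.append("@attribute " + name + " numeric")
        else:
            labels = sorted(set(present))
            lines.append("@attribute " + name + " {" + ",".join(labels) + "}")
    lines += ["", "@data"]
    for r in data:
        # Empty cells become "?", the ARFF marker for a missing value.
        lines.append(",".join(v if v != "" else "?" for v in r))
    return "\n".join(lines)


# Made-up sample: three attributes, the last one being the class.
sample = "age,income,class\n25,low,no\n,high,yes\n40,low,yes\n"
arff_text = csv_to_arff(sample, "toy")
print(arff_text)
```

The last attribute is nominal here, which is what a classification task in WEKA expects of the class attribute.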

Task 2 - Background information. Write a description of the dataset and project, and its importance for the organization. Provide an overview of what the dataset is about, including from where and how it has been gathered, and for what purpose.

Task 3 - Data description. Describe how many instances the dataset contains, how many attributes it has and what their names are, and state which one is the class attribute. Include details of any missing values and any other relevant characteristics. For at least 5 attributes, describe the range of possible values and visualise these in a graphical format.
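The figures asked for in Task 3 can be read off WEKA's Preprocess panel, but a small sketch shows what is being counted. The function describe and the sample rows below are hypothetical and only illustrate the idea: for each attribute it reports the number of missing values (marked "?") and either the numeric range or the set of nominal values.

```python
def describe(header, rows, missing="?"):
    """Summarise each attribute: missing count plus range or distinct values."""
    summary = {}
    for i, name in enumerate(header):
        column = [r[i] for r in rows]
        present = [v for v in column if v != missing]
        try:
            numbers = [float(v) for v in present]
        except ValueError:
            numbers = None  # at least one value is not numeric
        if numbers:
            values = (min(numbers), max(numbers))  # numeric range
        else:
            values = sorted(set(present))  # nominal values
        summary[name] = {"missing": len(column) - len(present), "values": values}
    return summary


# Made-up sample with one missing value in "age".
header = ["age", "income", "class"]
rows = [["25", "low", "no"], ["?", "high", "yes"], ["40", "low", "yes"]]

summary = describe(header, rows)
print("instances:", len(rows), "attributes:", len(header))
for name, info in summary.items():
    print(name, info)
```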

Task 4 - Data preprocessing. Preprocess the dataset attributes using WEKA's filters. Useful techniques include removing certain attributes, exploring different ways of discretizing continuous attributes, and replacing missing values. Discretizing is the conversion of numeric attributes into "nominal" ones by binning numeric values into intervals. Missing values in ARFF files are represented with the character "?". If you replaced missing values, explain what strategy you used to select the replacement values. Use and describe at least three different preprocessing techniques.
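To make two of these techniques concrete, here is a minimal sketch of replacing missing numeric values with the attribute mean and of equal-width binning. It illustrates the ideas only and is not WEKA's filter code; the function names, the bin labels and the sample values are assumptions made for the example.

```python
def replace_missing_with_mean(values, missing="?"):
    """Replace missing entries with the mean of the values that are present."""
    present = [float(v) for v in values if v != missing]
    mean = sum(present) / len(present)
    return [mean if v == missing else float(v) for v in values]


def equal_width_bins(values, bins):
    """Map each number to one of `bins` intervals of equal width."""
    low, high = min(values), max(values)
    width = (high - low) / bins
    labels = []
    for v in values:
        index = int((v - low) / width) if width > 0 else 0
        index = min(index, bins - 1)  # the maximum falls in the last bin
        labels.append("bin" + str(index + 1))
    return labels


# Made-up attribute with one missing value.
ages = ["20", "?", "30", "40"]
filled = replace_missing_with_mean(ages)
binned = equal_width_bins(filled, 2)
print(filled)
print(binned)
```

Mean replacement is only one possible strategy; whichever one you use, the task asks you to justify it in the report.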

Task 5 - Data mining. Compare and contrast at least three different data mining algorithms on your data, for instance k-nearest neighbour, Apriori association rules, and decision tree induction. For each experiment you ran, describe the data you used, that is, whether you used the entire dataset or just a subset of it. You must include screenshots and results from the techniques you employ.
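For readers unfamiliar with k-nearest neighbour, the sketch below shows the idea on a made-up two-attribute dataset, with accuracy measured on a separate test subset. It is a toy illustration under those assumptions, not the implementation WEKA uses, and the names knn_predict and accuracy are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k):
    """Predict the majority label among the k training points nearest to query."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def accuracy(train, test, k):
    """Fraction of test instances whose predicted label matches the true label."""
    correct = sum(
        1 for features, label in test if knn_predict(train, features, k) == label
    )
    return correct / len(test)


# Made-up data: (features, class label). Training and test subsets are kept apart.
train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
test = [((1.1, 1.0), "a"), ((5.1, 5.0), "b")]

score = accuracy(train, test, k=3)
print(score)
```

Reporting which instances were used for training and which for testing, as done here, is the kind of detail the task asks you to describe for each experiment.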

Task 6 - Discussion of findings. Explain your results and include the usefulness of the approaches for the purpose of the analysis. Include any assumptions that you may have made about the analysis. In this discussion you should explain what each algorithm provides to the overall analysis task. Summarize your main findings.

Task 7 - Report writing. Present your work in the form of an analytics report.

Your report will include the following in the order provided below:

• A cover/front page with Title, Author, Student ID
• Table of contents
• Sections:
    • Background information
    • Data description
    • Data preprocessing
    • Data mining
    • Discussion of findings
    • Conclusion
• Appendix: Data choice/ARFF file (example 30 rows of data)

Course: ITECH1103 Big Data and Analytics, Federation University

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd