Describe the data encoding that is required for this task

Reference no: EM132120421, Length: 1800 words

Data Mining Assignment -

In this assignment you are asked to explore the use of neural networks for classification and numeric prediction. You are also asked to carry out a data mining investigation on a real-world data file. You are required to write a report on your findings. You will be assessed on methodology, analysis of results and conclusions.

PART 1: CLASSIFICATION WITH NEURAL NETWORKS

This part involves the following file: heart-v1.arff

For the neural network training runs build a table with the following headings:

Run No | Architecture | Parameters | Train MSE | Train Error | Epochs | Test MSE | Test Error
1      | 23-10-5      | lr=.0      | 0.5       | 30%         | 500    | 0.6      | 40%
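The MSE and Error columns of the table can be computed from the network outputs once a decision rule is chosen. The following is a minimal sketch, assuming a winner-takes-all rule (the class whose output unit is largest is the prediction) and an MSE averaged over all output units. Both are assumptions: the assignment leaves the "analyze" strategy open, and JavaNNS may normalize its reported MSE differently. The outputs and targets below are made-up values, not results.

```python
def mse(outputs, targets):
    """Mean squared error averaged over every output unit of every pattern."""
    total = 0.0
    count = 0
    for out_row, tgt_row in zip(outputs, targets):
        for o, t in zip(out_row, tgt_row):
            total += (o - t) ** 2
            count += 1
    return total / count


def classification_error(outputs, targets):
    """Fraction of patterns whose largest output is not the target class."""
    wrong = 0
    for out_row, tgt_row in zip(outputs, targets):
        predicted = out_row.index(max(out_row))  # winner-takes-all (assumed rule)
        actual = tgt_row.index(max(tgt_row))     # position of the 1 in the target
        if predicted != actual:
            wrong += 1
    return wrong / len(outputs)


# Hypothetical outputs for three patterns and two classes (not real results).
outputs = [[0.9, 0.1], [0.4, 0.6], [0.7, 0.3]]
targets = [[1, 0], [1, 0], [0, 1]]

example_mse = mse(outputs, targets)
example_error = classification_error(outputs, targets)
print(f"MSE = {example_mse:.3f}, error = {example_error:.0%}")
```

Applying the same two functions to the training and test files gives the Train and Test columns for one run.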

1. Describe the data encoding that is required for this task. How many outputs and how many inputs will there be?

2. Develop a script to generate the necessary training, validation and test files. You might want to normalize the numeric attributes with Weka beforehand. Include your data preparation script as an appendix (not part of the page count).

3. Determine the "analyze" strategy that you will use.

4. Using JavaNNS, carry out 5 train and test runs for a network with 10 hidden nodes. Comment on the variation in the training runs and the degree of overfitting.

5. Experiment with different numbers of hidden nodes. What seems to be the right number of hidden nodes for this problem?

6. For 10 hidden nodes, explore different values of the learning rate. What do you conclude?

7. [Optional] Change the learning function to backprop-momentum. Explore different combinations of learning rate and momentum. What do you conclude?

8. Perform a run with 10 hidden nodes and no validation data. Stop training when the MSE is no longer changing. Get the classification error on the training and test data. Comment on the degree of overfitting.

9. Compare the classification accuracy of the neural classifiers with the classification accuracy of Weka J48 and Multilayer Perceptron.
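Questions 1 and 2 above can be illustrated with a short sketch of one common encoding: each nominal attribute becomes a 1-of-N group of inputs, each numeric attribute (assumed already normalized) becomes one input, and the class becomes one output unit per class value. The attribute names, class values and the 60/20/20 split below are hypothetical placeholders, not taken from heart-v1.arff. Writing the encoded patterns out as JavaNNS pattern files is deliberately left out.

```python
import random


def one_hot(value, categories):
    """Return a 1-of-N vector with 1.0 at the position of `value`."""
    return [1.0 if value == c else 0.0 for c in categories]


def encode_row(row, schema):
    """Encode one record as a flat list of network inputs.

    `schema` is a list of (name, categories) pairs; categories is None for
    numeric attributes, which are assumed to be normalized already.
    """
    encoded = []
    for name, categories in schema:
        if categories is None:
            encoded.append(float(row[name]))
        else:
            encoded.extend(one_hot(row[name], categories))
    return encoded


def split(patterns, train_frac=0.6, valid_frac=0.2, seed=1):
    """Shuffle and cut the patterns into training, validation and test lists."""
    shuffled = patterns[:]
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train_frac)
    n_valid = int(len(shuffled) * valid_frac)
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_valid],
            shuffled[n_train + n_valid:])


# Hypothetical schema and records; the real ones come from heart-v1.arff.
schema = [("age", None), ("sex", ["male", "female"])]
class_values = ["absent", "present"]
rows = [{"age": i / 10, "sex": "male" if i % 2 else "female",
         "class": class_values[i % 2]} for i in range(10)]

patterns = [(encode_row(r, schema), one_hot(r["class"], class_values))
            for r in rows]
train, valid, test = split(patterns)

n_inputs = len(patterns[0][0])   # one per numeric attribute plus one per nominal value
n_outputs = len(patterns[0][1])  # one per class value
print(n_inputs, n_outputs, len(train), len(valid), len(test))
```

With this encoding the input count is the number of numeric attributes plus the total number of values across all nominal attributes, which is how the input layer size would be worked out for the real file.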

Report Length: Up to two pages.

PART 2: NUMERIC PREDICTION WITH NEURAL NETWORKS

This part involves the following file: heart-v1.arff

The task is to predict the value of the Weight variable. Build a similar table of runs to the one in the previous question.

1. Describe the data encoding that is required for this task. How many outputs and how many inputs will there be? What scaling or normalization is required?

2. Modify your script from Part 1 to generate the necessary training, validation and test files. You can use Weka to normalize all of the numeric attributes except for the class, i.e. the Weight attribute. You will need to write a suitable program to scale the weight to the range [0,1] and another one to reverse scale the neural net outputs to get the mean absolute error. Include your data preparation script as an appendix (not part of the page count).

3. Using JavaNNS, carry out 5 train and test runs for a network with 5 hidden nodes. Comment on the variation in the training runs and the degree of overfitting. [Hint: When you are comparing the predictive accuracy of different models you don't have to reverse scale the output.]

4. Experiment with different numbers of hidden nodes. What seems to be the right number of hidden nodes for this problem?

5. For 5 hidden nodes, explore different values of the learning rate. What do you conclude?

6. [Optional] Change the learning function to backprop-momentum. Explore different combinations of learning rate and momentum. What do you conclude?

7. Perform a run with 5 hidden nodes and no validation data. Stop training when the MSE is no longer changing. Get the error on the training and test data. Comment on the degree of overfitting.

8. Compare the mean absolute error of the neural networks with the mean absolute error of Weka M5P and MultiLayer Perceptron.
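The scaling and reverse scaling asked for in question 2 can be sketched with min-max scaling, which maps the smallest observed weight to 0 and the largest to 1. This is one possible approach under stated assumptions: the minimum and maximum are taken from the training data, and the weights and network outputs shown are made-up values, not results from heart-v1.arff.

```python
def fit_min_max(values):
    """Return the (min, max) pair used for scaling and reverse scaling."""
    return min(values), max(values)


def scale(value, lo, hi):
    """Map a raw weight into the range [0, 1]."""
    return (value - lo) / (hi - lo)


def unscale(scaled, lo, hi):
    """Map a network output in [0, 1] back to the original weight units."""
    return scaled * (hi - lo) + lo


def mean_absolute_error(predicted, actual):
    """Average absolute difference between predictions and true values."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical weights and network outputs (not real data or results).
weights = [50.0, 70.0, 90.0]
lo, hi = fit_min_max(weights)

scaled_targets = [scale(w, lo, hi) for w in weights]    # targets for the network
net_outputs = [0.1, 0.5, 0.8]                           # pretend network outputs
predictions = [unscale(o, lo, hi) for o in net_outputs]  # back in weight units

mae = mean_absolute_error(predictions, weights)
print(f"MAE in original units: {mae:.2f}")
```

Reverse scaling multiplies every error by the same factor (hi - lo), so comparing models on the scaled outputs ranks them the same way, which is the point of the hint in question 3. The reverse-scaled value is only needed when the error has to be reported in the original units, for example when comparing against Weka.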

Report Length: Up to one page.

PART 3: DATA MINING

Choose EITHER the census data OR the movies data.

Choose - data/arff/UCI/adult.arff OR data/other/IMDB-movie-data.csv

The file adult.arff contains American census data collected several years ago. The file adult.names describes the data items. There is some further information about the data at the website.

The movie data was collected from the IMDb web site, which claims to be "the world's most popular and authoritative source for movie, TV and celebrity content".

It was collected to answer the question "How can we tell the greatness of a movie before it is released in cinema?"

IMDB-movie-data.csv has some changes from the Kaggle file, mostly to make the genre information more usable.

Your task is to analyze the data with appropriate data mining techniques and identify any "golden nuggets" in the data. You are expected to use classification, clustering, association finding, attribute selection and visualization in your analysis, or to explain why a particular technique is not relevant. Be sure to give the rationale for each experiment.

Report Length: Up to three pages.


COSC2110/COSC2111 Data Mining Assignment, RMIT University, Australia.

9/23/2018 10:47:41 PM

Word limit: 1800. This assignment counts for 25% of the total marks in this course. You can work on this assignment individually or in a group of 2. If you are working in a group please establish a group in Assignment 2 Group on Canvas.

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd