Describe the classification problem and data preprocessing

Reference no: EM132134635

Task description: Data Engineering and Mining

The data set comes from the Kaggle Digit Recognizer competition. The goal is to recognize the digits 0 to 9 in images of handwriting. Because the original data set is large, I have systematically sampled 10% of the data by selecting every 10th example (the 10th, 20th, 30th, and so on).
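For reference, the systematic sampling described above can be sketched in a few lines of Python. The function name `systematic_sample` and the stand-in list of row numbers are assumptions made for illustration; they are not part of the assignment files.

```python
def systematic_sample(rows, step=10):
    """Return every `step`-th row: the 10th, 20th, 30th, ... for step=10."""
    # The task counts examples from 1, so the 10th example sits at
    # index 9 in a 0-based Python list.
    return rows[step - 1::step]


# Stand-in for a data set of 100 examples numbered 1..100 (hypothetical data).
rows = list(range(1, 101))
sample = systematic_sample(rows, step=10)
print(sample)  # [10, 20, 30, 40, 50, 60, 70, 80, 90, 100]
```

With a step of 10 the sample holds one tenth of the rows, which matches the 10% figure in the task description.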

You are going to use the sampled data to construct prediction models using the machine learning algorithms that we have learned recently: naïve Bayes, kNN, and SVM. Tune their parameters to get the best model (measured by cross-validation), and compare which algorithm provides the better model for this task.

Report structure:

Section 1: Introduction

Briefly describe the classification problem and general data preprocessing.

Note that some data preprocessing steps may be specific to a particular algorithm. Report those steps under each algorithm's section.

Section 2: Naïve Bayes

Build a naïve Bayes model. Tune the parameters, such as the discretization options, and compare the results.
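As one possible illustration of what a discretization option means here, the sketch below bins each 0-255 pixel intensity into `n_bins` equal-width bins and fits a small categorical naïve Bayes model with Laplace smoothing. The class name `CategoricalNB`, the parameters `n_bins` and `alpha`, and the four-pixel toy images are all assumptions for illustration; the assignment does not prescribe a tool or an implementation.

```python
import math
from collections import Counter


def discretize(pixels, n_bins):
    """Map each 0-255 intensity to one of n_bins equal-width bins."""
    return [p * n_bins // 256 for p in pixels]


class CategoricalNB:
    """Minimal naive Bayes over binned pixel values (illustrative only)."""

    def __init__(self, n_bins=2, alpha=1.0):
        self.n_bins = n_bins  # discretization option to tune
        self.alpha = alpha    # Laplace smoothing constant

    def fit(self, X, y):
        self.class_counts = Counter(y)
        n_features = len(X[0])
        # counts[label][feature_index][bin] = number of training examples
        self.counts = {
            c: [Counter() for _ in range(n_features)] for c in self.class_counts
        }
        for pixels, label in zip(X, y):
            for j, b in enumerate(discretize(pixels, self.n_bins)):
                self.counts[label][j][b] += 1
        return self

    def predict_one(self, pixels):
        bins = discretize(pixels, self.n_bins)
        total = sum(self.class_counts.values())
        best_label, best_score = None, -math.inf
        for c, n_c in self.class_counts.items():
            # Log prior plus the sum of smoothed log likelihoods.
            score = math.log(n_c / total)
            for j, b in enumerate(bins):
                num = self.counts[c][j][b] + self.alpha
                den = n_c + self.alpha * self.n_bins
                score += math.log(num / den)
            if score > best_score:
                best_label, best_score = c, score
        return best_label


# Hypothetical four-pixel "images": class 0 is bright on the right,
# class 1 is bright on the left.
X = [[0, 0, 255, 255], [10, 5, 250, 240], [255, 255, 0, 0], [240, 250, 5, 10]]
y = [0, 0, 1, 1]
model = CategoricalNB(n_bins=2).fit(X, y)
print(model.predict_one([20, 30, 200, 220]))  # 0
```

Refitting with a different `n_bins` and comparing cross-validated accuracy is one way to report how the discretization choice affects the results.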

Section 3: K-Nearest Neighbor method

Section 4: Support Vector Machine (SVM)
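A minimal sketch of parameter tuning by cross-validation, shown for kNN: each candidate value of k is scored by k-fold accuracy and the best one is kept. The helper names `knn_predict` and `cross_val_accuracy`, the fold scheme, and the toy two-cluster data are assumptions for illustration. The same loop could in principle be reused for SVM parameters, but no SVM implementation is shown here.

```python
from collections import Counter


def knn_predict(train_X, train_y, x, k):
    """Predict the label of x by majority vote among its k nearest neighbors."""
    # Squared Euclidean distance preserves the neighbor ordering.
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(row, x)), label)
        for row, label in zip(train_X, train_y)
    )
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_folds=5):
    """Mean accuracy of kNN over n_folds interleaved folds."""
    n = len(X)
    correct = 0
    for fold in range(n_folds):
        # Every n_folds-th example, starting at `fold`, is held out.
        test_idx = set(range(fold, n, n_folds))
        train_X = [X[i] for i in range(n) if i not in test_idx]
        train_y = [y[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            correct += knn_predict(train_X, train_y, X[i], k) == y[i]
    return correct / n


# Hypothetical data: two well-separated clusters of 2-D points.
X = [[i, i] for i in range(10)] + [[i + 20, i + 20] for i in range(10)]
y = [0] * 10 + [1] * 10

scores = {k: cross_val_accuracy(X, y, k) for k in (1, 3, 5)}
best_k = max(scores, key=scores.get)
print(scores, best_k)
```

On real digit images the candidate values of k will usually give different accuracies, so the table of scores is worth reporting alongside the chosen value.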

Section 5: Algorithm performance comparison

Compare the results from the algorithms. Which one reached the highest accuracy? Which one runs fastest? Can you explain why?
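One way to collect the accuracy and run-time figures for this comparison is a small harness that times each model on the same split. The sketch below assumes every model is wrapped as a function taking training data and test inputs and returning predictions. The two stand-in classifiers (a majority-class baseline and 1-nearest-neighbor) and the toy data are hypothetical placeholders for the models built in the earlier sections.

```python
import time
from collections import Counter


def majority_classifier(train_X, train_y, test_X):
    """Baseline: always predict the most common training label."""
    label = Counter(train_y).most_common(1)[0][0]
    return [label for _ in test_X]


def one_nn_classifier(train_X, train_y, test_X):
    """1-nearest-neighbor using squared Euclidean distance."""
    preds = []
    for x in test_X:
        nearest = min(
            range(len(train_X)),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(train_X[i], x)),
        )
        preds.append(train_y[nearest])
    return preds


def compare(models, train_X, train_y, test_X, test_y):
    """Return accuracy and wall-clock seconds for each named model."""
    results = {}
    for name, fit_predict in models.items():
        start = time.perf_counter()
        preds = fit_predict(train_X, train_y, test_X)
        elapsed = time.perf_counter() - start
        accuracy = sum(p == t for p, t in zip(preds, test_y)) / len(test_y)
        results[name] = {"accuracy": accuracy, "seconds": elapsed}
    return results


# Hypothetical toy split.
train_X = [[0, 0], [1, 1], [2, 2], [10, 10], [11, 11]]
train_y = [0, 0, 0, 1, 1]
test_X = [[1, 0], [10, 11]]
test_y = [0, 1]

results = compare(
    {"majority": majority_classifier, "1-NN": one_nn_classifier},
    train_X, train_y, test_X, test_y,
)
for name, r in results.items():
    print(f"{name}: accuracy={r['accuracy']:.2f}, seconds={r['seconds']:.6f}")
```

Timings on a toy split are not meaningful in themselves; the point is that every model is measured on the same data in the same way, so the differences can be explained in the report.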





All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd