Plot the data from each dataset using a scatter plot

Assignment Help Applied Statistics
Reference no: EM132230053 , Length: word count:1000

Assignment - Introduction to Machine Learning

Format is needed in a R Markdown report.

Data sets included: binary-classifier-data.csv and trinary-classifier-data.csv

Regression algorithms are used to predict numeric quantity while classification algorithms predict categorical outcomes. A spam filter is an example use case for a classification algorithm. The input dataset is emails labeled as either spam (i.e. junk emails) or ham (i.e. good emails). The classification algorithm uses features extracted from the emails to learn which emails fall into which category.

In this problem, you will use the nearest neighbors algorithm to fit a model on two simplified datasets. The first dataset (found in binary-classifier-data.csv) contains three variables; label, x, and y. The label variable is either 0 or 1 and is the output we want to predict using the x and y variables. The second dataset (found in trinary-classifier-data.csv) is similar to the first dataset except for the label variable can be 0, 1, or 2.

Note that in real-world datasets, your labels are usually not numbers, but text-based descriptions of the categories (e.g. spam or ham). In practice, you will encode categorical variables into numeric values.

a. Plot the data from each dataset using a scatter plot.

b. The k nearest neighbors algorithm categorizes an input value by looking at the labels for the k nearest points and assigning a category based on the most common label. In this problem, you will determine which points are nearest by calculating the Euclidean distance between two points. As a refresher, the Euclidean distance between two points: p1 = (x1, y1) and p2 = (x2, y2) is d = √((x1-x2)2+(y1-y2)2).

Fitting a model is when you use the input data to create a predictive model. There are various metrics you can use to determine how well your model fits the data. You will learn more about these metrics in later lessons. For this problem, you will focus on a single metric; accuracy. Accuracy is simply the percentage of how often the model predicts the correct result. If the model always predicts the correct result, it is 100% accurate. If the model always predicts the incorrect result, it is 0% accurate.

Fit a k nearest neighbors model for each dataset for k=3, k=5, k=10, k=15, k=20, and k=25. Compute the accuracy of the resulting models for each value of k. Plot the results in a graph where the x-axis is the different values of k and the y-axis is the accuracy of the model.

c. In later lessons, you will learn about linear classifiers. These algorithms work by defining a decision boundary that separates the different categories.

295_figure.png

Looking back at the plots of the data, do you think a linear classifier would work well on these datasets?

Attachment:- Assignment Files.rar

Reference no: EM132230053

Questions Cloud

Create a scatter plot of resultant clusters for each value : Clustering Assignment - Fit the dataset using the k-means algorithm from k=2 to k=12. Create a scatter plot of the resultant clusters for each value of k
Why is it important to make knowledge work visible : Why is it important to make knowledge work visible? In the technology value stream, which best describes lead time?
Expanding its product line to include three new products : Alan Industries is expanding its product line to include three new products. Calculate the objective value using Excel Solver.
How do you think that the problem can be at least reduced : How do you think that this problem can be at least reduced, if not solved? What do you think about the idea of "ZERO WASTE" as a goal for individuals.
Plot the data from each dataset using a scatter plot : Assignment - Introduction to Machine Learning - Assignment - Introduction to Machine Learning. Format is needed in a R Markdown report
Samsung releases new version of their galaxy phone : Discuss how Apple might respond if Samsung releases a new version of their Galaxy phone which has a brighter screen, faster processor,
Describe four different data sets from the movie : Describe four different data sets from the movie that led scientists to the discovery of global dimming, and explain how the Clean Air Act helped to eliminate.
Two examples of what partial productivity statistic : Give at least two examples of what a partial productivity statistic would be for some of the inputs that you identified.
What did you learn from the experience : In a well-crafted discussion post of at least 200 words, report on the results of all five weeks of the Ecological Footprint Reduction Project.

Reviews

len2230053

2/8/2019 12:58:34 AM

Need 1000+ words report. Regression algorithms are used to predict numeric quantity while classification algorithms predict categorical outcomes. A spam filter is an example use case for a classification algorithm. The input dataset is emails labeled as either spam (i.e. junk emails) or ham (i.e. good emails). The classification algorithm uses features extracted from the emails to learn which emails fall into which category. In this problem, you will use the nearest neighbors algorithm to fit a model on two simplified datasets.

len2230053

2/8/2019 12:58:28 AM

The first dataset (found in binary-classifier-data.csv) contains three variables; label, x, and y. The label variable is either 0 or 1 and is the output we want to predict using the x and y variables. The second dataset (found in trinary-classifier-data.csv) is similar to the first dataset except for the label variable can be 0, 1, or 2. This is needed in a R Markdown Report. Note that in real-world datasets, your labels are usually not numbers, but text-based descriptions of the categories (e.g. spam or ham). In practice, you will encode categorical variables into numeric values.

Write a Review

Applied Statistics Questions & Answers

  Hypothesis testing

What assumptions about the number of pedestrians passing the location in an hour are necessary for your hypothesis test to be valid?

  Calculate the maximum reduction in the standard deviation

Calculate the maximum reduction in the standard deviation

  Calculate the expected value, variance, and standard deviati

Calculate the expected value, variance, and standard deviation of the total income

  Determine the impact of social media use on student learning

Research paper examines determine the impact of social media use on student learning.

  Unemployment survey

Find a statistics study on Unemployment and explain the five-step process of the study.

  Statistical studies

Locate the original poll, summarize the poling procedure (background on how information was gathered), the sample surveyed.

  Evaluate the expected value of the total number of sales

Evaluate the expected value of the total number of sales

  Statistic project

Identify sample, population, sampling frame (if applicable), and response rate (if applicable). Describe sampling technique (if applicable) or experimental design

  Simple data analysis and comparison

Write a report on simple data analysis and comparison.

  Analyze the processed data in statistical survey

Analyze the processed data in Statistical survey.

  What is the probability

Find the probability of given case.

  Frequency distribution

Accepting Manipulation or Manipulating

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd