Plot the data from each dataset using a scatter plot

Assignment Help Applied Statistics

Reference no: EM132230053 , Length: word count:1000

Assignment - Introduction to Machine Learning

Format is needed in a R Markdown report.

Data sets included: binary-classifier-data.csv and trinary-classifier-data.csv

Regression algorithms are used to predict numeric quantity while classification algorithms predict categorical outcomes. A spam filter is an example use case for a classification algorithm. The input dataset is emails labeled as either spam (i.e. junk emails) or ham (i.e. good emails). The classification algorithm uses features extracted from the emails to learn which emails fall into which category.

In this problem, you will use the nearest neighbors algorithm to fit a model on two simplified datasets. The first dataset (found in binary-classifier-data.csv) contains three variables; label, x, and y. The label variable is either 0 or 1 and is the output we want to predict using the x and y variables. The second dataset (found in trinary-classifier-data.csv) is similar to the first dataset except for the label variable can be 0, 1, or 2.

Note that in real-world datasets, your labels are usually not numbers, but text-based descriptions of the categories (e.g. spam or ham). In practice, you will encode categorical variables into numeric values.

a. Plot the data from each dataset using a scatter plot.

b. The k nearest neighbors algorithm categorizes an input value by looking at the labels for the k nearest points and assigning a category based on the most common label. In this problem, you will determine which points are nearest by calculating the Euclidean distance between two points. As a refresher, the Euclidean distance between two points: p₁ = (x₁, y₁) and p₂ = (x₂, y₂) is d = √((x₁-x₂)²+(y₁-y₂)²).

Fitting a model is when you use the input data to create a predictive model. There are various metrics you can use to determine how well your model fits the data. You will learn more about these metrics in later lessons. For this problem, you will focus on a single metric; accuracy. Accuracy is simply the percentage of how often the model predicts the correct result. If the model always predicts the correct result, it is 100% accurate. If the model always predicts the incorrect result, it is 0% accurate.

Fit a k nearest neighbors model for each dataset for k=3, k=5, k=10, k=15, k=20, and k=25. Compute the accuracy of the resulting models for each value of k. Plot the results in a graph where the x-axis is the different values of k and the y-axis is the accuracy of the model.

c. In later lessons, you will learn about linear classifiers. These algorithms work by defining a decision boundary that separates the different categories.

Looking back at the plots of the data, do you think a linear classifier would work well on these datasets?

Attachment:- Assignment Files.rar

Reference no: EM132230053

Questions Cloud

Create a scatter plot of resultant clusters for each value : Clustering Assignment - Fit the dataset using the k-means algorithm from k=2 to k=12. Create a scatter plot of the resultant clusters for each value of k

Why is it important to make knowledge work visible : Why is it important to make knowledge work visible? In the technology value stream, which best describes lead time?

Expanding its product line to include three new products : Alan Industries is expanding its product line to include three new products. Calculate the objective value using Excel Solver.

How do you think that the problem can be at least reduced : How do you think that this problem can be at least reduced, if not solved? What do you think about the idea of "ZERO WASTE" as a goal for individuals.

Plot the data from each dataset using a scatter plot : Assignment - Introduction to Machine Learning - Assignment - Introduction to Machine Learning. Format is needed in a R Markdown report

Samsung releases new version of their galaxy phone : Discuss how Apple might respond if Samsung releases a new version of their Galaxy phone which has a brighter screen, faster processor,

Describe four different data sets from the movie : Describe four different data sets from the movie that led scientists to the discovery of global dimming, and explain how the Clean Air Act helped to eliminate.

Two examples of what partial productivity statistic : Give at least two examples of what a partial productivity statistic would be for some of the inputs that you identified.

What did you learn from the experience : In a well-crafted discussion post of at least 200 words, report on the results of all five weeks of the Ecological Footprint Reduction Project.

Reviews

len2230053

2/8/2019 12:58:34 AM

Need 1000+ words report. Regression algorithms are used to predict numeric quantity while classification algorithms predict categorical outcomes. A spam filter is an example use case for a classification algorithm. The input dataset is emails labeled as either spam (i.e. junk emails) or ham (i.e. good emails). The classification algorithm uses features extracted from the emails to learn which emails fall into which category. In this problem, you will use the nearest neighbors algorithm to fit a model on two simplified datasets.

2/8/2019 12:58:28 AM

The first dataset (found in binary-classifier-data.csv) contains three variables; label, x, and y. The label variable is either 0 or 1 and is the output we want to predict using the x and y variables. The second dataset (found in trinary-classifier-data.csv) is similar to the first dataset except for the label variable can be 0, 1, or 2. This is needed in a R Markdown Report. Note that in real-world datasets, your labels are usually not numbers, but text-based descriptions of the categories (e.g. spam or ham). In practice, you will encode categorical variables into numeric values.

Write a Review

Required(*) Message

User Account

All Pages