Building a Regression Model with Spark

Reference no: EM132119741

Big Data Assignment -

Regression Models - Regression models are concerned with target variables that can take any real value. The underlying principle is to find a model that maps input features to predicted target variables. Regression is also a form of supervised learning.

Regression models can be used to predict just about any variable of interest. A few examples include the following:

Predicting stock returns and other economic variables
Predicting loss amounts for loan defaults (this can be combined with a classification model that predicts the probability of default, while the regression model predicts the amount in the case of a default)
Recommendations (the Alternating Least Squares factorization model from Chapter 5, Building a Recommendation Engine with Spark, uses linear regression in each iteration)
Predicting customer lifetime value (CLTV) in a retail, mobile, or other business, based on user behavior and spending patterns

In the different sections of this chapter, we will do the following:

Introduce the various types of regression models available in ML

Explore feature extraction and target variable transformation for regression models
Train a number of regression models using ML
See how to make predictions using the trained model
Investigate the impact on performance of various parameter settings for regression using cross-validation
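The last item in the list above, comparing parameter settings with cross-validation, can be sketched in plain Python. This is not Spark's API: the function names (fit_linear, cross_validate, k_fold_indices), the synthetic data, and the parameter grid are all assumptions made for illustration, and the model is a small gradient-descent linear regression with an optional L2 penalty.

```python
import math

def fit_linear(X, y, step_size, iterations, l2=0.0):
    """Gradient descent on mean squared error with an optional L2 penalty on w."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(iterations):
        errs = [sum(wj * xj for wj, xj in zip(w, row)) + b - yi
                for row, yi in zip(X, y)]
        grad_w = [sum(e * row[j] for e, row in zip(errs, X)) / n + l2 * w[j]
                  for j in range(d)]
        grad_b = sum(errs) / n
        w = [wj - step_size * g for wj, g in zip(w, grad_w)]
        b -= step_size * grad_b
    return w, b

def rmse(model, X, y):
    """Root mean squared error of a (w, b) model on the given rows."""
    w, b = model
    sq = [(sum(wj * xj for wj, xj in zip(w, row)) + b - yi) ** 2
          for row, yi in zip(X, y)]
    return math.sqrt(sum(sq) / len(sq))

def k_fold_indices(n, k):
    """Deterministic interleaved folds: fold i holds rows i, i + k, i + 2k, ..."""
    return [list(range(i, n, k)) for i in range(k)]

def cross_validate(X, y, k, **params):
    """Average held-out RMSE over k folds for one parameter setting."""
    scores = []
    for fold in k_fold_indices(len(X), k):
        held_out = set(fold)
        train = [i for i in range(len(X)) if i not in held_out]
        model = fit_linear([X[i] for i in train], [y[i] for i in train], **params)
        scores.append(rmse(model, [X[i] for i in fold], [y[i] for i in fold]))
    return sum(scores) / len(scores)

# Synthetic, noise-free data (an assumption for illustration): y = 3x + 0.5
X = [[i / 10] for i in range(20)]
y = [3.0 * row[0] + 0.5 for row in X]

# Hypothetical grid over the L2 regularization strength
grid = [{"step_size": 0.2, "iterations": 1000, "l2": l2} for l2 in (0.0, 0.1, 1.0)]
results = [(cross_validate(X, y, k=4, **p), p["l2"]) for p in grid]
best_rmse, best_l2 = min(results)
```

Because the synthetic data has no noise, the unregularized setting gives the lowest held-out error here; on real data the preferred setting would depend on the dataset.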

Types of regression models - The core idea of linear models (or generalized linear models) is that we model the predicted outcome of interest (often called the target or dependent variable) as a function of a simple linear predictor applied to the input variables (also referred to as features or independent variables).

y = f(w^Tx)

Here, y is the target variable, w is the vector of parameters (known as the weight vector), and x is the vector of input features. w^Tx is the linear predictor, that is, the dot product of the weight vector w and the feature vector x. To this linear predictor we apply a function f (called the link function). Linear models can, in fact, be used for both classification and regression simply by changing the link function. Standard linear regression uses an identity link (that is, y = w^Tx directly), while binary classification uses alternative link functions.
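As a minimal illustration of y = f(w^Tx), the sketch below computes the linear predictor once and applies two choices of f: the identity (regression) and the logistic function (a common choice for binary classification). The weights and features are made-up values, and the helper names are hypothetical.

```python
import math

def dot(w, x):
    """Linear predictor w^T x for two equal-length sequences."""
    if len(w) != len(x):
        raise ValueError("w and x must have the same length")
    return sum(wi * xi for wi, xi in zip(w, x))

def identity(z):
    """Identity function: used for standard linear regression."""
    return z

def logistic(z):
    """Logistic function: maps the linear predictor into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x, f=identity):
    """Apply f to the linear predictor, i.e. y = f(w^T x)."""
    return f(dot(w, x))

# Made-up weight and feature vectors for illustration
w = [0.5, -1.0, 2.0]
x = [2.0, 1.0, 0.5]

linear_pred = predict(w, x)          # 0.5*2 - 1*1 + 2*0.5 = 1.0
prob = predict(w, x, logistic)       # logistic(1.0), a value in (0, 1)
```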

Spark's ML library offers different regression models, which are as follows:

Linear regression
Generalized linear regression
Logistic regression
Decision trees
Random forest regression
Gradient boosted trees
Survival regression
Isotonic regression
Ridge regression

Regression models define the relationship between a dependent variable and one or more independent variables. Training builds the model that best fits the observed values of the independent variables, or features.

Unlike classification models such as support vector machines and logistic regression, linear regression predicts a continuous value of the dependent variable rather than a discrete class label.

Linear regression models are essentially the same as their classification counterparts. The only difference is that linear regression models use a different loss function, related link function, and decision function. Spark ML provides a standard least squares regression model (although other types of generalized linear models for regression are planned).
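To make the least squares loss concrete, here is a small sketch that minimizes mean squared error by batch gradient descent. It is plain Python rather than Spark code; the function name fit_least_squares, its parameters, and the toy data are assumptions chosen for illustration.

```python
def fit_least_squares(X, y, step_size=0.1, iterations=1000, intercept=True):
    """Minimize (1 / 2n) * sum((w^T x + b - y)^2) by batch gradient descent."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(iterations):
        grad_w = [0.0] * d
        grad_b = 0.0
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) + b - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - step_size * gj / n for wj, gj in zip(w, grad_w)]
        if intercept:
            b -= step_size * grad_b / n   # b stays at 0.0 when intercept=False
    return w, b

def mean_squared_error(w, b, X, y):
    """Average squared difference between predictions and targets."""
    preds = [sum(wj * xj for wj, xj in zip(w, xi)) + b for xi in X]
    return sum((p - yi) ** 2 for p, yi in zip(preds, y)) / len(y)

# Toy data generated from y = 2x + 1 (no noise)
X = [[0.0], [1.0], [2.0], [3.0], [4.0]]
y = [1.0, 3.0, 5.0, 7.0, 9.0]

w, b = fit_least_squares(X, y, step_size=0.1, iterations=1000)
train_mse = mean_squared_error(w, b, X, y)
```

The step size has to be small enough for the updates to converge; for this toy data 0.1 works, but the right value depends on the scale of the features.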

Assignment -

1. Using Python 3, build the following regression models:

Decision Tree
Gradient Boosted Tree
Linear regression

2. Select a dataset (other than the example dataset given in section 3) and apply the Decision Tree and Linear regression models created above. Choose a dataset from Kaggle.

3. Build the following in relation to the gradient boosted tree and the dataset chosen in step 2:

Gradient boost tree iterations
Gradient boost tree Max Bins
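The effect of the number of iterations can be seen in a small, self-contained sketch of gradient boosting for squared error, where each iteration fits a one-split tree (a stump) to the current residuals. This is an illustration in plain Python, not Spark's implementation; the names fit_stump, fit_gbt, and gbt_predict, the learning rate, and the data are assumptions, and binning (max bins) is not modelled here.

```python
def fit_stump(xs, residuals):
    """Best single-threshold split of a 1-D feature by sum of squared errors."""
    values = sorted(set(xs))
    if len(values) < 2:
        raise ValueError("need at least two distinct feature values")
    best = None
    for lo, hi in zip(values, values[1:]):
        t = (lo + hi) / 2.0
        left = [r for x, r in zip(xs, residuals) if x <= t]
        right = [r for x, r in zip(xs, residuals) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    return best[1:]  # (threshold, left mean, right mean)

def stump_predict(stump, x):
    t, lm, rm = stump
    return lm if x <= t else rm

def fit_gbt(xs, ys, iterations=10, learning_rate=0.5):
    """Start from the mean, then repeatedly fit a stump to the residuals."""
    base = sum(ys) / len(ys)
    preds = [base] * len(ys)
    stumps = []
    for _ in range(iterations):
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump_predict(stump, x)
                 for p, x in zip(preds, xs)]
    return {"base": base, "stumps": stumps, "learning_rate": learning_rate}

def gbt_predict(model, x):
    return model["base"] + sum(model["learning_rate"] * stump_predict(s, x)
                               for s in model["stumps"])

def mse(model, xs, ys):
    return sum((gbt_predict(model, x) - y) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Made-up data: y = x^2 on eight points
xs = [float(i) for i in range(8)]
ys = [x * x for x in xs]

# Training error for 0, 1 and 20 boosting iterations
errors = [mse(fit_gbt(xs, ys, iterations=n), xs, ys) for n in (0, 1, 20)]
```

Training error falls as iterations are added; held-out error can eventually rise again, which is why the iteration count is worth evaluating on a test set.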

4. Build the following in relation to the decision tree and the dataset chosen in step 2:

Decision Tree Categorical features
Decision Tree Log
Decision Tree Max Bins
Decision Tree Max Depth
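The roles of max depth and max bins can be illustrated with a small regression tree written in plain Python. This is a simplified sketch, not Spark's algorithm: the binning rule in candidate_thresholds is an assumption (Spark's own binning differs), categorical features are not handled, and all names and data are hypothetical.

```python
def candidate_thresholds(values, max_bins):
    """At most max_bins - 1 split thresholds, thinned evenly from the midpoints."""
    if max_bins < 2:
        raise ValueError("max_bins must be at least 2")
    distinct = sorted(set(values))
    midpoints = [(a + b) / 2.0 for a, b in zip(distinct, distinct[1:])]
    if len(midpoints) <= max_bins - 1:
        return midpoints
    step = len(midpoints) / (max_bins - 1)
    return [midpoints[int(i * step)] for i in range(max_bins - 1)]

def sse(ys):
    """Sum of squared deviations from the mean (0.0 for an empty list)."""
    if not ys:
        return 0.0
    m = sum(ys) / len(ys)
    return sum((v - m) ** 2 for v in ys)

def build_tree(X, y, max_depth=3, max_bins=32, depth=0):
    """Greedy regression tree: split on the threshold with the lowest total SSE."""
    leaf = {"value": sum(y) / len(y)}
    if depth >= max_depth or len(set(y)) == 1:
        return leaf
    best = None
    for j in range(len(X[0])):
        for t in candidate_thresholds([row[j] for row in X], max_bins):
            left = [i for i, row in enumerate(X) if row[j] <= t]
            right = [i for i, row in enumerate(X) if row[j] > t]
            if not left or not right:
                continue
            cost = sse([y[i] for i in left]) + sse([y[i] for i in right])
            if best is None or cost < best[0]:
                best = (cost, j, t, left, right)
    if best is None:
        return leaf
    _, j, t, left, right = best
    return {
        "feature": j,
        "threshold": t,
        "left": build_tree([X[i] for i in left], [y[i] for i in left],
                           max_depth, max_bins, depth + 1),
        "right": build_tree([X[i] for i in right], [y[i] for i in right],
                            max_depth, max_bins, depth + 1),
    }

def tree_predict(node, row):
    while "value" not in node:
        side = "left" if row[node["feature"]] <= node["threshold"] else "right"
        node = node[side]
    return node["value"]

def tree_depth(node):
    if "value" in node:
        return 0
    return 1 + max(tree_depth(node["left"]), tree_depth(node["right"]))

# Made-up step-shaped data with three levels
X = [[float(i)] for i in range(8)]
y = [1.0, 1.0, 1.0, 1.0, 5.0, 5.0, 9.0, 9.0]

shallow = build_tree(X, y, max_depth=1)   # one split: cannot separate 5s from 9s
deep = build_tree(X, y, max_depth=3)      # enough depth to fit all three levels
```

A larger max depth lets the tree fit finer structure at the risk of overfitting, while a smaller max bins reduces the number of thresholds examined per feature.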

5. Build the following in relation to linear regression and the dataset chosen in step 2:

a) Linear regression Cross Validation

Intercept
Iterations
Step size
L1 Regularization
L2 Regularization

b) Linear regression Log (see section 5.4)
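The "Log" variants in steps 4 and 5 refer to training on a log-transformed target. Section 5.4 is not reproduced here, so the following is only a sketch of the general idea under that assumption: transform the target with log(1 + y), map predictions back with the inverse, and measure error on the log scale (RMSLE). The helper names and the sample counts are hypothetical.

```python
import math

def log_transform(targets):
    """Apply log(1 + y), which is defined for y = 0 and compresses large values."""
    return [math.log1p(t) for t in targets]

def inverse_log_transform(values):
    """Undo log_transform, mapping log-scale values back to the original scale."""
    return [math.expm1(v) for v in values]

def rmsle(actual, predicted):
    """Root mean squared log error between two equal-length sequences."""
    sq = [(math.log1p(p) - math.log1p(a)) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(sq) / len(sq))

# Hypothetical, heavily skewed count-like targets
counts = [0, 3, 16, 40, 977]

log_counts = log_transform(counts)             # what a model would be trained on
restored = inverse_log_transform(log_counts)   # round trip back to raw counts
```

A model trained on log_counts produces predictions on the log scale, so they need to pass through inverse_log_transform before being compared with the raw targets.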

6. Follow the provided example of the bike sharing dataset and the guidelines in the sections that follow this section to develop the requirements given in steps 1, 3, 4 and 5.

Attachment:- Assignment Files.rar

The regression line is constructed by optimizing the parameters of the straight line function such that the line best fits a sample of (x, y) observations where y is a variable dependent on the value of x. Regression analysis is used extensively in economics, risk management, and trading. One cool application of regression analysis is in calibrating certain optimistics results

Building a regression model with spark : ICT707 Big Data Assignment - Explore feature extraction and target variable transformation for regression models - Building a Regression Model with Spark

Reviews

urv2119741

11/3/2018 3:27:59 AM

Please add the following points to the file: 1. Introduction 2. Objectives 3. Data source 4. Structure of the database 5. Explanation of machine learning 6. Data preparation 7. Testing 8. Result and recommendation 9. Conclusion 10. References. The assignment was done according to the required instructions; in fact, it was done methodically and I am very satisfied with it. It was also ready before the deadline. I would recommend this site to everyone else.

11/3/2018 3:24:31 AM

Big Data Assignment Marking Criteria

The Big Data Assignment is comprised of two parts:

The first part is to create the algorithms in the tasks, namely Decision Tree, Gradient Boosted Tree and Linear regression, and then to apply them to the bike sharing dataset provided. Try to produce the output given in the task sections (also given in the Big-Data Assignment.docx provided on Blackboard).
The second part is to use the algorithms created in the first part and apply them to another dataset chosen from Kaggle (other than the bike sharing dataset provided).