Building a Regression Model with Spark

Reference no: EM132119741

Big Data Assignment -

Regression Models - Regression models are concerned with target variables that can take any real value. The underlying principle is to find a model that maps input features to predicted target variables. Regression is also a form of supervised learning.

Regression models can be used to predict just about any variable of interest. A few examples include the following:

  • Predicting stock returns and other economic variables
  • Predicting loss amounts for loan defaults (this can be combined with a classification model that predicts the probability of default, while the regression model predicts the amount in the case of a default)
  • Recommendations (the Alternating Least Squares factorization model from Chapter 5, Building a Recommendation Engine with Spark, uses linear regression in each iteration)
  • Predicting customer lifetime value (CLTV) in a retail, mobile, or other business, based on user behavior and spending patterns

In the different sections of this chapter, we will do the following:

  • Introduce the various types of regression models available in ML
  • Explore feature extraction and target variable transformation for regression models
  • Train a number of regression models using ML
  • See how to make predictions using the trained model
  • Investigate the impact on performance of various parameter settings for regression using cross-validation

Types of regression models - The core idea of linear models (or generalized linear models) is that we model the predicted outcome of interest (often called the target or dependent variable) as a function of a simple linear predictor applied to the input variables (also referred to as features or independent variables).

y = f(w^T x)

Here, y is the target variable, w is the vector of parameters (known as the weight vector), and x is the vector of input features. w^T x is the linear predictor (the vector dot product of the weight vector w and the feature vector x). To this linear predictor, we apply a function f (called the link function). Linear models can, in fact, be used for both classification and regression simply by changing the link function. Standard linear regression uses an identity link (that is, y = w^T x directly), while binary classification uses alternative link functions, such as the logistic function.
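The relationship y = f(w^T x) can be sketched in a few lines of plain Python. This is only an illustration of the formula, not Spark code: the function names and the weight and feature values are made up for the example.

```python
import math

def linear_predictor(w, x):
    """Dot product w^T x of the weight vector and the feature vector."""
    return sum(wi * xi for wi, xi in zip(w, x))

def identity(z):
    """Identity link: used by standard linear regression."""
    return z

def logistic(z):
    """Logistic function: one common choice for binary classification."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x, f=identity):
    """Apply the function f to the linear predictor."""
    return f(linear_predictor(w, x))

# Illustrative values only (not taken from any dataset).
w = [0.5, -1.0, 2.0]
x = [2.0, 1.0, 0.5]

y_regression = predict(w, x)            # real-valued prediction
p_positive = predict(w, x, logistic)    # value between 0 and 1
```

The same weights and features give a real-valued output under the identity link and a probability-like output under the logistic function, which is the sense in which only f changes between regression and classification.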

Spark's ML library offers different regression models, which are as follows:

  • Linear regression
  • Generalized linear regression
  • Logistic regression
  • Decision trees
  • Random forest regression
  • Gradient boosted trees
  • Survival regression
  • Isotonic regression
  • Ridge regression

Regression models define the relationship between a dependent variable and one or more independent variables. Training finds the model that best fits the observed values of the independent variables (the features).

Unlike classification models such as support vector machines and logistic regression, linear regression predicts a continuous value for the dependent variable rather than a discrete class label.

Linear regression models are essentially the same as their classification counterparts; the only difference is that they use a different loss function, related link function, and decision function. Spark ML provides a standard least squares regression model (although other types of generalized linear models for regression are planned).
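To make "least squares" concrete, here is a minimal sketch for the single-feature case, using the closed-form solution that minimizes the squared error. It is not how Spark fits its models; the function names and the toy data are assumptions for illustration.

```python
def fit_least_squares(xs, ys):
    """Closed-form least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def mean_squared_error(xs, ys, slope, intercept):
    """The squared-error loss that the fit above minimizes."""
    return sum((slope * x + intercept - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data generated from y = 2x + 1 (illustrative only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

slope, intercept = fit_least_squares(xs, ys)
loss = mean_squared_error(xs, ys, slope, intercept)
```

Because the toy data lie exactly on a line, the fitted slope and intercept recover that line and the loss is zero; on real data the loss is the quantity to compare across models.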

Assignment -

1. Using Python 3, build the following regression models:

  • Decision Tree
  • Gradient Boosted Tree
  • Linear regression

2. Select a dataset (other than the example dataset given in section 3) and apply the Decision Tree and Linear regression models created above. Choose a dataset from Kaggle.

3. Build the following in relation to the gradient boosted tree and the dataset chosen in step 2:

  • Gradient boosted tree iterations
  • Gradient boosted tree Max Bins
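For step 3, the following rough sketch shows what the two settings control. It is a toy gradient boosting loop for squared error using depth-1 trees on a single feature, written in plain Python. It is not Spark's implementation (Spark's binning and tree building differ), and all names, the learning rate, and the data are assumptions for illustration.

```python
def candidate_splits(values, max_bins):
    """Return at most max_bins - 1 thresholds, taken from midpoints of sorted distinct values."""
    distinct = sorted(set(values))
    midpoints = [(a + b) / 2 for a, b in zip(distinct, distinct[1:])]
    limit = max(max_bins - 1, 1)
    if len(midpoints) <= limit:
        return midpoints
    stride = len(midpoints) / limit
    return [midpoints[int(i * stride)] for i in range(limit)]

def fit_stump(xs, targets, max_bins):
    """Fit a depth-1 regression tree: one threshold and one mean per side."""
    best = None
    for t in candidate_splits(xs, max_bins):
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, t, left_mean, right_mean)
    if best is None:  # no split possible: predict the mean everywhere
        mean = sum(targets) / len(targets)
        return lambda x: mean
    _, t, left_mean, right_mean = best
    return lambda x: left_mean if x <= t else right_mean

def fit_gbt(xs, ys, num_iterations, max_bins=32, learning_rate=0.5):
    """Each iteration fits one stump to the current residuals and adds it to the ensemble."""
    base = sum(ys) / len(ys)
    stumps = []

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    for _ in range(num_iterations):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        stumps.append(fit_stump(xs, residuals, max_bins))
    return predict

def mean_squared_error(model, xs, ys):
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

# Toy data (illustrative only): y = x^2 on eight points.
xs = [float(i) for i in range(1, 9)]
ys = [x * x for x in xs]

mse_1 = mean_squared_error(fit_gbt(xs, ys, 1), xs, ys)
mse_5 = mean_squared_error(fit_gbt(xs, ys, 5), xs, ys)
mse_20 = mean_squared_error(fit_gbt(xs, ys, 20), xs, ys)
mse_coarse = mean_squared_error(fit_gbt(xs, ys, 20, max_bins=2), xs, ys)
```

On this toy data the training error falls as iterations are added, and restricting the number of bins to 2 leaves only one candidate split, so the fit is much coarser. On a real dataset the comparison should be made on held-out data, since more iterations can overfit.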

4. Build the following in relation to the decision tree and the dataset chosen in step 2:

  • Decision Tree Categorical features
  • Decision Tree Log
  • Decision Tree Max Bins
  • Decision Tree Max Depth
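Assuming "Decision Tree Log" refers to training on a log-transformed target (the target variable transformation mentioned earlier), the transformation and its inverse can be sketched as follows. The helper names and the sample counts are hypothetical, the log-space mean stands in for a trained tree, and the plain log assumes strictly positive targets.

```python
import math

def to_log(targets):
    """Transform strictly positive targets to log space before training."""
    return [math.log(t) for t in targets]

def from_log(log_predictions):
    """Map predictions made in log space back to the original scale."""
    return [math.exp(p) for p in log_predictions]

def rmsle(actual, predicted):
    """Root mean squared log error, computed on the original scale."""
    squared = [(math.log(p + 1) - math.log(a + 1)) ** 2
               for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

# Hypothetical rental counts (illustrative only).
counts = [16.0, 40.0, 32.0, 13.0, 1.0]

log_targets = to_log(counts)

# Placeholder "model": predict the mean of the log targets for every row.
log_mean = sum(log_targets) / len(log_targets)
predictions = from_log([log_mean] * len(counts))

error = rmsle(counts, predictions)
```

The key point is that a model trained on log targets produces log-scale predictions, so they must be exponentiated before being compared with the original values.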

5. Build the following in relation to the linear regression and the dataset chosen in step 2:

a) Linear regression Cross Validation

  • Intercept
  • Iterations
  • Step size
  • L1 Regularization
  • L2 Regularization

b) Linear regression Log (see section 5.4)
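To show what the settings in step 5(a) control, here is a minimal batch gradient descent sketch for linear regression in plain Python. It is not Spark's optimizer; the function name, parameter names, and toy data are assumptions for illustration. L1 is handled with a simple subgradient, and the intercept is left unregularized.

```python
def sign(v):
    return (v > 0) - (v < 0)

def train_linear(rows, targets, iterations=100, step=0.1,
                 reg_param=0.0, reg_type="l2", intercept=True):
    """Batch gradient descent on half the mean squared error, plus an optional penalty."""
    n = len(rows)
    d = len(rows[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(iterations):
        # Prediction errors under the current weights.
        errors = [sum(wj * xj for wj, xj in zip(w, x)) + b - y
                  for x, y in zip(rows, targets)]
        grad_w = [sum(e * x[j] for e, x in zip(errors, rows)) / n
                  for j in range(d)]
        for j in range(d):
            # L2 shrinks weights in proportion to their size; L1 by a constant amount.
            penalty = w[j] if reg_type == "l2" else sign(w[j])
            w[j] -= step * (grad_w[j] + reg_param * penalty)
        if intercept:
            b -= step * sum(errors) / n
    return w, b

# Toy data generated from y = 2x + 1 (illustrative only).
rows = [[0.0], [1.0], [2.0], [3.0]]
targets = [1.0, 3.0, 5.0, 7.0]

w_plain, b_plain = train_linear(rows, targets, iterations=2000, step=0.1)
w_l2, b_l2 = train_linear(rows, targets, iterations=2000, step=0.1,
                          reg_param=1.0, reg_type="l2")
w_l1, b_l1 = train_linear(rows, targets, iterations=2000, step=0.1,
                          reg_param=1.0, reg_type="l1")
w_no_b, b_no_b = train_linear(rows, targets, iterations=2000, step=0.1,
                              intercept=False)
```

Too few iterations or too small a step leaves the weights short of the optimum, too large a step can diverge, and both penalties pull the weight below its unregularized value. Cross-validation would then compare such settings on held-out folds rather than on the training data.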

6. Follow the provided example of the Bike sharing data set and the guidelines in the sections that follow this section to develop the requirements given in steps 1, 3, 4 and 5.

Attachment:- Assignment Files.rar


The regression line is constructed by optimizing the parameters of the straight line function such that the line best fits a sample of (x, y) observations where y is a variable dependent on the value of x. Regression analysis is used extensively in economics, risk management, and trading. One cool application of regression analysis is in calibrating certain optimistics results


Reviews

urv2119741

11/3/2018 3:27:59 AM

Please add the following points to the file I mentioned below: 1. Introduction 2. Objectives 3. Data source 4. Structure of the database 5. Explanation of machine learning 6. Data preparation 7. Testing 8. Result and recommendation 9. Conclusion 10. References. The assignment was done according to the required instructions, and in fact it was done methodically and I am very satisfied with it. In fact, it was ready before the deadline. I would recommend this site to everyone else.

urv2119741

11/3/2018 3:24:31 AM

Big Data Assignment Marking Criteria

The Big Data Assignment comprises two parts:

  • The first part is to create the algorithms in the tasks, namely Decision Tree, Gradient Boosted Tree and Linear regression, and then to apply them to the bike sharing dataset provided. Try to produce the output given in the task sections (also given in the Big-Data Assignment.docx provided on Blackboard).
  • The second part is to use the algorithms created in the first part and apply them to another dataset chosen from Kaggle (other than the bike sharing dataset provided).


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd