CSIS 3290 Fundamentals of Machine Learning - Douglas College
Project: Classification
Instructions
• Create a folder and rename it to include your name and course number (e.g., pAdams-CSIS3290-Midterm Exam).
• All the files you are required to submit for the assignment should be placed inside this folder.
• The assignment is to be completed individually. If cheating is determined (i.e., you shared your work with another student in the class), your work will be disqualified and you will face further consequences.
Learning Objective 1. Model Data using machine learning algorithms and select a suitable model
Learning Objective 2. Apply the selected model to make predictions
Background
In this project, you are going to predict customer churn for a telecom company ("Telco"). Customer churn occurs when a customer decides to switch to a different service or product provider. This is a common problem encountered in the real world.
You are provided with a file called "Telco_Customer_Churn.csv" that contains customer characteristics that could be used to predict churn (note that the outcome variable is Churn). The variables are self-explanatory, and most of them capture whether or not a customer uses certain services provided by the Telco. The file has the following variables/columns: customerID, gender (female or male), SeniorCitizen (No=0, Yes=1), Dependents (whether the customer has dependents or not), tenure (no. of months with Telco), PhoneService (Yes/No), InternetService (Yes/No), DeviceProtection (Yes/No), TechSupport (Yes/No), CableService (Yes/No), Contract (month-to-month, 1-year, 2-year), PaperlessBilling (Yes/No), PaymentMethod (Bank transfer (automatic), Credit card (automatic), Electronic check, Mailed check), MonthlyCharges ($), TotalCharges ($), Churn (Yes/No).
What you should do
1. Import the data as a dataframe for analysis
2. Perform the necessary preprocessing of the data to include:
• Dropping customerID from the analysis
• Removing rows with null values (hint: the TotalCharges column has some missing values. Although the values are numbers, you will need to convert this column to a numeric format to get the desired result. Other approaches are also possible)
• Converting categorical variables to dummies
3. Implement logistic regression analysis using the full dataset (note: you also need to get the odds ratios from your analysis)
4. Split the dataset into training set and test set
5. Using the Pipeline class of scikit-learn, implement analysis using the following techniques: Logistic Regression, Nearest Neighbors, Linear SVM, RBF SVM, MLPClassifier, Decision Tree, Naive Bayes (i.e., GaussianNB), Random Forest, Bagging, AdaBoost, and XGBoost.
6. Use the prediction accuracy score to select the best model
Fit the selected model and use the test data to assess its predictive performance by generating:
• Confusion matrix
• Classification report
• ROC curve
7. Predict whether a customer with the characteristics given in the table on page 3 will churn or not (i.e., predict class membership). Note: enter zero for any feature that is not applicable.
8. Predict the probability that a customer with the characteristics given in the table (see step 7) will churn. Note: enter zero for any feature that is not applicable.
9. Create a new Word document. Copy and paste the results from step 3 into your report. Discuss your results and their managerial implications.
10. From step 6, state the model you selected for the final analysis.
11. Copy and paste the confusion matrix, classification report, and ROC curve into your Word document. Use these results to discuss the predictive performance of the model.
12. Copy and paste the results from steps 7 and 8 into your Word document. Comment on the expected outcome for this customer and suggest some managerial strategies for the Telco.
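The workflow in steps 1 to 8 can be sketched roughly as follows. This is a minimal illustration under several assumptions, not a model answer: the data is a small synthetic stand-in for "Telco_Customer_Churn.csv" with only a few of the listed columns, only three of the eleven classifiers are shown, the new customer's values are hypothetical (the table from the brief is not reproduced here), and the odds ratios come from scikit-learn's LogisticRegression, which applies L2 regularization by default.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import (accuracy_score, classification_report,
                             confusion_matrix, roc_auc_score, roc_curve)
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Step 1: synthetic stand-in data (assumption). With the real file one would
# use pd.read_csv("Telco_Customer_Churn.csv") instead.
rng = np.random.default_rng(0)
n = 400
df = pd.DataFrame({
    "customerID": [f"C{i:04d}" for i in range(n)],
    "tenure": rng.integers(1, 73, n),
    "Contract": rng.choice(["Month-to-month", "One year", "Two year"], n),
    "PaperlessBilling": rng.choice(["Yes", "No"], n),
    "MonthlyCharges": rng.uniform(20, 120, n).round(2),
})
# TotalCharges is stored as text, with a few blank entries, as the hint describes.
df["TotalCharges"] = (df["tenure"] * df["MonthlyCharges"]).round(2).astype(str)
df.loc[:4, "TotalCharges"] = " "
logit = 1.0 - 0.06 * df["tenure"] + 1.2 * (df["Contract"] == "Month-to-month")
df["Churn"] = np.where(rng.random(n) < 1 / (1 + np.exp(-logit)), "Yes", "No")

# Step 2: drop the ID, coerce TotalCharges to numeric (blanks become NaN),
# drop rows with nulls, and convert categorical columns to dummies.
data = df.drop(columns=["customerID"]).copy()
data["TotalCharges"] = pd.to_numeric(data["TotalCharges"], errors="coerce")
data = data.dropna()
y = (data.pop("Churn") == "Yes").astype(int)
X = pd.get_dummies(data, drop_first=True, dtype=float)

# Step 3: logistic regression on the full dataset; odds ratio = exp(coefficient).
full_model = LogisticRegression(max_iter=5000).fit(X, y)
odds_ratios = pd.Series(np.exp(full_model.coef_[0]), index=X.columns)

# Step 4: train/test split (the 70/30 ratio is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

# Step 5: one Pipeline per classifier (only a subset of the required list).
models = {
    "Logistic Regression": LogisticRegression(max_iter=1000),
    "Nearest Neighbors": KNeighborsClassifier(),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=0),
}
pipelines, scores = {}, {}
for name, clf in models.items():
    pipe = Pipeline([("scale", StandardScaler()), ("clf", clf)])
    pipelines[name] = pipe.fit(X_train, y_train)
    scores[name] = accuracy_score(y_test, pipe.predict(X_test))

# Step 6: select the model with the highest accuracy and evaluate it.
best_name = max(scores, key=scores.get)
best = pipelines[best_name]
y_pred = best.predict(X_test)
y_score = best.predict_proba(X_test)[:, 1]
cm = confusion_matrix(y_test, y_pred)
report = classification_report(y_test, y_pred)
fpr, tpr, _ = roc_curve(y_test, y_score)  # points for plotting the ROC curve
auc = roc_auc_score(y_test, y_score)

# Steps 7 and 8: hypothetical customer; features not supplied are set to zero.
new_customer = pd.DataFrame([{"tenure": 5, "MonthlyCharges": 80.0,
                              "TotalCharges": 400.0, "PaperlessBilling_Yes": 1}])
new_customer = new_customer.reindex(columns=X.columns, fill_value=0)
predicted_class = best.predict(new_customer)[0]         # 1 = churn, 0 = no churn
churn_probability = best.predict_proba(new_customer)[0, 1]
```

Printing `odds_ratios`, `scores`, `cm`, `report`, `predicted_class`, and `churn_probability` gives the figures to paste into the report. The remaining classifiers can be added to the `models` dictionary in the same way; XGBoost is not part of scikit-learn and would need the separate `xgboost` package.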