Reference no: EM133144850
MLN 601 Machine Learning - Torrens University Australia
Machine Learning Project
Learning Outcome 1: Evaluate and compare the key concepts in machine learning.
Learning Outcome 2: Apply learning algorithms to perform machine learning tasks.
Learning Outcome 3: Implement practical machine learning: data pre-processing, analysis, model selection, and interpret the results.
Learning Outcome 4: Communicate clearly and effectively using the technical language of machine learning to a range of stakeholders.
Task Summary
Bike sharing systems are poised to become a major mode of transportation. The prediction problem for determining how many bikes will be present at a station is important to both travellers and the entrepreneurs running the systems. This Assessment requires you to develop at least two models using different modelling algorithms to predict the bike demand from a real-world data set.
Please refer to the Task Instructions (below) for further details on how to complete this task.
Context
This assessment requires you to make predictions based on at least two different techniques in machine learning (ML) using real-world data and compare the performance of your predictions. As a professional, you will often be expected to perform similar tasks using suitable data sets and you must be open to trying different techniques rather than having a bias towards just one approach.
The data sets contain daily counts of bike rentals from the bike sharing company Capital-Bikeshare in Washington D.C., along with weather and seasonal information. The goal of your model is to predict how many rental bikes will be available on the street for a given day and weather forecast.
The selection of models has been left to you, but could include a linear model, support vector machine, nearest neighbour, random forest, gradient boosting or any other learning models available in scikit-learn. For the error measurement, use the mean absolute error.
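The mean absolute error (MAE) averages the absolute differences between observed and predicted counts, so it is expressed in the same unit as the target (bikes per day). The following is a minimal sketch that checks a hand-written calculation against scikit-learn's `mean_absolute_error`; the numbers are made up for illustration and do not come from the data set.

```python
import numpy as np
from sklearn.metrics import mean_absolute_error

# Made-up daily rental counts, for illustration only (not from the data set).
y_true = np.array([1200, 3400, 4100, 2800])
y_pred = np.array([1000, 3600, 4000, 3100])

# MAE = mean(|y_true - y_pred|)
mae_manual = np.mean(np.abs(y_true - y_pred))
mae_sklearn = mean_absolute_error(y_true, y_pred)

print(mae_manual, mae_sklearn)
```

A lower MAE means the predictions are, on average, closer to the observed daily counts.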
You must consider the Cross-Industry Standard Process for Data Mining (CRISP-DM; Chapman et al., 2000) from the outset. You will commence with the business understanding step to ensure the correct framing of the problem (e.g., to predict the demand for bicycles on a given day and weather forecast). In this stage, you should also identify any constraints and assumptions. The point of commencing the analysis and modelling with a business understanding is to provide you with an opportunity to consider some of the variables influencing demand rather than jumping straight into the exploratory analysis.
Task Instructions
Multiple activities are required to complete this assessment task. Follow the steps of the CRISP-DM model using the template (CRISP-DM Template.ipynb) to document and develop your ML model.
Stage 1: Business Understanding
1. This section serves as an introduction. You should write a clear and concise narrative expressing what you are trying to achieve. Think in terms of ML (e.g., the prediction algorithm, the data set selected, what you are seeking from the data set and how you intend to understand the value of your prediction capability).
2. Assess the current situation.
Stage 2: Data Understanding
1. Acquire the relevant bike data set from the UCI repository for your prediction model (https://archive.ics.uci.edu/ml/machine-learning-databases/00275/). Explicitly specify the data source by providing a specific link and the name of the data set (e.g., Bike Sharing Dataset) and the method of acquisition (e.g., direct from the URL or a download of the .csv file). The steps taken need to be clearly stated.
2. Read this data set into your Notebook.
3. Describe the data set inclusive of variables, units and levels.
4. Verify the data quality by analysing the data set for structure and missing data.
5. Conduct an initial data exploration using data visualisation, reporting and querying the data.
6. Use the pairplot function in seaborn to determine the relationship, if any, between the variables. Include the output or the visualisation of the pairplot function in your Notebook and comment on it.
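The data understanding steps above could be sketched as follows. This is only an illustration under stated assumptions: the file name `day.csv` and the column names (`dteday`, `season`, `weathersit`, `temp`, `hum`, `windspeed`, `cnt`) are assumed and should be checked against the downloaded file, the inline rows are illustrative values with one deliberately blank humidity entry, and `plot_pairs` is a hypothetical helper name.

```python
import io
import pandas as pd

# Tiny illustrative sample shaped like the daily file (assumed name: day.csv).
# In the Notebook, replace the StringIO object with the path to the real file.
sample_csv = io.StringIO(
    "dteday,season,weathersit,temp,hum,windspeed,cnt\n"
    "2011-01-01,1,2,0.34,0.81,0.16,985\n"
    "2011-01-02,1,2,0.36,0.70,0.25,801\n"
    "2011-01-03,1,1,0.20,0.44,0.25,1349\n"
    "2011-01-04,1,1,0.20,,0.16,1562\n"
)
df = pd.read_csv(sample_csv, parse_dates=["dteday"])

# Structure: column types and size.
print(df.dtypes)
print(df.shape)

# Missing data: count of NaN values per column.
missing_per_column = df.isna().sum()
print(missing_per_column)

# Summary statistics for the numeric columns.
print(df.describe())


def plot_pairs(frame, columns):
    """Draw a seaborn pairplot for the chosen numeric columns."""
    import seaborn as sns  # imported here so the rest runs without seaborn
    return sns.pairplot(frame[columns])
```

In a Notebook, a call such as `plot_pairs(df, ["temp", "hum", "windspeed", "cnt"])` would render the grid of scatter plots to comment on.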
Stage 3: Data Preparation
1. Select the data that you will use for the analysis.
2. Clean the data you have selected to improve the quality of the data.
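One possible way to select and clean the data is sketched below. The column names (`instant`, `casual`, `registered`, `cnt` and so on) are assumptions about the daily file, the rows are made up, and filling gaps with the median is only one of several reasonable cleaning choices.

```python
import pandas as pd

# Made-up rows using column names assumed from the daily file.
df = pd.DataFrame({
    "instant": [1, 2, 3, 4],
    "season": [1, 1, 2, 2],
    "weathersit": [2, 1, 1, 3],
    "temp": [0.34, 0.36, None, 0.52],
    "hum": [0.81, 0.70, 0.44, 0.59],
    "casual": [331, 131, 120, 108],
    "registered": [654, 670, 1229, 1454],
    "cnt": [985, 801, 1349, 1562],
})

# Select: drop the row index and the two columns that sum to the target,
# since they would not be known when forecasting a future day.
selected = df.drop(columns=["instant", "casual", "registered"])

# Clean: remove exact duplicate rows, then fill numeric gaps with the median.
cleaned = selected.drop_duplicates()
cleaned = cleaned.fillna(cleaned.median(numeric_only=True))

# Separate the features from the target (total daily rentals).
X = cleaned.drop(columns=["cnt"])
y = cleaned["cnt"]
print(X)
print(y)
```

Dropping `casual` and `registered` avoids leaking the answer into the features, because in this sketch the target is their sum.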
Stage 4: Modelling
1. For this Assessment, you are required to consider at least two models.
2. Import the models into your code.
3. Record any modelling assumptions.
4. Create a training and testing data set (e.g., 25% testing and 75% training).
5. Build different prediction models to run over the data set and predict the demand for a given day and weather forecast.
6. Record the model parameter settings, including your rationale for the choice of values and the actual model generated.
7. Assess the models according to the performance measurement.
8. Revise any parameter settings for subsequent model runs. Document all revisions until the best model is reached.
9. Comment on the model's most predictive features.
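The modelling steps above might look like the following sketch. It uses synthetic data as a stand-in for the prepared features and target, and the choice of `LinearRegression` and `RandomForestRegressor`, the feature names and the parameter values are illustrative assumptions rather than required settings.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the prepared features and target (not real data).
rng = np.random.default_rng(0)
n_days = 200
feature_names = ["temp", "hum", "windspeed"]
X = rng.uniform(0.0, 1.0, size=(n_days, len(feature_names)))
y = 1000 + 4000 * X[:, 0] - 1500 * X[:, 1] + rng.normal(0, 200, n_days)

# 75% training, 25% testing; a fixed seed keeps the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42
)

# Two different algorithms, with their parameter settings recorded in one place.
models = {
    "linear_regression": LinearRegression(),
    "random_forest": RandomForestRegressor(n_estimators=100, random_state=42),
}

# Fit each model on the training set and score it on the held-out test set.
mae_by_model = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    mae_by_model[name] = mean_absolute_error(y_test, model.predict(X_test))
    print(f"{name}: MAE = {mae_by_model[name]:.1f}")

# Impurity-based importances from the forest indicate which features it relied on.
importances = dict(
    zip(feature_names, models["random_forest"].feature_importances_)
)
print(importances)
```

Parameter revisions could be explored with scikit-learn's `GridSearchCV` using `scoring="neg_mean_absolute_error"`, recording each setting and its resulting MAE in the Notebook.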
Stage 5: Evaluation
1. Assess the ML results. Ensure you include a statement as to whether the selected model meets the evaluation criteria.
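One way to support such a statement is to compare the model's MAE with that of a naive baseline. The sketch below assumes a hypothetical criterion (the model must beat a baseline that always predicts the mean of the training target) and uses made-up numbers; the actual evaluation criteria should come from your business understanding stage.

```python
import numpy as np

# Made-up training counts, test counts and model predictions (illustration only).
y_train = np.array([900, 1500, 2100, 3000, 2500])
y_test = np.array([1000, 2000, 3000])
model_pred = np.array([1200, 1900, 2700])

# Naive baseline: always predict the mean of the training target.
baseline_pred = np.full(y_test.shape, y_train.mean())

baseline_mae = np.mean(np.abs(y_test - baseline_pred))
model_mae = np.mean(np.abs(y_test - model_pred))

# Hypothetical criterion: the model must beat the naive baseline.
meets_criterion = model_mae < baseline_mae
print(
    f"baseline MAE = {baseline_mae:.1f}, model MAE = {model_mae:.1f}, "
    f"meets criterion: {meets_criterion}"
)
```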
Stage 6: Deployment
1. For this Assessment, you are not required to deploy your model. For this stage, simply include any lessons that you learned and that you wish to share regarding the things that went right and wrong, the areas in which you did well and in which you could improve. You can also detail any of your other experiences in completing this Assessment.
Attachment:- Machine Learning.rar