
Data Analytics and Optimisation Assignment

Instructions:

1. Data required for this assessment is available on blackboard alongside this document in ENN543 Assessment 1 Data.zip. Please refer to individual questions regarding which data to use for which question.

2. MATLAB code or scripts (or equivalent materials for other languages) should be submitted as supplementary material (i.e. additional files) or appendices. Note that this material will not be directly marked (i.e. marks will not be assigned for code quality). Figures and outputs/results that are critical to question answers should be included in the main question response, and not appear only in the MATLAB (or similar) output.

Problem 1 - Linear Regression

Prediction of the residuary resistance of sailing yachts at the initial design stage is of great value for evaluating the ship's performance and for estimating the required propulsive power. Essential inputs include the basic hull dimensions and the boat velocity. The Delft data set comprises 308 full-scale experiments, which were performed at the Delft Ship Hydromechanics Laboratory for that purpose. The results of these experiments are in the file yacht.dat. These experiments include 22 different hull forms, derived from a parent form closely related to the "Standfast" designed by Frans Maas. The columns correspond to the following variables (in order):

  • Residuary resistance per unit weight of displacement, adimensional;
  • Longitudinal position of the center of buoyancy, adimensional;
  • Prismatic coefficient, adimensional;
  • Length-displacement ratio, adimensional;
  • Beam-draught ratio, adimensional;
  • Length-beam ratio, adimensional;
  • Froude number, adimensional.

Using this data:

1. Using fitlm in MATLAB, fit a model to predict the resistance per unit weight of displacement as a function of the other variables. Discuss if this is a valid model.

2. Given the above model as a starting point, investigate how it can be improved. In this you should consider:

(a) The use of training and validation datasets. The data should be divided such that the split between these two sets is approximately 80% for training and 20% for validation.

(b) Are all variables important for the model?
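The 80/20 split and the baseline fit can be sketched as follows. This is a minimal illustration in plain Python rather than MATLAB's fitlm, and it runs on synthetic data: the 308 rows, the two predictors and the coefficients are made up for the example and are not taken from yacht.dat. The helper names (train_val_split, fit_ols, rmse) are hypothetical. The response is placed in the first column to match the column order listed above.

```python
import random

def train_val_split(rows, train_frac=0.8, seed=0):
    """Shuffle the row indices, then split into training and validation rows."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(train_frac * len(rows))
    return [rows[i] for i in idx[:cut]], [rows[i] for i in idx[cut:]]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def fit_ols(X, y):
    """Least-squares coefficients [intercept, b1, ...] via the normal equations."""
    Z = [[1.0] + list(row) for row in X]
    p = len(Z[0])
    A = [[sum(z[i] * z[j] for z in Z) for j in range(p)] for i in range(p)]
    b = [sum(z[i] * t for z, t in zip(Z, y)) for i in range(p)]
    return solve(A, b)

def predict(beta, X):
    return [beta[0] + sum(b * v for b, v in zip(beta[1:], row)) for row in X]

def rmse(y, yhat):
    return (sum((a - b) ** 2 for a, b in zip(y, yhat)) / len(y)) ** 0.5

# Synthetic stand-in for the real data: (response, x1, x2) per row.
rng = random.Random(1)
rows = []
for _ in range(308):
    x1, x2 = rng.uniform(0, 1), rng.uniform(0, 1)
    rows.append((2.0 + 3.0 * x1 - 1.0 * x2 + rng.gauss(0, 0.05), x1, x2))

train, val = train_val_split(rows)
beta = fit_ols([r[1:] for r in train], [r[0] for r in train])
val_rmse = rmse([r[0] for r in val], predict(beta, [r[1:] for r in val]))
print(len(train), len(val), [round(b, 2) for b in beta], round(val_rmse, 3))
```

Comparing the validation error of the full model against models with individual predictors removed is one simple way to approach part (b); the p-values reported by fitlm are another.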

Problem 2 - Regularised Regression

Web pages collect large volumes of data on page views, page links, etc., to monitor readership. For commercial ventures, this can help inform publishing and layout decisions, as well as advertising. The BlogFeedback dataset contains data on blog readership, and can be used to predict page views in the next 24 hours based on past readership data.

You have been supplied with two variants of this data:

1. Files named blogData noBow train.csv and blogData noBow test.csv contain features that capture the average readership information for the blog, and information for the specific post (see blogData Variables.txt for further information);

2. Files named blogData train.csv and blogData test.csv contain all the features of the noBow files alongside 200 bag-of-words features that capture the blog post content.

Note that the testing data contains examples from later times than the training data, simulating a real-world case where the model is trained on historic data to predict the future.

Using this data:

1. Fit a model using Linear regression, Ridge and LASSO regression on the noBow data. With these models consider the following:

(a) Determine the best value of λ to use in the Ridge model to obtain the best predictive model.

(b) Determine the best value of λ to use in the LASSO model to obtain the best predictive model.

2. Fit a model using Linear regression, Ridge and LASSO regression on the data containing the Bag-of-Words features. With these models consider the following:

(a) Determine the best value of λ to use in the Ridge model to obtain the best predictive model.

(b) Determine the best value of λ to use in the LASSO model to obtain the best predictive model.

3. Compare the performance of the two sets of Linear, Ridge and LASSO models (with and without the Bag-of-Words features). You should consider factors such as the errors of the models, the R² and adjusted R², and the model validity in your discussion. Which, if any, models are suitable for use? Justify your response.
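One possible way to choose λ for the Ridge model is to fit over a grid of candidate values and keep the one with the lowest error on held-out validation data. The sketch below illustrates this in plain Python on synthetic data. The grid, the data and the helper names (ridge_fit, r2_scores) are assumptions made for the example, the predictors are only centred rather than standardised, and LASSO is not shown because it needs an iterative solver.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in reversed(range(n)):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x

def ridge_fit(X, y, lam):
    """Closed-form ridge on centred data, so the intercept is not penalised."""
    n, p = len(X), len(X[0])
    mx = [sum(r[j] for r in X) / n for j in range(p)]
    my = sum(y) / n
    Xc = [[r[j] - mx[j] for j in range(p)] for r in X]
    yc = [t - my for t in y]
    A = [[sum(r[i] * r[j] for r in Xc) + (lam if i == j else 0.0)
          for j in range(p)] for i in range(p)]
    b = [sum(r[i] * t for r, t in zip(Xc, yc)) for i in range(p)]
    w = solve(A, b)
    return my - sum(wj * mj for wj, mj in zip(w, mx)), w

def predict(model, X):
    b0, w = model
    return [b0 + sum(wj * v for wj, v in zip(w, r)) for r in X]

def r2_scores(y, yhat, p):
    """Return (R^2, adjusted R^2) for a model with p predictors."""
    n = len(y)
    ybar = sum(y) / n
    ss_res = sum((a - b) ** 2 for a, b in zip(y, yhat))
    ss_tot = sum((a - ybar) ** 2 for a in y)
    r2 = 1.0 - ss_res / ss_tot
    return r2, 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# Synthetic stand-in: 5 predictors, only the first two affect the response.
rng = random.Random(0)
def make(n):
    X, y = [], []
    for _ in range(n):
        row = [rng.gauss(0, 1) for _ in range(5)]
        X.append(row)
        y.append(1.0 + 2.0 * row[0] - 1.5 * row[1] + rng.gauss(0, 0.5))
    return X, y

X_tr, y_tr = make(200)
X_va, y_va = make(50)

lambdas = [0.0, 0.01, 0.1, 1.0, 10.0, 100.0, 1000.0]  # assumed grid
val_mse = {}
for lam in lambdas:
    pred = predict(ridge_fit(X_tr, y_tr, lam), X_va)
    val_mse[lam] = sum((a - b) ** 2 for a, b in zip(y_va, pred)) / len(y_va)

best_lam = min(val_mse, key=val_mse.get)
best_model = ridge_fit(X_tr, y_tr, best_lam)
r2, adj_r2 = r2_scores(y_va, predict(best_model, X_va), p=5)
print(best_lam, round(r2, 3), round(adj_r2, 3))
```

Because the supplied test files contain later data than the training files, the validation rows used to select λ would ideally be carved out of the training data, leaving the test data for the final comparison only.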

Problem 3 - Clustering I

Understanding power use in the home is increasingly important as society strives to improve energy efficiency. The Household Power Consumption dataset captures energy use in a single home over a period of several years, and can be used to analyse usage patterns and detect periods of abnormal power use. You have been provided data covering a single year (2007) in household power consumption 2007.csv. The columns in this data correspond to the following variables (in order):

  • date: Date in dd/mm/yyyy format.
  • time: Time in hh:mm:ss format.
  • global active power: Household global minute-averaged active power (in kilowatts).
  • global reactive power: Household global minute-averaged reactive power (in kilowatts).
  • voltage: Minute-averaged voltage (in volts).
  • global intensity: Household global minute-averaged current intensity (in amperes).
  • sub metering 1: Energy sub-metering No. 1 (in watt-hour of active energy). It corresponds to the kitchen, containing mainly a dishwasher, an oven and a microwave.
  • sub metering 2: Energy sub-metering No. 2 (in watt-hour of active energy). It corresponds to the laundry, containing a washing-machine, a tumble-drier, a refrigerator and a light.
  • sub metering 3: Energy sub-metering No. 3 (in watt-hour of active energy). It corresponds to an electric water-heater and an air-conditioner.

Using this data, you are to investigate if usage patterns can be identified in the data, and if abnormal behaviours can be detected. In particular you are to:

1. Cluster the data considering the three sub-meter readings only (sub metering 1, sub metering 2, sub metering 3), using the clustering method (and number of clusters) of your choice.

Justify your selection for the clustering method and parameters (i.e. number of clusters) based on the requirements of this problem, the nature of the data, and the capabilities of the clustering method.

2. With the clustered data investigate:

(a) Are trends visible in the clustered data? For example, can changes in use be seen at different times of the year (i.e. summer vs winter), or from a weekday to a weekend?

(b) Can any abnormal usage be detected? If abnormalities can be found, show a visual comparison between the abnormal time period and a nearby (i.e. the previous or next day) normal time period. The method to select abnormal samples should classify approximately 1% of the data as abnormal. For the purposes of this problem, a period of abnormal usage is a period of 2 hours (or more) where 50% or more of the samples are abnormal.
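The abnormality rule in (b) can be read as a sliding-window test. The sketch below assumes one sample per minute (so 2 hours is 120 samples) and assumes each sample already has an abnormality score, for example its distance to the nearest cluster centre. That scoring choice, and the function names, are illustrative assumptions rather than part of the brief.

```python
def flag_top_fraction(scores, frac=0.01):
    """Flag roughly the top `frac` of samples by score (ties may flag a few more)."""
    k = max(1, round(frac * len(scores)))
    cutoff = sorted(scores, reverse=True)[k - 1]
    return [s >= cutoff for s in scores]

def abnormal_periods(flags, window=120, min_frac=0.5):
    """Return merged (start, end) index ranges, end exclusive, covered by at
    least one window of `window` samples in which >= min_frac are flagged."""
    n = len(flags)
    if n < window:
        return []
    # Prefix sums give the flagged count of any window in constant time.
    prefix = [0]
    for f in flags:
        prefix.append(prefix[-1] + int(f))
    periods = []
    for start in range(n - window + 1):
        end = start + window
        if prefix[end] - prefix[start] >= min_frac * window:
            if periods and start <= periods[-1][1]:
                periods[-1] = (periods[-1][0], max(periods[-1][1], end))
            else:
                periods.append((start, end))
    return periods

# Made-up flags: an 80-minute abnormal run and a short 30-minute burst.
flags = [False] * 1000
for i in range(300, 380):
    flags[i] = True
for i in range(700, 730):
    flags[i] = True

periods = abnormal_periods(flags)
print(periods)

top = flag_top_fraction(list(range(1000)))
print(sum(top))
```

Overlapping qualifying windows are merged, so a long abnormal episode is reported as a single period rather than as many shifted copies of the same window.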

In completing this question you may also like to consider:

1. Is it reasonable (or practical) to learn the clusters on all the data?

2. Can the data be aggregated in any way to reduce the volume of data? Does such aggregation alter the findings?
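As one example of aggregation, the per-minute sub-meter readings could be summed into hourly totals before clustering, which reduces the data volume by a factor of about 60. The sketch below is a minimal illustration with made-up rows. The function name hourly_totals is hypothetical, the date and time formats follow the column descriptions above, and missing readings are not handled.

```python
from collections import defaultdict
from datetime import datetime

def hourly_totals(rows):
    """Sum the three sub-meter readings within each clock hour.

    rows: iterable of (date 'dd/mm/yyyy', time 'hh:mm:ss', sm1, sm2, sm3).
    Returns a dict mapping the start of each hour to [sm1, sm2, sm3] totals.
    """
    totals = defaultdict(lambda: [0.0, 0.0, 0.0])
    for date, time, *meters in rows:
        stamp = datetime.strptime(f"{date} {time}", "%d/%m/%Y %H:%M:%S")
        hour = stamp.replace(minute=0, second=0)
        for i, value in enumerate(meters):
            totals[hour][i] += float(value)
    return dict(sorted(totals.items()))

# Made-up example rows, not taken from the supplied file.
rows = [
    ("16/01/2007", "17:58:00", 0.0, 1.0, 17.0),
    ("16/01/2007", "17:59:00", 0.0, 2.0, 17.0),
    ("16/01/2007", "18:00:00", 1.0, 0.0, 18.0),
]
hourly = hourly_totals(rows)
for hour, meters in hourly.items():
    print(hour, meters)
```

Aggregating to hours smooths out short spikes, so any abnormality detected on hourly totals is worth checking against the minute-level data.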

Problem 4 - Clustering II

Sensors such as accelerometers and gyroscopes are becoming increasingly common in wearable and mobile devices. From these signals, it is possible to detect different activities, and potentially even different people. You have been supplied with three files that capture wearables signal data as follows:

  • wearables signal.csv contains 3,237 samples of 561-dimensional wireless sensor data;
  • wearables activity.csv contains the ground truth activity being performed for each of the samples in wearables signal.csv. There are 6 activities (Walking, Walking Upstairs, Walking Downstairs, Sitting, Standing, Laying) in total.
  • wearables subject.csv contains the ground truth subject ID for each of the samples in wearables signal.csv. There are 10 subjects in total.

Using this data, you are to investigate if the classes of activity and the users can be separated via clustering. In particular you are to:

1. Cluster the data using HAC with the aim of:

(a) Separating the data into the 6 activity classes. Using the provided ground truth, evaluate the accuracy of the clustering result.

(b) Separating the data into the 10 identity classes. Using the provided ground truth, evaluate the accuracy of the clustering result.

(c) Separating the data into 60 clusters such that each cluster corresponds to a particular individual performing a particular activity. Using the provided ground truth, evaluate the accuracy of the clustering result.

2. Repeat the three clustering tasks using DBSCAN, and compare the performance of the clustering results obtained using DBSCAN and HAC. Comment on any differences observed between the two methods, and which method is more suitable in this situation.
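Cluster IDs are arbitrary, so they have to be matched to the ground-truth classes before an accuracy can be reported. One common option, shown here as a sketch and not as the required method, is to assign each cluster its majority class and count the samples that agree (often called purity). The function name and the optional treatment of DBSCAN noise points as errors are assumptions made for this example.

```python
from collections import Counter, defaultdict

def majority_vote_accuracy(cluster_ids, truth, noise_label=None):
    """Fraction of samples whose class equals the majority class of their cluster.

    Samples whose cluster ID equals `noise_label` (for example -1 for
    DBSCAN noise) are counted as errors.
    """
    members = defaultdict(Counter)
    for c, t in zip(cluster_ids, truth):
        if noise_label is not None and c == noise_label:
            continue
        members[c][t] += 1
    correct = sum(cnt.most_common(1)[0][1] for cnt in members.values())
    return correct / len(truth)

# Made-up labels for illustration only.
clusters = [0, 0, 0, 1, 1, 2, 2, 2]
activity = ["walk", "walk", "sit", "sit", "sit", "stand", "stand", "walk"]
acc = majority_vote_accuracy(clusters, activity)
print(acc)

# For the 60-cluster task, the target can be the (subject, activity) pair.
subject = [1, 1, 1, 2, 2, 2, 2, 2]
pair_acc = majority_vote_accuracy(clusters, list(zip(subject, activity)))
print(pair_acc)

# Noise points are penalised when a noise label is given.
noisy_acc = majority_vote_accuracy(
    [0, 0, -1, 1, 1, -1], ["a", "a", "a", "b", "b", "b"], noise_label=-1)
print(noisy_acc)
```

Purity tends to rise as the number of clusters grows, so it is most informative when the cluster count matches the number of classes, as it does in tasks (a) to (c).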


This assignment sets out the four (4) questions you are to complete. The assignment is worth 25% of the overall subject grade. Weights for individual questions are indicated throughout the document. Students should submit their answers in a single separate document (either a PDF or Word document), and upload this to TurnItIn. Answers should be submitted via the TurnItIn submission system, linked to on Blackboard. In the event that TurnItIn is down, or you are unable to submit via TurnItIn.
