CSE3CI - Computational Intelligence for Data Analytics

CSE3CI - Computational Intelligence for Data Analytics Assignment - La Trobe University, Australia

Problem Description - Forecasting Electricity Prices

The problem is to forecast the electricity price from historical data. Let the temperature and the total electricity demand at time instant t be T(t) and D(t) respectively. The goal is to predict the Recommended Retail Price (RRP) using historical data as system inputs. The historical data set consists of the following variables: T(t-2), T(t-1), T(t), D(t-2), D(t-1), D(t). The output should be a prediction of the RRP of electricity at the next time instant t+1, denoted by P(t+1).

You have been provided with real-world electricity pricing data from Queensland, Australia. There are two datasets: a training set, to be used for model development; and a test set, to be used to evaluate the performance of your models. Each dataset has the same structure. Rows correspond to successive time instants, and contain seven values: the predictor variables T(t-2), T(t-1), T(t), D(t-2), D(t-1), D(t), and the target variable P(t+1). The objective is to predict the value of P(t+1) on the basis of one or more of the six predictor variables.

There are five parts to the assignment, described below, with the approximate assessment weighting. Parts 1, 2 and 3 are based on content that has been covered up to the end of Week 5. Content for Part 4 will be covered in Weeks 6 and 7.

Part 1 - Data Preparation

The performance of many systems can be improved through careful preparation of the data. Visualising the electricity prices will reveal that there are potential outliers in the dataset; i.e., observations that lie an abnormal distance from other values in a random sample from a population.

Tasks -

Use an appropriate technique to identify and remove outliers of the output variable from the datasets (for both training and test sets).

Provide a plot showing the price data before and after the removal of outliers.
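One possible way to approach the outlier task is the interquartile range (IQR) rule, sketched below. This is only an illustration under stated assumptions: the brief does not prescribe a technique, the function names `iqr_mask` and `remove_price_outliers` are hypothetical, the fence multiplier `k=1.5` is a common convention rather than a requirement, and the small `demo` array is made-up data with the price in the last column, as in the datasets described above.

```python
import numpy as np


def iqr_mask(prices, k=1.5):
    """Return a boolean mask that is True for prices inside the IQR fences."""
    prices = np.asarray(prices, dtype=float)
    q1, q3 = np.percentile(prices, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return (prices >= lower) & (prices <= upper)


def remove_price_outliers(data, k=1.5):
    """Drop rows whose last column (the target P(t+1)) is flagged as an outlier."""
    data = np.asarray(data, dtype=float)
    return data[iqr_mask(data[:, -1], k)]


# Made-up rows: T(t-2), T(t-1), T(t), D(t-2), D(t-1), D(t), P(t+1).
demo = np.array([
    [20, 21, 22, 5000, 5100, 5200, 30.0],
    [21, 22, 23, 5100, 5200, 5300, 32.0],
    [22, 23, 24, 5200, 5300, 5400, 31.0],
    [23, 24, 25, 5300, 5400, 5500, 29.0],
    [24, 25, 26, 5400, 5500, 5600, 33.0],
    [25, 26, 27, 5500, 5600, 5700, 2000.0],  # artificial price spike
    [26, 27, 28, 5600, 5700, 5800, 28.0],
])

cleaned = remove_price_outliers(demo)
print(demo.shape, "->", cleaned.shape)
```

The price column before and after filtering (`demo[:, -1]` and `cleaned[:, -1]`) can then be passed to any plotting library to produce the before-and-after plot. Whether the fences are computed separately for each dataset or taken from the training set and reused on the test set is a design choice worth stating in the report.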

Part 2 - Linear Regression Models

Linear regression is often a good baseline against which to compare the performance of other models.

Tasks -

Apply linear regression to the prediction of electricity prices.

For both the training and test sets, provide the Average Relative Error.

For both training and test sets, produce a plot showing, for each data point, how the predicted price compares with the actual price.
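The regression and error-reporting steps might be sketched as follows. The brief does not define the Average Relative Error, so the code assumes the common definition: the mean of |predicted - actual| / |actual|. The helper name `average_relative_error` is hypothetical, and the data are synthetic stand-ins for the six predictors and the price, not the Queensland data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression


def average_relative_error(actual, predicted):
    """Mean of |predicted - actual| / |actual| (assumed definition of ARE)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - actual) / np.abs(actual)))


# Synthetic stand-in data: six predictors and a roughly linear target.
rng = np.random.default_rng(0)
weights = np.array([0.5, 0.3, 0.2, 0.1, 0.1, 0.4])  # arbitrary, for illustration
X_train = rng.uniform(10, 30, size=(200, 6))
y_train = X_train @ weights + 20 + rng.normal(0, 0.5, 200)
X_test = rng.uniform(10, 30, size=(50, 6))
y_test = X_test @ weights + 20 + rng.normal(0, 0.5, 50)

model = LinearRegression().fit(X_train, y_train)
train_are = average_relative_error(y_train, model.predict(X_train))
test_are = average_relative_error(y_test, model.predict(X_test))
print(f"train ARE = {train_are:.4f}, test ARE = {test_are:.4f}")
```

Note that this definition is undefined when an actual price is zero and is dominated by rows where the actual price is close to zero, so such rows may need separate handling.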

Part 3 - Multilayer Perceptron Models

Multilayer perceptrons can sometimes yield better performance than linear models.

Tasks -

Experiment with the application of MLPs to predicting electricity prices. You should try varying MLPRegressor parameters such as the regularization coefficient, the number of training epochs, and the number of hidden units. Make sure that you record the training error and test error in each case. It is suggested that you use logistic units in the hidden layer, but you can use others if you wish.

Provide results for three different MLPRegressor parameter settings.

- one of these should be the result for the best performing MLP that you were able to train;

- one should clearly demonstrate underfitting;

- one should clearly demonstrate overfitting.

For each of these cases, provide the learning parameters that you have used, as well as the training error and the test error.

For the best-performing MLP, for both training data and test data, produce a plot showing, for each data point, how the predicted price compares with the actual price.
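A minimal sketch of such an experiment with scikit-learn's `MLPRegressor` is given below. Here `alpha` is the regularization coefficient, `max_iter` bounds the number of training epochs for the default `adam` solver, and `hidden_layer_sizes` sets the number of hidden units. The three settings are hypothetical starting points, not values known to underfit or overfit on the real data; the input standardisation, the helper names `fit_mlp` and `average_relative_error`, and the synthetic data are likewise assumptions made for illustration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def average_relative_error(actual, predicted):
    """Mean of |predicted - actual| / |actual| (assumed definition of ARE)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs(predicted - actual) / np.abs(actual)))


def fit_mlp(X, y, hidden_units, alpha, max_iter, seed=0):
    """Train a one-hidden-layer MLP with logistic units on standardised inputs."""
    model = make_pipeline(
        StandardScaler(),
        MLPRegressor(hidden_layer_sizes=(hidden_units,), activation="logistic",
                     alpha=alpha, max_iter=max_iter, random_state=seed),
    )
    return model.fit(X, y)


# Synthetic stand-in data with a mildly nonlinear target.
rng = np.random.default_rng(0)
X_train = rng.uniform(10, 30, size=(150, 6))
y_train = 20 + 0.05 * X_train[:, 2] * X_train[:, 5] + rng.normal(0, 1.0, 150)
X_test = rng.uniform(10, 30, size=(50, 6))
y_test = 20 + 0.05 * X_test[:, 2] * X_test[:, 5] + rng.normal(0, 1.0, 50)

# Hypothetical settings to vary; adjust after inspecting the errors.
settings = [
    {"hidden_units": 1, "alpha": 10.0, "max_iter": 20},
    {"hidden_units": 10, "alpha": 1e-3, "max_iter": 500},
    {"hidden_units": 200, "alpha": 0.0, "max_iter": 1000},
]

results = []
for params in settings:
    model = fit_mlp(X_train, y_train, **params)
    results.append({
        **params,
        "train_are": average_relative_error(y_train, model.predict(X_train)),
        "test_are": average_relative_error(y_test, model.predict(X_test)),
    })

for row in results:
    print(row)
```

Runs with a small `max_iter` may emit a `ConvergenceWarning`, which is expected when training is deliberately cut short. A low training error paired with a clearly higher test error suggests overfitting, while high errors on both sets suggest underfitting.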

Part 4 - Fuzzy Forecasting System

For this part, you will develop a fuzzy forecasting system for predicting the electricity price.

Tasks -

Select appropriate values or fuzzy subsets for the linguistic variables that you will use in your fuzzy rules.

Apply statistical analysis (correlation coefficients) and heuristics to develop a set of fuzzy rules.

Implement your fuzzy system in Python, and produce clear plots of all membership functions involved in your system.

Evaluate the system performance in terms of the average relative error on both training and test sets.

You may use either Mamdani-type or Sugeno-type inference, but you should include some justification for your decision.
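To make the structure of such a system concrete, here is a small zero-order Sugeno-type sketch written with NumPy only. Everything specific in it is an assumption for illustration: the use of just D(t) and T(t) as inputs, the triangular membership breakpoints, the four rules, and the constant price consequents are all invented and would need to be replaced by values derived from the correlation analysis and the actual data ranges.

```python
import numpy as np


def trimf(x, a, b, c):
    """Triangular membership: feet at a and c, peak at b (requires a < b < c)."""
    x = np.asarray(x, dtype=float)
    left = (x - a) / (b - a)
    right = (c - x) / (c - b)
    return np.clip(np.minimum(left, right), 0.0, 1.0)


# Hypothetical constant consequents (price) for the four rules below.
CONSEQUENTS = np.array([20.0, 35.0, 50.0, 80.0])


def predict_price(demand, temp):
    """Zero-order Sugeno inference: firing-strength-weighted average of constants."""
    d_low = trimf(demand, 2500, 4000, 6500)
    d_med = trimf(demand, 4000, 6500, 9000)
    d_high = trimf(demand, 6500, 9000, 11500)
    t_mild = trimf(temp, 5, 20, 35)
    t_hot = trimf(temp, 20, 35, 50)

    # Rule firing strengths; AND is modelled with the minimum operator.
    firing = np.stack([
        d_low,                       # IF demand is low THEN price = 20
        d_med,                       # IF demand is medium THEN price = 35
        np.minimum(d_high, t_mild),  # IF demand is high AND temp is mild THEN price = 50
        np.minimum(d_high, t_hot),   # IF demand is high AND temp is hot THEN price = 80
    ])
    total = firing.sum(axis=0)
    weighted = CONSEQUENTS @ firing
    # Return NaN where no rule fires instead of dividing by zero.
    return np.where(total > 0, weighted / np.maximum(total, 1e-12), np.nan)


print(predict_price(5250, 20))
print(predict_price(np.array([4000, 6500, 9000]), np.array([20, 20, 35])))
```

A Mamdani-type system would instead use fuzzy sets as rule consequents and add a defuzzification step such as the centroid. The constant consequents used here keep the arithmetic simple, which is one possible justification for a Sugeno-type design, not the only one.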

Part 5 - Report and Presentation

This is the assignment 'deliverable'; i.e., what you are required to submit. It should contain your results from Parts 1 to 4, put together in a clear and coherent manner. It should also clearly describe how you conducted your investigation and any design choices you made (e.g., Which parameters did you experiment with when applying the MLP? Which membership functions did you experiment with in creating your fuzzy system? Why did you opt for Mamdani-type inference as opposed to Sugeno-type inference?). The more thorough and systematic your analysis, the better. A summary of your overall findings should also be provided in the report.


