Reference no: EM131510293
This section deals with electricity usage data for 241 households in New Zealand, collected in 2009. The data cover a single 24-hour period sampled every 30 minutes, so there are 48 measurements per household. The first entry is at 12:30am, the second at 1am, and so on. For example, the 14th sample is at 7am and the 24th sample is at midday.
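The mapping from sample number to clock time can be checked with a few lines of R. This is a minimal sketch; sample_time is a hypothetical helper name, and it assumes that sample k is taken k * 30 minutes after midnight, as described above.

```r
# Convert a sample index (1..48) to a clock time, assuming sample k is
# recorded k * 30 minutes after midnight.
sample_time <- function(k) {
  minutes <- (k * 30) %% (24 * 60)   # sample 48 wraps round to 00:00
  sprintf("%02d:%02d", minutes %/% 60, minutes %% 60)
}

sample_time(c(1, 2, 14, 24))
# "00:30" "01:00" "07:00" "12:00"
```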
Run the supplied R script power.R, which loads the electricity data and produces a boxplot that aggregates the 241 households over time, plus an example plot for one household. Note that the data is a data.frame (each row is labelled with the identifier of the household it belongs to), and that to plot an individual example we converted that household's 48 data points into a numeric vector (as.numeric).
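The following sketch illustrates the two kinds of plot described above. It does not reproduce power.R: the data are randomly generated, and the object name power and the household labels are assumptions made for illustration only.

```r
set.seed(1)
# Synthetic stand-in for the real data: 241 households x 48 half-hourly readings.
power <- as.data.frame(matrix(rgamma(241 * 48, shape = 2, rate = 4), nrow = 241))
rownames(power) <- paste0("H", seq_len(241))   # hypothetical household identifiers

# boxplot() on a data.frame draws one box per column, i.e. one box per
# half-hour slot, aggregated over all households.
boxplot(power, xlab = "Half-hour sample", ylab = "Usage", main = "All households")

# A single row of a data.frame is still a data.frame, so it is converted
# to a plain numeric vector before plotting.
one_household <- as.numeric(power[1, ])
plot(one_household, type = "l", xlab = "Half-hour sample", ylab = "Usage",
     main = rownames(power)[1])
```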
Question a) Examine the boxplot and briefly discuss the aggregated pattern of electricity usage.
The electricity company is interested in understanding some general patterns of usage so that they can identify different types of user and therefore target them with specific pricing structures. Your task is to identify 9 overall patterns of usage.
This problem is basically one of clustering, but to solve it we first need to construct explanatory variables that characterise the pattern of usage for each household. Note that some variables should describe how electricity usage varies over the day, while others should be summary statistics of usage such as the mean and variance.
Question b) Examine the R file power.kmeans.R. This has some functions already set up that you will find useful. There is even a function that you can call to apply simple functions that take a single input.
Question c) Construct a table of explanatory variables, where each row is a household and each column is an explanatory variable that you have constructed from the original usage data. Describe the functions that you have constructed to create these explanatory variables, and in particular discuss how you have described (represented) the temporal structure of usage.
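As an illustration of what such a table could look like, here is a minimal sketch on synthetic data. The variable names, the choice of statistics and the morning and evening windows are all assumptions, not the functions supplied in power.kmeans.R.

```r
set.seed(1)
# Synthetic stand-in for the usage data (241 households x 48 samples).
usage <- matrix(rgamma(241 * 48, shape = 2, rate = 4), nrow = 241)
rownames(usage) <- paste0("H", seq_len(241))   # hypothetical identifiers

# Hypothetical time-of-day windows as sample indices (sample k = k * 30 min).
morning <- 13:18   # samples stamped 06:30 to 09:00
evening <- 35:44   # samples stamped 17:30 to 22:00

features <- data.frame(
  mean_usage    = rowMeans(usage),                          # overall level
  sd_usage      = apply(usage, 1, sd),                      # variability
  peak_slot     = apply(usage, 1, which.max),               # when the peak occurs
  morning_share = rowSums(usage[, morning]) / rowSums(usage),
  evening_share = rowSums(usage[, evening]) / rowSums(usage),
  row.names     = rownames(usage)
)
head(features)
```

The first two columns are plain summary statistics; the last three encode temporal structure, either as the position of the peak or as the fraction of daily usage falling in a window.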
Question d) Apply kmeans clustering with 9 centers to your final table of explanatory variables. Produce two figures: the first with boxplots (just like the original script example) showing the pattern of usage in each of the nine clusters, the second with 9 line plots showing the mean usage at each timestep for each cluster. Ensure that you set the SAME ylim for every panel within a figure so that a direct visual comparison of range is possible. See Figures 1 and 2 below for an example.
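A rough sketch of the clustering and plotting steps is given below, again on synthetic data with made-up explanatory variables. Scaling the variables before kmeans, and the nstart and iter.max values, are choices made for this example rather than requirements of the question.

```r
set.seed(1)
usage <- matrix(rgamma(241 * 48, shape = 2, rate = 4), nrow = 241)
features <- cbind(mean_usage = rowMeans(usage),
                  sd_usage   = apply(usage, 1, sd),
                  peak_slot  = apply(usage, 1, which.max))

# Scale so that no single variable dominates the Euclidean distance.
fit <- kmeans(scale(features), centers = 9, nstart = 20, iter.max = 50)

# Mean usage profile per cluster: a 48 x 9 matrix, one column per cluster.
profiles <- sapply(1:9, function(k)
  colMeans(usage[fit$cluster == k, , drop = FALSE]))

ylim_box  <- range(usage)      # shared y-axis for all boxplot panels
ylim_line <- range(profiles)   # shared y-axis for all line panels

op <- par(mfrow = c(3, 3))
for (k in 1:9) {               # figure 1: boxplots per cluster
  boxplot(usage[fit$cluster == k, , drop = FALSE], ylim = ylim_box,
          main = paste("Cluster", k))
}
for (k in 1:9) {               # figure 2: mean profile per cluster (new page)
  plot(profiles[, k], type = "l", ylim = ylim_line,
       xlab = "Half-hour sample", ylab = "Mean usage",
       main = paste("Cluster", k))
}
par(op)
```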
Question e) Comment on how the patterns of usage vary in the 9 clusters.
Question f) Select a pair of explanatory variables (say x1 and x2) that are not highly correlated and plot x1 versus x2 for each household colouring each point by cluster number. Comment on how the clustering is related to these variables.
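One possible way to choose the pair and draw the plot is sketched here on synthetic data. The explanatory variables are invented for the example, and picking the pair with the smallest absolute correlation is just one reasonable reading of "not highly correlated".

```r
set.seed(1)
usage <- matrix(rgamma(241 * 48, shape = 2, rate = 4), nrow = 241)
features <- data.frame(mean_usage = rowMeans(usage),
                       sd_usage   = apply(usage, 1, sd),
                       peak_slot  = apply(usage, 1, which.max),
                       max_usage  = apply(usage, 1, max))
fit <- kmeans(scale(features), centers = 9, nstart = 20, iter.max = 50)

# Absolute correlations, keeping each pair once (lower triangle only).
cc <- abs(cor(features))
cc[upper.tri(cc, diag = TRUE)] <- NA
idx <- which(cc == min(cc, na.rm = TRUE), arr.ind = TRUE)[1, ]
x1 <- colnames(features)[idx["col"]]
x2 <- colnames(features)[idx["row"]]

# The default palette has only 8 colours, so build one with 9.
cols <- rainbow(9)
plot(features[[x1]], features[[x2]], col = cols[fit$cluster], pch = 19,
     xlab = x1, ylab = x2)
legend("topright", legend = paste("Cluster", 1:9), col = cols, pch = 19, cex = 0.7)
```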
INVESTMENT PORTFOLIO MANAGEMENT:
This section deals with modelling the selection of stocks, bonds and cash to make up an investment portfolio. Load the data as follows:
invest <- read.table("invest.tab")
The ROI column is the percentage predicted return on investment, the Risk column is a measure of the risk associated with this particular investment, and the Type indicates the type of investment. Note that each row is labelled with the type of investment and a number, so that you can (if needed) refer to individual investments.
Question g) Visualise and discuss the different ROI and Risk associated with Stocks, Bonds and Cash.
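The sketch below shows one way such a visualisation might be set up. The invest data frame here is a made-up stand-in with the same three columns (ROI, Risk, Type) and the same style of row labels; the numbers carry no meaning.

```r
set.seed(1)
# Synthetic stand-in for read.table("invest.tab"); values are invented.
type <- rep(c("Stock", "Bond", "Cash"), times = c(10, 10, 5))
invest <- data.frame(
  ROI  = c(runif(10, 8, 20), runif(10, 4, 9), runif(5, 1, 3)),
  Risk = c(runif(10, 5, 15), runif(10, 1, 5), runif(5, 0, 1)),
  Type = factor(type),
  row.names = paste0(type, ave(seq_along(type), type, FUN = seq_along))
)

op <- par(mfrow = c(1, 3))
boxplot(ROI ~ Type, data = invest, main = "ROI by type")
boxplot(Risk ~ Type, data = invest, main = "Risk by type")
plot(invest$Risk, invest$ROI, col = as.integer(invest$Type), pch = 19,
     xlab = "Risk", ylab = "ROI (%)", main = "ROI versus Risk")
legend("topleft", legend = levels(invest$Type),
       col = seq_along(levels(invest$Type)), pch = 19)
par(op)

# Mean ROI and Risk per investment type.
aggregate(cbind(ROI, Risk) ~ Type, data = invest, FUN = mean)
```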
Question h) Run the "invest.R" script. This script does a multi-objective criteria analysis to determine the best mix of stocks, bonds and cash over a range of tradeoffs.
Describe in words what the "invest.R" script is doing. In particular, state how a solution on the Pareto front is represented, and explain the relationship between the solution space, the objective space and the constraints.
Question i) Using the result of the nsga2 model you have previously run, examine and present the blend of stocks, bonds and cash for a low risk, a moderate risk and a high risk investment blend (just pick one from each general category). Discuss, in relation to Table 1, the level of risk that seems to be taken by the brokerage houses and whether the one-year return performance is related to the associated risk of the brokerage house.
Question j) Examine the plot shown in Figure 3. This shows how the percentages of bonds, stocks and cash vary as you move along the Pareto front from the least to the greatest risk. Outline the approach (set of steps, algorithm, ...) that would be required to produce this figure given the output from nsga2.
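One possible set of steps is sketched below. It rests on several assumptions that should be checked against invest.R: that the optimiser output holds a matrix par with one row per solution and one weight per investment, that value holds the objectives with risk in the second column, and that the investment types are known for each weight. The res object here is randomly generated, not real nsga2 output.

```r
set.seed(1)
type <- rep(c("Stock", "Bond", "Cash"), times = c(10, 10, 5))

# Hypothetical stand-in for the optimiser output: 40 solutions, 25 weights.
res <- list(par   = matrix(runif(40 * 25), nrow = 40),
            value = cbind(-runif(40, 5, 15), runif(40, 1, 10)))

# Step 1: normalise each solution so its weights sum to 1.
w <- res$par / rowSums(res$par)

# Step 2: add up the weights within each investment type (percent).
blend <- 100 * t(rowsum(t(w), group = type))   # 40 x 3: Bond, Cash, Stock

# Step 3: order the solutions from least to greatest risk.
ord <- order(res$value[, 2])

# Step 4: plot the percentage of each type against risk.
matplot(res$value[ord, 2], blend[ord, ], type = "l", lty = 1,
        xlab = "Risk", ylab = "% of portfolio")
legend("topright", legend = colnames(blend), col = 1:3, lty = 1)
```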
Question k) Find out some historical information on the performance of 2 of the riskiest brokerage houses from Table 1 and comment on whether they still exist, etc.
Question l) Assume you only want to investigate the mix of solutions with an ROI between 9.0 and 13.0. State the constraint function that you would define to restrict the search to this ROI range, and produce a plot of the Pareto front using this additional constraint. NOTE that it may take several runs before you get the optimal front (the plot will be shown as BLUE DOTS if it has not found the optimal front).
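A range constraint of this kind can be written as two inequalities. The sketch below assumes that a solution is a vector of non-negative weights (one per investment), that the portfolio ROI is the weighted average of the individual ROIs, and that the optimiser treats a solution as feasible when every constraint value is non-negative. All three points are assumptions to verify against invest.R and the documentation of the nsga2 implementation it uses; the ROI figures are invented.

```r
# Made-up predicted returns for three investments (stand-in for invest$ROI).
roi <- c(Stock1 = 15, Bond1 = 6, Cash1 = 2)

# Assumption: portfolio ROI is the weighted average of the individual ROIs.
portfolio_roi <- function(x) sum(x * roi) / sum(x)

# Assumed convention: feasible when every returned value is >= 0,
# which holds exactly when 9.0 <= ROI <= 13.0.
roi_constraint <- function(x) {
  r <- portfolio_roi(x)
  c(r - 9.0, 13.0 - r)
}

roi_constraint(c(0.6, 0.3, 0.1))   # ROI = 11: both values positive
roi_constraint(c(0.1, 0.3, 0.6))   # ROI = 4.5: first value negative
```

Because the function returns two values, the optimiser would need to be told that the constraint dimension is 2, alongside any constraints the script already defines.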
Attachment: Graphs.rar