Reference no: EM133119703
CST2330 Data Analysis for Enterprise Modelling - Middlesex University
Data clustering and classification
Analysis and classification of market conditions
In this task you have to:
Question 1. Classify the data into days when the market was bullish (most assets go up in price) or when it was bearish (most assets go down in price). Include plots visualising these clusters.
Question 2. Describe the behaviour of some main cryptocurrency pairs, such as BTC/USD or ETH/USD, on the days with bullish and bearish trends.
Question 3. Identify outliers (points that lie far from cluster centres) and look at the corresponding dates. Search past news headlines to see if there were any interesting events on those dates that could explain the unusual market movements.
Support your analysis and conclusions with plots. Include your R code in the Appendix.
Additional details
To complete this task, you need to analyse the log.returns dataset, where columns (variables) are different cryptocurrency pairs, and rows (observations) are different trading days, such as the sample shown below:
You can use one or more of the following methods:
• Principal component analysis (see Ex. 5, Lab 14).
• k-means or hierarchical clustering (see Ex. 3, Lab 15).
• Self-organising map (see Ex. 4, Lab 16).
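As a rough illustration of the k-means option for Questions 1 and 3, the logic can be sketched in plain Python (the submitted code should still be in R, as the brief requires). Everything here is an assumption made for the sketch: the toy values are made up and not taken from the log.returns dataset, the helper names are hypothetical, the starting centres are chosen from the sign of each day's summed return, and the cluster with the higher average return is simply named "bullish".

```python
import math

# Hypothetical toy data (not from the coursework dataset):
# one entry per trading day, one value per cryptocurrency pair.
toy_returns = {
    "2021-01-01": [0.021, 0.035, 0.018, 0.001],
    "2021-01-02": [-0.030, -0.042, -0.025, 0.000],
    "2021-01-03": [0.015, 0.022, 0.030, -0.001],
    "2021-01-04": [-0.018, -0.027, -0.033, 0.001],
    "2021-01-05": [0.080, 0.100, 0.090, 0.002],
    "2021-01-06": [-0.022, -0.015, -0.028, 0.000],
}


def distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroid(rows):
    """Column-wise mean of a non-empty list of rows."""
    return [sum(col) / len(rows) for col in zip(*rows)]


def nearest(row, centres):
    """Index of the centre closest to the row."""
    return min(range(len(centres)), key=lambda k: distance(row, centres[k]))


def cluster_days(returns_by_day, max_iter=50):
    """Two-cluster k-means; returns {date: (label, distance_to_own_centre)}."""
    dates = list(returns_by_day)
    rows = [returns_by_day[d] for d in dates]
    # Assumed starting centres: days with positive vs. non-positive summed return.
    up = [r for r in rows if sum(r) > 0]
    down = [r for r in rows if sum(r) <= 0]
    if not up or not down:
        raise ValueError("need at least one rising and one falling day")
    centres = [centroid(down), centroid(up)]
    for _ in range(max_iter):
        labels = [nearest(r, centres) for r in rows]
        updated = []
        for k in range(2):
            members = [r for r, lab in zip(rows, labels) if lab == k]
            updated.append(centroid(members) if members else centres[k])
        if updated == centres:
            break
        centres = updated
    labels = [nearest(r, centres) for r in rows]
    # Name the cluster with the higher average return "bullish".
    mean0, mean1 = (sum(c) / len(c) for c in centres)
    names = ("bearish", "bullish") if mean0 < mean1 else ("bullish", "bearish")
    return {
        d: (names[lab], distance(r, centres[lab]))
        for d, r, lab in zip(dates, rows, labels)
    }


result = cluster_days(toy_returns)
for date, (label, dist) in result.items():
    print(date, label, round(dist, 4))

# The day furthest from its own cluster centre is the first outlier candidate.
outlier_date = max(result, key=lambda d: result[d][1])
print("largest distance:", outlier_date)
```

The distance to the assigned centre gives a simple ranking of days for Question 3: the dates at the top of that ranking are the ones worth checking against past news headlines.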
Clustering of cryptocurrency returns
In this task you have to:
1. Identify groups (4 or more) of cryptocurrency pairs that have similar log-returns. Mention examples of pairs in each group. Include plots visualising these clusters.
2. Identify a group of so-called ‘stable coins’, and use your visualisations to explain how they differ from the other groups. (2 marks)
3. Choose any cryptocurrency pair that you think looks interesting or different on your graphs. Search for information about it online to find some possible explanations for your observation.
Support your conclusions with plots. Include your R code in the Appendix.
Additional details
To complete this task, you need to analyse the transposed version of the log.returns dataset, where columns (variables) are different trading days, and rows (observations) are different cryptocurrency pairs, such as the sample shown below:
As before, you can use one or more of the following methods:
• Principal component analysis (see Ex. 5, Lab 14).
• k-means or hierarchical clustering (see Ex. 4, Lab 15).
• Self-organising map (see Ex. 5, Lab 16).
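To make the transposed arrangement concrete, here is a minimal Python sketch (for illustration only; the brief asks for R in the submission). It transposes a small days-by-pairs table and flags pairs with very low volatility, which is one plausible way a stable coin could stand out from other groups. The values are made up, USDT/USD is only an assumed example of a stable coin pair and may not appear in the dataset, and the threshold is an arbitrary cut-off chosen for this toy data.

```python
import statistics

# BTC/USD and ETH/USD are named in the brief; USDT/USD is an assumed example.
pairs = ["BTC/USD", "ETH/USD", "USDT/USD"]

# Rows are trading days, columns follow the order of `pairs` (made-up values).
daily_rows = [
    [0.021, 0.035, 0.0004],
    [-0.030, -0.042, -0.0002],
    [0.015, 0.022, 0.0001],
    [-0.018, -0.027, -0.0003],
]

# Transpose: each pair now owns one row of daily log-returns.
by_pair = {name: list(col) for name, col in zip(pairs, zip(*daily_rows))}

# Per-pair volatility as the sample standard deviation of its log-returns.
volatility = {name: statistics.stdev(vals) for name, vals in by_pair.items()}

THRESHOLD = 0.005  # assumed cut-off, chosen for this toy data only
stable_like = [name for name, sd in volatility.items() if sd < THRESHOLD]

print(volatility)
print("low-volatility pairs:", stable_like)
```

After transposing, each pair is a single observation described by its returns on every trading day, which is the shape the clustering methods listed above expect for this task.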
2 Data modelling and prediction (20%)
In this task you have to:
1. Choose and arrange a subset of the log-returns data for modelling and prediction.
2. Split the arranged subset into the training and testing sets.
3. Use one or more techniques to train the models on the training set and then evaluate their predictions on the testing set.
4. Measure and compare the performance of two or more models.
Additional details
To complete this task, you need to choose log-returns of any one cryptocurrency pair from the log-returns dataset and prepare the sets for training models and testing their predictions. Here you can choose any of the following arrangements of the data:
You can use log-returns of several previous days of the same cryptocurrency as predictors of the next day's log-return. The example below shows the arrangement for IOT/BTC: the first three columns are predictors (log-returns on 3 consecutive days) and the last column is the response.
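As a rough illustration of this lagged arrangement and of steps 1 to 4, the following Python sketch uses made-up numbers rather than the IOT/BTC data (the submitted code should still be in R). The helper names, the 70/30 chronological split, and the two simple models (a training-mean baseline and a one-predictor least-squares fit on the most recent lag) are all assumptions chosen to keep the example short; they are not prescribed by the brief.

```python
import math

# Made-up log-returns for a single pair, in chronological order.
series = [0.010, -0.004, 0.007, 0.012, -0.009, 0.003,
          0.005, -0.002, 0.008, -0.006, 0.004, 0.001]


def make_lagged(values, n_lags=3):
    """Rows of [r(t-n_lags), ..., r(t-1), r(t)]; the last entry is the response."""
    return [values[i - n_lags:i + 1] for i in range(n_lags, len(values))]


def rmse(actual, predicted):
    """Root mean squared error between two equal-length lists."""
    return math.sqrt(
        sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)
    )


rows = make_lagged(series)

# Chronological split (assumed 70/30): earlier rows train, later rows test.
split = int(len(rows) * 0.7)
train, test = rows[:split], rows[split:]

# Model A: always predict the mean response seen in training.
mean_y = sum(r[-1] for r in train) / len(train)
pred_a = [mean_y for _ in test]

# Model B: least squares with one predictor, the most recent lag (closed form).
x = [r[-2] for r in train]
y = [r[-1] for r in train]
mean_x = sum(x) / len(x)
sxx = sum((xi - mean_x) ** 2 for xi in x)
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sxx
intercept = mean_y - slope * mean_x
pred_b = [intercept + slope * r[-2] for r in test]

actual = [r[-1] for r in test]
print("RMSE, mean baseline:", rmse(actual, pred_a))
print("RMSE, lag-1 regression:", rmse(actual, pred_b))
```

Splitting by position rather than at random keeps every test row later in time than the training rows, so the comparison of the two error values reflects genuine out-of-sample prediction.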