Reference no: EM133096418
Analysis of Financial Indicators Using Clustering
Objective: The objective of this exercise is to segment stocks by various factors to determine a possible investment portfolio.
Activities:
• Import and prepare data
• Apply data mining algorithms
• Configure predictive models
• Create data visualizations
• Analyze and interpret output from models
• Publish results
Scenario
You have been investing in savings and stocks for several years. You have invested wisely and the return on your investment is slightly above the average investor's. However, you believe that by using analytical techniques that you developed in school, you should be able to create an investment portfolio that will provide better than average returns. Because you have limited funds to invest, you want to target the stocks that will help you create a strong portfolio that will meet your investment goals. You scraped some performance indicators from a highly reliable financial website and downloaded it into a
.csv file which you have since converted to Excel.
The data you acquired includes several attributes and measures on which to base your analysis and there are 7,112 rows of data so you realize that you will need to segment the data based on various attributes to narrow down your search for the ideal portfolio given your constraints. At least initially, you would like to focus on stock price, invested capital, and total debt.
Cluster Analysis
Given a dataset, organizing it into meaningful groups is a basic and useful approach to data mining and data analysis. Clustering classifies samples into groups using a measure of association so that data points within a group are similar. Data points from different groups are not similar. Data points are multidimensional, that is they consist of several variables. Visualization is not practical for humans when datasets consist of more than three dimensions.
The input to a clustering exercise is a dataset and the number of clusters. The result of the analysis is a set of clusters. K-means clustering is a method of finding clusters and their centers (R) given a choice in the number of clusters (K). It is often used for market segmentation. The goal is to make the inter-cluster difference (distance) high and the intra-cluster difference (distance) low.
To build an analysis for segmentation analysis, proceed as follows:
1. Open the file FinancialIndicators.xlsx (Hands-on_3_FinancialIndicator.xlsx) and explore its contents. Notice that there are plenty of variables to choose from for segmentation.
2. Close Excel.
3. Launch SAP Predictive Analytics.
4. Click Expert Analytics ? Expert Analytics
5. From the menu, choose File ? New.
6. In the New Dataset window choose Excel. Next
7. Search for the FinancialIndicators.xlsx file provided to you and open.
8. The first row is the header data. Check to see that 7,112 rows of data have been acquired. Create.
9. Switch to the Prepare panel.
10. Notice that some of the columns of the spreadsheet have come through as measures and the default aggregation is SUM. It does not make sense to add up Stock Price. Click on the cog next to the Stock Price measure and change the aggregation method to Average.
11. Switch to the Predict panel.
12. From the Algorithms tab (on the right side, within Components panel), drag and drop or double click the R-K-Means algorithm into your analysis.
13. The algorithm component is automatically connected to the data source component.
14. Hover over the R-K-Means algorithm and either click on the cog or choose Configure Settings (on the right).
15. In the R-K-Means properties dialog box, provide the necessary details:
a. In the Number of Clusters field, enter 12.
b. Select stock price, invested capital, and total debt to be used for the cluster analysis.
c. Retain the default values for the advanced properties.
d. Choose Done.
16. From the Data Writers tab, drag and drop or double click on the CSV Writer component.
17. Configure Settings of the CSV data writer.
a. In the CSV Writer Configure Settings, select a CSV file to store the result (use Browse and give the file a name).
b. Chose Done.
18. Click Run to run the analysis
19. You should receive a succeeded message. OK
20. You are now in the Results Grid view.
21. Switch to the Summary view to see the results in Figure 3.
22. You can see the center coordinates of the clusters. Also the size of each cluster which is the number of stocks in each cluster.
23. Results visualization and interpretation...
a. In the Cluster Representations pane, select Cluster Distribution.
i. You see a chart of cluster size vs cluster number, (Figure 4). These are the number of stocks in each cluster. You can roll over the bars to see the number.
ii. Stocks within a cluster are similar to each other and dissimilar to all other stocks in other clusters.
b. In the Cluster Representations pane, select Cluster Density and Distance.
i. You see that cluster 3 in Figure 5 has the lowest/weakest density and cluster 12 in the same figure has the highest. Low density clusters imply clusters of noise, outliers, or other loosely associated data. The distance shows how dissimilar the clusters are.
b. In the Cluster Representations pane, select Cluster Density and Distance.
i. You see that cluster 3 in Figure 5 has the lowest/weakest density and cluster 12 in the same figure has the highest. Low density clusters imply clusters of noise, outliers, or other loosely associated data. The distance shows how dissimilar the clusters are.
c. In the Cluster Representations pane, select Cluster Center Representation.
i. You see a radar chart of the cluster centers (radar axes are the variables); you can change the cluster number in the Data panel. Notice in Figure 7 that the average stock price in cluster 6 is much higher than that of other clusters.
c. In the Cluster Representations pane, select Parallel Coordinate Chart.
i. The axes are all normalized. Parallel lines between the axes imply a positive relationship between the two dimensions. Intersecting lines imply a negative relationship.
d. In the Cluster Representations pane, select Scatter Matrix Charts.
i. You see the scatter charts of store clusters plotted between various pairs of dimensions
24. The fitted results are stored in the CSV file. You can open the saved csv file and explore the 12 clusters that have been generated or you can explore further with visualizations
25. From the File menu, select Save.
26. Enter a name for the document.
27. Choose Save.
Analysis
28. Switch to the Visualize panel.
29. From the dropdown on Select Analysis shown in Figure 10, choose Analysis 1
30. Select Component R-K Means.
31. Create a column chart with Stock Price on the Y Axis and ClusterNumber as the dimension. Notice that Cluster 6's stock price is much higher than others. Since we have limited funds available, we will not want to purchase any stocks in cluster 6. Filter cluster 6 from your column chart. Sort by stock price.
32. Create appropriate visualizations to answer the following questions.
Question 1: Which cluster contains those stocks with the lowest average stock price? What is the average stock price in this cluster?
Question 2: Excluding the largest cluster (cluster 6 in this example), which is the cluster with the highest average stock price? Of that cluster, what company has the highest average stock price and what is that price? What company has the lowest?
Question 3: Continue to examine the cluster from question 2. What do your observations tell you regarding total debt and invested capital for the companies in this cluster?
Question 4: What other observations can you make about the clusters in this analysis? (Provide at least 3 with their associated visualizations and justifications for your observations. Why are these observed relationships important?)
Attachment:- Workshop Clustering Analysis.rar