FE520 Introduction to Python for Financial Applications Assignment - Stevens Institute of Technology, USA
1. Clustering & Classification
1. Use sklearn.cluster.KMeans to cluster the points in the given data set points.csv, which contains 4 clusters. Draw a scatter plot of the data, using color to indicate each point's cluster.
2. Treat the cluster assignments from your KMeans model as ground-truth labels and randomly split the data set into training and testing sets. Create a linear SVM classifier, train it on the training set, and use a confusion matrix to evaluate its performance on the testing set.
3. Treat the labels in labels.csv as the ground-truth labels and repeat question 2. Compare the two classifiers' performance, describe what you observe, and explain it.
4. Use the tensorflow.keras API to create a fully connected neural network model and repeat question 2. Plot how the loss changes as the number of training steps increases.
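A minimal sketch of questions 1 and 2, assuming scikit-learn and NumPy are installed. Because points.csv is not reproduced here, the sketch generates four synthetic 2-D blobs with make_blobs as a stand-in; with the real file you would load the coordinates instead (for example with pandas.read_csv). The helper names cluster_points and evaluate_linear_svm are hypothetical, SVC(kernel="linear") is one way to get a linear SVM, and the 75/25 split and seeds are arbitrary choices. The scatter plot is left out; plt.scatter(X[:, 0], X[:, 1], c=kmeans_labels) would color each point by its cluster.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC


def cluster_points(X, n_clusters=4, seed=0):
    """Fit KMeans and return one cluster label per row of X."""
    model = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    return model.fit_predict(X)


def evaluate_linear_svm(X, y, test_size=0.25, seed=0):
    """Train a linear-kernel SVM on a random split; return the test confusion matrix."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed
    )
    clf = SVC(kernel="linear")
    clf.fit(X_train, y_train)
    # Rows are true labels, columns are predicted labels.
    return confusion_matrix(y_test, clf.predict(X_test))


# Stand-in for points.csv (assumption): 400 points drawn around 4 random centers.
X, _ = make_blobs(n_samples=400, centers=4, n_features=2, random_state=42)
kmeans_labels = cluster_points(X)
cm = evaluate_linear_svm(X, kmeans_labels)
print(cm)
print("accuracy:", np.trace(cm) / cm.sum())
```

For question 3, the same evaluate_linear_svm(X, y_true) call can be reused with the labels read from labels.csv. One point worth considering in the comparison: KMeans assigns every point to its nearest centroid, so its clusters are separated by linear boundaries, and a linear SVM therefore tends to fit KMeans labels almost perfectly; externally supplied labels carry no such guarantee.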
2. Regression
1. In this question, we use the diabetes data set. Use sklearn.datasets.load_diabetes() to load the data and labels.
2. Randomly split the data into a training set and a testing set.
3. Create a linear regression model using sklearn and fit it on the training data. Evaluate the model on the test data, and report all the coefficients and the R-squared score.
4. Use 10-fold cross-validation to fit and validate your linear regression model on the whole data set. Print the score for each fold.
5. Use sklearn to create a RandomForestRegressor model and fit it on the training data.
6. Use grid search to find the optimal hyperparameters for RandomForestRegressor over max_depth: [None, 7, 4] and min_samples_split: [2, 10, 20].
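The six regression steps above map fairly directly onto scikit-learn calls; the following is a sketch rather than a reference solution. The 80/20 split, random_state values, n_estimators=100 and the 5-fold inner CV used by GridSearchCV are assumptions, since the assignment does not fix them. For regressors, .score() returns R-squared, and cross_val_score with cv=10 uses unshuffled 10-fold KFold.

```python
from sklearn.datasets import load_diabetes
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import GridSearchCV, cross_val_score, train_test_split

# Steps 1-2: load features/targets and split them randomly.
X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)

# Step 3: ordinary least squares on the training set, evaluated on the test set.
lin = LinearRegression().fit(X_train, y_train)
print("coefficients:", lin.coef_)
print("intercept:", lin.intercept_)
test_r2 = lin.score(X_test, y_test)  # R-squared on held-out data
print("test R^2:", test_r2)

# Step 4: 10-fold cross-validation on the whole data set, one R^2 per fold.
cv_scores = cross_val_score(LinearRegression(), X, y, cv=10, scoring="r2")
for fold, score in enumerate(cv_scores, start=1):
    print(f"fold {fold}: R^2 = {score:.3f}")

# Step 5: a random forest with default hyperparameters.
rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_train, y_train)
print("test R^2 (default forest):", rf.score(X_test, y_test))

# Step 6: exhaustive search over the grid given in the assignment.
param_grid = {"max_depth": [None, 7, 4], "min_samples_split": [2, 10, 20]}
search = GridSearchCV(
    RandomForestRegressor(n_estimators=100, random_state=0),
    param_grid,
    cv=5,
    scoring="r2",
)
search.fit(X_train, y_train)
print("best params:", search.best_params_)
print("test R^2 (tuned forest):", search.score(X_test, y_test))
```

GridSearchCV refits the best combination on the full training set by default (refit=True), so search.score evaluates that refitted model.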
3. Web Scraping
The task is to scrape weather information from .
1. Write a Python program to scrape the location, last update time, current weather, temperature, humidity, and wind speed. Attached image 1 shows the information you need to scrape. Store all the information in a Python dictionary.
2. Wrap the above code in a function that takes latitude and longitude as inputs and returns the dictionary described above. (Hint: you can query the information for a specific latitude and longitude by putting them into the URL.)
3. A list of geographic coordinates is given in locations.csv. Apply your function from step 2 to each coordinate pair and create a pandas.DataFrame (rows: coordinates; columns: the 6 features listed in step 1). Drop duplicate locations and invalid records, then sort the table by current temperature. Print the 5 locations at the top and the 5 locations at the bottom.
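The scraping in steps 1 and 2 depends on the page layout of the target site, which is not shown here, so only the table-building part of step 3 is sketched. The sketch assumes pandas is installed and takes the step-2 function as a parameter (fetch), so any scraper returning one dictionary per coordinate pair can be plugged in. The function name build_weather_table, the column names in COLUMNS, and the rule that a failed lookup or a non-numeric temperature counts as an "invalid record" are all assumptions. Sorting is ascending, so the top rows are the coldest locations.

```python
import pandas as pd

# Hypothetical keys for the six features listed in step 1.
COLUMNS = ["location", "last_update", "weather", "temperature", "humidity", "wind_speed"]


def build_weather_table(coords, fetch):
    """Apply fetch(lat, lon) to every coordinate pair and return a cleaned DataFrame.

    fetch stands in for the step-2 scraper: it is assumed to return a dict keyed
    by COLUMNS, or to raise / return None when the lookup fails.
    """
    rows = []
    for lat, lon in coords:
        try:
            record = fetch(lat, lon)
        except Exception:  # network or parsing failure: treat as an invalid record
            record = None
        if record:
            rows.append({**record, "lat": lat, "lon": lon})

    df = pd.DataFrame(rows, columns=["lat", "lon"] + COLUMNS)
    # Non-numeric temperatures (e.g. "N/A") become NaN and are dropped below.
    df["temperature"] = pd.to_numeric(df["temperature"], errors="coerce")
    df = df.dropna(subset=COLUMNS)
    # Keep the first occurrence of each location name.
    df = df.drop_duplicates(subset="location")
    # One row per coordinate pair, ordered from coldest to warmest.
    return df.sort_values("temperature").set_index(["lat", "lon"])
```

With a working scraper, coords could be read from locations.csv with pandas.read_csv (its column names are not given here), and table.head(5) and table.tail(5) would give the two requested slices.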
Submission Requirement: For all the problems in this assignment, you need to design and use Python 3 and output and present the results in a clean format. Please submit a written report (PDF) in which you detail your results, with your code copied into an Appendix. You are required to submit a single Python file and a brief report. Your grade will be based on a combination of the report and the code. You are strongly encouraged to comment your code, because keeping code documented at all times is a standard convention.
Attachment: Python for Financial Applications Assignment Files.rar