Reference no: EM132585225
B9DA106 Data Visualization - Dublin Business School
Python Data Visualization Assignment
Learning Outcome 1: Ability to visualize and analyse datasets using Python
Task
Visualize natural groupings or clusters in each of the two given datasets (sign_mnist.csv and customers.csv) using dimensionality-reduction and clustering algorithms. You can use either t-SNE or UMAP for dimensionality reduction.
Analysis of Dataset 1 carries 40
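The two-step workflow described in the task (reduce dimensionality, then cluster) can be sketched as follows. This is a minimal illustration on synthetic data, not on the assignment datasets. It assumes scikit-learn is installed and uses its TSNE class; the brief does not name a clustering algorithm, so KMeans is an assumption made here purely for illustration, as is the choice to cluster the 2-D embedding rather than the original features.

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.manifold import TSNE

# Synthetic stand-in data (NOT the assignment datasets): three Gaussian
# blobs in 10 dimensions, 50 points each.
rng = np.random.default_rng(0)
centers = rng.normal(scale=10.0, size=(3, 10))
X = np.vstack([c + rng.normal(size=(50, 10)) for c in centers])

# Step 1: reduce to 2-D for plotting. perplexity must be smaller than
# the number of samples; 30 is scikit-learn's default.
embedding = TSNE(n_components=2, perplexity=30, random_state=0).fit_transform(X)

# Step 2: cluster the 2-D embedding. The number of clusters (3) is known
# here only because the data is synthetic.
cluster_ids = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(embedding)

print(embedding.shape, np.unique(cluster_ids))
```

On real data the number of clusters is not known in advance, and t-SNE output depends on perplexity and the random seed, so it is worth comparing a few settings before interpreting any grouping.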
Dataset 1: sign_mnist.csv
This dataset follows the same format as the mnist dataset studied in the lectures. It contains 28×28 pixel images of 24 hand gestures in American Sign Language, each representing one of 24 letters of the English alphabet (see Figure 1). Two letters, J and Z, could not be represented by still images because they require hand motion.
Number of Instances: 10,000
Number of Variables: 784 independent variables + 1 label variable
Each row represents an image. Each independent variable represents a pixel (with a value between 0 and 255). The label variable indicates which letter the image corresponds to. Letters A-Z are represented by numeric labels 0-25 (with no instances for 9=J or 25=Z, as these letters cannot be represented by still images).
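The label scheme and row layout described above can be sketched as follows. The helper names (label_to_letter, row_to_image) are hypothetical, the pixel vector is made up, and row-major pixel order is an assumption; the position of the label column in the CSV is not stated in this brief, so the sketch takes the 784 pixel values as a separate input.

```python
import numpy as np

IMAGE_SIDE = 28  # images are 28x28, i.e. 784 pixel columns per row


def label_to_letter(label: int) -> str:
    """Map a numeric label (0-25) to its letter, A-Z."""
    if not 0 <= label <= 25:
        raise ValueError(f"label out of range: {label}")
    return chr(ord("A") + label)


def row_to_image(pixels) -> np.ndarray:
    """Reshape 784 pixel values into a 28x28 array (row-major order assumed)."""
    pixels = np.asarray(pixels)
    if pixels.size != IMAGE_SIDE * IMAGE_SIDE:
        raise ValueError(f"expected 784 pixels, got {pixels.size}")
    return pixels.reshape(IMAGE_SIDE, IMAGE_SIDE)


# Labels that actually occur: 0-25 without 9 (J) and 25 (Z).
present_labels = [k for k in range(26) if k not in (9, 25)]
letters = [label_to_letter(k) for k in present_labels]

# Hypothetical row: a made-up pixel vector, not real data.
fake_pixels = np.arange(784) % 256
image = row_to_image(fake_pixels)
print(len(letters), "".join(letters), image.shape)
```

Mapping labels back to letters is mainly useful for hover text and legends, so that clusters in the plot can be read as letters rather than numbers.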
Dataset 2: customers.csv
This dataset refers to customer data of a telecom company. Each row corresponds to a customer.
Number of Instances: 7033
Number of Variables: 19 independent variables -
1 - gender - Whether the customer is male or female
2 - SeniorCitizen - Whether the customer is a senior citizen or not (1, 0)
3 - Partner - Whether the customer has a partner or not (Yes, No)
4 - Dependents - Whether the customer has dependents or not (Yes, No)
5 - tenure - Number of months the customer has stayed with the company
6 - PhoneService - Whether the customer has a phone service or not (Yes, No)
7 - MultipleLines - Whether the customer has multiple lines or not (Yes, No, No phone service)
8 - InternetService - Customer's internet service provider (DSL, Fiber optic, No)
9 - OnlineSecurity - Whether the customer has online security or not (Yes, No, No internet service)
10 - OnlineBackup - Whether the customer has online backup or not (Yes, No, No internet service)
11 - DeviceProtection - Whether the customer has device protection or not (Yes, No, No internet service)
12 - TechSupport - Whether the customer has tech support or not (Yes, No, No internet service)
13 - StreamingTV - Whether the customer has streaming TV or not (Yes, No, No internet service)
14 - StreamingMovies - Whether the customer has streaming movies or not (Yes, No, No internet service)
15 - Contract - The contract term of the customer (Month-to-month, One year, Two year)
16 - PaperlessBilling - Whether the customer has paperless billing or not (Yes, No)
17 - PaymentMethod - The customer's payment method (Electronic check, Mailed check, Bank transfer (automatic), Credit card (automatic))
18 - MonthlyCharges - The amount charged to the customer monthly
19 - TotalCharges - The total amount charged to the customer
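Because the variables listed above mix text categories with numeric amounts, they usually need to be converted to a purely numeric matrix before t-SNE or UMAP can be applied. The sketch below shows one possible approach, assuming pandas is available: one-hot encoding for categorical columns and standardization for numeric ones. The rows are made up and only a few of the listed column names are used; the encoding and scaling choices are assumptions for illustration, not requirements of the brief.

```python
import pandas as pd

# Made-up rows that reuse a few of the column names listed above.
customers = pd.DataFrame({
    "gender": ["Male", "Female", "Female", "Male"],
    "SeniorCitizen": [0, 1, 0, 0],
    "Contract": ["Month-to-month", "Two year", "One year", "Month-to-month"],
    "tenure": [1, 60, 24, 5],
    "MonthlyCharges": [29.85, 105.50, 56.95, 70.70],
})

categorical = ["gender", "Contract"]
numeric = ["tenure", "MonthlyCharges"]

# One-hot encode the text columns; SeniorCitizen is already 0/1.
features = pd.get_dummies(customers, columns=categorical, dtype=float)

# Standardize numeric columns (zero mean, unit variance) so that
# large-valued columns do not dominate distance computations.
for col in numeric:
    features[col] = (features[col] - features[col].mean()) / features[col].std(ddof=0)

print(features.shape)
print(features.columns.tolist())
```

Other encodings are possible, and the choice affects which groupings appear in the embedding, so it is a design decision worth justifying in the report.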
Students must submit the following in a zipped folder:
1. Report (.pdf) detailing critical visual design decisions/approach
2. Presentation Slides (.ppt) with recorded audio narration explaining/interpreting results
3. Python Code (.py)
4. Plotly Visualization Files (.html)
Naming convention:
Report should be named as -
Report_Surname1_Surname2_Surname3_Surname4.pdf
Slides should be named as -
Slides_Surname1_Surname2_Surname3_Surname4.ppt
Code should be named as -
Code_Surname1_Surname2_Surname3_Surname4.py
Zipped folder should be named as -
Surname1_Surname2_Surname3_Surname4.zip
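The naming convention above can be generated from a list of surnames to avoid typing mistakes. This is a small sketch; the function name submission_names is hypothetical, and the placeholder surnames are those used in the convention itself.

```python
def submission_names(surnames):
    """Build the required file names from the group's surnames."""
    stem = "_".join(surnames)
    return {
        "report": f"Report_{stem}.pdf",
        "slides": f"Slides_{stem}.ppt",
        "code": f"Code_{stem}.py",
        "zip": f"{stem}.zip",
    }


# Placeholder surnames, exactly as in the convention above.
names = submission_names(["Surname1", "Surname2", "Surname3", "Surname4"])
for kind, filename in names.items():
    print(kind, filename)
```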
There is no cap on word count or number of slides. Submitted work will be assessed on quality, not quantity, of content.
Only one submission is required per group (any group member can upload the assignment).
Attachment:- Data Visualization.rar