Perform an exploratory analysis of your dataset

Reference no: EM132677769

Assignment: In this individual assignment, you will perform an exploratory analysis with the What-If Tool to better understand the structure of your datasets, investigate initial questions, and develop preliminary insights and hypotheses. Your final submission will be a report of the key insights gained during your analysis.

Step 1: Dataset Selection and Initial Questions

Pick two datasets. These can be the ones provided in the What-If Tool demos, but you will earn additional points if you use datasets that are not available there.

After selecting datasets - but prior to analysis - write down an initial set of three questions you'd like to investigate about the datasets and prediction results from ML models.

Step 2: Exploratory Visual Analysis

Next, you will perform an exploratory analysis of your datasets and the results from ML models using the What-If Tool. If you use the provided datasets, you can use the web demo. Otherwise, you can take the provided notebooks and revise them to use your own datasets and models.

You should consider two different phases of exploration.

In the first phase, you should seek to gain an overview of the structure of your datasets and results from their models. What is the structure of datasets? Which features are used? Are there any notable issues with the distributions of datasets? What is the model performance? What features contributed the most? Are there any surprising relationships among subsets of data and model results? Are there any fairness issues?
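The overview questions above can also be checked numerically alongside the tool's charts. The following is a minimal Python sketch, not part of the What-If Tool: the records, the field names (group, label, pred), and both helper functions are hypothetical and chosen only for illustration. It reports the label distribution, then the accuracy and positive-prediction rate for each group.

```python
from collections import Counter, defaultdict

# Hypothetical records: a sensitive attribute, a true label, and a model prediction.
records = [
    {"group": "A", "label": 1, "pred": 1},
    {"group": "A", "label": 0, "pred": 1},
    {"group": "A", "label": 1, "pred": 1},
    {"group": "A", "label": 0, "pred": 0},
    {"group": "B", "label": 1, "pred": 0},
    {"group": "B", "label": 0, "pred": 0},
    {"group": "B", "label": 1, "pred": 1},
    {"group": "B", "label": 0, "pred": 0},
]


def label_distribution(rows):
    """Fraction of rows carrying each true label (reveals class imbalance)."""
    counts = Counter(r["label"] for r in rows)
    total = len(rows)
    return {label: n / total for label, n in counts.items()}


def per_group_summary(rows, key="group"):
    """Accuracy and positive-prediction rate for each value of `key`."""
    buckets = defaultdict(list)
    for r in rows:
        buckets[r[key]].append(r)
    summary = {}
    for value, members in buckets.items():
        n = len(members)
        summary[value] = {
            "n": n,
            "accuracy": sum(m["label"] == m["pred"] for m in members) / n,
            "positive_rate": sum(m["pred"] == 1 for m in members) / n,
        }
    return summary


summary = per_group_summary(records)
rates = [s["positive_rate"] for s in summary.values()]
parity_gap = max(rates) - min(rates)  # gap in positive-prediction rate

print(label_distribution(records))
for group, stats in sorted(summary.items()):
    print(group, stats)
print("positive-rate gap between groups:", parity_gap)
```

In this toy data both groups have the same accuracy but very different positive-prediction rates, which is the kind of pattern worth reporting as a possible fairness issue.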

In the second phase, you should investigate your initial questions, as well as any new questions that arise during your exploration. For each question, explore the visualizations in the What-If Tool that might provide a useful answer. Interact with the tool's features (e.g., the datapoint editor, dropdown menus, fairness analysis) to develop better perspectives, explore unexpected observations, or sanity-check your assumptions. Repeat this process for each of your questions, and feel free to revise your questions or branch off to explore new ones.
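The idea behind editing a datapoint is to change one feature, hold everything else fixed, and compare the model's output before and after. A rough Python sketch of that check follows. The toy_predict function is a hand-written stand-in for a trained model, and the feature names (hours_per_week, education_years) and weights are assumptions made up for this example.

```python
import math


def toy_predict(example):
    """Hypothetical stand-in for a trained model: a hand-written logistic score."""
    z = 0.08 * example["hours_per_week"] + 0.5 * example["education_years"] - 9.0
    return 1.0 / (1.0 + math.exp(-z))


def what_if(example, feature, new_value, predict=toy_predict):
    """Return (original score, edited score) after changing a single feature."""
    edited = dict(example)  # copy so the original datapoint is left untouched
    edited[feature] = new_value
    return predict(example), predict(edited)


datapoint = {"hours_per_week": 40, "education_years": 10}
before, after = what_if(datapoint, "education_years", 14)
print(f"score before edit: {before:.3f}, after edit: {after:.3f}")
```

A large change in score from a small edit, or no change from an edit you expected to matter, is a candidate finding for the report.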

What to submit?

You'll submit a single PDF in the form of a report. For each dataset, provide the 10 most interesting or surprising findings (or "insights"), with details and screenshots. Your insights can include important surprises or issues (such as skewed data distributions or critical fairness issues) as well as answers to your analysis questions. Each finding should consist of a title, a description of 2-4 sentences, and screenshots. Provide sufficient detail so that anyone could read through your report and understand what you've learned. You are free, but not required, to annotate your images to draw attention to specific features of the data.

Do not submit a report cluttered with every little thing you tried. Submit a clean, succinct report that highlights the most interesting, insightful observations. You don't need to tell us how the tool works; we already know that. Think of this as a report to your manager, who wants to know what the datasets look like and how the model performed.

The structure of the report will be:

1. Dataset 1

• Which dataset?

• Three initial questions

• 10 most interesting findings

2. Dataset 2

• Which dataset?

• Three initial questions

• 10 most interesting findings

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd