What is the difference between the two histograms

Reference no: EM131451449

Assignment 1-

1. Basic histogram. 

a. Import the data in Table 1 (Data Set 10.1.1 - Basic Histogram Data) into Tableau and make a basic histogram from it. To make a histogram, you can follow the directions in the online help at https://www.tableau.com/learn/tutorials/on-demand/histograms. For this data, use a bin size of 2. (Your graph may vary a bit from the example.)

b. Change the y-axis on your histogram to reflect percentages (you can do this in the Quick Table Calculation pull-down from your ROWS variable.)

c. Create a different histogram with the exact same data, with a bin size of 8. 

d. Create a dashboard with the two histograms on it, and submit a screenshot of this dashboard in your TURN IN TEMPLATE.

e. Write a few sentences - what is the difference between the two histograms? Which one would you use under which circumstances?
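To see what a change in bin size does to the same data outside Tableau, here is a minimal Python sketch. The sample values are hypothetical (they are not Data Set 10.1.1), the helper names are invented for illustration, and labeling each bin by its lower edge is a simplifying assumption.

```python
from collections import Counter


def bin_counts(values, bin_size):
    """Count values per bin; each bin is labeled by its lower edge."""
    counts = Counter((v // bin_size) * bin_size for v in values)
    return dict(sorted(counts.items()))


def as_percentages(counts):
    """Convert raw bin counts to percent of the total."""
    total = sum(counts.values())
    return {edge: 100 * n / total for edge, n in counts.items()}


# Hypothetical sample values, NOT the assignment's data set.
sample = [1, 2, 3, 3, 4, 5, 5, 6, 7, 9, 10, 12, 13, 15, 15]

narrow = bin_counts(sample, 2)   # many small bins: more detail, more noise
wide = bin_counts(sample, 8)     # few large bins: smoother, less detail
wide_pct = as_percentages(wide)  # same idea as a percent-of-total y-axis

print(narrow)
print(wide)
print(wide_pct)
```

Printing both dictionaries side by side shows the same 15 values spread over eight bins in one case and two bins in the other.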

2. Histogram with a parameter slider for the bin size.

a. Follow the directions in the histogram video and implement an interactive parameter for the user for the bin size.  Set it so it's a slider.  To do this, you may have to right click on the "choose a bin size" parameter and make sure it's set to slider. 

b. Take two or three screenshots of your sliding bin, with two different bin sizes, and submit them in the TURN IN TEMPLATE.

c. Answer this question: how can the sliding bin size parameter help you as a data analyst?

3. Histograms across Warehouses. Import the data given in Table 2 (Data Set 10.1.2 - Warehouse Histogram Data).

a. Create a histogram of this data with a bin size of 3.  Take a screenshot.

b. Use Tableau to incorporate the additional warehouse information.  We are looking for something like the stacked histogram shown below.  (Your data may vary slightly.)  Here's what I got:

c. Look at how these histograms differ from the overall histogram you created in step a. 

d. Submit a screenshot of the overall histogram and a screenshot of your split by warehouse in the TURN IN TEMPLATE. (Just put the two screenshots next to each other in the same box.)

e. Answer the following questions: 

f. Question 1:  If you only had the overall histogram, how would you narrate the order delivery time? 

g. Question 2:  If you then saw the by-warehouse split, how would you narrate the order delivery time?  Use the phrases "skewed left" and "skewed right" wherever applicable, and make sure you get them correct (look them up if you need to - the Internet is a great place to start.)

h. Question 3:  You have 30 seconds of the CEO's attention.  What single business action would you recommend to her based on your histograms here?

4. Boxplots. We are going to make some boxplots in Tableau! Many of you have seen boxplots before; this week, we emphasize the statistical knowledge that can be pulled from a boxplot.

a. Import the data shown in Table 3 (Data Set 10.1.3 - Sales Data by Time Zone for Boxplots). Make a boxplot of this data (online help: https://onlinehelp.tableau.com/current/pro/desktop/en-us/help.htm#buildexamples_boxplot.html ).

b. To get it to work, you want to make sure your measures aren't aggregated.   To get mine to work, I started with the Time Zone in the Columns, the Sales Volume in the Rows, and then had it make me a bar chart and then I switched the mark type to circles.  (To get a bunch of little circles, make sure the Analysis -> Aggregate Measures box is not checked.)

c. Hover over one of your boxplots and it will show you the actual data.  Here, I'm hovering over the Pacific data. In the TURN IN TEMPLATE, submit a screen print of yourself hovering over one of your boxplots.

d. The boxplots show you at a glance not only the median value (line in the middle) but also the spread and any outliers.  An outlier is something above the top "whisker" or below the bottom "whisker;" on the example chart above, there's an outlier in the Central sales (way at the bottom, where the sales are very close to 0).   Write a sentence or two describing what you see here.  In particular, do you see any differences in median, spread, or quartiles between the regions?  If you wanted to boost sales in one region, which would you pick and why?
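The quantities a boxplot summarizes can also be computed by hand. The sketch below is an illustration only: the sales figures are hypothetical, the function name is invented, and it assumes the common convention that whiskers reach at most 1.5 times the interquartile range beyond the box. Tableau's exact quartile method may differ from the "inclusive" method used here.

```python
import statistics


def box_summary(values):
    """Return quartiles, IQR, and points outside the 1.5 * IQR fences."""
    q1, median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr    # anything below this counts as an outlier
    high_fence = q3 + 1.5 * iqr   # anything above this counts as an outlier
    outliers = [v for v in values if v < low_fence or v > high_fence]
    return {"q1": q1, "median": median, "q3": q3, "iqr": iqr,
            "outliers": outliers}


# Hypothetical sales volumes for one time zone, with one value near 0.
central = [2, 48, 50, 52, 55, 57, 60, 61, 65]

summary = box_summary(central)
print(summary)
```

Running the same function on each region's values gives the medians, spreads, and outliers that the chart shows visually.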

5. Heatmaps.  A heat map conveys numeric information, using colors (or "heat") to show one of the dimensions.  We're going to make one covering those three very important pieces of data about the success metrics for all of us:  IQ, shoe size, and salary.  You can find directions from Tableau here:  https://onlinehelp.tableausoftware.com/v8.0/pro/online/en-us/buildexamples_heatmap.html

Here's how I did it:

a. Import the data in Table 4 (Data Set 10.1.5 - Success Metrics for Everybody) into Tableau.

b. If necessary, move IQ and Shoe Size from Measures to Dimensions.

c. Bin IQ into something reasonable (here I chose bin sizes of 10)

d. Make a graph with IQ on one axis and Shoe Size on the other, and use Squares as the marks:

e. Convert Annual Salary to a Continuous Dimension.

f. Use Annual Salary as the color and change its Measure to Average.

g. Make your own heatmap from the data here.  Use what you know about colors and graphs to ensure that the color scheme is highlighting what you want to highlight.  Consider the divergent color schemes to bring out interesting data.  Submit a copy of your heat map in the TURN IN TEMPLATE.

h. Write a few sentences - what can you conclude from your heat map?  What, if any, relationships did you find between IQ, shoe size, and salary in this data set?

We're now going to practice on some larger data sets.  Use the following data sets from Tableau to answer the following questions.  Answer the questions using Tableau visualizations.  Please try to answer each question with one and only one Tableau graph (but if you absolutely need more than one, go ahead and be sure to justify why you need it.)

Do not use Excel or other methods to determine your answers.

You can download the data sets from here:  https://public.tableau.com/s/resources?qt-overview_resources=1

6. Millennial vs Baby Boomer Employment Data set.  Use the "National, 5-digit" sheet.    For Baby Boomers, for 2013, what were the three biggest job titles ("Occupation" field) and how many total Baby Boomers worked in those three fields?  Please submit the Tableau screenshot(s) you used to determine this, and give a little narration.  Paste your answers in the TURN IN TEMPLATE.

7. Millennial vs Baby Boomer Employment Data set.  Use the "States" sheet.   In which state was the total Job Change the most negative (i.e. in which state were the most jobs lost between 2007 and 2013?)  How many of those were Boomer jobs vs. Millennial jobs?  Paste your answers in the TURN IN TEMPLATE.

8. Global Sport Finances data sheet, Top Athlete Salaries data.  After you connect to the data, you will need to scrub it up a bit before Tableau can work meaningfully with it.  In particular, I had to do this:

a. After I loaded the "Top Athlete Salaries" sheet, I had to split the salary data.  It comes with a "M" appended at the end of the salary, but that means Tableau will view the entire field as a text field.  We want to do number analytics on it, so I want to tell it the salaries need to be numeric.  First step:  remove the M.

b. Then, I had to convert the split field to a Measure:

c. Next, I had to continue to convince it to treat salary as a number:

d. Scrub your input data as per above.  Make a graph to answer this question:  which sport has the highest *average* pay for 2014?  What was the average pay?  Paste your graph and your answers in the TURN IN TEMPLATE.

9. Global Sport Finances data sheet, Top Athlete Salaries data.  You notice that Basketball, Cricket, and Soccer all have very similar average earnings for their players in this list, all about $30 M for 2014.  If you were told you would be given the annual salary for one athlete chosen at random from these three categories from this data set, would you choose to be given the salary of a randomly chosen basketball player, a randomly chosen cricket player, or a randomly chosen soccer player?  Why?  Make one graph to answer this question, and paste it in the TURN IN TEMPLATE.

Assignment 2 -

Take the following Python code that stores a string:

str = 'X-DSPAM-Confidence: 0.8475'

Use find and string slicing to extract the portion of the string after the colon character and then use the float function to convert the extracted string into a floating point number.

The objective here is to isolate the numeric portion of the given string before applying the float() function to turn it into a floating point number. The numeric part appears at the end of the string, so we need to extract the part of the string that runs from one character position after the colon up to the index position that represents the end of the string. The position of the colon within the string can be found using the "find" function. The index position that represents the end of the string can be found using the "len" function on the whole string. Remember that index values begin at 0! Once we have stored the isolated string with the numeric part in the variable strNum, we remove any blank space around it using "strip". Because float() itself ignores leading and trailing whitespace, this step is not strictly required, but stripping makes it obvious that only the number is being converted.
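A minimal sketch of the steps described above. The string is stored in a variable named text (an assumption; the exercise calls it str) so that Python's built-in str type is not shadowed, and strNum follows the name used in the explanation.

```python
# Named "text" rather than "str" to avoid shadowing the built-in type.
text = 'X-DSPAM-Confidence: 0.8475'

colon_pos = text.find(':')                # index of the colon (counting from 0)
strNum = text[colon_pos + 1:len(text)]    # everything after the colon
value = float(strNum.strip())             # drop the leading space, then convert

print(value)
```

The slice text[colon_pos + 1:] would give the same result, since omitting the end index means "to the end of the string".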

 TURN IN #1:  Severance Chapter 7 - Exercise 2 - ASSIGNMENT

Write a program to prompt for a file name, and then read through the file and look for lines of the form:

X-DSPAM-Confidence: 0.8475

When you encounter a line that starts with "X-DSPAM-Confidence:" pull apart the line to extract the floating-point number on the line. Count these lines and then compute the total of the spam confidence values from these lines. When you reach the end of the file, print out the average spam confidence.

Enter the file name: mbox.txt

Average spam confidence: 0.894128046745

Enter the file name: mbox-short.txt

Average spam confidence: 0.750718518519

Test your file on the mbox.txt and mbox-short.txt files.

HINT:

1. Download the two text files, mbox.txt and mbox-short.txt, from https://www.pythonlearn.com/code3/ to your local machine. For ease, ensure these files reside in the same folder as the .py file for this assignment.

2. Begin writing your code by prompting the user for the file name. Use a try-except block to exit with a user-friendly error message if there is an error opening the file name specified.

3. Once the file is opened, use an iterative loop (e.g. "for" or "while") to traverse each line of the file.

4. (A quick manual exploration of mbox text files reveals that the number representing spam confidence is found at the end of the line.) In each line, find the pattern "X-DSPAM-Confidence:". If it is found, extract the portion of the line after this pattern until the end of the line. The "find", string extraction, and "strip" functions are useful here.

5. Convert the numeric part extracted from the line into a float.

6. When the program has finished traversing each line in the specified file, total the number of lines that had the pattern and compute average spam confidence.

7. Note: In your calculation for average spam confidence, do NOT count the lines that did not contain the pattern. Be sure to comment your program adequately!

8. Good programmers test all cases, so you should make sure you test for erroneous inputs and also test on the mbox-short.txt and the mbox.txt files.
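The hints above can be sketched roughly as follows. This is one possible shape, not the graded solution: the function names, the error messages, and the in-memory demo lines are all hypothetical, and the sketch uses startswith to match lines that begin with the pattern.

```python
PREFIX = 'X-DSPAM-Confidence:'


def average_spam_confidence(lines):
    """Average the numbers on lines starting with PREFIX; None if none match."""
    count = 0
    total = 0.0
    for line in lines:
        if not line.startswith(PREFIX):
            continue                      # lines without the pattern are not counted
        colon_pos = line.find(':')
        total += float(line[colon_pos + 1:].strip())
        count += 1
    if count == 0:
        return None                       # avoid dividing by zero
    return total / count


def run_from_prompt():
    """Prompt for a file name and print the average, or a friendly error."""
    file_name = input('Enter the file name: ')
    try:
        with open(file_name) as handle:
            average = average_spam_confidence(handle)
    except OSError:
        print('File cannot be opened:', file_name)
        return
    if average is None:
        print('No X-DSPAM-Confidence lines found in', file_name)
    else:
        print('Average spam confidence:', average)


# Hypothetical in-memory lines standing in for a small mbox file.
demo_lines = [
    'From: someone@example.com\n',
    'X-DSPAM-Confidence: 0.75\n',
    'Subject: hello\n',
    'X-DSPAM-Confidence: 0.5\n',
]
demo_average = average_spam_confidence(demo_lines)
print(demo_average)
```

Calling run_from_prompt() runs the interactive version against a real file. Keeping the averaging logic in its own function makes it easy to test on a few hand-written lines before trying mbox-short.txt and mbox.txt.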

When you are ready to check your DSPAM code, open the Assignment 10.2 DSPAM Grading. It will give you a new mbox file to download. Run your DSPAM code on the new file, and enter the average spam confidence.

What is the difference between the two histograms : UMUC Data 620 Assignment 1. Write a few sentences - what is the difference between the two histograms? Which one would you use under which circumstances

Reviews

len1451449

4/5/2017 5:13:42 AM

You will not turn in any code or screenshots with this assignment. Instead, you will submit the results of your code to the LEO system, which will immediately let you know if you’ve earned your 10 points. You do not need to turn in anything from the walkthroughs. They are just so you can see how to do a few demonstration bits. Each student creates and submits this week’s graphs using Tableau software. Question: what is the difference between the two histograms? Which one would you use under which circumstances?


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd