Plot the distribution of the rate using histograms

Reference no: EM131549098

Exercise detail

For our first project, we're going to take a look at SAT scores around the United States. We'll be exploring this data to see what we can learn using the descriptive statistics skills covered this week. Your client, the College Board, is expecting some pretty graphs to add to their presentations this year, so don't let them down!

Goal: A Jupyter notebook that describes your data with visualizations & statistical analysis.

Goal: A five to seven minute presentation targeted to your hypothetical client that highlights your findings.

Requirements

Your work must:

Describe your data
Perform methods of exploratory data analysis, including:
Use Matplotlib to create visualizations
Use NumPy to apply basic summary statistics: mean, median, mode
Determine if the dataset appears to follow a normal distribution
Bonus:

Recreate all of your Matplotlib graphs in Seaborn!
Use Tableau (public) to create visualizations!
Create a blog post of at least 500 words (and 1-2 graphics!) describing your data, analysis, and approach. Link to it in your Jupyter notebook.
Using existing features, engineer a new feature

Necessary Deliverables / Submission

Materials must be submitted in a clearly commented Jupyter notebook.
Notebook must be submitted via a GitHub pull request to the instructor's repo (the same way you submit labs).
Presentation must be submitted via Slack (for a PowerPoint file) or shared via a Google Slides link.
Materials must be submitted by 9:00 AM on Friday, June 30.

Starter code

For this project we will be using a Jupyter notebook. This notebook will use matplotlib for plotting and visualizing our data. This type of visualization is handy for prototyping and quick data analysis. We will discuss more advanced data visualizations for disseminating your work.

Open the starter code instructions in a Jupyter notebook.

Dataset

Dataset: SAT Scores
This data, taken from the College Board, gives the mean SAT math(s) and verbal scores, and the participation rate for each state and the District of Columbia for the year 2001.

Suggested Ways to Get Started

Read in your dataset.
Try out a few NumPy commands to describe your data.
Write pseudocode before you write actual code. Thinking through the logic of something helps.
Read the docs for whatever technologies you use. Most of the time, there is a tutorial that you can follow, but not always, and learning to read documentation is crucial to your success!
Document everything.
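As a starting point for trying out NumPy commands, the sketch below computes the mean, median, and mode of the Rate column from the data table. NumPy has no built-in mode function, so the mode here is derived from np.unique; this is one possible approach and not necessarily the one the starter code expects. The variable names are illustrative.

```python
import numpy as np

# Rate column copied from the data table (51 rows, the "All" summary row excluded).
rates = np.array([
    82, 81, 79, 77, 72, 71, 71, 69, 69, 68, 67, 65, 65, 63, 60, 57, 56,
    55, 54, 53, 53, 52, 51, 51, 34, 33, 31, 26, 23, 18, 17, 13, 13, 12,
    12, 11, 11, 9, 9, 9, 8, 8, 8, 7, 6, 6, 5, 5, 4, 4, 4,
])

mean_rate = np.mean(rates)
median_rate = np.median(rates)

# np.unique returns the sorted distinct values and how often each occurs.
values, counts = np.unique(rates, return_counts=True)
# argmax picks the first maximum, so ties resolve to the smallest value.
mode_rate = values[counts.argmax()]

print("mean:", mean_rate, "median:", median_rate, "mode:", mode_rate)
```

Several rates occur equally often in this column, so the single mode reported depends on the tie-breaking rule noted in the comment.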
Useful Resources

How to find the data you need
How to give a good lightning talk

Presentation Structure

5-7 minutes long.
Use PowerPoint or some other visual aid.
Consider the audience. Assume you are presenting to non-technical executives with the College Board (the organization that administers the SATs).
Start with the guiding question/big idea.
Talk about your procedure/methodology (high level, no need to show code unless you found a useful method to share).
Talk about your findings/answers to prompts (include visuals).
Conclude - highlight any next steps, further questions, what you would do with more time, additional data that would be useful.
Be sure to rehearse and time your presentation before class.

Project Feedback + Evaluation

Your instructors will score you using the scale below:

Score | Expectations
----- | ------------
**0** | _Incomplete._
**1** | _Does not meet expectations._
**2** | _Meets expectations, good job!_
**3** | _Exceeds expectations, you wonderful creature, you!_
This will serve as a helpful overall gauge of whether you met the project goals!

STEP 1 STARTER CODE INSTRUCTIONS

Step 1: Open the sat_scores.csv file. Investigate the data, and answer the questions below.

1. What does the data describe?

In [ ]:
## your answer here

2. Does the data look complete? Are there any obvious issues with the observations?

In [ ]:
## your answer here

3. Describe in words what each variable (column) is.

In [ ]:
## your answer here

Step 2: Load the data.

4. Load the data into a list of lists

In [ ]:
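A minimal sketch of one way to load the file into a list of lists, using the standard csv module. It assumes sat_scores.csv is comma-separated with a header row; an in-memory sample of the first three rows stands in for the file so the cell runs on its own. The helper name load_rows is illustrative.

```python
import csv
import io

# Stand-in for sat_scores.csv: header plus the first three rows of the data.
# The comma-separated layout is an assumption about the real file.
SAMPLE_CSV = """State,Rate,Verbal,Math
CT,82,509,510
NJ,81,499,513
MA,79,511,515
"""

def load_rows(file_obj):
    """Read CSV text into a list of lists, one inner list per line."""
    return [row for row in csv.reader(file_obj)]

# With the real file:
#     with open("sat_scores.csv", newline="") as f:
#         data = load_rows(f)
data = load_rows(io.StringIO(SAMPLE_CSV))
print(data[0])  # header row
print(data[1])  # first observation; every field is still a string
```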

5. Print the data

In [ ]:

6. Extract a list of the labels from the data, and remove them from the data.

In [ ]:
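One possible way to separate the labels from the observations, assuming the data is already a list of lists whose first inner list is the header row. list.pop(0) returns the header and removes it from the list in the same step.

```python
# Sample of the loaded data; the first inner list holds the column labels.
data = [
    ["State", "Rate", "Verbal", "Math"],
    ["CT", "82", "509", "510"],
    ["NJ", "81", "499", "513"],
    ["MA", "79", "511", "515"],
]

# pop(0) returns the first row and removes it from data.
labels = data.pop(0)

print(labels)
print(len(data), "observations remain")
```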

7. Create a list of State names extracted from the data. (Hint: use the list of labels to index on the State column)

In [ ]:
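Following the hint, the position of the State column can be looked up in the list of labels and then used to index every row. A short sketch on a three-row sample:

```python
labels = ["State", "Rate", "Verbal", "Math"]
data = [
    ["CT", "82", "509", "510"],
    ["NJ", "81", "499", "513"],
    ["MA", "79", "511", "515"],
]

# Find where the State column sits, then pull that field from each row.
state_index = labels.index("State")
states = [row[state_index] for row in data]

print(states)
```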

8. Print the types of each column

In [ ]:

9. Do any types need to be reassigned? If so, go ahead and do it.

In [ ]:
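A sketch covering questions 8 and 9 together. It assumes the rows came from a CSV reader, so every field starts as a string, and that Rate, Verbal, and Math should become integers; that choice of target type is an assumption, not something the starter code specifies.

```python
labels = ["State", "Rate", "Verbal", "Math"]
data = [
    ["CT", "82", "509", "510"],
    ["NJ", "81", "499", "513"],
    ["MA", "79", "511", "515"],
]

# Type of each column, judged from the first row.
column_types = {
    label: type(value).__name__ for label, value in zip(labels, data[0])
}
print(column_types)

# Assumption: every column except State holds whole numbers.
numeric_labels = {"Rate", "Verbal", "Math"}
converted = [
    [int(value) if label in numeric_labels else value
     for label, value in zip(labels, row)]
    for row in data
]
print(converted[0])
```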

10. Create a dictionary for each column mapping the State to its respective value for that column.

In [ ]:
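One reading of question 10 is a dictionary per numeric column, keyed by state. The sketch below builds all of them at once inside an outer dictionary keyed by column label; the nesting is a design choice made here for convenience, not a requirement of the exercise.

```python
labels = ["State", "Rate", "Verbal", "Math"]
data = [
    ["CT", 82, 509, 510],
    ["NJ", 81, 499, 513],
    ["MA", 79, 511, 515],
]

state_index = labels.index("State")

# Outer key: column label. Inner dictionary: state -> value in that column.
by_column = {
    label: {row[state_index]: row[i] for row in data}
    for i, label in enumerate(labels)
    if i != state_index
}

print(by_column["Rate"])
```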

11. Create a dictionary with the values for each of the numeric columns

In [ ]:
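A possible shape for the dictionary in question 11: each numeric column label maps to the list of that column's values, in row order. This layout is an interpretation of the prompt and is convenient for the summary statistics and plots that come later.

```python
labels = ["State", "Rate", "Verbal", "Math"]
data = [
    ["CT", 82, 509, 510],
    ["NJ", 81, 499, 513],
    ["MA", 79, 511, 515],
]

state_index = labels.index("State")

# Column label -> list of values for that column, skipping State.
numeric_columns = {
    label: [row[i] for row in data]
    for i, label in enumerate(labels)
    if i != state_index
}

print(numeric_columns)
```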

Step 3: Describe the data

12. Print the min and max of each column

In [ ]:
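With the numeric columns held as lists in a dictionary, the built-in min and max functions are enough. A sketch using the first five rows of the data table:

```python
# First five rows of the data table (CT, NJ, MA, NY, NH).
numeric_columns = {
    "Rate": [82, 81, 79, 77, 72],
    "Verbal": [509, 499, 511, 495, 520],
    "Math": [510, 513, 515, 505, 516],
}

# Column label -> (minimum, maximum)
extremes = {
    label: (min(values), max(values))
    for label, values in numeric_columns.items()
}

print(extremes)
```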

13. Write a function using only list comprehensions, no loops, to compute Standard Deviation. Print the Standard Deviation of each numeric column.

In [ ]:
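A sketch of a standard deviation function that uses a list comprehension in place of an explicit loop. It computes the population standard deviation (dividing by n), which matches the default of numpy.std; whether the exercise wants the population or the sample version is not stated, so that choice is an assumption. The name std_dev is illustrative.

```python
import math

def std_dev(values):
    """Population standard deviation, built from a list comprehension."""
    mean = sum(values) / len(values)
    squared_diffs = [(v - mean) ** 2 for v in values]
    return math.sqrt(sum(squared_diffs) / len(values))

# First five rows of the data table (CT, NJ, MA, NY, NH).
numeric_columns = {
    "Rate": [82, 81, 79, 77, 72],
    "Verbal": [509, 499, 511, 495, 520],
    "Math": [510, 513, 515, 505, 516],
}

spreads = {label: std_dev(values) for label, values in numeric_columns.items()}
print(spreads)
```

For a sample standard deviation, divide by len(values) - 1 instead.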

Step 4: Visualize the data

14. Using Matplotlib and pyplot, plot the distribution of the Rate using histograms.

In [ ]:
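A minimal histogram sketch for the Rate column. The bin count of 10 and the axis labels are arbitrary choices, and the Agg backend line is only there so the code also runs outside a notebook.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt

# Rate column copied from the data table (51 rows, "All" row excluded).
rates = [
    82, 81, 79, 77, 72, 71, 71, 69, 69, 68, 67, 65, 65, 63, 60, 57, 56,
    55, 54, 53, 53, 52, 51, 51, 34, 33, 31, 26, 23, 18, 17, 13, 13, 12,
    12, 11, 11, 9, 9, 9, 8, 8, 8, 7, 6, 6, 5, 5, 4, 4, 4,
]

fig, ax = plt.subplots()
# hist returns the count per bin, the bin edges, and the drawn patches.
counts, bin_edges, _ = ax.hist(rates, bins=10, edgecolor="black")
ax.set_title("Distribution of SAT participation rate")
ax.set_xlabel("Rate (%)")
ax.set_ylabel("Number of states")
```

The same pattern applies to the Math and Verbal columns by swapping the list passed to hist.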

15. Plot the Math(s) distribution

In [ ]:

16. Plot the Verbal distribution

In [ ]:

17. What is the typical assumption for data distribution?

In [ ]:

18. Does that distribution hold true for our data?

In [ ]:
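Two rough numeric checks can complement the histograms when judging normality: the skewness of the column (close to 0 for symmetric data) and the share of values within one standard deviation of the mean (roughly 0.68 for normal data). The sketch below applies both to the Rate column. These are informal heuristics chosen for illustration, not a formal test.

```python
import numpy as np

# Rate column copied from the data table (51 rows, "All" row excluded).
rates = np.array([
    82, 81, 79, 77, 72, 71, 71, 69, 69, 68, 67, 65, 65, 63, 60, 57, 56,
    55, 54, 53, 53, 52, 51, 51, 34, 33, 31, 26, 23, 18, 17, 13, 13, 12,
    12, 11, 11, 9, 9, 9, 8, 8, 8, 7, 6, 6, 5, 5, 4, 4, 4,
])

mean = rates.mean()
median = np.median(rates)
std = rates.std()  # population standard deviation (ddof=0)

# Skewness as the mean cubed z-score.
skewness = np.mean(((rates - mean) / std) ** 3)

# Fraction of observations within one standard deviation of the mean.
within_one_std = np.mean(np.abs(rates - mean) <= std)

print("mean:", mean, "median:", median)
print("skewness:", skewness)
print("share within one std:", within_one_std)
```

A noticeable gap between mean and median, or a share far from 0.68, suggests the column departs from a normal shape.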

19. Plot some scatterplots. BONUS: Use a pyplot figure to present multiple plots at once.

In [ ]:
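A sketch of the bonus approach: one figure with three scatterplots side by side, created with plt.subplots. It uses eight rows from the data table (CT, NY, OR, CA, CO, TN, MN, IA) to stay short; the choice of column pairs and figure size is arbitrary.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt

# Eight rows from the data table: CT, NY, OR, CA, CO, TN, MN, IA.
columns = {
    "Rate": [82, 77, 55, 51, 31, 13, 9, 5],
    "Verbal": [509, 495, 526, 498, 539, 562, 580, 593],
    "Math": [510, 505, 526, 517, 542, 553, 589, 603],
}

pairs = [("Rate", "Verbal"), ("Rate", "Math"), ("Verbal", "Math")]

# One row of three axes, one scatterplot per pair of columns.
fig, axes = plt.subplots(1, 3, figsize=(12, 4))
for ax, (x_name, y_name) in zip(axes, pairs):
    ax.scatter(columns[x_name], columns[y_name])
    ax.set_xlabel(x_name)
    ax.set_ylabel(y_name)
    ax.set_title(f"{y_name} vs. {x_name}")
fig.tight_layout()
```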

20. Are there any interesting relationships to note?

In [ ]:

21. Create box plots for each variable.

In [ ]:
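Because Rate is on a different scale from the two score columns, one option is to give each variable its own axes. The sketch below does that with eight rows from the data table; drawing all three boxes on a single axes is equally valid if the scale difference is acceptable.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; remove this line in a notebook
import matplotlib.pyplot as plt

# Eight rows from the data table: CT, NY, OR, CA, CO, TN, MN, IA.
columns = {
    "Rate": [82, 77, 55, 51, 31, 13, 9, 5],
    "Verbal": [509, 495, 526, 498, 539, 562, 580, 593],
    "Math": [510, 505, 526, 517, 542, 553, 589, 603],
}

# One box plot per variable, each on its own y-axis scale.
fig, axes = plt.subplots(1, 3, figsize=(9, 4))
for ax, (name, values) in zip(axes, columns.items()):
    ax.boxplot(values)
    ax.set_title(name)
    ax.set_xticks([])  # a single box needs no x tick
fig.tight_layout()
```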

BONUS: Using Tableau, create a heat map for each variable using a map of the US.

In [ ]:

DATA

State Rate Verbal Math
CT 82 509 510
NJ 81 499 513
MA 79 511 515
NY 77 495 505
NH 72 520 516
RI 71 501 499
PA 71 500 499
VT 69 511 506
ME 69 506 500
VA 68 510 501
DE 67 501 499
MD 65 508 510
NC 65 493 499
GA 63 491 489
IN 60 499 501
SC 57 486 488
DC 56 482 474
OR 55 526 526
FL 54 498 499
WA 53 527 527
TX 53 493 499
HI 52 485 515
AK 51 514 510
CA 51 498 517
AZ 34 523 525
NV 33 509 515
CO 31 539 542
OH 26 534 439
MT 23 539 539
WV 18 527 512
ID 17 543 542
TN 13 562 553
NM 13 551 542
IL 12 576 589
KY 12 550 550
WY 11 547 545
MI 11 561 572
MN 9 580 589
KS 9 577 580
AL 9 559 554
NE 8 562 568
OK 8 567 561
MO 8 577 577
LA 7 564 562
WI 6 584 596
AR 6 562 550
UT 5 575 570
IA 5 593 603
SD 4 577 582
ND 4 592 599
MS 4 566 551
All 45 506 514

Verified Expert

This work uses Python and its libraries to analyze and visualize a dataset of SAT scores from the year 2001. The visualizations are built with Matplotlib and Seaborn, along with Tableau Public. The program uses Python features such as list comprehensions. All the work was done in a Jupyter notebook. The Python libraries that must be installed are NumPy, Seaborn, and Matplotlib.

Reviews

inf1549098

7/11/2017 4:26:16 AM

With regard to NumPy, all you have to do in your sheet/tool is import NumPy before proceeding with the code. I am not exactly sure how to import the NumPy library, but I believe it is as simple as the statement import numpy, after which you can proceed with the rest of the code per the question. Please search through other lectures or README files in the repo that may reference importing NumPy. Sometimes the Jupyter notebook used in the class may already have NumPy imported. Regarding the Tableau heat map, as per the instructions it has to be done on Tableau Public. We need to create an account on https://public.tableau.com/ to create the visualization. The code has to be executed in a Jupyter notebook. Regarding the slides, the images of the graphs can be extracted from the compiled .html file and put in the slides, with some explanation added to them. Also, please share instructions on how I can open the attachment you sent in Jupyter Notebook.

inf1549098

7/11/2017 4:26:05 AM

Requirements for the task: 1. Classroom materials. 2. The data source link. 3. Authentication may be required in the system. 4. The sat_scores.csv file. Instructions: Go to this link and, under repositories, go to Project 1 to view all details and source files needed. Jupyter Notebook is what we use to open the file from the terminal. https://git.generalassemb.ly/DSI-DC-5 While working on the task, the expert does not need to run and compile this in Jupyter Notebook or any other software. He just needs to take the code, run it in his own preferred tool, and return it to me in text format; I can access the system myself and push the projects. All the code will be in Python. Regarding objective #13, please clarify whether NumPy library functions can be used, as the use of NumPy is not specified in the Starter Code. As I understand it, the wording of the Starter Code requires all the data cleaning, transformation, and analysis work to be done in plain Python only, except for the data visualization, but the assignment document requires the use of NumPy.
