Reference no: EM132238820
Project: Jupyter Notebook (Python)
Topic: "Is online education growing in demand versus traditional classroom education?"
1. Compare online versus classroom education.
2. Compare the age, sex, and state of students.
3. Analyze income in relation to online versus classroom education: are people choosing cheaper online classes over traditional accredited universities?
4. What is the academic completion rate?
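For question 4, the sketch below computes a completion rate for each mode of education, with a normal-approximation (Wald) 95% confidence interval. The `(mode, completed)` records and the `completion_rate` helper are hypothetical, and the Wald interval is an illustrative choice rather than one the assignment requires. It is unreliable for samples as small as this toy one.

```python
import math

# Hypothetical enrollment records: (mode, completed).
# These values are made up for illustration; they are not real data.
records = [
    ("online", True), ("online", False), ("online", True), ("online", False),
    ("classroom", True), ("classroom", True), ("classroom", False), ("classroom", True),
]


def completion_rate(records, mode, z=1.96):
    """Return (rate, low, high) for one mode using a Wald interval.

    The Wald interval p +/- z * sqrt(p * (1 - p) / n) is a rough
    approximation that behaves poorly for small n or rates near 0 or 1.
    """
    outcomes = [done for m, done in records if m == mode]
    n = len(outcomes)
    p = sum(outcomes) / n  # True counts as 1, False as 0
    half_width = z * math.sqrt(p * (1 - p) / n)
    # Clip the interval to the valid range [0, 1].
    return p, max(0.0, p - half_width), min(1.0, p + half_width)


for mode in ("online", "classroom"):
    rate, low, high = completion_rate(records, mode)
    print(f"{mode}: {rate:.0%} completed (95% CI {low:.0%} to {high:.0%})")
```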
You will also be required to select a dataset, form a question or hypothesis about that dataset, and then use the various techniques to prove or disprove your hypothesis. You will need to summarize your results along the way and present your findings in both written and visual form.
The first step is selecting a dataset and forming your question/hypothesis. There are no restrictions on which dataset you use, except that you cannot use the datasets used primarily in the previous exercises: the National Survey of Family Growth and the Behavioral Risk Factor Surveillance System.
Project Overview: Here are some milestones to help you gauge where you should be during the project.
1) Evaluate datasets and start thinking of statistical questions.
2) Select a dataset, solidify your statistical question, and begin describing the single variables in your dataset to determine which are relevant to your question (distributions, PMFs, CDFs). You should know the statistical question you are trying to answer no later than Week 3.
3) Start identifying relationships between the variables you have identified, rather than looking at one variable at a time.
4) Start evaluating whether the results you see in your sample would hold in the larger population, and test the results and hypotheses you have formed up to this point.
5) Wrap up the summary of your analysis.
The following is required in the project:
- Your dataset
- A minimum of 5 variables from your dataset used in your analysis (for help with selecting, see the author's selection on page 6 of your book). Consider what could have an impact on your question. Remember that this is never perfect, so don't worry if you miss one (Chapter 1).
- Describe what each of the 5 variables means in the dataset.
- Include a histogram of each of the 5 variables. In your summary and analysis, identify any outliers, explain why they are outliers, and describe how you believe they should be handled.
- Include the other descriptive characteristics of each variable: mean, mode, spread, and tails.
- Compare two scenarios in your data using a probability mass function (PMF). Note that this is not a comparison of two different variables: it is the same variable under different scenarios, almost like a filter. The example in the Think Stats book compares first babies with all other babies; it is still the same variable, but the data are broken out according to the criteria being explored.
- Create one cumulative distribution function (CDF) for one of your variables. What does it tell you about the variable, and how does it address the question you are trying to answer?
- Plot one analytical distribution and explain how it applies to the dataset you have chosen.
- Create two scatter plots comparing two variables and analyze correlation and causation. Covariance, Pearson's correlation, and nonlinear relationships should also be considered in your analysis.
- Conduct a test of your hypothesis using one of the hypothesis-testing methods.
- Conduct a regression analysis with one dependent variable and either one explanatory variable or multiple explanatory variables.
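For the histogram and descriptive-statistics requirements, a minimal sketch of the numbers behind each summary (mean, mode, spread, tails) and one common outlier rule might look like this. It uses only Python's `statistics` module so it runs without extra installs. In a notebook you would more likely draw the histogram itself with matplotlib's `plt.hist` and compute the summaries with pandas. The `ages` list is hypothetical, and Tukey's 1.5 * IQR fences are one conventional outlier rule among several, not one the assignment prescribes.

```python
import statistics

# Hypothetical sample of student ages; made-up values for illustration only.
ages = [18, 19, 19, 20, 21, 21, 21, 22, 23, 24, 25, 58]


def describe(values):
    """Return central tendency, spread, and a tail (skewness) measure."""
    mean = statistics.mean(values)
    mode = statistics.mode(values)
    sd = statistics.stdev(values)  # sample standard deviation (spread)
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartiles
    # Moment-based skewness: positive means a longer right tail.
    n = len(values)
    m2 = sum((x - mean) ** 2 for x in values) / n
    m3 = sum((x - mean) ** 3 for x in values) / n
    return {"mean": mean, "mode": mode, "sd": sd,
            "q1": q1, "q3": q3, "iqr": q3 - q1, "skew": m3 / m2 ** 1.5}


def iqr_outliers(values):
    """Flag values more than 1.5 * IQR outside the quartiles (Tukey's fences)."""
    s = describe(values)
    low = s["q1"] - 1.5 * s["iqr"]
    high = s["q3"] + 1.5 * s["iqr"]
    return [x for x in values if x < low or x > high]


summary = describe(ages)
print(summary)
print("outliers:", iqr_outliers(ages))
```

A flagged value is only a candidate: whether to keep, correct, or drop it depends on whether it is a data-entry error or a genuine but unusual student.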
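The PMF comparison of one variable under two scenarios can be sketched as follows: the same variable (age) is split by a filter on mode of education, each subset is normalized into a PMF, and the PMFs are differenced at each value. The records and the `pmf` helper are hypothetical, and the percentage-point difference is similar in spirit to the first-babies comparison in Think Stats rather than a copy of the book's code.

```python
from collections import Counter

# Hypothetical (mode, age) records; made-up values for illustration only.
records = [
    ("online", 25), ("online", 31), ("online", 25), ("online", 40), ("online", 31),
    ("classroom", 19), ("classroom", 20), ("classroom", 19),
    ("classroom", 25), ("classroom", 20), ("classroom", 19),
]


def pmf(values):
    """Normalize value counts so the probabilities sum to 1."""
    counts = Counter(values)
    total = sum(counts.values())
    return {value: count / total for value, count in sorted(counts.items())}


# Same variable (age) under two scenarios, selected by filtering on mode.
online_pmf = pmf(age for mode, age in records if mode == "online")
classroom_pmf = pmf(age for mode, age in records if mode == "classroom")

# Difference in probability at each age, in percentage points
# (positive means the age is more common among online students).
all_ages = sorted(set(online_pmf) | set(classroom_pmf))
diffs = {age: 100 * (online_pmf.get(age, 0) - classroom_pmf.get(age, 0))
         for age in all_ages}

for age in all_ages:
    print(age, round(diffs[age], 1))
```

Comparing PMFs rather than raw counts matters when the two groups have different sizes, as they do here.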
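An empirical CDF maps a value to the fraction of observations at or below it, and its inverse maps a probability back to a value (for example, the median at 0.5). The sketch below implements both with the standard-library `bisect` module. The `tuition` figures and the helper names `make_cdf` and `value_at` are hypothetical.

```python
import bisect

# Hypothetical annual tuition amounts in dollars; made up for illustration.
tuition = [3000, 4500, 5000, 5000, 9000, 12000, 15000, 22000, 30000, 38000]


def make_cdf(values):
    """Return the empirical CDF: a function giving the fraction of values <= x."""
    data = sorted(values)
    n = len(data)

    def cdf(x):
        # bisect_right returns how many sorted values are <= x.
        return bisect.bisect_right(data, x) / n

    return cdf


def value_at(values, p):
    """Inverse CDF: the smallest value whose CDF is >= p, for 0 < p <= 1."""
    data = sorted(values)
    n = len(data)
    # CDF level reached at each sorted position: 1/n, 2/n, ..., 1.
    levels = [(i + 1) / n for i in range(n)]
    return data[bisect.bisect_left(levels, p)]


cdf = make_cdf(tuition)
print(cdf(5000))               # share of students paying at most $5,000
print(value_at(tuition, 0.5))  # median under this definition
```

Multiplying `cdf(x)` by 100 gives the percentile rank of `x`, which is often the easiest way to explain the CDF in the written summary.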
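One way to judge how well an analytical distribution fits is to estimate its parameters from the sample and measure the largest gap between the model CDF and the empirical CDF. The sketch below fits normal and lognormal models with `statistics.NormalDist.from_samples` and computes that gap (the Kolmogorov-Smirnov distance). The `ages` data are hypothetical. Because the parameters are estimated from the same data, the distance is used here only to compare the two models, not to compute a formal p-value.

```python
import math
import statistics

# Hypothetical student ages; made-up values for illustration only.
ages = [18, 19, 20, 20, 21, 22, 22, 23, 24, 26, 29, 33]


def ks_distance(data, model_cdf):
    """Largest vertical gap between the empirical CDF of data and model_cdf.

    At each sorted point the empirical CDF jumps from i/n to (i+1)/n,
    so both sides of the step are compared with the model.
    """
    xs = sorted(data)
    n = len(xs)
    return max(
        max((i + 1) / n - model_cdf(x), model_cdf(x) - i / n)
        for i, x in enumerate(xs)
    )


# Fit each model by estimating its parameters from the sample.
normal = statistics.NormalDist.from_samples(ages)
# Lognormal model: assumes log(age) is normally distributed.
log_normal = statistics.NormalDist.from_samples([math.log(a) for a in ages])

d_normal = ks_distance(ages, normal.cdf)
d_lognormal = ks_distance(ages, lambda x: log_normal.cdf(math.log(x)))

print(f"normal: mu={normal.mean:.1f}, sigma={normal.stdev:.1f}, distance={d_normal:.3f}")
print(f"lognormal: distance={d_lognormal:.3f}")
```

A smaller distance means a closer fit. Plotting both model CDFs over the empirical CDF usually shows where a model fails, for example in the right tail.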
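For the scatter-plot analysis, covariance and Pearson's correlation measure linear association. Spearman's rank correlation, which is Pearson's correlation applied to ranks, also captures monotonic nonlinear relationships. The sketch below implements all three from their definitions. The data (y equals x cubed) are a hypothetical example chosen because the relationship is perfectly monotonic but not linear.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def cov(xs, ys):
    """Sample covariance (divides by n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / (len(xs) - 1)


def pearson(xs, ys):
    """Pearson's r = cov(x, y) / (sd_x * sd_y)."""
    return cov(xs, ys) / (math.sqrt(cov(xs, xs)) * math.sqrt(cov(ys, ys)))


def ranks(xs):
    """Rank values 1..n, giving tied values the average of their ranks."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    result = [0.0] * len(xs)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and xs[order[j + 1]] == xs[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(xs, ys):
    """Spearman's rank correlation: Pearson's r computed on the ranks."""
    return pearson(ranks(xs), ranks(ys))


# Hypothetical data with a monotonic but nonlinear relationship.
x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]
print(f"cov={cov(x, y):.1f}, pearson={pearson(x, y):.3f}, spearman={spearman(x, y):.3f}")
```

A Spearman value noticeably higher than Pearson's hints at a nonlinear relationship. Neither value establishes causation.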
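One hypothesis-testing method that fits this project is a permutation test for a difference in means, in the spirit of the permutation approach in Think Stats. If mode of education had no effect, shuffling the group labels should often produce differences as large as the observed one. The group data, the `permutation_test` helper, and its parameters (`iters`, `seed`) are hypothetical.

```python
import random

# Hypothetical ages of online and classroom students; made up for illustration.
online_ages = [25, 31, 25, 40, 31, 28, 35]
classroom_ages = [19, 20, 19, 25, 20, 19, 22]


def abs_diff_in_means(a, b):
    """Test statistic: absolute difference between the two group means."""
    return abs(sum(a) / len(a) - sum(b) / len(b))


def permutation_test(a, b, iters=10_000, seed=0):
    """Estimate a two-sided p-value by repeatedly shuffling the group labels.

    Under the null hypothesis the labels do not matter, so the p-value is the
    fraction of shuffles whose statistic is at least as large as the observed one.
    """
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs_diff_in_means(a, b)
    pooled = list(a) + list(b)
    n = len(a)
    hits = 0
    for _ in range(iters):
        rng.shuffle(pooled)
        if abs_diff_in_means(pooled[:n], pooled[n:]) >= observed:
            hits += 1
    return observed, hits / iters


observed, p_value = permutation_test(online_ages, classroom_ages)
print(f"observed difference = {observed:.2f} years, p = {p_value:.4f}")
```

A small p-value suggests that the difference is unlikely to arise by chance alone. It says nothing about why the difference exists.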
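For the single-explanatory-variable case, ordinary least squares has a closed form: slope = S_xy / S_xx and intercept = mean(y) - slope * mean(x), with R^2 reporting the share of variance explained. The sketch below implements this directly. The income and tuition figures are hypothetical and chosen only to show the mechanics.

```python
# Hypothetical data: household income and annual tuition paid, both in $1,000s.
# Made-up values for illustration only.
income = [20, 35, 50, 65, 80, 95]
tuition = [4, 6, 5, 9, 12, 11]


def least_squares(xs, ys):
    """Fit y = intercept + slope * x by ordinary least squares."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Coefficient of determination: share of the variance in y explained by the fit."""
    my = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - my) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


intercept, slope = least_squares(income, tuition)
r2 = r_squared(income, tuition, intercept, slope)
print(f"tuition = {intercept:.2f} + {slope:.3f} * income, R^2 = {r2:.2f}")
```

For multiple explanatory variables, the statsmodels formula interface used in Think Stats is the more practical route in a notebook. For example, after `import statsmodels.formula.api as smf`, you could call `smf.ols("tuition ~ income + age", data=df).fit()`, where `df` and its column names are hypothetical.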
Using Python, submit your results via your Jupyter Notebook.
A 250-500-word paper summarizing the following:
1) Statistical question or hypothesis
2) Outcome of your EDA
3) What do you feel was missed during the analysis?
4) Were there any variables you felt could have helped in the analysis?
5) Were there any assumptions made you felt were incorrect?
6) What challenges did you face, and what did you not fully understand?