Reference no: EM133159248, Length: 2000 words
ICT110 Introduction to Data Science - University of the Sunshine Coast
Assignment Task
You work at Real Beer as a data scientist. The product development team have approached you because they want to develop a new line of beer. Real Beer has a long history in the brewery market, but their target market has typically been pitched at the lower end. They are now looking to develop a range of beer for very discerning beer connoisseurs. This beer will be more expensive and will be sold through specialty stores or direct sales on a new website.
The product development team aren't sure what characteristics this new beer should have taste-wise, but they know that they want it to be distinctive. An executive in the product development team at Real Beer head office has provided you with a dataset covering most current producers and has asked you to provide a report with recommendations about what attributes this new beer could have. Note: not all columns are relevant to this purpose.
You need to use the data to develop a cohesive and convincing story that describes the process of finding the key features of a top beer.
First, the product development team would like to get a better understanding of what sorts of attributes top beers have. They have asked you to describe the data and find interesting phenomena.
Second, the product development team have asked you to explore the data in more detail. They would like you to use your expertise in data science to dig out anything you feel is interesting or significant. They are looking for attributes of top beers that could be put together to create a distinctive yet tasty beer.
You are required to prepare a report about your findings and to make suggestions about which attributes you would recommend be considered in the new product - whether it be based on some values, such as style, IBU, sweet, sour, etc.
The potential audience for this report includes other staff within Real Beer, such as executives or sales staff. This means that each graph will need a detailed explanation and some narrative around why or how the image adds to the story. Staff may have limited ICT or mathematical knowledge, so the report should be technical but give clear explanations of the findings.
To prepare the report, please include the following sections:
1. Introduction
Provide an introduction to the problem. Include background material as appropriate: who cares about this problem, what impact it has, where does the data come from, what are the dimensions and structure of the data.
2. Data Setup
Describe how to load the data, and how the pre-processing is performed.
The original dataset is not ready for analysis and it is different from the data forms that we are familiar with in previous practices. This means we need to do some pre-processing, either for the whole dataset, or for a subset of the dataset required for each sub task described later.
Once you have some ideas for exploratory or advanced analysis, you will need to adjust the form of the dataset. This can be done either by manipulating records in R (by transposition or subsetting) or with other tools (e.g. Notepad or Excel) before reading them into R. Please clearly explain how you cleaned the data in this section. If you use Excel, you must still explain the steps you used for cleaning.
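As an illustration of the kind of loading and cleaning steps this section asks you to describe, here is a minimal sketch in base R. The supplied dataset is not described in this brief, so the file name `beer_data.csv` and the columns `Name`, `Style`, `IBU`, `Sweet` and `Sour` are assumptions, and the small data frame is made-up toy data standing in for the real file.

```r
# Toy data standing in for the supplied dataset.
# All column names and values here are hypothetical.
raw <- data.frame(
  Name  = c("Beer A", "Beer B", "Beer C", "Beer D", "Beer B"),
  Style = c("IPA", "Stout", "ipa ", "Lager", "Stout"),
  IBU   = c("65", "40", "70", "N/A", "40"),
  Sweet = c(30, 80, 25, 40, 80),
  Sour  = c(20, 10, NA, 15, 10),
  stringsAsFactors = FALSE
)
# With the real file you would instead read it in, for example:
# raw <- read.csv("beer_data.csv", stringsAsFactors = FALSE)  # hypothetical file name

clean <- raw

# Normalise style labels: trim stray spaces and use one letter case.
clean$Style <- factor(toupper(trimws(clean$Style)))

# Convert IBU from text to numbers; entries such as "N/A" become NA.
clean$IBU <- suppressWarnings(as.numeric(clean$IBU))

# Drop exact duplicate rows.
clean <- clean[!duplicated(clean), ]

# Keep only rows with complete values in the columns used for analysis.
clean <- clean[complete.cases(clean[, c("IBU", "Sweet", "Sour")]), ]

str(clean)
```

Whichever steps you actually take, the report should state each one and the reason for it, for example why rows with missing values were removed rather than filled in.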
3. Exploratory Data Analysis
Two one-variable analyses with graphs
A one-variable analysis studies one variable (one column/attribute) at a time. You can choose whichever attributes you want for this, but the attributes you select need to add to the story you are telling about which features are keys to a top beer.
• Perform 2 one-variable analyses and graph them
• Explain the findings for each graph
• Provide the code for each graph
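A one-variable analysis of the kind described above might look like the following sketch: a histogram for a numeric attribute and a bar chart for a categorical one. The columns `IBU` and `Style` and the randomly generated values are assumptions used only for illustration, not the real dataset.

```r
# Randomly generated stand-in data; column names are hypothetical.
set.seed(1)
beers <- data.frame(
  IBU   = round(runif(50, min = 10, max = 90)),
  Style = sample(c("IPA", "Stout", "Lager"), 50, replace = TRUE)
)

# Numeric attribute: summary statistics and a histogram of its distribution.
print(summary(beers$IBU))
hist(beers$IBU,
     main = "Distribution of bitterness (IBU)",
     xlab = "IBU",
     col  = "grey80")

# Categorical attribute: count of beers per style, shown as a bar chart.
style_counts <- table(beers$Style)
barplot(style_counts,
        main = "Number of beers per style",
        ylab = "Count")
```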
Two two-variable analyses with graphs
A two-variable analysis studies the relation between two variables. It is up to you to decide which attributes/variables you use for this analysis but the attributes you select need to add to the story you are telling about which features are keys to a top beer.
• Perform 2 two-variable analyses and graph them
• Explain the findings for each graph
• Provide the code for each graph
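For the two-variable analyses, one possible approach is a scatter plot for two numeric attributes and a box plot for a numeric attribute split by a categorical one. The sketch below assumes hypothetical columns `IBU`, `Style` and `Rating` (a review score, which the brief does not confirm exists) and uses generated data.

```r
# Randomly generated stand-in data; column names are hypothetical.
set.seed(2)
n <- 60
beers <- data.frame(
  Style = factor(sample(c("IPA", "Stout", "Lager"), n, replace = TRUE)),
  IBU   = round(runif(n, min = 10, max = 90))
)
# Simulated rating, clamped to a 1-5 scale.
beers$Rating <- pmin(5, pmax(1, 3 + 0.01 * beers$IBU + rnorm(n, sd = 0.4)))

# Numeric vs numeric: scatter plot plus the correlation coefficient.
plot(beers$IBU, beers$Rating,
     main = "Rating against bitterness",
     xlab = "IBU", ylab = "Rating")
r <- cor(beers$IBU, beers$Rating)
print(r)

# Numeric vs categorical: one box per style.
boxplot(Rating ~ Style, data = beers,
        main = "Rating by style",
        xlab = "Style", ylab = "Rating")
```

A correlation close to 0 suggests little linear relationship between the two attributes, while values near 1 or -1 suggest a strong one. That is the sort of plain-language explanation the audience will need alongside each graph.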
4. Advanced Analysis
Two linear regression analyses with graphs
Briefly explain the concept of linear regression (with references). It is up to you to decide which attribute/s you use for this analysis. You may choose to use any two attributes for this but the values you select need to add to the story you are telling about which features are keys to a top beer.
• Perform 2 linear regression analyses and graph them
• Explain the findings for each graph
• Provide the code for each graph
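A simple linear regression fits a straight line, Rating = b0 + b1 * IBU, through the data and estimates the intercept b0 and slope b1. The sketch below shows one way to fit and graph such a model with base R's `lm()`. The `IBU` and `Rating` columns and the simulated values are assumptions for illustration only.

```r
# Randomly generated stand-in data; column names are hypothetical.
set.seed(3)
n <- 80
beers <- data.frame(IBU = runif(n, min = 10, max = 90))
beers$Rating <- 3 + 0.015 * beers$IBU + rnorm(n, sd = 0.3)

# Fit the model Rating = b0 + b1 * IBU.
fit <- lm(Rating ~ IBU, data = beers)

# Coefficient table: estimates, standard errors, t values and p-values.
print(summary(fit)$coefficients)

# R-squared: the share of variation in Rating explained by IBU.
r_squared <- summary(fit)$r.squared
print(r_squared)

# Scatter plot with the fitted regression line drawn on top.
plot(Rating ~ IBU, data = beers,
     main = "Linear regression of rating on IBU",
     xlab = "IBU", ylab = "Rating")
abline(fit, col = "red", lwd = 2)
```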
Decision tree
Briefly explain the concept of decision trees (with references). It is up to you to decide which attribute/s you use for this analysis. You may choose the attributes for this but the values you select need to add to the story you are telling about which features are keys to a top beer.
• Create a decision tree and resulting visualisation
• Explain the findings for the decision tree
• Provide the code for the decision tree
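A decision tree for this task could be sketched as follows, assuming the `rpart` package is available (it ships with standard R distributions as a recommended package). The attributes `IBU`, `Sweet` and `Sour`, the `Top` label and the rule used to generate it are all hypothetical. With the real data you would need to define what counts as a "top" beer yourself.

```r
library(rpart)  # assumed to be installed

# Randomly generated stand-in data; column names are hypothetical.
set.seed(4)
n <- 200
beers <- data.frame(
  IBU   = runif(n, min = 10, max = 90),
  Sweet = runif(n, min = 0, max = 100),
  Sour  = runif(n, min = 0, max = 100)
)
# Made-up rule that labels a beer as "top"; for illustration only.
beers$Top <- factor(ifelse(beers$IBU > 50 & beers$Sweet < 60, "top", "other"))

# Fit a classification tree predicting Top from the taste attributes.
tree <- rpart(Top ~ IBU + Sweet + Sour, data = beers, method = "class")

# Draw the tree; each split reads as a yes/no question on one attribute.
plot(tree, margin = 0.1)
text(tree, use.n = TRUE, cex = 0.8)

# Accuracy on the training data (optimistic; a held-out set is better).
pred <- predict(tree, beers, type = "class")
acc  <- mean(pred == beers$Top)
print(acc)
```

Reading the splits from the root downwards gives the attribute thresholds that separate "top" from "other" beers, which can feed directly into the recommendations.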
5. Conclusion
Sum up your findings and provide some insight into them. Provide your overall recommendation/s in this section, e.g. which features you have selected and why.
6. Reflections
In this part, discuss any difficulties you had performing the analysis and how you solved those difficulties. Reflect on how the analysis process went for you, what you learnt, and what you might do differently next time. Aim to write 2-4 paragraphs.
Report Format
Your report should be no fewer than 1,200 words and ideally no longer than ~2,000 words. Text in R code snippets is not counted.
The report MUST be formatted using the following guidelines:
1. Title Page - Include your name as the report's author.
2. Header - Report title
3. Footer - your name and the page number
4. Paragraph text - 12-point Calibri or Times New Roman, single line spacing
5. Headings - In an appropriate type and size
6. Margins - 2.5 cm on all sides
7. Page numbering - Introduction and onwards to use conventional numerals (1, 2, 3, 4) starting on page 1 from the introduction.
Attachment:- Introduction to Data Science.rar