TEB2164 Introduction to Data Science Assignment

Reference no: EM132847516

Question 1.

a. Consider the structure of training data with 32 rows as shown in TABLE Q1 for a classification problem with four possible classes.

TABLE Q1: Structure of Training Data

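ID | <attr1> | <attr2> | <attr3> | <attr4> | Class
1  |         |         |         |         |
2  |         |         |         |         |
3  |         |         |         |         |
4  |         |         |         |         |
5  |         |         |         |         |
6  |         |         |         |         |
7  |         |         |         |         |
8  |         |         |         |         |
9  |         |         |         |         |
10 |         |         |         |         |
11 |         |         |         |         |
12 |         |         |         |         |
13 |         |         |         |         |
14 |         |         |         |         |
15 |         |         |         |         |
16 |         |         |         |         |
17 |         |         |         |         |
18 |         |         |         |         |
19 |         |         |         |         |
20 |         |         |         |         |
21 |         |         |         |         |
22 |         |         |         |         |
23 |         |         |         |         |
24 |         |         |         |         |
25 |         |         |         |         |
26 |         |         |         |         |
27 |         |         |         |         |
28 |         |         |         |         |
29 |         |         |         |         |
30 |         |         |         |         |
31 |         |         |         |         |
32 |         |         |         |         |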

Fill in appropriate headers for <attr1>, <attr2>, <attr3>, and <attr4> according to the attributes of your own training data. Headers for ID and Class have been provided. Then fill in the data for the four attributes and the four-option Class. The type of data for <attr1> is binary, <attr2> is continuous, <attr3> is nominal, and <attr4> is ordinal. The values for each attribute must be diverse.

Based on the training data that you have furnished, compute the entropy for the overall collection of training data, for the attribute with binary data, for the attribute with continuous data using a multiway split, for the attribute with nominal data using a multiway split, and for the attribute with ordinal data using a multiway split. The split breakdowns for attributes requiring a multiway split must be clearly indicated. Lastly, suggest with justification which attribute in the training data is the most heterogeneous.
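The entropy calculations requested above can be sketched as follows. This is a minimal illustration, not a model answer: the eight-row toy data, the attribute values, and the cut points used to bin the continuous attribute are all hypothetical, and the helper names `entropy`, `split_entropy`, and `discretize` are made up for this sketch. It assumes entropy is measured in bits (log base 2) and that the entropy of an attribute means the weighted average class entropy over the branches of its split.

```python
from bisect import bisect_right
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def split_entropy(values, labels):
    """Weighted average class entropy after a multiway split on `values`."""
    n = len(labels)
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(v, []).append(y)
    return sum(len(g) / n * entropy(g) for g in groups.values())


def discretize(x, cuts):
    """Map a continuous value to a bin index using sorted cut points."""
    return bisect_right(cuts, x)


# Hypothetical toy data: 8 rows, four classes (the assignment requires 32 rows).
labels = ["A", "A", "B", "B", "C", "C", "D", "D"]
binary_attr = ["yes", "yes", "yes", "yes", "no", "no", "no", "no"]
continuous_attr = [1.0, 2.0, 6.0, 7.0, 11.0, 12.0, 16.0, 17.0]
cuts = [5.0, 10.0, 15.0]  # assumed split breakdown: <5, 5-10, 10-15, >=15

overall = entropy(labels)                          # four equally likely classes
binary_split = split_entropy(binary_attr, labels)  # two classes per branch
bins = [discretize(x, cuts) for x in continuous_attr]
continuous_split = split_entropy(bins, labels)     # every bin is pure

print(overall, binary_split, continuous_split)
```

Under this reading, the attribute whose split leaves the highest weighted entropy is the one whose branches remain the most mixed, which is one way to argue which attribute is the most heterogeneous.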

b. Suppose that you have been hired by a digital news agency to summarize the top 10 daily news stories in a specific vertical such as computing, medicine, finance, entertainment, or law in Malaysia. As a junior data scientist, suggest a complete text mining process that you would perform to achieve this goal.
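A few stages of such a process (tokenization, stop-word removal, term weighting, and sentence selection) can be illustrated with a frequency-based extractive summarizer. This is only a sketch of one possible approach under simplifying assumptions: the stop-word list is a tiny illustrative sample, the sample article is invented, and the function names `tokenize` and `summarize` are hypothetical. A complete process would also cover data collection, filtering by vertical, ranking of stories, and evaluation.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"the", "a", "an", "of", "to", "in", "and", "is", "for", "on"}


def tokenize(text):
    """Lowercase the text, keep alphabetic tokens, and drop stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]


def summarize(article, n_sentences=1):
    """Return the n highest-scoring sentences, in their original order."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", article) if s.strip()]
    freq = Counter(tokenize(article))

    def score(sentence):
        # Average corpus frequency of the sentence's non-stop-word tokens.
        tokens = tokenize(sentence)
        return sum(freq[t] for t in tokens) / len(tokens) if tokens else 0.0

    top = sorted(sentences, key=score, reverse=True)[:n_sentences]
    return [s for s in sentences if s in top]


# Invented sample text, for illustration only.
article = (
    "The bank raised rates. "
    "Rates at the bank affect loans and the bank customers. "
    "Weather was sunny."
)
print(summarize(article, n_sentences=1))
```

Averaging the term frequencies rather than summing them keeps long sentences from winning purely on length, which is a design choice for this sketch rather than a requirement of the task.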

Question 2.

TABLE Q2 displays an unfilled temperature readings summary from the ABC weather station in East Borneo, comparing the months of October, November, and December from 1990 to 1999 with the same months from 2010 to 2019. The table should display the number of months in which the average maximum daily temperature was low (< 16°C), medium (between 16°C and 26°C), or high (> 26°C). The investigation aims to discover whether a significant difference exists between the two rows.

TABLE Q2: Temperature Readings Summary

Decade      | Low | Medium | High
1990 - 1999 |     |        |
2010 - 2019 |     |        |

First, fill in TABLE Q2 with data in the Low, Medium, and High columns. The data for 1990-1999 must differ from the data for 2010-2019.

Assuming that the readings are independent from month to month, let the unknown parameters $p_{d,m}$ be the probability that a month's reading falls into bin $m \in \{\text{Low}, \text{Medium}, \text{High}\}$ in decade $d \in \{\text{1990-1999}, \text{2010-2019}\}$. As a junior data scientist, you have been asked to (i) provide expressions for the maximum likelihood estimates $\hat{p}_{d,m}$, stating what to maximize and over which variables; (ii) establish a null hypothesis $H_0$ under which the probabilities are identical in 1990-1999 and 2010-2019, call these common probabilities $q_m$, provide their maximum likelihood estimates $\hat{q}_m$ under $H_0$, and test $H_0$ using the test statistic

$$t = \sum_{d,m} \frac{(\hat{p}_{d,m} - \hat{q}_m)^2}{\hat{q}_m}$$

and (iii) use parametric sampling to compute the distribution of $t$ under $H_0$. Additionally, your tasks include (iv) explaining the relevance of a one-sided versus a two-sided test for this investigation, (v) providing pseudocode to compute the p-value for the test of $H_0$, and finally (vi) explaining one advantage and one disadvantage of a count-based test as opposed to a linear regression-based test.
