Describe operations and show resulting data table

Reference no: EM131231200

Assignment 1:

1. Using the heritage data (release 1) in SQL:

a. Find the support of all single-item itemsets.

b. List all itemsets with 2 elements and a support of at least 0.2.

c. List all itemsets with 3 elements and a support of at least 0.2.
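A minimal sketch of the counting behind (a) to (c), written in Python rather than SQL. The support of an itemset is the fraction of transactions that contain every item in it. The transactions and the item names "A" to "D" are hypothetical stand-ins; how heritage records are grouped into transactions (for example, one set of condition groups per member and year) is an assumption left to the solver.

```python
from itertools import combinations

# Hypothetical transactions; in the heritage data a transaction might be the
# set of items recorded for one member (an assumption, not given in the task).
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "D"},
    {"A", "B", "C"},
]


def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    hits = sum(1 for t in transactions if set(itemset) <= t)
    return hits / len(transactions)


def frequent_itemsets(k, transactions, min_support=0.0):
    """All k-item itemsets whose support is at least `min_support`."""
    items = sorted(set().union(*transactions))
    result = {}
    for combo in combinations(items, k):
        s = support(combo, transactions)
        if s >= min_support:
            result[combo] = s
    return result


print(frequent_itemsets(1, transactions))       # (a) every single item
print(frequent_itemsets(2, transactions, 0.2))  # (b) pairs with support >= 0.2
print(frequent_itemsets(3, transactions, 0.2))  # (c) triples with support >= 0.2
```

The enumeration is brute force, which is fine for a handful of items; Apriori-style pruning skips any candidate that has an infrequent subset. In SQL, pair and triple counts are typically produced by self-joining the transaction table on the transaction ID.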

2. In Weka

a. Load the heritage data (release 1).

b. Apply at least two association rule generation algorithms and compare the results.

c. Apply the FP-tree based algorithm (FPGrowth) with at least two rule-ranking metrics.
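The rule metrics in (c) are derived from support counts. A minimal sketch of two common ones, assuming a small hypothetical set of transactions (not heritage data): confidence of X → Y is supp(X ∪ Y) / supp(X), and lift is that confidence divided by supp(Y). Weka's FPGrowth can rank rules by metrics such as these; the function names below are illustrative only.

```python
# Hypothetical transactions (not heritage data).
transactions = [
    {"A", "B", "C"},
    {"A", "B"},
    {"A", "C"},
    {"B", "D"},
    {"A", "B", "C"},
]


def count(itemset):
    """Number of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if set(itemset) <= t)


def confidence(lhs, rhs):
    """P(rhs | lhs) = supp(lhs U rhs) / supp(lhs), computed from raw counts."""
    return count(set(lhs) | set(rhs)) / count(lhs)


def lift(lhs, rhs):
    """confidence(lhs -> rhs) / supp(rhs); above 1 means positive association."""
    n = len(transactions)
    return count(set(lhs) | set(rhs)) * n / (count(lhs) * count(rhs))


for lhs, rhs in [({"A"}, {"C"}), ({"B"}, {"C"})]:
    print(sorted(lhs), "->", sorted(rhs),
          "confidence", round(confidence(lhs, rhs), 3),
          "lift", round(lift(lhs, rhs), 3))
```

With these transactions, A → C has confidence 0.75 and lift 1.25, while B → C has confidence 0.5 but lift below 1, so the two metrics can rank the same rules differently.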

Assignment 2:

1. In SQL/Weka:

a. Prepare heritage data for classification learning

b. Load the heritage data release 3 (preprocessed into a binary representation, including demographics and the output attribute(s)).

c. Perform exploratory analysis

d. Create at least three classification models for predicting hospitalization based on Year 1 data.

e. Which model performs best on the Year 2 data?

f. Create a regression model for predicting hospitalization days.

g. What is the difference between regression and classification models?

h. Present your results in the form of a short report that includes screenshots, tables, and the necessary descriptions.
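For (g): a classification model predicts a discrete label (hospitalized or not), while a regression model predicts a number (days in hospital). The sketch below makes the contrast concrete with k-nearest neighbours, where the same neighbours yield either a majority-vote label or an averaged day count. The two features (claim count and age) and all values are hypothetical placeholders, not heritage data.

```python
import math
from statistics import mean, mode

# Hypothetical Year 1 features (claim count, age) paired with Year 2
# days in hospital; these numbers are invented for illustration.
history = [
    ((2, 35), 0),
    ((3, 40), 0),
    ((8, 70), 4),
    ((9, 75), 6),
    ((7, 68), 3),
]


def nearest(x, k):
    """The k training rows closest to x (features are unscaled here)."""
    return sorted(history, key=lambda row: math.dist(row[0], x))[:k]


def predict_days(x, k=3):
    """Regression: the numeric target averaged over the neighbours."""
    return mean(days for _, days in nearest(x, k))


def predict_hospitalized(x, k=3):
    """Classification: majority vote on the label 'days > 0'."""
    return mode(days > 0 for _, days in nearest(x, k))


for member in [(8, 72), (2, 38)]:
    print(member, predict_days(member), predict_hospitalized(member))
```

For the member (2, 38), the regression output is 1 day, while the classifier predicts "not hospitalized" because two of the three neighbours had zero days.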

Assignment 3:

Classification Part 2

1. Using the heritage release 3 data prepared in the previous assignment:

a. Include drug information in the data.

b. Include laboratory information in the data.

c. Import the newly created data into Weka and run classification algorithms.

d. Does including this information improve predictions?

There are many ways to complete question 4, so you will need to make your own decisions.

Try not to overcomplicate the problem.
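One simple way to do (a) and (b) is to total the drug and lab counts per member and year, then left-join those totals onto the existing rows, filling members without records with 0. A sketch in Python of that GROUP BY plus LEFT JOIN; the row layout (member ID, year, count) and the sample values are assumptions, not the actual column names of the heritage files.

```python
from collections import defaultdict

# Hypothetical rows. The (member ID, year, count) layout is an assumption
# about the drug and lab tables, not their real column names.
members = [("M1", "Y1"), ("M2", "Y1"), ("M3", "Y1")]
drug_rows = [("M1", "Y1", 2), ("M1", "Y1", 3), ("M3", "Y1", 1)]
lab_rows = [("M2", "Y1", 4)]


def totals_by_member_year(rows):
    """GROUP BY member, year with SUM(count)."""
    totals = defaultdict(int)
    for member, year, count in rows:
        totals[(member, year)] += count
    return totals


def add_counts(members, drug_rows, lab_rows):
    """LEFT JOIN the totals onto each member-year; missing records become 0."""
    drugs = totals_by_member_year(drug_rows)
    labs = totals_by_member_year(lab_rows)
    return [(m, y, drugs.get((m, y), 0), labs.get((m, y), 0)) for m, y in members]


for row in add_counts(members, drug_rows, lab_rows):
    print(row)
```

Filling with 0 rather than leaving a missing value is itself a modelling decision: it treats "no record" as "no drugs or labs".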

2. In Weka, using the heritage release 3 dataset:

a. Apply the k-means algorithm for k = 2, 3, 5, and 10.

b. Apply the EM algorithm. What is the optimal number of clusters found by EM?

c. Compare the resulting clusters with the classification based on hospitalization in Year 2.
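For (c), one common way to compare clusters with classes is to cross-tabulate cluster assignments against the Year 2 hospitalization label and summarize the table by purity, the share of members that fall in their cluster's majority class. A sketch with hypothetical cluster IDs and labels:

```python
from collections import Counter

# Hypothetical cluster IDs (e.g. from k-means) and Year 2 labels for six members.
clusters = [0, 0, 0, 1, 1, 1]
labels = ["no", "no", "yes", "yes", "yes", "no"]


def contingency(clusters, labels):
    """Count of members for every (cluster, class) pair."""
    return Counter(zip(clusters, labels))


def purity(clusters, labels):
    """Share of members belonging to their cluster's majority class."""
    majority = Counter()
    for (cluster, _), n in contingency(clusters, labels).items():
        majority[cluster] = max(majority[cluster], n)
    return sum(majority.values()) / len(labels)


print(contingency(clusters, labels))
print(round(purity(clusters, labels), 3))
```

Here each cluster has a 2-to-1 majority, so purity is 4/6, about 0.67.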

Assignment 4:


Using the data table shown below.

a. Calculate distance between all points in 1-norm, 2-norm and infinity-norm. Show dissimilarity matrix.

b. Is there any need to preprocess the data to be more suitable for clustering? If so, describe the operations and show the resulting data table.

c. Apply k-means clustering algorithm with k=2.

ID   Age   BMI   Gender   Total Cholesterol
1    30    24    M        180
2    70    19    M        190
3    65    26    M        220
4    40    32    F        260
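A sketch of (a) in Python. ID is dropped because it is only an identifier, and Gender is encoded as M = 0, F = 1 so it can take part in the distance (an assumption; other encodings are possible). The p-norm distance is (Σ|x_i − y_i|^p)^(1/p), and the infinity-norm is the largest absolute coordinate difference.

```python
import math

# Records from the table; ID is left out and Gender is encoded M = 0, F = 1
# (an assumption). Attribute order: Age, BMI, Gender, Total Cholesterol.
records = {
    1: (30, 24, 0, 180),
    2: (70, 19, 0, 190),
    3: (65, 26, 0, 220),
    4: (40, 32, 1, 260),
}


def minkowski(a, b, p):
    """p-norm distance; p = math.inf gives the infinity-norm (largest difference)."""
    diffs = [abs(x - y) for x, y in zip(a, b)]
    if p == math.inf:
        return max(diffs)
    return sum(d ** p for d in diffs) ** (1 / p)


def dissimilarity_matrix(points, p):
    """Symmetric matrix of pairwise distances, indexed by record ID."""
    ids = sorted(points)
    return {i: {j: minkowski(points[i], points[j], p) for j in ids} for i in ids}


for p, name in [(1, "1-norm"), (2, "2-norm"), (math.inf, "infinity-norm")]:
    matrix = dissimilarity_matrix(records, p)
    print(name)
    for i, row in matrix.items():
        print(i, " ".join(f"{row[j]:7.2f}" for j in row))
```

On the raw values, Total Cholesterol differences (up to 80) dominate the 2-norm and infinity-norm matrices, whereas Gender contributes at most 1. For example, the 1-norm distance between records 1 and 2 is 40 + 5 + 0 + 10 = 55.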

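For (b) and (c), one reasonable preprocessing step is min-max scaling of each attribute to [0, 1], so that no attribute dominates merely because of its units, with Gender again encoded as M = 0, F = 1 (an assumption). The sketch then runs k-means with k = 2 using Euclidean distance. The initial centroids (records 1 and 4) are an arbitrary choice, and other starting points can give other partitions.

```python
import math

# Raw records from the table; Gender encoded as M = 0, F = 1 (assumption).
records = {
    1: (30, 24, 0, 180),
    2: (70, 19, 0, 190),
    3: (65, 26, 0, 220),
    4: (40, 32, 1, 260),
}


def min_max_scale(points):
    """Rescale every attribute to [0, 1]: (value - min) / (max - min)."""
    columns = list(zip(*points.values()))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return {
        pid: tuple(
            (v - lo) / (hi - lo) if hi > lo else 0.0
            for v, lo, hi in zip(values, lows, highs)
        )
        for pid, values in points.items()
    }


def kmeans(points, initial_ids, max_iter=100):
    """Plain k-means (Lloyd's algorithm) with Euclidean distance."""
    centroids = [points[i] for i in initial_ids]
    assignment = {}
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_assignment = {
            pid: min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c]))
            for pid, p in points.items()
        }
        if new_assignment == assignment:
            break  # no point changed cluster: converged
        assignment = new_assignment
        # Move each centroid to the mean of its members.
        for c in range(len(centroids)):
            members = [points[pid] for pid in points if assignment[pid] == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return assignment, centroids


scaled = min_max_scale(records)
for pid, values in scaled.items():
    print(pid, [round(v, 3) for v in values])
assignment, centroids = kmeans(scaled, initial_ids=(1, 4))
print(assignment)
```

Rounded to three decimals, the scaled records are 1: (0, 0.385, 0, 0), 2: (1, 0, 0, 0.125), 3: (0.875, 0.538, 0, 0.5), 4: (0.25, 1, 1, 1). With these starting centroids the algorithm converges to the clusters {1, 2, 3} and {4}.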
Assignment 5:

Text Mining

1. Write regular expressions to:

a. Detect ZIP codes in text.

b. Find the last names of all patients whose first name is John (note that regular expressions may produce some false positives/false negatives).
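Possible patterns for (a) and (b), sketched with Python's re module. The ZIP pattern assumes US five-digit codes with an optional ZIP+4 suffix; the name pattern assumes "John" is followed by an optional middle initial and a capitalized last name. The sample note is invented.

```python
import re

# US ZIP codes: five digits, optionally followed by "-" and four more (ZIP+4).
# False positive: any other standalone five-digit number, e.g. a record number.
ZIP_RE = re.compile(r"\b\d{5}(?:-\d{4})?\b")

# "John", an optional middle initial, then a capitalized word taken as the
# last name (captured). False positive: a street name such as "John Street".
# False negatives: "JOHN SMITH" (all caps) or "Smith, John" (reversed order).
JOHN_LAST_RE = re.compile(r"\bJohn\s+(?:[A-Z]\.\s*)?([A-Z][A-Za-z'-]+)")

note = ("Patient John Smith, 12 Elm St, Boston MA 02115, seen with "
        "John A. Doe (ZIP 90210-1234). Dr. Johnson reviewed the chart.")

print(ZIP_RE.findall(note))
print(JOHN_LAST_RE.findall(note))
```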

2. List the challenges in automatically retrieving ICD-9 codes from clinical notes. Search the literature to find relevant published work. Also include your own observations and comments.

3. Using the SMS data:

a. Split the data into training (80%) and testing (20%) sets.

b. Build a naïve Bayes classifier for detecting spam based on a bag of words:

i. List all words in the documents

ii. Count each word's occurrences in spam and in ham messages.

iii. Assign the likelihoods P(word|spam) and P(word|ham) for all words.

iv. Convert the test data into a list of words; for each message you need 2 columns: message ID and word.

v. Classify the test data. This can be done by a series of joins with the data prepared in (iii).

vi. Evaluate your model (accuracy, precision, recall).
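A compact sketch of (a) and steps (i) to (v) in Python rather than SQL. Two choices here are assumptions, not part of the assignment: words are lower-cased runs of letters, digits, and apostrophes, and the likelihoods use add-one (Laplace) smoothing so that a word unseen in one class does not force that class's probability to zero. Scores are summed in log space, which is equivalent to multiplying the probabilities but avoids underflow. The training messages are invented.

```python
import math
import random
import re
from collections import Counter


def tokenize(text):
    """Bag-of-words tokens: lower-cased runs of letters, digits and apostrophes."""
    return re.findall(r"[a-z0-9']+", text.lower())


def split(messages, train_fraction=0.8, seed=0):
    """(a) Shuffle reproducibly, then cut into training and testing sets."""
    shuffled = list(messages)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


def train(messages):
    """(i)-(iii) Vocabulary, per-class word counts, smoothed likelihoods."""
    counts = {"spam": Counter(), "ham": Counter()}
    docs = Counter()
    for label, text in messages:
        docs[label] += 1
        counts[label].update(tokenize(text))
    vocab = set(counts["spam"]) | set(counts["ham"])
    likelihood = {}
    for label, word_counts in counts.items():
        total = sum(word_counts.values())
        likelihood[label] = {
            w: (word_counts[w] + 1) / (total + len(vocab)) for w in vocab
        }
    prior = {label: docs[label] / len(messages) for label in counts}
    return prior, likelihood


def to_word_rows(messages):
    """(iv) One (message id, word) row per token, as in the SQL layout."""
    return [(i, w) for i, (_, text) in enumerate(messages) for w in tokenize(text)]


def classify(text, prior, likelihood):
    """(v) Highest log prior + sum of log likelihoods; unseen words are skipped."""
    scores = {}
    for label in prior:
        scores[label] = math.log(prior[label]) + sum(
            math.log(likelihood[label][w])
            for w in tokenize(text)
            if w in likelihood[label]
        )
    return max(scores, key=scores.get)


train_data = [
    ("spam", "win free prize now"),
    ("spam", "free cash prize"),
    ("ham", "see you at lunch"),
    ("ham", "call me when you are free"),
    ("ham", "lunch at noon"),
]
prior, likelihood = train(train_data)
print(classify("free prize", prior, likelihood))
print(classify("lunch at noon", prior, likelihood))
```

On the real data, split() would be applied first and train() called on the 80% portion only. In SQL, step (v) corresponds to joining the (message ID, word) rows with the likelihood table, summing the log-likelihoods per message and class, and keeping the larger score.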

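For (vi), with spam treated as the positive class: accuracy is the share of all messages classified correctly, precision is TP / (TP + FP), and recall is TP / (TP + FN). A sketch with made-up labels:

```python
def evaluate(actual, predicted, positive="spam"):
    """Accuracy, precision and recall, treating `positive` as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    correct = sum(a == p for a, p in pairs)
    accuracy = correct / len(pairs)
    # Guard against division by zero when nothing is predicted/actually positive.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall


actual = ["spam", "spam", "ham", "ham", "ham"]
predicted = ["spam", "ham", "ham", "spam", "ham"]
print(evaluate(actual, predicted))
```

For these labels there is one true positive, one false positive, and one false negative, giving accuracy 0.6, precision 0.5, and recall 0.5.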


Reviews

len1231200

10/5/2016 2:00:52 AM

Here you go: I have the data for the first 3 assignments for now, which I need done by this coming Saturday; the rest can wait until I get the dataset. I will upload the dataset for the 1st question, which I need by Fri Oct 7, and next week I will upload the next dataset. - CLIENT TO SHARE THIS Please, I need screenshots of the work as well (it's required by the professor). It will look something like this


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd