CS 6140 Data Mining Assignment

Reference no: EM132784901

CS 6140 Data Mining - The University of Utah

Hash Functions and PAC Algorithms

Overview

In this assignment you will experiment with random variation over discrete events.

It will be very helpful to use the analytical results and the experimental results to verify each other. If they do not align, you are probably doing something wrong (this is a very powerful and important thing to do whenever working with real data).

As usual, it is recommended that you use LaTeX for this assignment. If you do not, you may lose points if your assignment is difficult to read or hard to follow. Find a sample form in: Canvas -> Files -> Assignments -> Assignment Latex Template.zip.

1 Birthday Paradox

Consider a domain of size n = 3000.

A: Generate random numbers in the domain [n] until two have the same value. How many random trials did this take? We will use k to represent this value.
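A minimal sketch of one experiment, assuming Python's standard `random` module as the source of uniform values; the function name `birthday_trial` is hypothetical. It keeps a set of the values drawn so far, so each membership check is expected O(1) and one experiment costs time proportional to k.

```python
import random

def birthday_trial(n, rng=random):
    """Draw uniform values from [n] until some value repeats; return the number of draws k."""
    seen = set()  # values drawn so far
    k = 0
    while True:
        k += 1
        x = rng.randrange(n)  # uniform over 0..n-1, a relabeling of [n] = {1, ..., n}
        if x in seen:
            return k  # the k-th draw is the first repeat
        seen.add(x)

print(birthday_trial(3000))
```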

B: Repeat the experiment m = 200 times, and for each repetition record how many random trials it took. Plot this data as a cumulative density plot where the x-axis records the number of trials required k and the y-axis records the fraction of experiments that succeeded (a collision) after k trials. The plot should show a curve that starts at a y value of 0, increases as k increases, and eventually reaches a y value of 1.

C: Empirically estimate the expected number k of random trials needed to have a collision. That is, add up all values of k and divide by m.
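A sketch covering parts B and C, assuming a fixed seed (6140, chosen arbitrarily) for reproducibility; `birthday_trial`, `empirical_cdf`, and `plot_cdf` are hypothetical names. The cumulative plot is drawn as a step function: after sorting the m recorded values, the curve rises by 1/m at each one, starting from y = 0 and ending at y = 1. matplotlib is imported only inside `plot_cdf`, so the computation runs without it.

```python
import random

def birthday_trial(n, rng):
    """Number of uniform draws from [n] until the first repeated value."""
    seen = set()
    k = 0
    while True:
        k += 1
        x = rng.randrange(n)  # 0..n-1 stands in for [n]
        if x in seen:
            return k
        seen.add(x)

def empirical_cdf(samples):
    """Step coordinates for an empirical CDF that starts at y = 0 and ends at y = 1."""
    xs = sorted(samples)
    m = len(xs)
    return [0] + xs, [0.0] + [(j + 1) / m for j in range(m)]

def plot_cdf(xs, ys):
    # Lazy import: only needed when actually drawing the figure.
    import matplotlib.pyplot as plt
    plt.step(xs, ys, where="post")
    plt.xlabel("k (number of trials)")
    plt.ylabel("fraction of experiments with a collision")
    plt.show()

rng = random.Random(6140)  # arbitrary fixed seed
n, m = 3000, 200
ks = [birthday_trial(n, rng) for _ in range(m)]
xs, ys = empirical_cdf(ks)
mean_k = sum(ks) / m  # part C: add up all values of k and divide by m
print(f"empirical E[k] ~ {mean_k:.1f}")
```

Calling `plot_cdf(xs, ys)` draws the figure for part B (requires matplotlib).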

D: Describe how you implemented this experiment and how long it took for m = 200 trials.
Show a plot of the run time as you gradually increase the parameters n and m. (For at least 3 fixed values of m between 200 and 10,000, plot the time as a function of n.) You should be able to reach values of n = 1,000,000 and m = 10,000.
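One possible timing harness for part D, assuming wall-clock time from `time.perf_counter` is an acceptable measure; `time_experiment` and `timing_grid` are hypothetical names. Each call uses its own seeded generator so runs are repeatable.

```python
import random
import time

def birthday_trial(n, rng):
    """Number of uniform draws from [n] until the first repeated value."""
    seen = set()
    k = 0
    while True:
        k += 1
        x = rng.randrange(n)
        if x in seen:
            return k
        seen.add(x)

def time_experiment(n, m, seed=0):
    """Wall-clock seconds to run m independent birthday experiments on domain size n."""
    rng = random.Random(seed)
    start = time.perf_counter()
    for _ in range(m):
        birthday_trial(n, rng)
    return time.perf_counter() - start

def timing_grid(ns, ms, seed=0):
    """Map each fixed m to a list of run times, one per n in ns."""
    return {m: [time_experiment(n, m, seed) for n in ns] for m in ms}

print(f"n=3000, m=200: {time_experiment(3000, 200):.4f} s")
```

For the required plot one might call `timing_grid([1_000, 10_000, 100_000, 1_000_000], [200, 1_000, 10_000])` and draw one curve per m with n on the x-axis. Since each experiment makes on the order of √n draws, the run time should grow roughly like m·√n.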

2 Coupon Collectors

Consider a domain [n] of size n = 200.
A: Generate random numbers in the domain [n] until every value i ∈ [n] has had one random number equal to i. How many random trials did this take? We will use k to represent this value.
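A minimal sketch of one coupon-collector experiment, again assuming Python's `random` module; `coupon_trial` is a hypothetical name. A set records which values have appeared, and the loop stops as soon as all n have been seen.

```python
import random

def coupon_trial(n, rng=random):
    """Draw uniform values from [n] until every value has appeared; return the number of draws k."""
    seen = set()  # distinct values observed so far
    k = 0
    while len(seen) < n:
        k += 1
        seen.add(rng.randrange(n))  # 0..n-1 stands in for [n]
    return k

print(coupon_trial(200))
```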

B: Repeat step A m = 300 times, and for each repetition record the value k of how many random trials were required to collect all values i ∈ [n]. Make a cumulative density plot as in 1.B.

C: Use the above results to calculate the empirical expected value of k.
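A sketch of parts B and C for the coupon collector, assuming the same seeded-generator approach as before (seed chosen arbitrarily); `coupon_trial` is a hypothetical name. The sorted values `xs` and heights `ys` are the step points for the cumulative plot, and the empirical expected value is the sample mean.

```python
import random

def coupon_trial(n, rng):
    """Number of uniform draws from [n] until all n values have been seen."""
    seen = set()
    k = 0
    while len(seen) < n:
        k += 1
        seen.add(rng.randrange(n))
    return k

rng = random.Random(6140)  # arbitrary fixed seed for reproducibility
n, m = 200, 300
ks = [coupon_trial(n, rng) for _ in range(m)]

# Part B: after sorting, the j-th smallest k sits at height j/m on the cumulative plot.
xs = sorted(ks)
ys = [(j + 1) / m for j in range(m)]

# Part C: the empirical expected value is the mean of the m recorded values.
mean_k = sum(ks) / m
print(f"empirical E[k] ~ {mean_k:.1f}")
```

The points can be drawn with matplotlib's `plt.step(xs, ys, where="post")`, prepending (0, 0) so the curve starts at y = 0.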

D: Describe how you implemented this experiment and how long it took for n = 200 and m = 300 trials. Show a plot of the run time as you gradually increase the parameters n and m. (For at least 3 fixed values of m between 300 and 2,000, plot the time as a function of n.) You should be able to reach n = 10,000 and m = 2,000.

3 Comparing Experiments to Analysis
A: (15 points) Calculate analytically (using formulas from the notes in L2 or M4D book) the number of random trials needed so there is a collision with probability at least 0.5 when the domain size is n = 3000. There are a few formulas stated with varying degrees of accuracy, and you may use any of these. The more accurate the formula, the more sure you may be that your experimental part is verified, or is not (and thus you need to fix something).
[Show your work, including describing which formula you used.]
How does this compare to your results from 1.C?
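A minimal sketch for checking part A against 1.C, assuming independent uniform draws; the function names are hypothetical. The probability that k draws are all distinct is exactly the product of (1 − i/n) for i = 0, ..., k − 1, and the smallest k whose collision probability reaches 0.5 is the median of k. The common approximation 1 − e^(−k(k−1)/(2n)) ≈ 0.5 gives k ≈ √(2n ln 2). Because 1.C estimates a mean rather than a median, the sketch also computes the exact expected value as the sum over j ≥ 0 of Pr[the first j draws are distinct], next to the often-quoted approximation √(πn/2).

```python
import math

def prob_no_collision(k, n):
    """Exact probability that k independent uniform draws from [n] are all distinct."""
    p = 1.0
    for i in range(k):
        p *= 1 - i / n
    return p

def smallest_k_for_collision(n, target=0.5):
    """Smallest k with Pr[some collision among k draws] >= target (the median of k at 0.5)."""
    k = 1
    while 1 - prob_no_collision(k, n) < target:
        k += 1
    return k

def expected_trials(n):
    """Exact E[k] = sum over j >= 0 of Pr[first j draws are distinct]."""
    total, p = 0.0, 1.0
    for j in range(n + 1):
        total += p       # p == Pr[first j draws distinct]
        p *= 1 - j / n   # becomes Pr[first j + 1 draws distinct]
    return total

n = 3000
print(smallest_k_for_collision(n), math.sqrt(2 * n * math.log(2)))
print(expected_trials(n), math.sqrt(math.pi * n / 2))
```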

B: (15 points) Calculate analytically (using formulas from the notes in L2 or M4D book) the expected number of random trials before all elements are witnessed in a domain of size n = 200. Again, there are a few formulas you may use; the more accurate the formula, the more confidence you might have in your experimental part.
[Show your work, including describing which formula you used.]
How does this compare to your results from 2.C?
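For part B, a sketch of the standard argument: once i distinct values have been seen, each draw is new with probability (n − i)/n, so the wait for the next new value is geometric with mean n/(n − i). Summing over i = 0, ..., n − 1 gives E[k] = n·H_n, where H_n = 1 + 1/2 + ... + 1/n is the n-th harmonic number. The code compares this exact value with the approximations n(ln n + γ) and n ln n; the function name is hypothetical.

```python
import math

EULER_GAMMA = 0.5772156649  # Euler-Mascheroni constant

def expected_coupon_trials(n):
    """Exact E[k] = n * H_n for collecting all n values."""
    return n * sum(1 / i for i in range(1, n + 1))

n = 200
exact = expected_coupon_trials(n)
approx = n * (math.log(n) + EULER_GAMMA)  # H_n ~ ln n + gamma
rough = n * math.log(n)                   # coarser n ln n
print(f"{exact:.1f} {approx:.1f} {rough:.1f}")
```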

BONUS: PAC Bounds
Consider a domain size $n$ and let $k$ be the number of random trials run, where each trial obtains each value $i \in [n]$ with probability $1/n$. Let $f_i$ denote the number of trials that have value $i$. Note that for each $i \in [n]$ we have $\mathbf{E}[f_i] = k/n$. Let $\mu = \max_{i \in [n]} f_i / k$.
Consider some parameter $\varepsilon \in (0, 1)$. As a function of the parameter $\varepsilon$, how large does $k$ need to be for $\Pr[|\mu - 1/n| \geq \varepsilon] \leq 0.02$? That is, how large does $k$ need to be so that, except with probability at most 0.02, all counts are within $(\varepsilon \cdot 100)\%$ of the average? (Fine print: you don't need to calculate this exactly, but describe a bound as a function of $\varepsilon$ for the value $k$ which satisfies the PAC property. Chapter 2.3 in the M4D book should help.)

How does this change if we want $\Pr[|\mu - 1/n| \geq \varepsilon] \leq 0.002$ (that is, only 0.002 probability of exceeding $\varepsilon$ error)?
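One possible route, sketched under the assumption that a Chernoff-Hoeffding bound plus a union bound is acceptable (it gives a sufficient, possibly loose, k rather than the tightest one). For a fixed i, f_i/k is the average of k independent indicators in {0, 1} with mean 1/n, so Hoeffding gives Pr[|f_i/k − 1/n| ≥ ε] ≤ 2·exp(−2ε²k). If |µ − 1/n| ≥ ε, then the value i achieving the maximum has |f_i/k − 1/n| ≥ ε, so a union bound over the n values gives Pr[|µ − 1/n| ≥ ε] ≤ 2n·exp(−2ε²k). Requiring this to be at most δ gives k ≥ ln(2n/δ)/(2ε²). The function name `pac_trials` and the choice n = 200 are illustrative only.

```python
import math

def pac_trials(eps, n, delta):
    """Sufficient k so that Pr[|mu - 1/n| >= eps] <= delta.

    Hoeffding per value gives 2*exp(-2*eps^2*k); a union bound over n values
    multiplies by n; solving 2n*exp(-2*eps^2*k) <= delta for k gives this bound.
    """
    return math.ceil(math.log(2 * n / delta) / (2 * eps ** 2))

for eps in (0.1, 0.05, 0.01):
    print(eps, pac_trials(eps, 200, 0.02), pac_trials(eps, 200, 0.002))
```

Under this bound, tightening δ from 0.02 to 0.002 adds only about ln(10)/(2ε²) trials, since the failure probability enters through a logarithm while ε enters as 1/ε².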
