Reference no: EM1388359
The basic task is to measure similarity between any two files in our collection. To do this, we will require a appropriate universe of words. This will consist of all words in collection that are (a) more than four letters long, (b) don't occur more than 20 times overall, and (c) do not happen in more than 7 files in collection. Now we constructor a vector (in mathematical sense) corresponding to each file. Vector will have as many coordinates as words in universe -- so there is one coordinate for each word in universe. If word occurs in file, corresponding coordinate is 1, otherwise it is 0.
Let us give example: assume universe consists of five words: apple, grapes, banana, doctor, program. Assume file1 contains: apple, banana, program. Then the vector for file1 is (1,0,1,0,1).
We require to normalize each of vectors so that it has unit length. So each coordinate in above vector gets divided by square root of 3.
Similarity of two files is defined to be scalar product of corresponding two vectors. Scalar product of two vectors is obtained by multiplying corresponding components and adding. For instance, scalar product of (2,1,3) and (0,5,6) is 2 * 0 + 1 * 5 + 3 * 6.
Your task is to write down the program which prints names of two files with highest similarity among files in collection, and names of two files with lowest similarity.
Market following a weibull distribution
: A Manager needs to decide between two machines to put into market following a Weibull distribution. Machine X test unit cost $3000 with beta=3 and theta=500 Machine Y test unit cost $2000 with beta=3 and theta=400
|
Issues of health care legal liability
: As a new member of the Institutional Policy Review Team, you're seeking information about institutional, professional, and personal ethical standards and dilemmas with respect to privacy of medical information, professional and personal ethical st..
|
Determine the equation of the line
: You are estimating the cost ($K) of optical sensors based on the power output of the sensor. Using the preliminary calculations from a data set of 8 sensors, determine the equation of the line. (Round your intermediate calculations to 3 decimal pl..
|
A business organization intends to develop a new e-commerce
: A business organization intends to develop a new e-commerce Web site to enable its customers to make online purchases of computers in a quicker and more efficient manner
|
Write program to print names of files with similarity
: Write down the program which prints names of two files with highest similarity among files in collection, and names of two files with lowest similarity.
|
Compare an experimental medication
: A clinical trial is organized to compare an experimental medication designed to lower blood pressure to a placebo. Before starting the trial, a pilot study is conducted involving ten participants.
|
Null and alternative hypothesis
: what statement should be made about the null and alternative hypothesis based on sample data and significance level?
|
Productivity is measured by the ratio of outputs
: Productivity is measured by the ratio of outputs to inputs. Some organizations use a partial measure of productivity to measure actual operations, such as a restaurant using number of customer meals per labor hour.
|
Measurement process-improvement process
: Organization selected for the project is a Pharmaceuticals company. I want help in finding information for section six (Measurement process) and seven (Improvement process). If you could provide me some ideas and push me in right direction, I woul..
|