Give the coordinates of the 'outliers'

Reference no: EM13165453

Programs must run and will be demonstrated to the instructor. A detailed high-level algorithm carries 30% of the weight and the running code carries 70%. No credit is given for code that does not run, even if the algorithm is correct. All material should be neatly typed on the computer. A 20% bonus is available for neatness of presentation.

Problem #1. Write a high-level K-means algorithm and a program, in a language of your choice, for clustering N-dimensional data points. The program should read the data from a data file in the following form. Note: you can use a software library for sorting if needed.

Dimensions = <integer>

<Tuple 1> <disease>

<Tuple 2> <disease>

...

<Tuple N> <disease>

<Tuple> is given as comma-separated coordinates, starting from dimension 1. The dimension value will lie in the range 1..100. <disease> is a word or short phrase, for example 'diabetes', 'kidney problem', or 'acidity'. If the <disease> column is missing, no disease is associated with that tuple.
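A minimal Python sketch of a reader for this format follows. The assignment does not say how the last coordinate is separated from the disease, so the sketch assumes that the first `Dimensions` comma-separated fields are the coordinates and that anything after the last coordinate (separated by whitespace or a comma) is the disease. The name `parse_records` is hypothetical.

```python
import re
from typing import List, Optional, Tuple

Record = Tuple[List[float], Optional[str]]


def parse_records(text: str) -> Tuple[int, List[Record]]:
    """Parse a 'Dimensions = <integer>' header followed by tuple lines.

    Assumption: the disease (if any) follows the last coordinate and is
    separated from it by whitespace or a comma. Blank lines are ignored.
    """
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    if not lines:
        raise ValueError("empty input")
    header = re.fullmatch(r"Dimensions\s*=\s*(\d+)", lines[0], flags=re.IGNORECASE)
    if header is None:
        raise ValueError("first non-blank line must be 'Dimensions = <integer>'")
    dims = int(header.group(1))

    records: List[Record] = []
    for ln in lines[1:]:
        # Split off at most dims fields; the last one may still carry the disease.
        parts = ln.split(",", dims - 1)
        last = re.fullmatch(r"([^\s,]+)[\s,]*(.*)", parts[-1].strip())
        if len(parts) != dims or last is None:
            raise ValueError(f"expected {dims} coordinates in line: {ln!r}")
        coords = [float(p) for p in parts[:-1]] + [float(last.group(1))]
        disease = last.group(2).strip() or None  # missing column means no disease
        records.append((coords, disease))
    return dims, records
```

To read from a file, pass its contents, for example `parse_records(open("data.txt").read())`, where `data.txt` is a placeholder file name.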

The distance measure is Euclidean: the square root of the sum of the squared differences between a point's coordinates and the centroid's coordinates. Your program should display on the screen the threshold value you gave and, for every cluster, the coordinates of the centroid and the maximum distance from the centroid to the farthest point in the cluster. It should also write the coordinates of the 'outliers' to a separate output file. Outliers are those points that do not belong to any cluster.
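One possible reading of these requirements is sketched below in Python. It runs the standard K-means loop (Lloyd's algorithm) and then, as an assumption not stated in the assignment, treats a point as an outlier when its distance to the nearest centroid exceeds the threshold. Outliers are flagged only after the loop ends, so they still influence the centroids. The names `kmeans`, `write_outliers`, and the `initial` parameter are hypothetical.

```python
import math
import random
from typing import List, Optional, Sequence

Point = Sequence[float]


def euclidean(p: Point, q: Point) -> float:
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def kmeans(points: List[Point], k: int, threshold: float,
           initial: Optional[List[Point]] = None,
           max_iter: int = 100, seed: int = 0):
    """Return (centroids, clusters, radii, outliers)."""
    rng = random.Random(seed)
    # Start from the given centroids, or from k randomly chosen data points.
    centroids = [list(c) for c in (initial or rng.sample(points, k))]

    for _ in range(max_iter):
        groups: List[List[Point]] = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
            groups[nearest].append(p)
        # New centroid = coordinate-wise mean; an empty group keeps its old centroid.
        updated = [
            [sum(c) / len(g) for c in zip(*g)] if g else centroids[i]
            for i, g in enumerate(groups)
        ]
        if updated == centroids:
            break
        centroids = updated

    # Final pass: points farther than the threshold from every centroid are outliers.
    clusters: List[List[Point]] = [[] for _ in range(k)]
    outliers: List[Point] = []
    for p in points:
        nearest = min(range(k), key=lambda i: euclidean(p, centroids[i]))
        if euclidean(p, centroids[nearest]) <= threshold:
            clusters[nearest].append(p)
        else:
            outliers.append(p)

    # Maximum centroid-to-point distance per cluster (0.0 for an empty cluster).
    radii = [max((euclidean(p, c) for p in g), default=0.0)
             for c, g in zip(centroids, clusters)]
    return centroids, clusters, radii, outliers


def write_outliers(path: str, outliers: List[Point]) -> None:
    """Write one outlier per line as comma-separated coordinates."""
    with open(path, "w") as fh:
        for p in outliers:
            fh.write(",".join(str(x) for x in p) + "\n")
```

The caller can then print the threshold, each centroid, and each entry of `radii`, and pass `outliers` to `write_outliers` to produce the separate output file.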

Problem #2. Write a high-level minimum spanning tree (MST) algorithm and a program, in a language of your choice, for clustering N-dimensional data points. The program should read the data from a data file in the following form. Note: greedy algorithms work well for building an MST. You can use a software library for sorting if needed.

Dimensions = <integer>

<Tuple 1> <disease>

<Tuple 2> <disease>

...

<Tuple N> <disease>

<Tuple> is given as comma-separated coordinates, starting from dimension 1. The dimension value will lie in the range 1..100. <disease> is a word or short phrase, for example 'diabetes', 'kidney problem', or 'acidity'. If the <disease> column is missing, no disease is associated with that tuple.

The distance measure is Euclidean: the square root of the sum of the squared differences between a point's coordinates and the centroid's coordinates. Your program should display on the screen the threshold value you gave and, for every cluster, the coordinates of the centroid and the maximum distance from the centroid to the farthest point in the cluster. It should also write the coordinates of the 'outliers' to a separate output file. Outliers are those points that do not belong to any cluster.
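A hedged Python sketch of one way to approach MST-based clustering follows. It builds the tree with Kruskal's greedy algorithm over all point pairs, then removes every tree edge longer than the threshold and treats the remaining connected components as clusters. Two choices here are assumptions rather than part of the assignment: the threshold is applied to edge length, and a component with fewer than `min_size` points is reported as outliers. The names `mst_edges`, `mst_clusters`, and `summarize` are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

Point = Sequence[float]


def euclidean(p: Point, q: Point) -> float:
    """Square root of the sum of squared coordinate differences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def mst_edges(points: List[Point]) -> List[Tuple[float, int, int]]:
    """Kruskal's algorithm on the complete graph; returns (weight, i, j) edges."""
    parent = list(range(len(points)))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    edges = sorted((euclidean(points[i], points[j]), i, j)
                   for i, j in combinations(range(len(points)), 2))
    tree = []
    for w, i, j in edges:  # greedy: shortest edge that joins two components
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            tree.append((w, i, j))
    return tree


def mst_clusters(points: List[Point], threshold: float, min_size: int = 2):
    """Cut MST edges longer than threshold; return (clusters, outliers)."""
    adjacency = {i: [] for i in range(len(points))}
    for w, i, j in mst_edges(points):
        if w <= threshold:
            adjacency[i].append(j)
            adjacency[j].append(i)

    seen, clusters, outliers = set(), [], []
    for start in range(len(points)):
        if start in seen:
            continue
        seen.add(start)
        stack, component = [start], []
        while stack:  # depth-first walk over the surviving edges
            node = stack.pop()
            component.append(points[node])
            for nxt in adjacency[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        if len(component) >= min_size:
            clusters.append(component)
        else:
            outliers.extend(component)
    return clusters, outliers


def summarize(cluster: List[Point]) -> Tuple[List[float], float]:
    """Centroid of a cluster and the distance to its farthest point."""
    centroid = [sum(c) / len(cluster) for c in zip(*cluster)]
    return centroid, max(euclidean(p, centroid) for p in cluster)
```

Calling `summarize` on each cluster gives the centroid and maximum distance to display, and the `outliers` list can be written to the separate output file one point per line.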






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd