Write a function that has as input two row numbers

Reference no: EM131918872

Question - The city of Pittsburgh, Pennsylvania, lies where three rivers, the Allegheny, Monongahela and Ohio, meet. It has long been important to build bridges there, to enable its residents to cross the rivers safely. See the Wikipedia page "List of bridges of Pittsburgh" for a listing (with pictures) of the bridges. The data contain details of a large number of past and present bridges in Pittsburgh. All the variables we will use are categorical.

Here they are:

  • id: identifier of the bridge (we ignore)
  • river: initial letter of the river that the bridge crosses
  • location: a numerical code indicating the location within Pittsburgh (we ignore)
  • erected: time period in which the bridge was built (a name, from CRAFTS, earliest, to MODERN, most recent)
  • purpose: what the bridge carries: foot traffic ("walk"), water (aqueduct), road or railroad
  • length: categorized as long, medium or short
  • lanes: number of lanes of traffic (or number of railroad tracks): a number, 1, 2, 4 or 6, that we will count as categorical
  • clear_g: whether a vertical navigation requirement was included in the bridge design (that is, ships of a certain height had to be able to get under the bridge). I think G means "yes".
  • t_d: method of construction. DECK means the bridge deck is on top of the construction; THROUGH means that when you cross the bridge, some of the bridge supports are next to you or above you.
  • material: what the bridge is made of: iron, steel or wood
  • span: whether the bridge covers a short, medium or long distance
  • rel_l: relative length of the main span of the bridge (between the two central piers) to the total crossing length. The categories are S, S-F and F. I don't know what these mean.
  • type: type of bridge: wood, suspension, arch and three types of truss bridge: cantilever, continuous and simple

The website SteelConstruction is an excellent source of information about bridges.

(a) The bridges are stored in CSV format. Some of the information is not known and was recorded in the spreadsheet as ?. Turn these into genuine missing values by adding na="?" to your file-reading command. Display some of your data, enough to see that you have some missing data.
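As a minimal sketch of part (a), the block below reads CSV text in which ? marks an unknown value. The three-row excerpt and its contents are hypothetical stand-ins for the real file, and base R's read.csv with na.strings = "?" is used here as an assumption; with readr the corresponding call would be read_csv with na = "?", as the question suggests.

```r
# Hypothetical three-row excerpt standing in for the real bridges CSV.
csv_text <- "id,river,length,material
E1,M,?,WOOD
E2,A,MEDIUM,WOOD
E3,A,?,IRON"

# Treat "?" as a genuine missing value (NA) while reading.
bridges <- read.csv(text = csv_text, na.strings = "?", stringsAsFactors = FALSE)

print(bridges)
# Count the missing values in each column to confirm they were recognised.
print(colSums(is.na(bridges)))
```

With the real data, replace the text argument with the path to the CSV file.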

(b) The R function complete.cases takes a data frame as input and returns a vector of TRUE or FALSE values. Each row of the data frame is checked to see whether it is "complete" (has no missing values), in which case the result is TRUE, or not (has one or more missing values), in which case the result is FALSE. Add a new column called is_complete to your data frame that indicates whether each row is complete. Save the result, and then display (some of) your length column along with your new column. Do the results make sense?
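One possible way to do part (b) in base R is sketched here on a hypothetical three-row data frame (the values are made up for illustration, not taken from the real data).

```r
# Hypothetical miniature data frame; NA marks an unknown value.
bridges <- data.frame(
  id = c("E1", "E2", "E3"),
  length = c(NA, "MEDIUM", "SHORT"),
  material = c("WOOD", NA, "IRON"),
  stringsAsFactors = FALSE
)

# TRUE for rows with no missing values, FALSE otherwise.
bridges$is_complete <- complete.cases(bridges)

# Show length next to the new column.
print(bridges[, c("length", "is_complete")])
```

Note that the second row has a known length but is still flagged FALSE, because a different column (material) is missing. A row with a missing length must be FALSE, but a row with a known length is not necessarily TRUE.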

(c) Create the data frame that will be used for the analysis by picking out only those rows that have no missing values. (Use what you have done so far to help you.)
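A sketch of part (c), again on a hypothetical miniature data frame: the is_complete column from the previous part is used directly as a row filter. The name bridges_complete is an assumption chosen for illustration.

```r
# Hypothetical miniature data frame; NA marks an unknown value.
bridges <- data.frame(
  id = c("E1", "E2", "E3", "E4"),
  length = c(NA, "MEDIUM", "SHORT", "LONG"),
  material = c("WOOD", NA, "IRON", "STEEL"),
  stringsAsFactors = FALSE
)
bridges$is_complete <- complete.cases(bridges)

# Keep only the rows flagged as complete.
bridges_complete <- bridges[bridges$is_complete, ]
print(bridges_complete)
```

In a tidyverse workflow, dplyr's filter(is_complete) would do the same job.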

(d) We are going to assess the dissimilarity between two bridges by the number of the categorical variables they disagree on. This is called a "simple matching coefficient", and is the same thing we did in the question about clustering fruits based on their properties. This time, though, we want to count matches in things that are rows of our data frame (properties of two different bridges), so we will need to use a strategy like the one I used in calculating the Bray-Curtis distances.

First, write a function that takes as input two vectors v and w and counts the number of their entries that differ (comparing the first with the first, the second with the second, . . . , the last with the last). I can think of a quick way and a slow way, but either way is good. To test your function, create two vectors (using c) of the same length, and see whether it correctly counts the number of corresponding values that are different.
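A minimal sketch of the "quick way" for part (d): compare the two vectors elementwise and add up the TRUEs. The function name count_diff and the test vectors are assumptions made for illustration.

```r
# Count the positions at which v and w hold different values.
count_diff <- function(v, w) {
  stopifnot(length(v) == length(w))  # the comparison is position by position
  sum(v != w)                        # each TRUE counts as 1
}

# Test vectors: positions 2 and 4 differ.
v <- c("a", "b", "c", "d")
w <- c("a", "x", "c", "y")
print(count_diff(v, w))  # 2
```

The "slow way" would loop over the positions and increment a counter whenever the two entries differ; it gives the same answer.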

(e) Write a function that has as input two row numbers and a data frame to take those rows from. The function needs to select all the columns except for id, location and is_complete, select the rows required one at a time, and turn them into vectors. (There may be some repetitiousness here. That's OK.) Then those two vectors are passed into the function you wrote in the previous part, and the count of the number of differences is returned. This is like the code in the Bray-Curtis problem. Test your function on rows 3 and 4 of your bridges data set (with the missings removed).

There should be six variables that are different.
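One way part (e) might look in base R is sketched below. The names count_diff and row_diff and the toy data frame are hypothetical; the toy data have only three comparable columns, so they cannot reproduce the count of six expected for rows 3 and 4 of the real data.

```r
# Number of positions at which two vectors differ (from part (d)).
count_diff <- function(v, w) sum(v != w)

# Number of variables on which rows i and j of data frame d disagree.
row_diff <- function(i, j, d) {
  keep <- setdiff(names(d), c("id", "location", "is_complete"))
  d <- d[, keep, drop = FALSE]
  # Turn each selected row into a character vector, one entry per column.
  v <- vapply(d[i, , drop = FALSE], as.character, character(1))
  w <- vapply(d[j, , drop = FALSE], as.character, character(1))
  count_diff(v, w)
}

# Hypothetical toy data with no missing values.
toy <- data.frame(
  id = c("E1", "E2", "E3"),
  river = c("M", "A", "A"),
  location = c(3, 25, 39),
  lanes = c(2, 2, 4),
  material = c("WOOD", "WOOD", "IRON"),
  is_complete = TRUE,
  stringsAsFactors = FALSE
)

print(row_diff(1, 2, toy))  # only river differs: 1
print(row_diff(1, 3, toy))  # river, lanes and material differ: 3
```

With the real data, the corresponding call would pass 3, 4 and the data frame of complete rows.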

(f) Create a matrix or data frame of pairwise dissimilarities between each pair of bridges (using only the ones with no missing values). Use loops, or crossing and map2_int, as you prefer. Display the first six rows of your matrix (using head) or the first few rows of your data frame. (The whole thing is big, so don't display it all.)
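The loop version of part (f) could be sketched as follows. It reuses the hypothetical row_diff helper and toy data from the earlier parts (redefined here so the block runs on its own); the crossing and map2_int route is an equally valid alternative that is not shown.

```r
# Helpers from parts (d) and (e), repeated so this block is self-contained.
count_diff <- function(v, w) sum(v != w)
row_diff <- function(i, j, d) {
  d <- d[, setdiff(names(d), c("id", "location", "is_complete")), drop = FALSE]
  v <- vapply(d[i, , drop = FALSE], as.character, character(1))
  w <- vapply(d[j, , drop = FALSE], as.character, character(1))
  count_diff(v, w)
}

# Hypothetical toy data with no missing values.
toy <- data.frame(
  id = c("E1", "E2", "E3"),
  river = c("M", "A", "A"),
  location = c(3, 25, 39),
  lanes = c(2, 2, 4),
  material = c("WOOD", "WOOD", "IRON"),
  is_complete = TRUE,
  stringsAsFactors = FALSE
)

# Fill an n-by-n matrix with the dissimilarity of every pair of rows.
n <- nrow(toy)
m <- matrix(0, nrow = n, ncol = n)
for (i in seq_len(n)) {
  for (j in seq_len(n)) {
    m[i, j] <- row_diff(i, j, toy)
  }
}

# Show only the first rows; the real matrix is large.
print(head(m))
```

The matrix is symmetric with zeros on the diagonal, so half of the loop iterations are redundant; that is acceptable for a data set of this size.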

(g) Turn your matrix or data frame into a dist object. Do not display your distance object.
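For part (g), a square matrix of dissimilarities can be converted with as.dist. The 3-by-3 matrix below is a hypothetical example, not output from the real data.

```r
# Hypothetical symmetric dissimilarity matrix for three bridges.
m <- matrix(c(0, 1, 3,
              1, 0, 2,
              3, 2, 0), nrow = 3, byrow = TRUE)

# as.dist keeps the lower triangle and marks the result as a dist object.
d <- as.dist(m)

# Check the conversion without displaying the distances themselves.
print(class(d))
print(attr(d, "Size"))
```

If the dissimilarities are held in a long data frame instead, they would first need to be reshaped into a square matrix before calling as.dist.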

(h) Run a cluster analysis using Ward's method, and display a dendrogram. The labels for the bridges (rows of the data frame) may come out too big; experiment with a cex less than 1 on the plot so that you can see them.
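A sketch of part (h) on a hypothetical 5-by-5 dissimilarity matrix. The choice of method = "ward.D" is an assumption; R's hclust also offers "ward.D2", and the question does not say which variant is intended.

```r
# Hypothetical dissimilarities among five bridges, labelled B1 to B5.
m <- matrix(c(0, 1, 1, 5, 6,
              1, 0, 2, 6, 5,
              1, 2, 0, 5, 5,
              5, 6, 5, 0, 1,
              6, 5, 5, 1, 0), nrow = 5, byrow = TRUE)
rownames(m) <- colnames(m) <- paste0("B", 1:5)
d <- as.dist(m)

# Hierarchical clustering with Ward's method.
hc <- hclust(d, method = "ward.D")

# Dendrogram; cex below 1 shrinks the labels so they stay legible.
plot(hc, cex = 0.7)
```

With the real data there are many more bridges, so a smaller cex may be needed.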

(i) How many clusters do you think is reasonable for these data? Draw them on your plot.
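Once a number of clusters has been chosen, rect.hclust can draw boxes around them on an existing dendrogram. The sketch below uses a hypothetical 5-by-5 matrix and k = 2 purely for illustration; the appropriate number for the real data is a judgment call based on its dendrogram.

```r
# Hypothetical dissimilarities among five bridges, labelled B1 to B5.
m <- matrix(c(0, 1, 1, 5, 6,
              1, 0, 2, 6, 5,
              1, 2, 0, 5, 5,
              5, 6, 5, 0, 1,
              6, 5, 5, 1, 0), nrow = 5, byrow = TRUE)
rownames(m) <- colnames(m) <- paste0("B", 1:5)
hc <- hclust(as.dist(m), method = "ward.D")

# The dendrogram must be plotted before the rectangles are added.
plot(hc, cex = 0.7)
rect.hclust(hc, k = 2)

# Cluster membership of each bridge for the same choice of k.
clusters <- cutree(hc, k = 2)
print(clusters)
```

The vector returned by cutree is also a convenient way to pick out bridges that share a cluster when inspecting their data.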

(j) Pick three bridges in the same one of your clusters (it doesn't matter which three bridges or which cluster). Display the data for these bridges. Does it make sense that these three bridges ended up in the same cluster? Explain briefly.

Finish Question 8, parts (d), (e), (f) and (g), giving both R code and output.

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd