Write a regular expression that captures all HTML tags

Reference no: EM131635333

Problem Assignment -

The Enron scandal led to the bankruptcy of the Enron Corporation, the largest bankruptcy reorganization in US history at that time, and to the dissolution of Arthur Andersen, one of the five largest audit and accountancy partnerships in the world. In this exercise, you will download the text of the Wikipedia article on the scandal and filter it down to the sentences dealing with Kenneth Lay, one of the main figures in the scandal.

Download the source code of the following Wikipedia page: Enron scandal. Use readLines(...).
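A minimal sketch of the download step. readLines() accepts either a file path or a URL and returns a character vector with one element per line of the page. The URL below is an assumption (the standard English Wikipedia address for the article; the assignment gives only the page title). To keep the example runnable offline, it reads a small temporary file with hypothetical content instead; pass `url` in place of `tmp` to fetch the real page.

```r
# Assumed URL of the article (not given in the assignment)
url <- "https://en.wikipedia.org/wiki/Enron_scandal"

# Offline stand-in: write two hypothetical HTML lines to a temporary file
tmp <- tempfile(fileext = ".html")
writeLines(c("<h1>Enron scandal</h1>",
             "<p>Kenneth Lay was chairman of Enron.</p>"), tmp)

# One element per line; replace `tmp` with `url` to download the real page
page <- readLines(tmp, encoding = "UTF-8", warn = FALSE)
length(page)  # 2
```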

Open the same web page in your browser and look at its source code (Google Chrome: right-click and select View page source). All lines that contain text from the main body (no headers, info boxes, etc.) start with the same HTML tag, namely <p>. Use a regular expression to limit the downloaded data to these main-body lines. Use grep(...).
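A sketch of the filtering step on a few hypothetical lines standing in for the downloaded page. The anchor `^` ties the match to the start of the line, so only lines that begin with `<p>` survive, and `value = TRUE` makes grep() return the lines themselves rather than their indices.

```r
# Hypothetical sample standing in for the output of readLines()
page <- c(
  "<h1>Enron scandal</h1>",
  "<p>The <b>Enron</b> scandal was an accounting scandal.</p>",
  "<table class=\"infobox\">",
  "<p>Kenneth Lay was chairman of Enron.</p>"
)

# Keep only lines that start with the <p> tag
body <- grep("^<p>", page, value = TRUE)
body
```

If the real page source indented paragraphs or gave them attributes (e.g. `<p class="...">`), a looser pattern such as `"^\\s*<p[ >]"` would be needed; whether that applies is an assumption about the page, not something the assignment states.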

Remove the HTML tags using gsub(...). HTML tags always have the same format, namely a sequence of characters enclosed in angle brackets ('<' and '>'), e.g. <table>. Write a regular expression that captures all HTML tags.
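One pattern that does this is `<[^>]*>`: an opening `<`, then any run of characters other than `>`, then a closing `>`. Excluding `>` from the middle keeps each match to a single tag. The naive greedy pattern `<.*>` would instead match everything from the first `<` to the last `>` on the line, deleting the text between tags as well. The sample line below is hypothetical.

```r
body <- "<p>The <b>Enron</b> scandal led to <a href=\"/wiki/Bankruptcy\">bankruptcy</a>.</p>"

# Replace every tag with the empty string
text <- gsub("<[^>]*>", "", body)
text  # "The Enron scandal led to bankruptcy."

# For comparison: the greedy pattern removes the whole line
gsub("<.*>", "", body)  # ""
```

Note that this removes tags only; HTML entities such as `&amp;` remain in the text.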

We want to construct a vector in which each element is a single sentence, which is currently not the case. First, collapse the current vector into one character string using paste(..., collapse = " "). Then split this string again at the end of each sentence. We assume that '.' is the only sentence separator. However, '.' is also a special character in regular expressions, where it means 'any character'. To match a literal full stop instead, escape it with a backslash; because the backslash is itself an escape character in R strings, it must be written twice, as shown below. In addition, append [[1]], because strsplit() returns a list.

strsplit(..., "\\.")[[1]]
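The collapse-and-split step can be sketched as follows, on hypothetical cleaned paragraphs. strsplit() returns a list with one element per input string; since the input is a single string, `[[1]]` extracts its character vector. The trimws() call is an optional addition, not part of the assignment, that removes the space left at the start of each sentence after a full stop.

```r
text <- c("Enron was an energy company. It was based in Houston.",
          "Kenneth Lay was its chairman.")

# Collapse all paragraphs into one string, separated by single spaces
all_text <- paste(text, collapse = " ")

# Split at every literal full stop ("\\." is the regex \. as an R string)
sentences <- strsplit(all_text, "\\.")[[1]]

# Optional: strip leading/trailing whitespace from each sentence
sentences <- trimws(sentences)
sentences
```

Because the split treats every '.' as a sentence end, abbreviations such as "Inc." or decimal numbers will also be split; that follows from the assignment's simplifying assumption.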

Find all sentences that include the term "kenneth lay", ignoring case.
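A sketch using grep() with `ignore.case = TRUE`, so that "Kenneth Lay", "kenneth lay" and "KENNETH LAY" all match. The sentences are hypothetical sample data.

```r
sentences <- c("Enron was an energy company",
               "KENNETH LAY was its chairman",
               "The company was based in Houston",
               "Kenneth Lay died in 2006")

# Case-insensitive match; value = TRUE returns the sentences, not positions
lay <- grep("kenneth lay", sentences, ignore.case = TRUE, value = TRUE)
lay
```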

Save the resulting vector of sentences in a text file named enron_scandal.txt. Make sure that the resulting file does not have column names, row names, or quotation marks around the individual entries.
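Two ways to meet these requirements are sketched below. writeLines() writes one element per line with no quotes or names at all; write.table() adds quotes, row names and a column name by default, so each must be switched off explicitly. The file is written to a temporary directory here so the example runs anywhere; for the assignment, use "enron_scandal.txt" in your working directory.

```r
lay <- c("Kenneth Lay was its chairman", "Kenneth Lay died in 2006")

out_file <- file.path(tempdir(), "enron_scandal.txt")

# Option 1: one sentence per line, nothing else
writeLines(lay, out_file)

# Option 2: the same result with write.table(), defaults turned off
write.table(lay, out_file, quote = FALSE, row.names = FALSE, col.names = FALSE)

readLines(out_file)
```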



Reviews

len1635333

9/9/2017 7:41:45 AM

Subject: Textual analysis in R. Also, with the exception of a few, each bullet point should amount to no more than one line of code. It should be simple. Save the resulting vector of sentences in a text file named enron scandal. Make sure that the resulting file does not have column names, row names, or quotation marks around the individual entries.


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd