Identify the key properties of a web crawler

Reference no: EM131442586

Use Crawler Java Assignment

Review, fix and run the crawler.

Add code for additional requirements.

Make sure your crawler does the following.

Test your crawler only on the data in:

https://lyle.smu.edu/~fmoore

Make sure that your crawler is not allowed to get out of this directory!!! Yes, there is a robots.txt file that must be used. Note that it is in a non-standard location.
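A minimal sketch of one way to enforce both constraints, assuming the robots.txt body has already been downloaded from wherever it lives. The class name `CrawlScope` and its methods are hypothetical, and the group handling is simplified: only rules under a `User-agent: *` line are read.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

/** Hypothetical helper: keeps the crawl inside one directory and applies Disallow rules. */
class CrawlScope {
    private final String host;
    private final String rootPath;                       // e.g. "/~fmoore"
    private final List<String> disallowed = new ArrayList<>();

    CrawlScope(String rootUrl) {
        URI root = URI.create(rootUrl);
        this.host = root.getHost();
        String p = root.getPath();
        this.rootPath = p.endsWith("/") ? p.substring(0, p.length() - 1) : p;
    }

    /** Collects Disallow paths listed under "User-agent: *" in a robots.txt body. */
    void loadRobots(String robotsTxt) {
        boolean applies = false;
        for (String raw : robotsTxt.split("\\R")) {
            String line = raw.replaceAll("#.*", "").trim();   // drop comments
            String lower = line.toLowerCase(Locale.ROOT);
            if (lower.startsWith("user-agent:")) {
                applies = line.substring("user-agent:".length()).trim().equals("*");
            } else if (applies && lower.startsWith("disallow:")) {
                String path = line.substring("disallow:".length()).trim();
                if (!path.isEmpty()) disallowed.add(path);
            }
        }
    }

    /** True if the URL is on the same host, under the root directory, and not disallowed. */
    boolean isAllowed(String url) {
        URI u;
        try {
            u = URI.create(url).normalize();              // collapses "../" segments
        } catch (IllegalArgumentException e) {
            return false;                                 // malformed URL: treat as out of scope
        }
        if (u.getHost() == null || !u.getHost().equalsIgnoreCase(host)) return false;
        String path = u.getPath();
        boolean inside = path.equals(rootPath) || path.startsWith(rootPath + "/");
        if (!inside) return false;
        for (String d : disallowed) {
            if (path.startsWith(d)) return false;
        }
        return true;
    }
}
```

Normalizing before the prefix test matters: a link such as `/~fmoore/../x.html` would otherwise pass a naive `startsWith` check while pointing outside the directory.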

The required input to your program is N, the limit on the number of pages to retrieve, and a list of stop words (of your choosing) to exclude.
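The page limit N can be sketched as a breadth-first loop over a frontier queue. This is an illustration, not the code in the attached project: `Frontier`, `crawl`, and the `linksOf` function (a stand-in for "fetch the page and return its links") are all assumed names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Function;

class Frontier {
    /**
     * Breadth-first crawl that visits at most maxPages URLs.
     * linksOf is a placeholder for fetching a page and extracting its links.
     */
    static List<String> crawl(String seed, int maxPages,
                              Function<String, List<String>> linksOf) {
        Deque<String> queue = new ArrayDeque<>();
        Set<String> seen = new HashSet<>();       // URLs already queued, to avoid revisits
        List<String> visited = new ArrayList<>();
        queue.add(seed);
        seen.add(seed);
        while (!queue.isEmpty() && visited.size() < maxPages) {
            String url = queue.poll();
            visited.add(url);
            for (String next : linksOf.apply(url)) {
                if (seen.add(next)) {             // add() returns false for known URLs
                    queue.add(next);
                }
            }
        }
        return visited;
    }
}
```

Passing the link source in as a function keeps the loop testable without network access; a real crawler would also wrap the fetch in a try/catch so that a bad page is skipped rather than ending the crawl.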

Perform case-insensitive matching.

You can assume that there are no errors in the input. Your code should be robust under errors in the Web pages you're searching. If an error is encountered, feel free, if necessary, just to skip the page where it is encountered.

1. Identify the key properties of a web crawler. Describe in detail how each of these properties is implemented in your code.

2. Use your crawler to list the URLs of all pages in the test data and report all outgoing links of the test data. [10 points] Also display the contents of each page's <TITLE> tag.
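One possible way to pull the title and the outgoing links from a page is a pair of regular expressions, sketched below. The class `PageParser` is hypothetical, and regex matching is only an approximation of HTML parsing: unquoted attribute values, for instance, are not matched, so a dedicated HTML parser would be more robust on messy pages.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PageParser {
    private static final Pattern TITLE = Pattern.compile(
            "<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
    private static final Pattern HREF = Pattern.compile(
            "<a\\b[^>]*?\\bhref\\s*=\\s*[\"']([^\"']*)[\"']", Pattern.CASE_INSENSITIVE);

    /** Returns the text of the first TITLE element, or "" if there is none. */
    static String title(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim().replaceAll("\\s+", " ") : "";
    }

    /** Returns absolute URLs for every quoted href in an anchor tag, fragments removed. */
    static List<String> links(String baseUrl, String html) {
        List<String> out = new ArrayList<>();
        URI base = URI.create(baseUrl);
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            String href = m.group(1).trim();
            int hash = href.indexOf('#');
            if (hash >= 0) href = href.substring(0, hash);   // drop "#fragment"
            if (href.isEmpty()) continue;
            try {
                out.add(base.resolve(href).normalize().toString());
            } catch (IllegalArgumentException e) {
                // malformed href (for example, one containing spaces): skip it
            }
        }
        return out;
    }
}
```

Resolving each href against the page URL turns relative links into absolute ones, which is what the scope check and the seen-URL set need to compare.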

3. Implement duplicate detection, and report if any URLs refer to already seen content.
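Exact-duplicate detection can be sketched by hashing each page body and remembering which URL first produced each hash. `DuplicateDetector` is an assumed name. This approach only catches byte-identical content; near-duplicates would need a different technique such as shingling.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HashMap;
import java.util.Map;

class DuplicateDetector {
    // Maps a content hash to the first URL seen with that content.
    private final Map<String, String> firstUrlByHash = new HashMap<>();

    /** Returns the URL that first had this exact content, or null if the content is new. */
    String check(String url, String content) {
        return firstUrlByHash.putIfAbsent(sha256(content), url);
    }

    /** Hex-encoded SHA-256 digest of the UTF-8 bytes of s. */
    static String sha256(String s) {
        try {
            byte[] digest = MessageDigest.getInstance("SHA-256")
                    .digest(s.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder();
            for (byte b : digest) {
                sb.append(String.format("%02x", b));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide SHA-256.
            throw new IllegalStateException(e);
        }
    }
}
```

A non-null return from `check` means the new URL refers to already seen content, and the returned value names the original page for the report.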

4. Use your crawler to list all broken links within the test data.
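A rough sketch of a broken-link check, assuming that "broken" means the request fails outright or the server answers with a 4xx or 5xx status. `LinkChecker` and the 5-second timeouts are illustrative choices, not part of the assignment.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;

class LinkChecker {
    /** Treats failed requests (-1) and 4xx/5xx responses as broken. */
    static boolean isBroken(int status) {
        return status < 0 || status >= 400;
    }

    /** Returns the HTTP status for a HEAD request, or -1 if the request could not be made. */
    static int statusOf(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("HEAD");     // headers only; the body is not needed
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(5000);
            int code = conn.getResponseCode();
            conn.disconnect();
            return code;
        } catch (IOException | RuntimeException e) {
            // Malformed URL, non-HTTP scheme, unknown host, timeout, and so on.
            return -1;
        }
    }
}
```

Some servers reject HEAD requests, so a fallback to GET may be needed before a link is reported as broken.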

5. How many graphic files are included in the test data?
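Counting graphic files can be sketched as a filter on file extension over every URL the crawler has collected (including `src` values of image tags, if those are gathered). The extension list below is an assumption and should be adjusted to whatever counts as a graphic file for the assignment.

```java
import java.util.Collection;
import java.util.Locale;
import java.util.Set;

class GraphicCounter {
    // Assumed set of graphic file extensions; extend as needed.
    private static final Set<String> EXTENSIONS =
            Set.of("gif", "jpg", "jpeg", "png", "bmp", "svg");

    /** True if the URL path (query string ignored) ends in a known graphic extension. */
    static boolean isGraphic(String url) {
        String path = url.toLowerCase(Locale.ROOT);
        int query = path.indexOf('?');
        if (query >= 0) path = path.substring(0, query);
        int dot = path.lastIndexOf('.');
        return dot >= 0 && EXTENSIONS.contains(path.substring(dot + 1));
    }

    /** Counts distinct graphic URLs, so an image linked from several pages counts once. */
    static long countDistinct(Collection<String> urls) {
        return urls.stream().filter(GraphicCounter::isGraphic).distinct().count();
    }
}
```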

6. Have your crawler save the words from each page of type (.txt, .htm, .html). Make sure that you do not save HTML markup. Explain your definition of "word". In this process, give each page a unique document ID.
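The sketch below shows one possible definition of "word": a maximal run of ASCII letters, digits, and apostrophes, lowercased, with leading and trailing apostrophes trimmed. That definition, the regex-based markup stripping, and the names `WordExtractor` and `DocIds` are assumptions made for illustration. Stop words are expected in lowercase so that matching is case-insensitive.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

class WordExtractor {
    /** Strips markup, lowercases, splits into words, and drops stop words. */
    static List<String> words(String html, Set<String> stopWords) {
        String text = html
                .replaceAll("(?is)<(script|style)[^>]*>.*?</\\1>", " ")  // script/style bodies
                .replaceAll("(?s)<!--.*?-->", " ")                        // comments
                .replaceAll("(?s)<[^>]*>", " ")                           // remaining tags
                .replaceAll("&[a-zA-Z0-9#]+;", " ");                      // entities like &amp;
        List<String> out = new ArrayList<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^a-z0-9']+")) {
            String word = token.replaceAll("^'+|'+$", "");   // trim stray apostrophes
            if (!word.isEmpty() && !stopWords.contains(word)) {
                out.add(word);
            }
        }
        return out;
    }
}

/** Assigns sequential document IDs (1, 2, 3, ...) in order of first appearance. */
class DocIds {
    private final Map<String, Integer> ids = new LinkedHashMap<>();

    int idFor(String url) {
        Integer id = ids.get(url);
        if (id == null) {
            id = ids.size() + 1;
            ids.put(url, id);
        }
        return id;
    }
}
```

Plain `.txt` pages pass through the same method unchanged in effect, since the tag patterns simply find nothing to remove unless the text happens to contain angle brackets.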

Implement Stemming
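As an illustration of the idea only, here is a deliberately simplified suffix-stripping stemmer. It is not the Porter algorithm: it handles a handful of English suffixes with ad hoc length guards, and `SimpleStemmer` is an assumed name. A full solution would more likely use an established stemming algorithm.

```java
class SimpleStemmer {
    /** Strips a few common English suffixes; words of three letters or fewer are left alone. */
    static String stem(String word) {
        int n = word.length();
        if (n <= 3) return word;
        if (word.endsWith("sses")) return word.substring(0, n - 2);          // classes -> class
        if (word.endsWith("ies")) return word.substring(0, n - 2);           // ponies -> poni
        if (word.endsWith("ing") && n > 5) return word.substring(0, n - 3);  // crawling -> crawl
        if (word.endsWith("ed") && n > 4) return word.substring(0, n - 2);   // crawled -> crawl
        if (word.endsWith("s") && !word.endsWith("ss")) {
            return word.substring(0, n - 1);                                  // pages -> page
        }
        return word;
    }
}
```

The stems do not have to be dictionary words ("poni"); what matters is that related forms of a word map to the same string.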

7. Report the 20 most common words with their document frequencies. Words or stemmed words?
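Document frequency counts the number of documents a word appears in, not its total number of occurrences. A minimal sketch, assuming the saved words are available as a map from document ID to word list (`DocumentFrequency` and `top` are hypothetical names):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;

class DocumentFrequency {
    /** Returns the k words that occur in the most documents, ties broken alphabetically. */
    static List<Map.Entry<String, Integer>> top(Map<Integer, List<String>> wordsByDoc, int k) {
        Map<String, Integer> df = new HashMap<>();
        for (List<String> words : wordsByDoc.values()) {
            // A set ensures each word is counted once per document.
            for (String word : new HashSet<>(words)) {
                df.merge(word, 1, Integer::sum);
            }
        }
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(df.entrySet());
        entries.sort((a, b) -> {
            int byCount = Integer.compare(b.getValue(), a.getValue());   // highest first
            return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
        });
        return new ArrayList<>(entries.subList(0, Math.min(k, entries.size())));
    }
}
```

Calling `top(wordsByDoc, 20)` yields the list for the report; feeding it stemmed or unstemmed word lists answers the question either way.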

Attachment:- crawler_project.zip






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd