Design a search algorithm based on both the Vector model and the Boolean model

Reference no: EM131800298

Assessment Task

This assessment task enables students to demonstrate their proficiency against Unit Learning Outcome 5. ULO5: Demonstrate data retrieval skills in the context of a data processing system.

Question 1

Suppose you have joined a search engine development team to design a search algorithm based on both the Vector model and the Boolean model.

You are supposed to collect unstructured documents for the following topics, and apply an index technique to convert them into an inverted index.

Please collect 3 documents (fewer than 30 words each) on three different topics. The topics are listed below; you may also choose other topics you prefer.
- Science
- Computer Vision
- Search Engine
- Database
- Security and privacy.
An example of document:
"Google is the most widely used Web search engine in the World. It claims to be the World's most comprehensive search engine, indexing over 2.4 billion Web pages."

1. Creating the inverted index. In the process of creating the inverted index, please complete the following steps:

a. Find a stopword list on the Internet and remove all stopwords and punctuation from the three documents. Then apply Porter's stemming algorithm to all documents. Note that plenty of online stemming applications are available, and you may use any that implements the Porter algorithm for this question. The output will be a set of stemmed terms.

c. Use the index created in step (b) to create a dictionary and the related posting file.

d. You may like to test the inverted index with some keywords selected from the documents, for example: google, web, search.
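The steps above can be sketched in Python as follows. This is a minimal illustration, not a required solution: the `STOPWORDS` set is a tiny hypothetical stand-in for a full list found online, documents 2 and 3 are made up, and stemming is left as a pluggable `stem` function that defaults to doing nothing (a real Porter stemmer, such as the one in NLTK, would be passed in to complete step (a)).

```python
import string
from collections import defaultdict

# Hypothetical mini stopword list; the task expects a full list found online.
STOPWORDS = {"is", "the", "most", "in", "it", "to", "be", "over", "a", "of", "and"}

def preprocess(text, stem=lambda term: term):
    """Lowercase, replace punctuation with spaces, drop stopwords, then stem.

    `stem` defaults to the identity function (an assumption for this sketch);
    pass a real Porter stemmer to obtain stemmed terms.
    """
    table = str.maketrans(string.punctuation, " " * len(string.punctuation))
    tokens = text.lower().translate(table).split()
    return [stem(t) for t in tokens if t not in STOPWORDS]

def build_index(docs, stem=lambda term: term):
    """Return (dictionary, postings) for a {doc_id: text} mapping.

    postings:   term -> {doc_id: term frequency in that document}
    dictionary: term -> document frequency (number of documents with the term)
    """
    postings = defaultdict(dict)
    for doc_id, text in docs.items():
        for term in preprocess(text, stem):
            postings[term][doc_id] = postings[term].get(doc_id, 0) + 1
    dictionary = {term: len(entry) for term, entry in postings.items()}
    return dictionary, dict(postings)

docs = {
    1: "Google is the most widely used Web search engine in the World.",
    2: "A database index speeds up search over stored records.",   # hypothetical
    3: "Privacy and security protect data stored in a database.",  # hypothetical
}
dictionary, postings = build_index(docs)

# Step (d): look up a few keywords in the index.
for keyword in ("google", "search", "database"):
    print(keyword, dictionary.get(keyword, 0), sorted(postings.get(keyword, {})))
```

Storing term frequencies in the postings is a design choice made here so that the same index can later support Vector model weighting; a purely Boolean index would only need the document IDs.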

2. Boolean and Vector queries.

a. Please design three Boolean queries (for example, web AND search) and list the relevant documents for each query.

b. Please use the Vector model to query on the inverted index, and compare the result with the Boolean model. (Hint: you can use cosine similarity and set a similarity threshold).
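One possible way to compare the two models is sketched below. It assumes three hypothetical documents that have already been preprocessed into stemmed terms, uses raw term frequency times log(N / document frequency) as the weighting scheme (one common choice among several), and uses a similarity threshold of 0.05 that is chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical, already-preprocessed documents (stopwords removed, terms stemmed).
docs = {
    1: ["googl", "web", "search", "engin", "web", "page"],
    2: ["databas", "index", "search", "record"],
    3: ["privaci", "secur", "databas"],
}

# Postings as sets of document IDs, which is all the Boolean model needs.
postings = {}
for doc_id, terms in docs.items():
    for term in terms:
        postings.setdefault(term, set()).add(doc_id)

def boolean_and(a, b):
    return postings.get(a, set()) & postings.get(b, set())

def boolean_or(a, b):
    return postings.get(a, set()) | postings.get(b, set())

def boolean_not(a):
    return set(docs) - postings.get(a, set())

def tfidf(terms):
    """Weight each indexed term by term frequency * log(N / document frequency)."""
    n = len(docs)
    return {t: tf * math.log(n / len(postings[t]))
            for t, tf in Counter(terms).items() if t in postings}

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0

def vector_query(query_terms, threshold=0.05):
    """Return (doc_id, score) pairs scoring at or above the threshold, best first."""
    q = tfidf(query_terms)
    scores = {d: cosine(q, tfidf(terms)) for d, terms in docs.items()}
    return sorted(((d, s) for d, s in scores.items() if s >= threshold),
                  key=lambda pair: pair[1], reverse=True)

print(boolean_and("web", "search"))     # only documents containing both terms
print(vector_query(["web", "search"]))  # ranked list; partial matches can appear
```

In this toy collection the Boolean query web AND search returns only document 1, while the Vector query also returns document 2 with a much lower score, because it matches one of the two query terms. Raising the threshold would drop such partial matches.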

Question 2 (IR Evaluation):

For this exercise, you are required to evaluate the performance of different search engines.

First, please find two search engines you are familiar with, such as Google, Bing, Yahoo!, etc.

Second, please choose a target from the following list, and design two queries to run on both search engines.

The target is determined by the last digit of your student ID. For example, if your student ID ends in 1, please choose target 1; if it ends in 0, please choose target 10.
- Target 1: obtain the unit guide of SIT771.
- Target 2: obtain the unit guide of SIT772.
- Target 3: obtain the unit guide of SIT773.
- Target 4: obtain the unit guide of SIT774.
- Target 5: obtain the price of the new Macbook.
- Target 6: obtain the price of the new iPhone.
- Target 7: obtain the price of a Lenovo Laptop.
- Target 8: obtain the installation document of MongoDB.
- Target 9: obtain the manual of MongoDB.
- Target 10: obtain the operation guide of MongoDB.

Select the first 20 results from both search engines. If a result returns the target, mark it as a relevant document; otherwise, mark it as irrelevant. The following exercises are based on your search results.
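Once the results are marked, precision and recall can be computed at each rank. The sketch below assumes the judgments are stored as a ranked list of booleans, and that recall is measured against the relevant documents found in the judged list, since the true number of relevant pages on the Web is unknown. The sample judgments are hypothetical and shortened to 5 results.

```python
def precision_recall_at_ranks(relevance):
    """Return one (recall, precision) pair per rank.

    `relevance` is a ranked list of booleans (True = relevant).
    Assumption: total relevant = number of relevant results in the judged list.
    """
    total_relevant = sum(relevance)
    points, hits = [], 0
    for rank, is_relevant in enumerate(relevance, start=1):
        hits += is_relevant
        recall = hits / total_relevant if total_relevant else 0.0
        points.append((recall, hits / rank))
    return points

# Hypothetical judgments for the first 5 results of one query.
judged = [True, False, True, False, False]
points = precision_recall_at_ranks(judged)
for rank, (recall, precision) in enumerate(points, start=1):
    print(f"rank {rank}: recall={recall:.2f} precision={precision:.2f}")
```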

a. List your target and designed search queries (you can use any keywords you think are related to the target).

For Search Engine 1, plot the precision versus recall curves for Query 1 and Query 2, interpolated to the 11 standard recall levels.

Also plot the average precision versus recall curve for Search Engine 1 (all three curves should be on a single chart).

b. For Search Engine 2, plot the precision versus recall curves for Query 1 and Query 2, interpolated to the 11 standard recall levels. Also plot the average precision versus recall curve for Search Engine 2 (all three curves should be on a single chart, but a separate chart from that used in part (a)).

c. Plot the averages for Search Engine 1 and Search Engine 2 on a separate chart, and compare the algorithms in terms of precision and recall.

Which search engine do you think is superior? Why?
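The curves in parts (a) to (c) rely on interpolating precision to the 11 standard recall levels (0.0, 0.1, ..., 1.0). A minimal sketch follows, assuming the usual rule that interpolated precision at a recall level is the highest precision observed at any rank with recall at or above that level. The judgments are hypothetical, and recall is again measured against the relevant results in the judged list. The resulting lists can then be passed to any plotting tool.

```python
def interpolate_11pt(relevance):
    """Interpolated precision at recall 0.0, 0.1, ..., 1.0 for one ranked list.

    `relevance` is a ranked list of booleans (True = relevant). If no result
    is relevant, every level gets precision 0.0.
    """
    total = sum(relevance)
    points, hits = [], 0
    for rank, is_relevant in enumerate(relevance, start=1):
        hits += is_relevant
        if total:
            points.append((hits / total, hits / rank))
    levels = [i / 10 for i in range(11)]
    return [max((p for r, p in points if r >= level), default=0.0)
            for level in levels]

def average_curves(curves):
    """Average several 11-point curves level by level."""
    return [sum(values) / len(values) for values in zip(*curves)]

# Hypothetical judgments for two queries on one search engine.
query1 = [True, False, True, False, False]
query2 = [False, True, False, False, True]
curve1 = interpolate_11pt(query1)
curve2 = interpolate_11pt(query2)
average = average_curves([curve1, curve2])
print(curve1)
print(curve2)
print(average)
```

The same `average_curves` helper could be reused for part (c) by averaging per search engine and plotting the two averaged curves on one chart.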

Design a search algorithm based on both the vector model : SIT772 Database and Information Retrieval - demonstrate proficiency against Unit Learning Outcome 5. ULO5: Demonstrate data retrieval skills

Please read the rubric carefully as it outlines the criteria your assessment will be evaluated on.

Instructions
- Read these instructions.
- Answer as many questions as possible.
- Place your name, ID and answers in your document.
- Please submit your Word file with your answers and graphs (embedded) where appropriate as a SINGLE document in the Submission Portal.


Assessment Task 2
- Assessment 2 (Individual): Written report
- Weight (% of total mark): 30
- Due date: Monday, 5 5 PM AEST
- Submission method: Through CloudDeakin via FutureLearn
- Referencing style: Harvard
