Calculate the ranking score for each of the ten documents

Reference no: EM13841747

1. Vector Model

This question requires you to use the following data. Assume a repository of 10 documents indexed by eight key terms. Table 1 is the document-term table: it gives the raw frequency with which each of the eight key terms appears in each of the 10 documents, together with the TF values for a query.

Using the information from Table 1, which documents would be returned by the following Boolean queries?

a) Term2 AND Term7

b) Term4 OR Term2

c) (Term2 OR Term7) AND (NOT Term7)

Table 1: Document-Term and Query-Term Table

          Term 1  Term 2  Term 3  Term 4  Term 5  Term 6  Term 7  Term 8
Doc 1          4       8       9       0      10       8       0       9
Doc 2          1       5       0       0      12       0       1       3
Doc 3          0       3       0       0       0       4       2       0
Doc 4          1       0       4       3       9       0       0       0
Doc 5          0       4       0       0       0       5       1       0
Doc 6          1       2       2       0       3       1       0       1
Doc 7          0       5       3       4       0       0       4       2
Doc 8          0       7       0       3       0       0       3       3
Doc 9          0       5       0       0       0       4       1       2
Doc 10         0       3       4       0       0       2       4       0
Query          2       3       1       2       2       0       1       0

Is it possible to rank the documents returned in (a) to (c)? If it is possible, then supply the rankings in each case. If it is not possible, then state why.
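A minimal sketch of the Boolean evaluation, assuming (as the Boolean model does) that a term is present in a document exactly when its raw frequency in Table 1 is non-zero. Python sets stand in for posting lists; `TF`, `docs_with` and `ALL_DOCS` are illustrative names, and NOT is taken relative to the ten-document collection.

```python
# Table 1 raw term frequencies: row i is Doc i+1, column j is Term j+1.
TF = [
    [4, 8, 9, 0, 10, 8, 0, 9], [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0], [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0], [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 4, 0, 0, 4, 2], [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2], [0, 3, 4, 0, 0, 2, 4, 0],
]
ALL_DOCS = set(range(1, len(TF) + 1))


def docs_with(term):
    """Set of document numbers in which `term` (1-based) has a non-zero frequency."""
    return {i + 1 for i, row in enumerate(TF) if row[term - 1] > 0}


q_a = docs_with(2) & docs_with(7)                                # Term2 AND Term7
q_b = docs_with(4) | docs_with(2)                                # Term4 OR Term2
q_c = (docs_with(2) | docs_with(7)) & (ALL_DOCS - docs_with(7))  # (Term2 OR Term7) AND (NOT Term7)

for label, result in (("a", q_a), ("b", q_b), ("c", q_c)):
    print(f"({label})", sorted(result))
```

Each answer is an unordered set: presence or absence alone gives every returned document the same status, so the Boolean model by itself attaches no ordering to its results. Note also that query (c) simplifies to Term2 AND NOT Term7.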

Exercise 1: Answer the following questions.

a) Using the information from Table 1, calculate the ranking score for each of the ten documents based on each of the following query-document similarity measures:

i) dot product, using TF weights for both the document and query vectors;
ii) cosine coefficient, using TF weights for both the document and query vectors.
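The two measures in part (a) can be sketched as follows, using the raw frequencies of Table 1 directly as TF weights for both documents and query. The names `dot`, `cosine` and `ranking` are illustrative, and breaking ties by document number is an arbitrary choice.

```python
import math

# Table 1 raw term frequencies (Doc 1 .. Doc 10) and the query's TF weights.
TF = [
    [4, 8, 9, 0, 10, 8, 0, 9], [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0], [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0], [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 4, 0, 0, 4, 2], [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2], [0, 3, 4, 0, 0, 2, 4, 0],
]
QUERY = [2, 3, 1, 2, 2, 0, 1, 0]


def dot(u, v):
    """Inner product of two equal-length weight vectors."""
    return sum(a * b for a, b in zip(u, v))


def cosine(u, v):
    """Cosine coefficient: (u . v) / (|u| |v|)."""
    return dot(u, v) / (math.sqrt(dot(u, u)) * math.sqrt(dot(v, v)))


def ranking(scores):
    """Document numbers, highest score first; ties broken by document number."""
    return sorted(scores, key=lambda doc: (-scores[doc], doc))


dot_scores = {i + 1: dot(QUERY, d) for i, d in enumerate(TF)}
cos_scores = {i + 1: cosine(QUERY, d) for i, d in enumerate(TF)}

print("dot product:", ranking(dot_scores))
print("cosine:     ", ranking(cos_scores))
```

With this data the dot product places Doc 1 first, since it has the largest raw frequencies, whereas the cosine coefficient, which divides by both vector lengths, places Doc 7 first and moves the short Doc 6 up to second.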

b) Compare the rankings that you obtained using the two similarity measures. If there are differences between the rankings, then discuss why you think these differences occurred.

Exercise 2: Answer the following questions.

a) Using the information in Table 1, calculate the idf (inverse document frequency) weight vector. Make sure you show how your calculation was performed.
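A sketch of the idf calculation. The exercise does not fix the formula, so this assumes the common form idf_j = log10(N / n_j), where N = 10 is the number of documents and n_j is the number of documents containing term j. A different log base would multiply every idf by the same constant.

```python
import math

# Table 1 raw term frequencies (Doc 1 .. Doc 10).
TF = [
    [4, 8, 9, 0, 10, 8, 0, 9], [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0], [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0], [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 4, 0, 0, 4, 2], [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2], [0, 3, 4, 0, 0, 2, 4, 0],
]
N = len(TF)

# Document frequency n_j: number of documents in which term j occurs at least once.
df = [sum(1 for row in TF if row[j] > 0) for j in range(len(TF[0]))]

# Assumed idf definition: idf_j = log10(N / n_j).
idf = [math.log10(N / n) for n in df]

for j, (n, w) in enumerate(zip(df, idf), start=1):
    print(f"Term {j}: n = {n}, idf = log10({N}/{n}) = {w:.4f}")
```

Term 4, which occurs in only three documents, receives the largest idf, while Term 2, present in nine documents, receives the smallest.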

b) Construct a table similar to Table 1 but showing the tf-idf weights instead of the raw term frequencies.
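The tf-idf table multiplies each raw frequency by the idf of its term, w_ij = tf_ij * idf_j. This sketch again assumes idf_j = log10(N / n_j) and prints the weights in the same layout as Table 1.

```python
import math

# Table 1 raw term frequencies (Doc 1 .. Doc 10).
TF = [
    [4, 8, 9, 0, 10, 8, 0, 9], [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0], [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0], [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 4, 0, 0, 4, 2], [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2], [0, 3, 4, 0, 0, 2, 4, 0],
]
N = len(TF)
df = [sum(1 for row in TF if row[j] > 0) for j in range(len(TF[0]))]
idf = [math.log10(N / n) for n in df]  # assumed idf form, as a sketch

# tf-idf weight w_ij = tf_ij * idf_j
TFIDF = [[tf * w for tf, w in zip(row, idf)] for row in TF]

print("Doc    " + " ".join(f"T{j:<5}" for j in range(1, 9)))
for i, row in enumerate(TFIDF, start=1):
    print(f"Doc {i:<2} " + " ".join(f"{w:6.3f}" for w in row))
```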

c) Using tf weights for the query vector, tf-idf weights for the document vectors, and the cosine coefficient as the similarity measure, compute the ranking score for each document. Show how your calculations were performed for the first document only.
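A sketch of part (c): the query keeps its raw tf weights, the documents use tf-idf weights (log10 idf assumed), and the cosine coefficient scores each pair. It also prints the intermediate quantities for Doc 1. The hypothetical helper `tfidf_docs` takes the log base as a parameter only so that the effect of that assumption can be checked.

```python
import math

# Table 1 raw term frequencies (Doc 1 .. Doc 10) and the query's tf weights.
TF = [
    [4, 8, 9, 0, 10, 8, 0, 9], [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0], [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0], [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 4, 0, 0, 4, 2], [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2], [0, 3, 4, 0, 0, 2, 4, 0],
]
QUERY = [2, 3, 1, 2, 2, 0, 1, 0]


def tfidf_docs(base=10):
    """Document vectors weighted by tf * log_base(N / n_j); the log base is an assumption."""
    n_docs = len(TF)
    df = [sum(1 for row in TF if row[j] > 0) for j in range(len(TF[0]))]
    idf = [math.log(n_docs / n, base) for n in df]
    return [[tf * w for tf, w in zip(row, idf)] for row in TF]


def cosine(u, v):
    """Cosine coefficient: (u . v) / (|u| |v|)."""
    num = sum(a * b for a, b in zip(u, v))
    return num / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


docs = tfidf_docs()
scores = {i + 1: cosine(QUERY, d) for i, d in enumerate(docs)}

# Intermediate values for Doc 1 only, as the exercise asks.
d1 = docs[0]
print("Q . D1     =", round(sum(q * w for q, w in zip(QUERY, d1)), 4))
print("|Q|, |D1|  =", round(math.sqrt(sum(q * q for q in QUERY)), 4),
      round(math.sqrt(sum(w * w for w in d1)), 4))
print("cos(Q, D1) =", round(scores[1], 4))
print("ranking:", sorted(scores, key=lambda doc: (-scores[doc], doc)))
```

Changing the log base scales every document weight by the same constant, and the cosine is unaffected by scaling either vector, so this ranking does not depend on which base is chosen.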

d) How does this ranking compare with the ranking obtained using the cosine similarity measure in Exercise 1? If there are differences between the rankings, then discuss why you think these differences occurred.

Exercise 3: Answer the following questions.

a) This time, using tf-idf weights for both the query and document vectors, and the cosine coefficient as the similarity measure, compute the ranking scores. Show how your calculations were performed for the first document only.
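For Exercise 3 the query is weighted with the same collection idf as the documents. A minimal sketch, again assuming idf_j = log10(N / n_j); `weight`, `IDF` and `DF` are illustrative names.

```python
import math

# Table 1 raw term frequencies (Doc 1 .. Doc 10) and the query's tf values.
TF = [
    [4, 8, 9, 0, 10, 8, 0, 9], [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0], [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0], [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 4, 0, 0, 4, 2], [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2], [0, 3, 4, 0, 0, 2, 4, 0],
]
QUERY_TF = [2, 3, 1, 2, 2, 0, 1, 0]

N = len(TF)
DF = [sum(1 for row in TF if row[j] > 0) for j in range(len(TF[0]))]
IDF = [math.log10(N / n) for n in DF]  # assumed form: idf_j = log10(N / n_j)


def weight(tf_vector):
    """Apply the collection idf to a tf vector, giving tf-idf weights."""
    return [tf * w for tf, w in zip(tf_vector, IDF)]


def cosine(u, v):
    """Cosine coefficient: (u . v) / (|u| |v|)."""
    num = sum(a * b for a, b in zip(u, v))
    return num / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


query = weight(QUERY_TF)
docs = [weight(row) for row in TF]
scores = {i + 1: cosine(query, d) for i, d in enumerate(docs)}

print("weighted query:", [round(w, 4) for w in query])
print("cos(Q, D1)    =", round(scores[1], 4))
print("ranking:", sorted(scores, key=lambda doc: (-scores[doc], doc)))
```

Weighting the query this way shrinks its Term 2 component (idf close to 0, since Term 2 occurs in nine documents) relative to rarer terms such as Term 4, which is one source of differences from the tf-only query.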

b) How does this ranking compare with the ranking obtained in Exercise 2? If there are differences between the rankings, then discuss why you think these differences occurred.

Exercise 4: This time, use tf weights for the query vector, tf-idf weights for the document vectors, and the Dice coefficient rather than the Cosine coefficient as the similarity measure.

a) Compute the ranking scores for all documents. Show how your calculations were performed for the first document only.
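A sketch of Exercise 4 using the weighted form of the Dice coefficient, Dice(Q, D) = 2 (Q . D) / (|Q|^2 + |D|^2); this formula is an assumption, since the exercise does not spell it out. Document weights follow the same log10 tf-idf assumption as before, and cosine scores are computed alongside for comparison.

```python
import math

# Table 1 raw term frequencies (Doc 1 .. Doc 10) and the query's tf weights.
TF = [
    [4, 8, 9, 0, 10, 8, 0, 9], [1, 5, 0, 0, 12, 0, 1, 3],
    [0, 3, 0, 0, 0, 4, 2, 0], [1, 0, 4, 3, 9, 0, 0, 0],
    [0, 4, 0, 0, 0, 5, 1, 0], [1, 2, 2, 0, 3, 1, 0, 1],
    [0, 5, 3, 4, 0, 0, 4, 2], [0, 7, 0, 3, 0, 0, 3, 3],
    [0, 5, 0, 0, 0, 4, 1, 2], [0, 3, 4, 0, 0, 2, 4, 0],
]
QUERY = [2, 3, 1, 2, 2, 0, 1, 0]

N = len(TF)
DF = [sum(1 for row in TF if row[j] > 0) for j in range(len(TF[0]))]
IDF = [math.log10(N / n) for n in DF]  # assumed idf form; Dice scores depend on this base
DOCS = [[tf * w for tf, w in zip(row, IDF)] for row in TF]


def dot(u, v):
    """Inner product of two equal-length weight vectors."""
    return sum(a * b for a, b in zip(u, v))


def dice(q, d):
    """Weighted Dice coefficient: 2 (q . d) / (|q|^2 + |d|^2)."""
    return 2 * dot(q, d) / (dot(q, q) + dot(d, d))


def cosine(q, d):
    """Cosine coefficient, for comparison: (q . d) / (|q| |d|)."""
    return dot(q, d) / math.sqrt(dot(q, q) * dot(d, d))


dice_scores = {i + 1: dice(QUERY, d) for i, d in enumerate(DOCS)}
cos_scores = {i + 1: cosine(QUERY, d) for i, d in enumerate(DOCS)}

# Intermediate values for Doc 1 only.
d1 = DOCS[0]
print("2 (Q . D1)      =", round(2 * dot(QUERY, d1), 4))
print("|Q|^2 + |D1|^2  =", round(dot(QUERY, QUERY) + dot(d1, d1), 4))
print("Dice(Q, D1)     =", round(dice_scores[1], 4))
print("Dice ranking:", sorted(dice_scores, key=lambda doc: (-dice_scores[doc], doc)))
```

Since |Q|^2 + |D|^2 >= 2 |Q| |D|, the Dice score never exceeds the cosine score for non-negative weights, and the two agree only when the query and document vectors have equal length. Unlike the cosine, Dice is not invariant to rescaling, so here the log base chosen for idf does affect the scores.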

b) How does this ranking compare with the ranking obtained in Exercise 1? If there are differences between the rankings, then discuss why you think these differences occurred.

2. IR Evaluation

Exercise 5: The following data show the retrieval results of two different algorithms (Algorithm 1 and Algorithm 2) for two distinct queries (Query 1 and Query 2). An expert has manually labelled each document as either relevant or not relevant to each query.

Algorithm 1 returns the following results:

Query 1: d4, d15, d1, d3, d8, d76, d2, d33, d30, d5, d11, d29, d66, d10
Query 2: d9, d91, d2, d87, d13, d52, d92, d16, d17, d22, d20, d71, d48, d60, d56

Algorithm 2 returns the following results:

Query 1: d8, d29, d6, d5, d15, d17, d20, d65, d2, d33, d44, d41, d7, d77, d13, d14, d90, d80, d70, d4
Query 2: d3, d87, d2, d28, d15, d14, d12, d10, d41, d11, d85, d89, d1, d49, d52, d76, d55, d9, d91, d99, d30, d17, d13, d26, d94, d18, d86, d72, d48, d8, d93, d42, d79, d43, d88, d7, d98, d51, d50, d6

The known relevant documents are as follows:

Query 1: d2, d4, d7, d15, d29
Query 2: d1, d2, d3, d7, d8, d9, d11, d12, d13, d15, d16, d20

a) For Algorithm 1, plot the precision versus recall curves for Query 1 and Query 2, interpolated to the 11 standard recall levels. Also plot the average precision versus recall curve for Algorithm 1 (all three curves should be on a single chart).
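The interpolation for part (a) can be sketched as below. It uses the usual rule that the interpolated precision at standard recall level r_j is the highest precision observed at any recall >= r_j, and it assigns 0 to levels a ranking never reaches (Algorithm 1 never retrieves d7 for Query 1, for example). The sketch only produces the eleven values per curve; drawing the chart is left to whatever plotting tool is at hand. `interpolated_precision`, `RELEVANT` and `ALGORITHM_1` are illustrative names.

```python
def interpolated_precision(ranking, relevant, levels=11):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0.

    P(r_j) is the highest precision observed at any recall >= r_j;
    a level the ranking never reaches gets precision 0.
    """
    points, hits = [], 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))  # (recall, precision)
    return [
        max((p for r, p in points if r >= j / (levels - 1)), default=0.0)
        for j in range(levels)
    ]


RELEVANT = {
    "Query 1": {"d2", "d4", "d7", "d15", "d29"},
    "Query 2": {"d1", "d2", "d3", "d7", "d8", "d9", "d11", "d12", "d13", "d15", "d16", "d20"},
}
ALGORITHM_1 = {
    "Query 1": "d4 d15 d1 d3 d8 d76 d2 d33 d30 d5 d11 d29 d66 d10".split(),
    "Query 2": "d9 d91 d2 d87 d13 d52 d92 d16 d17 d22 d20 d71 d48 d60 d56".split(),
}

curves = {q: interpolated_precision(ALGORITHM_1[q], RELEVANT[q]) for q in ALGORITHM_1}
# Average curve: level-by-level mean over the two queries.
curves["Average"] = [sum(ps) / len(ps) for ps in zip(curves["Query 1"], curves["Query 2"])]

print("recall  " + "  ".join(f"{j / 10:.1f}" for j in range(11)))
for name, ps in curves.items():
    print(f"{name:<8}" + " ".join(f"{p:4.2f}" for p in ps))
```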

b) For Algorithm 2, plot the precision versus recall curves for Query 1 and Query 2, interpolated to the 11 standard recall levels. Also plot the average precision versus recall curve for Algorithm 2 (all three curves should be on a single chart, but a separate chart from that used in part (a)).

c) Plot the averages for Algorithm 1 and Algorithm 2 on a separate chart, and compare the algorithms in terms of precision and recall. Do you think one of the algorithms is superior? Why?
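For part (c), each algorithm's average curve is the level-by-level mean of its two per-query curves. This self-contained sketch defines its own interpolation helper using the usual rule (highest precision at any recall >= the level, 0 if the level is never reached) and prints both averages side by side; `RUNS` and `average_curve` are illustrative names.

```python
def interpolated_precision(ranking, relevant, levels=11):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0 (0 if a level is unreached)."""
    points, hits = [], 0
    for rank, doc in enumerate(ranking, start=1):
        if doc in relevant:
            hits += 1
            points.append((hits / len(relevant), hits / rank))  # (recall, precision)
    return [
        max((p for r, p in points if r >= j / (levels - 1)), default=0.0)
        for j in range(levels)
    ]


RELEVANT = {
    "Query 1": {"d2", "d4", "d7", "d15", "d29"},
    "Query 2": {"d1", "d2", "d3", "d7", "d8", "d9", "d11", "d12", "d13", "d15", "d16", "d20"},
}
RUNS = {
    "Algorithm 1": {
        "Query 1": "d4 d15 d1 d3 d8 d76 d2 d33 d30 d5 d11 d29 d66 d10".split(),
        "Query 2": "d9 d91 d2 d87 d13 d52 d92 d16 d17 d22 d20 d71 d48 d60 d56".split(),
    },
    "Algorithm 2": {
        "Query 1": "d8 d29 d6 d5 d15 d17 d20 d65 d2 d33 d44 d41 d7 d77 d13 d14 d90 d80 d70 d4".split(),
        "Query 2": ("d3 d87 d2 d28 d15 d14 d12 d10 d41 d11 d85 d89 d1 d49 d52 d76 d55 d9 d91 "
                    "d99 d30 d17 d13 d26 d94 d18 d86 d72 d48 d8 d93 d42 d79 d43 d88 d7 d98 d51 d50 d6").split(),
    },
}


def average_curve(run):
    """Mean interpolated precision over the queries of one algorithm, level by level."""
    per_query = [interpolated_precision(run[q], RELEVANT[q]) for q in run]
    return [sum(ps) / len(ps) for ps in zip(*per_query)]


averages = {name: average_curve(run) for name, run in RUNS.items()}

print("recall      " + "  ".join(f"{j / 10:.1f}" for j in range(11)))
for name, ps in averages.items():
    print(f"{name:<12}" + " ".join(f"{p:4.2f}" for p in ps))
```

If one average curve lies above the other at every recall level, that algorithm is better across the board; if the curves cross, the verdict depends on whether early precision or high recall matters more for the application.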
