SIT772 Database and Information Retrieval Assignment

Reference no: EM132450089

SIT772 - Database and Information Retrieval - Deakin University

Assessment: Information Retrieval Techniques Problem Solving Task

Learning Outcome

LO1. Demonstrate data retrieval skills in the context of a data processing system.
LO2. Discipline-specific knowledge and capabilities

Purpose

This task evaluates the student's technical skills in the management of unstructured data, with potential usage in real applications.

Problem 1

Suppose you have joined a search engine development team to design a search algorithm based on both the Vector model and the Boolean model.

You have collected the following three unstructured documents and plan to apply an indexing technique to convert them into an inverted index.

Doc 1: data science is a field to use scientific method, process, algorithm, system to extract knowledge.

Doc 2: data mining is the process to discover pattern in large data to involve method at the database system.

Doc 3: information system is the study of network of hardware and software that people use to process data.

To answer the problems below, you have to provide the detailed procedures step by step.

Problem 1.1: In the process of creating the inverted index, please complete the following steps: Remove all stop words and punctuation. The list of stop words for this task is provided as follows: Is, An, That, Use, And, To, From, In, Both, Of, At, The
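A minimal sketch of this preprocessing step, assuming lowercasing and whitespace tokenization (the function name `preprocess` is illustrative and not part of the task). Note that "a" is not in the provided stop word list, so the sketch keeps it.

```python
import string

# Stop words exactly as listed in Problem 1.1, lowercased for matching.
STOP_WORDS = {"is", "an", "that", "use", "and", "to", "from",
              "in", "both", "of", "at", "the"}

def preprocess(text):
    """Lowercase, strip punctuation, and drop stop words; returns a token list."""
    cleaned = text.lower().translate(str.maketrans("", "", string.punctuation))
    return [tok for tok in cleaned.split() if tok not in STOP_WORDS]

doc1 = ("data science is a field to use scientific method, process, "
        "algorithm, system to extract knowledge.")
tokens = preprocess(doc1)
print(tokens)
```

The same function can be applied to Doc 2 and Doc 3. Whether to also apply stemming is a design choice the task does not state.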

Problem 1.2: Create a merged inverted list including the within-document frequencies for each term.
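One possible way to build such a list is sketched below, assuming the documents have already been tokenized and cleaned. The toy documents and the name `build_inverted_list` are hypothetical and are not the assignment's data.

```python
from collections import Counter, defaultdict

def build_inverted_list(docs):
    """Map each term to a list of (doc_id, within-document frequency) pairs."""
    index = defaultdict(list)
    for doc_id in sorted(docs):
        # Counter gives the number of times each term occurs in this document.
        for term, freq in Counter(docs[doc_id]).items():
            index[term].append((doc_id, freq))
    # Sort terms alphabetically, as in a merged inverted list.
    return dict(sorted(index.items()))

# Hypothetical, already-preprocessed toy documents.
toy_docs = {
    1: ["web", "search", "engine"],
    2: ["web", "page", "web"],
}
inverted = build_inverted_list(toy_docs)
for term, postings in inverted.items():
    print(term, postings)
```

Here "web" gets the postings (1, 1) and (2, 2), because it occurs once in the first toy document and twice in the second.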

Problem 1.3: Use the index created as above to create a dictionary and the related posting file.
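The sketch below shows one common layout, which is an assumption rather than the layout the unit necessarily expects: each dictionary entry holds the term's document frequency and the offset of its first posting, and the posting file is a flat list of (doc_id, frequency) pairs. The input index and the name `split_index` are hypothetical.

```python
def split_index(inverted):
    """Split an inverted list into a dictionary and a flat posting file.

    Each dictionary entry is (document frequency, offset of first posting).
    """
    dictionary, posting_file = {}, []
    for term in sorted(inverted):
        dictionary[term] = (len(inverted[term]), len(posting_file))
        posting_file.extend(inverted[term])
    return dictionary, posting_file

# Hypothetical inverted list: term -> [(doc_id, within-document frequency)].
inverted = {"search": [(1, 1)], "web": [(1, 1), (2, 2)]}
dictionary, posting_file = split_index(inverted)
print(dictionary)
print(posting_file)
```

To look up a term, read its entry (df, offset) from the dictionary and take `posting_file[offset:offset + df]`.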

Problem 1.4: Please design three Boolean queries (e.g., web AND search) and list the relevant documents for each query. Each query must contain at least two keywords, and no keyword may appear in only one document.
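Boolean queries over an inverted index reduce to set operations on the posting lists. The following is a minimal sketch with hypothetical postings and illustrative helper names, not the assignment's terms.

```python
# Hypothetical postings: term -> set of document IDs containing the term.
postings = {"web": {1, 2}, "search": {1, 3}, "engine": {2, 3}}

def docs_for(term):
    """Return the documents containing a term (empty set if unseen)."""
    return postings.get(term, set())

def q_and(a, b):
    return docs_for(a) & docs_for(b)   # intersection

def q_or(a, b):
    return docs_for(a) | docs_for(b)   # union

def q_and_not(a, b):
    return docs_for(a) - docs_for(b)   # difference

print(sorted(q_and("web", "search")))
print(sorted(q_or("web", "engine")))
print(sorted(q_and_not("search", "engine")))
```

Each toy term appears in two documents, which mirrors the constraint that no keyword may appear in only one document.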

Problem 1.5: Please use the Vector model to query the inverted index, and compare the result with that of the Boolean model. (Hint: you can use cosine similarity and set a similarity threshold.)
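As a rough illustration of the hint, the sketch below scores documents against a query with cosine similarity. It assumes raw term frequencies as weights (tf-idf weighting may be what the unit expects), and the toy documents, the query, and the threshold of 0.7 are all hypothetical.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two term-frequency vectors (Counters)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

# Hypothetical, already-preprocessed toy documents and query.
toy_docs = {1: ["web", "search", "engine"], 2: ["web", "page", "web"]}
query = ["web", "search"]

scores = {doc_id: cosine(Counter(query), Counter(tokens))
          for doc_id, tokens in toy_docs.items()}
threshold = 0.7  # assumed cut-off; the task leaves the value to you
hits = [doc_id for doc_id, score in scores.items() if score >= threshold]
print(scores, hits)
```

Unlike the Boolean query web AND search, which either matches a document or not, the vector model ranks every document. The second toy document still receives a nonzero score because it shares the term "web" with the query.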

Problem 2 (IR Evaluation)

In this Problem, you are required to evaluate the performance of different search engines. First, please find two search engines you are familiar with, such as Google, Bing, Yahoo!, etc.

Second, please choose one target from the following list, and design two queries to search in both search engines. So both query 1 and query 2 have to be tested in both search engines.

Target 1: obtain the new features of the new iPad.
Target 2: obtain the user manual for installing Tera Term.
Target 3: obtain a tutorial on how to install Oracle SQL.
Target 4: obtain the features of the new Xbox One.

Third, select the first 20 results in both search engines. If a result returns the target, mark it as a relevant document; otherwise, mark it as irrelevant. We can assume there are 12 relevant documents in total (retrieved and not retrieved). If you think there are more relevant documents to be searched, you can use a higher expected relevance as the threshold.

The following Problems are based on your search results.

Problem 2.1: List your target, results and designed search queries (you can use any keywords you think are related to the target). For each result, you can click the link, go to the page, and take a screenshot if you think the result is relevant. In your report, you are required to provide the screenshots and a detailed explanation of why they are relevant to your queries.

Problem 2.2: Get the precision and recall values for 20 documents for query 1 in search engine 1. Interpolate them to 11 standard recall levels. Then plot them into a chart. Get the precision and recall values for 20 documents for query 1 in search engine 2. Interpolate them to 11 standard recall levels. Then plot them into a chart.
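The calculation can be sketched as follows, assuming the usual definition of interpolated precision (the highest precision observed at any recall greater than or equal to the given level). The relevance judgments and the total of 4 relevant documents are hypothetical toy values; in the task itself there are 20 ranked results and 12 relevant documents.

```python
def precision_recall(relevance, total_relevant):
    """Return (recall, precision) after each rank of a ranked result list."""
    points, hits = [], 0
    for rank, is_relevant in enumerate(relevance, start=1):
        hits += int(is_relevant)
        points.append((hits / total_relevant, hits / rank))
    return points

def interpolate_11(points):
    """Interpolated precision at recall levels 0.0, 0.1, ..., 1.0."""
    levels = [i / 10 for i in range(11)]
    return [max((p for r, p in points if r >= level), default=0.0)
            for level in levels]

# Hypothetical judgments for the top 5 results (True = relevant).
relevance = [True, False, True, False, False]
points = precision_recall(relevance, total_relevant=4)
curve = interpolate_11(points)
print(points)
print(curve)
```

Recall levels that the ranking never reaches receive an interpolated precision of 0. The resulting 11 values can then be plotted against the recall levels with any charting tool.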

Problem 2.3: Get the precision and recall values for 20 documents for query 2 in search engine 1. Interpolate them to 11 standard recall levels. Then plot them into the same chart as above. Get the precision and recall values for 20 documents for query 2 in search engine 2. Interpolate them to 11 standard recall levels. Then plot them into the same chart as above.

Problem 2.4: Now find the average interpolated precision of query 1 and query 2 for search engine 1 and plot it into the same chart, so that you have a total of three interpolated curves in one single chart. Then find the average interpolated precision of query 1 and query 2 for search engine 2 and plot it into the same chart, so that you again have a total of three interpolated curves in one single chart.
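Averaging is done level by level across the queries' interpolated curves. A minimal sketch, with two hypothetical 11-point curves standing in for query 1 and query 2:

```python
def average_curves(curves):
    """Average several 11-point interpolated precision curves level by level."""
    return [sum(values) / len(values) for values in zip(*curves)]

# Hypothetical interpolated precision at recall 0.0, 0.1, ..., 1.0.
query1 = [1.0] * 3 + [0.5] * 3 + [0.0] * 5
query2 = [0.5] * 6 + [0.0] * 5

average = average_curves([query1, query2])
print(average)
```

The averaged curve for each search engine is what Problem 2.5 asks you to place on a single chart for comparison.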

Problem 2.5: Plot the average interpolated values for Search Engine 1 and Search Engine 2 on one single chart, and compare the algorithms in terms of precision and recall. Which search engine do you think is superior? Why?

Attachment:- Information Retrieval Techniques.rar

