CE706 Information Retrieval Assignment

Reference no: EM132484247

CE706 - Information Retrieval - University of Essex

Assignment: Elasticsearch & Evaluation

Imagine you have just finished university and started a job with an organisation that is in desperate need of a new search engine that allows employees to search the document collection at hand. This is your chance to shine!

Provide a report demonstrating the different stages of development.

The Task

The idea of this assignment is that you take the information retrieval knowledge you acquired during this term and put it into practice. You are already familiar with Elasticsearch. You also know the processing steps that turn documents into a structured index, the commonly applied retrieval models, and the key evaluation approaches employed in IR. Now is a good time to put it all together.

Crowdsourcing mechanisms are now routinely used to label data for information retrieval and other computational intelligence applications. As part of your assignment, you will take part in a crowdsourcing task that aims to predict the memorability of short videos. You will need to complete two tasks in the labs during Week 25. The first task lasts 25 minutes and the second lasts 20 minutes. Note that the second task must be completed between 24 and 72 hours after you complete the first.

Indexing

The Signal Media One Million News Articles Dataset is a collection of news articles from a variety of sources that has been made available to the research community. Your first step is to obtain the dataset and load it into Elasticsearch. If you run into problems using the upload script provided, feel free to use your own approach. You may also want to start by loading a small sample of documents before using the full collection.
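If you write your own loader, one option is Elasticsearch's _bulk API, which takes newline-delimited JSON: an action line naming the target index and document ID, followed by the document itself, with a final trailing newline. The sketch below assumes the dataset file is in JSON Lines format (one article per line) and that each article has an "id" field; the index name signalmedia is hypothetical, and this is not the upload script provided with the assignment.

```python
import json
from itertools import islice

# Hypothetical index name; choose your own.
INDEX_NAME = "signalmedia"


def to_bulk_ndjson(lines, index=INDEX_NAME, limit=None):
    """Convert JSON Lines article records into an Elasticsearch _bulk request body.

    Assumes one JSON article per line with an "id" field (check this against
    your copy of the dataset). Each article becomes two lines: an action line
    and the document source. `limit` caps the number of input lines read
    (blank lines included), which is handy for loading a small sample first.
    """
    out = []
    for line in islice(lines, limit):
        line = line.strip()
        if not line:
            continue  # skip blank lines
        doc = json.loads(line)
        action = {"index": {"_index": index, "_id": doc["id"]}}
        out.append(json.dumps(action))
        out.append(json.dumps(doc))
    # The bulk API requires the body to end with a newline.
    return "\n".join(out) + "\n"
```

For example, passing an open file handle with limit=1000 and POSTing the result to the /_bulk endpoint with the header Content-Type: application/x-ndjson loads a small sample. For the full collection, send the articles in batches rather than as one huge request.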

Searching

Once you have indexed the collection, you will want to be able to search it. You can do that on the command line, but it would be much better to have an interactive system. You could start with Kibana, but you are free to use other open-source tools for your GUI. Note that the collection is provided in JSON format and each article contains several fields. Make sure that users can choose which field to search (note that one of the fields is the article's publication date).
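Whatever GUI you choose, letting the user pick a field ultimately means generating a different Elasticsearch Query DSL body. A minimal sketch, assuming the articles have title, content, source and published fields (verify these against your own mapping): text fields get a match query, while the publication date gets a range query, which only behaves as a date filter if published is mapped as a date.

```python
# Assumed field names; adjust to the fields in your own index mapping.
TEXT_FIELDS = {"title", "content", "source"}
DATE_FIELD = "published"


def build_query(field, text=None, date_from=None, date_to=None, size=10):
    """Build a Query DSL body for a user-chosen field.

    Text fields use a `match` query; the date field uses a `range` query
    with optional inclusive bounds given as ISO-8601 date strings.
    """
    if field in TEXT_FIELDS:
        if not text:
            raise ValueError("a text query needs search terms")
        query = {"match": {field: text}}
    elif field == DATE_FIELD:
        bounds = {}
        if date_from:
            bounds["gte"] = date_from
        if date_to:
            bounds["lte"] = date_to
        if not bounds:
            raise ValueError("a date query needs at least one bound")
        query = {"range": {field: bounds}}
    else:
        raise ValueError(f"unsupported field: {field}")
    return {"query": query, "size": size}
```

The returned dictionary can be serialised with json.dumps and sent as the body of a search request against your index.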

Building a Test Collection

Imagine you would like to explore which search engine settings are most suitable for the collection you are indexing, in order to make search as effective as possible. To start, devise a small test collection that contains a number of queries together with their expected results. Identify ten specific events covered by the collection, and then compose some sample queries that a user might reasonably submit to find documents about each event.
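A test collection boils down to the queries plus, for each query, the documents judged relevant. One common way to store the judgements is the TREC qrels format read by trec_eval, one judgement per line: query ID, an iteration column (conventionally 0), document ID and relevance grade. The parser below is a sketch under that assumption; the IDs in the example are placeholders, not real documents.

```python
from collections import defaultdict


def parse_qrels(lines):
    """Parse TREC-style qrels lines: '<query_id> <iteration> <doc_id> <relevance>'.

    Returns a dict mapping each query ID to the set of document IDs with
    relevance > 0. The iteration column is ignored. Queries with no relevant
    documents do not appear in the result.
    """
    relevant = defaultdict(set)
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        qid, _iteration, doc_id, rel = parts
        if int(rel) > 0:
            relevant[qid].add(doc_id)
    return dict(relevant)
```

Keeping the queries themselves in a separate mapping from query ID to query text makes it easy to rerun the whole set against different system configurations.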

Evaluation

Once you have a test collection, you can explore different search engine settings to see what effect they have on the evaluation results. To do that, you need to choose a suitable metric (MAP, for example). You can then vary different parameters. For example, you could change the pre-processing pipeline by comparing a system that uses stemming with one that does not; however, this will require you to re-index the collection. Alternatively, you might compare different retrieval models, such as Boolean retrieval versus TF.IDF.
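MAP averages, over all queries, the average precision (AP) of each ranked result list. AP takes the precision at every rank where a relevant document appears and divides their sum by the total number of relevant documents for that query, so relevant documents that are never retrieved count as zero. A minimal sketch, assuming each ranking contains no duplicate document IDs:

```python
def average_precision(ranked_ids, relevant_ids):
    """Average precision of one ranked result list (no duplicate IDs assumed)."""
    if not relevant_ids:
        return 0.0
    hits = 0
    total = 0.0
    for k, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            total += hits / k  # precision at rank k
    return total / len(relevant_ids)


def mean_average_precision(runs, qrels):
    """MAP over every query in `qrels`.

    `runs` maps query ID -> ranked list of document IDs returned by the system;
    `qrels` maps query ID -> set of relevant document IDs. A query missing
    from `runs` scores 0.
    """
    if not qrels:
        return 0.0
    scores = [average_precision(runs.get(qid, []), rel) for qid, rel in qrels.items()]
    return sum(scores) / len(scores)
```

Running the same queries against, say, a stemmed and an unstemmed index and comparing the two MAP values is one way to carry out the comparison described above. With only ten topics, small differences in MAP should be interpreted with caution.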

Engineering a Complete System

The final system should integrate and control all of the individual components, so that the end result is a complete search engine.
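One possible way to keep control over all components is a single configuration object from which the index settings, and hence the pre-processing pipeline, are derived. The sketch below is a hypothetical design, not a prescribed one: the class, field and index names are illustrative, while lowercase, stop and porter_stem are built-in Elasticsearch token filters combined here in a custom analyzer. Each configuration gets its own index name, so that, for example, a stemmed and an unstemmed index can exist side by side for evaluation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SearchConfig:
    """Hypothetical single source of truth for the indexing pipeline."""
    index: str = "signalmedia"  # hypothetical base index name
    stemming: bool = True       # apply Porter stemming to text fields
    stopwords: bool = False     # remove English stop words

    def index_name(self):
        """Distinct index per configuration, so variants can be compared."""
        name = f"{self.index}-{'stem' if self.stemming else 'nostem'}"
        if self.stopwords:
            name += "-stop"
        return name

    def index_settings(self):
        """Create-index body implementing the chosen analysis chain."""
        filters = ["lowercase"]
        if self.stopwords:
            filters.append("stop")  # defaults to the English stop word list
        if self.stemming:
            filters.append("porter_stem")
        analyzer = {"type": "custom", "tokenizer": "standard", "filter": filters}
        text = {"type": "text", "analyzer": "ir_text"}
        return {
            "settings": {"analysis": {"analyzer": {"ir_text": analyzer}}},
            "mappings": {
                "properties": {
                    "title": text,
                    "content": text,
                    "published": {"type": "date"},
                }
            },
        }
```

Because the analyzer is fixed when an index is created, changing stemming or stop word removal means creating a new index from these settings and re-indexing, which is exactly the cost mentioned in the Evaluation section.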

Crowdsourcing

This mark will only be awarded if you complete both crowdsourcing tasks. More details on the tasks will be provided in the labs on the designated dates.

You will have noticed that the percentages above only add up to 80%. The remaining 20% will come from a report that describes your work. The report should contain:
• Instructions for running your system
• Screenshots illustrating the functionality you have implemented
• Design and design decisions of your overall architecture
• A description of the document collection you have chosen
• The actual ground truth data that make up your test collection (i.e. queries with their matching documents)
• A short description and motivation of your evaluation methodology
• Evaluation results
• Discussion of your solution, focussing on the functionality implemented and possible improvements and extensions
• Short description of your crowdsourcing experience, e.g. lessons learned, how to improve the user experience, etc.
The report need not be long, provided it addresses all of the above points.

You may work in pairs. Both members of a pair will get the same mark unless there is reason to do otherwise, except for the 20% of marks assigned for crowdsourcing, which will be assigned individually based only on completion of both tasks. If you work in a pair, please make sure that you both submit the assignment and that both submissions are identical.

Attachment: Information Retrieval.rar


All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd