Building a Search Engine with Elasticsearch

Reference no: EM132252625

Information Retrieval Assignment - Elasticsearch & Evaluation

Assignment Task - This assignment involves building a search engine with Elasticsearch as the backend. Either of two document collections may be used: the Shakespeare dataset from the Elasticsearch tutorial, or the University of Essex website.

The Context of your Task - Imagine you have just finished university and started a job with an organisation that is in desperate need of a new search engine that allows employees to search the document collection at hand. This is your chance to shine!

The Task - The idea of this assignment is that you apply the information retrieval knowledge you acquired during this term and put it into practice. You are already familiar with Elasticsearch. You also know the processing steps that turn documents into a structured index, the commonly applied retrieval models, and the key evaluation approaches employed in IR. Now is a good time to put it all together.

Specific Requirements - This assignment has two parts: the Implementation and the Documentation.

1. The implementation involves five stages:

  • Indexing
  • Searching
  • Building a Test Collection
  • Evaluation
  • Engineering a Complete System

The code for the assignment should be well commented, with comments describing each stage, and each stage should be clearly labelled in the code.

This assignment comes in stages. You may choose not to attempt some stages.

The stages are as follows:

  • Indexing - The first step is to identify a document collection of your choice that you want to index. You might want to consider the Shakespeare dataset from the Elasticsearch tutorial, as it is mainly text but also has some internal structure - a perfect use case for Elasticsearch. A perhaps more interesting approach would be to index a website such as that of the University of Essex. In that case you will need to employ a crawler that collects the data and passes it to Elasticsearch. One possibility is to use Apache Nutch. Alternatively, you could use the Python Scrapy framework, the GNU wget tool, or tap into an RSS feed using Logstash, which is part of the same software stack as Elasticsearch. Note that a largely relational database, such as the dataset you used in the first lab, is not appropriate for this assignment as it does not offer much scope to explore the different text processing steps.
  • Searching - Once you have indexed the collection you want to be able to search it. You can do that on the command line but it would be much better to have an interactive system. You could start with Kibana, but you are free to use other open source tools for your GUI.
  • Building a Test Collection - Suppose you want to find out which search engine settings make search most effective for the collection you are indexing. To start, devise a small test collection of ten unique queries together with their expected results (for simplicity, you can identify a single document in the collection that is relevant to each test query). If you use the University of Essex example suggested earlier, you might want to use some of the frequent queries that I presented to you in last week's lecture and then identify relevant matching documents. Make sure your queries are as representative as possible of the chosen collection; in other words, they should be queries that one would expect users of that collection to submit.
  • Evaluation - Once you have a test collection you can explore different search engine settings to see what effect they have on the evaluation results. To do that you need to identify a suitable metric (MAP, for example). You can then vary different parameters. You could for example change the pre-processing pipeline by comparing a system that uses stemming with one that does not. However, this will require you to re-index the collection. Alternatively, you might want to try different retrieval models such as Boolean versus TF.IDF.
  • Engineering a Complete System - The final system should tie all the individual components together under one point of control, so that the end result is a complete search engine.
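
The Test Collection and Evaluation stages can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a prescribed solution: the queries, the document IDs (doc_12 and so on) and the toy_search function are all hypothetical placeholders. In a real system, the search function would send the query to Elasticsearch and return the IDs of the hits in ranked order, and the test collection would hold your ten queries. The sketch assumes each document ID appears at most once in a ranking. Note that when every query has exactly one relevant document, as the brief suggests, average precision for a query is simply 1/rank of that document.

```python
from typing import Callable, Dict, List, Set


def average_precision(ranked_ids: List[str], relevant_ids: Set[str]) -> float:
    """Average precision for one query.

    Sums precision@k at every rank k where a relevant document appears,
    then divides by the total number of relevant documents.
    """
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(relevant_ids)


def mean_average_precision(
    search_fn: Callable[[str], List[str]],
    test_collection: Dict[str, Set[str]],
) -> float:
    """Mean of the per-query average precision over the whole test collection."""
    if not test_collection:
        return 0.0
    scores = [
        average_precision(search_fn(query), relevant)
        for query, relevant in test_collection.items()
    ]
    return sum(scores) / len(scores)


# --- Stage: Building a Test Collection (hypothetical queries and IDs) ---
test_collection: Dict[str, Set[str]] = {
    "library opening hours": {"doc_12"},
    "term dates": {"doc_7"},
}

# --- Stage: Evaluation (toy stand-in for a real Elasticsearch query) ---
canned_results: Dict[str, List[str]] = {
    "library opening hours": ["doc_12", "doc_3"],   # relevant doc at rank 1
    "term dates": ["doc_1", "doc_7", "doc_9"],      # relevant doc at rank 2
}


def toy_search(query: str) -> List[str]:
    """Placeholder: returns a fixed ranking instead of querying a search engine."""
    return canned_results.get(query, [])


map_score = mean_average_precision(toy_search, test_collection)
print(f"MAP = {map_score:.2f}")  # (1/1 + 1/2) / 2 = 0.75
```

To compare two configurations (for example, an index built with stemming and one without), one could run mean_average_precision once per configuration with a search function pointed at the corresponding index, and report both scores side by side.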

2. The documentation should be written in Microsoft Word and should include the following:

  • Instructions for running your system
  • Screenshots illustrating the functionality you have implemented
  • Design and design decisions of your overall architecture
  • A description of the document collection you have chosen
  • The actual ground truth data that make up your test collection (i.e. queries with their matching documents)
  • A short description and motivation of your evaluation methodology
  • Evaluation results
  • Discussion of your solution focusing on functionality implemented and possible improvements and extensions.

The report should present these items in the order listed above.

The report does not need to be long as long as it addresses all the above points.

Attachment: Assignment Files.rar

Course: CE306 - Information Retrieval Assignment - Elasticsearch & Evaluation, University of Essex, UK.

Working in Pairs - You may work in pairs. Both members of a pair will get the same mark unless there is reason to do otherwise. If you do work in a pair, please make sure that you both submit the assignment and that the two submissions are identical.

Software - The backend search engine to be used is Elasticsearch. Apart from that, you are free to write code in any language of your choice and employ any open source tool that you find suitable.

Submission - The assignment, which counts for 20% of the overall mark, should be submitted as a single zip file via the electronic submission system. The guidelines about late assignments are explained in the students’ handbook.
