CE706 Information Retrieval Assignment

Reference no: EM132801237

CE706 Information Retrieval - University of Essex

Scenario: In response to the COVID-19 pandemic, the White House and a coalition of leading research groups have prepared the COVID-19 Open Research Dataset (CORD-19). CORD-19 is a resource of over 181,000 scholarly articles, including over 80,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses. This freely available dataset is provided to the global research community to apply recent advances in information retrieval and other AI techniques to generate new insights in support of the ongoing fight against this infectious disease. These approaches are increasingly urgent because the rapid growth of new coronavirus literature makes it difficult for the medical research community to keep up.

Your task

This task comes in stages. Marks are given for each stage. The stages are as follows:
• Indexing (20%) - The first step is to obtain the dataset. Once you have done so, upload a sample of 1000 articles with full text to Elasticsearch (the simplest approach is to use the first 1000 documents). You will work with the metadata.csv file provided by the challenge.
• Sentence Splitting, Tokenization and Normalization - The next step is to transform the input text into a normal form of your choice. This should include the identification of sentences, bullet points and cells in tables.
• Selecting Keywords - One aim of your system is to identify the words and phrases in the text that are most useful for indexing purposes. Your system should remove words that are not "useful", e.g. very frequent words or stopwords. You should also identify phrases suitable as index terms. Apply tf.idf as part of your selection and weighting step.
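
The splitting, tokenization and keyword-weighting stages above can be sketched in plain Python. This is a minimal illustration, not a prescribed solution: the stopword list, the regular expressions and all function names are assumptions made for the example, the weighting is the basic tf * log(N / df) variant of tf.idf, and phrase detection and table or bullet handling are left out.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; a real system needs a fuller one).
STOPWORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "are", "for", "on", "with"}


def split_sentences(text):
    """Naive splitter: break on whitespace that follows '.', '!' or '?'."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def tokenize(text):
    """Lowercase the text and keep alphanumeric runs, allowing internal hyphens."""
    return re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per document, with weight = tf * log(N / df)."""
    tokenized = [[t for t in tokenize(d) if t not in STOPWORDS] for d in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append({t: c * math.log(n_docs / df[t]) for t, c in tf.items()})
    return weights


def top_keywords(doc_weights, k=3):
    """Pick the k highest-weighted terms; ties are broken alphabetically."""
    ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]
```

A term that occurs in every document gets weight 0 under this formula, which is one simple way to drop "very frequent" words without listing them by hand.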

• Stemming or Morphological Analysis - Writing word stems to the database rather than words allows various inflected forms of a word to be treated in the same way, e.g. bus and busses refer to exactly the same thing even though they are different words.
• Searching (10%) - Once you have indexed the collection, you want to be able to search it. You can do that on the command line, but it would be much better to have an interactive system. You could start with Kibana for that, but you are free to use other open source tools for your Graphical User Interface (GUI). Note that each article in the collection contains different fields. Make sure that a user can decide which field to search (Hint: one of the fields is the publication date of the article).
• Engineering a Complete System - The final system should allow a user to have control over all the individual components, so in the final result we will have a complete search engine, not disparate code.
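
As a rough sketch of the indexing and searching stages, the snippet below turns rows of the metadata file into bulk-style index actions and builds a query body restricted to one user-chosen field. The index name, the function names and the column names (cord_uid, title, abstract, publish_time, pdf_json_files, pmc_json_files) are assumptions for illustration and should be checked against the actual header of the file. Nothing here contacts a server: the dictionaries follow the action format accepted by the bulk helper of the official Python Elasticsearch client and the standard match query, and would be passed to a client separately.

```python
import csv

INDEX_NAME = "cord19-sample"  # hypothetical index name


def bulk_actions(csv_file, limit=1000, index=INDEX_NAME):
    """Yield one index action per metadata row that points to a full-text file."""
    count = 0
    for row in csv.DictReader(csv_file):
        # Assumed column names; rows without any full-text reference are skipped.
        if not (row.get("pdf_json_files") or row.get("pmc_json_files")):
            continue
        yield {
            "_index": index,
            "_id": row["cord_uid"],
            "_source": {
                "title": row.get("title") or "",
                "abstract": row.get("abstract") or "",
                "publish_time": row.get("publish_time") or "",
            },
        }
        count += 1
        if count >= limit:
            break  # stop after the first `limit` full-text articles


def field_query(field, text, size=10):
    """Build a search body that matches `text` against a single chosen field."""
    return {"size": size, "query": {"match": {field: text}}}
```

For example, field_query("title", "vaccine") searches titles only, while passing "abstract" as the field searches abstracts instead, which is one way to let the user decide which field to search.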
You will have noticed that the percentages above only add up to 80%. This is because one of the important aspects of the project is that your work should be well documented and your code well commented. 20% of your mark will come from this. The report should contain:
• Instructions for running your system
• Screenshots illustrating the functionality you have implemented
• Design and design decisions/justifications of your overall architecture
• A description of the document collection you have chosen
• Discussion of your solution focussing on functionality implemented and possible improvements and extensions.

Attachment:- Information Retrieval.rar

Attachment:- Metadata.rar

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd