How you download your dataset and index it

Reference no: EM132812912

Topic- CORD-19

This task comes in stages. Marks are given for each stage. The stages are as follows:

• Indexing: The first step is to obtain the dataset. Once you have done so, upload a sample of 1000 articles with full text to Elasticsearch (the simplest approach is to use the first 1000 documents). You will work with the metadata.csv file provided by the challenge.
• Sentence Splitting, Tokenization and Normalization: The next step is to transform the input text into a normal form of your choice. This should include the identification of sentences, bullet points and cells in tables.
• Selecting Keywords: One aim of your system is to identify the words and phrases in the text that are most useful for indexing purposes. Your system should remove words that are not "useful", e.g. very frequent words or stopwords. You should also identify phrases suitable as index terms. Apply tf.idf as part of your selection and weighting step.

• Stemming or Morphological Analysis: Writing word stems to the database rather than words allows the system to treat the various inflected forms of a word in the same way, e.g. bus and busses refer to exactly the same thing even though they are different words.
• Searching: Once you have indexed the collection you want to be able to search it. You can do that on the command line, but it would be much better to have an interactive system. You could start with Kibana for that, but you are free to use other open source tools for your Graphical User Interface (GUI). Note that each article in the collection contains different fields. Make sure that a user can decide which field to search (Hint: one of the fields is the publication date of the article).
• Engineering a Complete System: The final system should allow a user to have control over all the individual components, so that the final result is a complete search engine, not disparate code.
You will have noticed that the percentages above only add up to 80%. This is because one of the important aspects of the project is that your work should be well documented and your code well commented. 20% of your mark will come from this. The report should contain:
• Instructions for running your system
• Screenshots illustrating the functionality you have implemented
• Design and design decisions/justifications of your overall architecture
• A description of the document collection you have chosen
• Discussion of your solution focussing on functionality implemented and possible improvements and extensions.

Assignment 1
Instructions for running your system (Engineering a Complete System)
Include here instructions to run your system and control each individual component. You may include screenshots to clarify.

Indexing
Include here the details of how you downloaded your dataset and indexed it, including any issues that you had and how you dealt with them. Explain which documents you selected for your experiments. You may include screenshots to clarify.
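As a starting point, the following minimal sketch shows one way to pick the first 1000 metadata rows that have full text and turn them into a request body for the Elasticsearch bulk endpoint. It is not a prescribed solution. The column names (cord_uid, title, abstract, publish_time, pdf_json_files, pmc_json_files), the index name cord19 and the sample rows are assumptions made for illustration; check them against your copy of metadata.csv.

```python
import csv
import io
import json
from itertools import islice


def rows_with_full_text(csv_file, limit=1000):
    """Return up to `limit` metadata rows that reference a full-text JSON file."""
    reader = csv.DictReader(csv_file)
    # Assumed column names: pdf_json_files / pmc_json_files hold full-text paths.
    has_text = (
        row for row in reader
        if row.get("pdf_json_files") or row.get("pmc_json_files")
    )
    return list(islice(has_text, limit))


def to_bulk_ndjson(rows, index_name="cord19"):
    """Build a newline-delimited JSON body for the Elasticsearch _bulk endpoint."""
    lines = []
    for row in rows:
        # Action line: which index and document id to write to.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": row["cord_uid"]}}))
        # Source line: the document itself (the parsed full text would be merged in here).
        lines.append(json.dumps({
            "title": row["title"],
            "abstract": row["abstract"],
            "publish_time": row["publish_time"],
        }))
    return "\n".join(lines) + "\n"  # a bulk body must end with a newline


# Tiny in-memory stand-in for metadata.csv (made-up rows) so the sketch runs anywhere.
sample = io.StringIO(
    "cord_uid,title,abstract,publish_time,pdf_json_files,pmc_json_files\n"
    "a1,Paper one,Abstract one,2020-03-01,document_parses/pdf_json/x.json,\n"
    "b2,Paper two,Abstract two,2020-04-01,,\n"
)
rows = rows_with_full_text(sample, limit=1000)
body = to_bulk_ndjson(rows)
print(body)
```

With the real file, replace the in-memory sample with open("metadata.csv", newline="", encoding="utf-8"). The resulting body can be sent to Elasticsearch with any HTTP client or with the official Python client's bulk helpers.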

Sentence Splitting, Tokenization and Normalization
Include here the details of how you did this step, including any issues that you had and how you dealt with them. Present examples for each of the aspects where this step went well. Also include examples of where it went wrong and how you could solve it. You may include screenshots to clarify.
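To illustrate the kind of example this section asks for, here is a small sketch of a rule-based sentence splitter and tokenizer using only the Python standard library. The regular expressions and the choice of lowercasing plus Unicode NFKC as the "normal form" are assumptions made for this sketch, not requirements of the task. It also shows a typical failure: the abbreviation "Dr." is wrongly treated as a sentence end.

```python
import re
import unicodedata

# Split after ., ! or ? when followed by whitespace and an uppercase letter or digit.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")
# A token is a run of letters/digits, optionally joined by internal hyphens.
TOKEN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def split_sentences(text):
    """Split text into sentences with a simple punctuation rule."""
    return [s.strip() for s in SENTENCE_END.split(text) if s.strip()]


def normalize(text):
    """Map text to a normal form: Unicode NFKC, then lowercase."""
    return unicodedata.normalize("NFKC", text).lower()


def tokenize(sentence):
    """Return the normalized tokens of one sentence."""
    return TOKEN.findall(normalize(sentence))


good = split_sentences("The virus spreads fast. Masks help.")
bad = split_sentences("Dr. Smith tested it.")  # splits after the abbreviation
tokens = tokenize("SARS-CoV-2 spreads.")
print(good, bad, tokens)
```

A list of known abbreviations, or a trained sentence splitter from an NLP library, is one way to address the "Dr." case. Bullet points and table cells need their own rules, for example treating each line that starts with a bullet marker as one unit.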

Selecting Keywords
Include here the details of how you did this step, including any issues that you had and how you dealt with them. Present examples for each of the aspects where this step went well. Also include examples of where it went wrong and how you could solve it. You may include screenshots to clarify.
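The weighting step can be sketched as follows. This is one common tf.idf variant (relative term frequency multiplied by the natural log of N over document frequency); other variants are equally acceptable. The stopword list and the three toy documents are made up for illustration.

```python
import math
from collections import Counter

# Deliberately tiny stopword list, for illustration only.
STOPWORDS = {"the", "of", "and", "in", "a", "is", "to"}


def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document."""
    filtered = [[t for t in doc if t not in STOPWORDS] for doc in docs]
    # Document frequency: in how many documents does each term occur?
    df = Counter(term for doc in filtered for term in set(doc))
    n = len(filtered)
    weights = []
    for doc in filtered:
        tf = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n / df[term])
            for term, count in tf.items()
        })
    return weights


def top_keywords(doc_weights, k=3):
    """Return the k highest-weighted terms (ties broken alphabetically)."""
    ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


docs = [
    ["the", "virus", "spreads", "in", "bats"],
    ["the", "vaccine", "trial", "of", "the", "virus"],
    ["vaccine", "dose", "and", "vaccine", "schedule"],
]
weights = tf_idf(docs)
print(top_keywords(weights[0], k=2))
```

In the first toy document, "virus" scores lower than "bats" and "spreads" because it also occurs in the second document. Candidate phrases can be handled the same way by adding adjacent token pairs (bigrams) to each token list before weighting.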

Stemming or Morphological Analysis
Include here the details of how you did this step, including any issues that you had and how you dealt with them. Present examples for each of the aspects where this step went well. Also include examples of where it went wrong and how you could solve it. You may include screenshots to clarify.
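As an example of what "went well" and "went wrong" can look like here, the sketch below implements only a plural-stripping rule group modelled on step 1a of the Porter algorithm. It is a toy, not a full stemmer, and the function name stem_plural is made up for this illustration. It conflates "cats" with "cat" and "caresses" with "caress", but it maps "bus" and "busses" to different stems.

```python
def stem_plural(word):
    """Strip plural suffixes using four ordered rules (first match wins)."""
    w = word.lower()
    if w.endswith("sses"):   # caresses -> caress
        return w[:-2]
    if w.endswith("ies"):    # ponies -> poni
        return w[:-2]
    if w.endswith("ss"):     # caress -> caress (unchanged)
        return w
    if w.endswith("s"):      # cats -> cat
        return w[:-1]
    return w


for word in ["cats", "cat", "caresses", "caress", "ponies", "bus", "busses"]:
    print(word, "->", stem_plural(word))
```

Because "bus" becomes "bu" while "busses" becomes "buss", a query for one form would miss documents containing the other. A dictionary-based lemmatizer, or an exception list applied before the suffix rules, are two possible remedies to discuss in the report.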

Searching
Include here the details of how you did this step, including any issues that you had and how you dealt with them. You may include screenshots to clarify.
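Letting the user choose the field to search can be sketched as a small function that builds an Elasticsearch query body. The field names title, abstract, body_text and publish_time are assumptions about how the index was mapped; substitute the names used in your own index. The function only builds the JSON body and does not contact a server.

```python
import json

# Assumed text fields of the index; adjust to your own mapping.
SEARCHABLE_FIELDS = {"title", "abstract", "body_text"}


def build_query(field, text, date_from=None, date_to=None):
    """Build a query body: full-text match on one field, optional date range."""
    if field not in SEARCHABLE_FIELDS:
        raise ValueError(f"unknown field: {field}")
    query = {"bool": {"must": [{"match": {field: text}}]}}
    date_range = {}
    if date_from:
        date_range["gte"] = date_from
    if date_to:
        date_range["lte"] = date_to
    if date_range:
        # Filter clauses restrict the results without affecting relevance scores.
        query["bool"]["filter"] = [{"range": {"publish_time": date_range}}]
    return {"query": query}


body = build_query("title", "vaccine trial", date_from="2020-01-01", date_to="2020-06-30")
print(json.dumps(body, indent=2))
```

The same body can be pasted into the Kibana console or sent from a custom GUI, so a field drop-down and two date pickers are enough to expose these options to the user.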

