How you download your dataset and index

Reference no: EM132812912

Topic- CORD-19

This task comes in stages. Marks are given for each stage. The stages are as follows:

• Indexing: The first step is to obtain the dataset. Once you have it, upload a sample of 1000 articles with full text to Elasticsearch (the simplest option is to use the first 1000 documents). You will work with the metadata.csv file provided by the challenge.
• Sentence Splitting, Tokenization and Normalization: The next step is to transform the input text into a normal form of your choice. This should include the identification of sentences, bullet points and cells in tables.
• Selecting Keywords: One aim of your system is to identify the words and phrases in the text that are most useful for indexing purposes. Your system should remove words that are not "useful", e.g. very frequent words or stopwords. You should also identify phrases suitable as index terms. Apply tf.idf as part of your selection and weighting step.

• Stemming or Morphological Analysis: Writing word stems to the database rather than words allows you to treat various inflected forms of a word in the same way, e.g. bus and busses refer to exactly the same thing even though they are different words.
• Searching: Once you have indexed the collection, you want to be able to search it. You can do that on the command line, but it would be much better to have an interactive system. You could start with Kibana for that, but you are free to use other open source tools for your Graphical User Interface (GUI). Note that each article in the collection contains different fields. Make sure that a user can decide which field to search (Hint: one of the fields is the publication date of the article).
• Engineering a Complete System: The final system should allow a user to have control over all the individual components, so that the final result is a complete search engine, not disparate code.
You will have noticed that the percentages above only add up to 80%. This is because one of the important aspects of the project is that your work should be well documented and your code well commented. 20% of your mark will come from this. The report should contain:
• Instructions for running your system
• Screenshots illustrating the functionality you have implemented
• Design and design decisions/justifications of your overall architecture
• A description of the document collection you have chosen
• Discussion of your solution focussing on functionality implemented and possible improvements and extensions.

Assignment 1
Instructions for running your system (Engineering a Complete System)
Include here instructions to run your system and control each individual component. You may include screenshots to clarify.
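One way to give a user control over each component is a single command-line entry point with one subcommand per stage. The sketch below is only an illustration: the program name `cord19-search`, the subcommands, and every flag are hypothetical choices, not part of the assignment.

```python
import argparse


def build_parser():
    """Build a CLI with one subcommand per pipeline component (all names hypothetical)."""
    parser = argparse.ArgumentParser(prog="cord19-search")
    sub = parser.add_subparsers(dest="command", required=True)

    # "index": load metadata and push documents to the search backend.
    index = sub.add_parser("index", help="index a sample of the collection")
    index.add_argument("--metadata", default="metadata.csv")
    index.add_argument("--limit", type=int, default=1000)
    index.add_argument("--no-stemming", action="store_true",
                       help="switch the stemming component off")

    # "search": query one field of the indexed documents.
    search = sub.add_parser("search", help="search the indexed collection")
    search.add_argument("query")
    search.add_argument("--field", default="title")
    return parser


# Parse an explicit argument list so the example does not depend on sys.argv.
args = build_parser().parse_args(["index", "--limit", "50"])
print(args.command, args.metadata, args.limit, args.no_stemming)
```

The report's running instructions can then list one example invocation per subcommand.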

Indexing
Include here the details of how you downloaded your dataset and indexed it, including any issues you had and how you dealt with them. Explain which documents you selected for your experiments. You may include screenshots to clarify.
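As a rough illustration of the selection and upload step, the sketch below picks the first rows of metadata.csv that reference a full-text file and turns them into a body for the Elasticsearch bulk API. The column names (`cord_uid`, `title`, `abstract`, `publish_time`, `pdf_json_files`, `pmc_json_files`) are assumptions about the CSV layout and should be checked against your copy; the index name `cord19` and the sample rows are made up. It indexes metadata fields only; the full-text JSON files would need to be loaded separately.

```python
import csv
import io
import itertools
import json


def rows_with_full_text(csv_file, limit=1000):
    """Return an iterator over the first `limit` rows that reference a full-text file."""
    reader = csv.DictReader(csv_file)
    has_text = (
        row for row in reader
        # Assumed column names; check them against your copy of metadata.csv.
        if row.get("pdf_json_files") or row.get("pmc_json_files")
    )
    return itertools.islice(has_text, limit)


def to_bulk_ndjson(rows, index_name="cord19"):
    """Build an NDJSON body for the Elasticsearch _bulk endpoint."""
    lines = []
    for row in rows:
        # Action line, then the document source line.
        lines.append(json.dumps({"index": {"_index": index_name, "_id": row["cord_uid"]}}))
        lines.append(json.dumps({
            "title": row["title"],
            "abstract": row["abstract"],
            "publish_time": row["publish_time"],
        }))
    # The bulk API expects every line, including the last, to end with a newline.
    return "".join(line + "\n" for line in lines)


# Made-up sample standing in for metadata.csv.
SAMPLE = io.StringIO(
    "cord_uid,title,abstract,publish_time,pdf_json_files,pmc_json_files\n"
    "abc123,First paper,Some abstract,2020-03-01,document_parses/pdf_json/a.json,\n"
    "def456,No full text,Another abstract,2020-04-01,,\n"
)
payload = to_bulk_ndjson(rows_with_full_text(SAMPLE))
print(payload)
```

The resulting payload would be sent to the `_bulk` endpoint with the `Content-Type: application/x-ndjson` header; the same documents could instead be passed to a client library's bulk helper.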

Sentence Splitting, Tokenization and Normalization
Include here the details of how you did this step, including any issues you had and how you dealt with them. Present examples for each of the aspects where this step went well. Also include examples of where it went wrong and how you could solve it. You may include screenshots to clarify.
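A minimal sketch of this step using only regular expressions is shown below. The splitting rule (sentence-final punctuation followed by whitespace and a capital letter or digit), the bullet markers recognised, and the chosen normal form (NFKC folding plus lowercasing, keeping hyphenated tokens whole) are all illustrative assumptions; table cells are not handled here.

```python
import re
import unicodedata

# Leading bullet markers: "-", "*", "•", or a number followed by "." or ")".
BULLET = re.compile(r"^\s*(?:[-*\u2022]|\d+[.)])\s+")
# Split after ., ! or ? when the next unit starts with a capital letter or digit.
SENTENCE_END = re.compile(r"(?<=[.!?])\s+(?=[A-Z0-9])")
# Tokens are alphanumeric runs, optionally joined by hyphens (e.g. "covid-19").
TOKEN = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def split_units(text):
    """Split text into sentences and bullet items, one unit per list entry."""
    units = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if BULLET.match(line):
            # Treat the whole bullet as one unit and drop its marker.
            units.append(BULLET.sub("", line))
        else:
            units.extend(SENTENCE_END.split(line))
    return units


def normalize(unit):
    """Unicode-fold and lowercase a unit, then return its tokens."""
    folded = unicodedata.normalize("NFKC", unit).lower()
    return TOKEN.findall(folded)


TEXT = "COVID-19 spreads fast. Masks help.\n\u2022 Wash hands\n- Keep distance"
for unit in split_units(TEXT):
    print(unit, "->", normalize(unit))
```

This rule is easy to break, which makes it a convenient source of failure examples: `split_units("See Fig. 2 for details.")` wrongly splits after the abbreviation "Fig." because the next token is a digit. An abbreviation list or a trained sentence splitter would be one way to address that.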

Selecting Keywords
Include here the details of how you did this step, including any issues you had and how you dealt with them. Present examples for each of the aspects where this step went well. Also include examples of where it went wrong and how you could solve it. You may include screenshots to clarify.
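The sketch below shows one possible way to combine stopword removal, phrase candidates, and tf.idf weighting. The stopword list is a tiny placeholder, phrases are approximated by adjacent pairs of non-stopwords, and the weighting variant (relative term frequency times log(N/df)) is only one of several common tf.idf formulations; all of these are assumptions chosen for illustration.

```python
import math
from collections import Counter

# Tiny placeholder list; a real system would use a fuller stopword list.
STOPWORDS = frozenset({"the", "a", "an", "of", "and", "in", "to", "is", "are", "for", "on", "with"})


def candidate_terms(tokens):
    """Return non-stopword unigrams plus bigrams of adjacent non-stopwords."""
    terms = [t for t in tokens if t not in STOPWORDS]
    for left, right in zip(tokens, tokens[1:]):
        if left not in STOPWORDS and right not in STOPWORDS:
            terms.append(f"{left} {right}")
    return terms


def tfidf(docs):
    """docs is a list of token lists; return one {term: weight} dict per document."""
    term_lists = [candidate_terms(doc) for doc in docs]
    df = Counter()
    for terms in term_lists:
        df.update(set(terms))  # count each term once per document
    n = len(docs)
    weights = []
    for terms in term_lists:
        counts = Counter(terms)
        total = len(terms) or 1
        weights.append({t: (c / total) * math.log(n / df[t]) for t, c in counts.items()})
    return weights


def top_keywords(doc_weights, k=5):
    """Return the k highest-weighted terms, ties broken alphabetically."""
    ranked = sorted(doc_weights.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:k]]


DOCS = [
    ["the", "virus", "spreads", "in", "the", "air"],
    ["the", "vaccine", "trial", "is", "ongoing"],
    ["the", "virus", "vaccine", "is", "safe"],
]
for doc_weights in tfidf(DOCS):
    print(top_keywords(doc_weights, k=3))
```

With this variant a term that occurs in every document gets weight 0, so very frequent words drop out even if they are missing from the stopword list.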

Stemming or Morphological Analysis
Include here the details of how you did this step, including any issues you had and how you dealt with them. Present examples for each of the aspects where this step went well. Also include examples of where it went wrong and how you could solve it. You may include screenshots to clarify.
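To make the idea concrete, here is a toy suffix-stripping stemmer. It is a hypothetical rule set written for illustration, not the Porter algorithm or any library's implementation; an established stemmer or a morphological analyser would normally be used instead.

```python
# Ordered (suffix, replacement) rules; only the first matching suffix is tried.
SUFFIX_RULES = (
    ("sses", "ss"),
    ("ies", "y"),
    ("ing", ""),
    ("ed", ""),
    ("es", ""),
    ("s", ""),
)
MIN_STEM = 3  # never produce a stem shorter than this


def stem(word):
    """Strip one common English suffix, then undouble a trailing consonant pair."""
    word = word.lower()
    for suffix, replacement in SUFFIX_RULES:
        if word.endswith(suffix):
            candidate = word[: len(word) - len(suffix)] + replacement
            # Leave words such as "class" or "virus" alone and keep stems readable.
            if len(candidate) >= MIN_STEM and not word.endswith(("ss", "us")):
                word = candidate
            break
    # "buss" -> "bus", "runn" -> "run"
    if len(word) > MIN_STEM and word[-1] == word[-2] and word[-1] not in "aeiou":
        word = word[:-1]
    return word


for w in ("bus", "busses", "studies", "infected", "infecting", "news"):
    print(w, "->", stem(w))
```

Under these rules "bus" and "busses" both map to "bus", and "infected" and "infecting" both map to "infect". The rules also overstem: "news" becomes "new", conflating two unrelated words, which is the kind of failure case this section of the report asks for.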

Searching
Include here the details of how you did this step, including any issues you had and how you dealt with them. You may include screenshots to clarify.
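Letting the user choose the field to search can be sketched as a small function that builds an Elasticsearch query body. The field names (`title`, `abstract`, `authors`, `journal`, `publish_time`) are assumptions about how the documents were indexed, and the date filter assumes `publish_time` is mapped as a date field.

```python
import json

# Fields a user may search; assumed to match the indexed document structure.
SEARCHABLE_FIELDS = ("title", "abstract", "authors", "journal")


def build_query(text, field="title", date_from=None, date_to=None, size=10):
    """Build a query body: full-text match on one field, optional date range filter."""
    if field not in SEARCHABLE_FIELDS:
        raise ValueError(f"unknown field {field!r}; choose one of {SEARCHABLE_FIELDS}")
    query = {"bool": {"must": [{"match": {field: text}}]}}
    date_range = {}
    if date_from:
        date_range["gte"] = date_from
    if date_to:
        date_range["lte"] = date_to
    if date_range:
        # Filters restrict the result set without affecting relevance scores.
        query["bool"]["filter"] = [{"range": {"publish_time": date_range}}]
    return {"query": query, "size": size}


body = build_query("coronavirus transmission", field="abstract", date_from="2020-01-01")
print(json.dumps(body, indent=2))
```

The printed body could be pasted into the Kibana Dev Tools console after a `GET <index>/_search` line, or sent by whatever GUI front end is used.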

Attachment:- Information Retrieval.rar




