What is tokenization

Assignment Help Other Subject
Reference no: EM13762014

1. Why don't we use grep for information retrieval?

2. Why don't we use a relational database for information retrieval?

3. Google does not always interpret the query as a boolean conjunction of its terms. Give examples.

4. What is a term-document incidence matrix?

5. In constructing the index, which step is most expensive/complex?

6. Complex Boolean retrieval systems like Westlaw use many operations that go beyond strictly Boolean

7. operators. Name some of them.

8. Define the number of types/tokens in a sentence.

9. An IR system can normalize terms by defining equivalence classes. E.g., "suit" and "suits" could be

10. in an equivalence class. What is the limitation of this model in IR?

11. What is tokenization?

12. Give an example in English were tokenization is nontrivial

13. What is a stop list?

14. What is lemmatization? Give an example.

15. What is stemming? Give an example that is not also a lemmatization example.

16. Name a particular stemmer.

17. Give an example of a pair of words that a typical stemmer would put in one equivalence class and we

18. would expect improved performance of the IR system.

19. Give an example of a pair of words that a typical stemmer would put in one equivalence class and we

20. would expect decreased performance of the IR system.

21. Name two data structures that support phrase queries.

22. Name a data structure that supports proximity queries.

23. Which data structures are typically used for locating the entry for a term in the dictionary?

24. Which data structure is best used for locating the entry for a term in the dictionary if the collection

25. is static?

26. Which data structure is best used for locating the entry for a term in the dictionary if prefix search

27. must be supported?

28. Which special strings are stored in the permuterm index for the word "car"?

29. What sequence of letters is looked up in the permuterm index for the following wildcard queries?

30. X, X*, *X, *X*, X*Y

31. What is the difference between the regular inverted index used in IR and the k-gram index?

32. Give an example of a query that cannot be corrected using isolated-word spelling correction.

33. Define Levenshtein edit distance.

34. Define Damerau-Levenshtein edit distance.

35. Give the formula for Zipf's law.

36. Give the formula for Heaps' law.

37. What is the feast or famine problem?

38. Define the Jaccard coefficient

39. What is the bag of words model?

40. What is the advantage of idf weighting compared to inverse-collection-frequency weighting?

41. What is the tf-idf weight of term t in document d?

42. What is the relationship between term frequency and collection frequency?

43. Why don't we use Euclidean distance of tf-idf vectors to rank documents with respect to a query?

44. Write down the formula for cosine similarity between query q and document d.

45. Explain the notation ddd.qqq

46. What is the advantage of pivot normalization compared to regular cosine normalization?

47. What is document-at-a-time processing?

48. What index organization does document-at-a-time processing require?

49. What is term-at-a-time processing?

50. What data structure does term-at-a-time processing require that document-at-a-time processing does

51. not require?

52. What is a tiered inverted index?

53. Name two criteria that can be used for deciding as to whether to put a document d in tier 1 of a tiered

54. index.

55. Name three criteria for evaluating a search engine.

56. What are the components of an information retrieval benchmark?

57. What is the difference between the concepts of query and information need?

58. Define precision

59. Define recall

60. Define F1

61. What is the harmonic mean of two numbers?

62. Why is F1 defined as the harmonic mean?

63. What is an easy way of maximizing the recall of an IR engine?

64. What is an easy way of maximizing the precision of an IR engine?

65. What is a precision-recall curve?

66. An evaluation benchmark ideally should tell us for any document-query pair whether the document is

67. relevant to the query. Why is Cranfield the only collection that actually satisfies this desideratum?

68. Define the kappa measure

69. What is the minimum and maximum of the kappa measure?

70. What is the significance of kappa being less than / greater than 0?

71. What is A/B testing?

72. What does marginal relevance measure?

73. What distinguishes a dynamic from a static summary?

74. What is a simple heuristic for computing a dynamic summary if you can display n characters?

Reference no: EM13762014

Questions Cloud

Technical paper on database : This Final Project Paper must be a technical paper on Database topics that have been pre-approved in earlier weeks by your instructor, using knowledge gained in the course. A general paper written for the common IT user or general reader is NOT ac..
Explain meticulous comprehension and organization of syntax : Explain meticulous comprehension and organization of syntax and mechanics, such as spelling and grammar. Written work contains no errors and is very easy to understand.
Experience using excel prior : Discuss your experience in using Excel prior to taking this course.
The district ese department has opened three pre : Additionally, the district ESE department has opened three pre-K handicapped programs at Happy Trails Elementary, serving the needs of the Pre-K handicapped population in the area. The Pre-K students are primarily classified as Speech/Language Impair..
What is tokenization : What is tokenization, Give an example in English were tokenization is nontrivial
What is the expected return : What is the expected return for a that has $200 in stock A with an expected return of 15%, $300 in stock B with an expected return of 10%, and $500 in stock C with an expected return of 25%?
Transmit an uncompressed grayscale image : Determine the total time it takes to transmit an uncompressed grayscale image (with 8 bits/pixel) from a screen with a resolution of 1,280 × 840 pixels using each of the following media:
List decisions that firms have to make in global outsourcing : List the major decisions that firms have to make in global outsourcing and indicate how firms decide whether or not they should engage in global outsourcing
How to allow mobile access to organizational system : How to allow mobile access to organizational system users (employees, contractors, and business partners), Use of cloud resources such as processing and data storage outside the organization's physical locations

Reviews

Write a Review

Other Subject Questions & Answers

  Scathingly critical english author

________________ was the scathingly critical English author who wrote Domestic Manners of the Americans.  The wave of religious revivalism that swept much of America in the early nineteenth century was known as_____________________.

  Explain how your final paper will investigate the problem

imagine that you are a researcher and are conducting a study to investigate a community problem. using behavioral

  Compare development and empowerment

Compare (similarities and differences) development and empowerment, and provide at least one specific example of each. Explain a view of leadership authority that you think would be most likely to empower others. Explain how and why.

  What is possible break-point between these two cities

Kansas city and des moines are 200 miles apart. kansas city has a population of about 450,000, qhile the population of des moines is 200,000. given reilly's law, what is a possible break-point between these two cities?

  Explain why pre- and post-questioning are important

a. create a presentation e.g. a multimedia presentation a website or another document of your choice suggested length

  Explain how formal organizations

Explain how formal organizations have evolved over the past century. What differences were there in organizations a century ago, compared to today's organizations? What are the current trends in formal organizations?

  Mullins was the factory manager at the acme widget works

Mullins was the factory manager at the Acme Widget Works located in an employment at will state.  The company came under new management and Mullins was reassigned as Director of Mfg, a newly created position. His actual duties could be performed in l..

  Redress from the united states may be secured in

Redress from the united states may be secured in

  Disadvantages of a freely floating exchange rate

What are the disadvantages of a freely floating exchange rate that led countries to managed float system?

  The role of the house rules committee

The role of the house rules committee is played in the senate by the

  Clinical manifestations during menopause

The patient states she has had hot flashes, vaginal dryness, and increased emotional lability. Explain why the patient would have such clinical manifestations throughout menopause.

  Why is it important to maintain a key control system

Why is it important to maintain a key control system within the casino floor and what are some of the keys used within a casino and each key's purpose?

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd