Reference no: EM133053125
1) The standard approach of today's search engines is based on certain indexing techniques. How would knowledge have to be structured if the central ideas of Vannevar Bush or Ted Nelson had become reality? Develop a vision.
2) To build a keyword index for a search engine, the indexer extracts from all documents only those words that carry content. Give examples of words that do not carry content, which are also called stop words.
3) Your boss says to you: "We need a knowledge management system. Every employee can store his or her knowledge in it for others. I think the 100,000 francs for Confluence is money well spent, and I'm always happy to invest in something good." - What specifically would you point out now?
4) Your boss says to you: "Last year we had a Facebook shitstorm. Very unpleasant... If you don't notice it early, you can't counteract it. What can you do to notice something like that earlier, and could you automate it? I heard that the programming language Python has a Facebook interface. Is it possible to have something programmed there? We have hired a computer scientist who knows Python." - What specifically would you point out now?
5) Given the following sentence:
"The moon is not made of green cheese, but of other substances".
Perform tokenisation and lemmatisation and specify the resulting words (terms) that you would store in the keyword index (e.g. in the database table "Term").
6) Given the following sentence:
"The University of Applied Sciences Graubünden is located in Chur and Jürg Kessler is its rector."
Show what a named entity recognition procedure would produce as a result here.
7) You are employed in the IT department of a corporation that operates 300 DIY stores in Europe. The management asks you for a proposal for the following problem: "There are 20,000 products in the portfolio, and we have to keep documents for all these products. These documents should be accessible via the intranet at every info point. Mostly they are PDF documents whose file name consists of the article number. If you don't know the article number, you don't get anywhere. A search engine for all PDF documents would be great. We can't give that to Google. Are there solutions that computer scientists can get working internally?" - Make a suggestion.
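
As an illustration of the kind of keyword index that questions 2, 5 and 7 refer to, the following is a minimal sketch, not a model answer. It assumes that the text has already been extracted from the PDF documents, uses a deliberately tiny, made-up stop-word list, and omits lemmatisation entirely. The function names, the article numbers and the product texts are all hypothetical.

```python
import re
from collections import defaultdict

# Assumed, deliberately tiny stop-word list; real lists are much longer.
STOP_WORDS = {"the", "is", "of", "but", "not", "and", "in", "a", "an", "for", "to", "its"}


def tokenise(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def build_index(documents):
    """Map each content-bearing term to the sorted ids of the documents containing it."""
    index = defaultdict(set)
    for doc_id, text in documents.items():
        for token in tokenise(text):
            if token not in STOP_WORDS:
                index[token].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Return the ids of documents that contain every non-stop-word query term."""
    terms = [t for t in tokenise(query) if t not in STOP_WORDS]
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Hypothetical documents keyed by article number, as in the DIY-store scenario.
documents = {
    "100234": "Cordless drill with two batteries and a charger",
    "100877": "Replacement charger for the cordless drill",
    "200015": "Garden hose, 20 metres, with brass connector",
}
index = build_index(documents)
print(search(index, "charger for a drill"))  # ['100234', '100877']
```

In a real installation an existing open-source search engine would normally replace this hand-written index; the sketch only shows why a lookup by content no longer depends on knowing the article number.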