Lesk Algorithm for Word Sense Disambiguation:
In this question, you will implement the Simplified Lesk algorithm for the Word Sense Disambiguation (WSD) task.
1. Load the SemCor corpus in NLTK with semcor.sents(). Similarly, load the WordNet model in NLTK with from nltk.corpus import wordnet as wn. Randomly select 50 sentences and store the sentences (sents()) and their corresponding tagged versions (tagged_sents()) as the data and labels for the first two models.
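A minimal sketch of step 1, assuming NLTK is installed along with its semcor and wordnet data (nltk.download("semcor"), nltk.download("wordnet")). The helper names sample_pairs, gold_labels and load_semcor_sample and the fixed seed are illustrative choices, not part of the task. Note that tagged_sents() returns POS tags by default, so tag="sem" is passed here to obtain the WordNet sense labels that serve as gold answers.

```python
import random
from typing import List, Sequence, Tuple


def sample_pairs(sents: Sequence, tagged: Sequence, n: int = 50, seed: int = 0) -> Tuple[list, list]:
    """Pick n random sentence indices and return aligned (data, labels) lists.

    Assumes the two sequences are index-aligned, as semcor.sents() and
    semcor.tagged_sents() are. The seed (an assumption) makes runs repeatable.
    """
    idx = random.Random(seed).sample(range(len(sents)), n)
    return [sents[i] for i in idx], [tagged[i] for i in idx]


def gold_labels(tagged_sent) -> List[Tuple[str, str]]:
    """(word, synset name) pairs from one semcor.tagged_sents(tag="sem") sentence.

    Sense-tagged chunks are nltk Tree objects labelled with a WordNet Lemma.
    Untagged tokens are plain lists, and labels that could not be mapped to
    WordNet are strings; both are skipped.
    """
    pairs = []
    for chunk in tagged_sent:
        label = chunk.label() if hasattr(chunk, "label") else None
        if label is not None and hasattr(label, "synset"):
            word = "_".join(chunk.leaves())  # multiword units joined WordNet-style
            pairs.append((word, label.synset().name()))
    return pairs


def load_semcor_sample(n: int = 50, seed: int = 0):
    """Load SemCor through NLTK and return n aligned (sentences, tagged sentences)."""
    from nltk.corpus import semcor  # imported lazily: needs the NLTK data installed

    return sample_pairs(semcor.sents(), semcor.tagged_sents(tag="sem"), n, seed)
```

Calling data, labels = load_semcor_sample() then gives the 50 token lists and their sense-tagged versions; gold_labels(labels[0]) lists the gold senses for the first sentence. Taking len() of the corpus views scans the whole corpus, so the first call may take a while.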
2. Our first model for word sense disambiguation is the Most Frequent Sense (MFS) model, in which, as the name suggests, we choose the most frequent sense of each word among the senses in a labelled corpus. For WordNet, this corresponds to the first sense returned by synsets(). Using synsets() and definition(), find the sense of each word. Evaluate and report the results using precision, recall and F-score.
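The MFS baseline can be sketched as below, following the task's statement that the first WordNet sense is the most frequent one. The function names most_frequent_sense, mfs_tag and wordnet_synsets are hypothetical; the sense lookup is passed in as a function so the logic can be tried without WordNet, and wordnet_synsets is the NLTK-backed version (assumes the wordnet data is downloaded).

```python
from typing import Callable, List, Optional, Sequence, Tuple


def most_frequent_sense(word: str, synsets_fn: Callable[[str], Sequence]) -> Optional[object]:
    """Return the first sense listed for `word`, or None if it has no senses."""
    senses = synsets_fn(word)
    return senses[0] if senses else None


def mfs_tag(tokens: Sequence[str], synsets_fn: Callable[[str], Sequence]) -> List[Tuple[str, Optional[object]]]:
    """Tag every token with its most frequent sense (None = no answer)."""
    return [(tok, most_frequent_sense(tok, synsets_fn)) for tok in tokens]


def wordnet_synsets(word: str):
    """Ordered WordNet senses for `word` via NLTK (lazy import)."""
    from nltk.corpus import wordnet as wn

    return wn.synsets(word)
```

For example, for tok, s in mfs_tag(["bank", "deposits"], wordnet_synsets): print(tok, s.name(), s.definition()) would print each word's first sense and its definition (guard for s being None on words such as punctuation). Comparing s.name() with the gold synset names keeps predictions and labels in the same string form for scoring.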
3. Our second model is the Simplified Lesk algorithm. For each word, it compares the context with the WordNet definition of every candidate sense: the Compute Overlap method counts the words that appear in both the context (the sentence) and the definition of the sense, excluding stopwords, and the sense with the largest overlap is chosen.
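A minimal sketch of Simplified Lesk under a few assumptions: the signature of a sense is its definition only (WordNet example sentences, available through synset.examples(), could be added), overlap is counted over distinct words, and the first (most frequent) sense is kept when no definition overlaps or on ties, as in the common textbook formulation. The stopword set is a small hypothetical list; nltk.corpus.stopwords.words("english") is the usual replacement. All function names are illustrative.

```python
import re
from typing import Hashable, Optional, Sequence, Set, Tuple

# Small illustrative stopword list (an assumption, not NLTK's list).
STOPWORDS: Set[str] = {
    "a", "an", "the", "of", "in", "on", "at", "to", "for", "and", "or",
    "is", "are", "was", "were", "be", "been", "by", "with", "as", "that",
    "this", "it", "its", "from", "which", "who", "he", "she", "they", "i",
}


def content_words(text: str, stopwords: Set[str] = STOPWORDS) -> Set[str]:
    """Lowercase, split on non-letters, and drop stopwords."""
    return {w for w in re.findall(r"[a-z]+", text.lower()) if w not in stopwords}


def compute_overlap(signature: Set[str], context: Set[str]) -> int:
    """Number of distinct words shared by a sense signature and the context."""
    return len(signature & context)


def simplified_lesk(
    sentence_tokens: Sequence[str],
    senses: Sequence[Tuple[Hashable, str]],
    stopwords: Set[str] = STOPWORDS,
) -> Optional[Hashable]:
    """Return the sense whose definition overlaps most with the sentence.

    `senses` holds (sense, definition) pairs in WordNet order, so senses[0]
    is the most frequent sense and serves as the default.
    """
    if not senses:
        return None
    context = content_words(" ".join(sentence_tokens), stopwords)
    best_sense, max_overlap = senses[0][0], 0
    for sense, definition in senses:
        overlap = compute_overlap(content_words(definition, stopwords), context)
        if overlap > max_overlap:  # strict ">" keeps the earlier sense on ties
            best_sense, max_overlap = sense, overlap
    return best_sense


def wordnet_senses(word: str):
    """(synset, definition) pairs from NLTK's WordNet (lazy import)."""
    from nltk.corpus import wordnet as wn

    return [(s, s.definition()) for s in wn.synsets(word)]
```

With real data, simplified_lesk(sentence, wordnet_senses("bank")) disambiguates "bank" in a SemCor token list. Using sets means a word repeated in the definition counts once, which is one reasonable reading of "number of words overlapping".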
4. Evaluate the results against the gold sense tags from the dataset and report precision, recall and F-score.
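The task does not define the metrics precisely, so the sketch below assumes the convention common in WSD evaluation: precision is computed over the instances the system answered, recall over all gold-tagged instances, and the F-score is their harmonic mean. The function name prf and the dictionary layout (any hashable instance key, such as a (sentence index, word position) pair, mapped to a synset name) are illustrative assumptions.

```python
from typing import Dict, Hashable, Optional, Tuple


def prf(gold: Dict[Hashable, str], pred: Dict[Hashable, Optional[str]]) -> Tuple[float, float, float]:
    """Precision, recall and F1 for word sense disambiguation.

    precision = correct / answered   (gold instances with a non-None prediction)
    recall    = correct / len(gold)  (all gold-tagged instances)
    Predictions for instances without a gold tag are ignored.
    """
    answered = [k for k, v in pred.items() if v is not None and k in gold]
    correct = sum(1 for k in answered if pred[k] == gold[k])
    precision = correct / len(answered) if answered else 0.0
    recall = correct / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

Because MFS and Simplified Lesk both abstain only on words with no WordNet senses, their precision and recall differ only by those unanswered instances; the same prf call can score both models for a side-by-side report.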