Reference no: EM131559466
Spam Detection for Text Messages
Background: Text message spam consists of unsolicited text messages (SMS), especially advertising, sent to mobile phones or smartphones. As mobile phones surged in popularity in the early 2000s, frequent users of text messaging began to receive a growing number of unsolicited (and generally unwanted) commercial advertisements by SMS. This can be particularly annoying because, unlike with email, some recipients are charged a fee for every message received, including spam.
In this challenge, we ask you to analyze which types of text messages are likely to be spam. In particular, we ask you to apply machine learning tools to predict which messages in a corpus are spam. You may treat it as a classic case study in binary classification.
Problem statement: Using the labeled (good/spam) text messages provided in the text file as the training set, build a robust tree-based binary classifier capable of distinguishing spam text messages from regular ones.
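To make "tree-based binary classifier" concrete, here is a minimal sketch of a single decision tree that splits on word presence using Gini impurity. It is an illustration only, not the expected solution: the function names (tokenize, gini, best_split, build_tree, predict), the whitespace tokenizer, and the max_depth default are all assumptions, and the seven training messages are simply the sample records shown in this brief. A real submission would more likely rely on an established library implementation (for example a decision tree or random forest) trained on the full file.

```python
from collections import Counter

# The sample records from this brief, used only as toy training data.
SAMPLE = [
    ("good", "Go until jurong point, crazy.. Available only in bugis n great world la e buffet..."),
    ("good", "Ok lar... Joking wif u oni..."),
    ("spam", "Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005."),
    ("good", "U dun say so early hor... U c already then say..."),
    ("good", "Nah I don't think he goes to usf, he lives around here though"),
    ("spam", "FreeMsg Hey there darling it's been 3 week's now and no word back!"),
    ("good", "Even my brother is not like to speak with me. They treat me like aids patent."),
]

def tokenize(text):
    """Lowercase a message and split it on whitespace into a set of tokens."""
    return set(text.lower().split())

def gini(labels):
    """Gini impurity of a list of labels (0.0 means the node is pure)."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())

def best_split(rows):
    """Return the word whose presence/absence reduces weighted Gini the most."""
    base = gini([label for label, _ in rows])
    best_word, best_gain = None, 0.0
    vocab = sorted(set().union(*(tokens for _, tokens in rows)))
    for word in vocab:
        yes = [label for label, tokens in rows if word in tokens]
        no = [label for label, tokens in rows if word not in tokens]
        if not yes or not no:
            continue  # the word does not separate anything
        weighted = (len(yes) * gini(yes) + len(no) * gini(no)) / len(rows)
        gain = base - weighted
        if gain > best_gain + 1e-12:  # ties keep the alphabetically first word
            best_word, best_gain = word, gain
    return best_word

def build_tree(rows, max_depth=3):
    """Grow a tree recursively; a leaf is a label, a node is (word, yes, no)."""
    labels = [label for label, _ in rows]
    majority = Counter(labels).most_common(1)[0][0]
    if max_depth == 0 or len(set(labels)) == 1:
        return majority
    word = best_split(rows)
    if word is None:
        return majority
    yes = [row for row in rows if word in row[1]]
    no = [row for row in rows if word not in row[1]]
    return (word, build_tree(yes, max_depth - 1), build_tree(no, max_depth - 1))

def predict(tree, text):
    """Walk the tree from the root until a leaf label is reached."""
    tokens = tokenize(text)
    while isinstance(tree, tuple):
        word, yes_branch, no_branch = tree
        tree = yes_branch if word in tokens else no_branch
    return tree

rows = [(label, tokenize(text)) for label, text in SAMPLE]
tree = build_tree(rows)
print(tree)
print(predict(tree, "Win 2 tkts now"))
print(predict(tree, "Ok see u later"))
```

With only seven training messages the tree latches onto incidental tokens, so it overfits badly. That is the reason the problem statement asks for a robust classifier: in practice this means training on the full corpus, holding out data for evaluation, and usually combining many trees instead of relying on one.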
Dataset: The training set of labeled text messages is structured as follows: each record begins with the label, either good or spam, followed by the text message as raw text.
good Go until jurong point, crazy.. Available only in bugis n great world la e buffet...
good Ok lar... Joking wif u oni...
spam Free entry in 2 a wkly comp to win FA Cup final tkts 21st May 2005.
good U dun say so early hor... U c already then say...
good Nah I don't think he goes to usf, he lives around here though
spam FreeMsg Hey there darling it's been 3 week's now and no word back!
good Even my brother is not like to speak with me. They treat me like aids patent.
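The record layout described above (label first, then the raw message) can be read with a few lines of code. The sketch below is one possible approach, assuming one record per line and that the label is separated from the message by whitespace; the brief does not state the exact delimiter, and the helper names parse_line and load_messages are hypothetical.

```python
VALID_LABELS = ("good", "spam")

def parse_line(line):
    """Split one record into (label, text); return None for blank lines."""
    line = line.strip()
    if not line:
        return None
    # Split on the first run of whitespace only, so the message stays intact.
    parts = line.split(None, 1)
    label = parts[0].lower()
    if label not in VALID_LABELS:
        raise ValueError(f"unexpected label: {parts[0]!r}")
    text = parts[1] if len(parts) > 1 else ""
    return label, text

def load_messages(lines):
    """Parse an iterable of lines (a list or an open file) into records."""
    records = []
    for line in lines:
        parsed = parse_line(line)
        if parsed is not None:
            records.append(parsed)
    return records

# Two records copied from the sample above, plus a blank line to skip.
sample = [
    "good Ok lar... Joking wif u oni...",
    "spam FreeMsg Hey there darling it's been 3 week's now and no word back!",
    "",
]
records = load_messages(sample)
print(len(records))
print(records[0])
```

Because load_messages accepts any iterable of lines, an open file handle for the provided text file can be passed in directly. Rejecting unknown labels early makes formatting problems in the file visible before any model is trained.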