What features play an important role in your prediction model

Reference no: EM133770271

Assessment topic
Data understanding and knowledge-based schemes

Task details:

Advertisements on internet pages can be an inconvenient distraction from the actual content being conveyed by a website. Moreover, some advertisements can lure you to fake websites that carry malicious viruses.

In this assessment you will be working on a dataset that represents a set of possible advertisements on internet pages. The original dataset has been sourced from the UC Irvine Machine Learning Repository (Kushmerick, 1998). The features encode the geometry of the image (if available) as well as phrases occurring in the URL, the image's URL and alt text, the anchor text, and words occurring near the anchor text. The goal is to predict whether an image is an advertisement ("ad") or not ("nonad"). The data has missing values that need to be handled.

You will need to prepare the data for mining and perform an exploratory data analysis. The data mining task is to predict whether an image is an advertisement ("ad") or not ("nonad"). An explicit training/test split is not provided so you need to determine a reasonable way of assessing performance. The dataset also has several hundred features (attributes). You need to perform feature reduction to significantly reduce the number of features. Implement at least two different classifiers.
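
Because no explicit training/test split is provided, one reasonable option is stratified k-fold cross-validation, which keeps the ad/nonad ratio roughly equal in every fold. The following is a minimal standard-library sketch of that idea, not a prescribed method; the function names `stratified_folds` and `train_test_indices` and the toy labels are illustrative assumptions.

```python
import random
from collections import defaultdict


def stratified_folds(labels, k=5, seed=0):
    """Return k lists of row indices, each keeping roughly the overall class ratio."""
    by_class = defaultdict(list)
    for idx, label in enumerate(labels):
        by_class[label].append(idx)

    rng = random.Random(seed)
    folds = [[] for _ in range(k)]
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal the shuffled indices of one class round-robin across the folds.
        for position, idx in enumerate(indices):
            folds[position % k].append(idx)
    return folds


def train_test_indices(folds, test_fold):
    """Use one fold for testing and the remaining folds for training."""
    test = sorted(folds[test_fold])
    train = sorted(i for f, fold in enumerate(folds) if f != test_fold for i in fold)
    return train, test


# Toy labels (invented for illustration): 20 "ad" rows and 80 "nonad" rows.
labels = ["ad"] * 20 + ["nonad"] * 80
folds = stratified_folds(labels, k=5)
train, test = train_test_indices(folds, test_fold=0)
```

Each classifier would then be trained on `train` and scored on `test` for every fold in turn, and the fold scores averaged, so that both classifiers are compared on identical splits.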

Challenges: The classes are imbalanced, with unequal numbers of examples per class. Also, the number of attributes is very high compared to the size of the dataset, hence feature reduction is required. One or more of the three continuous features have missing data.
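
Two of these challenges can be illustrated with a short sketch: filling missing continuous values with the column median, and computing inverse-frequency class weights to counter the imbalance. Both are common choices rather than requirements of the task; the helper names `impute_median` and `balanced_class_weights` and the sample values are assumptions made for illustration.

```python
from collections import Counter
from statistics import median


def impute_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [value for value in column if value is not None]
    fill = median(observed)
    return [fill if value is None else value for value in column]


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {label: n_samples / (n_classes * count) for label, count in counts.items()}


# Invented values: None marks a missing height.
heights = [60.0, None, 90.0, 125.0, None]
labels = ["ad", "nonad", "nonad", "nonad", "nonad"]

filled = impute_median(heights)
weights = balanced_class_weights(labels)
```

To avoid leaking information from the test data, the median should be computed on the training portion only and then applied to the test portion.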

The main goal of this project is to build a machine learning model that, given a set of suitable features, will predict whether the image is an advertisement or not. You may limit the initial set of features to features that encode the geometry of the image (if available) as well as phrases occurring in the URL and the image's URL. A truncated file is available on Moodle. You will still need to perform feature reduction on this dataset.
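
One possible approach to feature reduction for the 0/1 term features is a filter method: score each feature against the class label with the chi-square statistic of its 2x2 contingency table and keep the highest-scoring ones. The sketch below assumes the label is encoded as 1 for "ad" and 0 for "nonad"; the column names and values are invented, and other reduction techniques (for example PCA or model-based selection) would be equally valid.

```python
def chi2_binary(feature, target):
    """Chi-square statistic of the 2x2 table for two equal-length 0/1 sequences."""
    n = len(feature)
    a = sum(1 for f, t in zip(feature, target) if f == 1 and t == 1)
    b = sum(1 for f, t in zip(feature, target) if f == 1 and t == 0)
    c = sum(1 for f, t in zip(feature, target) if f == 0 and t == 1)
    d = n - a - b - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A constant feature (or constant target) carries no information.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def top_k_features(columns, target, k):
    """Return the names of the k features with the largest chi-square score."""
    scores = {name: chi2_binary(values, target) for name, values in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Invented toy data: 1 = "ad", 0 = "nonad"; column names are hypothetical.
target = [1, 1, 1, 0, 0, 0, 0, 0]
columns = {
    "url*ads": [1, 1, 1, 0, 0, 0, 0, 0],        # perfectly aligned with the label
    "url*images": [1, 0, 1, 1, 0, 1, 0, 1],     # weakly related
    "origurl*index": [0, 0, 0, 0, 0, 0, 0, 0],  # constant, so uninformative
}
selected = top_k_features(columns, target, k=2)
```

The scores also give a first answer to which features matter most for prediction. As with imputation, the scoring should use training data only so that the evaluation stays unbiased.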

Dataset Description
There are four image features:
height: continuous
width: continuous
aratio: aspect ratio, continuous
local: image location, 0 or 1
There are 457 features for url terms and the values are 0 or 1.
There are 495 features for origurl terms and the values are 0 or 1.
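
The layout described above can be sketched as a small parser. This is only an illustration under several assumptions that the brief does not confirm: that the file is comma-separated, that columns appear in the order listed with the class label last, that missing values are marked with "?", and that the label may carry a trailing period. The generated column names (`url_0`, `origurl_0`, and so on) are placeholders, and the truncated file on Moodle may be laid out differently.

```python
N_URL, N_ORIGURL = 457, 495
IMAGE_FEATURES = ["height", "width", "aratio", "local"]
CONTINUOUS = {"height", "width", "aratio"}


def build_column_names():
    """4 image features + 457 url terms + 495 origurl terms + label (assumed order)."""
    url = [f"url_{i}" for i in range(N_URL)]
    origurl = [f"origurl_{i}" for i in range(N_ORIGURL)]
    return IMAGE_FEATURES + url + origurl + ["label"]


def parse_row(line, columns, missing="?"):
    """Turn one comma-separated line into a dict; missing cells become None."""
    cells = [cell.strip() for cell in line.split(",")]
    if len(cells) != len(columns):
        raise ValueError(f"expected {len(columns)} cells, got {len(cells)}")
    row = {}
    for name, cell in zip(columns, cells):
        if name == "label":
            row[name] = cell.rstrip(".")  # assumed: label may end with a period
        elif cell == missing:
            row[name] = None
        elif name in CONTINUOUS:
            row[name] = float(cell)
        else:
            row[name] = int(cell)
    return row


columns = build_column_names()
zeros = ",".join(["0"] * (N_URL + N_ORIGURL))
complete = parse_row("125, 125, 1.0, 1," + zeros + ",ad.", columns)
partial = parse_row("?, ?, ?, 1," + zeros + ",nonad.", columns)
```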

Students are recommended to use the following report structure, which addresses the marking rubric criteria:
Introduction: introduces the case study and the objective

Data understanding: Visualize the data. Explain the data. What preparation methods are required? What features play an important role in your prediction model?

Model implementation and evaluation: Show a screenshot of your implemented models, followed by an explanation. How would you improve the performance of your models? Compare the models against each other.

Insights: Discuss the output that you receive for the implemented models. Explain the knowledge and insight about the examples in the dataset.

Conclusion: Summarize the model implementation and the results you obtained. Highlight the key points from the insights.

References: Follow a consistent format and use the SISTC recommended referencing style (refer to the unit outline).

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd