Which features play an important role in your prediction model


Reference no: EM133770271

Assessment topic
Data understanding and knowledge-based schemes

Task details:

Advertisements on internet pages can be an inconvenient distraction from the actual content a website conveys. Moreover, some advertisements can lure you to fake websites carrying malicious viruses.

In this assessment you will work on a dataset that represents a set of possible advertisements on internet pages. The original dataset has been sourced from the UC Irvine Machine Learning Repository (Kushmerick, 1998). The features encode the geometry of the image (if available) as well as phrases occurring in the URL, the image's URL and alt text, the anchor text, and words occurring near the anchor text. The goal is to predict whether an image is an advertisement ("ad") or not ("nonad"). The data has missing values, which need to be handled.

You will need to prepare the data for mining and perform an exploratory data analysis. The data mining task is to predict whether an image is an advertisement ("ad") or not ("nonad"). An explicit training/test split is not provided so you need to determine a reasonable way of assessing performance. The dataset also has several hundred features (attributes). You need to perform feature reduction to significantly reduce the number of features. Implement at least two different classifiers.
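The evaluation, feature reduction and two-classifier requirements above can be sketched as follows. This is a minimal illustration under stated assumptions, not a prescribed solution: the data is synthetic (the real file is not loaded here), and the choice of stratified 5-fold cross-validation, chi-squared selection of 10 features, logistic regression and a random forest is one reasonable option among many.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_selection import SelectKBest, VarianceThreshold, chi2
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in for the real data (assumption): 3 continuous geometry
# columns with missing values plus 60 binary term columns, about 15% "ad".
rng = np.random.default_rng(0)
n_samples, n_terms = 400, 60
y = (rng.random(n_samples) < 0.15).astype(int)  # 1 = "ad", 0 = "nonad"
geometry = rng.uniform(10, 600, size=(n_samples, 3))
geometry[rng.random(geometry.shape) < 0.2] = np.nan
terms = (rng.random((n_samples, n_terms)) < 0.05).astype(float)
# Make the first five term columns informative so there is something to learn.
p_on = np.where(y[:, None] == 1, 0.6, 0.05)
terms[:, :5] = rng.random((n_samples, 5)) < p_on
X = np.hstack([geometry, terms])


def make_model(classifier):
    """Impute, drop constant columns, keep the 10 best features, then classify.

    Every step sits inside the pipeline so it is refitted on each training
    fold and the held-out fold never influences preprocessing.
    """
    return make_pipeline(
        SimpleImputer(strategy="median"),
        VarianceThreshold(),          # removes zero-variance columns
        SelectKBest(chi2, k=10),      # chi2 needs non-negative inputs
        StandardScaler(),
        classifier,
    )


models = {
    "logistic_regression": LogisticRegression(class_weight="balanced", max_iter=1000),
    "random_forest": RandomForestClassifier(
        n_estimators=100, class_weight="balanced", random_state=0
    ),
}

# Stratified folds keep the ad/nonad ratio similar in every split.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = {
    name: cross_val_score(
        make_model(clf), X, y, cv=cv, scoring="balanced_accuracy"
    ).mean()
    for name, clf in models.items()
}
for name, score in scores.items():
    print(f"{name}: mean balanced accuracy = {score:.3f}")
```

Balanced accuracy is used here instead of plain accuracy because a model that always predicts the majority class would otherwise look deceptively good.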

Challenges: The classes are imbalanced. Also, the number of attributes is very high compared to the size of the dataset, hence feature reduction is needed. One or more of the three continuous features have missing data.
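As a small illustration of the missing-data and imbalance challenges, the sketch below counts missing values, imputes them with the column median, and computes inverse-frequency class weights. The five rows are made up, and the assumption that missing values appear as "?" (possibly padded with spaces) should be checked against the actual file.

```python
import pandas as pd

# Made-up rows for illustration only; "?" as the missing marker is an assumption.
raw = pd.DataFrame({
    "height": ["125", "?", "60", "  ?", "90"],
    "width": ["125", "468", "?", "230", "90"],
    "aratio": ["1.0", "?", "?", "?", "1.0"],
    "label": ["nonad", "ad", "nonad", "nonad", "nonad"],
})
continuous = ["height", "width", "aratio"]

clean = raw.copy()
for col in continuous:
    # Anything that is not a number (such as "?") becomes NaN.
    clean[col] = pd.to_numeric(clean[col].str.strip(), errors="coerce")

missing_counts = clean[continuous].isna().sum()

# Median imputation. In a real experiment the medians should be computed on
# the training portion only and then reused for the test portion.
medians = clean[continuous].median()
imputed = clean.copy()
imputed[continuous] = imputed[continuous].fillna(medians)

# Inverse-frequency class weights: n_samples / (n_classes * class_count).
class_counts = imputed["label"].value_counts()
class_weights = len(imputed) / (class_counts.size * class_counts)

print(missing_counts.to_dict())
print(class_weights.to_dict())
```

The rarer class receives the larger weight, so misclassifying an "ad" costs more during training when such weights are passed to a classifier that supports them.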

The main goal of this project is to build a machine learning model that, given a set of suitable features, will predict whether the image is an advertisement or not. You may limit the initial set of features to those that encode the geometry of the image (if available) as well as phrases occurring in the URL and the image's URL. A truncated file is available on Moodle. You will still need to perform feature reduction on this dataset.

Dataset Description
There are four image features:
height: continuous
width: continuous
aratio: aspect ratio, continuous
local: image location, 0 or 1
There are 457 features for url terms and the values are 0 or 1.
There are 495 features for origurl terms and the values are 0 or 1.

Students are advised to use the following report structure, which addresses the marking rubric criteria:
Introduction: introduces the case study and the objective

Data understanding: Visualize the data. Explain the data. What preparation methods are required? Which features play an important role in your prediction model?

Model implementation and evaluation: Show screenshots of your implemented models, each followed by an explanation. How would you improve the performance of your models? Compare the models against each other.

Insights: Discuss the output that you receive for the implemented models. Explain the knowledge and insight about the examples in the dataset.

Conclusion: Conclude the model implementation and the results you receive from them. Highlight the key points in the insights.

References: follow a consistent format and use SISTC recommended referencing style (refer to unit outline)
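For the data understanding question about which features play an important role, one possible approach is to rank features by the impurity-based importances of a random forest. The sketch below uses synthetic data in which a hypothetical column named url_term_0 drives the label; the column names and the data are assumptions for illustration only.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
n_samples = 300
# Hypothetical column names: four image features plus 16 binary URL terms.
feature_names = ["height", "width", "aratio", "local"] + [
    f"url_term_{i}" for i in range(16)
]
X = np.column_stack([
    rng.uniform(10, 600, n_samples),       # height
    rng.uniform(10, 600, n_samples),       # width
    rng.uniform(0.1, 10, n_samples),       # aratio
    rng.integers(0, 2, n_samples),         # local
    rng.integers(0, 2, (n_samples, 16)),   # url_term_0 .. url_term_15
])

# The label copies url_term_0 (column index 4), with 5% of labels flipped.
flip = rng.random(n_samples) < 0.05
y = np.where(flip, 1 - X[:, 4], X[:, 4]).astype(int)

forest = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# Sort features from most to least important.
ranking = sorted(
    zip(feature_names, forest.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, importance in ranking[:5]:
    print(f"{name}: {importance:.3f}")
```

Impurity-based importances can favour continuous columns over binary ones, so it may be worth cross-checking the ranking with `sklearn.inspection.permutation_importance` on held-out data.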
