Classification and regression tree

Assignment Help Financial Accounting
Reference no: EM1388126

1 Competitive Auctions on eBay.com. The eBayAuctions contains information on 1972 auctions transacted on eBay.com during May-June 2004. The goal is to use these data to build a model that will classify competitive auctions from noncompetitive ones.

A competitive auction is defined as an auction with at least two bids placed on the item auctioned. The data include variables that describe the item (auction category), the seller (his/her eBay rating), and the auction terms that the seller selected (auction duration , opening price, currency, day-of-week of auction close). In addition, we have the price at which the auction closed. The goal is to predict whether or not the auction will be competitive.

Data Preprocessing. Create dummy variables for the categorical predictors. These include Category (18 categories), Currency (USD, GBP. Euro), EndDay (Monday- Sunday), and Duration (1, 3, 5, 7, or 10 days). Split the data in to training and validation datasets using a 60% : 40% ratio.

a. Fit a classification tree using all predictors using the best pruned tree. To avoid overfitting, set the minimum number of observations in a leaf node to 50. Also. set the maximum number of levels to be displayed at seven (the maximum allowed in XLMiner). To remain within the limitation of 30 predictors, combine some of the categories of categorical predictors. Write down the results in terms of rules.

b. Is this model practical for predicting the outcome of a new auction?

c. Describe the interesting and uninteresting information that these rules provide.

d. Fit another classification tree ( using the best-pruned tree, with a minimum number of observations per leaf node = 50 and maximum
allowed number of displayed levels), this time only with predictors that can be used for predicting the outcome of a new auction. Describe the resulting tree in terms of rules. Make sure to report the smallest set of rules required for classification.

e. Plot the resulting tree on a scatterplot: Use the two axes for the two best (quantitative) predictors. Each auction will appear as a point, with coordinates corresponding to its values on those two predictors. Use different colors or symbols to separate competitive and noncompetitive auctions. Draw lines (you can sketch these by hand or use Excel) at the values that create splits. Does this splitting seem reasonable with respect to the meaning of the two predictors? Does it seem to do a good job of separating the two classes?

f. Examine the lift chart and the classification table for the tree. What can you say about the predictive performance of this model?

g. Based on this last tree, what can you conclude from these data about the chances of an auction obtaining at least two bids and its relationship to the auction settings set by the seller (duration, opening price. ending day, currency)? What would you recommend for a seller as the strategy that will most likely lead to a competitive auction?

9.2 Predicting Delayed Flights. The file FlightDelays.xls contains information on ail commercial flights departing the Washington, D.C., area and arriving at New York during January 2004. For each flight there is information on the departure and arrival airports, the distance of the route, the scheduled time and date of the flight, and so on. The variable that we are trying to predict is whether or not a flight is delayed. A delay is defined as an arrival that is at least 15 minutes later than scheduled.

Classification and Regression Tree

Data Processing. Create dummies for day of week, carrier, departure airport, and arrival airport.

This will give you 17 dummies. Bin the scheduled departure time into 2- hour bins (in XLMiner use Data Utilities > Bin Continuous Data and select 8 bins with equal width). After binning DEP _TIME into 8 bins, this new variable should be broken down into 7 dummies (because the effect will not be linear due to the morning and afternoon rush hours). This will avoid treating the departure time as a continuous predictor because it is reasonable that delays are related to rush-hour times. Partition the data into training and validation
sets.

a. Fit a classification tree to the flight delay variable using all the relevant predictors. Do not include DEP_TI ME (actual departure time) in the model because it is unknown at the time of prediction (unless we are doing our predicting of delays after the plane takes off, which is unlikely). In the third step of the classification tree menu, choose:

• "Maximum number levels to be displayed = 6".
• Use the best pruned tree without a limitationon the minimum number of observations in the final nodes.

Express the resulting tree as a set of rules.

b. If you needed to fly between DCA and EWR. on a Monday at 7 AM. would you be able to use this tree? What other information would you need? Is it available in practice? What information is redundant?

c. Fit another tree, this time excluding the day-of-month predictor. (Why?) Select the option of seeing both the full tree and the best pruned tree. You will find that the best pruned tree contains a single terminal node.

i. How is this tree used for classification? (What is the rule for classifying?)
ii. To what is this rule equivalent?
iii. Examine the full tree. What are the top three predictors according to this tree?
iv. Why, technically, does the pruned tree result in a tree with a single node?
v. What is the disadvantage of using the top levels of the full tree as opposed to the best pruned tree?
vi. Compare this general result to chat from logistic regression in the example in Chapter 10. What are possible reasons for the classification tree's failure to find a good predictive model?

9.3 Predicting Prices of Used Cars (Regression Trees). The file ToyotaCorolla.xls contains the data on used cars (Toyota Corolla) on sale during late summer of 2004 in The Netherlands. It has 1436 observations containing details on 38 attributes, including Price, Age, KM, HP, and other specifications. The goal is to predict the price of a used Toyota Corolla based on its specifications. (The example in Section 9.8 is a subset of this dataset.)

Data Preprocessing. Create dummy variables for the categorical predictors (Fuel Type and Color). Split the data into training (50%), validation (30%), and test (20%) datasets.

a. Run a regression tree (RT) using the prediction menu in XLMiner with the out- put variable Price and input variables Age_08_0-L KM, FueLType, H

P, Automatic, Doors, Quarterly_ Tax, Mfg_Guarantee, Guarantee _ Period, Airco, Automatic_Airco, CD_ Player, Powered _ Windows, Sport_ Model, and Tow_ Bar. Normalize the variables. Keep the minimum number of observations in a terminal node to 1 and the scoring option to Full Tree, to make the run least restrictive.

b. Which appear to be the three or four most important car specifications for predicting the car's price?

Reference no: EM1388126

Questions Cloud

The level of ethnocentricity of an international firm : The level of ethnocentricity of an international firm business leader might influence his or her decisions relevant to cross-cultural issues
Chromosome from a single origin of replication : Suppose that the each replication fork moves at a rate of 500 base pairs per second, how long would it take to replicate the E. coli chromosome from a single origin of replication?
Overview of respiratory system : Working in health care industry, a medical professional should be familiar with the different systems in the body. The respiratory system is one such system that you might encounter when working as a medical billing and coding specialist.
Experienced economic growth for the factor selected : Select one that you believe to be the most influential and explain why. Provide specific examples of one or more countries that have experienced economic growth for the factor selected.
Classification and regression tree : Describe the interesting and uninteresting information that these rules provide and what are possible reasons for the classification tree's failure to find a good predictive model?
Value of research in healthcare administration : Why does a professional in the field of healthcare/healthcare administration require to know about research? What impact may research have in healthcare environment?
Determine the function of the protein : Discover a protein that is released from the cell and explain its role in the body. What is the function of the protein? How is it related to exocytosis?
Illustrate out the definition of abortion : Critically illustrate out the definition of abortion? What was the significance of the Roe v. Wade decision by the Supreme Court?
Claiming the population mean : A random sample of 58 fish is taken from the tank. Let x be the mean sample length of these fish. What is the probability that x is within 0.5 inch of the claimed population mean? (Round your answer to four decimal places.)

Reviews

Write a Review

Financial Accounting Questions & Answers

  Calculate the merchandise inventory values

Calculate the merchandise inventory values

  Audit report and financial statements for the system

Audit report and financial statements for the system

  Computation of bank reconciliation statements

Computation of Bank reconciliation Statements - Prepare a schedule showing how much the cashier embezzled.

  Determine the npv for the purchase

Determine the NPV for the purchase, lease without the service contract, and the lease with the service contract.

  Prepare needed journal entries for 2014 and 2015

Prepare needed journal entries for 2014 and 2015. Be sure to indicate whether each entry should be made to an unrestricted or temporarily restricted fund. You need not, thus, record the indirect costs themselves.

  Evaluate the net present value

Evaluate the net present value. (Negative amount should be shown by a minus sign. Round discount factor(s) to 3 decimal places, other intermediate evaluations and final answer to the nearest whole dollar.)

  Sale on the financial statements

Sale on the financial statements What should Milley do?

  Accrual accounting and cash flow analysis

Find a journal or news article that explains why both accrual accounting and cash flow analysis are required to understand a company? Briefly describe the article. What would you have added to improve the analysis?

  Activity based cost analysis

Activity based cost analysis - Were your results the typical pattern for an activity-based costing analysis? Explain.

  Find sources of revenue tenet healthcare

Tenet Healthcare and HCA Holdings Inc. are major competitors in the healthcare industry.

  How much does sara include in her gross income

The church did not keep a record of the amounts given nor the contributors, but the minister estimates that these gifts amount $10,000 in the current year. How should he treat these gifts?

  Computation of cash flow from financing activities

Computation of cash flow from financing activities using given data and Given the following financial statements for ACME Corporation, and assuming that ACME paid a common dividend of $45,000 in 2004, what is the company\'s financing cash flow for ..

Free Assignment Quote

Assured A++ Grade

Get guaranteed satisfaction & time on delivery in every assignment order you paid with us! We ensure premium quality solution document along with free turntin report!

All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd