Summarize the concerns expressed by this contrarian

Reference no: EM131816838

Data Mining in the Real World

"I'm not really a contrarian about data mining. I believe in it. After all, it's my career. But data mining in the real world is a lot different from the way it's described in textbooks.

"There are many reasons it's different. One is that the data are always dirty, with missing values, values way out of the range of possibility, and time values that make no sense. Here's an example: Somebody sets the server system clock incorrectly and runs the server for a while with the wrong time. When they notice the mistake, they set the clock to the correct time. But all of the transactions that were running during that interval have an ending time before the starting time. When we run the data analysis, and compute elapsed time, the results are negative for those transactions.

"Missing values are a similar problem. Consider the records of just 10 purchases. Suppose that two of the records are missing the customer number and one is missing the year part of transaction date. So you throw out three records, which is 30 percent of the data. You then notice that two more records have dirty data, and so you throw them out, too. Now you've lost half your data.

"Another problem is that you know the least when you start the study. So you work for a few months and learn that if you had another variable, say the customer's Zip code, or age, or something else, you could do a much better analysis. But those other data just aren't available. Or, maybe they are available, but to get the data you have to reprocess millions of transactions, and you don't have the time or budget to do that.

"Overfitting is another problem, a huge one. I can build a model to fit any set of data you have. Give me 100 data points and in a few minutes, I can give you 100 different equations that will predict those 100 data points. With neural networks, you can create a model of any level of complexity you want, except that none of those equations will predict new cases with any accuracy at all. When using neural nets, you have to be very careful not to overfit the data.
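The overfitting concern can be illustrated with a small sketch. Everything in it is an assumption made for illustration and is not from the case: the ten training points, the fixed noise values, the true relationship y = 2x + 1, and the function names. It uses a polynomial that passes through every training point (rather than a neural network) as the overly complex model, and compares it with a simple straight-line fit on new cases.

```python
# Illustrative sketch only: data, noise values, and names are hypothetical.

def lagrange_predict(xs, ys, x):
    """Evaluate the polynomial that passes exactly through all (xs, ys) at x."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total


def fit_line(xs, ys):
    """Ordinary least-squares straight line; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def mse(predict, xs, ys):
    """Mean squared error of a prediction function on a data set."""
    return sum((predict(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


# Assumed training data: y = 2x + 1 plus a fixed list of small errors.
noise = [0.5, -0.4, 0.3, -0.6, 0.2, -0.3, 0.6, -0.5, 0.4, -0.2]
train_x = [float(i) for i in range(10)]
train_y = [2 * x + 1 + e for x, e in zip(train_x, noise)]

# New cases the models have not seen, drawn from the same relationship.
new_x = [i + 0.5 for i in range(9)]
new_y = [2 * x + 1 for x in new_x]

slope, intercept = fit_line(train_x, train_y)


def line_predict(x):
    """Simple model: two parameters."""
    return slope * x + intercept


def wiggly_predict(x):
    """Complex model: one parameter per training point."""
    return lagrange_predict(train_x, train_y, x)


print("training error, line:      ", mse(line_predict, train_x, train_y))
print("training error, polynomial:", mse(wiggly_predict, train_x, train_y))
print("new-case error, line:      ", mse(line_predict, new_x, new_y))
print("new-case error, polynomial:", mse(wiggly_predict, new_x, new_y))
```

Under these assumptions the polynomial reproduces the training data exactly, yet its error on the new cases is far larger than the straight line's, which is the pattern the speaker warns about.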

"Then, too, data mining is about probabilities, not certainty. Bad luck happens. Say I build a model that predicts the probability that a customer will make a purchase. Using the model on new-customer data, I find three customers who have a .7 probability of buying something. That's a good number, well over a 50-50 chance, but it's still possible that none of them will buy. In fact, the probability that none of them will buy is .3 × .3 × .3, or .027, which is 2.7 percent. "Now suppose I give the names of the three customers to a salesperson who calls on them, and sure enough, we have a stream of bad luck and none of them buys. This bad result doesn't mean the model is wrong. But what does the salesperson think? He thinks the model is worthless and can do better on his own.

He tells his manager who tells her associate, who tells the Northeast Region, and sure enough, the model has a bad reputation all across the company. "Another problem is seasonality. Say all your training data are from the summer. Will your model be valid for the winter? Maybe, but maybe not. You might even know that it won't be valid for predicting winter sales, but if you don't have winter data, what do you do?
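The arithmetic in the three-customer example can be checked with a short sketch. It assumes, as the .3 × .3 × .3 calculation implicitly does, that the three customers decide independently; the function and variable names are hypothetical.

```python
from math import comb


def prob_exactly_k_buy(n, k, p):
    """Binomial probability that exactly k of n independent customers buy."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)


p_buy = 0.7        # probability from the case that each customer buys
n_customers = 3    # number of customers handed to the salesperson

# Probability that none of the three buys: .3 x .3 x .3
p_none = prob_exactly_k_buy(n_customers, 0, p_buy)
print(round(p_none, 3))  # 0.027

# Full distribution over 0, 1, 2, or 3 purchases (sums to 1).
distribution = [prob_exactly_k_buy(n_customers, k, p_buy)
                for k in range(n_customers + 1)]
print([round(p, 3) for p in distribution])  # [0.027, 0.189, 0.441, 0.343]
```

A 2.7 percent outcome is rare but not negligible, so a single run of three failed calls says little about whether the model's .7 estimates are accurate.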

"When you start a data mining project, you never know how it will turn out. I worked on one project for 6 months, and when we finished, I didn't think our model was any good. We had too many problems with data:

wrong, dirty, and missing. There was no way we could know ahead of time that it would happen, but it did. "When the time came to present the results to senior management, what could we do? How could we say we took 6 months of our time and substantial computer resources to create a bad model? We had a model, but I just didn't think it would make accurate predictions.

I was a junior member of the team, and it wasn't for me to decide. I kept my mouth shut, but I never felt good about it. Fortunately, the project was cancelled later for other reasons. "However, I'm only talking about my bad experiences. Some of my projects have been excellent. On many, we found interesting and important patterns and information, and a few times I've created very accurate predictive models. It's not easy, though, and you have to be very careful. Also, lucky!"

Discussion Questions

1. Summarize the concerns expressed by this contrarian.

2. Do you think the concerns raised here are sufficient to avoid data mining projects altogether?

3. If you were a junior member of a data mining team and you thought that the model that had been developed was ineffective, maybe even wrong, what would you do? If your boss disagrees with your beliefs, would you go higher in the organization? What are the risks of doing so? What else might you do?






All rights reserved! Copyrights ©2019-2020 ExpertsMind IT Educational Pvt Ltd