I am Abe Gong. I work at a tech company that does consumer electronics. I am a data scientist, so I combine software engineering with statistics and math to build better products and make better decisions internally.

There are three classic measures of central tendency: the mean, the median, and the mode. Each of those is a way to measure the average of your data. What's the middle? Where is the center? The mean is the one that most people are the most familiar with. You add up all the points and then you divide by the total number of points, and that gives you the mean, the average. The median is a little bit different. Basically what you're saying is, let's line up every example we have, and then we'll move in from the edges until we meet the point that's exactly in the middle. Whichever one that is, that's the median. For the mode, you can imagine some distribution of data, and the mode is the highest point on that: the answer category, or the thing that had the most answers. So, all of those tend to be close to the middle, but they're usually a little bit different.

For most statistics, you use the mean, and if you only look at the mean, then you might miss out on a lot of the rest of the information that's there. A lot of real-world data, though, is skewed, so you'll have a long tail in one direction or the other. Income is a really classic example of that. You have a few Bill Gateses and Warren Buffetts out there who are much richer than everybody else, so it's a very skewed distribution. In that case, those few people who are way out at the extreme drag the mean up.

So you can imagine the number of times that you check your email in a day, right? And it might be that the average number of times people check their email is, say, I don't know, maybe 50 times a day? But some people check it a lot more, some people check it a lot less. The standard deviation tells you how wide that distribution is.
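To make the three measures of central tendency concrete, here is a minimal sketch using Python's standard `statistics` module. The income figures are invented for illustration and are not from the interview; the single very large value stands in for the long tail of a skewed distribution.

```python
import statistics

# Hypothetical yearly incomes in thousands of dollars (made-up data).
# The last value is one extreme earner, standing in for the long tail.
incomes = [30, 35, 35, 40, 45, 50, 60, 5000]

mean_income = statistics.mean(incomes)      # add up all points, divide by the count
median_income = statistics.median(incomes)  # middle value once the data is lined up
mode_income = statistics.mode(incomes)      # the value that occurs most often

print(f"mean:   {mean_income}")
print(f"median: {median_income}")
print(f"mode:   {mode_income}")
```

With this made-up sample, the one extreme value pulls the mean far above the median and the mode, which stay close to what most of the incomes look like.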
So, when we say 50 times a day, is it really that everybody checks it about 50 times a day, or are there some people who check it 100 times and some people who check it zero? You don't know that unless you know how wide the distribution is.

So the two measures of variability that you see a lot are really standard deviation and variance. Those carry pretty much the same information. The standard deviation is the square root of the variance, so if you know one, you know the other. You could think of standard deviation like this: when you're driving down the highway, the speed limit says 65, but not everybody is driving exactly 65. Some people are over that, and some people over in the right lane are under that. If you were to clock a lot of cars going past, measure their speeds, and then draw a graph of that, the standard deviation would tell you how dispersed that graph is. How wide is the distribution? If the standard deviation were low, that would mean everybody is driving just about 65 miles an hour, a little over and a little under, but not much. If the standard deviation were very high, that would mean we've got some people who are pretty much stopped, and other people going 150 miles an hour.

In the company I work for, we know that people tend to use our product less on holidays. So we tried an experiment where we basically sent a message to some people and nudged them and said, hey, why don't you use our thing on Thanksgiving a little bit more? Most people don't, but you know, give it a shot, see if you can buck the trend. And when we did that, we found that the people we sent that email to actually did use our product more on Thanksgiving than other people. And that's pretty cool, but if you just compare the means, you're just comparing the middles of the distributions, and that doesn't give you enough information on its own.
You've still got to go back and compare the standard deviations, see how wide those bell curves are, and basically see how much the bell curve of the people who didn't get the email and the bell curve of the people who did get the email overlap. If the overlap is small, then you can be much more confident that the intervention really did make a difference.
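The ideas about spread and overlap can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the speeds and usage minutes are invented, the helper names `spread` and `standardized_gap` are hypothetical, and dividing the gap in means by the pooled standard deviation (a quantity often called Cohen's d) is one common way to express how far apart two bell curves sit, not necessarily the method used in the experiment described above.

```python
import math
import statistics


def spread(values):
    """Return (mean, variance, standard deviation) of a sample."""
    mean = statistics.mean(values)
    variance = statistics.variance(values)  # sample variance, divides by n - 1
    std_dev = math.sqrt(variance)           # standard deviation = sqrt(variance)
    return mean, variance, std_dev


def standardized_gap(control, treated):
    """Gap between group means, in units of the pooled standard deviation."""
    mean_c, var_c, _ = spread(control)
    mean_t, var_t, _ = spread(treated)
    n_c, n_t = len(control), len(treated)
    pooled_var = ((n_c - 1) * var_c + (n_t - 1) * var_t) / (n_c + n_t - 2)
    return (mean_t - mean_c) / math.sqrt(pooled_var)


# Hypothetical highway speeds in mph: same mean of 65, very different spread.
steady_traffic = [63, 64, 65, 66, 67]
erratic_traffic = [5, 40, 65, 90, 125]

# Hypothetical holiday usage in minutes, without and with the nudge email.
# Both pairs have the same 2-minute gap between their means.
narrow_control, narrow_treated = [9, 10, 10, 10, 11], [11, 12, 12, 12, 13]
wide_control, wide_treated = [0, 5, 10, 15, 20], [2, 7, 12, 17, 22]

for name, speeds in [("steady", steady_traffic), ("erratic", erratic_traffic)]:
    mean, variance, std_dev = spread(speeds)
    print(f"{name}: mean={mean}, variance={variance:.1f}, std dev={std_dev:.1f}")

print(f"narrow curves: gap = {standardized_gap(narrow_control, narrow_treated):.2f} std devs")
print(f"wide curves:   gap = {standardized_gap(wide_control, wide_treated):.2f} std devs")
```

In this made-up data both pairs of groups differ by the same two minutes on average, but the narrow pair ends up nearly three standard deviations apart while the wide pair is only about a quarter of a standard deviation apart, so the wide curves overlap heavily. A real analysis would also account for sample size, for example with a formal significance test.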