I came across a few mentions of population vs sample. If someone states sigma vs. And how does one determine population vs sample? Could you not consider everything just a sample? Seems like it could be very subjective. The question leads to the great confusion between n and n-1. As I remember from my Statistics class back in , there were two different formulae that were introduced for the standard deviation. One of the formulae had n in the denominator, and the other had n-1. The first formula was called the population standard deviation, and the second was called the sample standard deviation.
Quick note to the wise, if you wish to sound erudite and wicked smart, then spell the plural of "formula" with an "e" at the end. I also recommend spelling "gray" with an e: grey. The Brits just naturally sound smart. Population standard deviation. Below we have the formula for the "population" standard deviation. Formula for population standard deviation. You subtract the mean from all the samples, and square each difference. Squaring them does two things. First, it makes them all positive.
After all, we want to count a negative deviation the same as a positive deviation, right? Second, it gives more weight to the larger deviations. Do you really want to give more weight to the larger deviations? I dunno. Maybe you don't. For some purposes, it might be better to take the absolute value, rather than the square.
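To make the squaring-versus-absolute-value point concrete, here is a minimal sketch that computes both the population standard deviation (square, average, square root) and the mean absolute deviation. The data values and function names are made up for illustration; they do not come from the original post.

```python
import math

def population_sd(values):
    """Population standard deviation: divide by n, not n-1."""
    n = len(values)
    mean = sum(values) / n
    # Squaring makes every deviation positive and weights large ones more.
    squared = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared) / n)

def mean_absolute_deviation(values):
    """Alternative spread measure: average of |x - mean|."""
    n = len(values)
    mean = sum(values) / n
    return sum(abs(x - mean) for x in values) / n

# Illustrative data (an assumption, not from the post); the mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]

print(population_sd(data))            # 2.0
print(mean_absolute_deviation(data))  # 1.5
```

The standard deviation comes out larger than the mean absolute deviation here because the two big deviations (the 2 and the 9) count for more once they are squared.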
More exact corrections are shown here: en. What if it overestimates? Why is it that the total variance of the population would be the sum of the variance of the sample from the sample mean and the variance of the sample mean itself?
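One way to see the decomposition being asked about is the following algebraic identity, sketched here under the assumption that x_1, ..., x_n are independent draws from a population with mean \mu and variance \sigma^2. The cross term vanishes because the deviations from the sample mean sum to zero.

```latex
\sum_{i=1}^{n} (x_i - \mu)^2
  = \sum_{i=1}^{n} (x_i - \bar{x})^2
  + 2(\bar{x} - \mu)\underbrace{\sum_{i=1}^{n} (x_i - \bar{x})}_{=\,0}
  + n(\bar{x} - \mu)^2
  = \sum_{i=1}^{n} (x_i - \bar{x})^2 + n(\bar{x} - \mu)^2

% Taking expectations, with E[(x_i - \mu)^2] = \sigma^2 and E[(\bar{x} - \mu)^2] = \sigma^2 / n:
n\sigma^2 = E\!\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right] + \sigma^2
\quad\Longrightarrow\quad
E\!\left[\sum_{i=1}^{n} (x_i - \bar{x})^2\right] = (n-1)\,\sigma^2
```

So the spread around the sample mean accounts, on average, for only n-1 of the n "shares" of \sigma^2; the remaining share is the variance of the sample mean itself.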
How come we sum the variances? See here for intuition and proof. I have to teach the students with the n-1 correction, so dividing by n alone is not an option. As written before me, mentioning the connection to the second moment is not an option. Although mentioning how the mean was already estimated, thereby leaving us with less "data" for the sd - that's important. Regarding the bias of the sd - I remembered encountering it - thanks for driving that point home.
In other words, I interpreted "intuitive" in your question to mean intuitive to you. Thank you for the vote of confidence. The loss of a degree of freedom from estimating the expectation is one that I was thinking of using in class. But combining it with some of the other answers given in this thread will be useful to me, and I hope to others in the future.
You know non-mathers like us can't tell. I did say gradually. Any way to sum up the intuition, or is that not likely to be possible? I'm not sure it's really practical to use this approach with your students unless you adopt it for the entire course though. I am unhappy to see the downvotes and can only guess that they are responding to the last sentence, which could easily be seen as attacking the O.
Even though the equation is interesting, I don't get how it could be used to teach n-1 intuitively? This shows the sleight-of-hand that has occurred: somehow, you need to justify not including such self-pairs. Because they are included in the analogous population definition of variance, this is not an obvious thing.
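The equation under discussion is not reproduced in this thread, so the following is only a sketch of the pairwise-difference view of variance that the "self-pairs" remark appears to refer to. The function names and data are illustrative assumptions. Averaging half the squared difference over all n*n ordered pairs (self-pairs included) gives the population variance; averaging over the n*(n-1) pairs with distinct indices gives the n-1 version.

```python
import math
import statistics
from itertools import product

def pairwise_population_variance(values):
    """Half the mean squared difference over ALL ordered pairs, self-pairs included."""
    n = len(values)
    total = sum((a - b) ** 2 for a, b in product(values, repeat=2))
    return total / (2 * n * n)

def pairwise_sample_variance(values):
    """Same sum, but averaged over the n*(n-1) pairs with distinct indices."""
    n = len(values)
    total = sum(
        (values[i] - values[j]) ** 2
        for i in range(n)
        for j in range(n)
        if i != j  # drop self-pairs; they contribute zero to the sum anyway
    )
    return total / (2 * n * (n - 1))

# Illustrative data (an assumption, not from the thread).
data = [2, 4, 4, 4, 5, 5, 7, 9]

# These agree with the standard library's n and n-1 formulas.
print(math.isclose(pairwise_population_variance(data), statistics.pvariance(data)))
print(math.isclose(pairwise_sample_variance(data), statistics.variance(data)))
```

Since self-pairs add nothing to the numerator, the whole difference between the two versions is the pair count in the denominator: n*n versus n*(n-1).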
Indeed, you seem to use "sample variance" in the sense of a variance estimator, which is more confusing yet.
So the value you compute in step 2 will probably be a bit smaller (and can't be larger) than what it would be if you used the true population mean in step 1.
To make up for this, divide by n-1 rather than n. But why n-1? If you knew the sample mean, and all but one of the values, you could calculate what that last value must be.
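Both claims above can be checked numerically. The sketch below assumes a made-up population (normal, with a true mean of 10 chosen purely for illustration) and shows that the sum of squared deviations around the sample mean never exceeds the sum around the true mean, and that the last value can be recovered from the sample mean and the other n-1 values.

```python
import random

random.seed(0)

TRUE_MEAN = 10.0  # assumed population mean, for illustration only
sample = [random.gauss(TRUE_MEAN, 2.0) for _ in range(5)]
n = len(sample)
sample_mean = sum(sample) / n

def sum_of_squares(values, center):
    """Sum of squared deviations of the values from a chosen center."""
    return sum((x - center) ** 2 for x in values)

# The sample mean minimizes the sum of squares, so this can't exceed ss_true.
ss_sample = sum_of_squares(sample, sample_mean)
ss_true = sum_of_squares(sample, TRUE_MEAN)
print(ss_sample <= ss_true)

# Degrees of freedom: given the sample mean and all but one value,
# the remaining value is fully determined.
known_values = sample[:-1]
recovered_last = n * sample_mean - sum(known_values)
print(abs(recovered_last - sample[-1]) < 1e-9)
```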
Statisticians say there are n-1 degrees of freedom. Statistics books often show two equations to compute the SD, one using n, and the other using n-1, in the denominator. Some calculators have two buttons. The n-1 equation is used in the common situation where you are analyzing a sample of data and wish to make more general conclusions. Well, first of all, we denote it with the Greek letter mu.
And we essentially take every data point in our population. So we take the sum of every data point. So we start at the first data point and we go all the way to the capital Nth data point.
So every data point we add up. So this is the i-th data point, so x sub 1 plus x sub 2 all the way to x sub capital N. And then we divide by the total number of data points we have. Well, how do we calculate the sample mean? Well, the sample mean-- we do a very similar thing with the sample. And we denote it with a x with a bar over it. And that's going to be taking every data point in the sample, so going up to a lower case n, adding them up --so these are the sum of all the data points in our sample-- and then dividing by the number of data points that we actually had.
Now, the other thing that we're trying to calculate for the population, which was a parameter, and then we'll also try to calculate it for the sample and estimate it for the population, was the variance, which was a measure of how dispersed the data points are, or how much they vary from the mean. So let's write variance right over here. And how do we denote and calculate variance for a population? Well, for population, we'd say that the variance --we use a Greek letter sigma squared-- is equal to-- and you can view it as the mean of the squared distances from the population mean.
But what we do is we take, for each data point, so i equal 1 all the way to capital N, we take that data point, subtract from it the population mean. So if you want to calculate this, you'd want to figure this out. Well, that's one way to do it. We'll see there's other ways to do it, where you can calculate them at the same time. But the easiest or the most intuitive is to calculate this first, then for each of the data points take the data point, subtract the mean from it, square it, add all of those up, and then divide by the total number of data points you have.
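Written out, the quantities described so far look like this. The symbols follow the spoken description (capital N for the population size, lowercase n for the sample size); the layout is a reconstruction, since the transcript does not show what was written on screen.

```latex
% Population mean: add up all N data points, divide by N
\mu = \frac{1}{N} \sum_{i=1}^{N} x_i

% Sample mean: same idea, over the n data points in the sample
\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i

% Population variance: the mean of the squared distances from the population mean
\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2
```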
Now, we get to the interesting part-- sample variance. There are several ways-- when people talk about sample variance, there are several tools in their toolkits, or several ways to calculate it. One way is the biased sample variance, which is not an unbiased estimator of the population variance.
And that's usually denoted by s with a subscript n. And what is the biased estimator, and how do we calculate it? Well, we would calculate it very similarly to how we calculated the variance right over here. But we would do it for our sample, not our population.
So for every data point in our sample --so we have n of them-- we take that data point. And from it, we subtract our sample mean.
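The transcript breaks off here. The sketch below assumes the biased sample variance is finished the same way as the population variance (square each difference, add them up, divide by n), and compares it with the n-1 version over many small samples. The population parameters, sample size, and trial count are arbitrary choices for illustration.

```python
import random
import statistics

def biased_sample_variance(sample):
    """Squared deviations from the SAMPLE mean, divided by n (assumed completion)."""
    n = len(sample)
    mean = sum(sample) / n
    return sum((x - mean) ** 2 for x in sample) / n

random.seed(1)

POP_MEAN, POP_SD = 50.0, 3.0  # assumed population; true variance is 9.0
SAMPLE_SIZE = 5
TRIALS = 20_000

biased_total = 0.0
unbiased_total = 0.0
for _ in range(TRIALS):
    sample = [random.gauss(POP_MEAN, POP_SD) for _ in range(SAMPLE_SIZE)]
    biased_total += biased_sample_variance(sample)
    unbiased_total += statistics.variance(sample)  # divides by n-1

biased_avg = biased_total / TRIALS
unbiased_avg = unbiased_total / TRIALS

# On average the n version falls short of the true variance by a factor
# of (n-1)/n, while the n-1 version lands near it.
print(f"average with n:   {biased_avg:.2f}")
print(f"average with n-1: {unbiased_avg:.2f}")
```

With samples of size 5, the average of the n version should sit near 4/5 of the true variance, which is the shortfall the n-1 denominator is meant to correct.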