We use the F Test to compare the variances (and hence, the standard deviations) of two data sets. We begin by assuming that the variances are statistically the same for both data sets. This doesn’t mean that the variances of the two sample data sets we’re looking at must be exactly the same, but rather that they are close enough to each other that we can safely assume that the larger populations from which the samples were taken have essentially the same variances.
So our default hypothesis (written as H0, and referred to as the “null hypothesis” or “h-naught”) is that the variances are statistically the same. Of course, we must also have an alternate hypothesis (written as H1, sometimes called “h-one”). The alternate hypothesis can take one of three forms:
- The variances are not the same (this is the most general form, and the default)
- The variance of Data Set 1 is greater than the variance of Data Set 2
- The variance of Data Set 1 is less than the variance of Data Set 2
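For readers who prefer symbols, the default hypothesis and the three alternate forms above can be written as follows. The symbols \sigma_1^2 and \sigma_2^2 are notation introduced here (not in the original text) for the population variances behind Data Set 1 and Data Set 2.

```latex
\begin{align*}
H_0 &: \sigma_1^2 = \sigma_2^2 \\
H_1 &: \sigma_1^2 \neq \sigma_2^2 \quad \text{(the default, two-sided form)} \\
H_1 &: \sigma_1^2 > \sigma_2^2 \\
H_1 &: \sigma_1^2 < \sigma_2^2
\end{align*}
```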
Let’s look at a specific example to illustrate how this works. A software sales company has two offices (one in Springfield and one in Fairview) that must each complete a series of standard work steps to close out completed customer sales. They have found that the time required to complete the work varies, and the Sales Manager wants to determine if one office is more consistent than the other. The manager gathers historical data from the sales database: for the previous month, they find 17 completed requests processed by the Springfield office and 15 completed requests processed in Fairview.
To run the analysis, the Sales Manager starts by clicking the “Compare data sets” button on the SuperEasyStats ribbon and selecting “F Test” from the menu of options. They then see the following dialog box:
Notice that the default hypothesis is fixed. We always assume that the variances are the same unless proven otherwise. Now they must choose their alternate hypothesis. Because the Sales Manager has no particular reason to think that one office has a larger or smaller variance than the other, all they are really trying to determine is whether the variances are different. So for their alternate hypothesis they select the default: that the variance of Data Set 1 (“Springfield”) does not equal that of Data Set 2 (“Fairview”).
They won’t be willing to reject the default hypothesis unless they reach a certain confidence level that the variances really are different. The key question they need to answer is: how confident do they need to be?
That’s where alpha comes in. Alpha is a decision rule for rejecting the default hypothesis. If the Sales Manager wants to be 95% confident that the data sets have different variances before rejecting H0, then they need to set alpha to 0.05 (the default value). But if they wanted to be even more confident (say, 99%), they would need to set alpha to 0.01. In this case the Sales Manager just needs to be 95% sure that the variances are different before accepting that as the case, so they leave the default value in place.
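The relationship between the required confidence level and alpha can be sketched in a few lines of Python. This is only an illustration of the rule described above; the function names `alpha_from_confidence` and `reject_null` are hypothetical and are not part of SuperEasyStats.

```python
def alpha_from_confidence(confidence_percent: float) -> float:
    """Convert a required confidence level (e.g. 95) into alpha (e.g. 0.05)."""
    if not 0 < confidence_percent < 100:
        raise ValueError("confidence must be strictly between 0 and 100")
    # Round to suppress floating-point noise such as 0.050000000000000044.
    return round(1 - confidence_percent / 100, 10)


def reject_null(p_value: float, alpha: float) -> bool:
    """Decision rule: reject H0 only when the p-value is below alpha."""
    return p_value < alpha


alpha_95 = alpha_from_confidence(95)  # the default setting
alpha_99 = alpha_from_confidence(99)  # a stricter setting
```

With a stricter alpha, the same p-value can lead to a different decision: a p-value of 0.032 would clear the 0.05 threshold but not the 0.01 threshold.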
Once the hypotheses are selected and the alpha is set, the manager clicks the “Create data entry sheet” button. On the data entry sheet, they enter the data collected from the sales database (the numbers represent days required to complete the standard work). Below you can see the data entry sheet as filled in by the Sales Manager.
In the upper left corner the hypotheses are shown. Below that is the alpha setting. In the center are columns where the Sales Manager entered their raw data. To the right of those columns are summary boxes showing the variance and standard deviation of each data set.
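As a rough illustration of what the summary boxes contain, the sketch below computes a variance and standard deviation for two data sets using Python's standard library. The numbers are made-up placeholder values, not the Sales Manager's data, and the sketch assumes the summary boxes use the sample (n - 1) formulas, which the text above does not state.

```python
import statistics

# Hypothetical completion times in days; NOT the actual sales database values.
springfield_days = [3, 5, 4, 6, 2, 7, 5]
fairview_days = [4, 5, 4, 5, 4, 5, 4]


def summarize(days):
    """Return (sample variance, sample standard deviation) for one data set."""
    variance = statistics.variance(days)  # sample variance, divides by n - 1
    std_dev = statistics.stdev(days)      # square root of the sample variance
    return variance, std_dev


springfield_var, springfield_sd = summarize(springfield_days)
fairview_var, fairview_sd = summarize(fairview_days)

print(f"Springfield: variance={springfield_var:.3f}, std dev={springfield_sd:.3f}")
print(f"Fairview:    variance={fairview_var:.3f}, std dev={fairview_sd:.3f}")
```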
The key result is the p-value shown in the box labeled “F Test Results”. Notice that the p-value is 0.032. Since that is less than the Sales Manager’s chosen alpha value (0.05), they have met their threshold and can reject the default hypothesis. In the confidence terms used above, the result corresponds to 96.78% confidence, which exceeds the 95% they required. They therefore accept the alternate hypothesis, and remember, the alternate hypothesis is that the variances of the two offices are truly different.
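To show where a p-value like this comes from, here is a minimal sketch of an F Test in plain Python. It is not SuperEasyStats' implementation. It assumes the textbook approach: the test statistic is the ratio of the two sample variances, compared against an F distribution with n1 - 1 and n2 - 1 degrees of freedom. The function names and the data are hypothetical, so the output will not match the 0.032 above.

```python
import math
import statistics


def _beta_continued_fraction(a, b, x, max_iter=200, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even and odd steps of the recurrence, applied in turn.
        for aa in (m * (b - m) * x / ((qam + m2) * (a + m2)),
                   -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))):
            d = 1.0 + aa * d
            d = 1.0 / (d if abs(d) > tiny else tiny)
            c = 1.0 + aa / c
            c = c if abs(c) > tiny else tiny
            delta = d * c
            h *= delta
        if abs(delta - 1.0) < eps:
            return h
    raise RuntimeError("continued fraction did not converge")


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b), the regularized incomplete beta function."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                     + a * math.log(x) + b * math.log1p(-x))
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_cdf(f, df1, df2):
    """P(F <= f) for an F distribution with (df1, df2) degrees of freedom."""
    if f <= 0.0:
        return 0.0
    x = df1 * f / (df1 * f + df2)
    return regularized_incomplete_beta(df1 / 2.0, df2 / 2.0, x)


def f_test(sample1, sample2, alternative="two-sided"):
    """Return (F statistic, p-value) for the chosen alternate hypothesis."""
    f_stat = statistics.variance(sample1) / statistics.variance(sample2)
    lower = f_cdf(f_stat, len(sample1) - 1, len(sample2) - 1)
    upper = 1.0 - lower
    if alternative == "greater":      # variance 1 > variance 2
        p_value = upper
    elif alternative == "less":       # variance 1 < variance 2
        p_value = lower
    elif alternative == "two-sided":  # variances are not the same
        p_value = min(1.0, 2.0 * min(lower, upper))
    else:
        raise ValueError("alternative must be 'two-sided', 'greater' or 'less'")
    return f_stat, p_value


# Hypothetical completion times in days; NOT the actual sales database values.
springfield_days = [3, 5, 4, 6, 2, 7, 5]
fairview_days = [4, 5, 4, 5, 4, 5, 4]

alpha = 0.05
f_stat, p_value = f_test(springfield_days, fairview_days)
print(f"F = {f_stat:.3f}, p-value = {p_value:.4f}, reject H0: {p_value < alpha}")
```

The `alternative` argument mirrors the three forms of the alternate hypothesis listed earlier. The two-sided form doubles the smaller tail probability, which is one common convention for a "not the same" test.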
