F test: Theory, Solved Example and Demonstration in Agri Analyze

This blog discusses in detail the theory of the F-test, its use cases, a manually solved example, and a demonstration using the online tool Agri Analyze (reading time: 10 min).

Introduction

The F-test is a statistical method used to compare the variances of two samples, or more generally the ratio of variances across multiple samples. Strictly, an F-test is any test whose test statistic follows an F-distribution under the null hypothesis, given the usual assumptions about the error term (ε). The test statistic, denoted F, is also commonly used to compare fitted models to determine which best represents the underlying population, particularly models fitted by least squares. The test is named after Ronald Fisher, who introduced the concept as the "variance ratio" in the 1920s; George W. Snedecor later named the test in Fisher’s honor.

Definition

An F-test uses the F-statistic to evaluate whether the variances of two samples (or populations) are equal. The test assumes that the samples are independent and drawn from normally distributed populations; under the null hypothesis, the ratio of the sample variances then follows an F-distribution. If the F-test yields a statistically significant result, the null hypothesis of equal variances is rejected; otherwise, it is not rejected.

Use of F-Test in Statistics

The F-test is a statistical tool used to compare variances and determine if there are significant differences between two populations or samples. It is commonly applied in regression analysis, statistical inference, model fitting, and analysis of variance (ANOVA) to identify the best-fitting statistical model or assess differences across groups.

  • Regression analysis
  • Statistical inference
  • Model fitting
  • Analysis of variance (ANOVA)

Assumptions

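  • Independence: Observations must be independent, both within each group and between the groups being compared.
  • Normality: Data in each group should follow a normal distribution. When the F-test compares means (as in ANOVA), large samples make it fairly robust to departures from normality through the Central Limit Theorem; the F-test for equality of two variances, however, remains sensitive to non-normality even in large samples.
  • Homogeneity of variances: In ANOVA, the variances of the groups being compared should be approximately equal. (In the two-sample variance test, equal variances is the null hypothesis being tested rather than an assumption.)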
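Normality is usually checked before running the test. The sketch below shows one way to do it, assuming SciPy is installed; it uses the Shapiro-Wilk test (`scipy.stats.shapiro`), a common choice that this blog does not itself prescribe, on the Brazil life-expectancy data from the example later in this post. The helper name `check_normality` is hypothetical.

```python
from scipy import stats

# Life expectancy (years) from the worked example in this post
life_1900 = [42.7, 43.7, 34.0, 39.2, 46.1, 48.7, 49.4, 45.9, 55.3]
life_1970 = [54.2, 50.4, 44.2, 49.7, 55.4, 57.0, 58.2, 56.6, 61.9, 57.5, 53.4]


def check_normality(samples, alpha=0.05):
    """Run Shapiro-Wilk on each sample; return {name: (W, p, normality_not_rejected)}."""
    results = {}
    for name, data in samples.items():
        w, p = stats.shapiro(data)
        # A small p-value is evidence against normality for that group
        results[name] = (w, p, p > alpha)
    return results


for name, (w, p, ok) in check_normality({"1900": life_1900, "1970": life_1970}).items():
    print(f"{name}: W = {w:.3f}, p = {p:.3f}, normality not rejected: {ok}")
```

With only 9 and 11 observations such tests have little power, so a non-significant result is weak reassurance; a normal Q-Q plot is a useful companion.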

Important Notes on the F-Test

  • The F-test is used to assess whether the variances of two populations are equal by comparing the ratio of their sample variances with an F-distribution.
  • The F-test statistic is calculated as $F = \frac{S_1^2}{S_2^2}$, where $S_1^2$ and $S_2^2$ are the sample variances (estimates of the population variances $\sigma_1^2$ and $\sigma_2^2$).
  • The calculated F is compared with a critical value from the F-distribution at the chosen significance level to decide whether to reject the null hypothesis.
  • A common application of the F-test is one-way ANOVA, which compares the variability between group means with the variability of observations within groups.
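In one-way ANOVA, F is the between-group mean square divided by the within-group mean square. A minimal standard-library sketch of that ratio follows; the function name `one_way_anova_f` is hypothetical, and the 1900 and 1970 data from the example below are reused only to show the arithmetic (ANOVA compares group means, whereas the worked example compares variances).

```python
from statistics import fmean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    all_values = [x for g in groups for x in g]
    grand_mean = fmean(all_values)
    group_means = [fmean(g) for g in groups]
    # Between-group sum of squares: spread of group means around the grand mean
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    # Within-group sum of squares: spread of observations around their own group mean
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


life_1900 = [42.7, 43.7, 34.0, 39.2, 46.1, 48.7, 49.4, 45.9, 55.3]
life_1970 = [54.2, 50.4, 44.2, 49.7, 55.4, 57.0, 58.2, 56.6, 61.9, 57.5, 53.4]

f_stat, df_b, df_w = one_way_anova_f(life_1900, life_1970)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
# F = 14.64 with (1, 18) degrees of freedom
```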

Selection Criteria for σ₁² and σ₂² in an F-Test

  • For a right-tailed or two-tailed F-test, the larger sample variance is placed in the numerator. The sample it comes from is treated as sample 1 (σ₁²), and the sample with the smaller variance is sample 2 (σ₂², the denominator).
  • For a left-tailed test, the smaller variance is placed in the numerator (sample 1) and the larger variance in the denominator (sample 2).
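The ordering convention can be captured in a small helper. A standard-library sketch; the function name `order_for_f_test` and its `tail` labels are hypothetical.

```python
from statistics import variance


def order_for_f_test(sample_a, sample_b, tail="two-sided"):
    """Return (sample_1, sample_2) so that sample_1 supplies the numerator.

    Right-tailed or two-sided: the larger sample variance goes on top.
    Left-tailed: the smaller sample variance goes on top.
    """
    a_is_larger = variance(sample_a) >= variance(sample_b)  # n - 1 denominators
    if tail in ("right", "two-sided"):
        return (sample_a, sample_b) if a_is_larger else (sample_b, sample_a)
    if tail == "left":
        return (sample_b, sample_a) if a_is_larger else (sample_a, sample_b)
    raise ValueError("tail must be 'left', 'right' or 'two-sided'")


life_1900 = [42.7, 43.7, 34.0, 39.2, 46.1, 48.7, 49.4, 45.9, 55.3]
life_1970 = [54.2, 50.4, 44.2, 49.7, 55.4, 57.0, 58.2, 56.6, 61.9, 57.5, 53.4]

# Passing 1970 first still puts the more variable 1900 sample in the numerator
s1, s2 = order_for_f_test(life_1970, life_1900)
print(f"F = {variance(s1) / variance(s2):.3f}")  # F = 1.603
```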

Hypotheses
Left-Tailed Test:

  • Null Hypothesis (H₀): σ₁² = σ₂²
  • Alternative Hypothesis (H₁): σ₁² < σ₂²
  • Decision Criteria: Reject H₀ if the F-statistic < F-critical value.

Right-Tailed Test:

  • Null Hypothesis (H₀): σ₁² = σ₂²
  • Alternative Hypothesis (H₁): σ₁² > σ₂²
  • Decision Criteria: Reject H₀ if the F-statistic > F-critical value.

Two-Tailed Test:

  • Null Hypothesis (H₀): σ₁² = σ₂²
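  • Alternative Hypothesis (H₁): σ₁² ≠ σ₂²
  • Decision Criteria: Reject H₀ if the F-statistic < lower F-critical value ($F_{1-\alpha/2}$) or > upper F-critical value ($F_{\alpha/2}$).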
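The left-, right- and two-tailed decision rules can be collected in one function. A minimal sketch, assuming the critical values have already been read from an F table in the upper-tail notation used in this post: `f_upper` is $F_{\alpha}$ (right-tailed) or $F_{\alpha/2}$ (two-tailed), and `f_lower` is $F_{1-\alpha}$ (left-tailed) or $F_{1-\alpha/2}$ (two-tailed). The function name and arguments are hypothetical.

```python
def f_test_decision(f_stat, tail, f_lower=None, f_upper=None):
    """Return True if H0 (equal variances) is rejected.

    tail="left":      reject if F < f_lower
    tail="right":     reject if F > f_upper
    tail="two-sided": reject if F < f_lower or F > f_upper
    """
    if tail == "left":
        return f_stat < f_lower
    if tail == "right":
        return f_stat > f_upper
    if tail == "two-sided":
        return f_stat < f_lower or f_stat > f_upper
    raise ValueError("tail must be 'left', 'right' or 'two-sided'")


# Worked example in this post: F = 1.603 with df (8, 10), alpha = 0.05, two-tailed
print(f_test_decision(1.603, "two-sided", f_lower=0.233, f_upper=3.85))  # False: fail to reject H0
```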

Procedure for Conducting an F-Test:

  1. Define Hypotheses

    • Null Hypothesis (H₀): The variances of the groups are equal.
    • Alternative Hypothesis (H₁): The variances of the groups are not equal.
  2. Collect Data
    Gather sample data from the groups being compared.

  3. Calculate Sample Variances
    For each group, compute the sample variance (S²) using the formula:

    $S^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

    where $x_i$ represents the observations, $\bar{x}$ is the sample mean, and $n$ is the sample size.

  4. Calculate F-Statistic
    Compute the F-statistic as follows:

    $F = \frac{S_1^2}{S_2^2}$

    where $S_1^2$ is the larger variance and $S_2^2$ is the smaller variance.

  5. Determine Degrees of Freedom
    Calculate degrees of freedom for each group:

    $\text{df}_1 = n_1 - 1$ (numerator)
    $\text{df}_2 = n_2 - 1$ (denominator)
  6. Find the Critical Value
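    Using an F-distribution table, locate the critical value for your chosen significance level (e.g., α = 0.05) based on df1 and df2. For a two-tailed test with the larger variance in the numerator, use the upper α/2 point (e.g., the 0.025 table when α = 0.05).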

  7. Make a Decision

    • If $F > \text{Critical Value}$, reject the null hypothesis, indicating significant differences in variances.
    • If $F \leq \text{Critical Value}$, fail to reject the null hypothesis, suggesting no significant variance differences.
  8. Conclusion

    • Reject the null hypothesis if $F$ exceeds the critical value, indicating a significant difference in variances between the groups.
    • Fail to reject the null hypothesis if $F$ is less than or equal to the critical value, implying insufficient evidence of a difference in variances.
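Steps 3 to 7 can be sketched end to end in Python. This assumes SciPy is available and uses `scipy.stats.f.ppf` in place of the printed F table; the function name `variance_ratio_test` is hypothetical. Following step 4, the larger sample variance goes in the numerator, so for a two-tailed test F is compared with the upper α/2 point.

```python
from statistics import variance

from scipy import stats


def variance_ratio_test(x, y, alpha=0.05):
    """Two-tailed F-test for equal variances, larger variance in the numerator."""
    # Step 3: sample variances (n - 1 in the denominator)
    var_x, var_y = variance(x), variance(y)
    # Step 4: the sample with the larger variance becomes sample 1
    if var_x >= var_y:
        s1_sq, s2_sq, n1, n2 = var_x, var_y, len(x), len(y)
    else:
        s1_sq, s2_sq, n1, n2 = var_y, var_x, len(y), len(x)
    f_stat = s1_sq / s2_sq
    # Step 5: numerator and denominator degrees of freedom
    df1, df2 = n1 - 1, n2 - 1
    # Step 6: upper alpha/2 critical value, replacing the printed F table
    f_crit = float(stats.f.ppf(1 - alpha / 2, df1, df2))
    # Step 7: F >= 1 by construction, so the upper critical value decides
    reject = f_stat > f_crit
    # One common two-sided p-value convention: double the upper-tail area, cap at 1
    p_value = min(1.0, 2 * float(stats.f.sf(f_stat, df1, df2)))
    return f_stat, (df1, df2), f_crit, p_value, reject


life_1900 = [42.7, 43.7, 34.0, 39.2, 46.1, 48.7, 49.4, 45.9, 55.3]
life_1970 = [54.2, 50.4, 44.2, 49.7, 55.4, 57.0, 58.2, 56.6, 61.9, 57.5, 53.4]

f_stat, df, f_crit, p_value, reject = variance_ratio_test(life_1900, life_1970)
print(f"F = {f_stat:.3f}, df = {df}, critical value = {f_crit:.2f}, "
      f"p = {p_value:.3f}, reject H0: {reject}")
```

Because the larger variance is always placed on top, swapping the order of the two samples gives the same result.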

Example:

Life expectancy in 9 regions of Brazil in 1900 and 11 regions of Brazil in 1970 was as given in the table below:

Region | Life expectancy (years), 1900 | Life expectancy (years), 1970
--- | --- | ---
1 | 42.7 | 54.2
2 | 43.7 | 50.4
3 | 34.0 | 44.2
4 | 39.2 | 49.7
5 | 46.1 | 55.4
6 | 48.7 | 57.0
7 | 49.4 | 58.2
8 | 45.9 | 56.6
9 | 55.3 | 61.9
10 | - | 57.5
11 | - | 53.4

We aim to determine whether the variation in life expectancy across different regions in 1900 and 1970 is the same. Assuming the populations in 1900 and 1970 follow normal distributions, $N(\mu_1, \sigma_1^2)$ and $N(\mu_2, \sigma_2^2)$, the hypotheses can be formulated as:

  • Null Hypothesis $H_0$: $\sigma_1^2 = \sigma_2^2$ (the variances are equal)
  • Alternative Hypothesis $H_1$: $\sigma_1^2 \neq \sigma_2^2$ (the variances are different)

The F-test is applied to evaluate these hypotheses.

  1. Calculate Sample Variances:

    $S_1^2 = \frac{1}{8} \left( \sum_{i=1}^{9} x_{1i}^2 - \frac{\left( \sum_{i=1}^{9} x_{1i} \right)^2}{9} \right) = \frac{1}{8} \left( 18527.78 - \frac{405^2}{9} \right) = \frac{302.78}{8} = 37.848$

    $S_2^2 = \frac{1}{10} \left( \sum_{j=1}^{11} x_{2j}^2 - \frac{\left( \sum_{j=1}^{11} x_{2j} \right)^2}{11} \right) = \frac{1}{10} \left( 32799.91 - \frac{598.5^2}{11} \right) = \frac{236.07}{10} = 23.607$
  2. Calculate the F-Statistic:

    $F = \frac{S_1^2}{S_2^2} = \frac{37.848}{23.607} = 1.603$
  3. Conclusion: The critical values from the F-distribution table at $\alpha = 0.05$ for a two-tailed test with degrees of freedom (8, 10) are $F_{0.025} = 3.85$ and $F_{0.975} = 0.233$. Since the calculated F-value (1.603) is less than 3.85 and greater than 0.233, we fail to reject the null hypothesis. This indicates that there is no significant difference in the variances of life expectancy between 1900 and 1970 across the regions of Brazil.
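The hand calculation above can be checked with a few lines of standard-library Python. This sketch mirrors the shortcut formula used in step 1 (sum of squares minus the squared sum divided by n) and uses the table values 3.85 and 0.233 quoted in the conclusion; the helper name `shortcut_variance` is hypothetical.

```python
import math
from statistics import variance

life_1900 = [42.7, 43.7, 34.0, 39.2, 46.1, 48.7, 49.4, 45.9, 55.3]
life_1970 = [54.2, 50.4, 44.2, 49.7, 55.4, 57.0, 58.2, 56.6, 61.9, 57.5, 53.4]


def shortcut_variance(data):
    """S^2 = (sum of x^2 - (sum of x)^2 / n) / (n - 1), as in step 1."""
    n = len(data)
    sum_x = sum(data)
    sum_x2 = sum(x * x for x in data)
    return (sum_x2 - sum_x ** 2 / n) / (n - 1)


s1_sq = shortcut_variance(life_1900)  # 1900 sample, about 37.85
s2_sq = shortcut_variance(life_1970)  # 1970 sample, about 23.61
f_stat = s1_sq / s2_sq                # about 1.603

# The shortcut should agree with the (x - mean)^2 definition of the sample variance
assert math.isclose(s1_sq, variance(life_1900))
assert math.isclose(s2_sq, variance(life_1970))

# Two-tailed critical values for df (8, 10) at alpha = 0.05, as read from the F table
f_lower, f_upper = 0.233, 3.85
rejected = f_stat < f_lower or f_stat > f_upper
decision = "reject H0" if rejected else "fail to reject H0"
print(f"F = {f_stat:.3f}: {decision}")
```

The shortcut is convenient by hand, but in floating point it can lose precision when values are large relative to their spread; `statistics.variance` avoids that by computing exactly.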

F test Demonstration in Agri Analyze


Video Demo for F-test in Agri Analyze

Sample Data File: The data are the same as in the example above. File Link
Step 1: Prepare the data file and save it in CSV format.

Step 2: Register on Agri Analyze (first time only). Link

Step 3: Go to Analytical Tool -> Hypothesis testing -> F-test Link

Step 4:
  • Upload the file.
  • Select the level of significance (for 5%, enter 0.05).
  • Variable name: Life Expectancy
  • Category Type: Time Frame
Step 5: Click on Submit and Download.
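For Step 1, the data can also be written to a CSV file programmatically. The long layout below (one row per observation, with a grouping column and a value column whose headers match the names entered in Step 4) is an assumption about what Agri Analyze expects; check it against the sample data file before uploading. The file name `f_test_life_expectancy.csv` and the helper `write_long_csv` are hypothetical.

```python
import csv

life_1900 = [42.7, 43.7, 34.0, 39.2, 46.1, 48.7, 49.4, 45.9, 55.3]
life_1970 = [54.2, 50.4, 44.2, 49.7, 55.4, 57.0, 58.2, 56.6, 61.9, 57.5, 53.4]


def write_long_csv(path, groups, value_col="Life Expectancy", group_col="Time Frame"):
    """Write {group_label: values} as one row per observation (assumed layout)."""
    with open(path, "w", newline="") as fh:
        writer = csv.writer(fh)
        writer.writerow([group_col, value_col])
        for label, values in groups.items():
            for value in values:
                writer.writerow([label, value])


write_long_csv("f_test_life_expectancy.csv", {"1900": life_1900, "1970": life_1970})
```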

Output






This blog was written with great effort and careful research by Uttam Baladaniya (PhD Scholar, Department of Agricultural Statistics, Anand Agricultural University).

