The Student’s t-distribution: Small Sample Solution

Student’s t-distribution (t-distribution) is a well-known sampling distribution that lets us carry out statistical inference when information is limited. It helps when we run into trouble estimating the value of a population parameter.

Suppose we are about to draw a random sample of observations from a normally distributed population.

One problem that often arises is that we do not know the true value of the population standard deviation. Therefore, we cannot standardize the data with the normal distribution.

Is the data still worth testing if we do not have the population standard deviation?

What if our research uses only a small sample, so that the normal approximation is hard to justify?

Could we do hypothesis testing or even create a confidence interval?

Keep reading!

Did you know that the t-distribution was developed to control the quality of beer production?

What is Student’s t-distribution?

Student’s t-distribution (t-distribution) is a probability distribution used for statistical testing when the sample is relatively small.

The t-distribution arises in statistical tests where an unknown parameter (such as the standard deviation of the population) has to be estimated.

By using the t-distribution, we can use the sample standard deviation (which is a statistic) in place of the population standard deviation (which is a parameter).

But there is a risk!

Because every sample contains a different set of values, the sample standard deviation changes from sample to sample, and this extra uncertainty makes the distribution of the test statistic wider.

If the population standard deviation is known, researchers tend to use the normal distribution instead, since it is the more appropriate choice.

The t-distribution is used in small-sample statistics. A common rule of thumb treats a sample of fewer than 30 observations as small. This is in line with the Central Limit Theorem, under which the sample mean is approximately normal once the sample is large enough.

There are other opinions, but this is the most widely used one. This distribution is very interesting because it lets us draw systematic conclusions about scientific phenomena from small samples.

The t-distribution is a family of distributions that looks similar to the normal distribution, but with a lower peak and heavier tails.

The smaller the sample size, the flatter the curve; the larger the sample size, the more the t-distribution looks like the standard normal distribution (Z-distribution).
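To make the "lower peak, heavier tails" idea concrete, here is a minimal Python sketch that evaluates the t density for a few degrees of freedom and compares it with the standard normal density. The function names `t_pdf` and `normal_pdf` are my own, chosen for illustration, and the chosen degrees of freedom are arbitrary.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # Work on the log scale so large df values do not overflow the gamma function.
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def normal_pdf(x):
    """Density of the standard normal (Z) distribution."""
    return math.exp(-x * x / 2) / math.sqrt(2 * math.pi)

# Compare the height at the centre (x = 0) and out in the tail (x = 3).
for df in (2, 5, 29, 1000):
    print(f"df={df:>4}: peak={t_pdf(0, df):.4f}  density at 3={t_pdf(3, df):.4f}")
print(f"normal : peak={normal_pdf(0):.4f}  density at 3={normal_pdf(3):.4f}")
```

As the degrees of freedom grow, both columns should move toward the values printed for the normal distribution.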

Confidence intervals and hypothesis tests are the two statistical procedures in which the t-distribution is most often used.

History of t-distribution

The t-distribution was discovered by William Sealy Gosset. He was born on June 13, 1876 in Canterbury, England. Gosset studied at Winchester College before studying chemistry and mathematics at New College, Oxford.

After graduating in 1899, he joined the Arthur Guinness and Son brewery. Here, Gosset learned a lot and applied his statistical knowledge.

In fact, William Gosset, Fisher, and Pearson had a good relationship. Gosset often conducted joint research with Pearson. In 1906-1907, Gosset spent time doing research at Pearson’s laboratory in London.

Gosset often conducted research using small samples. Unfortunately, the use of small samples did not appeal to Pearson. Gosset published a pair of papers in Biometrika in 1908. He did so under the pseudonym “Student”.

Originally, this distribution was used to monitor the quality of beer production at Guinness. Guinness was afraid that competitors would find out that statistics were being used to control beer quality, so Gosset could not publish under his own name.

That’s why the distribution is known as “Student’s t-distribution”. Fisher appreciated Gosset’s work on test statistics for small samples.

It was Fisher who later introduced Student’s t statistic to a wider audience, because he considered it a breakthrough for research.

Characteristics of Student’s t-distribution

Well, the t-distribution is similar to the Z-distribution. We can find the probability associated with any observed value by converting it to a t-score, just as we convert an observation to a Z-score.

But it has a different shape. It has the basic bell shape with an area of 1 under it. Like the standard normal distribution, its mean is zero, but its standard deviation is larger than that of the Z-distribution.

The t-distribution has a different shape for every sample size. There are tables for sample sizes of 2, 3, and so on up to 29.

The larger the sample, the closer the t-distribution gets to the shape and values of the normal distribution.

Take a look at the chart below!

Each t-distribution has its own degrees of freedom, which affect its critical values. The degrees of freedom are determined by the sample size.

Mathematically, the degrees of freedom are n-1. If your sample has 5 observations, you use a t-distribution with 4 degrees of freedom, denoted t4.

The formula of the t-distribution

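There are 2 common formulas for the test statistic. They can be adapted to the case at hand.

1. If the standard deviation of the population is known, the statistic follows the standard normal distribution and is usually written as z:

$$z = \frac{\bar{x} - \mu}{\sigma / \sqrt{n}}$$

2. If the standard deviation of the population is unknown, and assuming the sample comes from a normally distributed population, it is replaced by the sample standard deviation. The t-formula, with n-1 degrees of freedom:

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}}$$

Here $\bar{x}$ is the sample mean, $\mu$ is the hypothesized population mean, and $n$ is the sample size. The standard deviations of the sample ($s$) and of a population of size $N$ ($\sigma$) are:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}} \qquad \sigma = \sqrt{\frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}}$$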
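As a small illustration of the unknown-standard-deviation case, the sketch below computes the t statistic from raw data using only Python's standard library. The data values and the hypothesized mean of 500 are made up for this example, and `t_statistic` is a name I chose, not something from a library.

```python
import math
import statistics

def t_statistic(sample, mu):
    """t = (sample mean - mu) / (s / sqrt(n)), with s the sample standard deviation."""
    n = len(sample)
    x_bar = statistics.mean(sample)
    s = statistics.stdev(sample)  # sample standard deviation, divides by n - 1
    return (x_bar - mu) / (s / math.sqrt(n))

# Hypothetical measurements, made up for illustration only.
sample = [498.2, 501.1, 499.5, 497.8, 500.4, 499.0]

print("standard deviation with n - 1:", statistics.stdev(sample))
print("standard deviation with n    :", statistics.pstdev(sample))
print("t statistic against mu = 500 :", t_statistic(sample, mu=500.0))
```

Note that `statistics.stdev` divides by n-1 while `statistics.pstdev` divides by n, so the first is always slightly larger for the same data.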

Finding the critical value and p-value of t-distribution

The p-value is the probability, assuming the null hypothesis is true, of getting a test statistic at least as extreme as the one observed. Its value depends on the distribution: the same test statistic has a larger p-value under a t-distribution than under the Z-distribution.

This is because the tails of the t-distribution are fatter than those of the Z-distribution. That is the price of using small samples.
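One way to see the fatter tails without any formula is a small simulation. The sketch below draws many samples from a standard normal population, computes the t statistic of each, and counts how often it lands beyond the familiar Z cutoff of 1.96. The function name `exceed_rate`, the number of trials, and the seed are all my own choices for illustration.

```python
import math
import random

def exceed_rate(n, cutoff=1.96, trials=20000, seed=1):
    """Fraction of simulated samples of size n whose |t| exceeds the cutoff."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        sample = [rng.gauss(0, 1) for _ in range(n)]  # true mean is 0
        mean = sum(sample) / n
        var = sum((x - mean) ** 2 for x in sample) / (n - 1)  # sample variance
        t = mean / math.sqrt(var / n)
        if abs(t) > cutoff:
            hits += 1
    return hits / trials

for n in (5, 30):
    print(f"n={n:>2}: share of |t| above 1.96 = {exceed_rate(n):.3f}")
```

If the statistic followed the Z-distribution, about 5 percent of the values would fall beyond 1.96. With only 5 observations the share should come out clearly higher, and with 30 it should be much closer to 5 percent.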

The p-value can be found using a t-distribution table. Remember to use n-1 degrees of freedom.

[Table: Student’s t-distribution table]

How to read the t-distribution table:

  1. Determine the degrees of freedom
  2. Determine the level of significance
  3. Find the value where the row for the degrees of freedom meets the column for the level of significance

For example, suppose you have a sample of 20 observations and a significance level of 5 %. For a two-tailed test, the critical t-value is t(0.025, 19) = 2.0930.
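Reading the table is really just a two-key lookup, which the sketch below mimics with a small dictionary. It holds only a short excerpt of standard t-table values (rounded to three decimals), and the names `T_TABLE` and `critical_value` are hypothetical, chosen for this illustration.

```python
# Excerpt of a t-table. Rows are degrees of freedom, columns are the
# one-tailed area to the right of the critical value.
T_TABLE = {
    4:  {0.05: 2.132, 0.025: 2.776, 0.005: 4.604},
    9:  {0.05: 1.833, 0.025: 2.262, 0.005: 3.250},
    19: {0.05: 1.729, 0.025: 2.093, 0.005: 2.861},
    24: {0.05: 1.711, 0.025: 2.064, 0.005: 2.797},
    29: {0.05: 1.699, 0.025: 2.045, 0.005: 2.756},
}

def critical_value(n, alpha, two_tailed=True):
    """Look up the critical t-value for sample size n and significance level alpha."""
    df = n - 1                                        # step 1: degrees of freedom
    tail_area = alpha / 2 if two_tailed else alpha    # step 2: area in one tail
    return T_TABLE[df][round(tail_area, 4)]           # step 3: row meets column

print(critical_value(20, 0.05))                    # two-tailed, 19 degrees of freedom
print(critical_value(20, 0.05, two_tailed=False))  # one-tailed, 19 degrees of freedom
```

For a two-tailed test the significance level is split between the two tails, which is why the lookup uses alpha / 2 in that case.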

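Sometimes the value read from the table is simply called the t-value. Do not confuse it with the p-value: the t-value is a point on the horizontal axis of the distribution, while the p-value is a tail area (a probability).

The t-distribution also has a critical region (rejection region). If the test statistic falls inside the critical region, or equivalently the p-value is smaller than the significance level, we reject the null hypothesis.

If not, we fail to reject the null hypothesis. This does not mean that the null hypothesis is true. It just means that we do not have enough evidence, or a large enough sample, to reject it.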

The t-distribution and hypothesis tests

In general, the hypothesis testing procedures for the t-distribution and the Z-distribution are almost the same. The difference is that the t-distribution is used when the sample has fewer than 30 observations and the population standard deviation is unknown.

Follow these steps below to do a t-distribution hypothesis test:

1. Define the null hypothesis and alternative hypothesis

2. Define the significance level. You can use 1 percent, 5 percent, or another value. Look up the critical value in the Student’s t-table and define the rejection condition.

3. Compute the test statistic using the t-test formula.

4. Find the p-value and determine whether the test statistic falls within the rejection region.

5. Make a decision: either reject the null hypothesis or fail to reject it. Then draw the conclusion.
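The steps above can be sketched as one small Python function. This is only an illustration under some assumptions: the test is two-tailed, the critical value is looked up by hand in a t-table and passed in as `t_crit`, and the measurements are invented. The function name `one_sample_t_test` is my own.

```python
import math
import statistics

def one_sample_t_test(sample, mu0, t_crit):
    """Two-tailed one-sample t-test against H0: mu = mu0.

    t_crit is the two-tailed critical value taken from a t-table
    for len(sample) - 1 degrees of freedom.
    """
    n = len(sample)
    mean = statistics.mean(sample)
    s = statistics.stdev(sample)                 # sample standard deviation (n - 1)
    t_stat = (mean - mu0) / (s / math.sqrt(n))   # step 3: test statistic
    reject = abs(t_stat) > t_crit                # steps 4-5: rejection region check
    return {"n": n, "df": n - 1, "t": t_stat, "reject_h0": reject}

# Hypothetical data: 10 measurements, H0: mu = 50, significance level 5 %.
# Two-tailed critical value for 9 degrees of freedom from a t-table: 2.262.
measurements = [52.1, 48.3, 53.7, 51.2, 49.8, 54.0, 50.9, 52.6, 51.5, 53.1]
print(one_sample_t_test(measurements, mu0=50.0, t_crit=2.262))
```

Passing the critical value in, instead of computing it, keeps the sketch close to the table-based procedure described in the steps.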

The student’s t-distribution example

There are 3 types of tests that are usually carried out with the t-distribution:

1. One sample t-test

2. Paired sample t-test

3. Independent two-sample t-test

In this article, we would like to discuss the one sample t-test!

There is an opinion that the average IQ needed to continue studying at a top university is 140. A student is interested in researching this claim.

A survey of 25 students gives an average IQ of 135, with a sample standard deviation of 5.

Carry out the hypothesis test at a significance level of 5 percent!

Answer:

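1. Define the null hypothesis and alternative hypothesis

$$H_0: \mu = 140 \qquad H_1: \mu \neq 140$$

2. Significance level (alpha) = 5 %

With n = 25 there are 24 degrees of freedom, so the two-tailed critical value is t(0.025, 24) = 2.0639. We reject the null hypothesis if the test statistic is below -2.0639 or above 2.0639.

3. Find the test statistic by using the t-test formula.

$$t = \frac{\bar{x} - \mu}{s / \sqrt{n}} = \frac{135 - 140}{5 / \sqrt{25}} = -5$$

4. The t-value is -5. The corresponding two-tailed p-value is far below 0.001.

5. Make the decision!

Because -5 is below -2.0639 (and the p-value is below 0.05), the test statistic falls in the rejection region, so we reject the null hypothesis.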

This means the data do not support the opinion that the average IQ needed to continue studying at a top university is 140. Because the test statistic falls in the lower rejection region, the sample suggests that the average is below 140, so students with an IQ under 140 also have the opportunity to study at a top university.
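If you are curious how small the p-value for this example really is, the sketch below estimates it by integrating the t density numerically with Simpson's rule. This is just one simple way to get the tail area using the standard library, not how statistical packages necessarily do it, and the function names are my own.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    log_norm = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
                - 0.5 * math.log(df * math.pi))
    return math.exp(log_norm - (df + 1) / 2 * math.log1p(x * x / df))

def two_tailed_p_value(t_stat, df, steps=2000):
    """Area in both tails beyond |t_stat|, via Simpson's rule on [0, |t_stat|]."""
    a = abs(t_stat)
    h = a / steps  # steps must be even for Simpson's rule
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    central_area = 2 * (h / 3) * total  # the density is symmetric around 0
    return 1.0 - central_area

# Summary statistics from the IQ example: mean 135, claim 140, s = 5, n = 25.
t_stat = (135 - 140) / (5 / math.sqrt(25))
p = two_tailed_p_value(t_stat, df=24)
print(f"t = {t_stat:.1f}, two-tailed p-value = {p:.6f}")
```

The same function can be checked against the table: a t-value equal to the 5 % two-tailed critical value for 24 degrees of freedom should give a p-value of about 0.05.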

Confidence interval in Student’s t-distribution

Can we make a confidence interval with Student’s t-distribution?

The answer is YES. The t-distribution can also be used to create confidence intervals, just like the Z-distribution. If the number of samples is small and the standard deviation of the population is unknown, we can use values from the t-table to calculate confidence intervals.

For example, let’s use the example above. As we know, the sample mean of the students’ IQ is 135. Let’s build a confidence interval with a confidence level of 95 percent.

Now, let’s try to solve it!

$$\bar{x} \pm t_{(0.025,\,24)} \cdot \frac{s}{\sqrt{n}} = 135 \pm 2.0639 \cdot \frac{5}{\sqrt{25}} = 135 \pm 2.0639$$

With a confidence level of 95 percent, it can be concluded that the average IQ of students studying at top universities is between 132.94 and 137.06.
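The same calculation can be written as a few lines of Python. This is a minimal sketch that takes the summary statistics from the IQ example and the two-tailed critical value for 24 degrees of freedom (2.0639) from a t-table. The function name `t_confidence_interval` is hypothetical.

```python
import math

def t_confidence_interval(mean, sd, n, t_crit):
    """Confidence interval: mean +/- t_crit * sd / sqrt(n)."""
    margin = t_crit * sd / math.sqrt(n)  # margin of error
    return mean - margin, mean + margin

# Sample mean 135, sample standard deviation 5, n = 25, 95 % confidence.
lower, upper = t_confidence_interval(mean=135, sd=5, n=25, t_crit=2.0639)
print(f"95 % confidence interval: {lower:.2f} to {upper:.2f}")
```

For another confidence level or sample size, only `t_crit` has to change, using the matching row and column of the t-table.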

Summary

Student’s t-distribution is a probability distribution used for testing small samples when the population standard deviation is unknown.

With the t distribution, researchers can still do hypothesis testing and determine confidence intervals.

There are 3 general cases for the t-test:

1. One sample t-test

2. Paired sample t-test

3. Independent two-sample t-test

Software such as SPSS is very helpful for doing the calculations, so use it!

The t-distribution is an interesting discovery that has helped many researchers. But, in my opinion, don’t depend too much on this method. In theory, the larger the sample, the better the resulting estimates and test statistics.

Use the t-distribution when circumstances make it impossible to collect a large sample.
