A confidence interval is a range of values used to estimate a population parameter from a sample using statistical methods. The interval is expected to contain the true, unknown value of the parameter.
Of course, results vary from sample to sample, and we need to measure that variability to make a better estimate. The measure of this variability is usually called the margin of error.
This value is very important because it determines the width of the confidence interval. Taking the sample statistic plus or minus the margin of error produces a range of values that is likely to contain the parameter. That range is the confidence interval.
For example, a political consulting institute conducted a survey to find out the electability level of presidential candidates (parameter) who will fight for the presidency in general elections using a random sample of 100,000 respondents (statistical method).
About 59 percent of respondents chose candidate “A”, plus or minus 5 percent. It means the percentage of people who would vote for candidate “A” is estimated to be somewhere between 54 percent and 64 percent.
Confidence intervals are an inseparable part of inferential statistics. Take a deep breath for this whole explanation.
- Why Confidence Interval is Very Important?
- Confidence Interval for Population Mean and Proportion
- Confidence Interval for The Difference of Two Means
- Confidence Interval for The Difference of Two Proportions
- How to interpret the Confidence Interval Properly?
Why Confidence Interval is Very Important?
The margin of error is a form of tolerance for the variation that occurs in statistics. Basically, samples are taken randomly to produce a value that can represent the parameter. But the measurement will probably be different if you use another randomly drawn sample, and that leads to different calculations.
Therefore, confidence intervals are a way to state that the statistics we produce are accurate within a certain range (which is the margin of error) and can describe the population properly.
The goal when using a confidence interval is to have a margin of error that is as small as possible. The narrower the interval, the more precise the estimate is.
There are three factors that affect the margin of error: the confidence level, the sample size, and the variability in the population.
1. Confidence Level
The confidence level states how confident we are that the interval contains the parameter. The most common confidence levels for statistical measurement are 90%, 95%, and 99%. You can also use any other percentage you want, but these are what statisticians recommend the most.
When calculating means and proportions, the confidence interval is obtained by subtracting and adding a margin of error to the sample statistic. The margin of error is the standard error multiplied by a standard normal distribution value (denoted by z*), and z* is determined by the confidence level. The higher the confidence level, the larger z* is, and so the wider the margin of error.
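To see how the confidence level turns into a z* value, here is a minimal Python sketch. It assumes a two-sided interval based on the standard normal distribution; the function name `z_star` is only illustrative.

```python
from statistics import NormalDist


def z_star(confidence_level: float) -> float:
    """Two-sided critical value z* of the standard normal distribution."""
    # Split the leftover probability (1 - confidence level) evenly between both tails.
    tail = (1 - confidence_level) / 2
    return NormalDist().inv_cdf(1 - tail)


for level in (0.90, 0.95, 0.99):
    print(f"{level:.0%} confidence level -> z* = {z_star(level):.3f}")
# 90% confidence level -> z* = 1.645
# 95% confidence level -> z* = 1.960
# 99% confidence level -> z* = 2.576
```

The higher the confidence level, the larger z* becomes, which widens the interval.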
2. Sample Size and The Factor of Confidence Interval
The sample size and the margin of error have an inverse relationship. If the sample size increases, the margin of error decreases. In other words, the bigger the sample we use, the more precise the results are going to be.
To choose an efficient and proper sample size, you need to decide on a target margin of error before you start your research. For estimating a mean, you can use this formula to get the sample size: n = (z* × σ / E)², where σ is the population standard deviation and E is the margin of error you want.
Do not forget to round up your result to get a whole number of sample units. No matter what the decimal is, always round it up. For example, if the formula gives 200.04, then the sample you need is 201 units.
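As a rough sketch of this step, the snippet below uses the common sample-size formula for estimating a mean, n = (z* × σ / E)², and always rounds up. The standard deviation of 500 and the target margin of error of 50 are made-up values for illustration, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def required_sample_size(sigma, margin_of_error, confidence_level=0.95):
    """Smallest whole sample size that reaches the target margin of error."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    raw_size = (z * sigma / margin_of_error) ** 2
    # Always round up, whatever the decimal part is.
    return math.ceil(raw_size)


# Hypothetical inputs: sigma = 500, desired margin of error = 50.
print(required_sample_size(sigma=500, margin_of_error=50))  # 385
# The rounding rule on its own:
print(math.ceil(200.04))  # 201
```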
When the statistic is a sample proportion or percentage, there is a quicker way to figure out the margin of error. Take 1 and divide it by the square root of n (the sample size). I know it’s kind of quick and tricky. But this is allowed as a rough estimate at about the 95 percent confidence level.
Example: There are 100 million young people in the United States. If we’d like to have a narrow confidence interval, how large a sample do we need?
Answer: If you take a sample of 200, then you’ll find: 1 / √200 ≈ 0.07.
It means the margin of error is 0.07, or 7 percent. We have to add and subtract 0.07 from the statistic, which is quite large as an error. Now, try a sample of 2,000: 1 / √2000 ≈ 0.02.
It means the margin of error is 0.02, or just about 2 percent. We have to add and subtract 0.02 from the statistic. I think that gives better precision, considering how huge the population is.
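The quick rule can be checked in a few lines of Python. This is only a sketch of the 1 / √n shortcut, which roughly matches a 95 percent margin of error for a proportion near 0.5 (since 1.96 × 0.5 ≈ 1); the function name is illustrative.

```python
import math


def quick_margin_of_error(n: int) -> float:
    """Rule-of-thumb margin of error for a proportion: 1 / sqrt(n)."""
    return 1 / math.sqrt(n)


for n in (200, 2000):
    print(f"n = {n}: margin of error is about {quick_margin_of_error(n):.2f}")
# n = 200: margin of error is about 0.07
# n = 2000: margin of error is about 0.02
```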
Of course, increasing the sample size will have a positive effect on your estimate. But do not forget that a large sample means high cost, a lot of time, and more effort to prepare. Choose the sample size and confidence level that suit your situation so you will not be exhausted by it.
One more thing: there is something called nonsampling error. It means that a larger sample does not automatically remove bias, because the more data you collect, the more chances there are for conscious or unconscious errors in the data collection process. Choose wisely. I think a margin of error of 10 percent is already good enough. What do you think?
Variability (standard deviation) also plays a part, alongside the sample size. Take a look at the formula below: standard error = σ / √n, where σ is the standard deviation and n is the sample size.
Based on the formula, variability has an influence on the standard error. Increasing variability increases the standard error. But this increase can be offset by increasing the sample size.
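A small sketch of that trade-off, assuming the usual standard error formula σ / √n. The numbers are made up purely for illustration.

```python
import math


def standard_error(sigma: float, n: int) -> float:
    """Standard error of the sample mean: sigma / sqrt(n)."""
    return sigma / math.sqrt(n)


print(standard_error(10, 100))  # 1.0
# Doubling the variability doubles the standard error...
print(standard_error(20, 100))  # 2.0
# ...and quadrupling the sample size brings it back down.
print(standard_error(20, 400))  # 1.0
```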
Confidence Interval for Population Mean and Proportion
When numerical data (such as height, weight, income, or price) is being measured, people often want to estimate the average, or mean, for the population. It’s simple! Just estimate it using the sample mean plus or minus a margin of error. The result is the confidence interval for a population mean (μ).
The formula for a confidence interval (CI) for a population mean is: x̄ ± z* × σ / √n, where x̄ is the sample mean, σ is the standard deviation, and n is the sample size.
If the sample is too small (fewer than 30), we have to use the t-distribution with n-1 degrees of freedom instead of z*.
Example: Suppose we are conducting a household income survey of 10,000 respondents. One result indicator is the poverty rate. Based on our research, we found there are 3000 poor people in the city. Assume the standard deviation is 500. With a 95 percent confidence level, what is the confidence interval for the poverty rate?
Conclusion: With z* = 1.96, the margin of error is 1.96 × 500 / √10,000 = 9.8. With a confidence level of 95 percent, the poverty rate in the city is about 2990 to 3010 people.
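The arithmetic for this example can be sketched in Python, assuming the z-based interval x̄ ± z* × σ / √n with the example's figures (estimate 3000, standard deviation 500, n = 10,000). The function name is illustrative.

```python
import math
from statistics import NormalDist


def mean_confidence_interval(sample_mean, sigma, n, confidence_level=0.95):
    """z-based confidence interval for a population mean (sigma assumed known)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sigma / math.sqrt(n)
    return sample_mean - margin, sample_mean + margin


low, high = mean_confidence_interval(3000, 500, 10_000)
print(f"{low:.1f} to {high:.1f}")  # 2990.2 to 3009.8
```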
Now, let’s talk about the confidence interval for categorical data.
Examples of categorical data are opinions, preferences, habits, etc. Usually, we estimate it as the proportion of the population with a certain characteristic or criterion. For example, the percentage of people who like spicy food or not, the proportion of early morning workers, and many others.
The goal is to estimate the population proportion using a sample proportion plus or minus a margin of error.
The formula for a confidence interval for a population proportion (p) is: p̂ ± z* × √(p̂(1 - p̂) / n), where p̂ is the sample proportion and n is the sample size.
Example: Suppose we want to estimate the percentage of customers who like spicy noodles. Let’s say we want a 95 percent confidence interval.
We have a random sample of 100 respondents from a customer satisfaction survey. There are 44 customers who like spicy noodles. Find the confidence interval for the proportion of customers who like spicy noodles!
Conclusion: The sample proportion is 44 / 100 = 0.44 and the margin of error is 1.96 × √(0.44 × 0.56 / 100) ≈ 0.10. With a confidence level of 95 percent, the percentage of customers who like spicy noodles is about 34 percent to 54 percent.
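Here is a minimal sketch of the spicy noodle calculation, assuming the usual normal-approximation interval p̂ ± z* × √(p̂(1 - p̂) / n). The function name is illustrative.

```python
import math
from statistics import NormalDist


def proportion_confidence_interval(successes, n, confidence_level=0.95):
    """Normal-approximation confidence interval for a population proportion."""
    p_hat = successes / n
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * math.sqrt(p_hat * (1 - p_hat) / n)
    return p_hat - margin, p_hat + margin


# 44 of 100 surveyed customers like spicy noodles.
low, high = proportion_confidence_interval(44, 100)
print(f"{low:.3f} to {high:.3f}")  # 0.343 to 0.537
```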
Confidence Interval for The Difference of Two Means
Sometimes, the purpose of the survey or research is to compare two different populations. We can compare males and females, businessmen and employees, and others.
If we are comparing numerical data from two populations, we compare their means. For example, we may compare the difference in mean height between rich kids and poor kids, the difference in average income between males and females, etc.
We estimate the difference between two population means by taking a sample from each population, computing the difference between the two sample means, and then subtracting and adding a margin of error. The result is a confidence interval for the difference between two population means.
The formula is: (x̄ - ȳ) ± z* × √(σx² / nx + σy² / ny), where x̄ and ȳ are the two sample means, σx and σy are the standard deviations, and nx and ny are the sample sizes.
Suppose we compare the weight of children who drink milk regularly with children who never drink milk. Based on a sample of 100, children who drink milk regularly have an average body weight of 52 kg with a population standard deviation of 4 kg.
Meanwhile, based on 120 children who never drink milk, the average body weight is 47 kg with a standard deviation of 3 kg. With a confidence level of 95 percent, what is the confidence interval for the difference in weight between children who drink milk regularly and children who never drink milk?
x = children who drink milk
y = children who never drink milk
Conclusion: The difference between the sample means is 52 - 47 = 5 kg and the margin of error is 1.96 × √(4² / 100 + 3² / 120) ≈ 0.95 kg. With a confidence level of 95 percent, the difference in weight between children who drink milk regularly and children who never drink milk is about 4.05 kg to 5.95 kg.
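The milk example can be worked through with a short Python sketch. It assumes two independent samples with known standard deviations, so the z-based interval (x̄ - ȳ) ± z* × √(σx² / nx + σy² / ny) applies; the function name is illustrative.

```python
import math
from statistics import NormalDist


def mean_difference_interval(mean_x, sd_x, n_x, mean_y, sd_y, n_y,
                             confidence_level=0.95):
    """z-based confidence interval for the difference of two population means."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    # Standard error of the difference between two independent sample means.
    standard_error = math.sqrt(sd_x ** 2 / n_x + sd_y ** 2 / n_y)
    difference = mean_x - mean_y
    margin = z * standard_error
    return difference - margin, difference + margin


# x = children who drink milk, y = children who never drink milk.
low, high = mean_difference_interval(52, 4, 100, 47, 3, 120)
print(f"{low:.2f} kg to {high:.2f} kg")  # 4.05 kg to 5.95 kg
```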
Confidence Interval for The Difference of Two Proportions
In this case, we are comparing a categorical variable across two populations (such as comparing males to females regarding their opinion about the presidential election) and estimating the difference between two population proportions.
We do this by taking the difference between the sample proportions from each population, plus or minus a margin of error. The result is the confidence interval for the difference between two population proportions.
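The formula is: (p̂₁ - p̂₂) ± z* × √(p̂₁(1 - p̂₁) / n₁ + p̂₂(1 - p̂₂) / n₂), where p̂₁ and p̂₂ are the two sample proportions and n₁ and n₂ are the sample sizes.
Example: Suppose that in a sample of 120 women, a proportion of 0.4 like math. Also, in a sample of 150 men, a proportion of 0.3 like math. With a 95 percent confidence level, find the confidence interval for the difference between the two proportions!
Conclusion: The difference between the sample proportions is 0.4 - 0.3 = 0.1 and the margin of error is 1.96 × √(0.4 × 0.6 / 120 + 0.3 × 0.7 / 150) ≈ 0.114. With a confidence level of 95 percent, the difference between the proportion of women who like math and the proportion of men who like math is about -1.4 percent to 21.4 percent. Because this interval includes zero, the sample does not rule out that there is no difference between the two groups.
Note: You can finish the calculations using the decimal form. But you can convert the result to percentage form to make it easier to understand.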
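For the math example, a minimal sketch in decimal form follows. It assumes the usual normal-approximation interval for the difference of two independent sample proportions (0.4 from 120 women, 0.3 from 150 men); the function name is illustrative.

```python
import math
from statistics import NormalDist


def proportion_difference_interval(p1, n1, p2, n2, confidence_level=0.95):
    """Normal-approximation interval for the difference of two proportions."""
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    standard_error = math.sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
    difference = p1 - p2
    margin = z * standard_error
    return difference - margin, difference + margin


low, high = proportion_difference_interval(0.4, 120, 0.3, 150)
print(f"{low:.3f} to {high:.3f}")  # -0.014 to 0.214
```

Multiplying both ends by 100 gives the percentage form, roughly -1.4 percent to 21.4 percent.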
How to interpret the Confidence Interval Properly?
The main purpose of a confidence interval is to present a range of values that is likely to contain the population parameter. Why? Because no one really knows exactly what the population parameter is.
If we use different samples from the same population, we might get different estimates of the parameter.
For example, suppose we are measuring the mean height of students in a school.
With a sample of 100, there is no guarantee that every random sample you take will represent the population.
Maybe you pick too many short students, or perhaps too many tall ones. Random and independent sampling makes everything possible, and that affects the statistical results.
Sometimes it will produce an estimate that is far from the truth. In that case, the interval built from the statistic does not contain the parameter value of the population.
This is where the confidence level comes in. If we are using a 95 percent confidence level, about 95 percent of the intervals built this way contain the parameter, and only about 5 percent of them miss it because the resulting statistic is too far from the parameter.
Or, you can imagine the confidence interval like this story. Consider that you are randomly taking samples of 100 students’ scores over and over again.
Then, you make a confidence interval from the results each time. You can be pretty sure that about 95 percent of those intervals would be right. And you just hope that the one interval you have made is one of them, and that it contains the population parameter.
So, it would be wise to always state every conclusion you make like this: “With a 95 percent (or whatever level you choose) confidence level, I can say that xxx of xxx” (fill it in with your own conclusion).
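The repeated-sampling story can be simulated. The sketch below is only an illustration under assumed values: student scores are drawn from a normal population with a true mean of 70 and a standard deviation of 10 (both made up), and a 95 percent z-interval is built from each sample of 100.

```python
import math
import random
from statistics import NormalDist, mean


def coverage_rate(true_mean=70.0, sigma=10.0, n=100, repeats=1000,
                  confidence_level=0.95, seed=42):
    """Share of simulated confidence intervals that contain the true mean."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    z = NormalDist().inv_cdf(1 - (1 - confidence_level) / 2)
    margin = z * sigma / math.sqrt(n)
    hits = 0
    for _ in range(repeats):
        # Draw a fresh random sample and build an interval around its mean.
        sample = [rng.gauss(true_mean, sigma) for _ in range(n)]
        sample_mean = mean(sample)
        if sample_mean - margin <= true_mean <= sample_mean + margin:
            hits += 1
    return hits / repeats


print(f"Share of intervals containing the true mean: {coverage_rate():.1%}")
```

The printed share should land close to 95 percent, but any single interval either contains the true mean or it does not.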
Making a confidence interval is one of the most important statistical processes. It reflects how we conduct a survey and disseminate the results to the people. The main factor that affects the confidence interval is sample selection.
Choosing the right sample (random and independent, of course) and the right sample size makes it more likely that your interval contains the parameter value. So, be careful with it!