What is hypothesis testing in data science?

Hypothesis testing is a statistical technique used to evaluate claims about a population based on sample data. In data science, it is an essential tool for making inferences about a population from a representative sample. In this blog, we will discuss the key aspects of hypothesis testing, including the null and alternative hypotheses, the significance level, Type I and Type II errors, the p-value, the region of acceptance, and the typical steps involved in the p-value approach.

What is hypothesis testing in data science, AI, ML, or statistics?

Hypothesis testing is a statistical technique used to test whether a claim or hypothesis about a population is true. Imagine you have a hypothesis that eating breakfast helps students perform better in school. To test it, you would collect data from a sample of students and compare the performance of those who ate breakfast with those who didn’t. You would then compare the results to what would be expected by chance under the null hypothesis, which here states that there is no difference in performance between the two groups. If the results show a statistically significant difference, you reject the null hypothesis in favor of the alternative hypothesis, which in this case is that eating breakfast does help students perform better in school. Hypothesis testing is important in many fields, including science and medicine, because it helps us make decisions based on empirical evidence.

More formally, hypothesis testing involves formulating a null hypothesis and an alternative hypothesis, selecting an appropriate test statistic, choosing a level of significance, calculating the p-value, and deciding whether to reject or fail to reject the null hypothesis based on that p-value. It is widely used in fields such as the social sciences, medicine, engineering, and economics to evaluate research questions and determine the statistical significance of results, and it is a crucial component of data analysis for making sound decisions based on empirical evidence.
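
As a concrete illustration of this workflow, here is a minimal Python sketch of the breakfast example using scipy’s two-sample t-test; the exam scores are made-up numbers, not real study data.

```python
# A minimal sketch of the breakfast example as a two-sample t-test.
# The exam scores are made-up illustrative data, not a real study.
from scipy import stats

breakfast    = [72, 85, 78, 90, 80, 88, 76, 84]   # hypothetical scores
no_breakfast = [65, 70, 74, 68, 79, 72, 66, 75]   # hypothetical scores

# H0: no difference in mean scores; Ha: the means differ.
t_stat, p_value = stats.ttest_ind(breakfast, no_breakfast)

print(f"t = {t_stat:.2f}, p-value = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: the samples show a statistically significant difference.")
else:
    print("Fail to reject H0: insufficient evidence of a difference.")
```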

What are the null hypothesis and alternative hypothesis with examples?

Hypothesis testing involves comparing two statements, the null hypothesis and the alternative hypothesis. The null hypothesis is the starting assumption we make about a population, which suggests there is no difference or relationship between two or more variables being tested. For instance, if we want to test whether there is a difference in the average height between boys and girls, the null hypothesis would state that there is no difference in height between the two groups.

On the other hand, the alternative hypothesis is a statement that assumes there is a difference or relationship between the variables being tested. In this example, the alternative hypothesis would propose that there is a difference in height between boys and girls.

To evaluate the hypotheses, we collect data from a sample and compare it to what would be expected by chance if the null hypothesis were correct. If the results reveal a statistically significant difference, we reject the null hypothesis in favor of the alternative. However, if there is insufficient evidence to dismiss the null hypothesis, we fail to reject it. To sum up, the null hypothesis is the starting assumption we test, and the alternative hypothesis is the statement we attempt to support.

The null hypothesis is a statement of no effect or no difference between two groups. This statement is usually the starting point in hypothesis testing, and we assume it to be true. For example, if we want to test the efficacy of a new drug, the null hypothesis would state that the drug has no effect.

The alternative hypothesis is the opposite of the null hypothesis. It is the statement we want to test, and it suggests that there is an effect or difference between two groups. For example, if we want to test the efficacy of a new drug, the alternative hypothesis would suggest that the drug is effective.

Here are some more examples to help illustrate the concept of null and alternative hypotheses:

Example 1: Null hypothesis: There is no difference in exam scores between students who study alone versus students who study in groups. Alternative hypothesis: Students who study in groups score higher on exams than students who study alone.

Example 2: Null hypothesis: There is no difference in sales between two different advertising campaigns. Alternative hypothesis: One advertising campaign leads to more sales than the other.

Example 3: Null hypothesis: There is no relationship between the amount of sleep a person gets and their cognitive performance. Alternative hypothesis: The amount of sleep a person gets is positively correlated with their cognitive performance.
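
To make Example 3 concrete, here is a minimal Python sketch using scipy’s Pearson correlation test; the hours-of-sleep and score values are made-up illustrative numbers, not real study data.

```python
# A minimal sketch of Example 3 using a Pearson correlation test.
# The data below are made-up illustrative numbers, not a real study.
from scipy import stats

hours_of_sleep  = [5, 6, 6, 7, 7, 8, 8, 9]          # hypothetical sample
cognitive_score = [62, 66, 70, 71, 75, 78, 80, 83]  # hypothetical sample

# H0: no linear relationship between sleep and cognitive performance.
# Ha: the two are correlated (pearsonr's default test is two-sided;
# for the one-sided "positively correlated" claim with r > 0, the
# one-sided p-value would be half the two-sided value).
r, p_value = stats.pearsonr(hours_of_sleep, cognitive_score)

print(f"r = {r:.2f}, p-value = {p_value:.4f}")
if p_value < 0.05:
    print("Reject H0: the data suggest a relationship.")
else:
    print("Fail to reject H0: insufficient evidence of a relationship.")
```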

What is the significance level in hypothesis testing with examples?

In hypothesis testing, the significance level is a number that we pick before we do a test. It sets how strong the evidence has to be before we call a result statistically significant. It’s like saying, “I’m only going to claim something is true if the evidence is really strong.”

For example, if we’re trying to find out if girls are taller than boys on average, we might set a significance level of 0.05. This means we will only claim girls are taller if a result like ours would happen less than 5% of the time when there is actually no height difference. If our result could easily arise by chance, we won’t claim that girls are taller.

The significance level is like a safety net to make sure we’re not jumping to conclusions without enough evidence. It helps us be more careful and confident about what we say is true based on the data we have.

In hypothesis testing, the significance level is the probability threshold that we set for rejecting the null hypothesis. The significance level, also known as alpha (α), is usually chosen before the test is conducted and is typically set at 0.05, meaning that we are willing to accept a 5% chance of making a Type I error, that is, of rejecting the null hypothesis when it is actually true.

For example, let’s say we’re testing the hypothesis that boys and girls have different average heights. We would start by setting a significance level of 0.05. If the results of our study indicate that there is a statistically significant difference in height between boys and girls, with a p-value less than 0.05, we would reject the null hypothesis and conclude that there is a significant difference in height between the two groups. However, if the p-value is greater than 0.05, we would fail to reject the null hypothesis and conclude that there is insufficient evidence to support the claim that there is a difference in height between boys and girls.

In summary, the significance level is the probability threshold we set for rejecting the null hypothesis, and it helps us determine how confident we are in our conclusions based on the data we have collected. It is a critical component of hypothesis testing and helps us make informed decisions based on empirical evidence.

In short, the significance level is the probability of making a Type I error, that is, of rejecting the null hypothesis when it is actually true. It is denoted by alpha (α) and is typically set to 0.05 or 0.01.
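
As a quick illustration of how the choice of significance level changes the decision, here is a minimal sketch; the p-value of 0.03 is a made-up number.

```python
# A minimal sketch of the decision rule at two common significance levels.
# The p-value here (0.03) is a made-up number for illustration.
p_value = 0.03

for alpha in (0.05, 0.01):
    if p_value < alpha:
        print(f"alpha = {alpha}: reject the null hypothesis")
    else:
        print(f"alpha = {alpha}: fail to reject the null hypothesis")
```

Notice that the same evidence rejects the null hypothesis at α = 0.05 but not at the stricter α = 0.01.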

What are Type I and Type II Errors in hypothesis testing with examples?

When we do a test to see if something is true or not, sometimes we can make mistakes. There are two kinds of mistakes we can make:

The first mistake is called a Type I error. This happens when we say something is true, but it’s really not true. It’s like thinking you found a diamond in the sand, but it’s just a piece of glass.

The second mistake is called a Type II error. This happens when we say something is not true, but it’s actually true. It’s like thinking there are no more cookies in the cookie jar, but there are still some left.

We use some special words to describe these mistakes. Type I errors are also called false positives, and Type II errors are also called false negatives.

To make things even more confusing, we use a letter called beta (β) to talk about the chance of making a Type II error. But don’t worry too much about that for now. Just remember that sometimes we can make mistakes when we do tests, and we have special names for those mistakes.

More formally, a Type I error occurs when we reject the null hypothesis when it is actually true; it is also known as a false positive. A Type II error occurs when we fail to reject the null hypothesis when it is actually false; it is also known as a false negative. The probability of making a Type II error is denoted by beta (β).

We also have something called power. Power is the flip side of a false negative: it’s the chance that the test detects an effect when the effect is really there. In symbols, power = 1 − β.

Another thing we think about is the sample size. That means how many things we’re looking at to figure out if something is true or not.

And finally, we think about the effect size. That’s how big the difference is between the two things we’re comparing. The bigger the real difference, the easier it is for the test to detect it.

So basically, when we do hypothesis testing, we try to figure out if something is true or not. But sometimes we make mistakes. We also think about how many things we’re looking at, how big the difference is between them, and how likely we are to figure out if something is true if it really is true.
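
To see these ideas in action, here is a minimal simulation sketch; the sample size (30 per group), effect size (0.5 standard deviations), and significance level (0.05) are all illustrative assumptions. With no real difference between the groups, the rejection rate estimates the Type I error rate; with a real difference, it estimates the power.

```python
# A minimal simulation sketch: estimate the Type I error rate and the
# power of a two-sample t-test. The settings below (n = 30 per group,
# effect size 0.5, alpha = 0.05) are illustrative assumptions.
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
alpha, n, n_sims = 0.05, 30, 5000

def rejection_rate(true_diff):
    """Fraction of simulated experiments in which H0 is rejected."""
    rejections = 0
    for _ in range(n_sims):
        a = rng.normal(0.0, 1.0, n)          # group A: mean 0
        b = rng.normal(true_diff, 1.0, n)    # group B: mean true_diff
        _, p = stats.ttest_ind(a, b)
        if p < alpha:
            rejections += 1
    return rejections / n_sims

# With no real difference, every rejection is a Type I error: expect ~5%.
print("Estimated Type I error rate:", rejection_rate(true_diff=0.0))
# With a real 0.5-SD difference, the rejection rate estimates the power.
print("Estimated power:", rejection_rate(true_diff=0.5))
```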

What is the p-value in hypothesis testing with examples?

When we do a test to see if something is true or not, we get a number called the p-value. The p-value tells us how likely it would be to get a result like ours just by chance, if there were really no effect.

Here’s an example: let’s say we’re trying to find out if cats are smarter than dogs. We do a test and get a p-value of 0.03. This means that if cats and dogs were actually equally smart, there would be only a 3% chance of seeing a difference as big as the one we found.

We also have something called a significance level, which is like a safety net to make sure we’re not jumping to conclusions without enough evidence. It’s like saying, “I’m only going to say something is true if I’m really sure.”

If the p-value is less than the significance level, the result is statistically significant, and we reject the idea that cats and dogs are equally smart. But if the p-value is greater than the significance level, we don’t have enough evidence to say that cats are smarter than dogs.

So the p-value is a number that helps us decide if our test is giving us good evidence or if it’s just a fluke.

Formally, the p-value is the probability of obtaining a test statistic at least as extreme as the observed one, assuming that the null hypothesis is true. If the p-value is less than the significance level, we reject the null hypothesis. If the p-value is greater than the significance level, we fail to reject the null hypothesis.
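
As a small illustration, here is a sketch of how a two-sided p-value can be computed from a z test statistic; the observed value z = 2.17 is a made-up number, chosen so the p-value comes out near 0.03, matching the cats-and-dogs example.

```python
# A minimal sketch: converting a z test statistic into a two-sided
# p-value. The observed statistic z = 2.17 is a made-up example.
from scipy import stats

z = 2.17                              # hypothetical observed test statistic
p_value = 2 * stats.norm.sf(abs(z))   # two-sided tail probability

print(f"p-value = {p_value:.4f}")     # ~0.0300
if p_value < 0.05:
    print("Reject the null hypothesis at alpha = 0.05")
```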

What is the Region of Acceptance in hypothesis testing with examples?

When we do a test to see if something is true or not, we use a number called the test statistic. This number tells us how different our result is from what we would expect if the null hypothesis were true.

Sometimes, we get a test statistic that is in a certain range, and we say that it’s not different enough from what we would expect to be sure that our result is true. This range is called the region of acceptance.

For example, let’s say we’re trying to find out if a new medicine helps people sleep better. We run a z-test at a significance level of 0.05 and get a test statistic of 1.2. For this two-sided test, the region of acceptance runs from -1.96 to 1.96. Our test statistic of 1.2 falls inside this range, so we can’t be sure that the medicine really works.

The region of acceptance is like a zone of uncertainty. It’s saying, “We’re not sure if this result is really different from what we would expect.”

The complement of the region of acceptance is called the critical region, or region of rejection. This is the range of values so different from what the null hypothesis predicts that we reject the null hypothesis.

So the region of acceptance is just the opposite of the critical region. If our test statistic falls in the region of acceptance, we fail to reject the null hypothesis. If it falls in the critical region, we reject the null hypothesis and treat the result as statistically significant.

The region of acceptance is the range of values of the test statistic that leads to failing to reject the null hypothesis. It is the complement of the critical region.
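
Here is a minimal sketch of computing the region of acceptance for a two-sided z-test at α = 0.05 and checking the sleep-medicine test statistic against it; the value z = 1.2 comes from the example above.

```python
# A minimal sketch: computing the region of acceptance for a two-sided
# z-test at alpha = 0.05, then checking the test statistic from the
# sleep-medicine example (z = 1.2).
from scipy import stats

alpha = 0.05
lower = stats.norm.ppf(alpha / 2)        # about -1.96
upper = stats.norm.ppf(1 - alpha / 2)    # about +1.96
z = 1.2                                  # observed test statistic

print(f"region of acceptance: [{lower:.2f}, {upper:.2f}]")
if lower <= z <= upper:
    print("z falls in the region of acceptance: fail to reject H0")
else:
    print("z falls in the critical region: reject H0")
```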

Typical Steps Involved in the P-Value Approach

What are the 5 steps in hypothesis testing?

When we do a hypothesis test using the p-value approach, there are some typical steps we follow to decide whether the data support rejecting the null hypothesis. Here are the steps we usually follow:

  1. We start by stating what we assume is true (the null hypothesis) and what we suspect might be true instead (the alternative hypothesis).
  2. Then we choose an appropriate test and compute a number called the test statistic from our data. This number measures how far our result is from what we would expect if the null hypothesis were true.
  3. Next, we calculate the p-value. This tells us how likely it would be to get a test statistic at least as extreme as ours if the null hypothesis were true.
  4. After that, we compare the p-value to the significance level we chose. If the p-value is smaller than the significance level, our result would be very unlikely under the null hypothesis, so we reject the null hypothesis. If the p-value is bigger, the result could plausibly be a coincidence, so we fail to reject the null hypothesis.
  5. Finally, we state the decision we made in step 4 and explain what the result means in real-life terms.

So basically, when we do a hypothesis test using the p-value approach, we follow these steps to decide whether the data give us enough evidence to reject the null hypothesis.
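
Here is a minimal sketch that walks through all five steps for a one-sample z-test; every number in it (the claimed mean of 100, the sample mean, standard deviation, and sample size) is a made-up assumption for illustration.

```python
# A minimal sketch of the five steps using a one-sample z-test computed
# by hand. All numbers (claimed mean, sample mean, sd, n) are made up.
from math import sqrt
from scipy import stats

# Step 1: state the hypotheses.
# H0: the population mean is 100.  Ha: the population mean is not 100.
mu0, sample_mean, sample_sd, n = 100, 104, 12, 50

# Step 2: compute the test statistic (z, treating n = 50 as large enough).
z = (sample_mean - mu0) / (sample_sd / sqrt(n))

# Step 3: compute the two-sided p-value from the test statistic.
p_value = 2 * stats.norm.sf(abs(z))

# Step 4: compare the p-value with the chosen significance level.
alpha = 0.05

# Step 5: decide and interpret in plain terms.
print(f"z = {z:.2f}, p-value = {p_value:.4f}")
if p_value < alpha:
    print("Reject H0: the sample mean differs significantly from 100.")
else:
    print("Fail to reject H0: the data are consistent with a mean of 100.")
```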

Conclusion

To summarize, hypothesis testing is a critical technique in data science that helps us draw conclusions about a population based on sample data. By setting up the null and alternative hypotheses, selecting an appropriate significance level, and computing the p-value, we can make an informed decision about whether to reject or fail to reject the null hypothesis. It’s also essential to recognize the kinds of errors that can occur during hypothesis testing and to choose a suitable sample size and effect size to reduce them.
