Hypothesis testing is an inferential statistical technique for deciding whether sample data provide enough evidence that a certain condition is true for the population. A hypothesis test weighs two opposing hypotheses about a population.
Alternative hypothesis (H1)
- A statement that the researcher aims to show is true.
- It’s the research hypothesis.
- It needs statistically significant evidence from the sample to be supported.
- If the evidence for the alternative hypothesis is strong, reject the null hypothesis.
Null hypothesis (H0)
- A statement of “no effect” or “no difference”.
- It’s the logical opposite of the alternative hypothesis.
- It indicates that the alternative hypothesis is incorrect.
- Weak evidence for the alternative hypothesis means that you fail to reject the null hypothesis. This does not prove that the null hypothesis is true.
Hypothesis Testing – Error Types
The table below explains the decisions that can be made based on a hypothesis test.
Type I Error – Rejecting the null hypothesis when it is true. The probability of making a Type I error is represented by alpha.
Type II Error – Failing to reject the null hypothesis when it is false. The probability of making a Type II error is represented by beta.
Probability Value (P-value) – The probability, assuming the null hypothesis is true, of observing a value at least as extreme as the one actually observed. It is calculated from the collected data.
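To make alpha and beta concrete, here is a minimal sketch that computes beta for one particular setup: a one-sided (upper-tail) z-test with a known population standard deviation. The function name `type_ii_error` and all the numbers are illustrative assumptions, not part of the original text.

```python
from statistics import NormalDist

def type_ii_error(mu0, mu1, sigma, n, alpha=0.05):
    """Beta for an upper-tail z-test of H0: mu = mu0 when the true mean is mu1.

    Assumes a normally distributed sample mean and a known sigma.
    """
    std_normal = NormalDist()
    # H0 is rejected when the z statistic exceeds this critical value.
    z_crit = std_normal.inv_cdf(1 - alpha)
    # How far the true mean shifts the z statistic away from zero.
    shift = (mu1 - mu0) / (sigma / n ** 0.5)
    # Probability that z stays below the critical value, so H0 is not rejected.
    return std_normal.cdf(z_crit - shift)

# Hypothetical numbers chosen only for illustration.
beta = type_ii_error(mu0=100, mu1=103, sigma=10, n=50)
print(f"alpha = 0.05, beta = {beta:.3f}")
```

In this sketch alpha is chosen in advance, while beta depends on the sample size and on how far the true mean lies from the value in the null hypothesis. A larger sample gives a smaller beta.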
Hypothesis Testing – Process
There are four steps to the hypothesis testing process:
Step 1: State the hypotheses. Every test has both a null and an alternative hypothesis. The null hypothesis, H0, states that a population parameter is equal to a specified value. The alternative hypothesis, H1, states that the population parameter differs from the value given in the null hypothesis. The alternative hypothesis is what is believed to be true or is to be proven true.
Step 2: Set alpha, that is, choose a significance level for the test.
Step 3: Collect a sample that represents the characteristics of the population, and calculate the P-value from it.
Step 4: Compare the P-value with alpha.
Note: Reject the null hypothesis if the P-value is less than alpha; fail to reject it if the P-value is greater than or equal to alpha.
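The four steps can be sketched in code. The example below assumes a two-sided one-sample z-test with a known population standard deviation; the hypothesized mean, the standard deviation, and the sample values are all made up for illustration.

```python
from statistics import NormalDist, mean

# Step 1: state the hypotheses (illustrative values).
# H0: mu = 98.6    H1: mu != 98.6
mu0 = 98.6
sigma = 0.7  # assumed known population standard deviation

# Step 2: choose the significance level.
alpha = 0.05

# Step 3: collect a sample (hypothetical temperatures) and compute the P-value.
sample = [98.2, 98.9, 99.1, 98.4, 99.3, 98.8, 99.0, 98.7, 99.4, 98.6]
z = (mean(sample) - mu0) / (sigma / len(sample) ** 0.5)
p_value = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided tail probability

# Step 4: compare the P-value with alpha.
decision = "reject H0" if p_value < alpha else "fail to reject H0"
print(f"z = {z:.3f}, P-value = {p_value:.4f}: {decision}")
```

Other tests (for example a t-test when the standard deviation is unknown) follow the same four steps; only the calculation in step 3 changes.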
Perform Hypothesis Testing
This example shows how clinical trials can be analyzed. Suppose a pharmaceutical company wants to compare a medicine it manufactures with a competitor’s medicine. Hypothesis testing is one method it can adopt.
The null hypothesis would be that both the medicines are equally effective.
The alternative hypothesis would be that the two medicines are not equally effective.
There are three types of data on which you can perform hypothesis testing.
- Continuous data – Tests evaluate the mean, median, standard deviation, or variance. In the medicine example, you could take the temperature of every person in the sample three hours after administering the medicine. These measurements are continuous data.
- Binomial data – Tests evaluate a percentage or proportion when each observation falls into one of two categories. Suppose the same people were asked whether their fever had subsided, and the answers could only be yes or no. The percentage who said yes is then compared with the percentage stated in the null hypothesis.
- Poisson data – Tests evaluate a rate of occurrence or frequency. If the people in the sample are asked how many times a month they use the medicine, the rate is recorded and then compared with the rate stated in the null hypothesis, for example that it is less than a certain number.
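For the binomial case, one possible approach is a two-proportion z-test. The sketch below assumes the yes/no answers were collected for both medicines and uses the null hypothesis that both are equally effective; the function name and the counts are hypothetical.

```python
from statistics import NormalDist

def two_proportion_z_test(success_a, n_a, success_b, n_b):
    """Two-sided z-test of H0: p_a == p_b, using the pooled proportion."""
    p_a, p_b = success_a / n_a, success_b / n_b
    pooled = (success_a + success_b) / (n_a + n_b)
    # Standard error of the difference in proportions under H0.
    se = (pooled * (1 - pooled) * (1 / n_a + 1 / n_b)) ** 0.5
    z = (p_a - p_b) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value

# Made-up counts: patients whose fever subsided, out of 100 per medicine.
z, p_value = two_proportion_z_test(success_a=78, n_a=100, success_b=64, n_b=100)
alpha = 0.05
print(f"z = {z:.3f}, P-value = {p_value:.4f}")
print("reject H0" if p_value < alpha else "fail to reject H0")
```

The normal approximation used here is reasonable only when each group has enough yes and no answers; with small counts an exact test would be a safer choice.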
Types of variables
Let’s look at the different types of variables used when analyzing categorical data.
Nominal Variables have values with no logical ordering. They are independent of each other. Sequence does not matter.
Ordinal Variables have values in a logical order. However, the relative distance between any two values is not clear.
Association describes whether two variables are related to or independent of each other. For example, in the first dataset weather does not affect the train schedule, but in the second dataset it does.
When variables are associated, a change in one is accompanied by a change in the other.
Chi-Square Test
The chi-square test is a hypothesis test that compares the observed distribution of your data to an expected distribution. It is usually applied when there are two categorical variables from a single population, to determine whether there is a significant association between them or whether they are independent.
Test of Association:
To determine whether one variable is associated with a different variable. For example, determine whether the sales of different cell phones depend on the city or country where they are sold.
Test of Independence:
To determine whether the observed value of one variable depends on the observed value of a different variable. For example, determine whether the color of the car that a person chooses is independent of the person’s gender.
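As an illustration of the car color example, the sketch below computes the chi-square statistic for a 2x2 table of observed counts. The counts are invented, the function name is hypothetical, and no continuity correction is applied. The P-value formula used here is valid only for one degree of freedom, which is what a 2x2 table has.

```python
import math

def chi_square_2x2(table):
    """Pearson chi-square statistic and P-value for a 2x2 contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    # With 1 degree of freedom the chi-square tail probability
    # equals erfc(sqrt(statistic / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value

# Made-up counts: rows are gender, columns are car color (red, blue).
observed = [[30, 20],
            [20, 30]]
statistic, p_value = chi_square_2x2(observed)
print(f"chi-square = {statistic:.2f}, P-value = {p_value:.4f}")
```

The null hypothesis is that color choice is independent of gender. A P-value below the chosen alpha would lead you to reject it.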
Correlation Matrix
A correlation matrix is a square matrix that compares a large number of variables. It takes the form of an “n x n” matrix when we compare “n” variables. Each entry is the correlation coefficient between a pair of variables, calculated as their sample covariance divided by the product of their sample standard deviations.
A correlation coefficient measures the extent to which two variables tend to change together. The coefficient describes both the strength and the direction of the relationship.
Pearson Product Moment Correlation: It evaluates the linear relationship between two continuous variables. A relationship is linear when a change in one variable is associated with a proportional change in the other variable.
Spearman rank order correlation: It evaluates the monotonic relationship between two continuous or ordinal variables. In a monotonic relationship, the variables tend to change together, though not necessarily at a constant rate. The Spearman correlation coefficient is based on the ranked values for each variable rather than the raw data.
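The two coefficients and the correlation matrix can be sketched in plain Python. The helper names (`pearson`, `ranks`, `spearman`, `correlation_matrix`) and the data are illustrative assumptions; a statistics library would normally be used in practice.

```python
def pearson(x, y):
    """Pearson coefficient: covariance over the product of standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5

def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result

def spearman(x, y):
    """Spearman coefficient: Pearson coefficient of the ranked values."""
    return pearson(ranks(x), ranks(y))

def correlation_matrix(columns, method=pearson):
    """n x n matrix of pairwise coefficients for n equally long columns."""
    return [[method(a, b) for b in columns] for a in columns]

x = [1, 2, 3, 4, 5, 6]
y = [v ** 3 for v in x]  # increases with x, but not at a constant rate
w = [6, 5, 4, 3, 2, 1]   # decreases linearly as x increases

print("Pearson :", round(pearson(x, y), 3))
print("Spearman:", round(spearman(x, y), 3))
for row in correlation_matrix([x, y, w]):
    print([round(value, 3) for value in row])
```

Because y is a monotonic but non-linear function of x, the Spearman coefficient reaches 1 while the Pearson coefficient stays below 1. The diagonal of the matrix is 1 because every variable is perfectly correlated with itself.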
Inferential Statistics
Inferential statistics uses a random sample of data drawn from a population to make inferences about that population. This is a valuable method when each and every member of the population cannot be studied.
Inferential statistics is most reliable under the following conditions:
- You have a complete list of the members of the population.
- You draw a random sample from this population.
- Using a pre-established formula, you determine that the sample size is large enough.
Inferential statistics can still be used, with caution, when your data do not meet these criteria.
- It can help determine the strength of the relationships within your sample.
- If it is very difficult to obtain a population list or draw a random sample, then you do the best you can with what you have.
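The text does not say which formula is meant for checking the sample size. One commonly used formula, shown here purely as an assumption, gives the sample size needed to estimate a population mean within a chosen margin of error when the population standard deviation is known. The function name and inputs are hypothetical.

```python
import math
from statistics import NormalDist

def sample_size_for_mean(sigma, margin_of_error, confidence=0.95):
    """Smallest n with z * sigma / sqrt(n) <= margin_of_error."""
    # Two-sided critical value for the chosen confidence level.
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return math.ceil((z * sigma / margin_of_error) ** 2)

# Illustrative inputs: standard deviation 10, desired margin of error 2.
n = sample_size_for_mean(sigma=10, margin_of_error=2)
print(f"Required sample size: {n}")
```

A smaller margin of error or a higher confidence level increases the required sample size.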
Applications of Inferential statistics
Inferential statistics has its uses in almost every field such as business, medicine, data science, and so on.
- It is an effective tool for forecasting.
- It is used to predict future patterns.