Pearson correlation coefficient
- Published: Wednesday, 27 November 2024 08:04
- Written by Super User
What is Pearson Correlation?
Correlation is a measure of how closely two sets of data are related. The most common measure of correlation in stats is the Pearson Correlation, whose full name is the Pearson Product Moment Correlation (PPMC). It measures the strength and direction of the linear relationship between two sets of data. In simple terms, it answers the question, “Can I draw a line graph to represent the data?” Two letters are used to represent the Pearson correlation: the Greek letter rho (ρ) for a population and the letter “r” for a sample.
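As a rough illustration of what "linear relationship" means here, the sketch below computes r for points that lie exactly on a straight line and for points on a parabola. The helper `pearson_r` and the toy data are made up for this example; they are not part of the worked problem that follows.

```python
from math import sqrt, isclose

def pearson_r(xs, ys):
    """Sample Pearson correlation from mean-centred sums (illustrative helper)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

xs = [-2, -1, 0, 1, 2]
line = [3 * x + 1 for x in xs]     # points exactly on a straight line
parabola = [x ** 2 for x in xs]    # clearly related to x, but not linearly

r_line = pearson_r(xs, line)
r_parabola = pearson_r(xs, parabola)
print(r_line, r_parabola)
```

The straight-line data gives r = 1, while the symmetric parabola gives r = 0 even though y depends entirely on x. Pearson's r only picks up the linear part of a relationship.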
How to Find Pearson’s Correlation Coefficients
By Hand
Example question: Find the value of the correlation coefficient from the following table:
Subject | Age x | Glucose Level y |
---|---|---|
1 | 43 | 99 |
2 | 21 | 65 |
3 | 25 | 79 |
4 | 42 | 75 |
5 | 57 | 87 |
6 | 59 | 81 |
Step 1: Make a chart. Use the given data, and add three more columns: xy, x², and y².
Subject | Age x | Glucose Level y | xy | x² | y² |
---|---|---|---|---|---|
1 | 43 | 99 | | | |
2 | 21 | 65 | | | |
3 | 25 | 79 | | | |
4 | 42 | 75 | | | |
5 | 57 | 87 | | | |
6 | 59 | 81 | | | |
Step 2: Multiply x and y together to fill the xy column. For example, row 1 would be 43 × 99 = 4,257.
Subject | Age x | Glucose Level y | xy | x² | y² |
---|---|---|---|---|---|
1 | 43 | 99 | 4257 | | |
2 | 21 | 65 | 1365 | | |
3 | 25 | 79 | 1975 | | |
4 | 42 | 75 | 3150 | | |
5 | 57 | 87 | 4959 | | |
6 | 59 | 81 | 4779 | | |
Step 3: Take the square of the numbers in the x column, and put the result in the x² column.
Subject | Age x | Glucose Level y | xy | x² | y² |
---|---|---|---|---|---|
1 | 43 | 99 | 4257 | 1849 | |
2 | 21 | 65 | 1365 | 441 | |
3 | 25 | 79 | 1975 | 625 | |
4 | 42 | 75 | 3150 | 1764 | |
5 | 57 | 87 | 4959 | 3249 | |
6 | 59 | 81 | 4779 | 3481 | |
Step 4: Take the square of the numbers in the y column, and put the result in the y² column.
Subject | Age x | Glucose Level y | xy | x² | y² |
---|---|---|---|---|---|
1 | 43 | 99 | 4257 | 1849 | 9801 |
2 | 21 | 65 | 1365 | 441 | 4225 |
3 | 25 | 79 | 1975 | 625 | 6241 |
4 | 42 | 75 | 3150 | 1764 | 5625 |
5 | 57 | 87 | 4959 | 3249 | 7569 |
6 | 59 | 81 | 4779 | 3481 | 6561 |
Step 5: Add up all of the numbers in the columns and put the result at the bottom of the column. The Greek letter sigma (Σ) is a short way of saying “sum of” or summation.
Subject | Age x | Glucose Level y | xy | x² | y² |
---|---|---|---|---|---|
1 | 43 | 99 | 4257 | 1849 | 9801 |
2 | 21 | 65 | 1365 | 441 | 4225 |
3 | 25 | 79 | 1975 | 625 | 6241 |
4 | 42 | 75 | 3150 | 1764 | 5625 |
5 | 57 | 87 | 4959 | 3249 | 7569 |
6 | 59 | 81 | 4779 | 3481 | 6561 |
Σ | 247 | 486 | 20485 | 11409 | 40022 |
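
Steps 1 through 5 can be sketched in a few lines of Python. The six (age, glucose) pairs listed here are an assumption: they are the values whose totals reproduce the Σ row above. The variable names are only illustrative.

```python
# (age x, glucose y) for subjects 1..6. Assumed data set, chosen so that
# the column totals reproduce the Σ row of the table.
data = [(43, 99), (21, 65), (25, 79), (42, 75), (57, 87), (59, 81)]

# Steps 1-4: extend each row with xy, x squared and y squared
rows = [(x, y, x * y, x ** 2, y ** 2) for x, y in data]

# Step 5: add up every column (x, y, xy, x^2, y^2)
totals = tuple(sum(column) for column in zip(*rows))

for row in rows:
    print(row)
print("Σ", totals)
```

If a total printed here differs from the hand-built table, the usual cause is an arithmetic slip in one of the squared columns.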
Step 6: Use the following correlation coefficient formula.
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
From our table:
- Σx = 247
- Σy = 486
- Σxy = 20,485
- Σx² = 11,409
- Σy² = 40,022
- n is the sample size, in our case = 6

The correlation coefficient =
[6(20,485) - (247 × 486)] / √[[6(11,409) - 247²] × [6(40,022) - 486²]]
= 2868 / 5413.27
= 0.5298
The range of the correlation coefficient is from -1 to 1. Our result is 0.5298, which means the variables have a moderate positive correlation.
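
The substitution can be checked with a short script. This is a minimal sketch that plugs the column totals from the table into the formula, taking n = 6 as in the substitution. The variable names are made up for the example.

```python
from math import sqrt

# Column totals from the table; n = 6 matches the substitution above
n = 6
sum_x, sum_y = 247, 486
sum_xy, sum_x2, sum_y2 = 20485, 11409, 40022

numerator = n * sum_xy - sum_x * sum_y
denominator = sqrt((n * sum_x2 - sum_x ** 2) * (n * sum_y2 - sum_y ** 2))
r = numerator / denominator

print(numerator, round(denominator, 2), round(r, 4))  # 2868 5413.27 0.5298
```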
Types of Correlation Coefficient Formulas
There are several types of correlation coefficient formulas. One of the most commonly used is Pearson’s correlation coefficient formula. If you’re taking a basic stats class, this is the one you’ll probably use:
$$r = \frac{n\sum xy - \sum x \sum y}{\sqrt{\left[n\sum x^2 - \left(\sum x\right)^2\right]\left[n\sum y^2 - \left(\sum y\right)^2\right]}}$$
Two other formulas are commonly used: the sample correlation coefficient and the population correlation coefficient.
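For reference, these two are usually written in the standard textbook form below. The notation is an assumption, since the article does not define it: s_xy and σ_xy are the sample and population covariances, and s_x, s_y, σ_x, σ_y are the corresponding standard deviations.

$$r_{xy} = \frac{s_{xy}}{s_x \, s_y}, \qquad \rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \, \sigma_y}$$

Both have the same structure: the covariance of the two variables divided by the product of their standard deviations. This scaling is what keeps the result between -1 and 1.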