Pearson Correlation along with p values and fancy graphs in R
The blog explains correlation analysis in R (Reading time 10 min.)
For data click here and for R-script click here
The Pearson correlation coefficient is a measure of the strength of the linear association between two variables and is denoted by r. The value of r ranges from -1 to +1. Let's see how to calculate the correlation, test its significance and draw fancy graphics that explain the relationship between variables in R.
Step-I: Import the data
In the II quadrant, click on "Import Dataset" and select "From Excel".
After that a new dialogue box appears. Click on "Browse", select your file and click on "Import".
After this step, the "iris" data is imported and can be seen in the Global Environment.
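The same import can also be done from the script instead of the dialogue box. The lines below are a minimal sketch, assuming the readxl package is installed and that the workbook is saved as "iris.xlsx" in the working directory (a hypothetical file name, use your own). If either assumption fails, the sketch falls back to the copy of iris that ships with R, where Species is a factor rather than a character.

```r
# Hypothetical file name: replace with the path to the Excel file you downloaded
path <- "iris.xlsx"

if (requireNamespace("readxl", quietly = TRUE) && file.exists(path)) {
  # read_excel() returns a tibble, the same kind of object the import dialogue creates
  iris <- readxl::read_excel(path)
} else {
  # Fallback (assumption): use the built-in data set with the same measurements
  iris <- datasets::iris
}

dim(iris)   # number of rows and columns
```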
Step-II: Load the script which you had downloaded.
Let's understand the script step by step.
Calculating the correlation and p values
#Gives structure of the data
str(iris)
tibble [150 x 5] (S3: tbl_df/tbl/data.frame)
$ Sepal.Length: num [1:150] 5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
$ Sepal.Width : num [1:150] 3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
$ Petal.Length: num [1:150] 1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
$ Petal.Width : num [1:150] 0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
$ Species : chr [1:150] "setosa" "setosa" "setosa" "setosa" ...
We can see from the output that the first 4 variables are numeric (num) and the last one is character (chr).
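Instead of reading the column types by eye, the numeric variables can also be picked out programmatically. A small sketch, using the built-in copy of iris as a stand-in for the imported data (an assumption) and the hypothetical object names dat, is_num and iris_num:

```r
dat <- datasets::iris                  # stand-in for the imported data (assumption)

is_num <- sapply(dat, is.numeric)      # TRUE for numeric columns, FALSE otherwise
print(is_num)

iris_num <- dat[, is_num]              # keep only the numeric columns
names(iris_num)
```

This gives the same selection as iris[1:4] but does not depend on the position of the columns.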
The code below will install the package named "psych". Make sure that the system is connected to the internet.
install.packages("psych")
Note: This line needs to be run only once. It installs "psych" on the system. The next time we need psych, we just load the package.
library("psych")
This line will load the package psych
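The install-once, load-every-time pattern described above can be wrapped in a small helper so that the script is safe to re-run. This is only a sketch: install_if_missing is a hypothetical helper name, not a function from base R or psych.

```r
# Hypothetical helper: install a package only if it is not already available
install_if_missing <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)              # needs an internet connection
  }
  # Report (invisibly) whether the package is available now
  invisible(requireNamespace(pkg, quietly = TRUE))
}

# install_if_missing("psych")   # uncomment to use; then load with library("psych")
```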
b <- corr.test(iris[1:4])
The corr.test function saves correlation (r), t values of t-test (t), p-values (p) and standard error (se) in the variable named "b".
"<-" is used to assign the output to the variable b
iris[1:4] is used because we want only the first four variables; the fifth variable is of character type.
b$r
gives a correlation matrix. "$" is used to extract components stored in "b"
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length    1.0000000  -0.1175698    0.8717538   0.8179411
Sepal.Width    -0.1175698   1.0000000   -0.4284401  -0.3661259
Petal.Length    0.8717538  -0.4284401    1.0000000   0.9628654
Petal.Width     0.8179411  -0.3661259    0.9628654   1.0000000
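To see where one entry of this matrix comes from, the coefficient can be computed by hand and compared with base R's cor(). A minimal sketch for Sepal.Length and Petal.Length, using the built-in copy of iris as a stand-in for the imported data (an assumption):

```r
dat <- datasets::iris                  # stand-in for the imported data (assumption)
x <- dat$Sepal.Length
y <- dat$Petal.Length

# Pearson's r: sum of cross-products of deviations divided by the
# square root of the product of the sums of squared deviations
r_manual <- sum((x - mean(x)) * (y - mean(y))) /
  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))

r_builtin <- cor(x, y, method = "pearson")

print(c(manual = r_manual, builtin = r_builtin))
```

Both values should agree with the Sepal.Length / Petal.Length entry of b$r.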
b$t
gives t statistic value
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length          Inf   -1.440287    21.646019   17.296454
Sepal.Width     -1.440287         Inf    -5.768449   -4.786461
Petal.Length    21.646019   -5.768449          Inf   43.387237
Petal.Width     17.296454   -4.786461    43.387237         Inf
b$p
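gives the p-values. The values below the diagonal are the raw p-values; the values above the diagonal are adjusted for multiple tests (Holm's method by default in corr.test), which is why the matrix is not symmetric.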
             Sepal.Length  Sepal.Width Petal.Length  Petal.Width
Sepal.Length 0.000000e+00 1.518983e-01 5.193337e-47 9.301992e-37
Sepal.Width  1.518983e-01 0.000000e+00 1.353994e-07 8.146457e-06
Petal.Length 1.038667e-47 4.513314e-08 0.000000e+00 2.805002e-85
Petal.Width  2.325498e-37 4.073229e-06 4.675004e-86 0.000000e+00
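The relationship between the raw p-values (below the diagonal) and the adjusted ones (above the diagonal) can be reproduced with base R alone. This is a sketch, assuming Holm's method (the default adjustment in corr.test) and using the built-in copy of iris as a stand-in for the imported data; the object names are hypothetical.

```r
num <- datasets::iris[, 1:4]           # stand-in for the imported data (assumption)

pair_idx <- combn(ncol(num), 2)        # the 6 distinct pairs of variables

# Raw p-value of the Pearson correlation test for each pair
raw_p <- apply(pair_idx, 2, function(ix) {
  cor.test(num[[ix[1]]], num[[ix[2]]], method = "pearson")$p.value
})
names(raw_p) <- apply(pair_idx, 2, function(ix) {
  paste(names(num)[ix], collapse = " ~ ")
})

# Holm adjustment for multiple tests
holm_p <- p.adjust(raw_p, method = "holm")

print(cbind(raw = raw_p, holm = holm_p))
```

The adjusted values are never smaller than the raw ones, so a correlation that is significant after adjustment is also significant before it.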
b$se
gives standard error
             Sepal.Length Sepal.Width Petal.Length Petal.Width
Sepal.Length   0.00000000  0.08162941   0.04027317  0.04728953
Sepal.Width    0.08162941  0.00000000   0.07427301  0.07649200
Petal.Length   0.04027317  0.07427301   0.00000000  0.02219237
Petal.Width    0.04728953  0.07649200   0.02219237  0.00000000
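The standard error, t statistic and raw p-value are linked to r and the sample size n by simple formulas: se = sqrt((1 - r^2) / (n - 2)), t = r / se, and the two-sided p-value comes from a t distribution with n - 2 degrees of freedom. A minimal sketch for Sepal.Length and Sepal.Width, using the built-in copy of iris as a stand-in for the imported data (an assumption):

```r
dat <- datasets::iris                  # stand-in for the imported data (assumption)

r <- cor(dat$Sepal.Length, dat$Sepal.Width)
n <- nrow(dat)

se      <- sqrt((1 - r^2) / (n - 2))           # standard error of r
t_value <- r / se                              # t statistic
p_value <- 2 * pt(-abs(t_value), df = n - 2)   # two-sided raw p-value

print(c(r = r, se = se, t = t_value, p = p_value))
```

These values should match the Sepal.Length / Sepal.Width entries of b$se, b$t and b$p.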
sink("correlation.doc")
print(b)
sink()
The sink function redirects the output of "print(b)" into a document file named "correlation" (you can change the name as you wish). The final sink() turns the redirection off again. The document is stored in the working directory, which can be found by using getwd().
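The sink pattern can be tried out safely by writing to a temporary folder first and reading the file back. A small sketch, where a base R correlation matrix stands in for b and the file name is hypothetical:

```r
res <- cor(datasets::iris[, 1:4])      # stand-in for the result b (assumption)

out_file <- file.path(tempdir(), "correlation.txt")  # hypothetical file name

sink(out_file)      # start redirecting console output to the file
print(res)
sink()              # stop redirecting; output goes back to the console

readLines(out_file) # check what was written
```

If sink() is not called a second time, all later output keeps going to the file instead of the console.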
Visualization of correlation
pairs.panels(iris[, -5], pch = 21, stars = TRUE)
iris[, -5] is used as we don't want the fifth variable, which is of character type
pch = 21 will give circles in the scatter plots (You can try different numbers)
stars = TRUE will give stars in the figure to indicate whether a correlation is significant or non-significant
One can save the plot in image or pdf form by using the Export option of the plot section
The upper triangle of the matrix shows the correlation values, the diagonal shows the distribution of each variable and the lower triangle of the matrix shows the scatter plots of each pair of variables.
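The plot can also be saved from the script by opening a graphics device before plotting and closing it afterwards. A minimal sketch using the base R pdf() device; the file name is hypothetical, the built-in iris is a stand-in for the imported data, and base R's pairs() is used as a fallback when psych is not installed:

```r
dat <- datasets::iris[, 1:4]           # stand-in for the imported data (assumption)
out_file <- file.path(tempdir(), "correlation_plot.pdf")  # hypothetical file name

pdf(out_file, width = 7, height = 7)   # open a PDF graphics device
if (requireNamespace("psych", quietly = TRUE)) {
  psych::pairs.panels(dat, pch = 21, stars = TRUE)
} else {
  pairs(dat, pch = 21)                 # base R fallback if psych is not installed
}
dev.off()                              # close the device so the file is written

file.exists(out_file)
```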
install.packages("PerformanceAnalytics")
The above code will install "PerformanceAnalytics". (Internet on!)
Run the code one time only.
require("PerformanceAnalytics")
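This function is similar to library() and will load the package. The difference is that require() returns FALSE with a warning if the package is missing, while library() stops with an error.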
chart.Correlation(iris[1:4], histogram = TRUE, pch = 100)
In comparison to the previous function, here the text size of each correlation value is proportional to its magnitude. The larger the value, the greater the size.
detach("package:psych", unload=TRUE)
This will unload the package "psych"
detach("package:PerformanceAnalytics", unload=TRUE)
This will unload the package "PerformanceAnalytics"
Bored of reading? Tune to video!
Hope you like this blog and use the new learnings in publications as well as in day-to-day life.
Happy Learning!
To learn more about Agricultural Statistics follow my youtube channel
If you find this blog post and youtube video useful, then please support my content. This will help me to bring more such content.