Commentary

Comments on Statistical Issues in November 2015

Korean Journal of Family Medicine 2015;36(6):357-358.
Published online: November 20, 2015

Department of Biostatistics, The Catholic University of Korea College of Medicine, Seoul, Korea.

Copyright © 2015 The Korean Academy of Family Medicine

This is an open-access article distributed under the terms of the Creative Commons Attribution Non-Commercial License (http://creativecommons.org/licenses/by-nc/3.0/) which permits unrestricted noncommercial use, distribution, and reproduction in any medium, provided the original work is properly cited.

In this section, we define multi-collinearity in multivariate analysis and explain how to avoid it. The issue appeared in the articles titled "Time to first cigarette and hypertension in Korean male smokers" and "Barrier factors to the completion of diabetes education in Korean diabetic adult patients: Korea National Health and Nutrition Examination Surveys 2007-2012", published in September 2015 by Lee et al.1) and by Kim et al.,2) respectively.
Multi-collinearity means that the independent (explanatory) variables in a multiple (logistic) regression analysis are not mutually independent but are linearly correlated with one another. Some degree of association among the independent variables is inevitable in multivariate analysis. However, when the association between independent variables is extremely high, some coefficients or their standard errors cannot be estimated correctly: either no coefficients can be obtained, or the results show extremely large standard errors. In these cases, we say, "We could not obtain proper estimates from the multivariate model due to the multi-collinearity (near-linear dependency)."3)
The most popular measure used to check for multi-collinearity is the variance inflation factor (VIF). The VIF of the independent variable $x_j$ is defined as $\mathrm{VIF}_j = (1 - R_j^2)^{-1}$, where $R_j^2$ is the coefficient of determination obtained when $x_j$ is regressed on all the remaining independent variables. If $x_j$ is nearly orthogonal to the remaining independent variables, $R_j^2$ is small and $\mathrm{VIF}_j$ is close to unity; if $x_j$ is nearly linearly dependent on some subset of the remaining independent variables, $R_j^2$ is near unity and $\mathrm{VIF}_j$ is large. Practical experience indicates that if any of the VIFs exceeds 5 or 10, it is an indication that the associated regression coefficients are poorly estimated because of multi-collinearity.4)
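The VIF definition can be illustrated with a minimal sketch in plain Python. Everything in it is an assumption made for illustration: the data are simulated (height, weight, and age are drawn independently, and BMI is computed from weight and height), and the helper names `standardize`, `solve`, `r_squared`, and `vif` are hypothetical rather than taken from any statistical package. Each variable is regressed on all the others by ordinary least squares, and the resulting R² is converted to a VIF.

```python
import random


def standardize(values):
    """Return z-scores; R^2 is unchanged by centering and scaling."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][c] * x[c] for c in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def r_squared(y, predictors):
    """R^2 from least squares of y on the predictors (all standardized)."""
    y = standardize(y)
    cols = [standardize(p) for p in predictors]
    xtx = [[sum(u * v for u, v in zip(ci, cj)) for cj in cols] for ci in cols]
    xty = [sum(u * v for u, v in zip(ci, y)) for ci in cols]
    beta = solve(xtx, xty)
    fitted = [sum(b * col[i] for b, col in zip(beta, cols)) for i in range(len(y))]
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum(yi ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def vif(data):
    """VIF_j = 1 / (1 - R_j^2), with x_j regressed on all other variables."""
    result = {}
    for name in data:
        others = [data[other] for other in data if other != name]
        result[name] = 1.0 / (1.0 - r_squared(data[name], others))
    return result


# Simulated data (an assumption for illustration, not data from the cited studies).
random.seed(0)
n = 300
height = [random.gauss(1.70, 0.07) for _ in range(n)]   # metres
weight = [random.gauss(70.0, 10.0) for _ in range(n)]   # kilograms
age = [random.gauss(45.0, 12.0) for _ in range(n)]      # years
bmi = [w / h ** 2 for w, h in zip(weight, height)]      # derived from the two above

data = {"height": height, "weight": weight, "bmi": bmi, "age": age}
vifs = vif(data)
for name, value in vifs.items():
    print(f"{name:>6}: VIF = {value:.2f}")
```

Because BMI is almost a linear function of weight and height over this range, those three variables should show VIFs well above the 5 to 10 threshold, while the independently drawn age should stay close to 1.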
The simplest and most intuitive way to avoid multi-collinearity is to use only independent variables that have low correlation with each other. First, these variables can be chosen subjectively. For instance, when a researcher has to choose between body mass index (BMI) and body weight, and the study was originally designed around BMI, then BMI should be the independent variable, regardless of which of the two has the higher correlation with the dependent variable.
Second, from a statistical point of view, a researcher can select the variable that has the highest correlation with the dependent variable. The simplest way is to compare the correlations of the competing independent variables with the dependent variable. The easiest way to select variables without multi-collinearity is to apply the stepwise variable selection method offered by statistical analysis programs for multivariate analysis. However, if a researcher relies only on statistical methods for variable selection, the final result may be far from the researcher's original intention or clinically unexplainable. Alternatively, ridge regression can handle independent variables with multi-collinearity. However, ridge regression is not generally used because its estimated coefficients are biased and the method is not easy to understand.
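The trade-off in ridge regression can be seen in a small sketch, again under stated assumptions: the data are simulated, the two predictors are nearly identical by construction, the helper `ridge_two_predictors` is a hypothetical name, and the penalty value 10 is arbitrary. With standardized variables and two predictors, the ridge estimate solves (X'X + λI)b = X'y, which reduces to a 2×2 system; setting λ = 0 gives ordinary least squares.

```python
import random


def standardize(values):
    """Return z-scores so that no intercept is needed."""
    n = len(values)
    mean = sum(values) / n
    sd = (sum((v - mean) ** 2 for v in values) / n) ** 0.5
    return [(v - mean) / sd for v in values]


def ridge_two_predictors(x1, x2, y, lam):
    """Solve (X'X + lam * I) b = X'y for two standardized predictors."""
    a11 = sum(u * u for u in x1) + lam
    a22 = sum(u * u for u in x2) + lam
    a12 = sum(u * v for u, v in zip(x1, x2))
    c1 = sum(u * v for u, v in zip(x1, y))
    c2 = sum(u * v for u, v in zip(x2, y))
    det = a11 * a22 - a12 * a12
    return ((a22 * c1 - a12 * c2) / det, (a11 * c2 - a12 * c1) / det)


# Simulated data (an assumption): x2 is x1 plus a little noise.
random.seed(1)
n = 100
base = [random.gauss(0.0, 1.0) for _ in range(n)]
x1 = standardize(base)
x2 = standardize([b + 0.05 * random.gauss(0.0, 1.0) for b in base])
y = standardize([2.0 * b + random.gauss(0.0, 1.0) for b in base])

ols = ridge_two_predictors(x1, x2, y, lam=0.0)      # ordinary least squares
ridge = ridge_two_predictors(x1, x2, y, lam=10.0)   # arbitrary penalty

print("OLS coefficients:  ", [round(b, 3) for b in ols])
print("Ridge coefficients:", [round(b, 3) for b in ridge])
```

The penalty pulls the two coefficients toward each other and toward zero, which stabilizes them at the cost of bias. That bias is the reason given above for why ridge regression is not generally used.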

CONFLICT OF INTEREST: No potential conflict of interest relevant to this article was reported.

1. Lee S, Jang M, Noh HM, Oh HY, Song HJ, Park KH, et al. Time to first cigarette and hypertension in Korean male smokers. Korean J Fam Med 2015;36:221-226. PMID: 26435812.
2. Kim HT, Lee K, Jung SY, Oh SM, Jeong SM, Choi YJ. Barrier factors to the completion of diabetes education in Korean diabetic adult patients: Korea National Health and Nutrition Examination Surveys 2007-2012. Korean J Fam Med 2015;36:203-209. PMID: 26435809.
3. Park YG. Comments on statistical issues in January 2015. Korean J Fam Med 2015;36:42-43. PMID: 25780515.
4. Montgomery DC, Peck EA, Vining GG. Introduction to linear regression analysis. Hoboken (NJ): John Wiley & Sons Inc.; 2006.
