Abstract
We read with great interest the article by Li et al. The results are illuminating for understanding the relationship among Hsa-circRNA11783-2, coronary artery disease (CAD) and type 2 diabetes mellitus (T2DM). However, from our perspective, the bioinformatics analyses need further context, as the statistics behind the differential fold changes in the expression data are not fully explained. The authors seem to have used unadjusted p values to detect differentially expressed circular RNAs (referred to here as differentially expressed genes, DEGs) among the control, CAD and T2DM groups. Because a large number of probes and multiple comparisons produce many false positives, it seems essential to analyse microarray data with a statistical method that accounts for multiple testing in order to reach a reliable result. Selecting circular RNAs solely on a greater than twofold change in expression with an unadjusted p value < 0.05 is neither reliable nor suitable for high-level microarray analysis.
We read with great interest the article by Li et al. 1 The results are illuminating for understanding the relationship among Hsa-circRNA11783-2, coronary artery disease (CAD) and type 2 diabetes mellitus (T2DM). However, from our perspective, the bioinformatics analyses need further context, as the statistics behind the differential fold changes in the expression data are not fully explained. For example, the authors seem to have used unadjusted p values to detect differentially expressed circular RNAs (referred to here as differentially expressed genes, DEGs) among the control, CAD and T2DM groups. Because a large number of probes and multiple comparisons produce many false positives, it seems essential to analyse microarray data with a statistical method that accounts for multiple testing in order to reach a reliable result. Selecting circular RNAs solely on a greater than twofold change in expression with an unadjusted p value < 0.05 is neither reliable nor suitable for high-level microarray analysis. In our opinion, there is nothing wrong with using a Student's t-test on large multivariate data as long as the resulting p values are corrected for multiple testing, for example with the Benjamini and Hochberg procedure, a step that is important but easily omitted.
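To make the correction concrete, the following is a minimal sketch of the Benjamini and Hochberg adjustment applied to a list of per-probe p values. It is our own illustration, not the procedure used by Li et al.; the function name `benjamini_hochberg` and the example p values are hypothetical, and in practice an established statistical package would normally be used instead.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p values (FDR), in input order.

    Illustrative sketch only: the function name and inputs are hypothetical.
    """
    m = len(p_values)
    # Indices of the p values, sorted from smallest to largest p value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down to the smallest, scaling each by
    # m / rank and keeping a running minimum so that the adjusted values
    # never decrease as the raw p values increase.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical unadjusted p values from per-probe t-tests.
raw_p = [0.01, 0.04, 0.03, 0.005]
fdr = benjamini_hochberg(raw_p)
significant = [p_adj < 0.05 for p_adj in fdr]
```

With thousands of probes the scaling factor m / rank becomes large for the smallest p values, which is why many probes that pass an unadjusted p < 0.05 threshold no longer pass after correction.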
We would like to suggest using a specialized high-level microarray analysis package such as limma (Linear Models for Microarray Data), 2 which is commonly used for statistical testing and analysis of differential expression data with linear models. In our view, requiring a more than 1.5-fold change in expression together with a false discovery rate (FDR) < 0.05 is an appropriate and conservative cutoff for obtaining DEGs. Moreover, Significance Analysis of Microarrays (SAM) 3 is another well-regarded non-parametric statistical algorithm; with it, a twofold expression change and FDR < 0.1 is a rational cutoff for obtaining DEGs.
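As a sketch of how such cutoffs combine, the snippet below filters probes on both a fold-change threshold and an FDR threshold. It assumes that a log2 fold change and an FDR-adjusted p value have already been computed for each probe by a suitable tool; the function name `select_degs`, the record layout and the probe identifiers are hypothetical and chosen only for illustration.

```python
import math


def select_degs(probes, min_fold_change=1.5, max_fdr=0.05):
    """Return IDs of probes passing both the fold-change and FDR cutoffs.

    Each probe is a dict with hypothetical keys "id", "log2_fc" and "fdr".
    The fold-change cutoff applies in either direction (up or down).
    """
    log2_threshold = math.log2(min_fold_change)
    return [
        probe["id"]
        for probe in probes
        if abs(probe["log2_fc"]) > log2_threshold and probe["fdr"] < max_fdr
    ]


# Hypothetical probe-level results.
probes = [
    {"id": "probe_A", "log2_fc": 1.2, "fdr": 0.01},
    {"id": "probe_B", "log2_fc": 0.3, "fdr": 0.001},
    {"id": "probe_C", "log2_fc": -2.0, "fdr": 0.2},
    {"id": "probe_D", "log2_fc": -0.8, "fdr": 0.04},
]

# More than 1.5-fold change with FDR < 0.05.
degs_strict_fdr = select_degs(probes, min_fold_change=1.5, max_fdr=0.05)
# Twofold change with FDR < 0.1.
degs_twofold = select_degs(probes, min_fold_change=2.0, max_fdr=0.1)
```

Note that a probe with a large fold change but a high FDR (probe_C here) is excluded under either cutoff, which is the behaviour that selection on fold change and unadjusted p values alone does not guarantee.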
Although, from our reading of the article, the authors appear to have performed additional analyses such as quantitative polymerase chain reaction (Q-PCR) or other methodology to verify the DEGs identified in the microarray experiments, it is impractical to verify all DEGs with Q-PCR or any other technology. Choosing the optimal statistical approach 4 and obtaining accurate and convincing DEG results are the basis for further data analysis. We welcome further explanation from the authors of their data analysis and experimental approach. We suggest that data-intensive transcriptomics research would benefit from these considerations and from innovations in statistical and data-analytical approaches.
Footnotes
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship and/or publication of this article.
