This article is concerned with a subset of numerically stable and scalable algorithms that support computationally complex psychometric models in the era of machine learning and massive data. The subset selected here is a core set of numerical methods that should be familiar to computational psychometricians; it covers whitening transforms for dealing with correlated data, computational concepts for linear models, multivariable integration, and optimization techniques.
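As a minimal illustration of the first of these topics, the sketch below (not taken from the article; the data and variable names are hypothetical) whitens a correlated data matrix using the Cholesky factor of its sample covariance, so that the transformed variables have identity covariance up to rounding.

```python
# Illustrative sketch: Cholesky whitening of correlated data.
# All quantities here are simulated for demonstration only.
import numpy as np

rng = np.random.default_rng(0)

# Simulate n observations on p correlated variables.
n, p = 1000, 3
cov = np.array([[1.0, 0.6, 0.3],
                [0.6, 1.0, 0.5],
                [0.3, 0.5, 1.0]])
x = rng.multivariate_normal(np.zeros(p), cov, size=n)

# Center the data, estimate the sample covariance S, and factor S = L L'.
xc = x - x.mean(axis=0)
s = np.cov(xc, rowvar=False)
L = np.linalg.cholesky(s)

# Whitened data z = xc L^{-T}; its sample covariance is the identity.
z = np.linalg.solve(L, xc.T).T
print(np.round(np.cov(z, rowvar=False), 2))  # ~ identity matrix
```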