Abstract
Although human energy expenditure (EE) estimation has been studied extensively in recent years, two issues remain to be addressed. First, EE differs substantially between elderly and young people, and existing models may not be appropriate for analyzing both populations. Second, most recent studies still rely on hand-crafted features or shallow models that fail to capture the spatio-temporal dependencies of sensor signals. In this paper, a deep learning model is presented for accurately estimating EE in both populations, and the impact of sensor number and placement on EE estimation is investigated. A multi-branch spatio-temporal stream network is proposed that fuses sensor signals effectively and achieves significantly improved EE estimation accuracy. The model uses two attention-enhanced modules to fuse two data streams: a temporal stream that captures long-range dynamics in heart rate and motion data, and a spatial stream that captures multi-scale correlations across sensors. Experiments on two public datasets demonstrate that the proposed method achieves state-of-the-art performance. Moreover, fusing data from several accelerometers improves EE estimation, yet using only the two best-performing placements already yields satisfactory accuracy. The proposed model thus enhances EE estimation for both populations through the efficient extraction of comprehensive spatio-temporal features.
