
Abstract

Since its introduction, non-negative matrix factorization (NMF) has been a popular tool for extracting interpretable, low-dimensional representations of high-dimensional data. However, several recent studies have proposed replacing NMF with autoencoders. The increasing popularity of autoencoders warrants an investigation of whether this replacement is generally valid and reasonable. Moreover, the exact relationship between non-negative autoencoders and NMF has not been thoroughly explored. Thus, a main aim of this study is to investigate in detail the relationship between autoencoders and NMF. We define a non-negative linear autoencoder, AE-NMF, which is mathematically equivalent to convex NMF, a constrained version of NMF. The performance of NMF and the non-negative linear autoencoder is compared within the context of mutational signature extraction from simulated and real-world cancer genomics data. We find that the reconstructions based on NMF are more accurate than those based on AE-NMF, while the signatures extracted using both methods exhibit comparable consistency and performance when externally validated. These findings suggest that AE-NMF, the linear non-negative autoencoder investigated in this article, does not provide an improvement over NMF in the field of mutational signature extraction. Our study serves as a foundation for understanding the theoretical implications of replacing NMF with non-negative autoencoders.

1. INTRODUCTION

Non-negative matrix factorization (NMF) is a popular tool for unsupervised learning (Lee and Seung, 1999). In NMF, a non-negative data matrix is factorized into a product of two non-negative matrices of lower dimension: a basis matrix consisting of basis vectors and a weight matrix consisting of the basis vectors’ weights for each observation in the data matrix. NMF has gained a strong footing in various scientific fields due to its high interpretability (Alexandrov et al., 2013; Fang et al., 2018; Özer et al., 2022). Specifically, NMF has proven to be a useful tool for deriving mutational signatures from cancer genomics data.

In mutational signature analysis, it is typically assumed that all somatic mutations in a cancer genome are caused by mutagenic processes that leave a characteristic pattern of mutations in the genome. These patterns are denoted mutational signatures. Several signatures have been identified and linked to different mutagenic processes such as ultraviolet light exposure and tobacco smoking (Nik-Zainal et al., 2015). Alexandrov et al. (2013) proposed using NMF on mutational count data from cancer genomes to decipher the mutational signatures of the processes the patients have been exposed to throughout the development of the disease. NMF has since then been the dominating model for mutational signature extraction (Alexandrov et al., 2020; Blokzijl et al., 2018; Islam et al., 2022). When extracting mutational signatures with NMF, the data matrix consisting of a number of patients’ mutational profiles is decomposed into a matrix representing the signatures of mutagenic processes (basis vectors) and an exposure matrix dictating the number of mutations that can be attributed to each specific process in the mutational profiles of each patient (weight matrix).

In this study, we analyze the frequency distribution of the 96 possible trinucleotides, which represent all single-base substitutions (SBS) flanked by the nucleotides immediately to their left and right, commonly referred to as the trinucleotide context.

Recently, several studies have proposed substituting NMF with non-negative autoencoders, which are increasingly popular for dimensionality reduction (Hosseini-Asl et al., 2016; Khatib et al., 2018; Lemme et al., 2012; Özer et al., 2022; Smaragdis and Venkataramani, 2017). This is also the case in mutational signature extraction (Pancotti et al., 2024; Pei et al., 2020). Pei et al. (2020) suggested using a sparse autoencoder to identify mutational signatures from cancer genomics data and generated estimates that were not only in concordance with existing literature but also correlated meaningfully with observed exogenous exposures. Pancotti et al. (2024) suggested a hybrid architecture with a deep encoding and shallow decoding to relax the assumption of linearity NMF imposes on mutational signature extraction. This trend of using non-negative autoencoders as an alternative to NMF with promising results, especially within mutational signatures, prompts the questions: What is the mathematical relation between autoencoders and NMF? And how do they compare in mutational signature extraction?

This study compares the performance of linear non-negative autoencoders to NMF in the field of mutational signatures. In particular, we compare the in- and out-of-sample reconstruction error, as well as the consistency of the extracted signatures from the ovary, prostate, and uterus tumors from the Genomics England 100,000 Genomes Project (GEL) (Turnbull, 2018; Turro et al., 2020). Moreover, we show theoretically that autoencoders can be constructed as convex NMF, a special case of NMF where the basis vectors are restricted to be convex combinations of columns in the data matrix (Ding et al., 2010). Based on this, we deduce how it impacts interpretation of the estimates generated by linear non-negative one-hidden-layer autoencoders in general and especially within the field of mutational signatures.

In Section 2, we show under what constraints autoencoders are equivalent to convex NMF and use this to establish a meaningful comparison between non-negative linear autoencoders and NMF. Section 3 compares NMF and non-negative linear autoencoders’ performance on simulated and cancer genomics data. Lastly, we discuss and conclude on the results in Sections 4 and 5.

2. METHODS

Consider a non-negative data matrix $V \in ℝ_{+}^{M \times N}$ . In this study, the aim is to decompose V into a basis matrix $H \in ℝ_{+}^{M \times K}$ and a weight matrix $W \in ℝ_{+}^{K \times N}$ , where M denotes the number of features, N denotes the number of observations, and K denotes the number of basis vectors in the latent representation of V . A schematic overview of all decompositions considered in this section can be seen in Figure 1.

FIG. 1.

Schematic representation of the composition of basis vectors and weights in NMF (top) and C-NMF and AE-NMF (bottom). NMF, non-negative matrix factorization.

2.1. Non-negative matrix factorization

NMF decomposes a matrix with non-negative entries into a matrix product of two factor matrices with non-negative entries, one containing a set of basis vectors and one containing a set of weights. The shared dimension, K, of the factor matrices, is typically chosen to be much smaller than the dimensions of the input matrix, making NMF a dimensionality reduction technique.

Standard K-dimensional NMF was introduced by Lee and Seung (1999) and aims to make a reconstruction, $\hat{V}$, of the original data matrix by a product of two non-negative matrices:

$\hat{V} = H W,$ (1)

where each column in H represents a basis vector and each column in W contains one sample’s weights when it is reconstructed as a linear mixture of the basis vectors.
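To make Equation (1) concrete, the following is a minimal sketch of NMF fitted with the classical multiplicative updates of Lee and Seung for the Frobenius norm. It follows the naming used here (H is the basis matrix, W the weight matrix). The function name `nmf_frobenius`, the initialization, the iteration count, and the small constant `eps` are illustrative assumptions and are not taken from the study’s implementation.

```python
import numpy as np

def nmf_frobenius(V, K, n_iter=200, eps=1e-12, seed=0):
    """Approximate V (M x N) by H @ W with non-negative H (M x K) and W (K x N)."""
    rng = np.random.default_rng(seed)
    M, N = V.shape
    H = rng.uniform(0.1, 1.0, size=(M, K))  # basis vectors as columns
    W = rng.uniform(0.1, 1.0, size=(K, N))  # one column of weights per sample
    losses = []
    for _ in range(n_iter):
        # Multiplicative updates: factors stay non-negative because every
        # term in the numerators and denominators is non-negative.
        W *= (H.T @ V) / (H.T @ H @ W + eps)
        H *= (V @ W.T) / (H @ (W @ W.T) + eps)
        losses.append(float(np.mean((V - H @ W) ** 2)))
    return H, W, losses

# Toy data with an exact rank-3 non-negative factorization (illustrative only).
rng = np.random.default_rng(1)
V = rng.uniform(size=(6, 3)) @ rng.uniform(size=(3, 20))
H, W, losses = nmf_frobenius(V, K=3)
```

The returned `losses` list records the mean squared reconstruction error after each iteration, which can be used to monitor convergence.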

2.2. Convex NMF

Convex NMF, as introduced by Ding et al. (2010), is a special case of NMF in which the basis vectors are constrained to lie in the span of the columns of the data matrix, V. The data matrix is thus approximated by:

$\hat{V} = V W_{1} W_{2},$ (2)

with $W_{1}, W_{2}^{T} \in ℝ_{+}^{N \times K}$. Defining $H := V W_{1}$ as the matrix of basis vectors and $W := W_{2}$ as the weight matrix recovers the NMF formulation in Equation (1). Ding et al. (2010) focus mainly on the general case where the entries of the data matrix, V, can be any real numbers, whereas this article considers the version of convex NMF where V is constrained to be non-negative, which is covered by the general updating steps from Ding et al. (2010).

2.3. Autoencoders

Autoencoders consist of an encoder applied to the input to create a latent representation and a decoder that maps the latent representation to a reconstruction of the input (Kramer, 1991). Choosing the dimension of the latent representation to be lower than the dimension of the input makes the autoencoder a dimensionality reduction technique. The reconstruction, $\hat{V}$, of a data matrix, V, by a fully connected autoencoder with a single hidden layer is defined as:

$\hat{V} = ϕ_{dec} (ϕ_{enc} (V W_{enc} + b_{enc}) W_{dec} + b_{dec}),$ (3)

where $W_{enc}, W_{dec}^{T} \in ℝ^{N \times K}$ are the encoding and decoding matrices, $b_{enc} \in ℝ^{K}$ and $b_{dec} \in ℝ^{N}$ are the bias terms of the encoding and the decoding layers, and $ϕ_{enc} : ℝ^{M \times K} \mapsto ℝ^{M \times K}$ and $ϕ_{dec} : ℝ^{M \times N} \mapsto ℝ^{M \times N}$ are the activation functions, which are applied entry-wise to the nodes of the respective layers.

In this study, we define linear autoencoders as single-hidden-layer autoencoders with identity activation functions.

2.3.1. Non-negativity

There are several methods to enforce non-negativity in autoencoders as defined in Equation (3). An overview of those considered in this article is provided in Supplementary Table S1 and elaborated on in Sections 1.3 and 2.3 in Supplementary Data. Here, non-negativity is enforced by taking the absolute value of the encoding and decoding weight matrices in the forward pass.
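As an illustration of this constraint, the sketch below trains a linear, bias-free autoencoder in which the absolute value of the weight matrices is taken in the forward pass. Plain gradient descent on the MSE, the initialization, the learning rate, and the name `train_abs_autoencoder` are assumptions made for this example; the optimizer used in the study is not reproduced here.

```python
import numpy as np

def train_abs_autoencoder(V, K, lr=1e-2, n_steps=500, seed=0):
    """Fit V ~ V @ |W_enc| @ |W_dec| by gradient descent on the MSE."""
    rng = np.random.default_rng(seed)
    M, N = V.shape
    W_enc = rng.normal(0.0, 0.1, size=(N, K))  # unconstrained parameters
    W_dec = rng.normal(0.0, 0.1, size=(K, N))
    losses = []
    for _ in range(n_steps):
        A, B = np.abs(W_enc), np.abs(W_dec)    # non-negative effective weights
        Z = V @ A                              # latent representation (M x K)
        R = Z @ B - V                          # residual of the reconstruction
        losses.append(float(np.mean(R ** 2)))
        scale = 2.0 / R.size
        grad_A = scale * (V.T @ R @ B.T)       # d(MSE)/dA
        grad_B = scale * (Z.T @ R)             # d(MSE)/dB
        # Chain rule through the absolute value: d|w|/dw = sign(w).
        W_enc -= lr * grad_A * np.sign(W_enc)
        W_dec -= lr * grad_B * np.sign(W_dec)
    return np.abs(W_enc), np.abs(W_dec), losses

# Toy non-negative data (illustrative only).
rng = np.random.default_rng(1)
V = rng.uniform(size=(6, 3)) @ rng.uniform(size=(3, 20))
A, B, losses = train_abs_autoencoder(V, K=3)
```

Because the absolute value is applied inside the forward pass, the stored parameters may be negative while the weights that enter the reconstruction never are.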

2.4. Mathematical equivalence and interpretation

Setting $b_{enc} = 0_{K}, b_{dec} = 0_{N}$ , and $ϕ_{enc}, ϕ_{dec} : x \mapsto x$ while constraining the weights to be non-negative in Equation (3) yields exactly the convex NMF formulation from Equation (2), where $W_{1}$ corresponds to the encoding matrix and $W_{2}$ corresponds to the decoding matrix. Thus, linear non-negative autoencoders without bias terms and convex NMF are mathematically equivalent, since convex NMF is a special case of the autoencoder. The choice between convex NMF and the autoencoder defined above reduces to the choice of how to optimize the problem, either by the multiplicative updating steps derived by Ding et al. (2010) or by the gradient descent-based additive updates of the autoencoder.
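The reduction from Equation (3) to Equation (2) can be checked numerically. The sketch below evaluates both expressions on random non-negative matrices; the function names and the toy dimensions are illustrative assumptions.

```python
import numpy as np

def autoencoder(V, W_enc, W_dec, b_enc, b_dec, phi_enc, phi_dec):
    """Equation (3): single-hidden-layer, fully connected autoencoder."""
    return phi_dec(phi_enc(V @ W_enc + b_enc) @ W_dec + b_dec)

def convex_nmf(V, W1, W2):
    """Equation (2): basis matrix H = V @ W1, weight matrix W = W2."""
    return (V @ W1) @ W2

def identity(x):
    return x

rng = np.random.default_rng(0)
M, N, K = 5, 8, 2
V = rng.uniform(size=(M, N))    # non-negative data matrix
W1 = rng.uniform(size=(N, K))   # non-negative encoding matrix
W2 = rng.uniform(size=(K, N))   # non-negative decoding matrix

# Zero biases and identity activations, as in the constraints stated above.
V_ae = autoencoder(V, W1, W2, np.zeros(K), np.zeros(N), identity, identity)
V_cnmf = convex_nmf(V, W1, W2)
same = np.allclose(V_ae, V_cnmf)
```

With these constraints the two reconstructions coincide, so any difference in practice comes from how the parameters are optimized rather than from the model itself.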

The core architecture of this autoencoder is analogous to that of Pei et al. (2020), but differs by fixing the bias terms to zero instead of estimating them through training and by using the identity function as activation function instead of the rectified linear unit (ReLU) for the encoding layer and softmax for the decoding layer. The proposed autoencoder differs from the autoencoder proposed by Pancotti et al. (2024) in the orientation of the input matrix and in the depth of the encoding network. The latent representation in this study corresponds to the signature matrix rather than the exposure matrix, and it is derived from a shallow encoding instead of a deep, non-linear encoding. An overview of the architectural differences of autoencoders proposed for mutational signature extraction can be seen in Table 1. Although all models described in Table 1 differ in various assumptions, they share the common trait that decoding is performed in a single layer.

Table 1.
Overview of Architectural Differences and Similarities of Autoencoders Suggested for Mutational Signature Extraction

Model	Encoder $n_{l}$	Encoder $ϕ$	Latent layer interpretation	Decoder $n_{l}$	Decoder $ϕ$
AE-NMF	1	Linear	Signatures	1	Linear
Pei et al. (2020)	1	ReLU	Signatures	1	Softmax
Pancotti et al. (2024)	3	3×Softplus	Exposures	1	Linear

$ϕ$, activation function; $n_{l}$, number of layers; NMF, non-negative matrix factorization.

We will use “C-NMF” to refer to convex NMF optimized with the multiplicative updating steps derived by Ding et al. (2010) and use “AE-NMF” for the class of autoencoders constructed equivalently to C-NMF.

The encoded data matrix in AE-NMF and C-NMF, $V W_{1}$, which is a convex combination of the columns of the data matrix, is interpreted as the basis matrix, H, in conventional NMF, and thereby the interpretation of $W_{2}$ corresponds to that of W in conventional NMF. Specifically, within mutational signature analysis, this means that the signature matrix is a convex combination of the patients’ mutational profiles.

Though the interpretation of AE-NMF, C-NMF, and NMF is similar, there are still considerable differences between the methods. In particular, NMF estimates $N \cdot K + K \cdot M$ parameters, whereas AE-NMF and C-NMF estimate $2 \cdot (K \cdot N)$ parameters. As

$K (N + M) < 2 K N \Leftrightarrow N + M < 2 N \Leftrightarrow M < N,$ (4)

AE-NMF and C-NMF will estimate a larger number of parameters in the factor matrices than NMF when the number of observations N surpasses the number of features M, which is often the case within mutational signature extraction. Moreover, the computational complexity for training NMF with respect to the Frobenius norm is of order $n_{epochs} (NMK + K^{2} (N + M))$, the updates for C-NMF are of order $N^{2} M + n_{epochs} (N^{2} K + K^{2} N)$, and the updates for AE-NMF are of order $n_{epochs} (2 N K M)$. Thus, while NMF and AE-NMF scale linearly with the number of observations, C-NMF scales quadratically with the number of observations when optimizing with respect to the Frobenius norm. Detailed derivations of the computational complexities can be found in Section 1.1.1 in Supplementary Data (Ding et al., 2010).
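As a worked example of the parameter counts, the sketch below evaluates both expressions for M = 96 trinucleotide features and K = 4 signatures. Using N = 713 (the size of the full ovary cohort) and a hypothetical small cohort of N = 50 is purely illustrative; the function names are assumptions.

```python
def n_params_nmf(M, N, K):
    """Entries in H (M x K) plus entries in W (K x N)."""
    return K * (N + M)

def n_params_ae_nmf(N, K):
    """Entries in W_1 (N x K) plus entries in W_2 (K x N)."""
    return 2 * K * N

M, K = 96, 4

# More observations than features (M < N): AE-NMF and C-NMF have more parameters.
large_nmf = n_params_nmf(M, 713, K)     # 4 * (713 + 96) = 3236
large_ae = n_params_ae_nmf(713, K)      # 2 * 4 * 713   = 5704

# Fewer observations than features (M > N): the relation is reversed.
small_nmf = n_params_nmf(M, 50, K)      # 4 * (50 + 96) = 584
small_ae = n_params_ae_nmf(50, K)       # 2 * 4 * 50    = 400
```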

3. RESULTS

For each cohort in both simulated and real cancer data, patients were divided into 30 training/test splits with a ratio of 80/20. De novo extractions by NMF and AE-NMF were performed on the training sets, optimized with respect to both the Frobenius norm (mean squared error [MSE]) and the Kullback-Leibler divergence (KLD) according to the updates defined in Section 1.1 in Supplementary Data. The results generated from optimizing with respect to the MSE are reported in this section, and corresponding results for the KLD can be found in Section 2 in Supplementary Data. Refits on the test sets were performed according to Section 1.1.5 in Supplementary Data, yielding the test errors. All training was performed with a relative tolerance of $10^{- 10}$ as convergence criterion, and AE-NMF was optimized using a learning rate of $η = 10^{- 4}$. The choice of optimizer, learning rate, and tolerance was based on test runs and deemed satisfactory once we observed that all three models converged toward similar minima. The average cosine similarity (ACS) between two given signature matrices as well as the signature consistency (SC) were calculated according to the methodology described in Sections 1.1.2, 1.1.3, and 1.1.4 in Supplementary Data.
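The split-and-refit procedure can be sketched as follows. The exact refit is defined in Supplementary Data and is not reproduced here; this example assumes a refit that keeps the signature matrix fixed and updates only the exposures with a multiplicative rule for the Frobenius norm. The function names are hypothetical, and a simulated signature matrix stands in for signatures extracted from a training set.

```python
import numpy as np

def train_test_split_columns(V, train_frac=0.8, seed=0):
    """Randomly split the columns (patients) of V into training and test matrices."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(V.shape[1])
    n_train = int(round(train_frac * V.shape[1]))
    return V[:, idx[:n_train]], V[:, idx[n_train:]]

def refit_exposures(V_new, H, n_iter=500, eps=1e-12):
    """Estimate non-negative exposures for new samples with the signatures H fixed."""
    W = np.ones((H.shape[1], V_new.shape[1]))
    for _ in range(n_iter):
        W *= (H.T @ V_new) / (H.T @ H @ W + eps)  # H is never updated
    return W, float(np.mean((V_new - H @ W) ** 2))

# Simulated stand-in data: 96 features, 3 signatures, 50 patients.
rng = np.random.default_rng(0)
H_sim = rng.uniform(size=(96, 3))
V = H_sim @ rng.uniform(size=(3, 50))

V_train, V_test = train_test_split_columns(V)
W_test, test_mse = refit_exposures(V_test, H_sim)
```

Repeating the split with 30 different seeds would give 30 training/test pairs of the kind described above.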

To emulate the terminology used in the cancer data analysis, the matrix of basis vectors will be denoted as signatures, and the weight matrix will be denoted as exposures.

3.1. Simulated data

The dataset analyzed in this section is “Scenario 8” from Islam et al. (2022), which consists of 1000 trinucleotide mutational profiles simulated to emulate a mixture of renal cell carcinomas and ovarian adenocarcinomas, alongside the corresponding true signature matrix and exposure matrix. There are three ground-truth signatures resembling the catalogue of somatic mutations in cancer (COSMIC) signatures SBS3, SBS5, and SBS40, which are all considered to be relatively flat and featureless signatures (Tate et al., 2019). In all analyses of this dataset, the true number of signatures, K = 3, is used.

The reconstruction loss in the training and test settings, ACS with the ground truth signatures, and the SC of all models can be seen in Table 2. These analyses reveal that while all models perform equally well in recovering the ground truth signatures with equal signature consistencies, NMF consistently reconstructs the input data most accurately. This holds for both training and test data and for both loss functions.
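Because extracted signatures come in arbitrary order, comparing them with ground-truth signatures requires a matching step. The sketch below shows one plausible way to compute an average cosine similarity: it searches all column permutations for the matching with the highest total similarity, which is feasible for small K. This matching strategy and the function names are assumptions; the definition actually used is given in Supplementary Data.

```python
import numpy as np
from itertools import permutations

def cosine_matrix(A, B):
    """Cosine similarity between every column of A and every column of B."""
    A_n = A / np.linalg.norm(A, axis=0, keepdims=True)
    B_n = B / np.linalg.norm(B, axis=0, keepdims=True)
    return A_n.T @ B_n

def average_cosine_similarity(A, B):
    """Mean cosine similarity under the best one-to-one matching of columns."""
    S = cosine_matrix(A, B)
    K = S.shape[0]
    best = max(permutations(range(K)),
               key=lambda p: sum(S[i, p[i]] for i in range(K)))
    return float(np.mean([S[i, best[i]] for i in range(K)])), best

# Toy check: the "estimated" signatures are the true ones, reordered and rescaled.
rng = np.random.default_rng(0)
H_true = rng.uniform(size=(96, 3))
H_est = 2.5 * H_true[:, [2, 0, 1]]
acs, matching = average_cosine_similarity(H_est, H_true)
```

Cosine similarity ignores column scaling, so the rescaled and permuted copy is matched back to the original columns with a similarity of one.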

Table 2.
Average Training and Test Error Between the Input and Reconstructed Data for Each Method Across the 30 Splits of the Scenario 8 Cohort (Islam et al., 2022)

Loss	Model	Train loss	Test loss	ACS	SC
MSE	NMF	8.06 · 10⁻²	8.15 · 10⁻²	0.99	0.99
	AE-NMF	125 · 10⁻²	127 · 10⁻²	0.99	0.99
	C-NMF	36.8 · 10⁻²	37.8 · 10⁻²	0.99	0.99
KLD	NMF	8.29 · 10⁻⁴	9.66 · 10⁻³	0.99	0.99
	AE-NMF	817 · 10⁻⁴	1590 · 10⁻³	0.99	0.99

The lowest reconstruction error in each cohort is highlighted in bold.

ACS, average cosine similarity with true signatures; MSE, mean squared error; KLD, Kullback-Leibler divergence; SC, signature consistency.

3.2. Cancer data

Extraction of mutational signatures is performed on the trinucleotide representations of the tumor-normal whole genome sequences of the 713 ovary, 311 prostate, and 523 uterus tumors from GEL (Turnbull, 2018; Turro et al., 2020).

The number of signatures for each cohort was determined with the method described in Section 1.2 of Supplementary Data using 10 bootstrap samples. The test errors as a function of $K \in \{2, \dots, 12\}$ are depicted in Supplementary Figures S2 and S3. Algorithm S2 in Supplementary Data yielded 3 signatures for AE-NMF, and 4 signatures for C-NMF and NMF in the ovary cohort; 4, 4, and 6 signatures for AE-NMF, C-NMF, and NMF, respectively, in the prostate cohort; and 4, 6, and 11 signatures for AE-NMF, C-NMF, and NMF, respectively, in the uterus cohort. Thus, the number of signatures used in the cancer data analyses is chosen using the weighted average defined in Equation (S12), to be four for the ovary cohort, five for the prostate cohort, and eight for the uterus cohort. A similar analysis for the KLD yielded five signatures for the ovary cohort, six for the prostate cohort, and nine for the uterus cohort.

3.2.1. Extraction performance

As shown in Section 2.4, C-NMF and AE-NMF are mathematically equivalent. Furthermore, since the resulting factor matrices of C-NMF and AE-NMF overlapped in the proof of concept in Section 2.1 in Supplementary Data and C-NMF and AE-NMF performed similarly in both the simulated data and in the cancer data analysis with respect to reconstruction error (Table 2, Supplementary Figs. S2 and S5) and consistency (Supplementary Fig. S7), we consider C-NMF and AE-NMF as practically equivalent. Thus, the following analyses will be performed comparing only NMF and AE-NMF. Boxplots of the training and test errors for each method and cohort are shown in Figure 2 and Supplementary Figures S5 and S6. Table 3 reports the average training and test error across the 30 splits for each method, cohort, and loss function as well as the average ratio of the AE-NMF reconstruction loss to the NMF reconstruction loss.

FIG. 2.

Boxplots of the training and test MSE of 30 train/test splits of the ovary, prostate and uterus cohorts. NMF and AE-NMF errors resulting from the same splits are connected by a black line. The boxes are colored corresponding to the method used. Note that the y-axis is on log scale. MSE, mean squared error.

FIG. 3.

First and second principal component of the de novo extracted signatures from the 30 train/test splits for MSE optimized AE-NMF and NMF (columns) and each cancer type (rows). Points are colored by the PAM clustering assignment, and the cluster medoids are highlighted with a black outline. PAM, partitioning around medoids.

Table 3.

Average Training and Test Error Between the Input and Reconstructed Data for Each Method Across the 30 Splits of the Ovary, Prostate, and Uterus Cohort and the Average Ratio between the Errors

Loss	Cohort	Split	Average error (NMF)	Average error (AE-NMF)	Ratio
MSE	Ovary	Train	5.92 · 10³	12.9 · 10³	2.17
	Ovary	Test	1510 · 10³	1720 · 10³	1.63
	Prostate	Train	1.06 · 10²	2.39 · 10²	2.24
	Prostate	Test	28.1 · 10²	31.6 · 10²	1.23
	Uterus	Train	5.33 · 10⁵	13.9 · 10⁵	2.61
	Uterus	Test	11.1 · 10⁵	25.7 · 10⁵	2.45
KLD	Ovary	Train	3.23 · 10⁰	7.13 · 10⁰	2.15
	Ovary	Test	27.2 · 10⁰	30.7 · 10⁰	1.27
	Prostate	Train	8.01 · 10⁻¹	13.7 · 10⁻¹	1.75
	Prostate	Test	24.0 · 10⁻¹	31.0 · 10⁻¹	1.24
	Uterus	Train	2.26 · 10¹	6.57 · 10¹	2.94
	Uterus	Test	4.11 · 10¹	8.26 · 10¹	1.97

The lowest reconstruction error in each cohort is highlighted in bold.

NMF consistently performs better than AE-NMF in terms of reconstructing the input data for both training and test data across all cohorts and for both loss functions. The ratios in Table 3 reveal that the difference is more pronounced in the training splits than in the test splits.

For each method, cohort, and loss function, the analyses yielded 30 signature sets, one from each training set. Plots of the first and second principal component of the signatures from the 30 training sets are depicted in Figure 3 for MSE and in Supplementary Figure S9 for the KLD. The consistency of the estimated signatures extracted within each method was thus based on a total of $\binom{30}{2} = 435$ comparisons for each cohort, method, and loss function and resulted in an average SC (reported in Supplementary Table S2) of 0.91 for NMF and 0.85 for AE-NMF in the ovary cohort, 0.86 for NMF and 0.88 for AE-NMF in the prostate cohort, and 0.94 for NMF and 0.91 for AE-NMF in the uterus cohort. To assess whether the difference in mean consistency between NMF and AE-NMF was significant, two-sided t-tests for equal means were performed for each cohort. These resulted in a p-value of $3.7 \cdot 10^{- 18}$ for the ovary cohort, $7.0 \cdot 10^{- 10}$ for the prostate cohort, and $1.1 \cdot 10^{- 11}$ for the uterus cohort for signatures extracted with respect to the MSE. The consistencies are depicted by boxplots in Figure 4. Although the differences in means are statistically significant, the consistencies of NMF and AE-NMF signatures are generally comparable in magnitude. Similar tendencies were observed for signatures extracted with respect to the KLD; these results are elaborated in Section 2.4, Supplementary Table S2, and Supplementary Figure S8.

FIG. 4.

Boxplots of the ACS between each combination of AE-NMF and NMF signatures extracted using the MSE across the 30 splits of the data matrix for the ovary, prostate, and uterus cohort. ACS, average cosine similarity.

3.2.2. Signature validation

To compare the estimated signatures with known libraries, the signatures were clustered to form sets of consensus signatures using the partitioning around medoids (PAM) algorithm (Kaufman and Rousseeuw, 1990) with the number of clusters equal to the number of signatures used in the initial extractions. The clustering is depicted in Figure 3 and Supplementary Figure S9, where the first and second principal components of all de novo signatures are depicted and the points are colored by their assigned PAM cluster. The consensus signatures were subsequently matched to the COSMIC v. 3.4 signatures (Tate et al., 2019), as well as the Signal library of mutational signatures, which is based on the publication from Degasperi et al. (2022) where mutational signatures were extracted based on the GEL cohorts, as in this study. Comparisons with COSMIC signatures were made with the full library, and comparisons with the Signal signatures were made with the related organ-specific signatures extracted solely on the GEL cohort. The matched signatures along with their cosine similarity can be seen in Table 4.

Table 4.
The COSMIC and Signal Signatures Matched to the Consensus Signatures Estimated with Respect to the MSE and the Corresponding Cosine Similarity

The MSE-estimated NMF and AE-NMF signatures displayed similar average cosine similarities to the COSMIC library, with a slight advantage to NMF (0.80, 0.90, and 0.89 for NMF against 0.82, 0.86, and 0.83 for AE-NMF for the ovary, prostate, and uterus signatures, respectively), whereas the average cosine similarities with the Signal signatures display an advantage to the AE-NMF signatures (0.93, 0.83, and 0.85 for NMF against 0.93, 0.87, and 0.89 for AE-NMF for the ovary, prostate, and uterus signatures, respectively). Among the four, five, and eight consensus signatures generated for the ovarian, prostate, and uterine cancer cohorts, the matched COSMIC signatures had previously been observed in the corresponding cancer types in 1/4 of NMF and 0/4 of AE-NMF signatures in the ovarian cohort, 2/5 of both NMF and AE-NMF signatures in the prostate cohort, and 6/8 of both NMF and AE-NMF signatures in the uterus cohort. Overall, both NMF and AE-NMF show a high degree of conformity with both COSMIC and Signal SBS signatures, and the extracted signatures are relevant to the diagnoses in which they have been identified. In SC, conformity with COSMIC, and choosing relevant signatures, NMF and AE-NMF perform similarly, perhaps with a slight advantage to NMF. The matched COSMIC signatures for all splits before clustering for NMF and AE-NMF can be seen in Supplementary Figures S10–S15.

4. DISCUSSION

In this study, we compare NMF and AE-NMF by their ability to extract valid and consistent basis vectors and to create accurate reconstructions of the input data. We assert that such comparisons are theoretically meaningful since we demonstrate that AE-NMF and C-NMF, a constrained version of NMF, are mathematically equivalent.

The study focuses on extracting mutational signatures in the ovary, prostate, and uterus cancer genomes of Genomics England’s 100,000 Genomes cohort, optimizing with respect to both the MSE and the KLD. Across all cancer types and both loss functions, NMF consistently outperformed AE-NMF in terms of reconstruction error, with the differences being more pronounced in the training data than in the test data. While AE-NMF constrains the signatures to convex combinations of patients’ profiles, NMF can freely assume non-negative values, giving it an advantage in training set reconstruction. When reconstructing the test set, the signature matrix is fixed, and the task is, thus, identical for NMF and AE-NMF. One could expect the constrained nature of AE-NMF to regularize the signatures such that they would reconstruct the test splits better compared with NMF, but as this was not the case in this study, AE-NMF signatures may be generally less informative than the corresponding NMF signatures. The validation with existing mutational signature libraries revealed that both models recovered relevant signatures with high cosine similarity, which is interesting considering that the majority of signatures in these libraries are extracted using NMF-based methods (Alexandrov et al., 2020).

By establishing the mathematical equivalence between C-NMF and AE-NMF, we underline that it is meaningful to interpret the parameters in AE-NMF similarly to NMF. We deem this equivalence necessary for a proper comparison of NMF and non-negative autoencoders, which has been missing in previous attempts to replace NMF with autoencoders while using the same interpretation. Squires et al. (2019) also came to the conclusion that an autoencoder with the architecture of AE-NMF yields a hidden layer consisting of convex combinations of the data points but did not make the connection to convex NMF.

The architecture of AE-NMF is atypical for autoencoders by its shallow and linear nature and by transposing the input data matrix. By orienting the input matrix, V , with features (M) as rows and observations (N) as columns, the architecture will stray from how data are conventionally passed through neural networks, but this orientation is not uncommon in the literature of non-negative autoencoders (Khatib et al., 2018; Pei et al., 2020; Squires et al., 2019). One direct consequence of this orientation is that it will not be possible to fit a separate sample by passing it through the trained network, a compelling attribute of neural networks, as the trained parameters will be observation specific. Furthermore, the shallow structure of AE-NMF favors the capture of linear relationships over more complex patterns. As observed in Section 2, decoding in a single layer is the single consistent feature in the architecture of autoencoders proposed for mutational signature extraction, as it makes the decoding task similar to that of NMF.

This article defines under what exact conditions the non-negative autoencoder yields parameters that can be interpreted equivalently to those resulting from NMF but also highlights a disadvantage of the linear non-negative autoencoder used in this study with regard to reconstruction accuracy. In contrast, the autoencoder proposed by Pancotti et al. (2024) utilizes deeper extraction and conventional input orientation, which jeopardizes the link to NMF and the exact interpretation of parameters, but the increased complexity holds a potential for improved extraction performance. On the other hand, AE-NMF has an advantage, especially over C-NMF, in terms of computational complexity and runtime of the estimation. Section 2.4 shows that C-NMF scales quadratically in both the number of patients (N) and the number of signatures (K), NMF scales quadratically in the number of signatures (K) and linearly in the number of patients (N), and AE-NMF scales linearly in the number of signatures (K) and patients (N). These theoretical complexities are consistent with the runtimes observed in the bootstrap analysis for determining the number of signatures, where we observe C-NMF veering from NMF and AE-NMF from early values of K, while NMF diverges from AE-NMF at higher values of K, depending on the number of patients N.

In this study, a relative tolerance of $10^{- 10}$ was used as convergence criterion for all analyses. Using such a low tolerance ensured that the three methods, based on two vastly different updating schemes, converged toward similar minima and established a common ground for comparison. Using a lower relative tolerance in the real data analyses was computationally infeasible. In particular, the low tolerance made C-NMF extremely time consuming in cases with many patients and/or signatures. The bootstrap method for determining the number of signatures was limited to 10 samples for the same reason. If the aim is just to extract signatures using either method, we do not necessarily recommend training with such a low tolerance.

5. CONCLUSION

This study compares NMF with linear non-negative autoencoders in mutational signature extraction, arguing that such comparisons are theoretically relevant because linear non-negative autoencoders and convex NMF are shown to be mathematically equivalent. This bridges a gap in the comparison of NMF and linear non-negative autoencoders by offering insights into parameter interpretation that were previously lacking. The choice between convex NMF and its autoencoder equivalent is a question of choosing between a multiplicative and a gradient descent-based optimization algorithm to solve the same optimization problem, and the linear non-negative autoencoder described in this study can, thus, be used as a faster alternative to convex NMF. In the comparison with NMF, linear non-negative autoencoders exhibit higher reconstruction errors and similar consistencies when validated in external signature libraries. Therefore, the linear non-negative autoencoder investigated in this study is not a suitable alternative to NMF in mutational signature extraction. This study underscores the significance of methodological considerations when replacing NMF with non-negative autoencoders. On the other hand, autoencoders hold promise for modeling non-linearity, a capability absent in NMF, but, as this article asserts, such advancements are made at the cost of exact parameter interpretation.

Footnotes

ACKNOWLEDGMENT

The authors are grateful to the two anonymous reviewers for constructive and helpful comments on this article.

AUTHORS’ CONTRIBUTIONS

I.E.: Conceptualization (equal), formal analysis (lead), methodology (equal), software (lead), visualization (lead), writing—original draft (lead), writing—review and editing (equal). R.F.B.: Conceptualization (equal), formal analysis (supporting), supervision (equal), writing—original draft (supporting), writing—review and editing (equal). M.P.: Conceptualization (equal), formal analysis (supporting), writing—review and editing (equal). A.H.: Conceptualization (equal), formal analysis (supporting), software (supporting), writing—review and editing (equal). M.B.: Conceptualization (equal), formal analysis (supporting), supervision (equal), writing—original draft (supporting), writing—review and editing (equal).

CODE AND DATA AVAILABILITY

Code used for this study can be found on GitHub (https://github.com/CLINDA-AAU/AE-NMF), and the Genomics England WGS data used in this study were provided by Turro et al. (2020) on Zenodo ().

AUTHOR DISCLOSURE STATEMENT

No competing financial interests exist.

FUNDING INFORMATION

This work was supported by the Danish Data Science Academy (DDSA-PhD-2022-005), which is funded by the Novo Nordisk Foundation (NNF21SA0069429) and VILLUM FONDEN (40516), the Novo Nordisk Foundation (NNF21OC0069105), and the Research Hive “REPAIR”, which is funded by Aalborg University Hospital.

SUPPLEMENTARY MATERIAL

References

1. Alexandrov, Nik-Zainal, Wedge, et al. Deciphering signatures of mutational processes operative in human cancer. Cell Rep, 2013; 3(1):246–259; doi: 10.1016/j.celrep.2012.12.008

2. Alexandrov, Kim, Haradhvala, et al.; PCAWG Consortium. The repertoire of mutational signatures in human cancer. Nature, 2020; 578(7793):94–101; doi: 10.1038/s41586-020-1943-3

3. Blokzijl, Janssen, Boxtel, et al. MutationalPatterns: Comprehensive genome-wide analysis of mutational processes. Genome Med, 2018; 10(1):33; doi: 10.1186/s13073-018-0539-0

4. Degasperi, Zou, Amarante, et al.; Genomics England Research Consortium. Substitution mutational signatures in whole-genome–sequenced cancers in the UK population. Science, 2022; 376(6591); doi: 10.1126/science.abl9283

5. Ding, Li, Jordan. Convex and semi-nonnegative matrix factorizations. IEEE Trans Pattern Anal Mach Intell, 2010; 32(1):45–55; doi: 10.1109/TPAMI.2008.277

6. Fang, Li, Xu, et al. Sparsity-constrained deep nonnegative matrix factorization for hyperspectral unmixing. IEEE Geosci Remote Sensing Lett, 2018; 15(7):1105–1109; doi: 10.1109/LGRS.2018.2823425

7. Hosseini-Asl, Zurada, Nasraoui. Deep learning of part-based representation of data using sparse autoencoders with nonnegativity constraints. IEEE Trans Neural Netw Learn Syst, 2016; 27(12):2486–2498; doi: 10.1109/TNNLS.2015.2479223

8. Islam, Díaz-Gay, Wu, et al. Uncovering novel mutational signatures by de novo extraction with SigProfilerExtractor. Cell Genom, 2022; 2(11); doi: 10.1016/j.xgen.2022.100179

9. Kaufman, Rousseeuw. Partitioning Around Medoids (Program PAM), chapter 2, pp. 68–125. John Wiley & Sons, Ltd, 1990; doi: 10.1002/9780470316801.ch2

10. Khatib, Huang, Ghodsi, et al. Nonnegative matrix factorization using autoencoders and exponentiated gradient descent. In: International Joint Conference on Neural Networks (IJCNN), 2018, pp. 1–8; doi: 10.1109/IJCNN.2018.8489242

11. Kramer. Nonlinear principal component analysis using autoassociative neural networks. AIChE Journal, 1991; 37(2):233–243; doi: 10.1002/aic.690370209

12. Lee, Seung. Learning the parts of objects by non-negative matrix factorization. Nature, 1999; 401(6755):788–791; doi: 10.1038/44565

13. Lemme, Reinhart, Steil. Online learning and generalization of parts-based image representations by non-negative sparse autoencoders. Neural Netw, 2012; 33:194–203; doi: 10.1016/j.neunet.2012.05.003

14. Nik-Zainal, Kucab, Morganella, et al. The genome as a record of environmental exposure. Mutagenesis, 2015; 30(6):763–770; doi: 10.1093/mutage/gev073

15. Pancotti, Rollo, Codicè, et al. Muse-xae: Mutational signature extraction with explainable autoencoder enhances tumour types classification. Bioinformatics, 2024; 40(5); doi: 10.1093/bioinformatics/btae320

16. Pei, Ruifeng, Dai, et al. Decoding whole-genome mutational signatures in 37 human pan-cancers by denoising sparse autoencoder neural network. Oncogene, 2020; 39(27):5031–5041; doi: 10.1038/s41388-020-1343-z

17. Smaragdis, Venkataramani. A neural network alternative to non-negative audio models. In: 2017 IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP). IEEE; 2017, pp. 86–90; doi: 10.1109/ICASSP.2017.7952123

18. Squires, Bennett, Niranjan. A variational autoencoder for probabilistic non-negative matrix factorisation. arXiv Preprint arXiv, 2019.

19. Tate, Bamford, Jubb, et al. COSMIC: The catalogue of somatic mutations in cancer. Nucleic Acids Res, 2019; 47(D1):D941–D947; doi: 10.1093/nar/gky1015

20. Turnbull. Introducing whole genome sequencing into routine cancer care: The genomics England 100,000 Genomes project. Ann Oncol, 2018; 29(4):784–787; doi: 10.1093/annonc/mdy054

21. Turro, Astle, Megy, et al.; NIHR BioResource for the 100,000 Genomes Project. Whole-genome sequencing of patients with rare diseases in a national health system. Nature, 2020; 583(7814):96–102; doi: 10.1038/s41586-020-2434-2

22. Özer, Hansen, Zunner, et al. Investigating nonnegative autoencoders for efficient audio decomposition. In: 30th European Signal Processing Conference (EUSIPCO). 2022, pp. 254–258; doi: 10.23919/EUSIPCO55093.2022.9909787

Model	Encoder $n_{l}$	Encoder $\phi$	Interpretation of latent layer	Decoder $n_{l}$	Decoder $\phi$
AE-NMF	1	Linear	Signatures	1	Linear
Pei et al. (2020)	1	ReLU	Signatures	1	Softmax
Pancotti et al. (2024)	3	3×Softplus	Exposures	1	Linear

On the Relation Between Linear Autoencoders and Non-Negative Matrix Factorization for Mutational Signature Extraction