Abstract
This paper presents a complete bibliometric analysis of Aitchison’s 1986 seminal book “The Statistical Analysis of Compositional Data.” We have set three objectives. The first is to analyze the academic structure of Aitchison’s 1986 book. The results reveal that although the work has received citations uninterruptedly since its publication, the number of citations has increased very significantly over the past 4 years. This is due to the significant increase in the number of publications on the theme of “Compositional Data Analysis” (CoDA) in fields related to “geoscience” over the last few years. The second objective is to determine which main journals Aitchison’s book has been cited in. The results highlight that the main journals are indexed under the following WoS categories: “Geosciences, Multidisciplinary” and “Ecology.” Of these, “Mathematical Geosciences” and “Computers & Geosciences” stand out. The third objective is to determine the main topics analyzed in the principal papers published by authors citing Aitchison’s book. Our results show that the keywords in the main papers to have cited Aitchison’s 1986 book originate from the geoscience field, since many of them are related to concepts directly linked to this field and refer to terms related to “biodiversity,” “geodiversity,” “geoheritage,” and “georesources.” Lastly, the analysis shows how the CoDA methodology is now in a phase of exponential growth, expanding to other fields. This implies that geoscience is becoming consolidated in the scientific literature as one of the branches of modern science that has given rise to a new mathematical theory of great impact.
Introduction
Compositional data are positive data that carry only relative information and in the most common situations they sum up to a constant (Filzmoser & Hron, 2009). They are frequent in geology and chemistry, for example, since total amounts are trivially related to the size of the soil or chemical sample, so that only relative importance is of interest. Using standard statistical techniques with compositional data can produce inconsistent results due to a set of undesirable problems, including the problem of spurious correlation of ratios (Pearson, 1897), dependency on scale, appearance of outliers and asymmetry, out-of-range forecasts (negative or above the constant sum) or inconsistency of the sub-composition (Aitchison, 1986). The lack of a solution to the problems inherent in compositional data led Miesch (1969) to state that the problem of the constant sum was one of the most important and most difficult problems encountered when analyzing and interpreting geochemical data. In 1986, Aitchison presented a book entitled The Statistical Analysis of Compositional Data (Aitchison, 1986), which detailed a whole set of techniques based on compositional data, the results obtained being consistent when based on solid mathematical foundations (e.g., Daunis-i-Estadella et al., 2011; Graffelman et al., 2018; Thomas & Aitchison, 2005; Tolosana-Delgado et al., 2019; Verma et al., 2006). At present, despite the proliferation of other manuals analyzing the foundations postulated in Aitchison’s work (see, e.g., Filzmoser et al., 2018; Greenacre, 2018; Pawlowsky-Glahn et al., 2015; Van Den Boogaart & Tolosana-Delgado, 2013), said foundations are still valid, and provide a solid basis for validating results, which is why Aitchison’s (1986) book is universally considered to be both essential and seminal.
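The dependency on scale mentioned above can be illustrated with a minimal sketch. It assumes the centred log-ratio (clr) transformation as the working tool, which is one of the log-ratio transformations associated with Aitchison's approach but is not spelled out in this paragraph; the sample values and the function names `closure` and `clr` are hypothetical and chosen only for illustration.

```python
import math

def closure(parts, total=1.0):
    """Rescale positive parts so that they sum to `total`."""
    s = sum(parts)
    return [total * p / s for p in parts]

def clr(parts):
    """Centred log-ratio: log of each part minus the mean of the logs."""
    logs = [math.log(p) for p in parts]
    mean_log = sum(logs) / len(logs)
    return [v - mean_log for v in logs]

# Hypothetical amounts of three components in a small sample and in a
# sample ten times larger; only the relative information is the same.
sample_small = [1.0, 2.0, 7.0]
sample_large = [10.0, 20.0, 70.0]

closed_small = closure(sample_small)
closed_large = closure(sample_large)
clr_small = clr(sample_small)
clr_large = clr(sample_large)
```

Under these assumptions both samples yield the same closed composition and the same clr coordinates, so the size of the sample no longer affects the analysis.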
Consolidation of the statistical methods compiled in Aitchison’s book for the field of geoscience (Chakraborty et al., 2020; Hron et al., 2021; Mikšová et al., 2020; Pawlowsky-Glahn & Egozcue, 2020; Pospiech et al., 2021) and the current expansion of these techniques to new scientific fields, such as chemistry, biology, medicine, psychology, education, communication, demography, geography, and other social science disciplines (Batista-Foguet et al., 2015; Belles-Sampera et al., 2016; Blasco-Duatis et al., 2018; Carreras Simó & Coenders, 2020; Carreras-Simó & Coenders, 2021; Coenders & Ferrer-Rosell, 2020; Ezbakhe & Pérez Foguet, 2020; Ferrer-Rosell et al., 2015; Kogovšek et al., 2013; Linares-Mustarós et al., 2018; Muller et al., 2018; Ortells et al., 2016; Rodrigues et al., 2011; Sanz-Sanz et al., 2018) explains the exponential growth in the number of citations of the book, while at the same time confirming the claim that geoscience is establishing itself in the scientific literature as a branch of modern science that has adopted a new mathematical theory of great impact.
This paper presents a bibliometric analysis of the aforementioned book, “The Statistical Analysis of Compositional Data,” with the aim of studying the relationships existing between this seminal publication in the field of geoscience and modern science. In the first part of this analysis, we present an overview of the academic structure of publications that have cited Aitchison’s work, while in the second part we present a study of the main journals and research topics to have been addressed in articles by authors who have cited it, with the aim of providing an answer to the following research questions (RQ):
RQ1: What is the academic structure of Aitchison’s 1986 book?
RQ2: In which main journals has Aitchison’s book been cited?
RQ3: What main topics are analyzed in the principal papers published by authors citing Aitchison’s book?
This paper makes several important contributions. Firstly, the bibliometric analysis contributes to the growing literature of articles summarizing the achievements and trends in research fields over long periods of time. Identifying the citation structure, origins and evolution of the main topics addressed, as well as the main sources used by authors citing Aitchison’s research, will help us to determine intellectual connections in academic fields that use CoDA (Koseoglu et al., 2019; Köseoglu et al., 2019; Shafique, 2013). In this respect, mapping intellectual connections aids the creation of new theories and the development of existing ones, providing a glimpse of future directions that scientific research may take (Köseoglu et al., 2021). Thus, conducting an academic analysis of such developments can help researchers identify the potential impacts theories may have on society. In addition, these processes provide valuable information for both academics and practitioners (Torraco, 2016), offering them an outline of the status quo of CoDA, which is especially useful for those who are not very familiar with CoDA but are interested in it (Jiang & Fan, 2022). Secondly, the bibliometric study provides a comprehensive picture of specific research fields and allows researchers to focus on unique areas to add new results and knowledge to the literature (Ghorbani et al., 2021). The paper thus contributes to a better understanding of the current status, development and future lines of research in the field, supporting researchers and other experts in identifying research areas and selecting the most appropriate journals in which to publish their own findings (Sajovic & Boh Podgornik, 2022). Thirdly, the research conducted delves into intellectual connections across a large body of research covering various fields related to geoscience.
Fourthly, the work encompasses a long time horizon, allowing researchers to obtain a complete picture of the field addressed, as well as its evolution. Fifthly, the document citation networks are analyzed and reference citation bursts detected in order to provide information on research topics and assess trends over time from different perspectives, which will be of great use for future research; in other words, this study helps to provide an orchestration of knowledge in the field. And finally, the present work focuses on documents that have passed the strict refereeing process, meaning that the results obtained are highly reliable.
Bibliometrics and Social Network Analysis
Bibliometrics entails the quantification of academic production based on certain classifications that project indirect indications on its perception (Huang et al., 2019). Multiple definitions of the term exist, although the modern version is usually attributed to Alan Pritchard. Pritchard (1969) defined bibliometrics as “the application of mathematical and statistical methods to books and other means of communication.” More recently, other authors have provided further definitions, however. For example, Zupic and Čater (2015) posited that bibliometrics constitutes a tool for evaluating the evolution of research areas based on social, intellectual, and conceptual structures. Therefore, we can assume bibliometrics is a discipline that aims to evaluate and map scientific progress through classification using statistical techniques (Diodato, 1994; Jappe, 2020; McBurney & Novak, 2002).
As for methodology, bibliometrics focuses exclusively on measuring publications. However, the term “publication” is relatively ambiguous, since, among other documents, it may include book chapters, journal articles, and proceedings in conference volumes. Therefore, before starting a bibliometric research project, it is important to clearly define what is being measured and what type of publication should serve as the basis for the bibliometric analyses to be carried out, since bibliometrics should provide information about all the key components of a research project.
Bibliometric analysis is a fundamental statistical instrument for analyzing the state of knowledge in a given scientific area, given that it measures the number of documents published and the number of citations received for those documents. In addition, bibliometrics allows the results of the analysis to be mapped through spatial visualization of the findings with respect to the structure and dynamics of scientific fields (Boyack & Klavans, 2014; Zyoud & Fuchs-Hanusch, 2020). Its main objective is to create a representation of the network structure of a research field that highlights the connections between the main journals, publications, etc. and the topics and other key features of the analyzed field (Bruns et al., 2020; Gumpenberger et al., 2012; Vogel, 2014).
A further aim of bibliometrics is to evaluate the quality of research (Bornmann & Leydesdorff, 2014; Segura-Robles et al., 2020). At present, two main methods are used to this end: a qualitative (by peers) and a quantitative review (bibliometrics) (Feng, 2020). In this respect, the former includes particular, non-quantifiable evaluations made by experienced experts, while the latter considers a publication to be more relevant the more citations it receives.
Today, new alternatives to the classic “citation” have emerged to assess the importance of a scientific document, such as libmetrics and altmetrics. Libmetrics establishes a connection between the importance of a scientific article or book and its availability in a library by measuring, for example, how often it is acquired or borrowed. Altmetrics generates new knowledge by combining all of the data available online and applying big data technologies. This allows bibliometric approaches to focus on correlations rather than causalities, since it should permit the analysis of new connections that have not previously been weighted or questioned. These alternative bibliometric methods are based on free online content, most of which is taken from social networks, and complement the data offered by bibliometrics based on conventional databases, such as the Web of Science (WoS) or Scopus.
Literature Review
Analyzing an academic discipline or scientific field is common practice nowadays, since it helps researchers develop new theories and journal editors foresee research trends (Gatrell & Breslin, 2017; Post et al., 2020; Torraco, 2016; Webster & Watson, 2002). There is therefore a growing interest in and demand for investigations into the intellectual structure of research areas or scientific fields in order to highlight progress in this regard (Kunisch et al., 2018; Torma & Thøgersen, 2021).
Bibliometrics, the main objective of which is to measure scientific output (Wang et al., 2019), emerged in the early 20th century, when psychologists began to collect statistics on publications related to their field of research (Godin, 2006). However, it was the exponential growth in academic publications in the 1950s that first saw American chemist Eugene Garfield begin to evaluate and carry out systematic counts of publications based on the literature used and cited.
One application of bibliometric methods is their use as a tool to evaluate any research that has been conducted (Bornmann & Leydesdorff, 2014; Karakus et al., 2021; Moral-Muñoz et al., 2020). This is the easy part of bibliometrics, since it provides direct information and does not require assumptions for its production. Trying to assess the quality and importance of published papers is a much more complex and less obvious task, however. Researchers have essentially used two methods to carry out this type of analysis of an academic field. The first comprises a qualitative evaluation by researchers (Lopes & Martins, 2021; Zupic & Čater, 2015). This method has several drawbacks, among which can be highlighted its subjectivity and a lack of transparency, which negatively impacts on its reliability and validity (Cook et al., 1997; Szomszor et al., 2021). The second is to conduct bibliometric analyses, and more specifically, analyzing the co-citation of documents (Lopes & Martins, 2021; Zupic & Čater, 2015), which entails identifying the intellectual structure of a scientific field by means of mathematical and statistical methods (Culnan, 1986; Hou et al., 2018). This second approach is the one most used by researchers (Hota et al., 2020; Lampe et al., 2019; Zhao et al., 2018), since it allows the tracking of practically all aspects of scientific collaboration networks (Vasilyeva et al., 2021; Ye et al., 2013).
We can therefore state that bibliometrics is a discipline that aims to assess and map the progress made in scientific fields through the classification of data. This entails, among other methods, the use of statistical techniques to analyze research performance by individuals, institutions, countries, mapping the structure of the analyzed field, etc. (Karakus et al., 2021).
The discipline has since evolved and is now used to evaluate the impact of publications, journals, authors and institutions in order to determine patterns of influence (Biemans et al., 2010; Clark et al., 2014; Post et al., 2020; Sarin et al., 2018).
Bibliometric documents have expanded into several fields (Butt et al., 2021), including accounting (Merigó & Yang, 2017), computer science (Chen et al., 2020; Garousi & Fernandes, 2017), energy (Liu et al., 2020), ecology (Jankó et al., 2017; Zhang et al., 2017), health care sciences services (He, Fang, Chen, et al., 2020; He, Fang, Wang, et al., 2020), hospitality (García-Lillo et al., 2016), medicine (Fan et al., 2020), tourism (Mulet-Forteza et al., 2019), and social media (Leung et al., 2017).
There are also bibliometric works that, rather than focusing on a specific field, analyze the publications of a particular country, institution or author. Thus, for example, Salisu and Salami (2020) analyzed publications by Nigerian authors between 1901 and 2016, Ahmad et al. (2020) analyzed the performance of publications by the University of the Punjab, and Haustein and Peters (2020) analyzed the publications of the researcher Judit Bar-Ilan. These are just a few examples of these types of bibliometric analyses. To the best of our knowledge, however, no bibliometric work has yet been carried out that focuses on one specific source in the bibliometric literature, making this paper a starting point for future bibliometric studies.
The results of bibliometric studies provide very useful information for policymakers and academic decision-makers in universities, research centers and governments, as they are considered reliable and relevant sources of results, and are often used to justify decisions on research policies, job offers and promotions, as well as to direct and support research projects (Bornmann & Leydesdorff, 2014; Gatrell & Breslin, 2017; Gläser & Laudel, 2015; Post et al., 2020). In addition, both public and private research funding agencies often ask researchers to either provide certain indications of quality to fund their research or to demonstrate that the research to be carried out has the potential to impact society (Bornmann, 2014; Brueton et al., 2014; Smits & Champagne, 2020). By way of example, in the United Kingdom bibliometrics has been considered for assessing the quality of research output within the country’s framework for research excellence. Finally, we would like to point out that bibliometrics can also help journal editors evaluate past publications, design new policies and make future editorial decisions.
Methodology
The statistical data used in this paper were compiled from the WoS database in November 2019. According to Merigó et al. (2015), the WoS comprises information from over 15,000 sources and 50,000,000 documents classified into over 250 categories and 150 research areas. It is widely considered to be the most influential database in the world.
The bibliometric data used in this work were obtained as follows. First, the “Cited Reference Search” option in the WoS database was used. Subsequently, the “Cited work” option was selected and the following text entered: “The Statistical Analysis of Compositional Data.” The search yielded a total of 69 records, which were reduced to 58 once the records that do not refer specifically to Aitchison’s (1986) book “The Statistical Analysis of Compositional Data” had been eliminated. Of the 11 deleted records, seven referred to another work by Aitchison (1982), in which he first introduced the concept of Compositional Data Analysis, although in a much shorter form and with less impact than the work published in 1986. The remaining four referred to another work by John Aitchison entitled “Logratios and natural laws in compositional data analysis,” published in the journal Mathematical Geology in 1999. The 58 records retained, which do make explicit reference to Aitchison’s (1986) book, returned a total of 2,636 documents that had cited it. Finally, the number of documents was reduced to 2,426 after limiting the search to document types that had passed a strict refereeing process, namely papers, reviews and letters (Merigó et al., 2019).
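The bookkeeping behind these figures can be checked with a short sketch. All numbers are taken from the paragraph above; the variable names are hypothetical, and the count of documents dropped by the document-type filter is derived here by subtraction rather than reported in the text.

```python
# Cited-reference records returned by the search, as reported in the text.
records_found = 69
records_citing_1982_work = 7   # referred to Aitchison (1982)
records_citing_1999_work = 4   # referred to the 1999 Mathematical Geology paper

records_removed = records_citing_1982_work + records_citing_1999_work
records_kept = records_found - records_removed

# Citing documents before and after restricting the document types.
citing_documents = 2636
citing_documents_refereed = 2426
# Derived by subtraction; this figure is not stated in the text.
dropped_by_type_filter = citing_documents - citing_documents_refereed

print(records_removed, records_kept, dropped_by_type_filter)
```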
In this paper, we have considered a wide range of bibliometric indicators to represent the bibliographical data analyzed. First, we considered the number of publications and citations, which are the most popular indicators according to Ding et al. (2014). Whereas the number of citations generally measures influence, productivity is measured by the number of documents (Svensson, 2010). Another indicator we used here refers to the most influential keywords (Mulet-Forteza et al., 2019).
The VOSviewer Software was used to map the consulted bibliographic data for co-occurrence of author keywords and co-citations (Van Eck & Waltman, 2010, 2014). Such maps allow several aspects of a scientific field to be monitored (Noyons et al., 1999; Su et al., 2019), providing a clearer view of the results obtained (Merigó et al., 2016). Keyword co-occurrence refers to the most common keywords used to develop a research field or a scientific document (Callon et al., 1983; Ding et al., 2001; Huang et al., 2019; McCain, 1986, 1991; Zhang et al., 2019), while co-citation assumes that there is some kind of association between two documents jointly cited by a different third one (Boyack & Klavans, 2014; Hoque et al., 2021; McCain, 1990; Ramos-Rodríguez & Ruíz-Navarro, 2004; Small, 1973).
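The notion of co-citation described above can be made concrete with a minimal sketch: two sources are co-cited once for every citing document whose reference list contains both. The citing documents, the journal labels and the function name `cocitation_counts` are hypothetical, and the code is not meant to reproduce how VOSviewer builds or normalizes its networks.

```python
from collections import Counter
from itertools import combinations

# Hypothetical citing documents and the journals found in their reference lists.
reference_lists = {
    "doc_a": ["math geosci", "j roy stat soc b", "comput geosci"],
    "doc_b": ["math geosci", "j roy stat soc b"],
    "doc_c": ["math geosci", "comput geosci", "ecology"],
}

def cocitation_counts(reference_lists):
    """Count, for each pair of sources, how many documents cite both."""
    counts = Counter()
    for cited in reference_lists.values():
        # Deduplicate and sort so that each unordered pair has one key.
        for pair in combinations(sorted(set(cited)), 2):
            counts[pair] += 1
    return counts

links = cocitation_counts(reference_lists)
for pair, strength in links.most_common():
    print(pair, strength)
```

In a map of this kind, the pair counts would play the role of link strengths between nodes, while the number of times each source is cited would determine the node size.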
Before we could perform the graphical analysis with the VOSviewer Software, we had to clean up the data collected from the WoS. In order to carry out the co-citation analysis of journals, the names of the journals with different designations had to be unified. By way of example, data appearing under the names “j roy stat soc b” and “j roy stat soc b m,” those appearing under the names “soil sci” and “soil sci s,” or those appearing under the names “behav ecol” and “behav ecol s” were unified under the same name. Journals that changed their names during the period were also unified, such as “Mathematical Geology,” which changed its name to “Mathematical Geosciences” in 2008.
The same had to be done in the co-occurrence analysis of author keywords. In this case, we had to unify keywords that appeared in both the singular and the plural, such as “stream sediment” and “stream sediments” or “ternary diagram” and “ternary diagrams”; keywords that appear with or without a hyphen, such as “particle size distribution” and “particle-size distribution” or “isometric log-ratio transformation” and “isometric logratio transformation”; and keywords that are written differently in American and British English, such as “foraging behavior” and “foraging behaviour.”
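The cleaning step described for journal names and author keywords can be sketched as a lookup table that maps each variant to a single preferred label before counting. The variant pairs come from the examples in the text, but which member of each pair is treated as the preferred label is an assumption, as are the function names and the sample input.

```python
from collections import Counter

# Variant -> preferred label. The preferred side of each pair is an assumption.
JOURNAL_VARIANTS = {
    "j roy stat soc b m": "j roy stat soc b",
    "soil sci s": "soil sci",
    "behav ecol s": "behav ecol",
    "mathematical geology": "mathematical geosciences",
}
KEYWORD_VARIANTS = {
    "stream sediments": "stream sediment",
    "ternary diagrams": "ternary diagram",
    "particle-size distribution": "particle size distribution",
    "isometric logratio transformation": "isometric log-ratio transformation",
    "foraging behaviour": "foraging behavior",
}

def unify(label, variants):
    """Lower-case, collapse whitespace, then map known variants to one label."""
    key = " ".join(label.lower().split())
    return variants.get(key, key)

def count_unified(labels, variants):
    """Count occurrences after unifying every label."""
    return Counter(unify(label, variants) for label in labels)

# Hypothetical raw author keywords as they might appear in exported records.
raw_keywords = [
    "Stream sediments", "stream sediment",
    "Foraging behaviour", "foraging behavior",
    "particle-size distribution",
]
keyword_counts = count_unified(raw_keywords, KEYWORD_VARIANTS)
```

Keeping the table explicit, rather than stripping plurals or hyphens by rule, avoids merging terms that only look alike.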
The combination of methods used in this paper allowed us to collect data using “full counting” and “fractional counting” methods. With the former, a publication co-authored by several researchers is assigned to each researcher with a full weight of one, while the “fractional counting” method (VOSviewer software) divides the authorship of the document among the number of authors (Mulet-Forteza et al., 2019). In this regard, it should be borne in mind that developing bibliometric networks is not a trivial process and, depending on how this is done, they can yield very different results, as Perianes-Rodriguez et al. (2016) showed for the case of journal network analysis. These authors argued in favor of the “fractional counting” method for producing bibliometric maps of journals, based on the fact it awards the same influence to each reference cited in a publication.
Thus, they considered it more reasonable to use analyses based on the idea of treating each reference cited in a publication as equally representative, as is the case using the aforementioned “fractional counting” method. Although this justification seems plausible to us, as far as we are aware, the reality is that researchers have traditionally preferred to use the “full counting” method in their bibliometric map analysis. That being said, it is not our intention to take a position in favor of one method or the other here, and we therefore provide the results using both methods, which will allow us to compare the results obtained by both systems of data collection.
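The difference between the two counting methods can be shown with a minimal sketch that follows the authorship description given above: full counting credits each co-author with a weight of one, whereas fractional counting splits a single unit among the co-authors. The publications, author labels and function name are hypothetical, and the code illustrates the principle only, not VOSviewer's implementation.

```python
from collections import defaultdict

# Hypothetical publications and their authors.
publications = {
    "paper_1": ["author_a", "author_b"],
    "paper_2": ["author_a"],
    "paper_3": ["author_a", "author_b", "author_c", "author_d"],
}

def count_authorship(publications, fractional=False):
    """Return the credit per author under full or fractional counting."""
    totals = defaultdict(float)
    for authors in publications.values():
        # Full counting: weight 1 each. Fractional: 1 / number of authors.
        weight = 1.0 / len(authors) if fractional else 1.0
        for author in authors:
            totals[author] += weight
    return dict(totals)

full = count_authorship(publications)
fractional = count_authorship(publications, fractional=True)
print(full)
print(fractional)
```

With fractional counting the credits add up to the number of publications, so every publication carries the same total weight; with full counting, publications with many authors contribute more to the totals.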
Bibliometric Study of Aitchison’s (1986) Book
We will now address the different research questions posed in our paper.
Academic Structure of Aitchison’s (1986) Book
Regarding the first question (RQ1), Figure 1 presents the evolution of citations received by Aitchison’s 1986 book.

Annual number of citations received by Aitchison’s 1986 book.
The above figure shows how Aitchison’s 1986 book entitled “The statistical analysis of compositional data” has received citations uninterruptedly since 1994. It also shows how different periods can be distinguished for the number of citations received. Thus, between 1994 and 2007, with ups and downs, the average annual number of citations remained at around 42. However, from 2008 onward, the annual number of citations generally increased each year, with notable jumps in 2011 and 2015. This increase in the number of citations received by the book is due, among other aspects, to the significant increase in the number of publications on the theme of “Compositional Data Analysis” in fields related to “geoscience” over the past few years and is likely to continue in the future.
We also examined the reasons for citing Aitchison’s 1986 book. To this end, we first downloaded the documents citing it. We only had access to 1,529 of the 2,426 citing documents, which represents 63% of the total. The remaining 897 documents could not be analyzed because the universities to which the authors of this paper are affiliated do not have access to all the journals. Even so, we consider the share analyzed to be large enough to support the comments made above.
Table 1 shows the sections of the citing documents in which Aitchison’s 1986 book has been cited.
Sections of a document in which Aitchison’s (1986) book has been cited.
Source. Authors.
Note. The same document can cite Aitchison’s (1986) book in two or more sections. TP = total papers.
Table 1 shows how most of the documents citing Aitchison’s 1986 book have used CoDA in their methodologies. Specifically, 35% of the documents analyzed cite Aitchison’s 1986 book in the methodology, while 16% do so in the results and discussion section. It is also noteworthy that 28% of the analyzed papers cite it in the literature review, while only 12% and 9% cite it in the introduction and conclusion, respectively.
It is also interesting to carry out a temporal analysis to determine if there is a period of time after which CoDA methodology has started to be used effectively in the papers (Table 2).
Sections of a document in which Aitchison’s (1986) book has been cited. Temporal evolution.
Source. Authors.
Note. The same document can cite Aitchison’s (1986) book in two or more sections.
Table 2 shows how during the first period analyzed (1994–2000) the citations obtained by Aitchison’s 1986 book were concentrated in the introduction of the documents that cite it. On the other hand, during the period 2001 to 2010 it can be seen how these are distributed, in percentage terms, in a similar way between the literature review, methodology and results, and discussion sections, although the conclusion section is the one which, in percentage terms, has the highest number of citations. Finally, during the period 2011 to 2019 it can be observed, also in percentage terms, how the conclusion section loses weight when citing Aitchison’s 1986 book, while the rest of the sections increase their percentage when citing this document. All this shows that, during the last period analyzed, the CoDA is analyzed both from a literature review and from the use of this methodology, which indicates that this technique already enjoys a notable maturity and scientific applicability.
Main Journals Citing Aitchison’s (1986) Book
In this section, we will address the second question (RQ2) posed in our paper. Firstly, Table 3 shows the main journals to have most cited Aitchison’s (1986) book.
Main journals that have cited Aitchison’s (1986) book.
Source. Authors, WoS database, 1986 through November 2019.
Note. The records of the journals that have changed their name during the analyzed period have been unified under the most recent name of the journal, such as “Mathematical Geology” which changed, in 2008, its name to “Mathematical Geosciences.” R = ranking; TP = total papers; TLS = total link strength.
Table 3 shows how most of the documents that have cited Aitchison’s (1986) book are published in the Journal of Geochemical Exploration and Mathematical Geosciences, followed by Plos One and Applied Geochemistry. Table 3 also shows how the aforementioned journals, together with Ecology, Geochimica et Cosmochimica Acta, Journal of Chemical Ecology, Chemical Geology, and Evolution International Journal of Organic Evolution, are the ones with the highest total link strength. In this regard, Figure 2 provides further details of the 500 most important connections occurring between the journals that have cited the book. Table 3 also shows that nine journals on the list are directly related to the field of “geoscience.” Specifically, here we are referring to the journals “Mathematical Geosciences,” “Computers & Geosciences,” “Archaeometry,” “Journal of Archaeological Science,” “Palaeogeography Palaeoclimatology Palaeoecology,” “Catena,” “Quaternary International,” “Quaternary Science Reviews,” and “Journal of Quaternary Science.”

Co-citations of journals that have cited Aitchison’s (1986) book. Citation threshold of 50 and showing the 500 most representative co-citation connections.
A map was produced to reflect the main relationships established between the journals citing the book. Figure 2 shows the 500 main co-citation links between these journals.
Figure 2 reveals six main clusters, each represented by the same color. Larger clusters include a greater number of journals that have cited Aitchison’s book. The distance between two clusters shows the relationship of the clusters in terms of citations, where the clusters located close to each other tend to be related, and vice versa. Within one cluster, the size of a circle represents the number of times a journal has cited the book, larger circles therefore indicating journals that have cited it a greater number of times. The thickness of the curved lines between the clusters represents the number of citations between two journals, whether they belong to the same cluster or not. And finally, the name of each circle (or label) indicates the name of the journal. In this regard, it should be noted that the VOSviewer Software aims to avoid overlapping labels, meaning that the labels are not visible for some journals in Figure 2. The above description also applies to Figures 3 to 6.

Co-citations of journals that have cited Aitchison’s (1986) book. Citation threshold of 250 and showing the 50 most representative co-citation connections.

Co-citations of journals that have cited Aitchison’s (1986) book. Red, yellow and lilac clusters. Citation threshold of 250 and showing the 50 most representative co-citation connections.

Co-citations of journals that have cited Aitchison’s (1986) book. Green and blue clusters. Citation threshold of 250 and showing the 50 most representative co-citation connections.

Co-occurrence of keywords used by authors who have cited Aitchison’s (1986) book. Citation threshold of 10 and showing the 200 most representative co-citation connections.
The first cluster in Figure 2, in red, comprises 95 journals indexed mainly under the WoS categories “Geochemistry & Geophysics” and “Chemistry.” What the research carried out in these categories has in common, among other aspects, is that it usually considers a large number of variables in its analysis. In these cases, the CoDA methodology improves the results obtained from these analyses, as the proportionality features of abundance data are fully taken into account, thereby enhancing their relative multivariate behavior (Buccianti et al., 2015). These particular features made these fields pioneers in applying statistical methods based on CoDA applications, especially by members of the International Association for Mathematical Geosciences. The second cluster, in green, is composed of 94 journals indexed mainly under the WoS categories “Multidisciplinary Sciences” and “Ecology.” These fields are similar to those of the first cluster, as they are ones in which studies of different species abound and in which percentages are widely used to infer the ecological preferences found among species. In this case, CoDA allows for the elimination of inconsistencies that occur when determining percentages, which later become false correlations (Guerreiro et al., 2015).
The third cluster, in dark blue, comprises 54 journals indexed mainly under the “Statistics & Probability” and “Mathematics” categories. Logically, these categories form the central axis of Figure 2, CoDA applications being very useful in these fields for eliminating all kinds of mathematical and statistical inconsistencies that can be caused by working with percentages. The fourth cluster, in yellow, is composed of 45 journals indexed basically under the categories “Soil Sciences” and “Environmental Sciences.” The fifth cluster, in purple, is composed of 39 journals indexed mainly under the category “Ecology.” And finally, the last cluster, in blue, is composed of 33 journals indexed mainly under the category “Zoology.” These last three clusters maintain certain characteristics similar to the first two clusters described above, hence their widespread use of CoDA methodology. Therefore, we observe that Aitchison’s 1986 book has received citations in a multitude of WoS categories, although most are fields related to “geoscience.”
Since Figure 2 is very difficult to read, given the large number of journals appearing in it, we have produced a new one, Figure 3, which presents the same results, but mapped with a higher threshold in order to observe the journals that cite the book in greater detail.
In this case, Figure 3 is composed of five main clusters, which are further detailed in the following two figures.
Figure 3 distributes the journals among five clusters, the first consisting of 25 journals, the second 18, the third 11, the fourth ten and the fifth nine. This more detailed view provided by the previous three figures, especially Figures 4 and 5, allows us to take a closer look at the main journals to have published most work based on the CoDA methodology. This is naturally of great help to researchers who use this methodology in their publications, as they are able to relatively easily identify potential journals in which to publish their research, as well as ones they should consult to find out the recent directions taken by research based on the CoDA methodology in their fields of study. In this case, the categories most represented in the previous figure are as follows (in this order): “Geosciences, Multidisciplinary,” “Ecology,” “Geochemistry & Geophysics” and “Statistics & Probability.” Specifically, 23 journals are indexed under the first category, 19 in the second, 16 in the third and nine in the fourth. This reveals how the field of “geoscience” has become the main one to use the CoDA methodology, leading to the spread of a mathematical theory of great academic impact. All of this is evident from Table 4, which was compiled using the information available in Figure 2.
Main WoS categories that have cited Aitchison’s (1986) book.
Source. Authors based on the WoS database and the VOSviewer Software.
Note. The same journal can be indexed in two or more WoS categories.
Main Topics Citing Aitchison’s (1986) Book
This section addresses the third research question by analyzing the main topics of the most relevant papers published by authors who cite Aitchison’s (1986) book. Figure 6 shows the co-occurrence of keywords in the papers citing the book.
Figure 6 reads identically to the previous ones, with the following differences: in this case, the size of a term reflects the number of times the term appears in publications citing the book, and the distance between two terms indicates the strength of the relationship between them. Colors indicate groups of nearby terms in relation to co-occurrences. Finally, the strongest relationships are indicated with curved lines.
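As a rough illustration of the quantities behind such a map, the following minimal sketch counts keyword occurrences (term size) and pairwise co-occurrences (link strength) from a small, hypothetical set of keyword lists. The data and variable names are assumptions made for the example; it is not the VOSviewer implementation, which also handles layout and clustering.

```python
from collections import Counter
from itertools import combinations

# Hypothetical author-keyword lists, one per citing paper.
papers = [
    ["compositional data analysis", "log-ratio", "simplex"],
    ["compositional data analysis", "geochemistry"],
    ["log-ratio", "simplex"],
    ["compositional data analysis", "log-ratio"],
]

occurrences = Counter()  # term size: number of papers using the keyword
links = Counter()        # link strength: number of papers using both keywords

for keywords in papers:
    unique = sorted(set(keywords))         # count each keyword once per paper
    occurrences.update(unique)
    links.update(combinations(unique, 2))  # every unordered keyword pair

# Total link strength of a keyword: sum of the strengths of all its links.
total_link_strength = Counter()
for (first, second), strength in links.items():
    total_link_strength[first] += strength
    total_link_strength[second] += strength
```

Sorting each keyword list before pairing ensures that a pair is always stored under the same key, whatever the order in which the keywords were listed.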
Figure 6 shows an analysis of keywords and their possible connection. Having analyzed the number of keyword occurrences, we observe that Figure 6 has nine keyword clusters. Clusters 1 and 2 are the most numerous and have 12 keywords each. In the first cluster, the words “principal component analysis,” “log-ratio” and “simplex” stand out. In Cluster 2, the most important words are “soil,” “microbiome” and “fatty acids.” Clusters 3 and 4 also coincide in terms of number, having 10 keywords each. Cluster 3 ranks third in number of citations, and has “cuticular hydrocarbon,” “sexual selection” and “mate choice” among its most important keywords. In Cluster 4, which is secondary, the words “multivariate,” “cluster analysis” and “geostatistics” stand out. Although this cluster has a high number of words, the total occurrence is relatively low. As for Cluster 5, it has nine keywords, with the main words being “geochemistry,” “provenance” and “statistics.” Cluster 6 has five keywords; despite the smaller number of words, it becomes a core cluster, since it has the term with the highest number of occurrences, “compositional data analysis.” As for Cluster 7, we find four keywords, with “habitat selection” the most important of these. Cluster 8 has three keywords, “Aitchison geometry” being the most important, and finally, Cluster 9 has two keywords, the most important being “hymenoptera,” although only a small difference is observed between the latter two. Figure 6 also shows how many of the keywords originate from the “geoscience” field, since many of them are related to concepts directly linked to this field and refer, in turn, to terms related to “biodiversity,” “geodiversity,” “geoheritage” and “georesources” (Thomas, 2016).
Since it is difficult to observe the strength of the links between the main keywords in Figure 6, Table 5 below provides more detail of both the number of occurrences and the strength of the links among the main keywords represented in Figure 6.
Most common author keywords occurrences in journals that have cited Aitchison’s (1986) book.
Source. Authors, WoS database, 1986 through November 2019.
Note. R = ranking.
Finally, Figure 7 shows the average year of publication of the keywords appearing in Figure 6.

Co-occurrence of author keywords in documents that have cited Aitchison’s (1986) book. Citation threshold of 10 and showing the 200 most representative co-citation connections.
If we take the years in which they are cited into account, we see how most of the keywords originated from the year 2010, coinciding with the period with the largest number of citations of Aitchison’s text. The most recent keywords that appear in Figure 7 are “drosophila serrata,” “habitat use” and “machine learning,” which shows that the book continues to generate new investigations in the “geology,” “ecology,” and “geosciences” fields, which are the research fields that initially implemented the CoDA methodology. It is therefore to be expected that many publications based on this methodology will continue to be generated in these research fields.
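The average year of publication of a keyword can be obtained with a simple calculation, sketched below on hypothetical records. The years and keyword lists are invented for illustration only and are not taken from the WoS data used in this paper.

```python
from collections import defaultdict

# Hypothetical (publication year, author keywords) records of citing papers.
records = [
    (2008, ["log-ratio", "geochemistry"]),
    (2012, ["log-ratio", "habitat use"]),
    (2018, ["machine learning", "habitat use"]),
    (2019, ["machine learning"]),
]

# Collect the publication years of every paper in which a keyword appears.
years_by_keyword = defaultdict(list)
for year, keywords in records:
    for keyword in set(keywords):
        years_by_keyword[keyword].append(year)

# Average publication year per keyword, then keywords ordered from newest to oldest.
average_year = {kw: sum(ys) / len(ys) for kw, ys in years_by_keyword.items()}
most_recent_first = sorted(average_year, key=average_year.get, reverse=True)
```

Keywords with a later average year are those that have entered the citing literature more recently.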
Conclusions
In this paper, we have carried out a bibliometric analysis of all the publications that have cited the book entitled “The Statistical Analysis of Compositional Data” published by John Aitchison in 1986.
We have addressed all of our established aims. With regard to the first research question (RQ1), we have analyzed how the citation structure of this work has evolved. Our analysis reveals that although the work has received citations uninterruptedly since its publication, the number of citations has increased very significantly over the past 4 years. The temporal analysis also revealed that in recent years CoDA has been mostly used in a practical way in scientific papers, although it is also true that during the period 2011 to 2019 the theoretical formulation of this methodology has started to be discussed. In reference to the major journals citing Aitchison’s book (RQ2), we observe that most are indexed under the WoS categories “Geosciences, Multidisciplinary” and “Ecology.” Of these, “Mathematical Geosciences” and “Computers & Geosciences” stand out. The journal co-citation network clarifies the distribution of core journals. With regard to the third research question (RQ3), our results show how the keywords in the main papers to have cited the book correspond to the year 2010, coinciding with the period that had the greatest number of citations.
Our study presents several findings that allow us to understand the evolution and advances that are taking place in the CoDA field through an analysis of the citations received by Aitchison’s (1986) book, a seminal text in both the field of geoscience and modern science. Firstly, the present work paints a collective picture of the academic structure of the literature citing the book. Secondly, there has been a significant increase in the number of articles citing it since 2010. In analyzing these publications, we have confirmed a high level of collaboration between different research fields, which has allowed application of the CoDA methodology to spread significantly through different academic fields. Thirdly, our analysis reveals important relationships between the main journals to cite the book and others indexed in the most prestigious quartiles in the fields related to “Geochemistry & Geophysics,” “Chemistry,” “Multidisciplinary Sciences,” “Ecology,” “Statistics & Probability,” “Mathematics,” “Environmental Sciences” and “Zoology,” but especially in the field of “geoscience,” thus demonstrating the multidisciplinary nature of research using CoDA.
The keyword co-occurrence analysis has identified research topics that have not yet been widely developed, as well as research trends that will prove useful as the amount of literature under analysis increases (Law et al., 2019). Lamberton and Stephen (2016) stated that the periodical review of the state of research makes it possible to map out the next stages of research in an innovative, relevant and rigorous way. In this regard, we believe the information contained in this paper will prove very useful for academics, since it provides them with a snapshot of the directions of foreseeable research in CoDA in the various fields that make most use of this methodology. This paper also reports on the most researched domains within the CoDA framework, thus enabling researchers to identify research gaps that will need to be filled by further studies in the future (Faruk et al., 2021). The theoretical implications of this study provide an overview of the literature on the development of CoDA research worldwide through a bibliometric analysis. This allows identification of the components of the main concept. In addition, the level of growth of the research conducted historically, the co-occurrence of the keywords across clusters, the leading journals and scientific collaboration on this topic of study are determined. On the other hand, the main practical implications of this study fall directly on CoDA researchers, teachers, and students. Furthermore, the methodology used in this study can be used to obtain similar results in other contexts (Quintero-Quintero et al., 2021).
The methodology used in this study presents several advantages. First, the paper shows a network map of the related journals. This allows for a more convenient tracing of the initial theoretical roots and historical context of the field. Furthermore, the keyword analysis approach adopted through temporal evolution allows researchers to follow the development of CoDA in the literature, providing opportunities to expand the current body of research (Liu et al., 2022), enabling the generation of new approaches, as well as the identification of future research trends. In summary, we have identified the main research fields and topics that cite Aitchison’s (1986) book, these being addressed through a multidisciplinary and interdisciplinary approach (Altinay & Taheri, 2019).
Our document contributes to the body of relevant literature by systematizing the CoDA literature through the application of VOSviewer software as a visualized analytical tool for bibliometric analysis, providing valuable references for researchers wishing to delve deeper into this area of knowledge. In addition, it presents CoDA’s network maps and information tables in a more comprehensive way, providing clear guidance for following the development of the field and recognizing emerging trends. Thirdly, it shows the most influential journals in the discipline, allowing researchers to perform precise journal searches. Finally, it can also guide scholars on how to approach a study involving knowledge mapping with the applicable analytical element of publications (Liu et al., 2022).
This paper has some limitations. The first concerns the database used to carry out the study, that is, the WoS database. For example, the WoS does not include all academic journals, and therefore journals included in other databases, such as those included in the “Emerging Sources Citation Index,” have not been considered. A second limitation of this database is that it uses a “full counting” method to collect data. In order to resolve this limitation, our research also incorporated the “fractional counting” method, using the VOSviewer software to detect the co-occurrence of author keywords and co-citations of journals. The third limitation is that not all documents indexed in the WoS were considered: only those subject to a strict process of arbitration. A further limitation is that the results are dynamic and will inevitably change over time. Despite these limitations of our analysis, we consider that this paper can be regarded as an overview of the relationships that occur between Aitchison’s book and the geoscience field, as well as modern science, expanding on what is already known about the beginnings of CoDA in journals (Navarro et al., 2021).
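The difference between the two counting methods mentioned above can be sketched as follows. The sketch assumes one commonly described fractional scheme, in which every link generated by a paper with n keywords receives a weight of 1/(n − 1) instead of 1; this weighting is an assumption made for illustration and may not match the exact computation performed by VOSviewer. The function name and the data are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_links(papers, fractional=False):
    """Link strengths between keywords under full or fractional counting.

    Assumption: under fractional counting, a paper with n distinct keywords
    gives each of its links a weight of 1 / (n - 1) rather than 1.
    """
    links = Counter()
    for keywords in papers:
        unique = sorted(set(keywords))
        n = len(unique)
        if n < 2:
            continue  # a single keyword produces no link
        weight = 1.0 / (n - 1) if fractional else 1.0
        for pair in combinations(unique, 2):
            links[pair] += weight
    return links

# Hypothetical papers: one with two keywords, one with five.
papers = [
    ["a", "b"],
    ["a", "b", "c", "d", "e"],
]

full = cooccurrence_links(papers)
frac = cooccurrence_links(papers, fractional=True)
```

Under full counting the paper with five keywords contributes ten links of weight 1, whereas under this fractional scheme each of those links weighs 0.25, so papers with long keyword lists no longer dominate the network.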
This document also provides a starting point for future studies, as our results can be complemented by those obtained in other studies that choose to include journals appearing in the “Emerging Sources Citation Index,” as these journals offer less experienced researchers a good opportunity to publish their results. This can also lead to the development of emerging themes and new research trends (Mulet-Forteza et al., 2019) in the field of CoDA through the comparison of results, which, in turn, can lead to the development of new conceptual frameworks (Mulet-Forteza et al., 2021). It may also prove interesting to repeat this work using several databases, and not only the WoS, in order to compare the results obtained from each. In addition, such research could highlight some of the limitations of the different databases, some of which are discussed in this paper. Finally, it is also proposed that a literature review be conducted of the main documents that cite Aitchison’s (1986) book in the main scientific fields highlighted in this paper, as this will provide very relevant information on how scientific research in CoDA may progress in these fields.
Footnotes
Declaration of Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This work was supported by the Spanish Ministry of Science, Innovation and Universities/FEDER (grant number RTI2018-095518-B-C21).
