
Abstract

The stimuli presented in cognitive experiments have a crucial role in the ability to isolate the underlying mechanism from other interweaved mechanisms. New ideas aimed at unveiling cognitive mechanisms are often realized through introducing new stimuli. This, in turn, raises challenges in reconciling results to literature. We demonstrate this challenge in the field of numerical cognition. Stimuli used in this field are designed to present quantity in a non symbolic manner. Physical properties, such as surface area and density, inherently correlate with quantity, masking the mechanism underlying numerical perception. Different generation methods (GMs) are used to control these physical properties. However, the way a GM controls physical properties affects numerical judgments in different ways, compromising comparability and the pursuit of cumulative science. Here, using a novel data-driven approach, we provide a methodological review of non symbolic stimuli GMs developed since 2000. Our results reveal that the field thrives and that a wide variety of GMs are tackling new methodological and theoretical ideas. However, the field lacks a common language and means to integrate new ideas into the literature. These shortcomings impair the interpretability, comparison, replication, and reanalysis of previous studies that have considered new ideas. We present guidelines for GMs relevant also to other fields and tasks involving perceptual decisions, including (a) defining controls explicitly and consistently, (b) justifying controls and discussing their implications, (c) considering stimuli statistical features, and (d) providing complete stimuli set, matching responses, and generation code. We hope these guidelines will promote the integration of findings and increase findings’ explanatory power.

Keywords

numerical cognition numerosity physical properties generalizability reproducibility comparability interpretability open data open materials

Cognitive research aims to understand distinct cognitive capacities. Controlling experimental stimuli ensures their relevance to the studied capacity, regardless of the modality used, and their ecological validity (Bays et al., 2009; Blandford et al., 2008; Boring, 1954; Eriksen, 1980; Pelham & Blanton, 2007; Rangelov et al., 2022; Torday & Baluška, 2019; Waskom et al., 2019). Stimuli control is a cornerstone of experimental control, helping cognitive scientists to ensure that their results are not artifactual. Building on the foundational concept of stimuli control, we elucidate its critical role in studying visual numerical perception (numerosity). This data-driven methodological review focuses on methods to generate controlled non symbolic numerical stimuli for numerical comparisons. Yet this review’s conclusions apply to many other fields and tasks involving perceptual decisions.

Stimuli Control in Non Symbolic Numerical-Comparison Studies

Recent numerical-comparison studies have struggled to control the physical properties of non symbolic arrays and the physical properties’ correlations with the array’s quantity. An array of items can be described by its number of items or by its physical properties. The physical properties of any item array have a natural and inherent correlation with the array’s quantity (Dehaene, 1997; Leibovich et al., 2017; Mehler & Bever, 1967). Increasing or decreasing the number of items necessarily changes at least one of the array’s physical properties (see Fig. 1 and De Marco & Cutini, 2020; DeWind et al., 2015; Leibovich & Henik, 2013; Zanon et al., 2021). Consider three large apples lying on the kitchen counter. Adding a fourth large apple would increase the total surface area and circumference of the apples’ stack (array). If the fourth apple is larger than the original three apples, the array’s average surface area, circumference, diameter, and so on would also increase. The addition of the fourth apple would also reduce the average distance (i.e., interdistance) between the apples. Changing the number of apples in the stack would also change density, the perimeter of the polygon surrounding all the array’s items, and so on. The physical properties discussed are only some of the properties that could be considered (see full list in Fig. 5). Importantly, property changes cannot be linearly predicted from changes in numerosity alone.
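The apple example can be made concrete with a minimal sketch. It assumes the items are circular dots described only by their radii; the function name `array_properties` and the chosen property names are illustrative and do not come from any of the reviewed GMs.

```python
import math


def array_properties(radii):
    """Return a few physical properties of a dot array, given the dot radii.

    Assumption: items are circles, so area and circumference follow directly
    from each radius. Only a small subset of possible properties is computed.
    """
    n = len(radii)
    return {
        "numerosity": n,
        "total_surface_area": sum(math.pi * r ** 2 for r in radii),
        "total_circumference": sum(2 * math.pi * r for r in radii),
        "average_diameter": sum(2 * r for r in radii) / n,
    }


# Three equal items, then a fourth item of the same size, then a larger fourth item.
three = array_properties([1.0, 1.0, 1.0])
four_same = array_properties([1.0, 1.0, 1.0, 1.0])
four_larger = array_properties([1.0, 1.0, 1.0, 1.5])

for label, props in [("three", three), ("four_same", four_same), ("four_larger", four_larger)]:
    print(label, props)
```

In this sketch, adding an equal-sized item raises the total surface area and total circumference but leaves the average diameter unchanged, whereas adding a larger item changes all three. This is one way to see that the property changes do not follow from the change in numerosity alone.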

Fig. 1.

The natural association between physical properties and quantity is not a simple linear correlation.

Early studies have shown an association between performance in numerical judgments and the physical properties of the presented stimuli (French, 1953; Frith & Frith, 1972; Piaget, 1968). In the contemporary study of numerical cognition, the physical properties of a numerical array are considered in two opposing manners. Some treat them as a biasing factor, distracting researchers from the thing they want to study: the ability to judge numerosity (e.g., Halberda et al., 2008¹; Halberda & Feigenson, 2008; Piazza et al., 2004²). Others treat them as a key to understanding numerical capacity (Gebuis & Reynvoet, 2011a, 2011b; Leibovich et al., 2017). Regardless of their view of numerical cognition, researchers must address the natural correlation between the array’s physical properties and its quantity to prevent confounds. Importantly, there are abundant ways to create non symbolic numerical stimuli relying on different theoretical and methodological prisms (DeWind et al., 2015; Gebuis & Reynvoet, 2011a, 2011b; Piazza et al., 2004; Salti et al., 2017). The different ways stimuli are created change the correlations between numerical and physical properties. These differing correlations bias behavioral and neuronal results (Clayton et al., 2015; DeWind & Brannon, 2016; Gebuis & Reynvoet, 2011a, 2011b; Kuzmina & Malykh, 2022; Leibovich et al., 2017; Salti et al., 2017; Smets et al., 2015). They also make it harder to compare different studies (Clayton et al., 2015; Smets et al., 2015) and limit the ability to examine previous studies through new theoretical prisms.

Non symbolic numerical stimuli are multifaceted and can be categorized according to different taxonomies, for example, according to a global-local categorization (e.g., Guy & Medioni, 1993; Navon, 1977) or intrinsic–extrinsic categorization (DeWind et al., 2015; Gebuis & Reynvoet, 2011a, 2011b; Salti et al., 2017). Individual physical properties can also be defined in multiple ways. For example, there are six different definitions for “density” in the literature (see Qualitative Analysis section). The diverse ways to address, control, manipulate, and analyze physical properties make it harder to achieve a common scientific ground (Stalnaker, 2002). The lack of common ground hinders efficient comparisons of results coming from different studies that used different methods to generate stimuli. Hereafter, we denote the generation methods to generate non symbolic numerical stimuli as GMs. A GM is defined as a unique algorithm for creating visual non symbolic stimuli according to predefined numerical and physical parameters. Different GMs use different experimental controls and manipulations that produce different correlations between numerical and physical properties (Gebuis & Reynvoet, 2011a; Katzin et al., 2019; Salti et al., 2017). The different correlations might bias the results of studies. Smets et al. (2015) found that two different GMs suggested by Piazza and Dehaene and by Gebuis and Reynvoet led to different Weber fractions and performance accuracy (Gebuis & Reynvoet, 2011a; Piazza et al., 2004). Simply put, the Weber fraction measures the numerical comparison acuity (Chen & Li, 2014; Leibovich et al., 2017; Smets et al., 2015). It is quantified by dividing the smallest (reliably) discriminable difference in quantity between the compared arrays by the quantity of the smaller array, used as a reference. The different correlations also lead to different congruity effects. 
A congruity effect is quantified by the performance difference (usually in reaction time or accuracy) between incongruent and congruent trials. In the canonical congruity effect, responses are faster and more accurate for congruent trials in which the more numerous array is also larger in its physical properties (e.g., total surface area) and responses are slower for incongruent trials in which the less numerous array is larger in its physical properties (Leibovich et al., 2017; Rousselle et al., 2004). Clayton et al. (2015) also found that two different GMs led to different accuracy and congruity effects. When the stimuli were created using Gebuis and Reynvoet’s (2011a) GM, participants exhibited the canonical congruity effect, whereas the Panamath GM (Halberda et al., 2008; Halberda & Feigenson, 2008) led to the opposite congruity effect. Using stimuli sets created by the Panamath GM, the participants’ accuracy was higher in incongruent trials than in congruent trials. When Clayton et al. (2015) compared the stimuli produced by Gebuis and Reynvoet’s GM to the Panamath GM stimuli (Halberda et al., 2008; Halberda & Feigenson, 2008), they discovered that the relationships among the arrays’ convex hull area, total surface area, and quantity differed between the two stimuli sets. To conclude, the ability to compare results from different experiments whose stimuli were generated by different GMs is limited.
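The two measures described above can be written out as a minimal sketch. The function names and all numeric values are hypothetical illustrations, not data from any cited study, and estimating the smallest reliably discriminable difference (e.g., by psychometric fitting) is outside this sketch.

```python
def weber_fraction(smallest_discriminable_difference, reference_quantity):
    """Smallest (reliably) discriminable difference divided by the smaller array's quantity."""
    if reference_quantity <= 0:
        raise ValueError("reference_quantity must be positive")
    return smallest_discriminable_difference / reference_quantity


def congruity_effect(mean_incongruent, mean_congruent):
    """Performance difference between incongruent and congruent trials.

    For reaction times, a positive value matches the canonical effect
    (slower incongruent trials). For accuracy, the canonical effect is
    negative (less accurate incongruent trials).
    """
    return mean_incongruent - mean_congruent


# Hypothetical values: a reference array of 16 items that is reliably
# discriminated from arrays differing by at least 4 items.
w = weber_fraction(4, 16)

# Hypothetical mean reaction times (ms) and accuracies (proportion correct).
rt_effect = congruity_effect(650.0, 600.0)
accuracy_effect = congruity_effect(0.80, 0.90)

print(w, rt_effect, accuracy_effect)
```

Under this sign convention, a reversed pattern such as the one reported for the Panamath stimuli would appear as a positive accuracy difference.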

The variety of GMs also limits the ability to examine previous studies through new theoretical prisms. For example, Yousif and Keil (2019) suggested that arrays’ additive area, defined as the sum of the items’ height and width, has a role in numerical judgments (see Fig. 5 and Park’s, 2022, critique on the additive area, discussion section). Previous studies have not recorded data on arrays’ additive area, nor have they provided their stimuli. Therefore, it is impossible to test this idea retrospectively because there is no way to extract the relevant information from previous studies. Another example is Maldonado Moscoso et al. (2023), who proposed that symmetrical arrays are processed differently than random arrays. A symmetric array has two identical halves. Previous studies have not recorded data on arrays’ symmetry or provided sufficient details on the items’ spatial arrangement and size, nor have they provided their stimuli. Therefore, it is impossible to test how symmetry affects numerical decisions retrospectively. Another example comes from our own experience. We lately suggested that the shape of the convex hull is important for numerical judgments (Katzin et al., 2019; Shilat et al., 2021). Previous studies have not recorded data on the shape of the convex hull and have not provided their stimuli. Therefore, it is impossible to test this idea retrospectively. Science evolves, and new ideas constantly emerge. To prioritize new ideas, it is best to test them using diverse methods. Testing new ideas on previously published data could help the future scientist and enable more rapid progression.
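To illustrate why shared stimuli matter for such retrospective tests, the sketch below computes additive area from item dimensions, following the definition quoted above (the sum of each item's height and width). It assumes that each item's height and width are available, which is exactly the information the text says earlier studies did not provide; the function names are hypothetical.

```python
def additive_area(item_sizes):
    """Sum of height + width over all items, given (height, width) pairs."""
    return sum(height + width for height, width in item_sizes)


def total_surface_area(item_sizes):
    """Sum of height * width over all items, assuming rectangular items."""
    return sum(height * width for height, width in item_sizes)


# Two single-item arrays with equal surface area but different additive area.
elongated = [(1.0, 4.0)]
square = [(2.0, 2.0)]

print(total_surface_area(elongated), total_surface_area(square))
print(additive_area(elongated), additive_area(square))
```

Because the two measures can dissociate, a reanalysis of this kind is only possible when per-item sizes or the stimulus images themselves are archived with the study.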

The Current Review

The purpose of the current review is to create common ground for future studies and to provide tools to test new ideas on already existing data. We start by describing the current state of the field in terms of a common language and then map the different controls employed by the various GMs. For each GM, we examined how the physical properties were defined and how they were controlled. This was followed by a quantitative analysis of the distribution of the frequency of these different controls. Describing the field and mapping it provided a data-driven review of the different GMs.

Method

Procedure overview

We studied GMs designed to control the physical properties of non symbolic numerical stimuli. We started by identifying and collecting the different GMs. This provided qualitative raw data on the different ways to control physical properties. Human annotators tagged the different controls in each of the identified methods, making the qualitative data suitable for quantitative analysis of the distributions and trends in the GMs’ control over physical properties. Figure 2 presents an overview of the structure and pipeline of this review. We focused on GMs published between the years 2000 and 2021. Focusing on these years allowed a broad and exhaustive look while narrowing the scope to GMs relevant to contemporary research.

Fig. 2.

The structure and pipeline of this review. The workflow of constructing the current review started by creating a database of 32 non symbolic numerical stimuli GMs published since 2000 (see section GMs Database). Analyzing the controls imposed by these GMs allowed us to identify five overarching control types applied to 16 properties (see section Identifying the Controls). Then, we tagged the individual GMs’ controls. We performed a qualitative analysis of the GMs’ provided data types, control definitions, terminology, and reliance on theoretical justification (see section Qualitative Analysis). Building on the tagged data and qualitative analysis, we quantitatively analyzed the GMs’ control. Finally, deriving from our analysis, we provided guidelines for GMs. The inverted-pyramid visualization conveys this review’s data-driven funnel approach (Clewell & Haidemos, 1983; Jonker & Pennink, 2010; Solon, 1980). GMs = generation methods to generate non symbolic numerical stimuli.

GMs database

The identification of the reviewed GMs list started by collecting a list of initial methods based on our knowledge of the literature (N = 9). The initial search was followed by a search for additional GMs (N = 22). We used the names of the publications in the initial methods list as input for searching for additional methods. The additional GMs search was conducted in three different ways, providing data on previous and/or follow-up GMs: (a) Previous GMs were gathered by searching within the references of each of the identified GM’s publications, (b) follow-up GMs were found by performing a search for the citations of the nine initial GMs via the Google Scholar academic search engine (scholar.google.com), and (c) previous and follow-up GMs were searched via the Connected Papers platform (connectedpapers.com)—a free web tool that builds a visual network graph of the articles related to the “origin paper” that was used as the search-query input. The network graph is created by aggregating similar articles according to their semantic similarity and overlapping citations (Kaur et al., 2022). The Connected Papers platform provides data on prior and future (i.e., “derivative”) works related to the origin paper. The graph’s data are retrieved from Semantic Scholar (semanticscholar.org)—a free semantic search engine for multiple academic disciplines (Fricke, 2018; Gusenbauer, 2019; Gusenbauer & Haddaway, 2020; Jones, 2015). Finally, to provide a more exhaustive overview, we expanded our search to include the OSF preprint repository (osf.io/preprints/) and added one more GM preprint to the database. In all, our GMs database included 32 GMs that we have identified.

Provided data

To fully understand the results obtained using different GMs, there is a need to understand the difference between the different stimuli sets and compare them in the context of a theoretical perspective. To gain knowledge on the GMs’ stimuli’s reproducibility and comparability, we tagged the types of data shared by each GM. The tagging process provided information on the GMs’ stimuli sets designs, algorithms, software, stimuli image files, and participant response data.

Identifying controls

Different GMs were designed within different theoretical and methodological frameworks. Accordingly, they approach the problem of property control from different perspectives. The complexity of the different GMs makes it very hard to create a natural-language-processing algorithm that can be used to tag the controlled properties. Moreover, the diversity of contexts in which the data can be tagged makes it impossible to create an a priori set of overarching general tagging rules and requires manual tagging. Therefore, human annotators manually tagged the different controls used by the GMs. The tagging was performed separately by two independent taggers, one coauthor and one research assistant. To keep the process as data-driven as possible, controls were tagged based on explicit statements. All tags were defined as binary categories; a GM was tagged to include a control if it provided statements about that control. We designed a straightforward and parsimonious annotation scheme composed of three steps (see Fig. 3). First, annotators searched for physical properties that were stated by the authors as properties they controlled for. Second, annotators marked the properties’ definitions in GM articles and tested their consistency throughout the text. Then, the annotators searched and tagged the control types used by the GMs and the properties on which these control types were used. A definition was tagged as inconsistent if there was a discrepancy between the property definition within the article and the way this property was controlled for. We did not impose previous definitions on the identified controls and relied on the authors’ property definitions whenever they were available. Each annotator independently read all GM articles, and both compiled an initial list of possible controls. In the three times the annotators disagreed, they discussed the issue and reached a joint decision. 
Whenever the annotators identified a new control that did not appear in the initial list, they added it to the tagging list if and only if the following two criteria were met: (a) They could not find an equivalent to the given property under a different name, and (b) the property synonyms were not found under a thesaurus entry (Danner, 2014; Lea, 2008). For the results of the tagging process, see the Supplemental Material available online.
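The comparison of two independent annotators' binary tags can be sketched as follows. This is only an illustration of the bookkeeping, not the procedure the annotators actually used; the GM labels (`GM_A`, `GM_B`), the property names, and the function name are hypothetical.

```python
def find_disagreements(tags_a, tags_b):
    """Return sorted (gm, control) pairs on which two annotators' binary tags differ.

    Each argument maps a (gm, control) pair to True or False. A pair missing
    from one annotator's mapping is treated as False (control not tagged).
    """
    all_pairs = set(tags_a) | set(tags_b)
    return sorted(
        pair for pair in all_pairs
        if tags_a.get(pair, False) != tags_b.get(pair, False)
    )


# Hypothetical tags from two annotators.
annotator_1 = {
    ("GM_A", "density"): True,
    ("GM_A", "total_surface_area"): True,
    ("GM_B", "density"): False,
}
annotator_2 = {
    ("GM_A", "density"): True,
    ("GM_A", "total_surface_area"): False,
    ("GM_B", "density"): False,
    ("GM_B", "convex_hull_area"): True,
}

disagreements = find_disagreements(annotator_1, annotator_2)
print(disagreements)
```

Each listed pair would then be resolved by discussion, as described above.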

Fig. 3.

Annotation scheme for controlled properties. The figure illustrates the annotation scheme that was used for identifying and tagging the experimental control used by the different GMs. GMs’ control was tagged based on the publications themselves and not on a priori assumptions. GMs = generation methods to generate non symbolic numerical stimuli.

Control types

The tight interplay between physical properties and numerosity is demonstrated by how different controls over those physical properties alter measurable behavioral and neuronal responses to numerical judgments (Clayton et al., 2015; DeWind & Brannon, 2016; Gebuis & Reynvoet, 2011a, 2011b; Kuzmina & Malykh, 2022; Leibovich et al., 2017; Piazza et al., 2004; Salti et al., 2017; Smets et al., 2015). Accordingly, it is required to design experiments in which physical properties (e.g., total surface area) will not predict other properties (usually, numerosity). There is also a need to control physical properties such that they will have similar discriminability. The methods to control these factors are denoted as “control types” (N = 5). Trivially, numerical-comparison stimuli comprised two arrays; therefore, we divided the tagged control types into “between-arrays controls” (N = 3), defined as a control on both arrays, and “within-arrays controls” (N = 2), defined as a control for each of the separated arrays. For a discussion of the different aspects of the different control types, see Figure 4. Importantly, a GM can use one or more control types on every controlled property. A GM can also control a property without using any of the control types. A GM can exercise control over a property by using software to determine and/or manipulate the statistical distribution of the property.

Fig. 4.

Control types. A list of the tagged control types, followed by their textual definition and provided rationale. Each control type is illustrated with an example figure and explained in the next column. Control types were divided according to within-arrays control and between-arrays control. See the full-text version for a full-scale view of Figure 4.

Controlled properties

The final controlled properties list included 16 properties (see Fig. 5). Properties were grouped according to the previously suggested categorization of intrinsic and extrinsic properties (DeWind et al., 2015; Gebuis & Reynvoet, 2011b; Salti et al., 2017). Note that Piazza et al. (2004) also discussed intrinsic and extrinsic properties but used the terms “intensive” and “extensive” parameters. Eventually, we used Shilat et al.’s (2021) definition for the intrinsic–extrinsic categorization. Accordingly, intrinsic properties describe information extracted from individual array items. In contrast, extrinsic properties describe information extracted from the array as a whole (Shilat et al., 2021). In case of disagreement between annotators, extrinsic properties were considered relational, and intrinsic properties were considered nonrelational (Lewis, 1983; Marshall & Weatherson, 2023). That is, intrinsic properties are true for items in isolation, whereas extrinsic properties are true for multiple items (Gentner & Kurtz, 2005). Intrinsic and extrinsic properties can be relatively independent, especially when using different-sized items. For instance, increasing the array convex hull area (an extrinsic property) by increasing items’ distance from each other will not affect total surface area (an intrinsic property). The intrinsic–extrinsic categorization reflects refined nuances of stimuli control, indicating an increased focus of a GM on methodological issues.

Fig. 5.

Physical property definitions and alternative terms. The table lists tagged physical properties, textual definitions, and properties’ alternative terms, divided according to the taxonomy of intrinsic and extrinsic properties. Whenever applicable, we also provide the formula defining each property and provide a visual illustration of the property based on the formula. For some properties, providing a formula or a visual illustration was not applicable, mainly for properties that were defined by a single GM. Because most GMs use dots, the relevant equations refer to circles. n denotes the number of dots, r denotes the dot radii, i denotes the dot indexes, cd denotes candela units (Trezona, 2000), m denotes meters, and x and y denote the Cartesian coordinates of the dot centers. We referred to specific GMs if the GMs used a unique definition of a property or if a property appeared in only one GM. The properties are ordered so that similar properties that rely on the same variable or constant are grouped together. GMs = generation methods to generate non symbolic numerical stimuli. See the full-text version for a full-scale view of Fig. 5.

Theory-driven GMs

We wanted to assess GMs’ theoretical rationale. Designing a GM based on a theoretical rationale provides it grounding and comparability. The annotators searched within each GM article for statements explaining the reasoning behind control choices tagged earlier for each GM. We tagged a GM as “theory-driven” if it provides an explicit and direct justification for the GMs’ control choices. The control choices could be applied to properties that are either hypothesized about or recognized as possible confounds that should be controlled. Specifically, theory-driven GMs were required to (a) provide justifications for control choices grounded in prior literature and theory and (b) discuss the implications of the GM’s control choice. A GM was tagged as “non-theory-driven” if its methodology is not informed and reasoned, thereby not providing an underlying theoretical rationale for the GM’s control choices. This operationalization enabled tagging GMs as either theory-driven or non-theory-driven (for the complete list, see the Supplemental Material).

GMs’ definitions and terminology

The definitions of the different properties can be inconsistent or nonexistent. Density, for example, is inconsistently defined in the literature (as suggested by Dakin et al., 2011; De Marco & Cutini, 2020; and others). We chose to focus on density because it is the extrinsic property many GMs have attempted to control for. We analyzed GMs’ views on density, mainly focusing on different density definitions and their consistency. A definition was marked inconsistent mainly if the annotators found a discrepancy between the definition of density and the actual way it was controlled for or if there were several confounded definitions. The main criterion for including a GM in this analysis was whether it explicitly stated that it controlled density. We also included GMs that did not explicitly state that they controlled density but provided a definition of it or described it in a unique context. We mapped the different definitions for density, based on the characteristics of non symbolic arrays, to one or more of the following dimensions: (a) number dimension, the items’ number; (b) item sizing dimension, the items’ surface area; (c) scattered area dimension, the area on which the items are scattered; and (d) distance dimension, the items’ distances.

Results

Qualitative analysis

GMs list analysis

Figure 6 provides an overview of the different GMs (N = 32), GMs’ status of stimuli reproducibility, the properties they controlled (N = 16), and the ways the properties were controlled (i.e., control types: N = 5). The properties were divided according to the intrinsic–extrinsic categorization. We named the GMs according to their respective publication authors. Some of the GM publications were coauthored by authors participating in more than one publication. Four GMs appear without any control types but controlled for one or more properties. Our tagging methodology was based on explicit statements. Therefore, a GM could have used one of the control types without our annotation process recognizing it. Alternatively, GMs can control properties by manipulating the properties’ statistical distribution without using any of the control types previously discussed. Note that manipulating a property’s distribution is not the same as randomly sampling values from a distribution.

Fig. 6.

List of all GMs, their control types, and controlled properties. The figure presents a list of all GMs and the results of tagging their control types, controlled properties, and stimuli reproducibility. The GMs controlled a wide variety of properties and a wide variety of property combinations. Each row depicts a different GM (N = 32). The GMs are ordered according to their respective publication year and alphabetical order. The leftmost column presents data on the GMs’ reproducibility. Whenever a GM provided stimuli reproducibility, an orange rectangle appears. The rest of the columns present data on the GMs’ controls, marked using different colored rectangles. The controls were divided into control types (N = 5) and controlled properties (N = 16). Whenever a control type was used, a green rectangle appears. The controlled properties were divided into intrinsic and extrinsic properties and are marked in blue and red rectangles, respectively. GMs = generation methods to generate non symbolic numerical stimuli. See the full-text version for a full-scale view of Figure 6.

Three types of stimuli-reproduction data sharing

We found three types of data sharing in the literature providing information on the stimuli sets and GMs. The first is stimuli report, methods that shared a report on the stimuli features (e.g., the ratio of physical properties in each picture) but did not share the image files themselves (e.g., Yousif & Keil, 2019). Note that the stimuli-feature report was not always comprehensive (i.e., it did not necessarily report all controlled properties). Second is stimuli software, methods that provided a way to reproduce their stimuli by providing a software package or graphical user interface that enables stimuli reproduction (e.g., Guillaume et al., 2020). Third is stimuli images, methods that shared their stimuli sets as image files such that they could be reanalyzed after publication (this was shared by only one GM: Bonny & Lourenco, 2023). We also found that three GMs explicitly stated in their articles that they generated their stimuli online per participant. Although sharing custom images created per participant poses logistical difficulties, the files can be uploaded to repository storage (e.g., the Center for Open Science enables uploads up to 50 GB). The images themselves can be captured via screen recordings or existing experiment software package functionalities. Providing a software solution to reproduce the stimuli would allow a comprehensive understanding of property calculations and the ability to test questions that were not tested using the original version. New ideas could be tested by reanalyzing previous studies when both the stimuli images and their corresponding responses are provided. For example, Shilat et al. (2021) reanalyzed images generated by Salti et al.’s (2017) GM. 
To compare results, reanalyze the data comprehensively, and most important, understand the stimuli-design perspective, researchers need access to more data, specifically, trial-by-trial responses coupled with details on the presented stimuli features and a reference to the actual image file. We eventually merged the three types of data-sharing tags into a single tag named “Stimuli Reproducibility.” Figure 6 displays the status of the GMs’ reproducibility (N = 15 out of 32 GMs).
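One possible shape for such shared data is sketched below: a trial-by-trial record that couples each response with stimulus features and references to the image files. The field names, file names, and values are hypothetical and only illustrate the idea; a real data set would list every property the GM controls.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class TrialRecord:
    """One trial: response data plus features and file references for both arrays."""
    participant_id: str
    trial_index: int
    left_image_file: str
    right_image_file: str
    left_numerosity: int
    right_numerosity: int
    left_total_surface_area: float
    right_total_surface_area: float
    response: str
    correct: bool
    reaction_time_ms: float


def trials_to_csv(trials):
    """Serialize trial records to CSV text with one header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=[f.name for f in fields(TrialRecord)])
    writer.writeheader()
    for trial in trials:
        writer.writerow(asdict(trial))
    return buffer.getvalue()


# Hypothetical example trial.
example_trial = TrialRecord(
    participant_id="p01",
    trial_index=1,
    left_image_file="array_left_001.png",
    right_image_file="array_right_001.png",
    left_numerosity=12,
    right_numerosity=16,
    left_total_surface_area=340.5,
    right_total_surface_area=298.0,
    response="right",
    correct=True,
    reaction_time_ms=612.0,
)

csv_text = trials_to_csv([example_trial])
print(csv_text)
```

Keeping the image file references in the same row as the response lets a later study recompute any property from the images and relate it directly to behavior.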

Control type is not related to the controlled properties

We found that the types of controls used by the GMs and the controlled properties had a low to no dependency on one another. For example, a GM can control the congruency of the average diameter to the arrays’ numerosity. However, the same method can control the stimuli saliency by imposing it on other physical properties such that the arrays’ convex hull area will be equalized to their numerosity. Any type of control the GMs impose on the different physical properties does not entail the use of other control types.

Half of the GMs were not tagged as theory-driven

We found that ≈53% of the GMs were theory-driven and have explicitly justified their control choices and their implications. Two notable and common themes appeared in theory-driven GMs. First, they built on the limitations of previous GMs. In addition, they considered geometrical constraints and mathematical associations between numerical and physical dimensions when selecting controls. GMs tagged as non-theory-driven exhibited few to no explanatory statements justifying the selection of controls. Instead, such GMs might provide a rationale by referencing only previous literature. In addition, sometimes there were unexplained changes in control choices across experiments. Note that non-theory-driven GMs used all control types and controlled properties and appeared across all reviewed years. Tagging a GM as theory-driven did not entail anything about its stimuli-control choices. However, our analysis revealed a contrast between the grounded justifications provided by theory-driven GMs and those of non-theory-driven GMs.

GMs’ properties definitions and terminology

Properties definitions and the case of density

We analyzed 17 GMs with different views on density, and the results are summarized in Figure 7. Density was found to be the extrinsic property most of the GMs attempted to control for. Approximately 57% of the GMs that controlled for extrinsic properties also controlled density. Most GMs in this analysis (≈70%) explicitly stated that they controlled density. However, five GMs that did not explicitly state that they controlled density were also included because they provided a definition of density or described it in a unique context. Note that 25% of the GMs (3/12) that controlled density did not provide a definition of it. More importantly, there were inconsistencies in definition, even within an article. More than half of the GMs (5/9) that stated that they controlled for density had defined it inconsistently throughout the text. In other words, there is evidence that a substantial portion of the GMs that controlled for density were not consistent in the way it was calculated. For example, Sophian and Chu (2008) discussed two definitions of density. The first definition is based on the total surface area. The second definition is based on individual items. However, they eventually controlled for a third definition, that is, the amount of open space in the array. We could not find a concrete definition of the term “open space.” Instead, open space is the perceived aggregated space unoccupied by the array’s items. Open space might be considered as a proxy of density. Open space was manipulated by using different array configurations or different groupings that provide different levels of open space. Most of the GMs (5/9) that reported controlling for density but did not define it or referred to it inconsistently within an article were among the earlier GMs. In later years, more of the GMs that declared that they controlled for density defined it consistently.

We found six definitions of density: (a) $\frac{\text{Number}}{\text{Display area}}$, (b) $\frac{\text{Number}}{\text{Convex hull area}}$, (c) $\frac{\text{Total surface area}}{\text{Display area}}$, (d) $\frac{\text{Total surface area}}{\text{Convex hull area}}$, (e) $\frac{\sum_{1 \leq i, j \leq n} \sqrt{(x_{i} - x_{j})^{2} + (y_{i} - y_{j})^{2}}}{n^{2}}$ (interdistance), and (f) open space. The number of unique definitions was determined after grouping similar definitions. These definitions contain differing parameters related to array dimensions. Consequently, changes in each density variant differentially alter array properties. The multifaceted definitions cannot be considered equivalent. Their use demonstrates how various GMs manipulated distinct attributes while stating control of the same overarching “density” property. Consequently, the different GMs may explain the effect or bias of density on numerical judgments, but the different explanations are not comparable because the property’s operationalizations differed substantially.
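To make the non-equivalence of definitions (a) through (e) concrete, the following minimal sketch computes them for one array of dots. It is an illustration under stated assumptions, not code from any reviewed GM: the function names, the dictionary keys, the choice to take the convex hull over dot centers rather than dot edges, and the example array are all hypothetical. Open space, definition (f), is omitted because no concrete formula for it was found.

```python
import math
from itertools import product


def convex_hull(points):
    """Andrew's monotone chain; returns hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o, a, b):
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower, upper = [], []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)
    return lower[:-1] + upper[:-1]


def polygon_area(vertices):
    """Shoelace formula for the area of a simple polygon."""
    n = len(vertices)
    twice_area = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        twice_area += x1 * y2 - x2 * y1
    return abs(twice_area) / 2.0


def density_variants(centers, radii, display_area):
    """Density definitions (a)-(e) for circular items.

    Assumption: the hull is computed over dot centers. Degenerate arrays
    (fewer than three non-collinear dots) have zero hull area and are not handled.
    """
    n = len(centers)
    hull_area = polygon_area(convex_hull(centers))
    total_surface = sum(math.pi * r ** 2 for r in radii)
    # Definition (e): sum over all ordered pairs (i, j), including i == j, divided by n^2.
    interdistance = sum(math.dist(p, q) for p, q in product(centers, repeat=2)) / n ** 2
    return {
        "a_number_per_display": n / display_area,
        "b_number_per_hull": n / hull_area,
        "c_surface_per_display": total_surface / display_area,
        "d_surface_per_hull": total_surface / hull_area,
        "e_interdistance": interdistance,
    }


# Hypothetical array: four dots on the corners of a 2 x 2 square, display area 100.
square = [(0, 0), (2, 0), (2, 2), (0, 2)]
small = density_variants(square, [0.1] * 4, 100.0)
large = density_variants(square, [0.2] * 4, 100.0)
```

In this example, doubling the dot radius quadruples variants (c) and (d) while leaving (a), (b), and (e) unchanged, so two GMs that both report "controlling density" can be holding different quantities constant.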

Fig. 7.

Different density definitions and views. The table displays an analysis of different views on density (17 GMs). Two-thirds of the 12 GMs that stated that they controlled for density either defined it inconsistently or did not define it at all. The first column displays the GM’s name. The second column indicates whether a density control statement exists. The third column denotes the availability of a definition for density in the text. The fourth column presents the consistency of the definition of density throughout the text. Trivially, if density was not defined, its definition could not be consistent. A definition was marked inconsistent if the annotators found a discrepancy between the definition of density and the actual way it was controlled for. The fifth column details GMs’ different density definitions. We provide a matching formula if it appeared in the article or could easily be translated into one (based on Fig. 5). In the next columns, we mapped the definitions to four non symbolic array dimensions: the number of items, sizes, scattered area, and distances. The last column represents the main context in which the GMs define density. If a GM did not define density, the corresponding cell remained empty. A green checkmark represents conditions encoded as true. The following conditions were encoded as true if annotators (a) spotted a control statement on density, (b) found a definition of density, (c) found the definition to be consistent across the text, or (d) mapped the definition to a dimension. Whenever one of these conditions was not met, it was marked with a red x-mark. The GMs are ordered according to the following categories: (a) GMs that did not define density, (b) GMs that defined density, (c) definitions mapped to the number dimension, and (d) definitions mapped to the items’ sizing dimension. Otherwise, the GMs are ordered chronologically. GMs = generation methods to generate non symbolic numerical stimuli.
See the full-text version for a full-scale view of Figure 7.

Inconsistent terminology

We found that the 16 different controlled-for properties were referred to by a total of 49 alternative terms (Fig. 5). The same property might be referred to by using synonyms that share the same meaning, with some more similar than others (Danner, 2014; Lea, 2008). A property can also be referred to by different homologous terms, although these terms are not straightforward synonyms of the same property. The annotators reviewed the GM text again and discussed whether these terms referred to properties that already existed in our properties list. Total circumference provides an example of a property that could be replaced by various synonyms. For instance, the word “perimeter” can be replaced with the word “circumference” (usually referring to circular shapes) or any other synonym. The word “total” can be replaced with “sum” or any other synonym or combination of these synonyms. We found that many GMs used different synonyms for the total circumference (De Marco & Cutini, 2020; DeWind et al., 2015; Halberda & Feigenson, 2008; Lyons et al., 2014; Price et al., 2012; Rousselle et al., 2004; Rousselle & Noël, 2008; Salti et al., 2017; Yousif & Keil, 2019; Zanon et al., 2021).

Some properties were referred to by terms that are not direct synonyms and can throw the reader off in a different theoretical direction. For instance, the term “area extended,” used as an alternative term to “convex hull area,” can be misinterpreted as related to the items’ surface area. For some terms, it was hard to know if the different authors referred to the same property. For example, the convex hull area was also referred to using the term “field area” (DeWind et al., 2015), “area extended” (Gebuis & Reynvoet, 2011b), or “global occupied” area. The convex hull area was also referred to using the term “total envelope” (Halberda & Feigenson, 2008), which was also used by Soltész et al. (2010) to refer to the items’ “total circumference.” In a broader context, numerical cognition, like many other fields, has a tendency for terminology inconsistencies (e.g., Alcock et al., 2016; Holmbeck, 1997; Rudmin, 2003). Some even phrased numerical-cognition terminological problems as “terminological chaos” (Davis & Pérusse, 1988). The distinction between the terms “number,” “numerosity,” and “quantity” is not always clear (Dos Santos, 2022; Nelson & Bartley, 1961; Stevens et al., 1937). More consistency will help in bridging theoretical and methodological gaps preventing advances in studying numerical cognition.

Quantitative analysis

Stimuli reproducibility

Fewer than half of the GMs are reproducible, and only one provided the actual stimuli

Fewer than half of the GMs (≈47%) provided stimuli reproducibility, that is, a way to reproduce their stimuli (see section Provided Data). Of the six GMs (≈18.7%) that provided a detailed report on their stimuli features, only three provided relevant software for stimuli reproduction or shared the actual stimuli images. Three GMs (≈9.4%) explicitly stated in their article that they generated their stimuli online (see the Supplemental Material), and of these GMs, only one also provided a way to reproduce its stimuli (Halberda et al., 2008). Note that only one GM provided its stimuli images paired with the results of each corresponding trial, allowing reanalysis of the gathered results (Bonny & Lourenco, 2023).

GMs’ stimuli reproducibility slightly increased over the years

The chance that a GM would provide options for stimuli reproducibility has been higher in recent years than in the earlier years of this review. Using a nondirectional test, we found that publication year alone did not significantly predict the average proportion of GMs that provided means for stimuli reproducibility, r(15) = .453, 95% confidence interval (CI) = [–.077, .783], p = .09, without enough Bayesian support for a null effect, BF₀₁ = 0.840. However, in the current context, a unidirectional positive hypothesis is more suitable because many studies have shown that data sharing in cognition research and other disciplines is increasing and has done so mainly after 2010 (Federer et al., 2018; Kidwell et al., 2016; Martone et al., 2018; Prosser et al., 2023; Tenopir et al., 2015). Therefore, we hypothesized that stimuli reproducibility would increase steadily with publication year. Accordingly, we found that 20.5% of the variance in the average proportion of stimuli reproducibility is associated with the publication year, r(15) = .453, 95% CI = [.014, 1], p = .045, with weak to moderate Bayesian evidence supporting this effect, BF₁₀ = 2.25. We inspected Figure 8 and searched for descriptive trends across the years that could explain the small change in reproducibility. Surprisingly, in contrast to data-sharing trends of the last 2 decades, we found that in 2022 and 2023, only three out of nine GMs provided stimuli reproducibility, thereby violating the linear regression assumption of a linear relationship between the independent and the dependent variables. We therefore grouped the GMs into two groups, GMs created before and including 2011 and GMs created after 2011, and analyzed whether publication period was associated with stimuli reproducibility. Before 2011, approximately 38% of GMs provided stimuli reproducibility, and after 2011, stimuli reproducibility increased to 53%. To test the association between the GMs’ period (before/after 2011) and stimuli reproducibility (yes/no), a chi-square test of independence was conducted. However, stimuli reproducibility was not significantly associated with GMs’ period, χ²(1, 32) = 0.622, p = .43, with weak Bayesian support for a null effect, BF₀₁ = 1.783.
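The chi-square test of independence can be sketched as follows. The cell counts are not reported in the text; the table below is an assumption inferred from the reported percentages (≈38% and ≈53%) and the total of 32 GMs, namely 5 of 13 GMs up to 2011 and 10 of 19 GMs after 2011. The function name is hypothetical, and the sketch uses the uncorrected Pearson statistic, which may differ from the software settings actually used.

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test of independence for a 2x2 table (no continuity correction)."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (observed - expected) ** 2 / expected
    # With 1 degree of freedom, the upper-tail probability equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# ASSUMED counts (rows: up to 2011, after 2011; columns: reproducible, not reproducible).
assumed_table = [[5, 8], [10, 9]]
chi2, p = chi_square_2x2(assumed_table)
print(f"chi2(1, N = 32) = {chi2:.3f}, p = {p:.2f}")
```

Under these assumed counts, the statistic agrees with the reported χ²(1, 32) = 0.622, p = .43, which suggests the inferred table is at least compatible with the analysis; it does not confirm the actual cell counts.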

Fig. 8.

Stimuli reproducibility throughout the years. The figure depicts the average proportion of GMs that provided stimuli reproducibility (y-axis) as a function of the GMs’ publication year (x-axis). Stimuli reproducibility has constantly grown over the years. Orange circles depict GMs, and their size changes according to the number of GMs published in the same year. For example, four GMs were published in 2022 and five in 2023. Correspondingly, the circle representing GMs from 2023 will be larger and more transparent than the circle representing the GMs published in 2022. As the publication year progressed, so did the proportion of GMs that provided stimuli reproducibility, r(15) = .453, 95% CI = [.014, 1], p = .045, with weak to moderate Bayesian evidence supporting this effect, BF₁₀ = 2.25. The black line denotes the linear regression fit line, and the standard deviation is represented by the shaded gray curve. GMs = generation methods to generate non symbolic numerical stimuli; CI = confidence interval.

Controlled properties

Low to medium similarity in some controlled properties, with most GMs controlling total surface area

The different methods controlled for 16 properties (see Fig. 5). Properties were categorized into intrinsic properties, defined at the level of individual items, or extrinsic properties, defined according to the relationships between items of the whole array. Figure 9 displays the relative frequency of GMs that controlled specific properties. The average relative frequency of all controlled properties was approximately 22%, SEM = 5.648.

Fig. 9.

The relative frequency of the controlled properties. The figure depicts the distribution of different controlled properties. No property was controlled for by all GMs, but the total surface area was controlled for by most of the GMs. Half of the properties were controlled for by only one or two of the GMs. Intrinsic properties were more frequently controlled for than extrinsic properties. The list of properties is presented on the x-axis. Bars represent the relative frequency of the controlled properties between the GMs. The controlled properties were grouped into intrinsic and extrinsic properties and are marked in blue and red, respectively. The properties within the intrinsic and extrinsic groups were ordered from the most frequently controlled property to the least frequently controlled property. When two properties were equally controlled by the GMs, they were ordered in the figure according to the order provided in Figure 5. GMs = generation methods to generate non symbolic numerical stimuli.

No single property was controlled by all methods. Yet some properties were controlled for by a high portion of GMs. The two most controlled-for properties were intrinsic properties—total surface area and average surface area were controlled for in ≈81% and ≈53% of the methods, respectively. Examining the union and intersection of the GMs that controlled for the total surface area and items’ surface area or only one of these properties shows that 44% of the GMs controlled for both properties and that 91% controlled for at least one of them. Therefore, there is a medium similarity in the controlled-for intrinsic properties. The most frequently controlled extrinsic properties were density and convex hull area, controlled in ≈38% of the methods. Only 19% of the methods controlled both the items’ density and convex hull, and 56% of the methods controlled at least one of these properties. Therefore, there is a low to medium similarity among GMs in the controlled extrinsic properties.

High diversity in the controlled properties

Our findings showed that no single property was controlled for by all methods. Instead, either the total surface area or the average surface area was controlled for in the majority of GMs. Following these findings, we wanted to quantify how diverse GMs’ choices of controlled-for properties were. To test the diversity in the GMs’ controlled properties, we measured the asymmetry of the GMs’ controlled-properties distribution by calculating skewness (Groeneveld & Meeden, 1984; MacGillivray, 1986; Pearson, 1895). Physical properties are categorical discrete variables, whereas skewness is usually defined for continuous variables. However, because our data do not violate the assumptions about the skewness of categorical variables (Klein & Doll, 2021), we used the general skewness formula. The controlled-properties distribution had a medium to high positive skewness of 1.353, SE = 0.564. Therefore, although either total or average surface area was controlled for by most GMs, each of the remaining properties had a low probability of being controlled for by a GM. The medium to high skewness also indicates that GMs’ choices of controlled properties were not evenly spread. Finally, there was also a high diversity in the controlled properties, diversity index = .882. Diversity was calculated using the Gini-Simpson index of diversity, which is usually denoted D or G but is denoted here as “diversity index” for clarity (Keylock, 2005; Lande, 1996; Simpson, 1949; Tran et al., 2021). Here, the diversity index measures the probability that two GMs will control for different properties. The higher the diversity index is, the higher the probability that the two compared GMs will control different properties. Therefore, there was a high probability that different GMs would control for different properties.
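As a minimal sketch of the diversity index, the Gini-Simpson index can be computed as one minus the sum of squared category proportions. The exact variant used in the analysis is not stated, so the with-replacement form below is an assumption, and the function name and example labels are hypothetical rather than taken from the reviewed data.

```python
from collections import Counter


def gini_simpson(labels):
    """Probability that two draws (with replacement) fall in different categories."""
    counts = Counter(labels)
    total = sum(counts.values())
    return 1.0 - sum((count / total) ** 2 for count in counts.values())


# Hypothetical control choices, one label per (GM, controlled property) pair.
choices = [
    "total surface area", "total surface area",
    "density", "convex hull area",
]
index = gini_simpson(choices)  # 1 - (0.5**2 + 0.25**2 + 0.25**2) = 0.625
```

The index is 0 when every observation falls in one category and approaches 1 as observations spread evenly over many categories, which is why values such as .882 are read as high diversity.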

Most GMs controlled a few properties but were diverse in the number of controlled properties

The total number of distinct properties controlled across all GMs was 16 (see Fig. 5). However, each GM actually controlled for a much lower number of properties, M = 3.5, SEM = 0.291. Approximately 59% of the methods controlled for two or three properties. Although a large portion of GMs controlled only a few properties, GMs were diverse in the number of controlled-for properties. The number of controlled properties ranged between one and eight. GMs were not evenly spread across this range because more than half of them controlled for two or three properties (10 and nine GMs, respectively). Furthermore, there was a high diversity in the number of properties controlled by the GMs, with diversity index = .798. Therefore, there was a high probability that different GMs would control a different number of properties.

Control for intrinsic properties was more common

GMs controlled for two categories of properties consisting of seven intrinsic and nine extrinsic properties (see section Controlled Properties). Figure 10 displays the relative frequency of GMs’ choices to control a certain category or both categories. A majority of 63% of the methods controlled for both intrinsic and extrinsic properties, whereas 34% of the methods controlled for only intrinsic properties, and the remaining ≈3%, equivalent to one GM (Dakin et al., 2011), controlled for only extrinsic properties. The average number of intrinsic properties controlled by the GMs (M = 2.25, SEM = 0.191) was higher than the average number of extrinsic properties controlled by the GMs (M = 1.25, SEM = 0.196). According to the ratio between these two averages, GMs chose to control 80% more intrinsic than extrinsic properties. Because there were more extrinsic than intrinsic properties available (intrinsic: N = 7; extrinsic: N = 9), we also calculated the normalized average of controlled intrinsic properties, M = 1.75, SEM = 0.148; even then, GMs controlled more intrinsic than extrinsic properties. There was medium to high diversity in the control of intrinsic and extrinsic properties, and the two categories were equally diverse, with a diversity index of .754 for each. Because the diversity indexes of both categories were equally high, there was a high and equal probability, independently for each category, that two GMs would control a different number of intrinsic or extrinsic properties. The diversity result underscores both the high diversity between GMs and the diversity in the control of different types of intrinsic and extrinsic properties.
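The arithmetic behind the intrinsic-extrinsic comparison can be written out as a short worked example. The normalization procedure is not spelled out in the text; scaling the intrinsic mean by the ratio of available intrinsic to extrinsic properties (7/9) is an assumption, chosen because it reproduces the reported normalized mean.

$$\frac{M_{\text{intrinsic}}}{M_{\text{extrinsic}}} = \frac{2.25}{1.25} = 1.8, \qquad M_{\text{intrinsic}}^{\text{norm}} = 2.25 \times \frac{7}{9} = 1.75 > 1.25 = M_{\text{extrinsic}}$$

The ratio of 1.8 corresponds to the reported 80% difference, and under the assumed scaling the normalized intrinsic mean still exceeds the extrinsic mean.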

Fig. 10.

The relative frequency of the controlled-properties categories. The figure depicts the distribution of the different controlled-properties categories. Most GMs controlled for both intrinsic and extrinsic properties. The list of property categories is presented on the x-axis, and the bars represent the between-GMs relative frequency. GMs = generation methods to generate non symbolic numerical stimuli.

Over the years, the types of controlled properties increased, but individual GMs’ control did not

In each year, different properties were controlled by different GMs, increasing the total number of types of properties controlled by all GMs. As shown in Figure 11, throughout the years, the sum of available controlled properties has steadily increased. Publication year explains 92.8% of the variance in the sum of available controlled properties, F(1, 30) = 388.694, p < .001, r(32) = .963, 95% CI = [.926, .982], with decisive Bayesian evidence supporting this effect, BF₁₀ > 1,000. Inspecting the regression model coefficients reveals that each additional publication year added almost half of a property to the pool of available properties, Predicted Number of Available Properties = 0.468 × Publication Year – 931.464. In contrast, the number of properties actually controlled for by the individual GMs did not increase across the years and, in some years, even decreased. Publication year did not explain the number of controlled properties, r(32) = –.242, 95% CI = [–.544, .117], p = .183, with weak Bayesian evidence supporting the null effect, BF₀₁ = 1.471. Repeating the same analysis of the change in the number of controlled properties across years separately for the intrinsic and extrinsic categories did not provide different evidence. The number of intrinsic and extrinsic properties that were actually controlled for by the GMs did not significantly increase across the years, with weak to moderate Bayesian support for null effects, intrinsic: r(32) = –.203, p = .264, BF₀₁ = 2.503; extrinsic: r(32) = –.161, p = .378, BF₀₁ = 3.135. Furthermore, the diversity in the number of properties controlled by the GMs has changed throughout the years but in relatively small proportions. According to a comparison of the diversity index of the number of controlled properties before and after 2011, the probability that two GMs would control a different number of properties increased by only ≈4% after 2011.
Importantly, we do not imply that controlling more properties is necessarily better. Instead, we wanted to provide information on the state of the field in terms of the number of controlled properties. The increase in the total number of properties available for control indicates that new ideas were introduced to the field. The stability in the number of properties controlled by the GMs indicates that new ideas were not easily implemented.
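The reported regression coefficients can be turned into a small prediction sketch. It assumes that the predictor is the publication year, because the reported intercept only yields plausible property counts on that scale; the constant and function names are hypothetical.

```python
SLOPE = 0.468         # reported coefficient (assumed here to be properties per publication year)
INTERCEPT = -931.464  # reported intercept


def predicted_available_properties(year):
    """Linear prediction of the number of properties available for control in a given year."""
    return SLOPE * year + INTERCEPT


for year in (2000, 2011, 2022):
    print(year, round(predicted_available_properties(year), 2))
```

Under this reading, the prediction for 2022 is roughly 14.8, close to the 14 available properties that the Figure 11 caption mentions for that year.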

Fig. 11.

Individual GMs’ number of controlled properties compared with the total number of properties available for control across the years. (Left) The number of properties that individual GMs actually controlled for as a function of the year in which the GMs were published (x-axis). (Right) The total number of properties available for control, also as a function of publication year. Across the years, the total number of properties available for control for all GMs has steadily grown, whereas the number of properties actually controlled by individual GMs has not. Circles denote GMs. The purple color represents the number of properties actually controlled for by the GMs, and the light-blue color represents the total number of available properties. The size of the circles changes as a function of the number of GMs published in the same year that had the same value of properties. For example, in 2022, four GMs were published. This is represented on the left by two purple circles of different sizes. Three GMs controlled for two properties, and one controlled for a single property. Therefore, the GMs were plotted as separate purple circles. The circle for the three GMs is larger than the circle for the single GM that controlled for a single property. In addition, all GMs published in 2022 could control for 14 different available properties; therefore, the four GMs were plotted as a single large light-blue circle. Linear correlation shows that as publication year progressed, so did the number of available properties, r(32) = .963, 95% CI = [.926, .982], p < .001, with decisive Bayesian evidence supporting this effect, BF₁₀ > 1,000. In contrast, publication year has not explained the number of controlled properties, r(32) = –.242, 95% CI = [–.544, .117], p = .183, with weak to moderate Bayesian evidence supporting the null effect, BF₀₁ = 1.944. The black line denotes the linear regression fit line, and the standard deviation is represented by the shaded gray curve. GMs = generation methods to generate non symbolic numerical stimuli; CI = confidence interval. See the full-text version for a full-scale view of Figure 11.

Controlled types

More than half of the GMs used congruency, saliency, and sizes heterogeneity within as control types

Five different control types were used (see Fig. 4). The relative frequency of the methods that used various control types is displayed in Figure 12. No single control type was used by all methods. The average relative frequency of the different control types was approximately 40%, SEM = 10.505, equivalent to approximately two control types per GM. Importantly, 53% to 62% of GMs used the same three control types: congruency, saliency, and sizes heterogeneity within. Approximately 62% of the GMs used congruency, 56% used saliency, and 53% used sizes heterogeneity within. Note that ≈84% of the methods used at least one of these control types, and 55% of the methods used at least one combination of these control types. Therefore, there is a high similarity in the control types used by the different methods. However, in general, the chance that two GMs would use different control types was approximately 75%, with a medium to high diversity index = .758. Therefore, although some control types were more common than others, GMs diverged in the control types they chose.

Fig. 12.

GMs’ use of various control types. The figure depicts the relative frequency of use of different control types by the different GMs. Approximately 84% of the GMs used at least one of these control types: congruency, saliency, and sizes heterogeneity within. The x-axis presents the list of control types. The y-axis presents the relative frequency of the GMs that used these control types. The control types are ordered from the most frequently used to the least frequently used control type. GMs = generation methods to generate non symbolic numerical stimuli.

GMs employ a stable number of control types throughout the years

Using a nondirectional test, we found that throughout the years, the average number of control types employed by GMs has not significantly changed, r(32) = –.289, 95% CI = [–.579, .066], p = .109, without enough Bayesian support for a null effect, BF₀₁ = 1.327. To provide a more nuanced analysis, we compared the time periods before and after 2011, which marks the midpoint of the reviewed time period. We grouped the GMs into two groups, before and including 2011 and after 2011, and analyzed whether publication period was associated with the number of control types. The difference between the average number of control types used by the GMs before and after 2011 was tested using the Mann-Whitney U test. The Mann-Whitney U test (also known as the Wilcoxon rank sum test) is a nonparametric test for the difference between two independent samples, denoted using the statistic U (Mann & Whitney, 1947; McKnight & Najab, 2010; Wilcoxon, 1945). In general, there was no significant difference in the number of control types used by the GMs before and after 2011, U(19, 13) = 83, p = .353, rank-biserial correlation = .328, with not enough Bayesian support for the null hypothesis, BF₀₁ = 0.985. In contrast, the diversity in the number of control types the GMs employed increased after 2011: diversity index before 2011 = .692, diversity index after 2011 = .786. The increase in the diversity index shows that after 2011, the probability that two GMs would use a different number of control types increased by ≈14%.
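The effect size and the diversity change reported above can be checked with a short sketch. It assumes one common convention for the rank-biserial correlation, r = 1 − 2U / (n₁n₂); the sign depends on which of the two U statistics is reported, and the text does not state the convention used. The function name is hypothetical.

```python
def rank_biserial_from_u(u, n1, n2):
    """Rank-biserial correlation from a Mann-Whitney U statistic (one common convention)."""
    return 1.0 - (2.0 * u) / (n1 * n2)


# Reported values: U = 83 with group sizes 19 and 13.
effect_size = rank_biserial_from_u(83, 19, 13)

# Relative change in the diversity index of the number of control types (.692 -> .786).
relative_increase = 0.786 / 0.692 - 1.0

print(f"rank-biserial = {effect_size:.3f}, relative increase = {relative_increase:.1%}")
```

Under this convention, the computed effect size rounds to the reported .328, and the relative increase in the diversity index is about 13.6%, in line with the reported ≈14%.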

Discussion

In the current review, we examined various GMs with the following objectives in mind. First, we wanted to describe the current state of the field in terms of its common language and ground. Second, we wanted to accurately map the control types and properties controlled by the different GMs. Third, we wanted to test whether the field is limited in its ability to compare between studies and to interpret results through current and new theoretical prisms. Finally, we wanted to provide guidelines for GMs. Our focus was numerical comparisons, yet our conclusions and guidelines apply to other fields and tasks involving cognitive-perceptual experiments.

Using a combination of automatic tools and human annotators who inspected the literature, we found 32 GMs (Fig. 6). These GMs used five different control types (Fig. 4) to control 16 physical properties (Fig. 5). Some of the control types can be imposed on all properties and some on only a portion of them. The control type has a low to no dependency on which properties were controlled for. Consequently, this leads to a variety of GMs and inevitably, to different stimuli sets and results (Clayton et al., 2015; DeWind & Brannon, 2016; Gebuis & Reynvoet, 2011a, 2011b; Kuzmina & Malykh, 2022; Leibovich et al., 2017; Piazza et al., 2004; Salti et al., 2017; Smets et al., 2015).

The ability to compare and replicate previous results relies on proper definitions of the controlled properties. A large portion of the GM publications lack proper definitions of the properties they chose to control for. Our analysis showed that some GMs stated that they controlled certain properties but did not define these properties. Although some of the properties do not need definitions (mainly properties whose definitions stem from Euclidean geometry), other properties, such as density or the convex hull area, do need an exact definition. Another problem is that some of the properties were inconsistently defined between GMs (Fig. 7). Even worse, some properties were inconsistently defined within the same publication. The lack of consistency of property definitions between GMs is more harmful than the lack of definitions because it makes the GMs that have controlled for that property incomparable. Inconsistent property definitions within an article make it not only less replicable but also impossible to understand the prism through which the researchers conducted their study and the logic underlying the GM design. All in all, the lack of proper definitions creates a lack of common language, prevents proper reproduction or reanalysis of the results, and makes GMs incomparable with one another.

Designing a theory-driven GM requires providing justification for the control choices and grounding them in previous methodological prisms and numerical-cognition theories. Note that we recognized approximately half of the GMs as non-theory-driven GMs, exhibiting very few explanatory statements for control selections. In contrast, theory-driven GMs justified controls by building on previous GMs or numerical theories and considering geometrical constraints and mathematical associations. Articulating the rationale underlying a GM’s approach to stimuli control and its implications grounds it within the context of the relevant experimental literature. More crucially, it equips fellow scientists to understand the prism through which results should be interpreted (Almaatouq et al., 2022). Finally, reflecting on the stimuli-design rationale and implications aids GM designers in interpreting their own results.

On the surface, the majority of GMs used one of a few controls. All GMs except for four controlled at least one of two intrinsic properties: the individual items’ surface area or the items’ total surface area (Figs. 6 and 9). Note that items’ individual and total surface areas are similar properties, both dependent on the squared radius of the items (because most of the GMs use dot-shaped items). In addition, most GMs used the same control types (Fig. 12). Nevertheless, although most of the GMs controlled for similar properties and used similar control types, our mapping shows that the different GMs are incomparable. This is because each GM used additional controls that were highly diverse. In fact, most properties had a low probability of being controlled for. Furthermore, GMs were diverse in the total number of properties they chose to control for. Finally, the diversity of the controlled properties and control types increased throughout the years.

This raises the question of whether new ideas changed the way GMs were designed. Scientific progress occurs when scientific ideas feed the scientific domain and contribute to its expansion and development (Bird, 2000; Kuhn, 1970). The numerical-cognition field is a diverse field. This diversity was evident in the high variety of controlled properties and the low probability that two GMs would control for the same properties (or the same number of properties). Note that over the years, new ideas kept emerging. Accordingly, the diversity of GMs’ control choices grew during the second decade of the reviewed period. Furthermore, the number of properties available for control has steadily increased throughout the years (Fig. 11). Each additional publication year was predicted to add roughly half a property to the total pool of available controls. In contrast, throughout the years, the actual number of properties GMs controlled for was stable (Fig. 11). Some properties, such as external perimeter, open space, and regularity, were suggested as being impactful in the early years of this review but were not controlled for in later years. The contrast between the increase in ideas and the stability in controlled properties suggests that the ideas coming from previous GMs are usually not adopted by later GMs. Note that the stability in controlled properties is not necessarily harmful because controlling more properties is not necessarily better. However, it provides evidence for the lack of assimilation of ideas and findings in the field.

It is worth focusing on two different properties that reflect opposing trends in the field. On one hand, the adoption of the convex hull area shows that new ideas can be successfully implemented into the field. Gebuis and Reynvoet (2011b) showed how the convex hull area biased numerical comparisons, and since then, others have highlighted the great need to control convex hull area in numerical-comparison tasks (Clayton et al., 2015; Rodríguez & Ferreira, 2023). We found that since the convex hull area was introduced to the literature, most of the GMs have implemented it in their design. On the other hand, the case of density provides a different picture. Although a substantial portion of the GMs attempted to control density, it was not carefully defined and was therefore poorly controlled for.

The ability to reproduce, compare, and study new ideas relies on the quality and amount of shared data. In the last 2 decades, data sharing in cognitive research and other disciplines has increased (Federer et al., 2018; Kidwell et al., 2016; Martone et al., 2018; Prosser et al., 2023; Tenopir et al., 2015). Yet the shared-data quality has not necessarily increased (Tedersoo et al., 2021). GMs were inconsistent in their data-sharing practices, with different amounts, quality, and means provided for stimuli reproduction. There is evidence that stimuli reproducibility increased over the years, but it is not robust. In terms of proportion, fewer than half of the GMs provided some sort of stimuli-reproducibility method. Note that only one GM provided their actual stimuli images, complemented with results. Providing the stimuli images and details along with the data of the corresponding response would make it possible to inspect the results through the prisms of new GMs.

Taken together, the field shows a partial capability to improve GMs’ control and/or to assimilate new ideas. We think that the numerical-cognition field would progress more efficiently if it were possible to compare previous ideas and test novel ideas in light of previous findings. This ability relies on an elaborate and precise method section and on quality means for stimuli reproducibility and reanalysis.

Additional GM frameworks

We examined the broad scope of the control of physical properties. Our guiding principle was to conduct an inductive data-driven review, as opposed to a narrative-driven review. For this purpose, we used parsimonious definitions and dichotomous tagging criteria based on explicit statements. However, different GMs approach numerical cognition from various perspectives, and several of the definitions we provided could be reconsidered through other prisms. Evaluating alternative approaches and definitions is outside the scope of broadly mapping the field and providing guidelines. Yet we wanted to discuss additional nuances of stimuli control that could be considered.

We considered a GM as controlling for saliency if the veridical-mathematical ratios of two or more properties of the image arrays were matched on a decimal scale. For example, if a GM equalized the veridical ratios of the number of items and total surface area across the two presented arrays, it was tagged as controlling for saliency. Using this definition, one can study discrimination acuity by making the dimensions equally salient and consequently, equally discriminable. One example for this definition of saliency comes from Salti et al.’s (2017) GM, in which they equalized ratio with a motivation to compare the acuity of discriminating different stimuli dimensions, such as quantity and total surface area. Recently, Aulet and Lourenco (2021, 2022) suggested an alternative approach to match the perceptual rather than mathematical discriminability of quantity and surface area. In a numerical-comparison task with children (Aulet & Lourenco, 2022), when decimal-scale mathematical ratios were matched, a bias for numerosity over surface area emerged. However, when perceptual discriminability was matched, that is, ensuring changes in quantity were equally easy to discriminate as changes in total surface area, a bias for surface area over numerosity emerged. Generating a perceptually matched stimuli approach relates to long-standing debates over veridical versus nonveridical perception and appropriate methods for equating discriminability across dimensions (Akins, 1996; Hoffman, 2018; Melara & Mounts, 1993; Shepard, 1981; Wei & Stocker, 2017).
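The veridical (mathematical) sense of saliency matching described above can be sketched in a few lines. This is only an illustration, not any GM's actual code: the function names, the smaller-over-larger ratio convention, the tolerance, and the example magnitudes are all our assumptions. It says nothing about perceptual discriminability, which Aulet and Lourenco (2021, 2022) argue should be matched instead.

```python
import math


def ratio(a: float, b: float) -> float:
    """Smaller-over-larger ratio of two positive magnitudes, in (0, 1]."""
    if a <= 0 or b <= 0:
        raise ValueError("magnitudes must be positive")
    return min(a, b) / max(a, b)


def saliency_matched(ratios: list[float], rel_tol: float = 0.01) -> bool:
    """True if every property ratio equals the first one within rel_tol.

    The tolerance is an assumption; a GM may require exact equality.
    """
    first = ratios[0]
    return all(math.isclose(first, r, rel_tol=rel_tol) for r in ratios[1:])


# Hypothetical trial: 8 vs. 16 items; total surface areas of 1200 vs. 2400 px^2.
number_ratio = ratio(8, 16)
area_ratio = ratio(1200.0, 2400.0)
print(number_ratio, area_ratio, saliency_matched([number_ratio, area_ratio]))
# prints: 0.5 0.5 True
```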

We considered a GM as controlling for saliency if it matched the ratios of two or more properties of the presented arrays. However, the statistical features of the entire stimuli set also determine saliency. Skewed, unbalanced, and highly variable distributions of stimuli properties can significantly affect results by affecting saliency across dimensions (DeWind et al., 2015; Gebuis & Reynvoet, 2011a; Park, 2022; Salti et al., 2017). Different GMs with different property distributions will lead to unequal saliency, causing certain dimensions to become more perceptually prominent. For example, Park (2022) demonstrated how Yousif and Keil’s (2019) GM, by manipulating the additive area, created an unbalanced stimuli set. Specifically, the distribution of displayed quantities was much more distributed than the additive and total surface area distributions. Therefore, the item size dimension was more salient than numerosity, with much less balance in the presented quantities than in the arrays’ item sizes (Park, 2022). An inherent assumption of several GMs is that statistical distribution features across the entire stimuli set can bias results by changing dimension saliency (DeWind et al., 2015; Gebuis & Reynvoet, 2011a; Salti et al., 2017).

Another statistical aspect of the entire stimuli set that might bias the results is the frequency of the different presented images and properties. In a recent study by Krajcsi and Kojouharova (2023), stimuli frequency affected symbolic number-comparison judgments. Participants compared numbers denoted by novel symbols with manipulated frequencies—when quantity increased, symbol frequency decreased or increased. Results showed that symbols’ frequency biased participants’ judgments (and explained the “size effect,” which is out of our scope; see Cohen Kadosh et al., 2008; Moyer & Landauer, 1967). The effect of frequency on symbolic comparisons may extend to non symbolic stimuli. More frequent stimuli could be more familiar, similar to the familiarity advantage of canonical arrangements, which exhibit grouping and/or regularity (Jansen et al., 2014; Mandler & Shebo, 1982; this mainly relates to subitizing, and there are alternative explanations relying on numerosity and not on items arrangements: Klahr, 1973; Trick & Pylyshyn, 1994). Overall, frequency effects of the entire stimuli set merit consideration in non symbolic experiments.

Often, GMs did not consider statistical aspects of the entire stimuli set. Regarding applicability, several GMs enabled control over distribution features such as ranges, variability, means, and skewness for certain properties (e.g., De Marco & Cutini, 2020; Gebuis & Reynvoet, 2011a; Piazza et al., 2004; Salti et al., 2017; Yousif & Keil, 2019; Zanon et al., 2021). However, most GMs did not provide options to control the distribution, and when available, they were for only a limited set of properties. More often, GMs reported on and/or visualized statistical distribution features (e.g., De Marco & Cutini, 2020; DeWind et al., 2015; Gebuis & Reynvoet, 2011a, 2011b; Guillaume et al., 2020; Kuzmina & Malykh, 2022; Yousif & Keil, 2019; Zanon et al., 2021). Most GMs reported only numerical ranges, with a minimal addition on physical-properties distributions. The GMs that did report on properties distributions usually reported just on range and did not provide data on the full distribution, such as the distribution mean and variability.

We tagged saliency and congruency as independent control types, based on GMs’ explicit control statements. However, saliency and congruency are not independent. Performance on non symbolic number-comparison tasks is affected by the interaction between numerical ratio, physical-property congruency, and physical-property saliency (DeWind & Brannon, 2019). The numerical ratio is the ratio between the numbers of items in the two arrays. As per the canonical ratio effect, trials with lower numerical ratios approaching 1 are more difficult, and performance deteriorates (e.g., DeWind et al., 2015; Leibovich et al., 2017; Price et al., 2012). Note that performance can vastly differ between different difficult trials created by different GMs based on the congruency and saliency of the physical and numerical dimensions. Braham et al. (2018) predicted accuracy on a trial-to-trial level using stimuli generated by the Panamath GMs (Halberda et al., 2008; Halberda & Feigenson, 2008). Reynvoet et al. (2021) generated stimuli using the Panamath GMs and also used an adapted version of the G&R GM (Gebuis & Reynvoet, 2011a). In both studies, on easy trials displaying the Panamath stimuli, accuracy was comparable regardless of physical-property congruency and saliency (Braham et al., 2018; Reynvoet et al., 2021). However, in difficult trials generated by G&R, the effects of congruity and saliency were substantially larger than for Panamath trials. In difficult G&R trials, incongruency and saliency could be associated with below-chance accuracy, whereas when using Panamath, accuracy did not decrease below 80% for all numerical ratios. Reynvoet et al. explained that physical properties in G&R stimuli were more salient than the Panamath ones because the former had more variability compared with numerical values. In convergence, DeWind et al. (2015) suggested that the effect of difficult, incongruent, and not-controlled-for saliency trials on performance is vastly different between GMs. Thus, the interaction between congruency, saliency, and numerical ratio biases results, with different GMs leading to different conclusions about numerical cognition. Note that, as reviewed in the article, understanding and comparing GMs control is often very hard.
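To make the congruency term concrete, the following sketch tags a single trial for one physical property. It assumes the common reading in which a trial is congruent when the more numerous array also has the larger property value, incongruent when the direction is reversed, and neutral when the property is equal across arrays. The function name and example values are hypothetical, and individual GMs may define congruency differently.

```python
def congruency(n_a: int, n_b: int, prop_a: float, prop_b: float) -> str:
    """Tag one trial as 'congruent', 'incongruent', or 'neutral'.

    n_a, n_b: number of items in arrays A and B (must differ).
    prop_a, prop_b: value of one physical property (e.g., total surface
    area) in arrays A and B.
    """
    if n_a == n_b:
        raise ValueError("a comparison trial needs different numerosities")
    if prop_a == prop_b:
        return "neutral"
    same_direction = (n_a > n_b) == (prop_a > prop_b)
    return "congruent" if same_direction else "incongruent"


# Hypothetical trials with 8 vs. 16 items and total surface area in px^2.
print(congruency(8, 16, 1000.0, 2000.0))  # prints: congruent
print(congruency(8, 16, 2000.0, 1000.0))  # prints: incongruent
print(congruency(8, 16, 1500.0, 1500.0))  # prints: neutral
```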

GMs controlled for various properties over the years. However, the number of properties that can realistically be controlled in any given study is limited because there are inherent geometrical constraints and nonlinear associations between properties (e.g., De Marco & Cutini, 2020; DeWind et al., 2015; Gebuis & Reynvoet, 2011a; Piazza et al., 2002; Salti et al., 2017; see also, Shilat et al., 2021). In addition, as demonstrated earlier, GMs have many physical-property definitions that are associated with the same parameters (see Figs. 1 and 5). These interdependencies, together with unpredictable correlations between properties, restrict the number of properties that can feasibly be controlled in one study. Attempting to control too many properties would require an infeasible number of trials to counterbalance the different factors. Therefore, simply controlling more properties does not necessarily enhance methodology.

Rather than using an additive approach of controlling for more properties, GMs can account for mathematical associations in stimuli design. When done thoroughly, accounting for property associations in the design would provide better stimuli control. Several GMs have provided notable discussions of the relationships between different physical properties and incorporated these connections into their design (e.g., De Marco & Cutini, 2020; DeWind et al., 2015; Piazza et al., 2002; Salti et al., 2017). Although a full discussion of property associations is outside the scope here, properly using them can strengthen GM methodology.

One approach taken by some GMs is to group physical properties into distinct categories based on their mathematical associations. The three main GM categorizations were of (a) Piazza et al. (2004; see also Dehaene et al., 2005), (b) DeWind et al. (2015), and (c) Salti et al. (2017). These three categorizations differentiate between intrinsic properties (related to individual items) and extrinsic properties (related to the overall array). The categorized properties rely on items’ number, size, location, or a combination of the size and location. In addition, these categorizations align with the mapping to dimensions we used to analyze density’s definitions (i.e., items’ number, sizing, scattered area, and distance dimensions).

Note that the explanatory power of the current intrinsic–extrinsic GM categorizations is limited. It is often difficult to compare GMs or to generalize a categorization to properties that were not included in the original GM. The limited explanatory power is mainly due to problematic definitions that do not enable clear category membership. As with categorization systems in other domains, the boundaries between groups can be fuzzy, and membership in a category is not always clear (Bird & Tobin, 2018; Cole & Leide, 2006; Jacob, 2004; McCloskey & Glucksberg, 1978; Mervis & Rosch, 1981). For instance, it is not clear how to categorize properties, such as spatial frequency or luminance, that do not belong to the Euclidean domain spanned by the intrinsic–extrinsic dichotomy.

Nevertheless, using categorization frameworks allows GM designers to systematically consider implications of controlling certain properties over others. Relying on categorization avoids an impractical “catch-all” approach of trying to control for every possible property. Most importantly, using property categorization enables a priori postulation and prioritization of necessary experimental controls. Because categorizations are grounded in mathematical associations, they can be applied independently of the author’s assumptions about the role or bias of physical properties in numerical cognition. Ultimately, relying on property categorization enhances the explanatory power, comparability, and generalizability of GMs’ methodology and empirical findings (Almaatouq et al., 2022; Broniatowski, 2021; Makel & Plucker, 2017; Rocca & Yarkoni, 2021; Yarkoni, 2022).

We examined GMs’ control of physical properties using five identified control types. DeWind et al. (2015) introduced an alternative control approach that combines GM design with modeling. Note that this approach diverges from common control types such as congruency or saliency. Their control was reduced to systematically varying eight properties. Later, they modeled their effects on performance. The model represented physical properties and the items’ quantity across orthogonal dimensions of size, spacing, and number, which span the array’s characteristic space. This methodology of DeWind et al. is less constrained by traditional control types and avoids the impracticality of controlling numerous properties. Across the years, several GMs criticized DeWind et al.’s model and stimuli. Guillaume et al. (2020) questioned the perceptual validity of their abstract dimensions of size and spacing. Salti et al. (2017) and De Marco and Cutini (2020) pointed out that the model necessitates using homogeneous-sized items; otherwise, its dimensionality collapses. Moreover, their approach does not facilitate a detailed examination of interactions between different stimuli dimensions, such as saliency and numerical ratio, which, as we presented earlier, might produce varying effects. Fundamentally, DeWind et al.’s method deviates from other GMs, challenging our standard mapping based on tagging of control types. However, it notably pre-implements many of our subsequent recommendations (see section Conclusions and Guidelines), thereby enhancing the interpretability and comparability of the results.

The multifaceted control of non symbolic numerical stimuli

In this review, we inspected the GMs from a prism of the physical properties they controlled for. The choice to focus on physical properties relies on several motivations. First, physical properties are intertwined with numerosity, and therefore, the initial control choices have major effects on the experimental results. Second, there is a wide perspective and a wealth of frameworks for controlling physical properties, whereas the need for control unifies researchers from conflicting views of numerical cognition. Third, we identified that the choice to control certain physical properties creates unique correlations between different physical properties. Such choice of control makes the stimuli in different studies incomparable. However, our focus on physical properties necessarily limited our span and created priors for the rest of the process. There are other aspects that could be considered when designing a GM, but they were not reviewed here.

Throughout this review, we considered all non symbolic numerical-comparative judgments regardless of the stimuli-presentation format. There are several presentation formats of non symbolic numerical-comparison tasks. As presented in Figure 13, the two compared arrays can be displayed simultaneously or sequentially. The two compared arrays can also be displayed in a spatially separated or spatially overlapping (i.e., intermixed) format. Studies have demonstrated discrepancies in reaction times, accuracy, and Weber fractions based on the presentation format used for the same stimuli arrays (Braham et al., 2018; Kuzmina & Malykh, 2022; Norris & Castronovo, 2016; Smets et al., 2016). Although investigating reasons for these discrepancies is outside our scope, the choice of GMs contributes to variability between presentation formats (Gilmore et al., 2011; but see Price et al., 2012). Consequently, appropriate GM selection is critical regardless of presentation type.

Fig. 13.

Two stimuli-presentation formats. The stimuli can be presented using (left) intermixed and (right) separated display modes. Arrays contain eight blue and 16 yellow dots, with a numerical ratio of 0.5. The total surface area is equal in both arrays (neutral congruency), and average surface area ratio was equated to the numerical ratio (saliency matched). Importantly, these modes of presentation exhibit different results, modulated by, among other things, control of the stimuli physical properties.
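The arithmetic behind the caption can be checked directly. Writing $T$ for the common total surface area of the two arrays (a symbol we introduce here for illustration), the average item areas are $T/8$ and $T/16$, so their smaller-over-larger ratio equals the numerical ratio:

```latex
\[
\frac{\bar{a}_{16}}{\bar{a}_{8}}
  = \frac{T/16}{T/8}
  = \frac{8}{16}
  = 0.5
\]
```

Equating total surface area across arrays therefore fixes the average surface area ratio at the numerical ratio, which is why this example is both neutral in congruency and saliency matched.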

The effect and relevance of extrinsic physical properties in intermixed displays is unclear. In particular, different definitions of density would lead to different conclusions about the relationship between density and presentation format. The very influential and popular Panamath GM (Halberda et al., 2008) controlled for items’ surface area but not for extrinsic properties. Halberda et al. (2008) suggested that in intermixed displays, controlling for extrinsic properties, such as density, interdistance, and convex hull, is negligible, but they did not provide a rationale for this claim. How should one consider density in an intermixed display such as the one shown in Figure 13? If density is calculated as the inverse of average occupancy, $\frac{\text{Number}}{\text{Convex hull area}}$, then one can ask whether the overlapping arrays share the same convex hull or have separate convex hull areas. If density is calculated as $\frac{\text{Total surface area}}{\text{Convex hull area}}$, then again, how should one consider convex hull area (when total surface area is controlled for)? How to consider open space in intermixed displays is also unclear. Open space is the perceived aggregated blank space not containing any items in an array. How should we consider the open space between the items of one array if this space is occupied by the items of the other array? In contrast to these properties, which relate to the items’ sizes and locations, interdistance might be less problematic to consider. Interdistance depends only on the items’ locations and can be calculated in one array independently of the other.
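The ambiguity about the convex hull in intermixed displays can be illustrated numerically. The sketch below computes density as number divided by convex hull area, once with the hull of an array's own items and once with a hull shared by both intermixed arrays. Several things here are our assumptions rather than any GM's definition: the hull is taken over item centers (not item outlines), the coordinates are invented, and the function names are hypothetical. The hull is found with Andrew's monotone chain algorithm, and its area with the shoelace formula.

```python
Point = tuple[float, float]


def convex_hull(points: list[Point]) -> list[Point]:
    """Andrew's monotone chain; hull vertices in counter-clockwise order."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    def half_hull(seq: list[Point]) -> list[Point]:
        chain: list[Point] = []
        for p in seq:
            # Drop points that would make a clockwise or straight turn.
            while len(chain) >= 2 and cross(chain[-2], chain[-1], p) <= 0:
                chain.pop()
            chain.append(p)
        return chain[:-1]

    return half_hull(pts) + half_hull(pts[::-1])


def polygon_area(vertices: list[Point]) -> float:
    """Shoelace formula; returns 0.0 for fewer than three vertices."""
    n = len(vertices)
    if n < 3:
        return 0.0
    total = 0.0
    for i in range(n):
        x1, y1 = vertices[i]
        x2, y2 = vertices[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return abs(total) / 2


def number_density(items: list[Point], hull_points: list[Point]) -> float:
    """Number of items divided by the hull area of `hull_points`."""
    return len(items) / polygon_area(convex_hull(hull_points))


# Hypothetical intermixed display (item centers, arbitrary units).
blue = [(0, 0), (10, 0), (10, 10), (0, 10)]
yellow = [(2, 2), (8, 2), (8, 8), (2, 8), (5, 5)]

own_hull = number_density(yellow, yellow)            # hull area 36
shared_hull = number_density(yellow, blue + yellow)  # hull area 100
print(round(own_hull, 3), round(shared_hull, 3))
# prints: 0.139 0.05
```

In this toy display, the density of the yellow array nearly triples depending on which hull is used, so a GM's control statement is uninterpretable unless it says which hull it means.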

Several studies investigated the effect of extrinsic physical properties using intermixed displays. Participants were generally slower and less accurate on intermixed trials compared with spatially separated displays (Braham et al., 2018; Kuzmina & Malykh, 2022; Norris & Castronovo, 2016; Price et al., 2012). Braham et al. (2018) used Panamath and found that convex hull area and density congruency predicted accuracy, although density and convex hull were not directly controlled. Likewise, Kuzmina and Malykh (2022) tested intermixed displays. They found higher accuracy and faster responses when convex hull was incongruent with numerosity. This aligns with evidence that Panamath’s convex hull strongly correlates with numerosity (Clayton et al., 2015). Further study on the relevance of physical properties in intermixed displays is warranted. Tracking eye movements during intermixed trials may elucidate participants’ processing of extrinsic properties (Odic & Halberda, 2015). All in all, appropriately controlling physical properties remains critical across presentation formats. Different stimuli sets produce distinct correlations between numerical and physical dimensions regardless of presentation format.

Top-down strategic effects and learning effects might bias participants to attend to different physical properties according to task goals (Salti et al., 2019). Specifically, using the same stimuli but changing the task goals, such as changing the judged magnitude, influences the results. For example, changing the task from a numerical task in which the participants’ goal is to choose the more numerous array to a physical task in which the participants’ goal is to choose the larger array changes the effect of physical properties on performance (Katzin et al., 2019; Leibovich et al., 2015; Leibovich-Raveh et al., 2018; Salti et al., 2017; Tomlinson et al., 2020). Avitan et al. (2022) found that changing the task goal to choose a smaller magnitude (quantity or area) instead of the larger magnitude modulated participants’ performance and interacted with task type (numerical or physical). Leibovich-Raveh et al. (2018) manipulated participants’ emphasis on accuracy or speed during a numerical-comparison task and discovered that the effect of different physical properties was dependent on participants’ emphasis on accuracy or speed. Furthermore, the experimental block design should be considered when designing a numerical-comparison experiment. Pekár and Kinder (2020) generated their stimuli using Gebuis and Reynvoet’s (2011a) GM and found increased congruency effects when the different stimuli sets were mixed in the same block compared with blocked display of stimuli. Tokita and Ishiguchi (2010) found that the effect of physical properties was modulated by practice. Thus, different block designs and amounts of practice induce different effects and should be carefully compared with one another.

The ability to compare GMs will contribute to the understanding of symbolic numerical abilities and not only to non symbolic ones. It has been found that performance in non symbolic numerical tasks can predict a range of symbolic numerical abilities. In adults, the acuity of non symbolic comparisons is associated with age-appropriate mathematical tests, such as arithmetic tests or numerical word-problem-solving tests (e.g., Braham et al., 2018; Lourenco et al., 2012; Reynvoet et al., 2021). There is also evidence that non symbolic numerical training can induce symbolic arithmetic improvements (e.g., Au et al., 2018). In children, non symbolic comparison was correlated with math achievement in childhood and adolescence (Halberda et al., 2008; Holloway & Ansari, 2009; Starr et al., 2013). Neuropsychological evidence comes from children with developmental dyscalculia showing impaired non symbolic comparison performance (Piazza et al., 2010). Despite this body of literature, the results have been inconsistent. Several studies have suggested non symbolic comparisons are not reliable or valid measures of symbolic numerical abilities (Price et al., 2012; Sasanguie et al., 2014). It was suggested that the discrepancy between studies is the result of methodological differences, especially differences in the stimuli GM (Gilmore et al., 2011; Norris & Castronovo, 2016; Reynvoet et al., 2021). Reynvoet et al. (2021) found that non symbolic comparison performance was related to symbolic abilities only when using Panamath stimuli (Halberda et al., 2008; Halberda & Feigenson, 2008) but not Gebuis and Reynvoet (2011a) stimuli. Therefore, GM design should also be considered when predicting symbolic numerical abilities.

The aspects mentioned above go beyond the consideration of physical properties, demonstrating the need to choose a prism through which a GM is designed. This prism is limited because one cannot consider all aspects. Therefore, there is a need to create a way to compare results coming from different GMs.

Conclusion and Guidelines

Taken together, the variety of different GMs, properties, and control types reflects a theoretical and methodological wealth. This wealth makes it hard to maintain a common language within the field, but its advantages are apparent. The field is thriving, given the increasing number of articles, and it has a large number of avenues and ideas that could be pursued.

We see the diversity in the field as a strength and do not advocate using one GM or any other way to impose a common language. Nonetheless, currently, it is difficult to compare GMs and results across experiments using different GMs. Accordingly, we recommend that authors provide explicit lists of controlled properties and control types, preferably using common terminology from the literature. Clear, consistent verbal and mathematical definitions should be supplied for all controlled properties and control types. These definitions should encompass all study properties of interest—including uncontrolled ones—relating to hypotheses, conclusions, or known biases of interest. Articulating control statements and definitions in this manner enhances comparability, replicability, interpretability, and explanatory power, making previous ideas translatable in light of newer views (Almaatouq et al., 2022; Broniatowski, 2021; Makel & Plucker, 2017; Rocca & Yarkoni, 2021; Yarkoni, 2022).

We recommend GM designers contemplate their own control choices and consider their implications. Furthermore, GMs should also provide explicit justifications for control choices grounded in prior literature and theory and discuss the implications of those control decisions. A solid and comprehensive foundation offers several advantages: (a) It serves as a bridge between theory and methodology, (b) it provides a prism through which results are interpreted, (c) it allows considering theoretical and methodological trade-offs, (d) it enhances GMs’ comparability by providing principles for comparing between GMs, and (e) it enhances GMs’ generalizability, mainly by allowing the reconciliation of novel ideas within the context of previous GMs.

We recommend providing access to the code package used to generate the stimuli. Providing access to the original code will allow more careful understanding of property calculations and control type logic. In addition, the control of some properties requires imposing numerous constraints on the stimuli, and providing the code will aid in understanding them. Finally, providing the code will allow testing of further questions and issues not tested in the original work.

The inherent associations between physical properties restrict the number that can feasibly be controlled in one study. However, property associations can inform control choices. We recommend authors consider the mathematical associations between their properties of interest a priori. Properties differ in independence given associations with other properties. A priori analysis of mathematical associations enables prioritizing properties of greater independence or relevance while accounting for realistic constraints on simultaneous control in one study. The literature offers some notable discussions of property associations (e.g., De Marco & Cutini, 2020; DeWind et al., 2015; Piazza et al., 2002; Salti et al., 2017).

Controlling some properties places numerous constraints on stimuli. Given complex intercorrelations, predicting all impacts is challenging, and it is very hard to predict how controlling one or more properties might affect other properties. Accordingly, we suggest that authors should perform postproduction analysis and examine the correlations among the different properties after the stimuli are produced. Postproduction analysis would permit understanding of how the property associations are realized in the actual stimuli. It can measure multicollinearity and other factors that can bias conclusions. Most critically, it allows evaluating whether the design and the results interpretations align with hypotheses, establishing result validity.
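A postproduction analysis of this kind can start from something as simple as a pairwise correlation table over the generated stimuli. The following is a minimal sketch under stated assumptions: the property names and values are invented, Pearson's r is only one of several possible association measures, and the helper names are hypothetical.

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two samples of equal length >= 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("a constant property has no defined correlation")
    return sxy / math.sqrt(sxx * syy)


def property_correlations(
    props: dict[str, list[float]],
) -> dict[tuple[str, str], float]:
    """Correlation for every unordered pair of properties."""
    return {
        (a, b): pearson(props[a], props[b]) for a, b in combinations(props, 2)
    }


# Hypothetical property values measured on five generated arrays.
stimuli = {
    "number": [8, 10, 12, 16, 20],
    "total_surface_area": [400.0, 380.0, 410.0, 395.0, 405.0],
    "convex_hull_area": [900.0, 1100.0, 1250.0, 1700.0, 2100.0],
}

for pair, r in property_correlations(stimuli).items():
    print(pair, round(r, 2))
```

A high correlation between number and a property that was meant to be controlled would flag a confound before any behavioral data are interpreted.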

We suggest GM designers consider and provide comprehensive details on control distributions because full stimuli-set statistics can bias results. Alongside numerical ranges, report at least one central tendency and dispersion measure of the controls. For asymmetric distributions, include at least two central tendency measures. If applicable, state the sampling distribution used. Provide not only summaries per array but also characterize overall set properties across arrays. Thorough distribution reporting would allow both GM designers themselves and fellow researchers to assess possible confounds and effects from variability, skewness, and so on.
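The reporting recommended above can be produced with a short summary routine. This sketch uses only Python's standard `statistics` module; the property name and values are invented, and the choice of mean, median, and sample standard deviation is our assumption about what a minimal report might contain.

```python
import statistics


def describe(values: list[float]) -> dict[str, float]:
    """Range, two central tendency measures, and one dispersion measure."""
    if len(values) < 2:
        raise ValueError("need at least two values")
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "median": statistics.median(values),
        "sd": statistics.stdev(values),  # sample standard deviation
    }


# Hypothetical convex hull areas (px^2) across the whole stimuli set.
hull_areas = [900.0, 950.0, 1000.0, 1100.0, 2600.0]
summary = describe(hull_areas)
for name, value in summary.items():
    print(f"{name}: {value:.1f}")
```

Reporting the mean and median together makes skew visible: in this invented set, one large hull pulls the mean well above the median, which a range alone would not reveal.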

In addition, we strongly recommend providing the actual stimuli image files, even if generated procedurally. Stimuli should be paired with behavioral- and neural-response data (where applicable) for each trial. This way, new ideas could be tested by reanalyzing previous studies (e.g., Shilat et al., 2021). Moreover, supplying stimuli with responses allows extracting and examining statistical image features and correlations post hoc—yielding insights even if such data were not originally documented. Overall, releasing full stimulus sets and trial-level responses significantly enhances replicability, analytical flexibility, interpretability, and potential cumulative knowledge growth.

These recommendations are in line with open-source practices, which increase study reliability and provide community-based knowledge sharing (AlMarzouq et al., 2005; Makel & Plucker, 2017). Hopefully, these recommendations will enable researchers to better understand results obtained using different methods and provide more generalizability and reproducibility to the field (Almaatouq et al., 2022; Broniatowski, 2021; Makel & Plucker, 2017; Rocca & Yarkoni, 2021; Schooler, 2014). These recommendations allow diversity on one hand and an ability to examine old data through new prisms on the other hand. Suggested guidelines for GMs appear in Box 1. These guidelines provide a framework for stimuli control and can be implemented in other cognitive-research domains involving perceptual decisions.

Box 1.

Guidelines

1. Provide control statements: Provide an explicit list of all controlled properties and control types.
2. Provide control definitions: Provide a list of explicit and consistent verbal and mathematical definitions.
3. Consider and provide justifications and implications: Consider and provide grounded justifications for the control choices and discuss their implications.
4. Provide stimuli-generation code: Provide the code package used to generate the stimuli.
5. Consider associations: Consider property associations when choosing controls.
6. Perform postproduction analysis: Examine the correlations among different properties after producing the stimuli.
7. Consider and report stimuli distribution: Consider the stimuli distribution and provide comprehensive details on distributions, which allows assessing possible confounds and effects.
8. Provide stimuli and responses: Provide the actual stimuli images alongside the corresponding responses to each stimulus.

Footnotes

Acknowledgements

We thank Ram Frost and Tali Leibovich-Raveh for their valuable insights. We also thank Nurit Gronau, Roy Shoval, Tal Makovski, and Maayan Trzewik for their valuable advice. Furthermore, we thank Adi Gabzu for her important insights and help with the data curation and tagging. Y. Shilat is grateful to Noga and Gefen for their support and encouragement. We wish to thank Desiree Meloul for her enlightening insights and editing the drafts of the article. Finally, we express our gratitude to the anonymous reviewers for their thorough reading of our article and their important recommendations.

Transparency

Action Editor: Rogier Kievit

Editor: David A. Sbarra

Author Contributions

Yoel Shilat: Conceptualization; Data curation; Formal analysis; Investigation; Methodology; Project administration; Software; Validation; Visualization; Writing – original draft; Writing – review & editing.

Avishai Henik: Conceptualization; Methodology; Project administration; Supervision; Writing – review & editing.

Hanit Galili: Investigation; Visualization; Writing – review & editing.

Shir Wasserman: Data curation; Investigation; Validation; Visualization; Writing – review & editing.

Alon Salzmann: Data curation; Investigation; Software; Visualization; Writing – review & editing.

Moti Salti: Conceptualization; Project administration; Supervision; Writing – original draft; Writing – review & editing.

Supplemental Material

Additional supporting information can be found at

References

Abreu De Souza

Remigio Gamba

Pedrini

(Eds.). (2018). Multi-modality imaging: Applications and computational techniques. Springer. https://doi.org/10.1007/978-3-319-98974-7

Akins

(Ed.). (1996). Perception. Oxford University Press.

Alcock

Ansari

Batchelor

Bisson

M.-J.

De Smedt

Gilmore

Göbel

S. M.

Hannula-Sormunen

Hodgen

Inglis

Jones

Mazzocco

McNeil

Schneider

Simms

Weber

(2016). Challenges in mathematical cognition: A collaboratively-derived research agenda. Journal of Numerical Cognition, 2(1), 20–41. https://doi.org/10.5964/jnc.v2i1.10

Allïk

Tuulmets

(1991). Occupancy model of perceived numerosity. Perception & Psychophysics, 49(4), 303–314. https://doi.org/10.3758/BF03205986

Almaatouq

Griffiths

T. L.

Suchow

J. W.

Whiting

M. E.

Evans

Watts

D. J.

(2022). Beyond playing 20 questions with nature: Integrative experiment design in the social and behavioral sciences. Behavioral and Brain Sciences, 47, Article e33. https://doi.org/10.1017/S0140525X22002874

AlMarzouq

Zheng

Rong

Grover

(2005). Open source: Concepts, benefits, and challenges. Communications of the Association for Information Systems, 16, Article 37. https://doi.org/10.17705/1CAIS.01637

Anderson

S. M.

Byer

O. D.

(2017). Finding polygonal areas with the corset theorem. The College Mathematics Journal, 48(3), 171–178. https://doi.org/10.4169/college.math.j.48.3.171

Apthorp

Bell

(2015). Symmetry is less than meets the eye. Current Biology, 25(7), R267–R268. https://doi.org/10.1016/j.cub.2015.02.017

Arend

L. E.

Spehar

(1993). Lightness, brightness, and brightness contrast: 1. Illuminance variation. Perception & Psychophysics, 54(4), 446–456. https://doi.org/10.3758/BF03211767

10.

Attout

Noël

M.-P.

Vossius

Rousselle

(2017). Evidence of the impact of visuo-spatial processing on magnitude representation in 22q11.2 microdeletion syndrome. Neuropsychologia, 99, 296–305. https://doi.org/10.1016/j.neuropsychologia.2017.03.023

11.

Jaeggi

S. M.

Buschkuehl

(2018). Effects of non-symbolic arithmetic training on symbolic arithmetic and the approximate number system. Acta Psychologica, 185, 1–12. https://doi.org/10.1016/j.actpsy.2018.01.005

12.

Aulet

L. S.

Lourenco

S. F.

(2021). The relative salience of numerical and non-numerical dimensions shifts over development: A re-analysis of Tomlinson, DeWind, and Brannon (2020). Cognition, 210, Article 104610. https://doi.org/10.1016/j.cognition.2021.104610

13.

Aulet

L. S.

Lourenco

S. F.

(2022). No intrinsic number bias: Evaluating the role of perceptual discriminability in magnitude categorization. Developmental Science, 26(2), Article e13305. https://doi.org/10.1111/desc.13305

14.

Avitan

Galili

Henik

(2022). Less is more? Instructions modulate the way we interact with continuous features in non-symbolic dot-array comparison tasks. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.4065685

15.

Bays

P. M.

Catalao

R. F. G.

Husain

(2009). The precision of visual working memory is set by allocation of a shared resource. Journal of Vision, 9(10), Article 7. https://doi.org/10.1167/9.10.7

16.

Bird

(2000). Thomas Kuhn. Acumen Publishing.

17.

Bird

Tobin

(2018). Natural kinds. In Zalta

E. N.

(Ed.), The Stanford encyclopedia of philosophy. Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/spr2018/entries/natural-kinds/

18.

Blandford

Cox

A. L.

Cairns

(2008). Controlled experiments. In Cairns

Cox

A. L.

(Eds.), Research methods for human-computer interaction (pp. 1–16). Cambridge University Press. https://doi.org/10.1017/CBO9780511814570.002

19.

Bonny

J. W.

Lourenco

S. F.

(2023). Electrophysiological comparison of cumulative area and non-symbolic number judgments. Brain Sciences, 13(6), Article 975. https://doi.org/10.3390/brainsci13060975

20.

Boreman

G. D.

(2021). Modulation transfer function in optical and electro-optical systems (2nd ed.). SPIE Press.

21.

Boring

E. G.

(1954). The nature and history of experimental control. The American Journal of Psychology, 67(4), 573–589. https://doi.org/10.2307/1418483

22.

Braham

E. J.

Elliott

Libertus

M. E.

(2018). Using hierarchical linear models to examine approximate number system acuity: The role of trial-level and participant-level characteristics. Frontiers in Psychology, 9, Article 2081. https://doi.org/10.3389/fpsyg.2018.02081

23.

Broniatowski

D. A.

(2021). Psychological foundations of explainability and interpretability in artificial intelligence. National Institute of Standards and Technology. https://doi.org/10.6028/NIST.IR.8367

24.

Chen

(2014). Association between individual differences in non-symbolic number acuity and math performance: A meta-analysis. Acta Psychologica, 148, 163–172. https://doi.org/10.1016/j.actpsy.2014.01.016

25.

Clayton

Gilmore

Inglis

(2015). Dot comparison stimuli are not all alike: The effect of different visual controls on ANS measurement. Acta Psychologica, 161, 177–184. https://doi.org/10.1016/j.actpsy.2015.09.007

26.

Clewell

S. F.

Haidemos

(1983). Organizational strategies to increase comprehension. Reading World, 22(4), 314–321. https://doi.org/10.1080/19388078309557721

27.

Cohen Kadosh

Tzelgov

Henik

(2008). A synesthetic walk on the mental number line: The size effect. Cognition, 106(1), 548–557. https://doi.org/10.1016/j.cognition.2006.12.007

28.

Cole

Leide

(2006). A cognitive framework for human information behavior: The place of metaphor in human information organizing behavior. In Spink

Cole

(Eds.), New directions in human information behavior (pp. 171–202). Springer.

29.

Coren

Girgus

J. S.

(1980). Principles of perceptual organization and spatial distortion: The gestalt illusions. Journal of Experimental Psychology: Human Perception and Performance, 6(3), 404–412. https://doi.org/10.1037/0096-1523.6.3.404

30.

Dakin

S. C.

Tibber

M. S.

Greenwood

J. A.

Kingdom

F. A. A.

Morgan

M. J.

(2011). A common visual metric for approximate number and density. Proceedings of the National Academy of Sciences, USA, 108(49), 19552–19557. https://doi.org/10.1073/pnas.1113195108

31.

Danner

H. G.

(2014). A thesaurus of English word roots. Rowman & Littlefield.

32.

Davis

Pérusse

(1988). Numerical competence in animals: Definitional issues, current evidence, and a new research agenda. Behavioral and Brain Sciences, 11(4), 561–579. https://doi.org/10.1017/S0140525X00053437

33.

Dehaene

(1997). The number sense: How the mind creates mathematics. Oxford University Press.

34.

Dehaene

Izard

Piazza

(2005). Control over non-numerical parameters in numerosity experiments [Matlab program documentation]. https://www.unicog.org/publications/DehaenePiazza_ControlOverNonNumericalParameters_DocumentationDotsGeneration.pdf

35.

De Marco

Cutini

(2020). Introducing CUSTOM: A customized, ultraprecise, standardization-oriented, multipurpose algorithm for generating nonsymbolic number stimuli. Behavior Research Methods, 52(4), 1528–1537. https://doi.org/10.3758/s13428-019-01332-z

36.

De Valois

K. K.

Switkes

(1980). Spatial frequency specific interaction of dot patterns and gratings. Proceedings of the National Academy of Sciences, USA, 77(1), 662–665. https://doi.org/10.1073/pnas.77.1.662

37.

DeWind

N. K.

Adams

G. K.

Platt

Brannon

(2015). Modeling the approximate number system to quantify the contribution of visual stimulus features. Cognition, 142, 247–265. https://doi.org/10.1016/j.cognition.2015.05.016

38.

DeWind

N. K.

Brannon

E. M.

(2016). Significant inter-test reliability across approximate number system assessments. Frontiers in Psychology, 7, Article 310. https://doi.org/10.3389/fpsyg.2016.00310

39.

DeWind

N. K.

Brannon

E. M.

(2019, January 7). Measuring congruence effects in nonsymbolic number comparison: The importance of the degree of congruence. Methods in Numerical Cognition Workshop, Eötvös Loránd University, Budapest. https://osf.io/ds2h7

40.

Dos Santos

C. F.

(2022). Re-establishing the distinction between numerosity, numerousness, and number in numerical cognition. Philosophical Psychology, 35(8), 1152–1180. https://doi.org/10.1080/09515089.2022.2029387

41.

Durgin

F. H.

(1995). Texture density adaptation and the perceived numerosity and distribution of texture. Journal of Experimental Psychology: Human Perception and Performance, 21(1), 149–169. https://doi.org/10.1037/0096-1523.21.1.149

42.

Efford

(2000). Digital image processing: A practical introduction using Java. Addison-Wesley.

43.

Egner

(2007). Congruency sequence effects and cognitive control. Cognitive, Affective, & Behavioral Neuroscience, 7(4), 380–390. https://doi.org/10.3758/CABN.7.4.380

44.

Eriksen

C. W.

(1980). The use of a visual mask may seriously confound your experiment. Perception & Psychophysics, 28(1), 89–92. https://doi.org/10.3758/BF03204322

45.

Estes

B. W.

(1961). Judgment of size in relation to geometric shape. Child Development, 32(2), 277–286. https://doi.org/10.2307/1125941

46.

Federer

L. M.

Belter

C. W.

Joubert

D. J.

Livinski

Y.-L.

Snyders

L. N.

Thompson

(2018). Data sharing in PLOS ONE: An analysis of data availability statements. PLOS ONE, 13(5), Article e0194768. https://doi.org/10.1371/journal.pone.0194768

47.

Fisher

G. H.

Foster

J. J.

(1968). Apparent sizes of different shapes and the facility with which they can be identified. Nature, 219(5154), 653–654. https://doi.org/10.1038/219653c0

48.

French

R. S.

(1953). The discrimination of dot patterns as a function of number and average separation of dots. Journal of Experimental Psychology, 46(1), 1–9. https://doi.org/10.1037/h0059543

49.

Fricke

(2018). Semantic scholar. Journal of the Medical Library Association, 106(1). https://doi.org/10.5195/jmla.2018.280

50.

Frith

C. D.

Frith

(1972). The solitaire illusion: An illusion of numerosity. Perception & Psychophysics, 11(6), 409–410. https://doi.org/10.3758/BF03206279

51.

Gebuis

Reynvoet

(2011a). Generating nonsymbolic number stimuli. Behavior Research Methods, 43(4), 981–986. https://doi.org/10.3758/s13428-011-0097-5

52.

Gebuis

Reynvoet

(2011b). The interplay between nonsymbolic number and its continuous visual properties. Journal of Experimental Psychology: General, 141(4), 642–648. https://doi.org/10.1037/a0026218

53.

Gentner

Kurtz

K. J.

(2005). Relational categories. In Ahn

W.-K.

Goldstone

R. L.

Love

B. C.

Markman

A. B.

Wolff

(Eds.), Categorization inside and outside the laboratory: Essays in honor of Douglas L. Medin (pp. 151–175). American Psychological Association. https://doi.org/10.1037/11156-009

54.

Gilmore

Attridge

Inglis

(2011). Measuring the approximate number system. Quarterly Journal of Experimental Psychology, 64(11), 2099–2109. https://doi.org/10.1080/17470218.2011.574710

55.

Ginsburg

(1976). Effect of item arrangement on perceived numerosity: Randomness vs regularity. Perceptual and Motor Skills, 43(2), 663–668. https://doi.org/10.2466/pms.1976.43.2.663

56.

Groeneveld

R. A.

Meeden

(1984). Measuring skewness and kurtosis. The Statistician, 33(4), 391–394. https://doi.org/10.2307/2987742

57.

Guillaume

Schiltz

Van Rinsveld

(2020). NASCO: A new method and program to generate dot arrays for non-symbolic number comparison tasks. Journal of Numerical Cognition, 6(1), 129–147. https://doi.org/10.5964/jnc.v6i1.231

58.

Gusenbauer

(2019). Google Scholar to overshadow them all? Comparing the sizes of 12 academic search engines and bibliographic databases. Scientometrics, 118(1), 177–214. https://doi.org/10.1007/s11192-018-2958-5

59.

Gusenbauer

Haddaway

N. R.

(2020). Which academic search systems are suitable for systematic reviews or meta-analyses? Evaluating retrieval qualities of Google Scholar, PubMed, and 26 other resources. Research Synthesis Methods, 11(2), 181–217. https://doi.org/10.1002/jrsm.1378

60.

Guy

Medioni

(1993). Inferring global perceptual contours from local features. In Proceedings of IEEE Conference on Computer Vision and Pattern Recognition (pp. 786–787). IEEE. https://doi.org/10.1109/CVPR.1993.341175

61.

Halberda

Feigenson

(2008). Developmental change in the acuity of the “number sense”: The approximate number system in 3-, 4-, 5-, and 6-year-olds and adults. Developmental Psychology, 44(5), 1457–1465. https://doi.org/10.1037/a0012682

62.

Halberda

Eisinger

(2011). Are you sensing something? https://panamath.org/index.php

63.

Halberda

Mazzocco

M. M. M.

Feigenson

(2008). Individual differences in non-verbal number acuity correlate with maths achievement. Nature, 455(7213), 665–668. https://doi.org/10.1038/nature07246

64.

Hoffman

D. D.

(2018). The interface theory of perception. In Wixted

J. T.

(Ed.), Stevens’ handbook of experimental psychology and cognitive neuroscience (Vol. 2, pp. 1–24). Wiley. https://doi.org/10.1002/9781119170174.epcn216

65.

Holloway

I. D.

Ansari

(2009). Mapping numerical magnitudes onto symbols: The numerical distance effect and individual differences in children’s mathematics achievement. Journal of Experimental Child Psychology, 103(1), 17–29. https://doi.org/10.1016/j.jecp.2008.04.001

66.

Holmbeck

G. N.

(1997). Toward terminological, conceptual, and statistical clarity in the study of mediators and moderators: Examples from the child-clinical and pediatric psychology literatures. Journal of Consulting and Clinical Psychology, 65(4), 599–610. https://doi.org/10.1037/0022-006X.65.4.599

67.

Huntley-Fenner

Cannon

(2000). Preschoolers’ magnitude comparisons are mediated by a preverbal analog mechanism. Psychological Science, 11(2), 147–152. https://doi.org/10.1111/1467-9280.00230

68.

Jacob

(2004). Classification and categorization: A difference that makes a difference. Library Trends, 52, 515–540.

69.

Jansen

B. R. J.

Hofman

A. D.

Straatemeier

van Bers

B. M. C. W.

Raijmakers

M. E. J.

van der Maas

H. L. J.

(2014). The role of pattern recognition in children’s exact enumeration of small numbers. British Journal of Developmental Psychology, 32(2), 178–194. https://doi.org/10.1111/bjdp.12032

70.

Jones

(2015). Artificial-intelligence institute launches free science search engine. Nature. https://doi.org/10.1038/nature.2015.18703

71.

Jonker

Pennink

B. W.

(2010). The essence of research methodology. Springer. https://doi.org/10.1007/978-3-540-71659-4_2

72.

Katzin

Salti

Henik

(2019). Holistic processing of numerical arrays. Journal of Experimental Psychology: Learning, Memory, and Cognition, 45(6), 1014–1022. https://doi.org/10.1037/xlm0000640

73.

Kaur

Sharma

Mishra

Sinhababu

Chakravarty

(2022). Visual research discovery using connected papers: A use case of blockchain in libraries. The Serials Librarian, 83(2), 186–196. https://doi.org/10.1080/0361526X.2022.2142722

74.

Kersey

A. J.

Aulet

L. S.

Cantlon

J. F.

(2022). Emergence of counting in the brains of 3- to 5-year-old children. bioRxiv. https://doi.org/10.1101/2022.12.13.520249

75.

Keylock

C. J.

(2005). Simpson diversity and the Shannon-Wiener index as special cases of a generalized entropy. Oikos, 109(1), 203–207. https://doi.org/10.1111/j.0030-1299.2005.13735.x

76.

Kidwell

M. C.

Lazarević

L. B.

Baranski

Hardwicke

T. E.

Piechowski

Falkenberg

L.-S.

Kennett

Slowik

Sonnleitner

Hess-Holden

Errington

T. M.

Fiedler

Nosek

B. A.

(2016). Badges to acknowledge open practices: A simple, low-cost, effective method for increasing transparency. PLoS Biology, 14(5), Article e1002456. https://doi.org/10.1371/journal.pbio.1002456

77.

Klahr

(1973). A production system for counting, subitizing and adding. In Chase

W. G.

(Ed.), Visual information processing (pp. 527–546). Academic Press. https://doi.org/10.1016/B978-0-12-170150-5.50017-2

78.

Klein

Doll

(2021). Tests on asymmetry for ordered categorical variables. Journal of Applied Statistics, 48(7), 1180–1198. https://doi.org/10.1080/02664763.2020.1757045

79.

Krajcsi

Kojouharova

(2023). Stimulus frequency alone can account for the size effect in number comparison. Acta Psychologica, 232, Article 103817. https://doi.org/10.1016/j.actpsy.2022.103817

80.

Kuhn

T. S.

(1970). The structure of scientific revolutions. University of Chicago Press.

81.

Kuzmina

Malykh

(2022). The effect of visual parameters on nonsymbolic numerosity estimation varies depending on the format of stimulus presentation. Journal of Experimental Child Psychology, 224, Article 105514. https://doi.org/10.1016/j.jecp.2022.105514

82.

Lande

(1996). Statistics and partitioning of species diversity, and similarity among multiple communities. Oikos, 76(1), 5–13. https://doi.org/10.2307/3545743

83.

Lea

(Ed.). (2008). Oxford learner’s thesaurus: A dictionary of synonyms. Oxford University Press.

84.

Leibovich

Henik

(2013). Magnitude processing in non-symbolic stimuli. Frontiers in Psychology, 4, Article 375. https://doi.org/10.3389/fpsyg.2013.00375

85.

Leibovich

Henik

Salti

(2015). Numerosity processing is context driven even in the subitizing range: An fMRI study. Neuropsychologia, 77, 137–147. https://doi.org/10.1016/j.neuropsychologia.2015.08.016

86.

Leibovich

Katzin

Harel

Henik

(2017). From “sense of number” to “sense of magnitude”: The role of continuous magnitudes in numerical cognition. Behavioral and Brain Sciences, 40, Article e164. https://doi.org/10.1017/S0140525X16000960

87.

Leibovich-Raveh

Stein

Henik

Salti

(2018). Number and continuous magnitude processing depends on task goals and numerosity ratio. Journal of Cognition, 1(1), Article 19. https://doi.org/10.5334/joc.22

88.

Lewis

(1983). Extrinsic properties. Philosophical Studies: An International Journal for Philosophy in the Analytic Tradition, 44(2), 197–200. https://www.jstor.org/stable/4319628

89.

Lourenco

S. F.

Bonny

J. W.

Fernandez

E. P.

Rao

(2012). Nonsymbolic number and cumulative area representations contribute shared and unique variance to symbolic math competence. Proceedings of the National Academy of Sciences, USA, 109(46), 18737–18742. https://doi.org/10.1073/pnas.1207212109

90.

Mao

Zeng

Wang

Zhou

Mou

(2023). The developmental relationship between nonsymbolic and symbolic fraction abilities. Journal of Experimental Child Psychology, 232, Article 105666. https://doi.org/10.1016/j.jecp.2023.105666

91.

Lyons

I. M.

Price

G. R.

Vaessen

Blomert

Ansari

(2014). Numerical predictors of arithmetic success in grades 1-6. Developmental Science, 17(5), 714–726. https://doi.org/10.1111/desc.12152

92.

MacGillivray

H. L.

(1986). Skewness and asymmetry: Measures and orderings. The Annals of Statistics, 14(3), 994–1011. https://doi.org/10.1214/aos/1176350046

93.

Makel

M. C.

Plucker

J. A.

(Eds.). (2017). Toward a more perfect psychology: Improving trust, accuracy, and transparency in research. American Psychological Association. https://doi.org/10.1037/0000033-000

94.

Maldonado Moscoso

P. A.

Maduli

Anobile

Arrighi

Castaldi

(2023). The symmetry-induced numerosity illusion depends on visual attention. Scientific Reports, 13(1), Article 12509. https://doi.org/10.1038/s41598-023-39581-w

95.

Malykh

Tarasov

Baeva

Nikulchev

Kolyasnikov

Ilin

Marnevskaia

Malykh

Ismatullina

Kuzmina

(2023). Large-scale study of the precision of the approximate number system: Differences between formats, heterogeneity and congruency effects. Heliyon, 9(4), Article e14912. https://doi.org/10.1016/j.heliyon.2023.e14912

96.

Mandler

Shebo

B. J.

(1982). Subitizing: An analysis of its component processes. Journal of Experimental Psychology: General, 111(1), 1–22. https://doi.org/10.1037/0096-3445.111.1.1

97.

Mann

H. B.

Whitney

D. R.

(1947). On a test of whether one of two random variables is stochastically larger than the other. The Annals of Mathematical Statistics, 18(1), 50–60. http://www.jstor.org/stable/2236101

98.

Marchant

A. P.

Simons

D. J.

de Fockert

J. W.

(2013). Ensemble representations: Effects of set size and item heterogeneity on average size perception. Acta Psychologica, 142(2), 245–250. https://doi.org/10.1016/j.actpsy.2012.11.002

99.

Marshall

Weatherson

(2023). Intrinsic vs. extrinsic properties. In Zalta

E. N.

Nodelman

(Eds.), The Stanford encyclopedia of philosophy. Metaphysics Research Lab, Stanford University. https://plato.stanford.edu/archives/fall2023/entries/intrinsic-extrinsic/

100.

Martone

M. E.

Garcia-Castro

VandenBos

G. R.

(2018). Data sharing in psychology. American Psychologist, 73(2), 111–125. https://doi.org/10.1037/amp0000242

101.

McCloskey

M. E.

Glucksberg

(1978). Natural categories: Well defined or fuzzy sets? Memory & Cognition, 6(4), 462–472. https://doi.org/10.3758/BF03197480

102.

McKnight

P. E.

Najab

(2010). Mann-Whitney U Test. In Weiner

I. B.

Craighead

W. E.

(Eds.), The Corsini encyclopedia of psychology. Wiley. https://doi.org/10.1002/9780470479216.corpsy0524

103.

Mehler

Bever

T. G.

(1967). Cognitive capacity of very young children. Science, 158(3797), 141–142. https://doi.org/10.1126/science.158.3797.141

104.

Melara

R. D.

Mounts

J. R.

(1993). Selective attention to Stroop dimensions: Effects of baseline discriminability, response mode, and practice. Memory & Cognition, 21(5), 627–645. https://doi.org/10.3758/bf03197195

105.

Mervis

C. B.

Rosch

(1981). Categorization of natural objects. Annual Review of Psychology, 32(1), 89–115. https://doi.org/10.1146/annurev.ps.32.020181.000513

106.

Moyer

R. S.

Landauer

T. K.

(1967). Time required for judgements of numerical inequality. Nature, 215(5109), 1519–1520. https://doi.org/10.1038/2151519a0

107.

Navon

(1977). Forest before trees: The precedence of global features in visual perception. Cognitive Psychology, 9(3), 353–383. https://doi.org/10.1016/0010-0285(77)90012-3

108.

Nelson

T. M.

Bartley

S. H.

(1961). Numerosity, number, arithmetization, measurement and psychology. Philosophy of Science, 28(2), 178–203. https://doi.org/10.1086/287799

109.

Norris

Castronovo

(2016). Dot display affects approximate number system acuity and relationships with mathematical achievement and inhibitory control. PLOS ONE, 11(5), Article e0155543. https://doi.org/10.1371/journal.pone.0155543

110.

Odic

Halberda

(2015). Eye movements reveal distinct encoding patterns for number and cumulative surface area in random dot arrays. Journal of Vision, 15(15), Article 5. https://doi.org/10.1167/15.15.5

111.

Park

(2022). Flawed stimulus design in additive-area heuristic studies. Cognition, 229, Article 104919. https://doi.org/10.1016/j.cognition.2021.104919

112.

Pearson

(1895). X. Contributions to the mathematical theory of evolution.—II. Skew variation in homogeneous material. Philosophical Transactions of the Royal Society of London A, 186, 343–414. https://doi.org/10.1098/rsta.1895.0010

113.

Pekár

Kinder

(2020). The interplay between non-symbolic number and its continuous visual properties revisited: Effects of mixing trials of different types. Quarterly Journal of Experimental Psychology, 73(5), 698–710. https://doi.org/10.1177/1747021819891068

114.

Pelham

B. W.

Blanton

(2007). Conducting research in psychology: Measuring the weight of smoke (3rd ed.). Thomson Wadsworth.

115.

Peli

(1990). Contrast in complex images. Journal of the Optical Society of America A, 7(10), 2032–2040. https://doi.org/10.1364/JOSAA.7.002032

116.

Piaget

(1968). Quantification, conservation, and nativism: Quantitative evaluations of children aged two to three years are examined. Science, 162(3857), 976–979. https://doi.org/10.1126/science.162.3857.976

117.

Piazza

Facoetti

Trussardi

A. N.

Berteletti

Conte

Lucangeli

Dehaene

Zorzi

(2010). Developmental trajectory of number acuity reveals a severe impairment in developmental dyscalculia. Cognition, 116(1), 33–41. https://doi.org/10.1016/j.cognition.2010.03.012

118.

Piazza

Izard

Pinel

Bihan

D. L.

Dehaene

(2004). Tuning curves for approximate numerosity in the human intraparietal sulcus. Neuron, 44(3), 547–555. https://doi.org/10.1016/j.neuron.2004.10.014

119.

Piazza

Mechelli

Butterworth

Price

C. J.

(2002). Are subitizing and counting implemented as separate or functionally overlapping processes? NeuroImage, 15(2), 435–446. https://doi.org/10.1006/nimg.2001.0980

120.

Price

G. R.

Palmer

Battista

Ansari

(2012). Nonsymbolic numerical magnitude comparison: Reliability and validity of different task variants and outcome measures, and their relationship to arithmetic achievement in adults. Acta Psychologica, 140(1), 50–57. https://doi.org/10.1016/j.actpsy.2012.02.008

121.

Prosser

A. M. B.

Hamshaw

R. J. T.

Meyer

Bagnall

Blackwood

Huysamen

Jordan

Vasileiou

Walter

(2023). When open data closes the door: A critical examination of the past, present and the potential future for open data guidelines in journals. British Journal of Social Psychology, 62(4), 1635–1653. https://doi.org/10.1111/bjso.12576

122.

Rangelov

Fellrath

Mattingley

J. B.

(2022). Integrated perceptual decisions rely on parallel evidence accumulation. SSRN Electronic Journal. https://doi.org/10.2139/ssrn.4156143

123.

Reynvoet

Ribner

A. D.

Elliott

Van Steenkiste

Sasanguie

Libertus

M. E.

(2021). Making sense of the relation between number sense and math. Journal of Numerical Cognition, 7(3), 308–327. https://doi.org/10.5964/jnc.6059

124.

Rocca

Yarkoni

(2021). Putting psychology to the test: Rethinking model evaluation through benchmarking and prediction. Advances in Methods and Practices in Psychological Science, 4(3). https://doi.org/10.1177/25152459211026864

125.

Rodríguez

Ferreira

R. A.

(2023). To what extent is dot comparison an appropriate measure of approximate number system? Frontiers in Psychology, 13, Article 1065600. https://doi.org/10.3389/fpsyg.2022.1065600

126.

Ross

(2003). Visual discrimination of number without counting. Perception, 32(7), 867–870. https://doi.org/10.1068/p5029

127.

Rousselle

Noël

M.-P.

(2008). The development of automatic numerosity processing in preschoolers: Evidence for numerosity-perceptual interference. Developmental Psychology, 44(2), 544–560. https://doi.org/10.1037/0012-1649.44.2.544

128.

Rousselle

Palmers

Noël

M.-P.

(2004). Magnitude comparison in preschoolers: What counts? Influence of perceptual variables. Journal of Experimental Child Psychology, 87(1), 57–84. https://doi.org/10.1016/j.jecp.2003.10.005

129.

Rudmin

F. W.

(2003). Critical history of the acculturation psychology of assimilation, separation, integration, and marginalization. Review of General Psychology, 7(1), 3–37. https://doi.org/10.1037/1089-2680.7.1.3

130.

Salti

Harel

Marti

(2019). Conscious perception: Time for an update? Journal of Cognitive Neuroscience, 31(1), 1–7. https://doi.org/10.1162/jocn_a_01343

131.

Salti

Katzin

Leibovich

Henik

(2017). One tamed at a time: A new approach for controlling continuous magnitudes in numerical comparison tasks. Behavior Research Methods, 49(3), 1120–1127. https://doi.org/10.3758/s13428-016-0772-7

132.

Sanford

E. M.

Halberda

(2023). Successful discrimination of tiny numerical differences. Journal of Numerical Cognition, 9(1), 196–205. https://doi.org/10.5964/jnc.10699

133.

Sasanguie

Defever

Maertens

Reynvoet

(2014). The approximate number system is not predictive for symbolic number processing in kindergarteners. Quarterly Journal of Experimental Psychology, 67(2), 271–280. https://doi.org/10.1080/17470218.2013.803581

134.

Schooler

J. W.

(2014). Metascience could rescue the ‘replication crisis.’ Nature, 515(7525), Article 9. https://doi.org/10.1038/515009a

135.

Sekuler

Blake

(1994). Perception (3rd ed.). McGraw-Hill.

136.

Shepard

R. N.

(1981). Psychological relations and psychophysical scales: On the status of “direct” psychophysical measurement. Journal of Mathematical Psychology, 24(1), 21–57. https://doi.org/10.1016/0022-2496(81)90034-1

137.

Shilat

Salti

Henik

(2021). Shaping the way from the unknown to the known: The role of convex hull shape in numerical comparisons. Cognition, 217, Article 104893. https://doi.org/10.1016/j.cognition.2021.104893

138.

Simpson

E. H.

(1949). Measurement of diversity. Nature, 163(4148), 688–688. https://doi.org/10.1038/163688a0

139.

Smets

Moors

Reynvoet

(2016). Effects of presentation type and visual control in numerosity discrimination: Implications for number processing? Frontiers in Psychology, 7, Article 66. https://doi.org/10.3389/fpsyg.2016.00066

140.

Smets

Sasanguie

Szücs

Reynvoet

(2015). The effect of different methods to construct non-symbolic stimuli in numerosity estimation and comparison. Journal of Cognitive Psychology, 27(3), 310–325. https://doi.org/10.1080/20445911.2014.996568

141.

Solon

(1980). The Pyramid Diagram: A college study skills tool. Journal of Reading, 23(7), 594–597. https://doi.org/10.2307/40017000

142.

Soltész

Szűcs

(2010). Relationships between magnitude representation, counting and memory in 4- to 7-year-old children: A developmental study. Behavioral and Brain Functions, 6(1), Article 13. https://doi.org/10.1186/1744-9081-6-13

143.

Sophian

Chu

(2008). How do people apprehend large numerosities? Cognition, 107(2), 460–478. https://doi.org/10.1016/j.cognition.2007.10.009

144.

Stäb

Ilg

U. J.

(2021). Video-game play and non-symbolic numerical comparison. Addiction Biology, 26(6), Article e13065. https://doi.org/10.1111/adb.13065

145.

Stalnaker

(2002). Common ground. Linguistics and Philosophy, 25(5/6), 701–721. https://doi.org/10.1023/A:1020867916902

146.

Starr

Libertus

M. E.

Brannon

E. M.

(2013). Number sense in infancy predicts mathematical abilities in childhood. Proceedings of the National Academy of Sciences, USA, 110(45), 18116–18120. https://doi.org/10.1073/pnas.1302751110

147.

Stevens

S. S.

Volkmann

Newman

E. B.

(1937). A scale for the measurement of the psychological magnitude pitch. The Journal of the Acoustical Society of America, 8(3), 185–190. https://doi.org/10.1121/1.1915893

148.

Tedersoo

Küngas

Oras

Köster

Eenmaa

Leijen

Ä.

Pedaste

Raju

Astapova

Lukner

Kogermann

Sepp

(2021). Data sharing practices and data availability upon request differ across scientific disciplines. Scientific Data, 8(1), Article 192. https://doi.org/10.1038/s41597-021-00981-0

149.

Tenopir

Dalton

E. D.

Allard

Frame

Pjesivac

Birch

Pollock

Dorsett

(2015). Changes in data sharing and data reuse practices and perceptions among scientists worldwide. PLOS ONE, 10(8), Article e0134826. https://doi.org/10.1371/journal.pone.0134826

150.

Tokita

Ishiguchi

(2010). How might the discrepancy in the effects of perceptual variables on numerosity judgment be reconciled? Attention, Perception & Psychophysics, 72(7), 1839–1853. https://doi.org/10.3758/APP.72.7.1839

151.

Tomlinson

R. C.

DeWind

N. K.

Brannon

E. M.

(2020). Number sense biases children’s area judgments. Cognition, 204, Article 104352. https://doi.org/10.1016/j.cognition.2020.104352

152. Tonelli, Togoli, Arrighi, & Gori (2022). Deprivation of auditory experience influences numerosity discrimination, but not numerosity estimation. Brain Sciences, 12(2), Article 179. https://doi.org/10.3390/brainsci12020179

153. Torday, J. S., & Baluška (2019). Why control an experiment? From empiricism, via consciousness, toward implicate order. EMBO Reports, 20(10), Article e49110. https://doi.org/10.15252/embr.201949110

154. Tran, U. S., Lallai, Gyimesi, Baliko, Ramazanova, & Voracek (2021). Harnessing the fifth element of distributional statistics for psychological science: A practical primer and shiny app for measures of statistical inequality and concentration. Frontiers in Psychology, 12, Article 716164. https://doi.org/10.3389/fpsyg.2021.716164

155. Trezona, P. W. (2000). Luminance: Its use and misuse. Color Research & Application, 25(2), 145–147. https://doi.org/10.1002/(SICI)1520-6378(200004)25:2<145:AID-COL9>3.0.CO;2-0

156. Trick, L. M., & Pylyshyn, Z. W. (1994). Why are small and large numbers enumerated differently? A limited-capacity preattentive stage in vision. Psychological Review, 101(1), 80–102. https://doi.org/10.1037/0033-295X.101.1.80

157. Waskom, M. L., Okazawa, & Kiani (2019). Designing and interpreting psychophysical investigations of cognition. Neuron, 104(1), 100–112. https://doi.org/10.1016/j.neuron.2019.09.016

158. Wei, X.-X., & Stocker, A. A. (2017). Lawful relation between perceptual bias and discriminability. Proceedings of the National Academy of Sciences, USA, 114(38), 10244–10249. https://doi.org/10.1073/pnas.1619153114

159. Wilcoxon (1945). Individual comparisons by ranking methods. Biometrics Bulletin, 1(6), 80–83. https://doi.org/10.2307/3001968

160. Yarkoni (2022). The generalizability crisis. Behavioral and Brain Sciences, 45, Article e1. https://doi.org/10.1017/S0140525X20001685

161. Yousif, S. R., & Keil, F. C. (2019). The additive-area heuristic: An efficient but illusory means of visual area approximation. Psychological Science, 30(4), 495–503. https://doi.org/10.1177/0956797619831617

162. Zanon, Potrich, Bortot, & Vallortigara (2021). Towards a standardization of non-symbolic numerical experiments: GeNEsIS, a flexible and user-friendly tool to generate controlled stimuli. Behavior Research Methods, 54, 146–157. https://doi.org/10.3758/s13428-021-01580-y

A Methodological Framework for Stimuli Control: Insights From Numerical Cognition
