Abstract
People across cultures engage in various practices that alter their appearance (e.g., makeup, tanning, facial aesthetic treatment). Theories in social and evolutionary psychology propose that the primary function of these practices is to create an appearance perceived more positively by others, ultimately resulting in more favorable outcomes in social, romantic, or professional relations. In two preregistered studies that improved upon and extended prior work, we tested the effect of popular types of minimally invasive facial aesthetic treatment on how people are perceived by others. Study 1 (2,720 raters, 114 targets) showed that treatment significantly increased perceived attractiveness (a 0.09-point change on a seven-point scale), but not perceived approachability (e.g., trustworthiness) or capability (e.g., competence). Study 2 (481 raters, 81 targets) showed that treatment significantly increased targets’ desirability as a short-term romantic partner (a 0.10-point change on a seven-point scale) and as a platonic friend (a 0.08-point change on a seven-point scale), but not their desirability as a long-term romantic partner. Thus, our results suggest that a single session of minimally invasive facial aesthetic treatment leads to more positive perceptions on dimensions related to attractiveness, but these effects are relatively small.
Introduction
When encountering a stranger, people spontaneously judge their attractiveness, trustworthiness, competence, and other socially relevant traits based on facial appearance alone (Freeman & Johnson, 2016; Todorov et al., 2015). These judgments can have important downstream consequences. First impressions have been shown to influence a host of important outcomes, including interpersonal trust, dating decisions, criminal sentencing, personnel selection, and voting (Maestripieri et al., 2017; Olivola et al., 2014; van Vugt & Grabo, 2015). Given the importance of a person's facial appearance in everyday life, it is not surprising that people across the world engage in various practices that alter their appearance (Davis & Arnocky, 2022; Kowal et al., 2022; Maisel et al., 2018).
Tanning, cosmetic products, and even diet can lead to transitory changes in facial appearance, but aesthetic treatments, such as plastic surgery or injectable treatment, can produce longer-lasting or even permanent changes. Minimally invasive treatments with neurotoxin (e.g., botulinum toxin), dermal fillers, or skin lasers are particularly popular. In 2023, more than 25 million of these procedures were performed in the United States (American Society of Plastic Surgeons, 2023). Women are more likely than men to opt for treatment. Based on a large-scale survey of clinics in the Netherlands, Decates et al. (2024) estimated that approximately one in 29 Dutch women between the ages of 18 and 70 had received neurotoxin treatment, whereas one in 35 had received treatment with dermal fillers.
Theories in social and evolutionary psychology explain the prevalence of these practices with the benefits they confer on the individual who engages in them (Davis & Arnocky, 2022). Presumably, people opt for these treatments because they produce a facial appearance that elicits more favorable impressions (e.g., the person is seen as more attractive or competent), which may ultimately lead to more favorable outcomes in the job market, dating market, or in other domains. Here, we tested this prediction in two large, preregistered studies that examined the effect of aesthetic treatment on first impressions (Study 1, n = 2,720 raters, k = 114 targets) and partner preferences (Study 2, n = 481 raters, k = 81 targets).
Facial First Impressions
From an early age, people spend a considerable amount of time looking at others’ faces (Morrisey et al., 2019; Sugden & Moulson, 2019; but see Varela et al., 2023) to extract a wealth of information about a person, such as their gender, age, or emotional state (Bruce & Young, 2012; Hugenberg & Wilson, 2013). People also form judgments about socially relevant trait dimensions based on an individual's facial appearance. Although there is some disagreement on the exact number and content of trait dimensions, previous studies have mostly converged on three core dimensions: Attractiveness, Approachability, and Capability (Jones et al., 2021; Lin et al., 2021; Oosterhof & Todorov, 2008; Sutherland et al., 2013). These results fit with functional views of first impressions: Attractiveness impressions capture inferences of youthfulness and health and are thought to primarily function as an assessment of a person's mate value; Approachability impressions capture inferences of trustworthiness and friendliness and are thought to primarily function as an assessment of a person's intentions; Capability impressions capture inferences of competence and dominance and are thought to primarily function as an assessment of a person's ability to implement their intentions (Oosterhof & Todorov, 2008; Zebrowitz, 2012). People across cultures judge others along these dimensions with some consensus, suggesting that there are shared beliefs that link certain facial features to specific characteristics (Stolier et al., 2020; Sutherland et al., 2019).
Changing Facial Appearance
Previous work has uncovered a long list of facial features that influence first impressions, such as skin texture and wrinkles (Hess et al., 2023; Jones et al., 2012), averageness and gender-typicality (Jones & Jaeger, 2019; Sofer et al., 2015), skin color (Fink et al., 2006), and resemblances to emotion expressions (e.g., naturally tilted corners of the mouth that resemble a subtle smile or scowl; Jaeger & Jones, 2022; Oosterhof & Todorov, 2008). Although the effects of facial appearance on impressions and behavior are usually studied either by examining natural variation in facial features or by digitally altering facial features in photos and videos, people also engage in various practices that change their appearance to elicit more positive first impressions. People across cultures try to enhance their physical attractiveness with cosmetic products, tanning, dieting, exercising, or various other strategies (Davis & Arnocky, 2022). In a recent study with more than 93,000 participants from 93 countries, 99% of participants reported spending more than ten minutes per day on activities aimed at enhancing physical attractiveness (Kowal et al., 2022). These practices are particularly prevalent among women, presumably because physical attractiveness is a more important factor in the mate preferences of men (Davis & Arnocky, 2022; Kowal et al., 2022).
One way to change one's facial appearance is via facial aesthetic treatment. This includes surgical procedures, such as face lifts and eyelid corrections, but also less invasive procedures, such as injections with neurotoxin in the forehead muscles and fillers in the lips. Minimally invasive procedures in particular have become increasingly popular in recent years in countries such as the United States (American Society of Plastic Surgeons, 2023) and the Netherlands (Decates et al., 2024). Changing one's appearance to project a positive first impression is an important motivation for pursuing these treatments: When asked about why they were pursuing aesthetic treatment, many named the desire to make a better first impression on others (74%), to look more attractive as a romantic partner (74%), and to look more sexually appealing (48%; Maisel et al., 2018). Most report increased satisfaction with their facial appearance after treatment (Cohen et al., 2022; Weinkle et al., 2018). They also report that it is easier to make new friends and that they make a better impression on others (Cohen et al., 2022). Thus, people report that others perceive them more favorably after treatment (Maisel et al., 2018). This is in line with social and evolutionary theories that would predict that these benefits are (at least partly) why people opt for treatment in the first place (Davis & Arnocky, 2022). But how much do these procedures actually change how individuals are perceived by others?
Various studies have tested whether minimally invasive facial aesthetic treatment causes people to be perceived more positively by others. The majority of this work tested whether people are perceived as younger, healthier, and more attractive after (vs. before) undergoing aesthetic treatment (e.g., Przylipiak et al., 2019; Shah & Rieder, 2021). Some studies also examined effects on other dimensions of first impressions, including perceived trustworthiness, friendliness, competence, and intelligence (Bater et al., 2017; Dayan et al., 2008, 2010; Dayan, Bacos, et al., 2019; Dayan, Rivkin, et al., 2019; Othman et al., 2021; Van Den Elzen et al., 2023). Whereas most studies found that people were perceived more positively on these different dimensions after treatment, these studies also share several methodological limitations, which calls into question whether the results provide an accurate estimate of the treatment effect. We highlight three key limitations of previous studies.
First, in some studies, pre- and posttreatment photos were presented next to each other and participants rated which version scores higher on some trait of interest (Othman et al., 2021; Przylipiak et al., 2019). This method highlights even subtle differences in facial features, which may go unnoticed in everyday life, and can therefore dramatically inflate the estimated effect of treatment on first impressions.
Second, many studies relied on very small samples of targets (e.g., k = 12 in Bater et al., 2017; k = 20 in Dayan, Bacos, et al., 2019; k = 17 in Dayan et al., 2008; k = 8 in Othman et al., 2021; k = 34 in Van Den Elzen et al., 2023), which limits the reliability and generalizability of the results. It should be noted that in most studies, relatively large samples of raters were recruited (e.g., n = 504 in Bater et al., 2017; n = 304 in Dayan et al., 2010; n = 393 in Van Den Elzen et al., 2023). Statistical power to detect effects in studies in which multiple raters evaluate multiple targets is a function of both the rater and target sample size. It is therefore plausible that due to the large rater samples, the studies were sufficiently powered to detect even small effects. Still, because effect sizes often vary substantially across different stimuli, studies with small stimulus samples are more likely to yield unreliable results and imprecise effect size estimates (Simonsohn et al., 2024; Wells & Windschitl, 1999).
A key challenge when comparing pre- and posttreatment images of faces is to eliminate all other factors that could influence first impressions, such as differences in facial expression, head position, makeup, jewelry, skin tone, and visible clothing. Although most studies describe some efforts to address these confounds (e.g., instructing participants to remove makeup and jewelry and to maintain a neutral expression), some of these are more difficult to eliminate (e.g., targets may display subtle forms of positive affect after a successful treatment, seasonal changes may lead to a more tanned appearance). The influence of potential confounds in previous studies is unknown because the extent of standardization efforts is often not described in detail and images are not openly accessible. However, when stimulus samples are small, even subtle confounds in a limited number of images can strongly influence the results.
Third, and perhaps most worryingly, almost all studies estimated the effects of aesthetic treatment by first averaging impressions for a given target across all raters. Studies in which multiple participants respond to multiple stimuli where the primary goal is to test the effect of some stimulus characteristic on responses (e.g., participants rating pre- and posttreatment photos on attractiveness) are ubiquitous in psychology and many other disciplines. In these study designs, both participants and stimuli represent random factors because they are sampled from a larger population of interest that the researcher wants to make inferences about (e.g., people who undergo aesthetic treatment). However, as noted in seminal studies by Clark (1973) and Judd et al. (2012), researchers often ignore (i.e., do not statistically model) both sources of variance in their analyses. Rather, researchers commonly average across different (groups of) stimuli to create average scores per participant. This practice can dramatically increase false positive findings and (perhaps counterintuitively) the false positive rate is highest when stimulus samples are small and participant samples are large. In their simulations, Judd et al. found a false positive rate of 50% with 30 stimuli and 90 participants (the maximum number of participants they tested).
We found one study that estimated the effects of aesthetic treatment and modeled both raters and targets as random effects in a mixed-effects model (Bater et al., 2017), but most did not (e.g., Dayan et al., 2008, 2010; Dayan, Rivkin, et al., 2019; Van Den Elzen et al., 2023). Thus, the risk of false positive results in this literature appears high, especially when taking into account the common combination of small stimulus samples and large rater samples, next to other factors that favor the publication of positive results (i.e., publication bias; Francis, 2014). In sum, a critical review of the literature suggests that the effect of facial aesthetic treatment on first impressions may be less reliable or less strong than suggested by previous investigations.
The Present Studies
We present the results of two preregistered studies that examine the effect of minimally invasive facial aesthetic treatment (i.e., injections of neurotoxin and dermal fillers) on first impressions and partner preferences, which aimed to address the methodological and statistical limitations of previous work. Rather than digitally altering facial features to study the impact on social outcomes, as is common in the first impression literature (e.g., Jaeger et al., 2020; Little et al., 2012; Olivola et al., 2014), we study the effects of real-life changes in facial appearance resulting from aesthetic treatment. Study 1 tested the effects of aesthetic treatment on sixteen different trait dimensions, of which eleven were explicitly included to capture core dimensions of impression formation (Attractiveness, Approachability, and Capability; Oosterhof & Todorov, 2008; Sutherland et al., 2013). Although we include more trait dimensions than most previous studies, our primary goal in Study 1 was to replicate previous work, which has exclusively focused on the effects of facial treatment on trait impressions. In Study 2, we extended this focus by examining effects on partner preferences. We embedded images in simplified profiles with additional information that is commonly found on dating and social networking profiles to test if aesthetic treatment increases the perceived desirability of targets as a potential partner for a more serious, long-term relationship, as a potential partner for a more casual, shorter-term relationship, and as a potential friend. The aim was to simulate behavior in a dating context and investigate whether aesthetic treatment influences contextualized behavior compared to more abstract trait ratings. This setup also allowed us to examine whether effects of treatment emerge even when people have access to other pieces of information on targets.
Our studies address critical limitations of previous work on this topic. We recruited considerably larger samples of targets and raters (2,720 raters and 114 targets in Study 1) compared to previous studies (e.g., Bater et al., 2017; Dayan et al., 2008; Othman et al., 2021). We obtained a larger initial sample of targets that were photographed under standardized conditions. Unlike some prior studies where multiple treatment sessions took place between the pre- and posttreatment images (e.g., Van Den Elzen et al., 2023), only one treatment session took place between the two images in the current study, which allowed us to isolate the effect of a single treatment session with injectables. Targets were demographically more diverse with a larger age range and, unlike almost all previous studies, we also included male targets in Study 1. Based on a predefined list of potential differences between pre- and posttreatment images (e.g., hairstyle, skin tone, emotion expression), three raters independently screened a large stimulus set to eliminate potential confounds. Participants viewed one target at a time and never saw both pre- and posttreatment photos of the same target. Differences due to treatment were thus never due to ratings by the same participants, but always between participants. We estimate mixed-effects regression models that treat both raters and targets as random factors (Judd et al., 2012). In sum, our study design and analysis approach should yield more precise effect size estimates and more generalizable results than previous work on the topic.
Our data, study materials, preregistration documents, and analysis scripts are available at: https://osf.io/q5kjc/. Note that we are not able to share individual target images because of privacy reasons.
Study 1
Following previous work on the effects of facial aesthetic treatment (e.g., Dayan, Rivkin, et al., 2019; Van Den Elzen et al., 2023), Study 1 tested for differences in trait judgments between pre- and posttreatment images in large samples of raters (n = 2,720) and targets (k = 114). We included 11 trait dimensions that were found to best reflect judgments of three central dimensions of impression formation: Attractiveness (attractive, youthful, healthy), Approachability (trustworthy, honest, outgoing, friendly), and Capability (dominant, powerful, competent, intelligent; Oosterhof & Todorov, 2008; Sutherland et al., 2013). We also assessed ratings on five additional trait dimensions that are often studied in the first impressions literature for exploratory purposes (beautiful, charismatic, energetic, leader-like, and femininity/masculinity) and ratings of perceived treatment history to test how accurately people can detect whether a person has undergone aesthetic treatment.
Methods
This study was preregistered: https://aspredicted.org/DHD_WRC.
Participants
Given the absence of previous work with a similar design and analysis approach, we adopted the following rule of thumb for determining our sample size. A simulation study suggested that ratings by 20–40 participants are required to obtain relatively reliable mean ratings for a target on rating dimensions similar to the ones tested here (Hehman et al., 2018). We therefore planned to have every image rated by 40 participants. The current study had 68 between-subjects conditions (a more detailed description of the conditions follows below), resulting in a required sample size of 2,720 participants. A total of 2,827 Prolific workers from the United States and Canada completed the study. Participants received £1.05 and the median completion time was approximately seven minutes. We excluded 52 responses because the same participant had already completed the study before and we only retained the data from their first completion (this exclusion criterion was not preregistered). Following our preregistered exclusion criteria, we also excluded six participants who reported having taken the study on a phone and 33 participants who always gave the same rating across all trials. No participants had to be excluded because they reported poor English proficiency or because they had missing rating data for at least one trial. This resulted in a final sample of 2,720 participants (Mage = 47.34, SDage = 15.15; 48.71% female, 50.26% male, 0.99% nonbinary).
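The sample size rule described above is simple arithmetic, and a short sketch makes the steps explicit. The function and constant names are illustrative only; the factor levels are those of the 2 × 2 × 17 design reported for this study, and the 40 raters per image is the planned value stated above.

```python
# Factor levels of the between-subjects design (2 x 2 x 17).
TARGET_AGE_GROUPS = 2
IMAGE_SETS = 2
RATING_DIMENSIONS = 17
# Planned raters per image: upper end of the 20-40 range (Hehman et al., 2018).
RATERS_PER_IMAGE = 40


def required_sample_size(age_groups, image_sets, dimensions, raters_per_image):
    """Return (number of conditions, required number of participants).

    Every participant in a condition rates every image in that condition's
    stimulus set, so each condition needs `raters_per_image` participants.
    """
    conditions = age_groups * image_sets * dimensions
    return conditions, conditions * raters_per_image


conditions, n_required = required_sample_size(
    TARGET_AGE_GROUPS, IMAGE_SETS, RATING_DIMENSIONS, RATERS_PER_IMAGE
)
print(conditions, n_required)  # 68 2720
```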
Stimuli
We collaborated with a clinic for facial aesthetic treatment located in the Netherlands. From a total of 1,167 individuals who visited the clinic for facial aesthetic treatment one or multiple times, a preselection of 240 potentially suitable pre- and posttreatment photo pairs was made by the second author (see inclusion criteria below). The pretreatment photos were taken on the day of the treatment. The posttreatment photos were taken on average 41 days after the treatment (min = 0 days, max = 178 days). No other facial aesthetic treatments were performed between the two time points. Individuals received an average of 2.3 neurotoxin or filler treatments during the session (min = 1, max = 9) in different regions of the face (e.g., neurotoxin injections in the forehead, crow's feet, or corners of the mouth, filler injections in the temple, jawline, chin, or upper and lower lips).
The images were created using a standardized procedure. Individuals were photographed in the same room with the same background and lighting and at the same distance with a Canon EOS RP (focal length 85 mm, f/16, exposure time 1/125). After selection of the final image set, preprocessing was applied in order to further standardize the stimuli. A pretrained transformer-based face segmentation method (Zheng et al., 2022) was used to detect the face and create a uniform background color. Images were standardized to the same size (736 × 915 pixels). Using landmark detection, all faces were aligned and centered such that the eyes were at the same height in the image. We did not apply any nonlinear transformations to keep the aspect ratio and properties of the image intact (see Figure 1 for example images).

Figure 1. Examples of pretreatment (left) and posttreatment (right) images. Here, we only include photos in which the face regions of four different individuals were merged to protect the anonymity of the target individuals. In the study, participants viewed full, uncropped photos of individuals.
Three people (the first and second authors and another employee at RealFaceValue) independently coded the images for suitability. We employed strict, predefined inclusion criteria. Images were excluded if a part of the face was occluded (e.g., a hairstyle that covers the forehead) or if images were blurry. Crucially, to rule out any confounds, we excluded images if there were even small differences in facial expression, head position, hairstyle, facial hair, makeup, accessories, or lighting between the pre- and posttreatment images. Disagreements on the inclusion of an image were resolved via a discussion among the three coders. This resulted in a final sample of 114 unique individuals with eligible pre- and posttreatment photo pairs. Targets’ age ranged from 21 to 72 (Mage = 47.01, SDage = 12.58) and the majority was female (80.70% female, 19.30% male).
We created four separate stimulus sets. We broadly matched targets’ and raters’ age, dividing targets into two age groups of approximately the same size: 58 younger targets (21–48 years old) and 56 older targets (49–72 years old). For each age set, we recruited participants that broadly matched the targets’ age: 20–55-year-old participants to rate the younger targets and 50–85-year-old participants to rate the older targets. Within each age set, we created two sets of stimuli. Each stimulus set included one image per target, either the pre- or the posttreatment image. Within each stimulus set, half of the images were pretreatment images and half were posttreatment images. Image sets were created by sorting targets by age (separately for men and women), numbering targets, and selecting the pretreatment image of all odd-numbered targets and the posttreatment image of all even-numbered targets for set one (and the opposite for set two). This procedure ensured that participants rated both pre- and posttreatment images, but never both versions for the same target. Participants were unaware that the images showed individuals before and after undergoing treatment.
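The counterbalancing rule described above (sort by age within gender, number the targets, then alternate pre- and posttreatment images) can be sketched as follows. This is an illustration under stated assumptions, not the authors' script: the `Target` class, the function name, the example targets, and the tie-breaking by `target_id` for equal ages are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    target_id: str  # hypothetical identifier
    age: int
    gender: str     # "female" or "male"


def build_image_sets(targets):
    """Return two dicts mapping target_id -> "pre" or "post".

    Set one shows the pretreatment image of odd-numbered targets and the
    posttreatment image of even-numbered targets; set two is the mirror image.
    Numbering restarts for each gender after sorting by age.
    """
    set_one, set_two = {}, {}
    for gender in sorted({t.gender for t in targets}):
        same_gender = sorted(
            (t for t in targets if t.gender == gender),
            key=lambda t: (t.age, t.target_id),  # tie-break is an assumption
        )
        for number, target in enumerate(same_gender, start=1):
            is_odd = number % 2 == 1
            set_one[target.target_id] = "pre" if is_odd else "post"
            set_two[target.target_id] = "post" if is_odd else "pre"
    return set_one, set_two


# Hypothetical targets for illustration only.
targets = [
    Target("t1", 23, "female"),
    Target("t2", 31, "female"),
    Target("t3", 35, "female"),
    Target("t4", 44, "female"),
    Target("t5", 28, "male"),
    Target("t6", 40, "male"),
]
set_one, set_two = build_image_sets(targets)
print(set_one)
print(set_two)
```

Because the two sets are exact complements, every target contributes one pretreatment and one posttreatment image across sets, while no single participant sees both versions of the same face.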
Measures
Participants rated the images on one of 17 dimensions. There is some disagreement over the exact dimensional structure of first impressions, but most previous work points to three core dimensions: Attractiveness, Approachability, and Capability (Jones et al., 2021; Lin et al., 2021; Oosterhof & Todorov, 2008; Sutherland et al., 2013). We sampled 11 traits that were found to capture these dimensions best in previous work: attractive, youthful, healthy (as indicators of the Attractiveness dimension), trustworthy, honest, outgoing, friendly (as indicators of the Approachability dimension), and dominant, powerful, competent, intelligent (as indicators of the Capability dimension). We also included five additional traits that are often studied in previous work on first impressions and the effect of aesthetic treatment: beautiful, charismatic, energetic, leader-like, and masculinity–femininity. Some of these traits may also load on one of the three dimensions, but this emerged less consistently in previous findings (Jones et al., 2021; Lin et al., 2021; Oosterhof & Todorov, 2008; Sutherland et al., 2013). Finally, we included an additional condition in which we asked participants to rate whether they think the person in the photo underwent aesthetic treatment.
All 17 dimensions were rated using the same format. A statement (e.g., “This person looks trustworthy”) was displayed below the photo and participants indicated their agreement on a seven-point Likert scale (strongly disagree, disagree, somewhat disagree, neither disagree nor agree, somewhat agree, agree, strongly agree).
We included several additional questions at the end of the survey. In addition to questions on basic demographic characteristics, we asked participants whether they ever considered having facial aesthetic treatment done. If they answered yes, we asked them which treatments they had done (injectables, skin lasering, or facial surgery).
Design and Procedure
The study had a 2 (target age group) × 2 (image set) × 17 (rating dimension) design and participants were randomly assigned to one of the 68 between-subjects conditions. Depending on the age group, participants rated either 56 or 58 images. Images were blocked by targets’ gender. Participants first rated all female targets and then all male targets or vice versa. The order of the two blocks was randomized. Within each block, participants rated one image at a time in a random order and no time limit was implemented.
Analysis approach
Given the nested nature of our data, we estimated a series of multilevel regression models with the lme4 (Bates et al., 2015) and lmerTest packages (Kuznetsova et al., 2017) in R (R Core Team, 2024). In all models (unless otherwise specified), we regressed participants’ ratings on a dummy variable indicating the treatment condition (0 = pretreatment, 1 = posttreatment). Note that although we do not explicitly specify this for every analysis to avoid repetition, all models included random intercepts per participant and target and random slopes for the treatment effect.
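For readers less familiar with crossed random-effects models, the model structure described above can be written out explicitly. The notation below is our own sketch. In particular, the text does not specify whether the random treatment slopes vary by participant, by target, or by both, so including both is an assumption.

```latex
% y_{ij}: rating given by participant i to target j
% x_{ij}: treatment dummy (0 = pretreatment, 1 = posttreatment)
% u: participant-level random effects; w: target-level random effects
y_{ij} = \beta_0 + \beta_1 x_{ij}
       + u_{0i} + u_{1i} x_{ij}
       + w_{0j} + w_{1j} x_{ij}
       + \varepsilon_{ij},
\qquad
\begin{pmatrix} u_{0i} \\ u_{1i} \end{pmatrix} \sim \mathcal{N}(\mathbf{0}, \Sigma_u), \quad
\begin{pmatrix} w_{0j} \\ w_{1j} \end{pmatrix} \sim \mathcal{N}(\mathbf{0}, \Sigma_w), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2)
```

Here the fixed effect of interest is the treatment coefficient, and the predicted posttreatment mean equals the intercept plus that coefficient. In lme4 syntax, a model of this form would typically be written as `rating ~ treatment + (1 + treatment | participant) + (1 + treatment | target)`, where the variable names are hypothetical.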
Results
A total of 173 participants (6.36%) indicated that they had undergone facial aesthetic treatment themselves (106 had injectables, 42 had skin lasering, and 48 had surgery).
We estimated consensus in first impressions by computing the intraclass correlation coefficient for each of the four image sets (see Table 1). We report the ICC(2,1) metric (as defined in Shrout & Fleiss, 1979), which reflects the average correlation between ratings of two judges. All consensus estimates were significant at p < .001, and results were similar to those observed in previous work (e.g., Hehman et al., 2017; Jaeger, 2020). We generally observed stronger consensus for characteristics that are more strongly determined by, or reflected in, a person's facial appearance (i.e., youthfulness, attractiveness, health). Average ratings for the pre- and posttreatment images per dimension are displayed in Table 1.
Table 1. Summary statistics of participants’ ratings.
Note. Characteristics 1–3 make up the Attractiveness dimension, characteristics 4–7 make up the Approachability dimension, and characteristics 8–11 make up the Capability dimension.
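To make the consensus metric concrete, the following minimal sketch computes ICC(2,1) from a fully crossed targets × raters matrix using the two-way ANOVA mean squares given by Shrout and Fleiss (1979). It is an illustration rather than the code used for the reported estimates; the function name is our own, and the example matrix is the small 6 × 4 worked example from Shrout and Fleiss, not data from the present study.

```python
def icc_2_1(ratings):
    """ICC(2,1) for a complete matrix: rows = targets, columns = raters."""
    n = len(ratings)       # number of targets
    k = len(ratings[0])    # number of raters
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)                # between-target mean square
    ms_cols = ss_cols / (k - 1)                # between-rater mean square
    ms_error = ss_error / ((n - 1) * (k - 1))  # residual mean square

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


# Worked example from Shrout & Fleiss (1979): 6 targets rated by 4 judges.
example = [
    [9, 2, 5, 8],
    [6, 1, 3, 2],
    [8, 4, 6, 8],
    [7, 1, 2, 6],
    [10, 5, 6, 9],
    [6, 2, 4, 7],
]
icc = icc_2_1(example)
print(round(icc, 2))  # 0.29
```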
Primary Analyses
To test the effect of facial aesthetic treatment on first impressions, we estimated three multilevel regression models, one for each first impression dimension (Attractiveness, Approachability, Capability), with random intercepts per participant and target and random slopes for the treatment effect. We regressed ratings for a specific trait dimension on a dummy variable indicating the treatment condition (0 = pretreatment, 1 = posttreatment). We also report the predicted means per condition. Targets scored slightly but significantly higher on Attractiveness after treatment (M = 3.97) compared to before treatment (M = 3.88), β = 0.091, SE = 0.028, 95% CI [0.032, 0.150], t(111.1) = 3.27, p = .001 (see Figure 2). We did not find a significant effect of treatment on perceived Approachability (pretreatment M = 4.08, posttreatment M = 4.14), β = 0.061, SE = 0.034, 95% CI [−0.007, 0.117], t(110.9) = 1.77, p = .080, and perceived Capability (pretreatment M = 4.26, posttreatment M = 4.30), β = 0.041, SE = 0.023, 95% CI [−0.001, 0.086], t(111.7) = 1.77, p = .079.

Figure 2. The effect of facial aesthetic treatment on the perceived Attractiveness, Approachability, and Capability of targets.
Researchers have argued that morality and sociability (which are often part of a broader Approachability or Warmth dimension) show important dissociations in social perception and behavior (e.g., being valued differently in others and predicting different outcomes; Brambilla et al., 2011; Goodwin et al., 2014; Jaeger et al., 2022). Similar distinctions have been made for dominance and competence (which are often part of a broader Capability or Dominance dimension; Chen et al., 2014). We therefore estimated the effects of aesthetic treatment on four alternative dimensions, each captured by two traits that were found to be indicative of the dimension in previous work (Brambilla et al., 2011; Chen et al., 2014; Goodwin et al., 2014; Jaeger et al., 2022): Morality (trustworthy, honest), Sociability (outgoing, friendly), Dominance (dominant, powerful), and Competence (competent, intelligent).
We estimated four regression models, one for each trait dimension, in which ratings for a specific dimension were regressed on a treatment dummy variable (0 = pretreatment, 1 = posttreatment). We did not find a significant effect of treatment on perceived Morality (pretreatment M = 4.06, posttreatment M = 4.11), β = 0.052, SE = 0.035, 95% CI [−0.022, 0.119], t(108.6) = 1.51, p = .135, perceived Sociability (pretreatment M = 4.11, posttreatment M = 4.17), β = 0.065, SE = 0.041, 95% CI [−0.017, 0.148], t(112.0) = 1.57, p = .120, perceived Dominance (pretreatment M = 4.06, posttreatment M = 4.11), β = 0.049, SE = 0.032, 95% CI [−0.013, 0.114], t(111.5) = 1.51, p = .133, or perceived Competence (pretreatment M = 4.46, posttreatment M = 4.50), β = 0.038, SE = 0.029, 95% CI [−0.015, 0.097], t(110.9) = 1.31, p = .193. Overall, our results suggest that individuals were seen as more attractive after treatment, but impressions did not significantly improve for other fundamental dimensions (even though all effects were in the positive direction).
Additional Analyses
We also examined whether participants were able to detect whether targets had received facial aesthetic treatment. Regressing perceived treatment history on condition (0 = pretreatment, 1 = posttreatment) showed a significant effect, β = 0.196, SE = 0.044, 95% CI [0.105, 0.278], t(109.6) = 4.43, p < .001. That is, participants were more confident that an individual had undergone aesthetic treatment when they viewed the posttreatment (vs. pretreatment) images.
We also explored the effect of aesthetic treatment on ratings for each of the seventeen separate characteristics tested here. Table 2 displays the results of separate regression models that tested for the effect of aesthetic treatment on each dimension. We report both unadjusted p-values and p-values that were adjusted with the Benjamini–Hochberg method to control the false discovery rate due to the larger number of tests. Effects on most trait ratings were nonsignificant, even though mean differences were in the same direction for all dimensions except one, and only the effect on youthfulness impressions (and perceived treatment history) remained significant after adjustment.
Effects of facial aesthetic treatment on ratings for each dimension.
Note. We report both unadjusted and adjusted p-values (Benjamini–Hochberg adjustment based on 17 tests). Characteristics 1–3 make up the Attractiveness dimension, characteristics 4–7 make up the Approachability dimension, and characteristics 8–11 make up the Capability dimension.
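The Benjamini–Hochberg adjustment mentioned above can be sketched in a few lines. This is a minimal illustration of the standard step-up procedure, not the authors' analysis code (the analyses were run in R), and the p-values in the example are hypothetical rather than taken from Table 2.

```python
def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(p_values)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, p_values[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


# Hypothetical unadjusted p-values (NOT the values reported in the study).
example_p = [0.01, 0.04, 0.03, 0.005]
example_adjusted = benjamini_hochberg(example_p)
```

An effect counts as significant after adjustment when its adjusted p-value falls below the chosen false discovery rate (e.g., .05).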
Discussion
Individuals scored higher on perceived Attractiveness after (vs. before) having undergone minimally invasive facial aesthetic treatment. The observed effect corresponded to a 0.091-point increase in perceived attractiveness on our 7-point scale. We did not find significant changes in perceived Approachability (trustworthy, honest, outgoing, friendly) or perceived Capability (dominant, powerful, competent, intelligent). Although it is difficult to directly compare the effect sizes observed here with effect sizes observed in previous studies due to methodological and statistical differences (we provide several comparison benchmarks in the General Discussion), the treatment effects in Study 1 were less comprehensive and less strong than suggested by previous studies (e.g., Dayan et al., 2010; Van Den Elzen et al., 2023).
Following the majority of previous work on the effects of facial aesthetic treatment, Study 1 examined changes in trait ratings before and after treatment. Trait impressions are an important precursor of the favorable outcomes that aesthetic treatment is expected to provide, as predicted by both social and evolutionary theories (Davis & Arnocky, 2022). Extending the focus of Study 1 and previous work in this literature, we examined the effect on social outcomes more directly in Study 2.
Study 2
Many people seek aesthetic treatment to look more appealing to romantic partners (Maisel et al., 2018). People who underwent treatment also reported that they are more confident when meeting new people and that it is easier to make new friends (Cohen et al., 2022). We therefore tested whether people are perceived as more desirable partners after (vs. before) having undergone treatment. We examined preferences for three types of common relationships: a more serious, long-term romantic relationship, a more casual, shorter-term romantic relationship, and a platonic relationship between friends (Mogilski et al., 2019; Valentine et al., 2014). To mimic the conditions of dating or social networking sites, we embedded images of targets in simplified profiles that also included other (fictitious) information on targets (i.e., their name, age, and hobbies).
Partner preferences are strongly influenced by the gender and age of the rater and target. Given that we only had a small sample of male targets, we only focused on female targets (k = 81) in the present study. We recruited single, male, heterosexual participants (n = 481) to rate the desirability of targets. We also roughly matched the age of raters and targets (e.g., 25–45-year-old participants rated 25–39-year-old targets).
Methods
This study was preregistered: https://aspredicted.org/R7Z_JQ9.
Participants
As a conservative rule of thumb, we doubled our sample size per condition in comparison to Study 1. The current study had six between-subjects conditions (a more detailed description of the conditions follows below), resulting in a total required sample size of 480 participants. We used Prolific's screening tool to target heterosexual male participants from three specific age groups (described further below) who reported being single, widowed, divorced, or separated. A total of 514 Prolific participants from the United States, Canada, and the United Kingdom completed the study. Participants received £1.80 and the median completion time was approximately eight minutes. We excluded one response because the same participant had already completed the study before and we only retained the data from their first completion (this exclusion criterion was not preregistered). In line with our preregistered exclusion criteria, we also excluded two participants who reported having taken the study on a phone, one participant who did not identify as male, four participants who did not identify as heterosexual, and 22 participants who always gave the same rating across all trials on one or more rating dimensions. No participants had to be excluded because they reported poor English proficiency, because their age fell outside of the targeted age range, or because they had missing rating data for at least one trial. This resulted in a final sample of 481 male participants (Mage = 46.20, SDage = 11.71).
Stimuli
We used a subset of the stimuli that were used in Study 1. We excluded 22 male targets and two targets for whom the image preprocessing (described above) led to slight distortions around the neck or forehead. Romantic interest in others may be lower among older individuals. To avoid floor effects, we excluded seven targets older than 65. This resulted in a final sample of 81 eligible female targets whose age ranged from 25 to 65 (Mage = 46.20, SDage = 11.71).
We created six stimulus sets. As people tend to have platonic and romantic relationships with people from a similar age range, we matched targets’ and raters’ age even more closely than in Study 1. Targets were divided into three age groups: 25 younger targets (25–39 years old), 33 middle-aged targets (40–55 years old), and 23 older targets (56–65 years old). For each age set, we recruited participants who broadly matched the targets’ age: 25–45-year-old participants to rate the younger targets, 40–60-year-old participants to rate the middle-aged targets, and participants older than 50 to rate the older targets. Within each age set, we created two sets of stimuli using the same procedure as in Study 1. Thus, participants rated both pre- and posttreatment images, but never both versions for the same target.
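The counterbalancing logic can be illustrated with a short sketch. The exact Study 1 procedure is not restated in this section, so the random split into halves, the function name, and the target identifiers below are assumptions made only for illustration.

```python
import random


def make_counterbalanced_sets(target_ids, seed=0):
    """Build two image sets so each target appears once per set, in opposite versions.

    Assumption: targets are split at random into two halves; set A shows the
    pretreatment image for one half and the posttreatment image for the other,
    and set B shows the reverse.
    """
    rng = random.Random(seed)
    shuffled = list(target_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    set_a = {target: "pre" for target in shuffled[:half]}
    set_a.update({target: "post" for target in shuffled[half:]})
    # Set B flips the version shown for every target.
    set_b = {target: ("post" if version == "pre" else "pre")
             for target, version in set_a.items()}
    return set_a, set_b


# Hypothetical identifiers for the 25 younger targets.
younger_set_a, younger_set_b = make_counterbalanced_sets(range(25))
```

A participant assigned to one set sees both pre- and posttreatment images, but only one version of any given target.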
The images were embedded in simplified profiles that included three other pieces of fictitious information on targets to resemble a profile that people might see on a social network or dating website: name (sampled from the most common names in the United States from the decade in which targets were born), age (the true age of targets with noise, a randomly drawn number between −2 and +2, added to preserve anonymity), and two hobbies (sampled from the most common hobbies of the specific age group according to various websites; see Figure 3 for an example).

Example of a profile shown to participants. Here, we only include a photo in which the face regions of four different individuals were merged to protect the anonymity of the target individuals. In the study, participants viewed full, uncropped photos of individuals.
Measures
We measured three types of partner preferences by asking participants to give three ratings for each profile in the following fixed order. The measures were adapted from a previous study on partner preferences (Mogilski et al., 2019). To measure friendship value, we asked each participant to imagine that “you are considering this person for a platonic (nonromantic) friendship. Examples of this type of relationship would include someone you meet for coffee or a dinner once in a while. Does this person look like they could be a good friend?” To measure short-term mate value, we asked each participant to imagine that “you are considering this person for a more casual, potentially shorter-term romantic relationship. Examples of this type of relationship would include a single date accepted in the spur of the moment or a one-night stand. Does this person look like they could be a good partner for casual romantic dating?” To measure long-term mate value, we asked each participant to imagine that “you are considering this person for a more serious, potentially longer-term relationship. Examples of this type of relationship would include someone you may want to, at some later point in the relationship, move in with or marry (or enter into a relationship on similar grounds as marriage). Does this person look like they could be a good partner for a more serious romantic relationship?” Participants responded to each question on a seven-point Likert scale.
In addition to questions on basic demographic characteristics, we included three further measures. As in Study 1, we asked participants whether they had ever had facial aesthetic treatment done and, if yes, which treatments they had done (injectables, skin lasering, or facial surgery). We also asked participants whether they are currently open to each of the three relationship types examined here (friendship, casual dating, serious relationship) with a binary yes–no question. Finally, we asked participants whether they would consider someone for each of the three relationship types knowing that the person had undergone facial aesthetic treatment.
Design and Procedure
The study had a 3 (target age group) × 2 (image set) design and participants were randomly assigned to one of the six between-subjects conditions. Depending on the age group, participants rated 25, 33, or 23 profiles. The profiles were displayed in a random order.
Analysis Approach
Similar to Study 1, we estimated a series of multilevel regression models with the lme4 (Bates et al., 2015) and lmerTest packages (Kuznetsova et al., 2017) in R (R Core Team, 2024). In all models (unless otherwise specified), we regressed participants’ ratings on a dummy variable indicating the treatment condition (0 = pretreatment, 1 = posttreatment). Note that although we do not explicitly specify this for every analysis to avoid repetition, all models included random intercepts per participant and target and random slopes for the treatment effect.
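One way to write out the model described above is sketched below. The notation is introduced here for illustration and is not the authors' own; in particular, the sketch assumes that the treatment slope varies across both participants and targets, which the text does not state explicitly. Here y_{ij} is the rating given by participant i to target j, and T_{ij} is the treatment dummy (0 = pretreatment, 1 = posttreatment).

```latex
\begin{aligned}
y_{ij} &= \beta_0 + \beta_1 T_{ij}
        + u_{0i} + u_{1i} T_{ij}
        + w_{0j} + w_{1j} T_{ij}
        + \varepsilon_{ij}, \\
(u_{0i}, u_{1i}) &\sim \mathcal{N}(\mathbf{0}, \Sigma_u), \quad
(w_{0j}, w_{1j}) \sim \mathcal{N}(\mathbf{0}, \Sigma_w), \quad
\varepsilon_{ij} \sim \mathcal{N}(0, \sigma^2).
\end{aligned}
```

Under this reading, \beta_1 is the average treatment effect reported as β in the Results, u terms capture participant-level deviations, and w terms capture target-level deviations.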
Results
Eight participants (1.66%) indicated that they had undergone facial aesthetic treatment in the past (three had injectables, five had surgery). The majority of participants indicated being open to each of the three relationship types (90.44% friendship, 80.04% casual dating, 60.29% serious relationship). Most participants also indicated that they would consider a person who had undergone treatment as a friend (98.16%), for casual dating (93.50%), and for a serious relationship (82.76%).
Primary Analyses
To test the effect of facial aesthetic treatment on partner preferences, we estimated three multilevel regression models, one for each partner preference dimension, with random intercepts per participant and target and random slopes for the treatment effect. We regressed ratings for a specific relationship type on a treatment dummy (0 = pretreatment, 1 = posttreatment). We also report the predicted means per condition. Targets scored slightly higher on the friendship dimension after treatment (M = 4.47) compared to before treatment (M = 4.39), β = 0.075, SE = 0.033, 95% CI [−0.001, 0.132], t(76.68) = 2.25, p = .027 (see Figure 4). There was also a small but significant effect of treatment for casual, short-term dating (pretreatment M = 3.63, posttreatment M = 3.72), β = 0.097, SE = 0.039, 95% CI [0.028, 0.175], t(77.59) = 2.50, p = .015, but not for a more serious, longer-term relationship (pretreatment M = 3.21, posttreatment M = 3.29), β = 0.074, SE = 0.049, 95% CI [−0.005, 0.148], t(77.75) = 1.87, p = .066. Thus, individuals were perceived as a more desirable friend and a more desirable partner for casual dating, but not as a more desirable partner for a serious relationship, after they had undergone aesthetic treatment.

The effect of facial aesthetic treatment on partner preferences.
Discussion
Results showed that targets were perceived as more desirable friends and more desirable partners for a casual, short-term romantic relationship after they had undergone a single session of minimally invasive facial aesthetic treatment. We observed a 0.075-point increase in friendship preferences and a 0.097-point increase in casual dating preferences on our seven-point scale. The latter effect was very close to the 0.091-point increase in perceived attractiveness observed in Study 1, which can be interpreted as converging evidence as facial attractiveness is an important predictor of sexual attraction and mate preferences (especially in men; Walter et al., 2020). However, we did not observe an increase in the perceived desirability as a longer-term romantic partner.
General Discussion
People across the world alter their appearance in various ways (e.g., using makeup or ornaments) and theories in social and evolutionary psychology predict that people engage in these practices because they lead to more positive evaluations by others (Davis & Arnocky, 2022). One increasingly popular method is the use of minimally invasive aesthetic treatment (e.g., injections with neurotoxins and dermal fillers). Indeed, surveys have shown that many people pursue these treatments because they want to project a more positive first impression and look more attractive as a romantic partner (Maisel et al., 2018). After treatment, people frequently report that they make a better impression on others and that it is easier to make new friends (Cohen et al., 2022). Here, we tested in two high-powered, preregistered studies how effective these treatments actually are in changing how individuals are perceived by others.
In Study 1, we compared judgments of individuals before and after treatment on three core dimensions of impression formation and found that treatment increased judgments on the Attractiveness dimension, but not on the Approachability or Capability dimensions. In Study 2, we extended the focus to partner preferences and found that treatment increased the perceived desirability as a more casual, short-term romantic partner and as a platonic friend, but not as a more serious, long-term romantic partner. At first glance, the absolute change in impressions posttreatment appears to be small. A single session of aesthetic treatment positively influenced impressions of Attractiveness and desirability for casual dating by approximately 0.095 points on our seven-point scale (see Figures 2 and 4). We did not observe significant effects on perceived Approachability (e.g., trustworthy, friendly), perceived Capability (e.g., competent, powerful), and perceived desirability as a long-term partner, suggesting that potential benefits on these outcomes, if they exist, are even smaller.
Evaluating the size of the treatment effect is difficult because there are multiple reference points that one could take. The minimally invasive procedures that were tested here lead to less extensive changes in physical appearance than surgical treatments, which means that it would be unreasonable to expect large effects or effects that are close to those observed for more extensive treatments (e.g., Dayan et al., 2004). Individuals likely weigh the benefits of the specific treatment against its costs. Given that minimally invasive procedures carry fewer risks and are less expensive than more extensive surgical treatments, individuals may perceive that even small changes in appearance and elicited impressions are worth the relatively small costs. Moreover, people also name various other motivations for pursuing treatment (independent of how they are perceived by others), such as the desire to improve feelings of self-worth, confidence, and happiness, which may explain the popularity of the treatments (Maisel et al., 2018).
Another way to assess the size of the effects observed here (a change of slightly less than 0.1 points on a seven-point scale) is to compare them against other appearance-altering methods that were tested in studies that employed similar designs and rating scales. A previous study that examined the effect of self-applied makeup on perceived attractiveness found a 0.6-point change (Etcoff et al., 2011). Similarly, digitally smoothing the facial skin in photos led to a 0.7-point change (Tsankova & Kappas, 2016). Smiling (vs. looking neutral) increased perceived attractiveness by approximately 0.4 points (Sutherland et al., 2017). Thus, the effects observed in the present studies were considerably smaller compared to other strategies that individuals might engage in to influence how they are perceived by others. It should also be noted that all studies, including the studies reported here, are likely to somewhat overstate treatment effects compared to what would be observed in more complex everyday situations, in which people usually have access to various other cues that they can rely on (e.g., clothing, verbal, and nonverbal behavior).
In our studies, we aimed to address several limitations of previous work that tested the effects of facial aesthetic treatment on first impressions. Some prior studies may have overestimated treatment effects because pre- and posttreatment photos were presented next to each other, which highlights even subtle differences that may go unnoticed in everyday life (Othman et al., 2021; Przylipiak et al., 2019). Even though rater samples were often relatively large, the majority of previous work relied on very small samples of 10 to 30 targets, which limits the reliability and generalizability of results. Crucially, almost no previous studies statistically modeled stimuli as random factors, that is, as instances of stimuli that are (like individual participants) sampled from a larger population of interest (e.g., Dayan et al., 2008, 2010; Dayan, Rivkin, et al., 2019; Van Den Elzen et al., 2023). Rather, mean ratings of pre- and posttreatment images (averaged across all raters) were compared. This practice (initially introduced in linguistics as the language-as-fixed-effect fallacy; Clark, 1973) can substantially increase the prevalence of false positive results (Judd et al., 2012). Especially when stimulus samples are small but participant samples are large, as is the case in the literature on aesthetic treatment, false positive rates can inflate to 50% and higher.
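The inflation of false positives when stimuli are treated as fixed can be illustrated with a small simulation. This is a simplified sketch, not a reanalysis of any cited study: all parameter values are assumptions chosen for illustration, every simulated participant rates every target before and after treatment (unlike the design used here), and a t-test across targets stands in for a full model with crossed random effects. The true average treatment effect is zero, but each target has its own effect drawn around zero.

```python
import math
import random
import statistics

# Illustrative design values (assumptions, not taken from any cited study).
N_PARTICIPANTS = 100   # many raters
N_TARGETS = 10         # few stimuli
T_CRIT_DF99 = 1.984    # two-sided 5% critical value of Student's t, df = 99
T_CRIT_DF9 = 2.262     # two-sided 5% critical value of Student's t, df = 9


def one_sample_t(values):
    """t statistic for the null hypothesis that the population mean is zero."""
    n = len(values)
    return statistics.fmean(values) / (statistics.stdev(values) / math.sqrt(n))


def simulate_false_positive_rates(n_sims=500, slope_sd=0.3, noise_sd=1.0, seed=1):
    """Share of simulations in which each analysis rejects a true null at alpha = .05."""
    rng = random.Random(seed)
    diff_noise_sd = noise_sd * math.sqrt(2)  # noise of a post-minus-pre difference
    hits_by_participant = 0
    hits_by_target = 0
    for _ in range(n_sims):
        # Target-specific treatment effects with a population mean of zero.
        target_effects = [rng.gauss(0.0, slope_sd) for _ in range(N_TARGETS)]
        # diffs[i][j]: post-minus-pre rating of target j by participant i.
        diffs = [[effect + rng.gauss(0.0, diff_noise_sd) for effect in target_effects]
                 for _ in range(N_PARTICIPANTS)]
        # Targets treated as fixed: average over targets, test across participants.
        per_participant = [statistics.fmean(row) for row in diffs]
        # Targets treated as sampled units: average over participants, test across targets.
        per_target = [statistics.fmean(col) for col in zip(*diffs)]
        hits_by_participant += abs(one_sample_t(per_participant)) > T_CRIT_DF99
        hits_by_target += abs(one_sample_t(per_target)) > T_CRIT_DF9
    return {"by_participant": hits_by_participant / n_sims,
            "by_target": hits_by_target / n_sims}


if __name__ == "__main__":
    print(simulate_false_positive_rates())
```

With these assumed settings, the analysis that ignores target-level variation rejects the true null far more often than the nominal 5%, whereas the analysis that treats targets as sampled units stays close to it. The exact rates depend on the chosen variances and sample sizes.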
In line with these concerns, the present studies, in which we addressed these limitations, yielded results that diverge from previous findings. We find an effect on perceived Attractiveness (a 0.09-point change on a seven-point scale) that is considerably smaller than what was observed in studies that may have overestimated the effect with side-by-side image presentations (a 0.81-point change on a seven-point scale in Othman et al., 2021; a 30-point change on a 101-point scale in Przylipiak et al., 2019). We also do not find a significant effect on perceived Approachability (e.g., trustworthy, friendly) or perceived Capability (e.g., competent, powerful), in contrast to previous work that reported significant and often relatively large effects (e.g., Dayan, Bacos, et al., 2019; Othman et al., 2021; Van Den Elzen et al., 2023). Based on the considerations outlined above, we think that it is plausible that the present results offer more precise and generalizable estimates of the effect of facial aesthetic treatment on first impressions than these previous studies. However, there were of course many other differences between the current and previous studies that could explain the different results (e.g., the nationality of targets and raters, differences in the exact nature of the treatment across clinics), and we do not mean to suggest that previously observed effects were definitely inflated or even spurious.
Theoretical Implications
Across different cultures, people spend considerable amounts of time on altering their facial appearance with the help of makeup, tanning, aesthetic medical treatments, and other practices (Davis & Arnocky, 2022; Kowal et al., 2022). Theories in social and evolutionary psychology explain the prevalence of these practices with the favorable outcomes they are hypothesized to deliver to the individuals engaging in the practices (Davis & Arnocky, 2022). That is, these specific practices are thought to produce an appearance that is generally perceived favorably by others, ultimately leading to more positive outcomes in the dating market, job market, or other domains of life. Our findings suggest that minimally invasive treatments (i.e., botulinum toxin and fillers), which have become increasingly popular in many countries in recent years (Decates et al., 2024), may have a more limited impact on first impressions and partner preferences than suggested by previous studies (e.g., Dayan, Bacos, et al., 2019; Othman et al., 2021; Van Den Elzen et al., 2023).
On the one hand, this is not surprising given that most medical professionals and individuals who opt for the treatment primarily focus on enhancing attractiveness (Maisel et al., 2018). On the other hand, individuals who opt for the treatment also commonly report that they do so because of various expected benefits in additional (nonromantic) social domains (Maisel et al., 2018). After treatment, people also report that they make better impressions on others and feel more confident around others, for instance. However, we found no evidence for such improvements when examining perceptions of friendliness, morality, or competence. It is also possible that people experience positive outcomes after treatment, not because they are perceived very differently by others, but because they feel more confident and self-assured, which could lead them to behave differently in social interactions and obtain more positive reactions from others (in line with a self-fulfilling prophecy effect; Jussim, 1986). This account requires further testing in the context of aesthetic treatment. In general, it would be interesting to test how strongly individuals’ expectations about the effect of a specific treatment on first impressions and other outcomes align with the actual changes that are observed in a rigorous test of the effect.
The somewhat limited effect of treatment on impressions along social dimensions is surprising when considering previous work on the attractiveness halo in impression formation, which has consistently found that attractive individuals are also perceived more positively on various other dimensions (Batres & Shiramizu, 2022; Langlois et al., 2000). Although we found that people were perceived as slightly more attractive and as a slightly more desirable casual dating partner, we did not find that these effects translated into considerably more positive perceptions of friendliness, trustworthiness, honesty, or other traits. Correlations between perceptions of attractiveness and these other traits are often moderate to strong, but not perfect. It is possible that current treatments alter appearance in a way that specifically enhances attractiveness and related traits (e.g., perceived age and health), but not other traits, which is plausible given the common focus on attractiveness and youthfulness in aesthetic treatment and the targeted way in which procedures can be applied (Maisel et al., 2018). Furthermore, others have also suggested that even though perceptions of attractiveness and typicality are correlated, isolating the unique effects of both on perceived trustworthiness suggests that typicality and not attractiveness makes people look more trustworthy (Sofer et al., 2015). We did not observe an effect of treatment on gender-typicality (the only type of typicality that was measured in Study 1), which may explain the null effect for perceptions of trustworthiness and related traits. Finally, it is also possible that the improvements in attractiveness observed in the present study were simply too small to translate into detectable improvements on other correlated trait dimensions.
Limitations and Future Directions
There are several limitations to our study design and constraints on the generalizability of our results that should be noted. One goal of Study 1 was to also examine whether the effect of treatment differed between men and women (almost all prior studies focused exclusively on female targets). We did not observe any interactions with target gender in our studies (these results are reported in the Supplemental Materials: https://osf.io/q5kjc/), but our studies might have been underpowered to detect these effects.
The goal of the current studies was to estimate the effect of a single minimally invasive treatment session on various outcomes. However, it should be noted that our sample likely included some individuals who already received some form of aesthetic treatment in the past (we did not have access to their entire treatment history). It is plausible that the effect of the treatments examined here is larger if it is a person's first, rather than their fifth treatment because the marginal change in appearance after several treatments may decrease. On the other hand, effects might also increase if multiple treatments are required to reach a certain level of change in overall facial appearance that is required to produce changes in first impressions. The current results should be interpreted as the marginal effect of a treatment session among people in general, rather than among people receiving their first treatment.
More generally, the current studies relied on targets that were treated in the Netherlands and raters from the United States, Canada, and the United Kingdom (to all but eliminate any chance that raters know a target) and it is possible that effects of aesthetic treatment vary across countries. Facial features that are typically, or more effectively, targeted by minimally invasive treatments may contribute more to first impressions in some countries. Attitudes toward aesthetic treatment may also differ and this can influence how people judge those who underwent treatment (in situations where people believe that someone underwent treatment). Finally, treatment practices may differ across different sites leading to different effects.
The present studies focused on individuals who underwent minimally invasive facial aesthetic treatment (i.e., injections with neurotoxins and dermal fillers). There are many types of aesthetic procedures that vary in the region that they target and in how extensive the treatment is (e.g., eyelid surgery, liposuction, breast or cheek implants), which should strongly influence their effects on how individuals are perceived by others. We focused on minimally invasive procedures because they are more common than more invasive procedures and because they are becoming increasingly popular in absolute terms and in comparison to more invasive procedures (American Society of Plastic Surgeons, 2023; Decates et al., 2024). For example, it has been estimated that around one in 30 Dutch women between the ages of 18 and 70 has received treatment with neurotoxin or dermal fillers (Decates et al., 2024).
It is also possible that the effect of aesthetic treatment on impressions is nonlinear. Although some treatment may lead to an appearance that is perceived more positively, this effect could reverse if the treatment produces facial features that are perceived as unnatural or exaggerated. In the present studies, we found no evidence for larger effects on first impressions or partner preferences among targets who received treatment for the entire face (i.e., the upper, mid, and lower areas of the face were treated, k = 61), rather than more targeted treatment (i.e., only one or two areas were treated, k = 20). We also did not find that participants had more negative impressions of targets who were more strongly perceived as having had aesthetic treatment done. Given that we only focused on minimally invasive procedures that were applied in a single session, studies with targets who underwent more extensive treatment are needed to test this question more directly.
Many studies that tested the effects of other types of aesthetic treatment (or other types of appearance-altering behaviors) suffer from the same limitations that we outlined here. Thus, additional work is needed to better estimate and compare their effects on how people are perceived by others. Another interesting avenue for future research is to test how aware people are of the magnitude of change in how they are perceived by others posttreatment. Moreover, computer software, such as generative artificial intelligence algorithms, could be trained on existing image sets to create representative facial appearances after different treatments (e.g., more or less extensive neurotoxin or filler treatments). Such approaches could help to test the effects of different treatment types in a more gradual and systematic way.
Supplemental Material
sj-docx-1-pec-10.1177_03010066251337353 - Supplemental material for Face value: The effect of facial aesthetic treatment on first impressions and partner preferences
Supplemental material, sj-docx-1-pec-10.1177_03010066251337353 for Face value: The effect of facial aesthetic treatment on first impressions and partner preferences by Bastian Jaeger, Berno Bucker, Jacques van der Meulen and Mark van Vugt in Perception
Footnotes
Acknowledgements
We thank Ahnjili Zhuparris for her help with stimulus selection and coding.
Author Contribution(s)
Data Availability Statement
Declaration of Conflicting Interests
This research was partly funded by AbbVie Inc. through an investigator-initiated grant scheme and partly funded by RealFaceValue B.V. The second and third authors are cofounders of, and have a financial interest in, RealFaceValue B.V.
Ethics Approval and Informed Consent Statements
The studies were approved by the Ethics Review Board of Tilburg University (reference code: TSB_RP1173). All participants provided informed consent before taking part in the study.
Funding
The author(s) disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: This research was funded by AbbVie Inc. and RealFaceValue B.V.
Supplemental Material
Supplemental material for this article is available online.
Notes
References
