Abstract
The study presents a comprehensive analysis of acoustic emission (AE) data collected from bending tests on prestressed reinforced concrete (RC) girders, with the aim of detecting and characterizing cracking for structural health monitoring (SHM) applications. Multiple assessment approaches are implemented, including both established and newly developed AE-based criteria, organized into distinct method specifications (MSs). For each MS, novel damage classification rules are proposed, and blind predictions are carried out to identify damage states ranging from microcrack initiation to macrocrack formation. The performance of each method is quantitatively evaluated using precision, recall, accuracy, and a global score enabling a comparative assessment. Results show that several entropy-based MSs achieve high predictive performance, and optimum assessment criteria are experimentally calibrated. The correlation between AE activity and damage progression is validated using additional specimens not involved in the blind phase. The study demonstrates the feasibility of using AE parameters for reliable damage classification in RC girders and provides a validated framework to support SHM procedures and future field applications.
Keywords
Introduction
The urgent need for early detection of damage in reinforced concrete (RC) and prestressed RC bridge members, particularly bridge girders,1,2 has become evident, as recent structural failures have underscored the limitations of current structural health monitoring (SHM) approaches.3,4 In particular, current SHM techniques, including experimental applications, still struggle to identify incipient and low-to-moderate damage in RC structures, especially the initiation and evolution of early microcracking.5 Over the last few decades, acoustic emission (AE) testing6 has emerged as a nondestructive testing technique that can potentially detect damage and degradation in real time, automatically, and remotely,7–9 across a wide range of materials and structures,8,10–14 including prestressed structures.2,15
Among the copious literature on AE tests of RC elements, few studies have provided quantitative damage assessment criteria and measures for prestressed girders, especially post-tensioned systems. Abdelrahman et al.16 and ElBatanouny et al.17 tested prestressed beams under cyclic loading in pre-cracked and corroded conditions. They developed a damage index based on cumulative energy, together with damage quantification charts, which proved effective for identifying the damage level and appear promising for in situ monitoring. Omondi et al.18 combined AE testing and digital image correlation (DIC) to improve crack monitoring in prestressed RC sleepers, showing that AE-based assessment can be significantly more effective when DIC identifies the critical areas. Zeng et al.19 performed four-point bending tests on I-section prestressed beams and used basic AE analysis to identify the evolution of the fracture process; the study provides damage descriptors based on acoustic activity that could be effective for monitoring prestressed RC elements. Zhang et al.20 tested prestressed girders by coupling AE testing with DIC and found that AE testing can detect cracks earlier than traditional techniques and can significantly enhance the accuracy of conventional monitoring techniques. Prem et al.21 identified thresholds on the slope of AE energy versus deflection associated with the formation of macrocracks in RC beams under bending, and also correlated higher energy amounts with shear failure modes. The study characterized other correlations based on rise time to amplitude ratio (RA) versus average frequency (AF), severity index versus historic index,
ElBatanouny et al.22 carried out bending tests on prestressed girders removed from existing bridges and developed intensity analysis charts that could be efficient for the early detection of cracking deterioration. Castellanos-Toro et al.15 tested posttensioned beams under both static and vibration loading, reproducing the evolution from precracking to macrocracking. They identified potential acoustic damage correlations but confirmed that data interpretation is challenging. They also highlighted that limited information on the current state and previous loading history of the structure can hinder damage interpretation.
Overall, AE testing has shown potential correlations with mechanical response and damage evolution. To the authors' knowledge, however, no study has focused on identifying incipient and low-to-moderate cracking damage, systematically investigated multiple analysis methods, and provided robust damage criteria (DC), for example, through unbiased assessment or blind predictions (BPs). Moreover, literature correlations and criteria often reflect specific materials and structural applications and can be difficult to extend to other scenarios. There is therefore a pressing need for a thorough investigation that rigorously and quantitatively assesses current AE-based evaluation methods. Such an investigation could also introduce new method specifications (MSs) and criteria as tools to detect early and low-to-moderate damage in typical prestressed girders.
The present study addresses this gap by systematically evaluating the effectiveness of various AE analysis methods in detecting early-stage and moderately developed cracking damage in prestressed RC girders. To this end, the study employed a comprehensive experimental program that integrates blind AE monitoring with mechanical and visual damage assessments to establish reference damage states (DSs). The novelty and scientific contribution of the paper lie in three areas:
(1) methodological novelty, by implementing an AE testing framework integrated with BP and introducing revised and new MSs;
(2) quantitative evaluation, by defining promising DC and objectively assessing their performance across multiple methods; and
(3) practical relevance, by demonstrating how the proposed criteria could be employed for proof-testing and SHM of real-world bridge elements.
Experimental tests
Mechanical tests
Six posttensioned RC girder specimens were tested under four-point bending as part of an extensive experimental campaign aimed at assessing the influence of different grouting conditions, prestress levels, and strengthening techniques on flexural response.1,23 Four tests, on specimens S1, S2, S3, and S4, form the core of the study. Two additional tests, on specimens S5 and S6, were carried out for validation purposes and are described in section “Validation considering additional specimens.”
The girder geometry was designed to be representative, at a 1:5 length scale, of a real bridge deck with T-section longitudinal beams. The internal posttensioning system consisted of two parabolic monostrand tendons, with prestress levels differing among girders (Figure 1).

The tests were conducted at the Department of Structures for Engineering and Architecture of the University of Naples Federico II. The experimental layout with the specimen beam is reported in Figure 2. The testing program consisted of a sequence of two loading protocols: protocol P1, a force-controlled quasi-static cyclic test in accordance with ACI 437,24 and protocol P2, a displacement-controlled monotonic test until (P2). Three subprotocols were defined within P1, namely P1L1, P1L2, and P1L3, aimed at providing three different external force levels (


P1L1, P1L2, and P1L3 loading protocols expressed as applied force (
The applied force
Acoustic emission (AE) tests
AE tests were performed with the multichannel AMSY-6 system produced by Vallen Systeme (Bürgermeister-Seidl-Str., Germany), using the Visual AE acquisition software. Piezoelectric sensors with nonintegrated preamplification were used, namely VS30 and VS150 sensors. These operate in the 28–80 kHz and 100–450 kHz ranges and are resonant at 30 and 150 kHz, respectively. Only VS30 sensors were used to test specimen S1, whereas only VS150 sensors were used for S3 and S4; to correlate the VS30 and VS150 responses, S2 was tested with both. The two sensor types were used to assess AE sensitivity at low (VS30) and medium (VS150) resonance/operating frequencies, since AE testing of RC structures is typically carried out within this range.15,17 The validation tests on S5 and S6 also used sensors with intermediate working/resonant frequencies (VS75), as described in section “Validation considering additional specimens.”
Four sensor channels were used in each test, and typically three sensors were located in the middle point area. For specimen S1 under all loading protocols except P1L1, sensors Ch1, Ch2, and Ch4 were located in the middle point area (28 cm from the middle point axis), and Ch3 was outside this area (178 cm from the middle point axis). Under P1L1, Ch4 was instead located in the support area (278 cm from the middle point axis). For the other specimens, Ch1, Ch2, and Ch3 were located in the middle point area (28 cm from the middle point axis), and Ch4 was outside it (approximately 90–100 cm from the middle point axis).
The pretrigger and posttrigger time intervals were each set to 100 μs. The hit definition time, hit lockout time, and maximum hit duration were set to 250 μs, 2000 μs, and 100 ms, respectively, and no peak definition time was set. Band-pass filters of 25–82 kHz and 75–350 kHz were applied to the VS30 and VS150 sensors, respectively. The gain was set to 34 dB and the input range to 10 Vpp.
Verification and preparation tests were carried out to check the sensor installation and calibrate the main acquisition parameters; these included pencil-lead break, pulsing, and sensor localization tests. Only recordings acquired during the mechanical tests were analyzed in this study. AE data were identified by channel (Ch1, Ch2, Ch3, and Ch4) and by the related mechanical test, that is, specimen (S1, S2, S3, and S4) and loading protocol (P1L1, P1L2, P1L3, and P2). All basic AE features were recorded, with a focus on number of hits or hits (
Damage assessment criteria and blind predictions (BPs)
Methodology
AE analysis and MSs
Four methods (Ms), or analysis parameters, were considered: Kaiser effect (and Felicity ratio), RA versus AF response,
Violation of the Kaiser effect25 and the associated Felicity ratio (FR)26 were assessed (method (M) K) using 11 MSs. These differ in the quantitative definition of the significant activity descriptors, that is, considering:
hits (
counts (
hits (
history index larger than or equal to 1.4, considering four different formulations, applied considering the case of resetting and not resetting
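The Kaiser-effect check in M K can be illustrated with a minimal sketch for one channel and one reloading. It uses the usual definition: FR is the load at the onset of significant emission on reloading divided by the previous maximum load, and the Kaiser effect is violated when FR < 1. The function names and the `min_hits` onset rule are hypothetical stand-ins for the study's significant-activity definitions, which are not reproduced here. The historic index is written in its general form H = N/(N−K) · Σ_{i=K+1..N} S_i / Σ_{i=1..N} S_i, with K passed in by the caller, because the four formulations used in the study are not listed in this section.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class FelicityResult:
    onset_load: Optional[float]      # load at onset of "significant" emission, if reached
    felicity_ratio: Optional[float]  # onset load / previous maximum load
    kaiser_violated: bool            # True when FR < 1 (emission before previous max load)


def felicity_check(hit_loads: Sequence[float], previous_max_load: float,
                   min_hits: int = 5) -> FelicityResult:
    """Felicity ratio for one channel and one reloading.

    hit_loads: applied load at each hit of the reloading, in time order.
    min_hits: HYPOTHETICAL significant-activity rule (onset = load at the
    min_hits-th hit); the paper's K MSs use their own definitions.
    """
    if previous_max_load <= 0:
        raise ValueError("previous_max_load must be positive")
    if len(hit_loads) < min_hits:
        return FelicityResult(None, None, False)  # no significant activity
    onset = float(hit_loads[min_hits - 1])
    fr = onset / previous_max_load
    return FelicityResult(onset, fr, fr < 1.0)


def historic_index(signal_strengths: Sequence[float], k: int) -> float:
    """H = N/(N-K) * sum(S_i, i=K+1..N) / sum(S_i, i=1..N); K is user-supplied."""
    n = len(signal_strengths)
    if not 0 <= k < n:
        raise ValueError("need 0 <= k < number of hits")
    total = sum(signal_strengths)
    if total == 0:
        raise ValueError("signal strengths sum to zero")
    recent = sum(signal_strengths[k:])
    return (n / (n - k)) * recent / total
```

A historic index at or above 1.4 would then count as a detection under the corresponding K criterion.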
A single MS was implemented for the RA versus AF (AFRA) analysis.27 It assesses the evolution of RA versus AF for all channels over subsequent subsets within each test (MS AFRA).
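The AFRA check can be sketched under stated assumptions. RA is rise time over peak amplitude in volts (ms/V), with amplitude given in dB_AE (reference 1 µV). AF is counts over duration (kHz). A flag is raised when the subset-mean AF decreases while the subset-mean RA increases between consecutive subsets. The dictionary layout and the use of subset means are hypothetical choices, not the study's stated procedure.

```python
import numpy as np


def amplitude_db_to_volts(amp_db):
    """Convert AE amplitude in dB (reference 1 uV at the sensor) to volts."""
    return 1e-6 * 10.0 ** (np.asarray(amp_db, dtype=float) / 20.0)


def ra_af(rise_us, amp_db, counts, duration_us):
    """Per-hit RA (ms/V) and AF (kHz)."""
    ra = np.asarray(rise_us, dtype=float) * 1e-3 / amplitude_db_to_volts(amp_db)
    af = np.asarray(counts, dtype=float) / np.asarray(duration_us, dtype=float) * 1e3
    return ra, af


def afra_flags(subsets):
    """1 where mean AF decreases AND mean RA increases between consecutive subsets.

    subsets: list of dicts with keys 'rise_us', 'amp_db', 'counts', 'duration_us'
    (hypothetical layout). Using the subset mean is an assumption.
    """
    means = []
    for s in subsets:
        ra, af = ra_af(s["rise_us"], s["amp_db"], s["counts"], s["duration_us"])
        means.append((ra.mean(), af.mean()))
    return [int(af1 < af0 and ra1 > ra0)
            for (ra0, af0), (ra1, af1) in zip(means, means[1:])]
```

With six subsets per test, this yields five flags per channel, which matches the "5 subset variations × 4 channels" sample size used for the AFRA summaries.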
Finally, AE or acoustic (information) entropy (H)29,30 (M E) was assessed considering both (E1) Shannon (
Entropy was assessed starting from the first AE event, using a detection window that expands over time to include each subsequent AE event.
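One plausible reading of the expanding-window entropy is a running sum of per-event Shannon entropies, where the value at event k covers events 0 to k. The following is a minimal sketch under that assumption. Each event's probability distribution is taken from a histogram of its samples (for example, digitized waveform voltages) over fixed bin edges, with base-2 logarithms. The study's exact probability model and units are not reproduced here.

```python
import numpy as np


def shannon_entropy(samples, bin_edges):
    """Shannon entropy (bits) of the empirical distribution of `samples`."""
    counts, _ = np.histogram(samples, bins=bin_edges)
    total = counts.sum()
    if total == 0:
        return 0.0
    p = counts[counts > 0] / total
    return float(-(p * np.log2(p)).sum())


def cumulative_entropy(event_samples, bin_edges):
    """Running sum of per-event entropies: the value at index k covers events 0..k,
    i.e. an expanding window that starts at the first AE event."""
    per_event = [shannon_entropy(s, bin_edges) for s in event_samples]
    return np.cumsum(per_event)
```

Fixing the bin edges across events keeps the per-event entropies comparable, which matters once they are accumulated and compared against a threshold.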
Damage assessment
The assessment methodology is organized in four steps: (1) M selection, (2) M implementation (through MSs), (3) AE processing according to each MS, and (4) damage assessment results. Figure 4 shows the workflow, taking M K and MS K1 as an example.

Workflow of the implemented assessment methodology (Table 1). P2 is analyzed up to a peak force of 1.5 times P1L3 peak force (1.5
AE processing (step 3) is based on quantitative DC, defined for each method and MS as reported in Table 1. Each implementation of DC yields a BP (step 4), which consists of a DS for each specimen and loading protocol.
Damage assessment methodology: DC associated with implemented assessment methods and MSs: DC ID, EC, DSWs, and DSC.
DC: damage criteria; DSC: damage state criteria; DSW: detection sampling window; EC: elementary criteria; ID: identifier; LD: low damage; MS: method specification; ND: no damage; RA: rise time to amplitude ratio; SD: severe damage.
DC were defined by a set of rules and analysis features applied to the results of each MS. Each DC is defined by a DC ID, elementary criteria (EC), detection sampling windows (DSWs), and damage state criteria (DSC). DC IDs correspond to MS IDs unless multiple DC are defined for a single MS. EC correspond to elementary detections of specific conditions (e.g., violation of the Kaiser effect or exceedance of an entropy threshold), checked in DSWs identified within each loading protocol. For a given DC ID, the quantitative interpretation of the EC detections over the sensor channels yields a DSC, from which a BP is defined.
EC define the level of attention associated with a specific detection, and binary or ternary EC were defined depending on the DC. Binary EC distinguish between no EC detection (null level of attention) and EC detection (low to severe level of attention). Ternary EC distinguish EC1 (null level of attention), EC2 (low-to-moderate level of attention), and EC3 (moderate-to-severe level of attention). EC (Table 1) were defined by interpreting the AE results and accounting for evidence from the literature.
EC associated with the K, AFRA, and b MSs were set according to well-established DC. For the E MSs, EC3 were derived from the blind criteria already adopted for the analysis of the results. EC1 and EC2 were separated by a reference EC1/EC2 threshold equal to (1) half the EC3 threshold for E1.1, E2.1, and E2.2, which operate within a single order of magnitude, and (2) 10−2 times the EC3 threshold for E1.2.
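The ternary EC rule for the entropy MSs reduces to two thresholds per MS: the EC3 threshold and a derived EC1/EC2 threshold (0.5 × EC3 for E1.1, E2.1, and E2.2; 10−2 × EC3 for E1.2). A minimal sketch follows. The inclusive (≥) comparisons and the helper names are assumptions.

```python
# Reference EC1/EC2 threshold as a fraction of the EC3 threshold, per entropy MS.
EC12_FACTOR = {"E1.1": 0.5, "E2.1": 0.5, "E2.2": 0.5, "E1.2": 1e-2}


def ec_level(value: float, ec3_threshold: float, ms_id: str) -> int:
    """Ternary EC: 1 (null), 2 (low-to-moderate), 3 (moderate-to-severe) attention.
    Inclusive (>=) threshold comparisons are an assumption."""
    try:
        factor = EC12_FACTOR[ms_id]
    except KeyError:
        raise ValueError(f"no EC1/EC2 rule for MS {ms_id!r}") from None
    if value >= ec3_threshold:
        return 3
    if value >= factor * ec3_threshold:
        return 2
    return 1
```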
For fixed sensor channels, DSWs (Table 1) define the context in which EC checks are performed. The DSWs of the investigated MSs coincide with the related subsets. In detail, the DSWs for the Kaiser effect and Felicity ratio MSs (K MSs) consist of each cycle of the P1 tests and of P2 up to a peak force of 1.5 times the P1L3 peak force. DSWs related to AFRA and
DSC (Table 1) define specific rules for interpreting the results of EC checks within the DSWs univocally and quantitatively, for each MS (and specimen). DSC essentially consist of criteria based on the number and level of attention of the detected EC relative to the total number of channels and DSWs. DSC are defined for three hypothesized DSs, namely no damage (ND), low damage (LD), and severe damage (SD), for each specimen and loading protocol. Since the predictions are blind, the damage classification is based on reasonably clear and distinguishable criteria, which also account for the range of observed parameter/feature values. DSC were quantitatively defined for the specific tests. They can nonetheless be extended to other applications by scaling them to the size of the DSWs and the number of sensor channels.
For each MS, the synthesis of the application of DSC to all loading protocols and specimens provides a BP. In particular, a BP is expressed by a 4 × 4 matrix, where rows represent loading protocols (P1L1, P1L2, P1L3, and P2) and columns stand for specimens (S1, S2, S3, and S4). P2 was considered up to a peak force of 1.5 times P1L3 peak force (1.5
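Assembling a BP matrix is mechanical once a DSC is fixed: rows are protocols, columns are specimens, and each cell is the DS returned by the DSC for that protocol-specimen pair. The DSC in the sketch below is entirely hypothetical (SD if at least half the channel × DSW checks are EC3, LD if any check is EC2 or EC3, ND otherwise). It only shows the data flow and is not one of the rules in Table 1.

```python
import numpy as np

PROTOCOLS = ("P1L1", "P1L2", "P1L3", "P2")  # BP rows
SPECIMENS = ("S1", "S2", "S3", "S4")        # BP columns
ND, LD, SD = 1, 2, 3


def hypothetical_dsc(ec_levels) -> int:
    """HYPOTHETICAL DSC: SD if at least half of the channel x DSW checks are EC3,
    LD if any check is EC2 or EC3, ND otherwise. Not the rules of Table 1."""
    levels = np.asarray(ec_levels)
    if np.mean(levels == 3) >= 0.5:
        return SD
    if np.any(levels >= 2):
        return LD
    return ND


def bp_matrix(ec_table, dsc=hypothetical_dsc) -> np.ndarray:
    """ec_table[(protocol, specimen)] -> EC levels over channels x DSWs."""
    bp = np.empty((len(PROTOCOLS), len(SPECIMENS)), dtype=int)
    for i, protocol in enumerate(PROTOCOLS):
        for j, specimen in enumerate(SPECIMENS):
            bp[i, j] = dsc(ec_table[(protocol, specimen)])
    return bp
```

Passing a different `dsc` callable swaps the rule without touching the matrix assembly.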
To account for the dispersion of the investigated DC in terms of BP matrices, the mode (M-matrix), disagreement frequency (DF-matrix), and entropy (E-matrix) were estimated. The M-matrix contains the most frequently predicted DS, the DF-matrix the frequency of deviation from the mode, and the E-matrix the Shannon entropy (
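The three dispersion matrices can be computed cell by cell from a stack of BP matrices, as in the sketch below. The DF-matrix is returned as a count of DC that deviate from the mode, consistent with the integer deviation values reported for Figure 5(b). Two choices are assumptions: ties in the mode resolve to the lower DS, and the entropy uses log base 2.

```python
import numpy as np


def dispersion_matrices(bps, states=(1, 2, 3)):
    """M-, DF-, and E-matrices from a list of equally shaped BP matrices.

    Ties in the mode go to the lower DS (argmax picks the first maximum): an
    assumption. Entropy uses log base 2: also an assumption.
    """
    stack = np.stack([np.asarray(b) for b in bps])          # (n_dc, rows, cols)
    counts = np.stack([(stack == s).sum(axis=0) for s in states])
    m = np.asarray(states)[counts.argmax(axis=0)]           # most frequent DS
    df = (stack != m).sum(axis=0)                           # DC deviating from the mode
    p = counts / stack.shape[0]
    with np.errstate(divide="ignore", invalid="ignore"):
        terms = np.where(p > 0, p * np.log2(p), 0.0)
    e = -terms.sum(axis=0)
    return m, df, e
```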
Results and discussion remarks
Key response occurrences
The experimental results are processed in this article identifying the key response occurrences, considering Kaiser effect and Felicity ratio, AFRA,
The Kaiser effect violations reported in Table 2 clearly indicate that specimens S1 and S2 were affected by damage from protocol P1L1 onward. Damage was more significant for S1 than for S2, as 5 and 2 occurrences were detected, respectively, and 5 occurrences were then detected for S2 under P1L2. Under P1L3, 3 and 1 occurrences were observed for S3 and S4, respectively, and under P2 the occurrences for S4 become more significant (4). Overall, the number and significance of Kaiser effect violations appear to track the evolution of damage.
Sum of total occurrences of Kaiser effect violation along cycles (cycles 1 and 2 for protocols P1 and unique test procedure for protocol P2) for each test and specimen, considering all channels (MS K3.1b).
The sample size for each sum is 4 for S1 under P1L1 (1 cycle × 4 channels) and for all specimens under P2 (1 test procedure × 4 channels), whereas it is 8 (2 cycles × 4 channels) in all other cases.
Tables 3–6 report the cases in which AF decreases and RA increases simultaneously for specimens S1, S2, S3, and S4, respectively, for each test and channel. A summary of total occurrences is reported in Table 7.
Simultaneous occurrences (1/0) of decreasing AF and increasing RA along subsequent subsets for each test and channel, associated with specimen S1 (MS AFRA).
AF: average frequency; RA: rise time to amplitude ratio.
Simultaneous occurrences (1/0) of decreasing AF and increasing RA along subsequent subsets for each test and channel, associated with specimen S2 (MS AFRA).
AF: average frequency; RA: rise time to amplitude ratio.
Simultaneous occurrences (1/0) of decreasing AF and increasing RA along subsequent subsets for each test and channel, associated with specimen S3 (MS AFRA).
AF: average frequency; RA: rise time to amplitude ratio.
Simultaneous occurrences (1/0) of decreasing AF and increasing RA along subsequent subsets for each test and channel, associated with specimen S4 (MS AFRA).
AF: average frequency; RA: rise time to amplitude ratio.
Table 7 considers all channels. According to the total occurrences, S1 and S2 were significantly affected during P1L1, whereas zero to one occurrence was associated with S3 and S4. The number of occurrences decreases from P1L1 to P1L2 for S1 and S2, with no occurrences for S3 and S4 under P1L2. P1L3 occurrences for S1 and S2 are comparable with those under P1L1 and P1L2, whereas three and zero occurrences are detected for S3 and S4, respectively. Finally, occurrences for S1 and S2 increase from P1L3 to P2, and a significant number of occurrences also affect S3 and S4. These detections suggest that S1 and S2 might be damaged from P1L1 onward, whereas S3 and S4 begin to be damaged only from P1L3 and P2, respectively. It is recalled that P2 data are associated with the detection windows that extend up to 1.5
Summary of total simultaneous occurrences of decreasing AF and increasing RA along subsequent subsets for each test and specimen, considering all channels (MS AFRA).
Size of sample for each sum value is 20 (5 subset variations × 4 channels).
AF: average frequency; RA: rise time to amplitude ratio.
As a representative result of
Low
Low
Low
Low
Summary of total low
The sample size for each sum is 24 (6 subsets × 4 channels) for low-value occurrences and 20 (5 subset variations × 4 channels) for decreasing occurrences.
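The low-b and decreasing-b checks rely on a b-value estimate per subset. A common AE adaptation of the Gutenberg–Richter relation is log10 N(≥A) = a − b·(A_dB/20), with the b-value taken as minus the least-squares slope. The following is a minimal sketch of that estimate. The bin width, the absence of any fitting-range selection, and the function name are assumptions, and the study's own b-value formulation and low-b threshold are not reproduced here.

```python
import numpy as np


def ae_b_value(amp_db, bin_width_db=1.0):
    """AE b-value: minus the least-squares slope of log10 N(>=A) versus A_dB/20.

    amp_db: hit peak amplitudes in dB_AE. The bin width and the plain
    least-squares fit are assumptions; no fitting-range selection is applied.
    """
    a = np.asarray(amp_db, dtype=float)
    levels = np.arange(a.min(), a.max() + bin_width_db / 2.0, bin_width_db)
    n_ge = np.array([(a >= lvl).sum() for lvl in levels])  # cumulative counts
    keep = n_ge > 0
    if keep.sum() < 2:
        raise ValueError("need at least two populated amplitude levels")
    slope, _intercept = np.polyfit(levels[keep] / 20.0, np.log10(n_ge[keep]), 1)
    return -slope
```

Computing this per subset and comparing consecutive values gives the "decreasing" occurrences. Comparing each value against a chosen threshold gives the "low" occurrences.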
Acoustic entropy was estimated using the same subsets adopted for the AF versus RA and
High cumulative value of Shannon entropy (Σ
High cumulative value of Kullback–Leibler entropy (Σ
Summary of total occurrences of high value of investigated entropy measures (MS E1.1, MS E1.2, MS E2.1, and MS E2.2) for each test and specimen, considering all channels.
Size of sample for each sum value is 4 (1 test procedure × 4 channels).
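A cumulative Kullback–Leibler measure can be sketched in the same spirit as the cumulative Shannon entropy. It is written as a running sum, over AE events in time order, of D_KL(P_event ‖ Q). The choice of Q is an assumption here: a fixed reference histogram, for example from the first recorded events. The small smoothing constant avoids log(0). Both the reference choice and the helper names are hypothetical; the study's exact Kullback–Leibler formulation is not given in this section.

```python
import numpy as np


def kl_divergence_bits(p_counts, q_counts, eps=1e-12):
    """D_KL(P || Q) in bits between two histograms; eps smoothing avoids log(0)."""
    p = np.asarray(p_counts, dtype=float) + eps
    q = np.asarray(q_counts, dtype=float) + eps
    p /= p.sum()
    q /= q.sum()
    return float(np.sum(p * np.log2(p / q)))


def cumulative_kl(event_samples, reference_samples, bin_edges):
    """Running sum of D_KL(event || reference) over AE events in time order.
    Using a fixed reference distribution is an assumption."""
    q, _ = np.histogram(reference_samples, bins=bin_edges)
    per_event = [kl_divergence_bits(np.histogram(s, bins=bin_edges)[0], q)
                 for s in event_samples]
    return np.cumsum(per_event)
```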
Blind predictions (BPs)
Figure 5(a) depicts all BPs associated with the investigated DC. All BPs tend to detect increasing damage severity from protocol P1L1 to P2 and decreasing severity from S1 to S4, with a combination of these effects when both vary. The different estimations show clear similarities. For example, in most cases ND was associated with P1L1 for specimens S3 and S4, whereas SD was detected for all specimens under the P2 test. Outlier BPs are associated with

Damage assessment results: (a) BP matrices corresponding to the estimated DSs for the investigated protocols and specimens, for all DC, and (b) dispersion analysis of BPs: M-matrix, DF-matrix, and E-matrix. M-matrix values 1, 2, and 3 correspond to ND, LD, and SD, respectively; DF-matrix values count the estimates deviating from the mode; E-matrix values report Shannon entropy measures, with ND, LD, and SD coded as 1, 2, and 3, respectively.
Figure 5(b) shows the results of the dispersion analysis. The M-matrix highlights the most frequent DS across the set of investigated DC and thus represents a measure of consensus among the different BPs. Its experimental interpretation, reported in the following section, is crucial, since all predictions are blind and several different methods and formulations were implemented to derive the BPs. For S1 and S2, the mode estimates are SD under all protocols, as are the P2 estimates for all specimens. ND is the mode for S3 under P1L1–P1L2 and for S4 under P1L1–P1L3, whereas LD is the most frequent estimate only for S3 under P1L3. Interestingly, the LD condition, which represents a transition between ND and SD, appears only once out of 16 cases in the M-matrix. This suggests that this mechanical state is more difficult to detect.
The DF-matrix shows how often the estimations disagree (deviations from the mode); high values indicate significant variability. Taking the M-matrix as a reference, the highest DF-matrix counts are generally associated with the first attainment of a DS or with DS transitions. This is reasonable, since the earliest attainment of a DS is arguably the most challenging condition to assess. In particular, a deviation value of 11 corresponds to specimen S2 under P1L1 (first test), and a value of 12 is related to the transition between LD and SD.
Finally, the E-matrix provides a quantitative measure of uncertainty and disorder. In this context, it indicates how widely the different estimations are spread across multiple DSs. The E-matrix could be evaluated against acceptable entropy thresholds that correspond to reasonably uncertain and disordered estimations for the DSs of the case study. Further comments are omitted here, since this matrix, like the others, is interpreted against the disclosed experimental data in the following section.
Experimental assessment of blind predictions (BPs)
Experimental response and damage states (DSs)
Figure 6 depicts the applied force (

Applied force (
Although the nominal concrete properties were the same for all specimens (average cubic strength of 35 MPa after 28 days of curing), the specimens were designed with different prestress levels. Specifically, the target prestress jacking forces in Table 16 were applied through internal posttensioning and, for S3 and S4 only, also through external posttensioning.
Prestress applied to specimens: internal (
The different prestressing levels resulted in significantly different damage scenarios for the specimens. In detail, S1 was designed to develop cracking under external load lower than
The mechanical phase of interest for the present assessment begins with the formation of tensile microcracks, which may not be visible. This onset corresponds to the earliest detectable deviation from the linear elastic response, that is, an incipient reduction of flexural stiffness from the initial conditions. The phase ends with macrocracking that considerably and steadily reduces the flexural stiffness (postcracked cross-section stiffness), namely a steady stiffness reduction larger than 20%. In this study, the onset of this phase is conventionally referred to as LD and its end as SD. The transition can be relatively gradual and not straightforward to identify, especially in terms of damage initiation. LD and SD were identified by combining physical observations (cracks noted during testing, especially for SD) with force-deflection response data.23 Although the analysis of crack patterns is beyond the scope of this article, Figure 7 illustrates an example of a crack pattern typically associated with SD. The crack is just visible and extends more than 10 cm from the bottom surface. Its width exceeds 0.1 mm in the lower portion and decreases along its extension. The LD condition is associated with microcracks that are generally not visible because of their extremely small width and extension.
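The stiffness-based definitions above can be turned into a simple classifier as a minimal sketch. Secant stiffness is computed at successive load peaks and compared with the first (uncracked) value. SD requires a reduction above 20% that persists over the last peaks, following the "steady reduction larger than 20%" definition. The LD tolerance of 5% and the number of peaks that define "steady" are hypothetical values chosen for illustration only. In the study, LD and SD were identified by combining crack observations with the force-deflection data, not by this rule alone.

```python
import numpy as np


def classify_ds(peak_force_kn, peak_defl_mm, ld_tol=0.05, sd_tol=0.20, n_steady=2):
    """Classify ND/LD/SD from secant stiffness at successive load peaks.

    sd_tol follows the ">20% steady stiffness reduction" definition; ld_tol and
    n_steady (how many final peaks must exceed sd_tol) are HYPOTHETICAL.
    The first peak is taken as the initial (uncracked) reference stiffness.
    """
    k = np.asarray(peak_force_kn, dtype=float) / np.asarray(peak_defl_mm, dtype=float)
    reduction = 1.0 - k / k[0]
    if len(reduction) >= n_steady and np.all(reduction[-n_steady:] > sd_tol):
        return "SD"
    if np.any(reduction > ld_tol):
        return "LD"
    return "ND"
```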

Example of crack pattern associated with achievement of conventional SD state.
The achievement of SD can be clearly observed in Figure 6: global significant stiffness degradation due to macrocracking associated with specimens S1, S2, S3, and S4 can be associated with P1L1, P1L3, P2, and P2, respectively, recalling that in these latter two cases SD is identified between

Experimental (actual) DSs associated with investigated protocols and specimens.
Performance evaluation of blind predictions (BPs)
Methodology
The performance of the BPs was evaluated with established metrics based on the confusion matrix, which accounts for true positive (TP), false positive (FP), false negative (FN), and true negative (TN) conditions.34–36 Similar methods have been used for performance evaluation in AE testing applications.37,38 Four metrics were assessed for each DC, both for each DS and grouping all three DSs: precision (P; Equation (4)), recall (R; Equation (5)), F1-score (F1S; the harmonic mean of P and R, Equation (6)), and accuracy (A; Equation (7)). For the grouped computation, both DS-averaged and DS-weighted values were computed. The weight coefficients were set to 1/7, 2/7, and 4/7 for ND, LD, and SD, respectively, so that the weight doubles from ND to LD and from LD to SD.
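The metrics above can be sketched for a pair of DS matrices (actual versus predicted, coded 1/2/3), treating each DS one-vs-rest. Per-DS results are then combined either with equal weights or with the 1/7, 2/7, 4/7 weights. The sketch makes two assumptions: undefined ratios (for example, precision when a DS is never predicted) are set to 0, and the grouped F1S is the weighted mean of per-DS F1S rather than the F1 of grouped P and R. The paper's Equations (4)–(7) are not reproduced here.

```python
import numpy as np

STATES = (1, 2, 3)                       # ND, LD, SD
DS_WEIGHTS = {1: 1 / 7, 2: 2 / 7, 3: 4 / 7}


def _safe_div(num, den):
    # Zero-division convention (0.0) is an assumption.
    return float(num) / float(den) if den else 0.0


def per_state_metrics(actual, predicted, state):
    """One-vs-rest P, R, F1, A for one DS from actual/predicted DS matrices."""
    a = np.ravel(actual) == state
    p = np.ravel(predicted) == state
    tp = int(np.sum(a & p))
    fp = int(np.sum(~a & p))
    fn = int(np.sum(a & ~p))
    tn = int(np.sum(~a & ~p))
    prec = _safe_div(tp, tp + fp)
    rec = _safe_div(tp, tp + fn)
    f1 = _safe_div(2 * prec * rec, prec + rec)
    acc = _safe_div(tp + tn, tp + fp + fn + tn)
    return {"P": prec, "R": rec, "F1": f1, "A": acc}


def grouped_metrics(actual, predicted, weights=None):
    """Average (weights=None) or DS-weighted combination of per-DS metrics."""
    per = {s: per_state_metrics(actual, predicted, s) for s in STATES}
    w = weights if weights is not None else {s: 1 / len(STATES) for s in STATES}
    return {m: sum(w[s] * per[s][m] for s in STATES) for m in ("P", "R", "F1", "A")}
```

Usage: `grouped_metrics(actual, bp)` gives the DS-averaged values, and `grouped_metrics(actual, bp, DS_WEIGHTS)` gives the DS-weighted ones.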
Finally, a global score parameter (
Results and discussion
The confusion matrices are not reported for the sake of brevity, and Figure 9 shows

Precision (
The prediction performances in terms of all parameters depicted in Figure 9 generally have higher precision, recall, combined score (
The more challenging detection of LD also stems from the mechanical classification. The mechanical identification of LD is inherently characterized by subtle, localized damage phenomena that affect the local flexural behavior of the beam without causing clear changes in its global response. While SD is univocally defined in quantitative terms, the mechanical attainment of LD is more uncertain. Moreover, LD occurred relatively few times over the experimental tests. This partial or limited expression of damage affects AE generation and assessment, since the associated acoustic signatures are typically less pronounced and more prone to being masked by noise or disturbances. Finally, in some cases (for example, for several K criteria and AFRA), mechanical LD was conservatively detected as SD. This is due to both (1) the narrow mechanical transition between LD and SD and (2) the uncalibrated nature of the AE criteria thresholds.
Despite the relatively challenging prediction of LD and of its distinction from SD, the overall prediction performance can be considered satisfactory, as quantitatively discussed in the following paragraph, and some methods and DC perform very well. The generally positive performance of the investigated set of methods/DC is supported by two observations. The first is the relatively high performance metrics of the mode results (M-matrix in Figure 5(b)) plotted in Figure 10 (i.e., see the F1S associated with the mode). The second is the relatively low deviations and entropies associated with the mode (Figure 5(b)), as previously discussed. In other words, the investigated methods likely provide a comprehensive prediction (high performance metrics for the M-matrix) with relatively low dispersion (low deviations in the DF-matrix and reduced entropy in the E-matrix). The choice of investigated methods, or rather DC, affects these estimations in terms of both mode and dispersion. The outcomes can nonetheless be considered reasonable and representative, since a variety of methods and a fairly large number of DC are considered.

Precision (
In order to comprehensively account for DSs, Figure 10 depicts
Figure 10 also shows that weighting the DSs does not substantially affect the results in most cases, compared with the averaged data. This holds even though (a) the weight coefficients were strongly unbalanced (1/7, 2/7, and 4/7 for ND, LD, and SD, respectively) and (b) the lowest weight was assigned to the DS that is least challenging to predict. Specifically, the maximum (average) absolute discrepancy associated with
The applicative classification of the performance of the different methods and DC is based on the assessment of the global score (

Global score (
A large number of DC are associated with
Overall, the Kaiser method provides good predictions that do not depend strongly on the specific MS. This suggests that the strong physical interpretation of the Kaiser effect and Felicity ratio balances the uncertainty associated with the conventional definition of the significant activity criterion (which generates the multiple DC). Furthermore, only specific formulations appear to be associated with low performance (i.e., K3.2a). On the other hand, E2.2 provides the best prediction, which supports the high potential of entropy-based measures, in particular the cumulative Kullback–Leibler entropy with the specific EC adopted here. In future studies, the blindly assessed E2.2 criterion should be experimentally calibrated for enhancement purposes. Interestingly, the DC associated with E2.2 are compatible with DC found to be effective in past studies on various metallic materials.39–42 This suggests that entropy metrics and DC based on historical formulations, especially the Kullback–Leibler one, may be largely independent of material, geometry, and application.
Validation considering additional specimens
The developed MSs and criteria were defined blindly. Both E2.2 and K1 were found to be reliable for detecting the flexural cracking process, from microcrack formation (incipient deviation from linear behavior, conventionally associated with LD) to macrocrack initiation and propagation (significant and steady stiffness reduction, defined as SD). All investigated criteria, including these two, were assessed on two additional girder specimens (S5 and S6) derived from the benchmark beams. S5 and S6 were tested under the same testing protocols and loading conditions as S1 to S4; further details are omitted for the sake of brevity and generality. The prestress levels of S5 and S6 were the same as those of S1 and S2, respectively, without any external posttensioning. The instrument arrangement is depicted in Figure 12. Channels (Ch) 1 and 4 use VS75 sensors, whereas Ch2 and Ch3 use VS150 sensors. Ch1, Ch2, and Ch3 are in the middle point area of the beam: Ch3 is attached to the bottom surface of the girder, and Ch1 and Ch2 are attached symmetrically to the upper part of its lateral surfaces. A 50–200 kHz band-pass filter was used for the VS75 sensors. All other sensor/acquisition settings were the same as those reported in section “Acoustic emission (AE) tests.”

Type and location arrangement of sensors associated with tests on S5 and S6 specimens (dimensions in mm).
VS75 sensors were also used to test the additional girders in order to account for sensor variability with respect to the tests on specimens S1–S4, which implemented sensors with lower and higher working/resonant frequencies (VS30 and VS150, respectively).
It should be noted that S5/S6 Ch1 and Ch2 are located in the same positions as S3/S4 Ch1 and Ch2, and that the S5/S6 Ch3 position corresponds to that of S4 Ch3; however, only S3/S4 Ch2 and S4 Ch3 use the same sensor type (VS150) as S5/S6 Ch2 and Ch3, respectively.
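The 50–200 kHz band-pass setting can be illustrated with a minimal Python sketch. It uses an idealized FFT-mask (brick-wall) filter, a simplification chosen for clarity that is not the filter implemented by the acquisition system; the function name and the sampling rate used when calling it are hypothetical.

```python
import numpy as np


def fft_bandpass(signal, fs, f_lo=50e3, f_hi=200e3):
    """Ideal (brick-wall) band-pass: keep only components in [f_lo, f_hi] Hz.

    signal: 1-D array of waveform samples; fs: sampling rate in Hz.
    """
    spectrum = np.fft.rfft(signal)
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / fs)
    spectrum[(freqs < f_lo) | (freqs > f_hi)] = 0.0  # reject out-of-band bins
    return np.fft.irfft(spectrum, n=len(signal))
```

In practice, acquisition systems apply analog or recursive digital filters with a finite roll-off, so out-of-band components are attenuated rather than removed entirely.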
Figure 13(a) shows the

Applied force (
The time evolution of the Σ

Applied force (
The K1 MS detected SD corresponding to P1L1 and P1L2, respectively, and ND for the preceding protocols. Trends similar to the S1–S4 results were found for the other MSs.
K1 provides almost the same estimations as E2.2; the only difference is a more accurate, but less conservative, E2.2 identification of LD for S2 under P1L2. The slightly more conservative estimation of K1 is consistent with the proof-testing objectives of this MS.
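The Felicity ratio underlying Kaiser-based MSs such as K1 can be sketched as the load at which significant AE activity resumes during reloading, divided by the maximum load of the previous cycle; a ratio of at least one indicates that the Kaiser effect holds, whereas a ratio below one indicates damage. The Python sketch below assumes a monotonically increasing reloading branch and replaces the conventional significant activity criterion with a hypothetical threshold on cumulative hits (`min_hits`); the function names are also hypothetical.

```python
def felicity_ratio(loads, hits_per_sample, previous_max_load, min_hits=5):
    """Felicity ratio of one reloading branch.

    loads: applied load at each sample of the reloading branch (assumed increasing).
    hits_per_sample: number of AE hits recorded at each sample.
    previous_max_load: maximum load reached in the previous cycle.
    min_hits: hypothetical significant-activity threshold on cumulative hits.
    Returns None if significant activity never starts during the branch.
    """
    cumulative = 0
    for load, hits in zip(loads, hits_per_sample):
        cumulative += hits
        if cumulative >= min_hits:
            # Load at the onset of significant activity, normalized by the previous peak.
            return load / previous_max_load
    return None


def kaiser_effect_holds(ratio):
    """Kaiser effect: no significant AE before the previous peak load (ratio >= 1).

    No significant activity at all (ratio is None) is treated, as an
    assumption, as consistent with the Kaiser effect.
    """
    return ratio is None or ratio >= 1.0
```

For example, if significant activity starts at 50 kN during reloading after a previous peak of 60 kN, the sketch returns a ratio of about 0.83, i.e., a Felicity effect.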
Experimental calibration and final corroboration
E2.2 was found to be the best MS over BPs, and, in this section, E2.2 EC (Σ

Average and weighted global scores (
It is interesting to note that optimum Σ
Damage classification and assessment effectiveness is highly sensitive to Σ
As a final corroboration, S was recomputed for all MSs using the complete dataset of six specimens. The results, presented in Figure 16 (specimens S1–S6) in terms of both average and weighted S values, show good consistency with those in Figure 11 (specimens S1–S4). In particular, the inclusion of the two additional specimens (S5 and S6) does not significantly alter the S scores: overall, S tends to remain stable or to increase slightly when moving from four to six specimens (from 16 to 24 tests). Notably, the E1.1, AFRA, and E1.2 estimations, which were already associated with similarly satisfactory S values for specimens S1 to S4 (Figure 11), exhibit an upward trend and, after the E2.2 criterion, emerge as the most effective ones. The K1 criterion remains satisfactory but yields a slightly lower S score than these top-performing MSs.

Global score (
The final assessment, based on 6 specimens and 24 tests, confirms and reinforces the validity of the proposed criteria, supporting their potential application in the development of SHM procedures.
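How precision, recall, and accuracy can be combined into a single score between zero and one is sketched below in Python for the three damage states (ND, LD, SD). The combination rule (an unweighted mean of macro-averaged precision, macro-averaged recall, and accuracy) and the function names are hypothetical and are not the definition of S adopted in this study.

```python
STATES = ("ND", "LD", "SD")


def classification_metrics(observed, predicted, states=STATES):
    """Macro-averaged precision and recall plus overall accuracy."""
    pairs = list(zip(observed, predicted))
    accuracy = sum(o == p for o, p in pairs) / len(pairs)
    precisions, recalls = [], []
    for s in states:
        tp = sum(o == s and p == s for o, p in pairs)
        n_pred = sum(p == s for _, p in pairs)
        n_obs = sum(o == s for o, _ in pairs)
        if n_pred:  # states never predicted are left out of the precision average
            precisions.append(tp / n_pred)
        if n_obs:   # states never observed are left out of the recall average
            recalls.append(tp / n_obs)
    precision = sum(precisions) / len(precisions)
    recall = sum(recalls) / len(recalls)
    return precision, recall, accuracy


def global_score(observed, predicted):
    """Hypothetical global score: mean of precision, recall and accuracy (0 to 1)."""
    return sum(classification_metrics(observed, predicted)) / 3
```

For instance, four tests observed as ND, LD, SD, SD and predicted as ND, SD, SD, SD give an accuracy of 0.75, a macro precision of 5/6, a macro recall of 2/3, and a hypothetical score of 0.75. Excluding classes that are never predicted from the precision average is one of several possible conventions.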
Conclusions
Summary
The article develops and evaluates damage assessment criteria based on AE testing for the identification of low and moderate cracking damage conditions in prestressed RC girders. Six specimens were subjected to four-point bending tests according to cyclic and monotonic protocols. AE data were recorded, and multiple analysis methods and MSs were tested through a blind evaluation of the DSs associated with each test. The effectiveness of the proposed criteria was quantitatively assessed against the experimental results by means of a score (S) ranging from zero to one, leading to their validation and, for the best-performing MS, to a dedicated calibration effort.
Key results and discussion remarks
Key results and discussion remarks are summarized in the following.
Genuine AEs generated by crack onset and propagation could potentially be detected at distances greater than 1–1.5 m, albeit with reduced intensity, under the tested conditions and with the equipment used. Sensors resonant in the 75–150 kHz range are preferable, and acquisition parameters can be optimized following the indications of this study. Deploying four sensors for each expected damage zone is recommended.
Kaiser effect and Felicity ratio analyses confirmed a solid alignment with the mechanical response, especially for the K1 MS, which yielded an
AFRA analysis revealed that an increase in RA and a decrease in AF reflect the evolution of damage, expanding the traditional scope of AFRA, which is typically focused on crack type identification. In particular, AFRA was associated with an S value of about 0.8. Furthermore, the AFRA-based criterion relies on trend analysis rather than on absolute values or specific criteria, which strengthens its potential robustness for applications under significant uncertainties (e.g., in situ monitoring).
AE entropy analysis was found to be potentially well correlated with damage initiation and evolution, showing a clear pattern in both the absolute and cumulative computations, especially for the E2.2 MS. Entropy parameters were reasonably stable, their trends did not depend on the specific test, and their value ranges showed relatively low sensitivity to different testing arrangements and sensors. In particular, E2.2 yielded an
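The AFRA quantities can be computed per hit as RA = rise time / peak amplitude and AF = counts / duration. The Python sketch below computes both and flags the damage-related trend described above (rising RA, falling AF) from moving averages and least-squares slopes. The window length, the use of amplitude in volts and times in microseconds, the slope-sign rule, and the function names are hypothetical simplifications, not the AFRA criterion used in this study.

```python
def ra_af(rise_time_us, amplitude_v, counts, duration_us):
    """RA value (us/V) and average frequency AF (kHz) of a single AE hit."""
    ra = rise_time_us / amplitude_v
    af = counts / duration_us * 1e3  # counts per microsecond (MHz) -> kHz
    return ra, af


def moving_average(values, window):
    """Simple moving average over `window` consecutive hits."""
    return [sum(values[i:i + window]) / window for i in range(len(values) - window + 1)]


def slope(values):
    """Least-squares slope of `values` against their hit index."""
    n = len(values)
    x_mean = (n - 1) / 2
    y_mean = sum(values) / n
    num = sum((i - x_mean) * (y - y_mean) for i, y in enumerate(values))
    den = sum((i - x_mean) ** 2 for i in range(n))
    return num / den


def afra_damage_trend(hits, window=10):
    """True if smoothed RA increases while smoothed AF decreases.

    hits: sequence of (rise_time_us, amplitude_v, counts, duration_us) tuples.
    The slope-sign rule is a hypothetical stand-in for a calibrated criterion.
    """
    ra_values, af_values = zip(*(ra_af(*hit) for hit in hits))
    ra_trend = slope(moving_average(list(ra_values), window))
    af_trend = slope(moving_average(list(af_values), window))
    return ra_trend > 0 and af_trend < 0
```

Working on trends of smoothed values, rather than on absolute RA or AF levels, mirrors the robustness argument made above for uncertain in situ conditions.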
The validation highlighted the role of sensor characteristics, with E2.2 showing higher sensitivity as the sensor resonant frequency decreases. Despite these variations, the entropy-based E2.2 criterion consistently identifies DSs across different setups, which supports its robustness.
The significance of the developed MSs is strengthened by recalling that the focus was on minor damage conditions: (1) the LD condition is generally not visible, since the related cracks have small widths (e.g., lower than 0.1 mm) and limited extension, and (2) the SD condition does not represent actual structural damage but rather the effective initiation of the postcracked response.
Implementation in structural health monitoring (SHM)
The following remarks support the implementation of the investigated MSs for SHM purposes.
The sensor characteristics, among the tested frequency ranges (i.e., from 30 to 150 kHz resonant sensors) and related filtering (Sections “Acoustic emission tests” and “Validation considering additional specimens”), influence the AE sensitivity to damage and noise, which decreases as the resonant frequency increases. However, the comparisons among the different specimens indicate that the assessment based on the most effective MSs is not affected by the variation of the sensors, and 75–150 kHz resonant sensors are recommended as they balance sensitivity to noise with clear detection of genuine events.
The AE acquisition parameters adopted in the study are explicitly reported, and additional implementation guidance can be provided by the corresponding author upon reasonable request.
The criteria were validated using commercial wideband AE sensors and standard acquisition systems, requiring only moderate sampling rates and a minimum of four sensors per damage-prone area, which can be adjusted based on monitoring needs.
Entropy-based and AFRA criteria appear well suited for passive monitoring of RC girders subjected to (a) increasing load or deformation conditions (e.g., as a stop criterion during proof-testing43,44) and potentially (b) bridge service conditions. The K1 criterion could be suitable for proof-testing and load–unload testing protocols, where the applied load is explicitly known and controlled.
The proposed DC are computationally lightweight and based on simple statistical operations, making them suitable for real-time and time-continuous SHM applications. Several MSs (e.g., E2.2 and AFRA) have been successfully tested on commercial AE software supporting real-time evaluation, and their simplicity allows for future implementation on embedded or low-power computing platforms.
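Because the DC rely on simple statistics, they can be evaluated hit by hit with bounded memory. The Python sketch below shows one way to do so: a rolling mean over a fixed number of hits is updated in constant time per hit and mapped to ND, LD, or SD, and the first SD output can serve as a stop signal, as in a proof-testing stop criterion. The generic per-hit scalar indicator, the window length, the thresholds, and the class and function names are hypothetical placeholders for whatever quantity and DC a given MS defines.

```python
from collections import deque


class StreamingClassifier:
    """Rolling-mean damage classifier with O(1) work and O(window) memory per hit.

    `indicator` is a generic per-hit scalar; `window`, `ld_threshold` and
    `sd_threshold` are hypothetical placeholders, not calibrated DC.
    """

    def __init__(self, window=20, ld_threshold=200.0, sd_threshold=500.0):
        self.buffer = deque(maxlen=window)
        self.total = 0.0
        self.ld_threshold = ld_threshold
        self.sd_threshold = sd_threshold

    def update(self, indicator):
        if len(self.buffer) == self.buffer.maxlen:
            self.total -= self.buffer[0]  # oldest value, evicted by append()
        self.buffer.append(indicator)
        self.total += indicator
        mean = self.total / len(self.buffer)
        if mean >= self.sd_threshold:
            return "SD"
        if mean >= self.ld_threshold:
            return "LD"
        return "ND"


def first_index_of(state, stream, classifier):
    """Index of the first hit classified as `state` (e.g., a proof-test stop signal)."""
    for i, value in enumerate(stream):
        if classifier.update(value) == state:
            return i
    return None
```

With a window of 20 hits and thresholds of 200 and 500, a step in the indicator from 100 to 800 first yields LD after three new hits and SD after twelve.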
Limitations and potential developments of the study
The effectiveness of the developed criteria refers to the tests performed but might be extended to other similar or comparable cases, with due consideration. While the findings confirm the potential of the proposed methods for damage classification, the limited number of specimens (six girders under four loading protocols each, resulting in 24 tests) and the associated statistical constraints must be acknowledged, especially regarding potential in situ implementation.
The more challenging detection of LD is also attributable to the uncertainty of its mechanical classification, which is based on subtle, localized damage affecting only the local flexural behavior of the beam without altering the global response. Moreover, the limited occurrence of LD in the experimental tests and the low-intensity AE signatures it generates made LD more difficult to identify. This often led to conservative misclassification as SD, especially when AE thresholds were not specifically calibrated. Therefore, further tests should focus more on damage initiation phenomena, addressing the issue from a microscopic point of view and also considering the material scale.
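The statistical constraint associated with 24 tests can be quantified, for instance, with a Wilson score interval on the fraction of correctly classified tests. The Python sketch below treats each test as an independent binary outcome, which is a simplifying assumption (several tests share the same specimen), and the 20-of-24 example is hypothetical rather than a result of this study.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 gives about 95%)."""
    p_hat = successes / n
    denom = 1 + z ** 2 / n
    centre = (p_hat + z ** 2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z ** 2 / (4 * n ** 2)) / denom
    return centre - half_width, centre + half_width
```

For 20 correct classifications out of 24, the approximate 95% interval spans roughly 0.64 to 0.93, which shows how wide the uncertainty remains with a dataset of this size.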
Finally, future work will focus on refining the processing logic for each MS, exploring more suitable calibration approaches, and integrating additional physical data to enhance the validation of prediction outcomes. A larger and more diverse experimental dataset, including tests in situ, will be considered to strengthen the statistical significance and general applicability for SHM purposes.
Footnotes
Acknowledgements
The contributions of Eng. Giuseppe Pollio and Eng. Dario Chiacchia to operative testing support and data management are gratefully acknowledged.
/) are thanked for the technical and operative support.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: The study was supported by the following projects: (a) ROCK-RESILIENCE: ROCKing-based strategies for RESILIENCE of reinforced concrete structures: conception, structural design, nonstructural components, efficiency and sustainability – CUP: E53D2301704001 (European Union Next-Generation EU – Piano Nazionale di Ripresa e Resilienza (PNRR) – MISSIONE 4 COMPONENTE 2, INVESTIMENTO N. 1.1, BANDO PRIN 2022 PNRR D.D. 1409 del 14-09-2022), (b) Progetto di sviluppo del Dipartimento di Strutture per l’Ingegneria e l’Architettura “Dipartimento di Eccellenza” 2023-2027 (Italian Ministry of University and Research, MUR), (c) Progetto di rilevante interesse nazionale (PRIN) 2020YKY7W4 “ENRICH: ENhancing the Resilience of Italian healthCare and Hospital facilities” (Italian Ministry of University and Research, MUR), (d) Progetto di rilevante interesse nazionale (PRIN) 2020P5572N “FIRMITAS: multi-hazard assessment, control and retroFIt of bridges for enhanced Robustness using sMart IndusTriAlized Solutions” (Italian Ministry of University and Research, MUR), and (e) Progetto “RESIST RobustnEss aSsessment and retrofItting of bridgeS to prevenT progressive collapse under multiple hazards” (University of Naples Federico II and Compagnia di San Paolo through STAR Plus Programme 2020).
Ethical considerations
There are no human participants in this article and informed consent is not required.
Data availability
The data that support the findings of this study are available from the corresponding author (D.D.) upon reasonable request.
