Abstract
Considerable interest lies in the growth in educational achievement that occurs over the course of a child’s schooling. This paper demonstrates a simple but effective approach for the comparison of growth rates, drawing on a method first proposed some 80 years ago and applying it to data from the Australian National Assessment Program. The methodology involves the derivation of a ‘meta-metre’ – a quantitative mode of variation in growth – which permits comparison between groups defined by time-invariant characteristics. Emphasis is placed upon the novel characteristics of the method and the valuable information it can provide. Unlike complex modelling procedures, the approach provides a parsimonious model of growth suited to comparisons between groups.
Introduction
Recent government, public policy and academic recommendations have highlighted the need to measure student achievement over the course of schooling, rather than concentrating on a specific timepoint (e.g. Department of Education and Training, 2018; Goss et al., 2018; Masters, 2020; McGaw et al., 2020).
Within the Australian educational landscape, the focus of school-age reporting of achievement has traditionally centred on performance at a single timepoint as measured by the National Assessment Program – Literacy and Numeracy (NAPLAN) (Department of Education and Training, 2018). While such approaches provide meaningful information on educational achievement relative to the population, they fail to address variation in the level of growth for students with differing levels of academic aptitude. They also mask the extent to which student proficiency increases relative to previous performance, particularly for students entering school with low levels of literacy and numeracy. Interest thus lies in a shift away from static measures to a focus on achievement over the course of a child’s schooling. Such views are highlighted in several recent reports (e.g. Goss et al., 2018; Masters, 2020; McGaw et al., 2020) and emphasise a need to focus on variation in levels of achievement in early years of schooling and the subsequent progress that takes place over time – features, respectively, referred to as a student’s ‘initial status’ and ‘growth rate’.
One area of interest raised by Masters (2020) is variability in growth between students from different backgrounds, with the possibility of increasing equity and inclusivity through targeted interventions. To implement such recommendations, whether through policy or pedagogy, methods appropriate for analysing growth between groups are required. Growth modelling approaches that accommodate varying starting points and trajectories, such as the Rasch Growth Model (RGM), are a logical consideration.
Growth-oriented methodological approaches accommodate varying starting points and trajectories but are often complex in their specification and interpretation (Curran et al., 2010) and require specialist statistical software. An alternative methodology, first proposed some 80 years ago, involves the derivation of ‘salient features of growth’ (Rao, 1958, p.1). This paper demonstrates how the application of this approach to a set of longitudinal student assessment data can provide statistics that lend themselves to the presentation of easily comparable, linearised results. The approach provides valuable information regarding the overall trajectories of groups of students and demonstrates utility in the evaluation of achievement over time. The visual representation of the model results, as well as their summative properties, makes the RGM particularly useful for communicating information regarding student growth.
The RGM proposes a mechanism for evaluating differences in growth across a variety of phenomena (Olsen, 2003). One application for which this approach may be appropriate is the modelling of achievement trajectories for school-age children, an area with which Rasch’s probabilistic models are most commonly associated (e.g. Rasch, 1961). Since children’s literacy and numeracy abilities vary when they enter schooling and as a result of learning through schooling, it is feasible to measure and evaluate differences in these two features. To do so, one typically seeks to manifest a construct of interest through a set of assessment items (i.e. to elicit a behaviour which demonstrates the psychological phenomenon, such as reading ability demonstrated through performance on a reading test) and subsequently models the variation in change over time by applying a set of methodological approaches commonly referred to as growth modelling (Williamson, 2016). In doing so, the trajectories of achievement for individuals and groups may be compared.
The following paragraphs provide a summary of the interdisciplinary features of the RGM, its relationship to educational assessment and measurement, and its embedding within Rasch’s more widely recognised work. This approach is then described and applied, with the statistical and descriptive features of the model demonstrated using examples from NAPLAN.
The Rasch growth model
The RGM was first proposed by Georg Rasch in 1940 and subsequently articulated in a series of lectures presented at the 1951 meeting of the International Statistical Institute (Olsen, 2003). In both instances, Rasch’s focus was on the physiological growth of animals and an attempt to derive a ‘simple elementary growth law… [with] time expressed in the physiologically adequate unit of time’ (Olsen, 2003, p.65). Rasch later applied the same methodology in the analysis of economic data, at which time he emphasised the key statistical features associated with the model (Rasch, 1972). In each application of the RGM, irrespective of the domain to which it was applied, the core feature lay in the capacity to derive the primary features of growth for the purpose of efficient comparisons between groups (Rao, 1958).
The RGM approach attempts to derive an estimated rate of growth for each individual, proportional to the increase over time for the population (Rao, 1958). Underlying this is an assumption that observations represent functional changes alongside a continuous variable, time. Methods of this type provide a parsimonious summary of individual differences (McArdle & Nesselroade, 2003), albeit with known limitations in measurement at the individual level relative to more complex methodologies. The RGM aims to identify the principal sources of variation in growth, conveniently expressed as a set of derived variables. Such an approach is consistent with time-ordered analyses that incorporate and acknowledge the role of both individual and group-level differences (Duncan & Duncan, 1995). This has relevance within the context of evaluating differences in developmental trajectories, a point highlighted by Meredith and Tisak (1990) in their call for wider recognition of such procedures by both biological and behavioural researchers.
The Rasch measurement model
The Rasch Measurement Model (RMM) serves a comparative function in which data derived from a testing instrument (e.g. an assessment) can be compared to expectations set under fundamental principles of measurement (Andrich, 2004). These fundamental principles relate to the specific structure of relations between attributes and those relations being wholly attributable to the measurement act itself (i.e. not derived from other measures). Such properties permit a comparison in the degree of differences between two measures (i.e. additivity), but not in the ratio of them (i.e. multiplicativity). In this way, such relations may be considered similar to what Stevens (1946) described as ‘interval scale measurement’. Such features are common to the physical sciences (e.g. temperature as measured in Fahrenheit or Centigrade) and readily permit analysis using linear statistics.
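The distinction between comparing differences and comparing ratios on an interval scale can be illustrated with the temperature example. The following is a minimal sketch only; the function and variable names are chosen for illustration and do not appear in the source.

```python
def celsius_to_fahrenheit(celsius: float) -> float:
    """Linear change of unit and origin between two interval scales."""
    return celsius * 9.0 / 5.0 + 32.0


temps_c = [10.0, 20.0, 30.0]
temps_f = [celsius_to_fahrenheit(c) for c in temps_c]

# Ratios of raw values depend on the arbitrary origin of each scale,
# so they disagree between Centigrade and Fahrenheit.
ratio_c = temps_c[1] / temps_c[0]
ratio_f = temps_f[1] / temps_f[0]

# Comparisons of differences are preserved by the change of scale.
diff_ratio_c = (temps_c[2] - temps_c[1]) / (temps_c[1] - temps_c[0])
diff_ratio_f = (temps_f[2] - temps_f[1]) / (temps_f[1] - temps_f[0])

print(ratio_c, ratio_f)            # the two ratios differ
print(diff_ratio_c, diff_ratio_f)  # the two difference comparisons agree
```

The same reasoning applies to logit-based ability estimates: statements about differences between measures are meaningful, whereas statements about one measure being a multiple of another are not.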
The RMM imposes a priori restrictions on both the model and parameters used to account for the observed structure of data (Andrich & Marais, 2019). In this way, a data set – the numerical summarisation of the qualitative aspects of a testing instrument – can be said to meet the requirements for measurement when it conforms to the structure specified by the RMM, thus permitting quantitative conclusions (Duncan, 1984). It can be argued that the methodology applies an approach consistent with that espoused by Kuhn (1977), in which the merit of the procedure lies not in appraising the appropriateness of a set of models to fit the data, but instead assessing whether observed data suitably represent features expected under fundamental measurement through evaluation of conformity with a pre-specified model.
In the dichotomous RMM, the probability that a person will respond correctly to an item is dictated by the interaction between the person taking the test and the item used to measure the underlying construct. The person’s estimated level of ability determines the likelihood they will respond correctly to the item, given the item’s level of difficulty. This is achieved by calculating the difference between ability and difficulty under certain algebraic constraints. Such an estimate of ability can be conceptualised as the individual’s location along a trait continuum, varying according to their capacity.
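For reference, the dichotomous model described above is conventionally written as follows. The symbols are assumptions adopted here for illustration (they are not defined in the source): \beta_n denotes the ability of person n, \delta_i the difficulty of item i, and X_{ni} the scored response.

```latex
\Pr\{X_{ni} = 1 \mid \beta_n, \delta_i\}
  = \frac{\exp(\beta_n - \delta_i)}{1 + \exp(\beta_n - \delta_i)}
```

Under this form, a person whose ability equals the item difficulty has a probability of 0.5 of responding correctly, and the log-odds of a correct response equal the difference \beta_n - \delta_i.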
A test conforming to the RMM can be used to ascertain an estimate of ability on a construct that is independent of the specific set of items used to make that assessment. This fundamental feature of the RMM – known as specific objectivity – states that the comparison of two people should be independent of the items used to assess them, and similarly that the comparison of two items should be independent of the individuals used for the comparison (Rasch, 1977). The algebraic separation of person and item parameters underlies this notion, ensuring that each parameter can be eliminated in its counterpart’s estimation. Interestingly, this same feature – consistent with requirements for objective measurement – is present within the RGM (Olsen, 2003).
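The separation of person and item parameters can be checked numerically. The sketch below assumes the standard logistic form of the dichotomous model and uses hypothetical ability and difficulty values; it shows that the log-odds comparison of two persons is the same whichever item is used.

```python
import math


def rasch_probability(ability: float, difficulty: float) -> float:
    """Probability of a correct response under the dichotomous Rasch model."""
    return 1.0 / (1.0 + math.exp(-(ability - difficulty)))


def log_odds(probability: float) -> float:
    """Natural log of the odds of success."""
    return math.log(probability / (1.0 - probability))


def person_comparison(ability_a: float, ability_b: float, difficulty: float) -> float:
    """Difference in log-odds of success for two persons on the same item."""
    return (log_odds(rasch_probability(ability_a, difficulty))
            - log_odds(rasch_probability(ability_b, difficulty)))


# Hypothetical values: two persons compared on items of varying difficulty.
item_difficulties = (-1.0, 0.0, 2.5)
comparisons = [person_comparison(1.2, 0.4, d) for d in item_difficulties]
print(comparisons)  # each entry is approximately the ability difference
```

In each case the item difficulty cancels, leaving only the difference between the two abilities, which is the sense in which the comparison is independent of the items used.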
Measuring growth in educational achievement
The RMM can be implemented to evaluate whether quantities under investigation conform to fundamental properties of measurement when growth trajectories are estimated (Williamson, 2017). By applying the RGM to valid and reliable measures of educational achievement that meet the requirements of the RMM, an attempt can be made to derive a set of summative parameters that characterise the growth trajectories of both individuals and groups. This is undertaken under the assumption that performances observed from psychometrically-sound measures of educational achievement represent functional changes associated with skill acquisition through structured learning (i.e. time spent at school). Such an assertion is consistent with developmental theories that posit an asymptotic decrease in the rate of educational achievement over time (Francis et al., 1996). However, like all approaches that are guided by substantive theory, proposed models require subjection to empirical evaluation through data analysis (Williamson, 2016).
Aim, objective and research question
This article addresses the following question: Can Rasch’s Growth Model be used to measure differences in achievement trajectories by representing growth as a function of time? Meaningful application of the RGM will be demonstrated through the use of an example comparing the initial status and growth rates of students in separate Australian states and territories for the reading domain of NAPLAN (i.e. the domain with the highest latent correlation with the other NAPLAN assessment domains). Emphasis is placed upon the valuable information that can be derived about achievement trajectories using this model, as demonstrated by visualisation and comparison of growth parameters that are likely to be interpretable by a range of audiences.
Methodology
Data sources
NAPLAN provides annual, point-in-time information regarding student achievement across four domains – reading, numeracy, conventions of language and writing – at the level of the student, school, states and territories (referred to collectively as jurisdictions), and Australia as a whole. NAPLAN assessments are completed by students in Grades 3, 5, 7 and 9. The information collected as part of the program facilitates the monitoring and reporting of performance of specific groups and across all six states and two territories. While NAPLAN assesses performance across both primary and secondary education, the focus currently lies on reporting levels of achievement based on observations at a single timepoint, with limited supplementary reporting at two timepoints – between grades three and five and between grades seven and nine (Australian Curriculum, Assessment and Reporting Authority [ACARA], 2013). By adopting a longitudinal approach to data analysis that incorporates individual-level data measured across all timepoints, a more cogent understanding of educational achievement trajectories in each jurisdiction can be developed. Authorisation for additional analysis and research using this data was provided through formal ACARA channels.
Measures
The measures of interest – student achievement in reading, which typically demonstrates the highest latent correlation with other NAPLAN domains and similarly correlates with other standardised reading assessments, such as PISA (see e.g. Lumsden et al., 2015) – took the form of grade-specific weighted-likelihood estimates (Warm, 1989) of reading ability measured in logits, derived using the RMM, as applied across multiple years of NAPLAN assessment (ACARA, 2020). This fulfilled the requirement stipulated by Williamson (2017) that fundamental properties of measurement be attributable to base quantities when growth trajectories are estimated. These estimates were equated onto the NAPLAN reporting scale, which became the unit of analysis. The NAPLAN reporting scale spans all tested grades (i.e. Grades 3, 5, 7 and 9) and is standardised using a mean of 500 and standard deviation of 100, with scores ranging from approximately 0 to 1000 (ACARA, 2020). This process places reported results across assessment years (e.g. 2013, 2015, 2017, 2019) on the same scale, thus permitting comparability.
Data preparation and processing
Prior to model implementation, data appropriate for longitudinal analysis was sourced and prepared. Separate data sets containing matched ‘gain’ data from the 2013, 2015, 2017 and 2019 NAPLAN assessments were provided to the author for the purpose of data linkage and analysis. As current reporting and analysis techniques do not necessitate the linkage of student data across NAPLAN assessments, a considerable degree of variability exists in the consistency and use of common student identifiers. As a result, a small selection of the large number of variables pertaining to each student was used in the matching process. This was undertaken with a view to maximising the likelihood of a correct correspondence in assessment information across timepoints.
Data was first partitioned such that only cases from the appropriate grade and year were retained (i.e. only Grade 3 data was retained from the 2013 dataset, only Grade 5 data was retained from the 2015 dataset, etc.). The following variables were subsequently used in the matching process:
As the RGM requires complete data across all timepoints of interest, cases with missing data were removed. There was considerable variability in the number of students with complete data across all four timepoints in each jurisdiction. To address these imbalances and potential privacy-related issues, a senate-weighted, random sampling procedure was then undertaken in which 1000 students with complete data were selected from each jurisdiction. Due to issues associated with data access and linkage, data from one jurisdiction was excluded, resulting in a total of 7000 students being included in the analysis sample.
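The complete-case filtering and equal-sized sampling steps can be sketched as follows. This is an illustration under stated assumptions rather than the procedure actually used: the field names and toy records are hypothetical, a simple random sample stands in for the senate-weighted procedure, and the matching step is not reproduced.

```python
import random
from collections import defaultdict

# Hypothetical field names for the four NAPLAN timepoints.
TIMEPOINTS = ("grade3", "grade5", "grade7", "grade9")


def complete_cases(records):
    """Keep only students with a score at every timepoint."""
    return [r for r in records
            if all(r.get(t) is not None for t in TIMEPOINTS)]


def sample_per_jurisdiction(records, n, seed=0):
    """Draw a simple random sample of up to n students from each jurisdiction."""
    rng = random.Random(seed)
    by_jurisdiction = defaultdict(list)
    for record in records:
        by_jurisdiction[record["jurisdiction"]].append(record)
    sample = []
    for jurisdiction in sorted(by_jurisdiction):
        group = by_jurisdiction[jurisdiction]
        sample.extend(rng.sample(group, min(n, len(group))))
    return sample


# Toy records (illustrative values only, not NAPLAN data).
records = [
    {"jurisdiction": 1, "grade3": 410, "grade5": 500, "grade7": 545, "grade9": 580},
    {"jurisdiction": 1, "grade3": 390, "grade5": None, "grade7": 530, "grade9": 570},
    {"jurisdiction": 2, "grade3": 430, "grade5": 495, "grade7": 540, "grade9": 575},
    {"jurisdiction": 2, "grade3": 420, "grade5": 505, "grade7": 550, "grade9": 585},
]

analysis_sample = sample_per_jurisdiction(complete_cases(records), n=1)
print(len(analysis_sample))
```

In the study itself, n would correspond to 1000 students per jurisdiction; fixing the seed simply makes the toy sample reproducible.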
A single time-invariant variable was selected for further analysis – jurisdiction. Jurisdictional data was re-categorised using an anonymised, numerical identifier. Comparisons were subsequently made of descriptive statistics for each of the variables relative to the 2019 ‘gain’ data set, with approximate equivalence found between the two datasets. Computation of the level of negative ‘gain’ – in which student progress decreases over the measures of interest – was also undertaken. During this process, distribution checks for concordance with assumptions of analysis of variance (ANOVA) – including normality of distributions – were undertaken.
Analytic methods
The RGM proposes the existence of a quantitative mode of variation in growth, referred to as the ‘meta-metre’, that is common to all individuals, providing relevant information for the comparison of average growth curves (Rao, 1958). In deriving this ‘age transforming function’ as Rasch first described it (Olsen, 2003), we assert that the rate of growth of an individual is directly proportional to the meta-metre, thus allowing comparisons characterised by linear relationships (Rao, 1958). Such relationships can be expressed in terms of a single growth parameter, retaining dynamic consistency, whereby individual and group-level differences can be expressed by the same mathematical function (Keats, 1980). In this way, while there exists a trajectory for the population, each individual retains an estimate that varies relative to the group as a whole (McNeish & Matta, 2018). These variables are defined by the data, not a priori, allowing for the modelling of individual growth trajectories.
In the context of educational achievement, the RGM asserts that
This can be conceptualised as a single structural equation expressing growth rate, with coefficients
The linear relationship between growth and time, conditional on the meta-metre, allowed growth to be represented as straight lines on a plot. These straight lines are easily interpretable as continuous changes, serving useful purposes in the identification of trends (Peebles & Ali, 2015). This approach avoids the issue of non-developmental behaviour (i.e. negative growth) characterised in quadratic representations (Williamson, 2016). As per the recommendations outlined in Nese et al. (2013), sets of growth plots, which permit differences in growth to be detected more readily, were subsequently used to present variation between groups.
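The linearisation described above can be sketched numerically. The following is an illustration under explicit assumptions, not the estimation procedure used in this study: the meta-metre is taken as the timepoint means (as indicated in the Appendix), shifted so that the first timepoint is zero (an assumption made here so that the intercept can be read as initial status), and each student's parameters are obtained by ordinary least squares (also an assumption). All names and scores are hypothetical.

```python
from statistics import mean


def meta_metre(scores):
    """Timepoint means, shifted so that the first timepoint equals zero."""
    timepoint_means = [mean(column) for column in zip(*scores)]
    return [m - timepoint_means[0] for m in timepoint_means]


def fit_line(x, y):
    """Ordinary least squares fit of y on x, returned as (intercept, slope)."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope


# Hypothetical scale scores for three students at Grades 3, 5, 7 and 9.
scores = [
    [400.0, 490.0, 540.0, 575.0],
    [430.0, 500.0, 545.0, 570.0],
    [380.0, 480.0, 540.0, 580.0],
]

tau = meta_metre(scores)
# One (initial status, growth rate) pair per student.
parameters = [fit_line(tau, row) for row in scores]
for initial_status, growth_rate in parameters:
    print(round(initial_status, 1), round(growth_rate, 3))
```

Under these assumptions, growth rates are expressed relative to the sample as a whole: a rate above 1 indicates faster-than-average growth, and the rates average to 1 across the students used to form the meta-metre.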
Differences in achievement may also be represented as a function of time, providing another effective and interpretable way to visualise growth trajectories (Singer & Willett, 2003). A feature consistent with the RGM is the capacity to transform time such that – by a common transformation – individual growth curves are linearised, with slight variations attributable to error (Rao, 1958). This can be achieved by taking the natural logarithm of each timepoint (i.e. the logarithm to the base of the constant e). To paraphrase Rasch, this process – utilising the meta-metre – allows one to measure time in a particular way, allowing a uniform description of growth curves for all individuals considered (Olsen, 2003).
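The effect of a logarithmic transformation of time can be illustrated with a small example. The sketch below uses hypothetical grade means that decelerate over time and assumes timepoints are coded by grade number (the coding is not stated in the source); it compares how well a straight line fits against raw time and against log-transformed time.

```python
import math
from statistics import mean


def r_squared(x, y):
    """Proportion of variance in y explained by a straight-line fit on x."""
    x_bar, y_bar = mean(x), mean(y)
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    return sxy ** 2 / (sxx * syy)


# Hypothetical decelerating grade means (illustrative, not NAPLAN results).
grades = [3.0, 5.0, 7.0, 9.0]
grade_means = [420.0, 500.0, 545.0, 580.0]

log_grades = [math.log(g) for g in grades]  # natural logarithm of each timepoint

r2_linear_time = r_squared(grades, grade_means)
r2_log_time = r_squared(log_grades, grade_means)
print(r2_linear_time, r2_log_time)
```

For data of this shape, the straight-line fit improves once time is log-transformed, which is the sense in which a common transformation of time linearises decelerating growth curves.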
Results
Estimates of initial achievement
Table 1. Grade means for each jurisdiction in NAPLAN Scale Scores.
Differences in jurisdictional growth
Comparison of jurisdictional growth parameters
Table 2. Comparison of growth rate and initial status for jurisdictions.
Table 3. Paired comparisons in jurisdictional growth rate as determined by Tukey’s HSD.
a Mean difference is significant at the 0.05 level.
Table 4. Paired comparisons in initial status between jurisdictions as determined by Tukey’s HSD.
a Mean difference is significant at the 0.05 level.
Visual representation of differences between jurisdictions
It is conceivable to represent differences in means simply by regressing the grade means for each jurisdiction on the overall sample means, as displayed in Figure 1. These correspond to the values obtained from Table 1.
Figure 1. Grade means of each jurisdiction plotted against the overall sample means.
Alternatively, the linear relationship between achievement and time, conditional on the meta-metre, permits a representation of growth for separate groups as straight lines. Figures 2 to 5 show the comparison of grade means for select pairs of jurisdictions, plotted against the transformed means. The line representing growth for each jurisdiction can be characterised by two parameters, identified within the equation on each of the plots, which are consistent with the growth rate and initial status estimates displayed in Table 2. The corresponding x-axes are obtained as the transformation of the sample means from Table 1. By presenting a pair of jurisdictions in each plot, the individual trajectories of growth for each jurisdiction may be compared.
Figure 2. Grade means of jurisdictions 1 and 2 plotted against the transformed means.
Figure 3. Grade means of jurisdictions 1 and 6 plotted against the transformed means.
Figure 4. Grade means of jurisdictions 2 and 3 plotted against the transformed means.
Figure 5. Grade means of jurisdictions 6 and 7 plotted against the transformed means.



Growth trajectories can similarly be represented as a function of time. Attempting to characterise the grade means of jurisdictions from Table 1 as a function of linear time (i.e. across the four timepoints) resulted in poor model fit, evidenced by large residuals and shown in Figure 6. An alternative approach is to fit a quadratic model representing a curvilinear functional form, which reduces the residuals and increases the variance explained by the model because it accommodates the monotonically decreasing rate of growth. This representation is shown in Figure 7.
Figure 6. Grade means of each jurisdiction plotted against time, assuming linear growth.
Figure 7. Grade means of each jurisdiction plotted against time, assuming curvilinear (quadratic) growth.
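The contrast between linear and quadratic representations against time can be sketched as below. The grade means are hypothetical (they are not the values from Table 1), the timepoints are assumed to be coded by grade number, and the small least-squares routines are written out only to keep the example self-contained.

```python
def solve_linear_system(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def polyfit(x, y, degree):
    """Least-squares polynomial coefficients, lowest order first."""
    powers = range(degree + 1)
    a = [[sum(xi ** (i + j) for xi in x) for j in powers] for i in powers]
    b = [sum(yi * xi ** i for xi, yi in zip(x, y)) for i in powers]
    return solve_linear_system(a, b)


def sum_squared_residuals(x, y, coefficients):
    """Sum of squared differences between observed and fitted values."""
    fitted = [sum(c * xi ** k for k, c in enumerate(coefficients)) for xi in x]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


# Hypothetical decelerating grade means (illustrative only).
grades = [3.0, 5.0, 7.0, 9.0]
grade_means = [420.0, 500.0, 545.0, 580.0]

linear = polyfit(grades, grade_means, 1)
quadratic = polyfit(grades, grade_means, 2)

sse_linear = sum_squared_residuals(grades, grade_means, linear)
sse_quadratic = sum_squared_residuals(grades, grade_means, quadratic)

# A concave quadratic has a maximum, beyond which fitted growth is negative.
turning_point = -quadratic[1] / (2.0 * quadratic[2])
print(sse_linear, sse_quadratic, turning_point)
```

For data of this shape the quadratic fit leaves smaller residuals than the linear fit, but the fitted curve reaches a maximum and then declines, which illustrates the non-developmental behaviour noted earlier for quadratic representations.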

Discussion
Characterising growth in two parameters
By applying the RGM to a longitudinal data set containing individual measures of educational achievement in the form of NAPLAN scale scores and using time-invariant variables pertaining to student jurisdiction, it was possible to derive a pair of parameters by which trajectories could be compared: the growth rate and the initial status. These parameters are single-value summaries of the quality of the trajectories, allowing for a relatively straightforward approach to comparison and interpretation. Utilising the independent separation of these parameters, univariate tests of significance and multiple comparisons (i.e. post hoc tests) were applied to determine the degree to which differences were statistically significant. These two parameter estimates were subsequently used to visually represent the variation in growth trajectories between jurisdictions.
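The univariate test of significance applied to a single growth parameter can be sketched as a one-way analysis of variance. The example below uses hypothetical growth rates for three groups and computes only the F statistic and its degrees of freedom; the p-value and the post hoc comparisons (Tukey’s HSD) are not reproduced here.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return (F statistic, between-groups df, within-groups df)."""
    all_values = [value for group in groups for value in group]
    grand_mean = mean(all_values)
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    ss_within = sum((value - mean(g)) ** 2 for g in groups for value in g)
    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    f_statistic = (ss_between / df_between) / (ss_within / df_within)
    return f_statistic, df_between, df_within


# Hypothetical growth rates for students in three jurisdictions.
growth_rates = [
    [1.10, 1.05, 1.12, 1.08],
    [0.95, 0.98, 0.92, 0.96],
    [1.00, 1.02, 0.99, 1.01],
]

f_statistic, df_between, df_within = one_way_anova_f(growth_rates)
print(f_statistic, df_between, df_within)
```

A large F statistic relative to its reference distribution indicates that at least one group mean differs, after which pairwise comparisons identify which groups differ. The same calculation applies to initial status.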
The degree of variability in the initial status and growth of jurisdictions was considerable. By combining the use of statistical methods and visual representation, it was possible to observe the degree to which differences exist simply by comparing the linear trajectories. Results, presented graphically, visually emphasised areas of variability while also demonstrating qualities associated with the RGM. For instance, Figure 2 clearly portrays that while jurisdiction 2 starts with a higher level of mean achievement, as reflected in the Grade 3 mean, the rate at which achievement grows in jurisdiction 1 was considerably greater. Confidence in this assertion was provided via the use of ANOVA. Such findings can be contrasted against those shown in Figure 3, in which the rates of growth of jurisdictions 1 and 6 do not differ (i.e. no statistically significant difference in growth rate was found), as exemplified by the parallel trajectories. Such descriptive measures provide a visually appealing method that can be used to effectively communicate key observations to a range of stakeholders. Furthermore, while descriptive in their presentation, the results point to areas for further investigation using both qualitative and quantitative methods.
It is noted that there exists a wide variety of current growth models that incorporate individual variability and permit the modelling of complex developmental theories (Duncan & Duncan, 1995), including those embedded within an IRT framework (e.g. von Davier et al., 2011). Such approaches often draw on the traditions of hierarchical linear modelling (HLM; Raudenbush & Bryk, 2002) and structural equation modelling (SEM; Meredith & Tisak, 1990). While the RGM serves as an effective, niche method for summarising and describing results – addressing what Goss et al. (2018) describe as a need for approaches that accommodate non-linear rates of growth for the purpose of comparing relative student progress – additional value could be sought through the comparison of group estimates derived under different methodologies. In the case of the RGM, the linearisation of growth rates via the instantiation of the meta-metre, conforming to properties consistent with objective measurement, provides a benefit that may not be realised in alternative approaches.
Noting the use of the meta-metre in deriving comparative estimates, a focus for further research lies in the impact of violations of this time transforming function, and the conditions under which these are likely. While Rao (1958) describes a process in which the existence of a common transformation can be empirically tested, the degree to which such violations may impact parameter interpretation has not been investigated. Due to the novelty of these methods, further inquiry into such conditions would be an appropriate next step in evaluating the application of the RGM. Future applications would also benefit from exploration of the error terms associated with the RGM parameter estimates. Citing Brody (1993), Stone (2020) highlights that the approach used in the model essentially averages over incidentals (i.e. individual variation and error) with the view to overwhelm individual levels of variation and move to a superordinate level of aggregation. While this may provide appropriate summative properties, the implications of doing so – as well as the consequences of the incorporation of measurement error into specific measurement types such as weighted-likelihood estimates (von Davier et al., 2011) – require evaluation.
Limitations of the RGM
While the RGM provides an effective method through which group growth can be described and compared, the model itself is not explanatory. The purpose of the model is largely descriptive; therefore, no attempt is made to provide clarity on the multitude of possible factors contributing to observed differences. While providing an effective method for reporting and lending itself to use as a preliminary step in research activities, the testing of complex theories is reserved for alternative models. As broad sets of such methodological approaches exist, it is recommended that further research compare estimates derived from such models against those of the RGM, while also considering the properties associated with their outcomes.
A further limitation of the RGM is the requirement for complete longitudinal data. While the model retains dynamic consistency (Keats, 1980) as a series of differential equations modelling rate of change as a function of the state of a variable, the incorporation of individual differences in the estimates of group-level summative parameters results in a requirement for non-missing data. One outcome of this is the possibility of correlations between rates of attrition and other variables of interest. While the current application pertains to a census-based assessment with very high participation rates (i.e. greater than 95%; ACARA, 2019), the possibility remains that high rates of withdrawal or attrition could feasibly give rise to systematic biases within sub-groups that may be explored, thus limiting the interpretability of results.
Similarly, while the present analysis was undertaken predominantly for demonstrative purposes, it should be noted that it was applied to a limited data set. While efforts were made to retain consistency with the original data through a comparison of the distribution of variables of interest, the analysis and results may include sampling biases that obscure true results. For example, as no data was available for one jurisdiction, the present derivations are based on a sub-set of the total population. Equally, there were data linkage challenges that resulted in a loss of cases through the matching of students across each NAPLAN cycle – 2013, 2015, 2017 and 2019. To resolve this, the introduction of universal student identifiers for the purposes of data linkage would likely be required, particularly if such approaches were to be implemented at scale. While recommendations put to government have advocated for the implementation of these to serve the needs of longitudinal approaches (e.g. Department of Education and Training, 2018), such decisions may require further ethical examination prior to implementation (Arnold, 2013).
Conclusion
This research project investigated the application of the RGM for evaluating differences between jurisdictions. This came in response to calls for reporting of progress in student achievement throughout the course of schooling. It utilised the summative properties of the RGM and its capacity to represent growth in a manner interpretable to a range of audiences (e.g. educational researchers, statisticians and policy makers), to evaluate variation in NAPLAN reading achievement between jurisdictions.
Drawing on the desire to measure growth over the course of schooling, this process of presenting results visually, supported by significance testing, permits an ease of use for both government and academic audiences.
Despite the limitations associated with the uniqueness of this under-utilised method, the present investigation has provided support for the effectiveness and efficiency of the approach to growth modelling first proposed by Rasch. It has demonstrated that the RGM allows for the efficient comparison of jurisdictions in a manner that is both statistically rigorous and accessible. Such an approach provides both a novel and effective means by which growth may be reported and an avenue for further investigation into group-level differences via qualitative and quantitative research.
Footnotes
Acknowledgements
The idea for the application of the Rasch Growth Model to National Assessment Program – Literacy and Numeracy (NAPLAN) data took root following a presentation to the Australian Curriculum, Assessment and Reporting Authority (ACARA) by Professor David Andrich in July 2020. This presentation discussed the measurement of growth by the Danish mathematician Georg Rasch. Rasch had applied his approach to the growth of physiological variables, such as weight, which have a natural origin. Andrich showed how Rasch’s approach could be adapted to measured variables with an arbitrary origin, such as those derived using the measurement model carrying his name. Correspondence between the author and Andrich continued throughout the period of the development of the manuscript, with particular assistance provided in the derivation of the ‘meta-metre’. I extend my thanks and acknowledgement to David for his support and encouragement.
Declaration of conflicting interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author(s) received no financial support for the research, authorship, and/or publication of this article.
Notes
Appendix
It is possible to estimate a meta-metre for the comparison of rates of growth (Andrich et al., 2022) by taking the mean measurements at each timepoint t
Importantly, this can be expressed as
It is noteworthy that it is feasible to estimate
