A primer on continuous‐time modeling in educational research: an exemplary application of a continuous‐time latent curve model with structured residuals (CT‐LCM‐SR) to PISA Data

Introduction Longitudinal studies and methods are essential to many research questions in education sciences. Questions of learning, growth, and development are studied by examining change over time (Millsap, 2008; Singer, 2019). Moreover, if researchers are interested in causal relationships between several constructs, the temporal order, as realized in longitudinal studies, is one important prerequisite (among others) for drawing robust conclusions (Shadish et al., 2002). One challenge associated with longitudinal studies is the choice of the appropriate statistical model for the data analysis. The statistical models should correspond with the research questions at hand and the theoretical assumptions about the constructs and the processes underlying them (Baltes & Nesselroade, 1979; Collins, 2006; Little, 2013; McArdle, 2009). A close fit between theories of change and the models used to analyze the data is a prerequisite for valid inferences. Abstract

One class of models that is frequently applied in educational research is dynamic models, which include, for example, (vector) autoregressive models (e.g., Hsiao, 2014;Lütkepohl, 2005), cross-lagged panel models (e.g., Hamaker et al., 2015;Usami et al., 2019;, or latent change score models (McArdle & Hamagami, 2001). These models usually treat time as discrete, which is why they are often called discrete-time dynamic models (DTMs; e.g., Hsiao, 2014;Voelkle et al., 2018). DTMs, however, are limited by the widely recognized timeinterval dependency (e.g., Gollob & Reichardt, 1987;Ryan & Hamaker, 2021). This means that dynamic parameters of DTMs depend on a respective time-interval. Thus, they cannot represent the continuous nature of developmental processes and do not provide information regarding other time intervals, which is a considerable limitation in several other regards (e.g., Voelkle et al., 2012). For this reason, continuous-time dynamic models (CTMs) have been put forward in recent years (van Montfort et al., 2018).
In the present study, we adopt these ideas and outline potential advantages of CTMs over DTMs for longitudinal data analysis in educational research. In addition, we focus on how trends can be integrated into dynamic models. Besides the analyses of process dynamics, trends are additional characteristics of longitudinal data and are often of core interest in educational research. Several models have been proposed to capture trends and dynamics simultaneously and are discussed in ongoing debates (e.g., Asparouhov et al., 2018;Hamaker, 2005;Núñez-Regueiro et al., 2021;Usami et al., 2019). However, to date, most of them employ discrete-time modeling to represent the dynamic part (but see, e.g., Delsing & Oud, 2008). Dynamic models that provide information on trends and dynamics on a continuous-time scale, would overcome this limitation. In the present article, we propose a combined model that provides information on both the continuous-time dynamics and trends that we call the continuous-time latent curve model with structured residuals (CT-LCM-SR). The model is a continuous time version of the existing latent curve model with structured residuals (LCM-SR; Curran et al., 2014;Hamaker, 2005). We illustrate the application of the CT-LCM-SR with data from the PISA reading literacy assessment using the R package ctsem (Driver & Voelkle, 2018a, 2021. In doing so, two research questions are examined. (1) Is there a trend in the PISA countries' mean reading scores over the period from 2000 to 2018, and do the countries differ in the slope and direction of the trend? (2) How persistent are deviations from the trend? The latter research question tackles the stability of educational systems when deviations from the trend (so-called shocks or impulses) occur. Influences leading to temporary gains (positive deviations from the trend) could be, for example, temporary investments or short-term interventions initiated by a country's politics. Influences which might result in negative deviations can be political unrest, a shortage of teachers in a given cohort, or a global financial crisis or pandemic. Thus, the stability of a system could be thought of as the system's resilience against disturbances. Clearly, this is oftentimes a double-edged sword, as a high persistence of positive deviations is often desired, which would be the case in an instable or non-resilient system. Contrarily, negative disturbances are often desired to dissipate quickly, which would be the case if the system is stable or resilient.
In our illustration, we show how the CT-LCM-SR can be used to answer both research questions. Thereby, we compare the CT-LCM-SR to related models, exemplarily

How variables develop over time and how we measure this development
When longitudinal studies are conducted in educational research, the data are usually repeated measurements of the same constructs and target units (e.g., subjects). For instance, questionnaires or achievement tests are repeated multiple times, each after a certain period of time. Based on such snap-shots in time, we gain insights about the state of our target units (Fig. 1A). Between these measurement occasions, we have no data and thus no information about the changes of the variables under study. The stability, inertia, development, change, or longitudinal association of multiple variables in multivariate analysis must be inferred from these repeated snapshot-like data. In many theories of change, however, the observation units and variables of interest are not assumed to exist only at the measurement occasions or to change in discrete steps (Fig. 1B). Rather,  (C). Panel A Simulated data for a single observation unit examined at ten measurement occasions. Panel B DT autoregressive effects are usually interpreted by reference to a constant process mean. The process mean is represented by the horizontal dotted line. Panel C In theories of change, it is usually assumed that there is continuous existence of observation units and continuous change of variables, which is represented by the solid line connecting successive measurements. Panel D A linear trajectory smoothed through the data, as is done in (linear) growth-curve models. The model-implied trajectory interpolates any time point and can be used to predict values for any time-interval using time as a continuous predictor Page 4 of 32 Lohmann et al. Large-scale Assessments in Education (2022) 10:5 we assume that observation units and variables under investigation exist and develop continuously over time (Fig. 1C). This is applicable to variables at the level of individuals, such as the development of students' competencies, cognitive abilities, self-concepts, attitudes, and motivations. Moreover, this continuous-time perspective is transferable to social systems and variables at the level of school classes, variables describing parent-child-teacher interactions, or the performance of entire education systems as studied, for instance, in the international large-scale assessments (LSAs) such as PISA and TIMSS. Although there are typically several years between each measurement occasion in LSAs, we do not assume that the characteristics of education systems cease to exist in the interim. Thus, from a theoretical point of view, it is desirable to adopt the idea of continuously developing variables also in the statistical models used. As shall become apparent in the following sections, this not only allows us to align models closer with theories of change, but also provides further advantages, such as greater flexibility for the procedures of data collection, improvements for cross-study comparisons (e.g., Voelkle et al., 2012), or the exploration of the unfolding of dynamic effects (e.g., Hecht & Zitzmann, 2021a).

Discrete-time dynamic models (DTMs)
Dynamic models represent the state of variables at time t i as a function of previous states (e.g., at time t i−1 ) of the same variables (Deboeck & Boker, 2015;Hsiao, 2014;Voelkle et al., 2018). Typical questions that can be addressed with dynamic models are questions related to the stability or inertia of constructs over time, to find out, for example, how fast the states of variables change (or can be changed). Dynamic models can also be used to address questions about the variability of states of a construct or questions about the predictability of future states (based on the current state) (Deboeck & Boker, 2015;Loossens et al., 2021). Well-known dynamic models in educational research are multivariate models, such as the cross-lagged panel models (CLPM). In terms of Granger causality, such models usually aim to provide insights into the strength and predominant direction of the relationship between the variables of interest (Granger, 1969;Hamaker et al., 2015). In educational research, many dynamic processes are modeled by first-order models such as the first-order autoregressive model (AR(1)) or its multivariate variant, the vector autoregressive model (VAR(1)) 1 (Sivo & Fan, 2008). In VAR(1) models, the present state of variables is predicted by the previous state. A discrete-time VAR(1) model can be represented by the following equation (cf. Oud & Delsing, 2010;Voelkle et al., 2012): Here, x is a column vector of length V (number of variables), observed at time t i (with i = 1, . . . , T , where T is the number of measurement occasions) and t i the time interval between two measurement occasions. A is a V × V matrix relating the observed variables over time (autoregressions on the main diagonal, cross-lagged effects on the (1) Lohmann et al. Large-scale Assessments in Education (2022) 10:5 off-diagonals), b is an intercept vector of length V , and w a vector of prediction errors or white noise (e.g., Deboeck & Preacher, 2016;Oud & Delsing, 2010). Covariance matrix Q t i contains prediction error variances and covariances. From this representation of a VAR(1) model, it is apparent that the coefficients of interest, in particular the autoregressive and cross-lagged coefficients of the matrix A , depend on the time-interval t i ( t i = t i − t i−1 ) between successive measurement occasions. For example, if this interval length is three months, the estimated coefficients are conditional on exactly this time-interval and are not valid for any other interval. Thus, the process described by the parameters corresponds to the development of a variable or an observation unit that exists only at the time points of measurement, illustrated in Fig. 1A. However, as discussed earlier, in most cases this representation is not consistent with theory that assumes that variables and observation units exist and change continuously over time (Fig. 1C). This timeinterval dependency is also referred to in literature as the interval problem of DTMs (e.g., Ryan & Hamaker, 2021). This dependency results in several drawbacks, which have been described in detail elsewhere (e.g., Hecht et al., 2019;Oud & Delsing, 2010;Voelkle et al., 2012) and will only be briefly mentioned here: (1) The information about the dynamics of the variables under study that DTMs reveal only relates to the particular time-interval used (e.g., Deboeck & Preacher, 2016;. (2) There is no information about how the dynamic effects change with increasing time-intervals (e.g., Hecht & Zitzmann, 2021a). Moreover, (3) (approximately) equal spacing between measurement occasions must be realized across all units in order to be able to properly apply DTMs 2 (see . Establishing equal spacing is a major challenge, especially when using flexible survey designs (de Haan-Rietdijk et al., 2017) whose importance has been increasingly emphasized for educational research in recent years (Zirkel et al., 2015). The time-interval dependency of DTMs also has disadvantages since (4) results from studies examining different time-intervals between measurement occasions are not directly comparable with each other, making it difficult to conduct meta-analyses on these results (Dormann et al., 2020). Continuous-time dynamic models (CTMs) resolve these problems.

Continuous-time dynamic models (CTMs)
The assumption of variables that exist and unfold continuously is consistent with CTMs (e.g., Oud & Delsing, 2010). By using CTMs, we obtain not only the same information provided by DTMs if all underlying assumptions are met, but also parameters that are independent of the length of the time-intervals of a given study. These continuous-time parameters allow for gaining insights into the underlying continuous-time process and for deriving discrete-time parameters for various time-intervals (Hecht et al., 2019;Voelkle et al., 2012). In addition, CTMs overcome the practical drawbacks and limitations of DTMs outlined above (e.g., Oud & Delsing, 2010;Voelkle & Oud, 2013). However, the implementation of CTMs is mathematically more challenging than that of DTMs because CTMs require differential calculus (Deboeck & Boulton, 2016;Voelkle et al., 2018). Fortunately, the application of CTMs has been greatly facilitated in recent years by software Page 6 of 32 Lohmann et al. Large-scale Assessments in Education (2022) 10:5 packages like the R package ctsem , 2018a, 2021. As a result, the application of a wide range of CTMs is now possible in a relatively straightforward manner. The basic stochastic differential equation (SDE) used in ctsem can be represented as follows (Oud & Delsing, 2010;Voelkle et al., 2012; for an overview of different notation styles of the same equation see, e.g., Hecht & Zitzmann, 2021a): dt provides information about the rate of change of the variables in the column vector x of length V at the time t (Deboeck & Boulton, 2016;Ryan et al., 2018). The term can be thought of as the change dx in the variable x over the time-interval dt , which, in case of the derivative, is infinitesimally small ( t i → 0). The expression dx(t) dt is also referred to as velocity . Given the information regarding the rate of change at time t , we know how the process under study is changing at that time . The rates of change of the variables x , shown on the left side of Eq. 2, is explained by a deterministic and a stochastic part on the right side. The deterministic part is composed of the so-called drift matrix A , which is a V × V matrix and includes the continuoustime auto-effects (equivalent to autoregressive effects in DTMs but independent from the length of time-intervals) on the main diagonal and the continuous-time cross-effects on the off-diagonals, the state of the variables x at time t , and a V × 1 continuous-time intercept vector b . G dW(t) dt represents the stochastic part of the equation and can be described as a continuous-time error process (Oud & Delsing, 2010;Ryan et al., 2018). W(t) is the socalled Wiener process, which is a random walk in continuous-time and the Cholesky factor G indicates the effect of W on the change in x(t) (for details see Driver & Voelkle, 2018a;Oud & Jansen, 2000). Because the variance of the Wiener process depends on the length of the time-interval over which it is integrated, it can represent the accumulation of errors over longer time periods, resulting in larger error variances (Deboeck & Preacher, 2016;Oravecz et al., 2011). The associated V × V variance-covariance matrix Q = GG ′ is often called diffusion matrix and contains the process error variances on the main diagonal and the process error covariances on the off-diagonals.
Based on the continuous-time (CT) parameters of Eq. 2 it is possible to calculate the DT parameters for any given time-interval (Hecht et al., 2019;Oud & Delsing, 2010). To better distinguish between CT parameters and its corresponding DT parameters, in the following, DT parameters that are constrained to underlying CT parameters are denoted with an asterisk (*) (cf. Hecht et al., 2019). For an overview of corresponding DT and CT parameter labels, see also Hecht and Voelkle (2021). The CT drift matrix A is related to the DT autoregressive matrix A * t i via the following equation: The CT intercept vector b is related to the DT intercept vector b * t i via the following equation: (3) Page 7 of 32 Lohmann et al. Large-scale Assessments in Education (2022) 10:5 Furthermore, the DT error covariance matrix Q * t i for interval ∆t can be calculated from the CT diffusion matrix Q (Delsing & Oud, 2008): where I is an identity matrix, ⊗ the Kronecker product, which results in a large matrix containing all possible products of the elements of the two initial matrices, row and irow are operators, the former puts the elements row-wise in a column vector and the latter puts them from a column vector into a matrix. To solve the system of Eqs.2, 3, 4, 5, the matrix exponential has to be used (e.g., Oud et al., 2018;Ryan et al., 2018). Details and examples of the link between the parameters and their interpretation will be described in the methodical and results section of the article. Before doing so, however, we address another challenge that often arises when analyzing longitudinal data in education sciences: systematic trends.

Trends
Analyzing longitudinal data in educational research is often also about identifying trends. For example, when studying students' achievement repeatedly over several months and years, their achievement typically continues to increase. Such systematic trends can also occur when examining other variables, such as motivational (e.g., Wigfield et al., 2006Wigfield et al., , 2015 or personality-related characteristics (e.g., Specht, 2017). Furthermore, besides the theoretical interest in trends, it is also often important for dynamic models to account for such trends. Unaccounted trends can affect the autocovariance structure and thus lead to biased estimates of the parameters (e.g., Asparouhov et al., 2018;Núñez-Regueiro et al., 2021;Walls & Schäfer, 2006). Therefore, there are several approaches to deal with trends in dynamic models, such as pre-detrending the data (e.g., Box et al., 2015;Walls & Schäfer, 2006). Another possibility is to model trends and dynamics simultaneously. Growth-curve models (GCM) are a well-known approach to model trends ( Fig. 1D) (e.g., Bollen & Curran, 2006). There are approaches that combine dynamic DTMs with GCMs, such as the autoregressive latent trajectory (ALT) models (e.g., Bianconcini & Bollen, 2018;Bollen & Curran, 2006;Curran et al., 2014;Hamaker, 2005). In addition to all the other disadvantages of DTMs already described, approaches combining elements from DTMs and GCMs come with a model-inherent unequal timeinterval dependency of parameters: While GCM-parameters are usually treated and interpreted as independent from time-intervals (Fig. 1D), the DTM parameters depend on the time-interval used in the study (Delsing & Oud, 2008). Thus, in combined models such as ALT models, the trend parameters describe a continuous process (Fig. 1D) and the dynamic parameters describe a discrete-time process (Fig. 1B). To remove this inconsistency from combined models, a promising solution is to use CTMs to make the dynamic parameters independent of time-intervals as well. One of the very first explicit formulations of a combined model in continuous-time is the continuous-time autoregressive latent trajectory (CALT) model proposed by Delsing and Oud (2008; see also Oud, 2010).
However, ALT and CALT models are associated with other considerable limitations. Trends and dynamics are not clearly separated in ALT and CALT models and can influence and "compete" with each other. An associated problem is that dynamic models are recursive, which means that dynamic processes depend on their previous values. This is not the case for GCMs. These problems lead to drawbacks, such as the need for special solutions to "start up" ALT and CALT models and uncertainty of how to interpret GCM parameters in ALT and CALT models (e.g., Bianconcini & Bollen, 2018;Hamaker, 2005;Jongerling & Hamaker, 2011;Little, 2013;Oud, 2010;Usami et al., 2019).
To clearly isolate trends from the dynamic part of the model, the discrete-time (DT) latent-curve model with structured residuals (LCM-SR) has been suggested as one suitable alternative (Curran et al., 2014;Hamaker, 2005). In the LCM-SR, the development of a variable under study is described by two different process components, trends and dynamics, which are-contrarily to the ALT or CALT models-clearly separated (Fig. 2). Because the trend component (which is similar to what is modeled in the GCM) accounts for the trends in the data, the dynamic component represents a trend-adjusted (detrended) and centered process in the LCM-SR (Usami et al., 2019). For this reason, systematic (linear) trends are disentangled from the autoregressive and cross-lagged parameters. An additional advantage of this separation is that the parameters of both processes can be interpreted as is typically done in CTMs and GCMs. Moreover, trend parameters and dynamic parameters do not compete in a joint process in the LCM-SR as in the ALT and CALT models (see Curran et al., 2014;Hamaker, 2005;Jongerling & Hamaker, 2011). 3 To our knowledge, an explicit formulation of a continuous-time version of the LCM-SR has not been proposed in the literature, although a continuous-time latent curve model with structured residuals (CT-LCM-SR; Fig. 2) comes with several advantages: (1) Using the CT-LCM-SR, information is obtained describing trends in the data as well as information describing the process' dynamics on a continuous-time scale. (2) Trends and dynamic processes are clearly separated in CT-LCM-SR, meaning that trends and dynamics can no longer influence each other. (3) The interpretation of the parameters follows the interpretation of CTMs and GCMs. (4) All parameters of the CT-LCM-SR describe the process under study independently of the length of time-intervals in a given study. This is not the case for the DT-LCM-SR in which the dynamic parameters always depend on the length of the time-interval (cf. Delsing & Oud, 2008;Oud, 2010). Finally, (5) both the standard linear GCM and the CT-AR(1) model (which is a CTM without trend components) are nested within the CT-LCM-SR which enables chi-square difference testing to compare models. In addition, with the CT version of the LCM-SR, we can take advantage of all the other benefits associated with using CTMs instead of DTMs.

The CT-LCM-SR
The basic idea of the CT-LCM-SR is to decompose a process into the two process components, trends and dynamics. This is technically done by introducing two continuous-time processes with special parameter values and constraints, so that one continuous-time process purely captures the trend and the other one the dynamics (see technical details below, especially Eq. 8 and 9). In the following, we therefore also refer Lohmann et al. Large-scale Assessments in Education (2022) 10:5 to the trend component as the "trend (or growth) process" and to the dynamic component as the dynamic process with the need to keep in mind that these are just technical terms for the two components of one single process variable. The univariate CT-LCM-SR is thus composed of two process components, one to describe the linear trend and one to model the dynamics of one process variable. When including more variables, each variable would need two processes to represent its two process components, for instance, a bivariate CT-LCM-SR would then include four processes, a trivariate CT-LCM-SR six, and so forth. The ordering of the two processes is arbitrary. We use the first process as the linear growth process that accounts for initial values and trends (Fig. 3A). The second process is then the dynamic process that provides there is the trend process (standard linear growth-curve process with random effects for intercept and slope) accounting for differences in the initial values and trends. The continuous-time dynamic process, which accounts for the dynamic structure is represented on the top of the squares. Estimated parameters: int mean intercept, int SD variance of the random effects of the intercept, b mean growth rate, b SD SD of the random effects of the growth rate, Cor int,b intercept-growth correlation, T 0dyn SD initial SD of the dynamic process; A drift matrix, Q diffusion matrix Page 10 of 32 Lohmann et al. Large-scale Assessments in Education (2022) 10:5 information about the process' dynamics on a continuous-time scale (Fig. 3B). 4 Since the growth process accounts for nonzero initial values (intercept parameter) and for linear trends (growth parameter), the dynamic process can be considered as centered and linearly detrended (cf. Usami et al., 2019). By combining the two process components in one model, the values can be thought of as fluctuating around the linear trend ( Fig. 3C). Just as in growth-curve modeling, CT-LCM-SR can account for differences between observation units in intercept and growth parameters via random effects (e.g., Bollen & Curran, 2006;Little, 2013). Thus, the dynamic process is within-unit centered. If potential differences in trends and initial values between observation units are not taken into account, within and between level effects can be confounded (see also the discussion of the use of random effects in dynamic models by, e.g., Hamaker et al., 2015;Lüdtke & Robitzsch, 2021;Núñez-Regueiro et al., 2021;Usami et al., 2019).
Using the SDE (Eq. 2), growth-curve models can be specified as special cases (Driver, 2020). To illustrate the application of a CT-LCM-SR with the SDE, we first consider a single growth-curve process.
To obtain a GCM with the SDE, the CT auto-effect is set to a negative value very close to zero (e.g., − 0.0001). This means, that the corresponding DT autoregressive effect ( a * t i = e a t ) approximates 1. The freely estimated initial value x 0 . The dynamic part gives information on how much of the deviation from the trend on time T is expected to be carried over to future states with growing time intervals. A mean-reverting process is shown, which means that the CT auto effect is negative (corresponding to DT autoregressive effects between 0 and 1 for every time interval) Lohmann et al. Large-scale Assessments in Education (2022) 10:5 corresponds to the growth-curve intercept parameter, while the continuous-time intercept ( b * t i = a −1 e a t − 1 b ) serves as the linear growth parameter. To illustrate the functionality, we consider the deterministic part of Eq. 2 and insert the DT parameter constraints of Eqs. 3 and 4 (Oud & Delsing, 2010): The right part of the equation is composed of the two constraints relating the CT auto-effect to the DT autoregressive effect and the continuous-time intercept to the DT intercept. E[x 0 ] is the expected value of the freely estimated initial mean. To obtain the expected values for other measurement occasions, the value for t is changed. To illustrate this, let us assume t = t 1 − t 0 = 1 for the second measurement occasion t 1 : Compared to the initial measurement occasion, the expected value for the second measurement occasion increases by the additive component 1 × b . To obtain the expected value for the third measurement occasion t 2 , we set t = 2: The expected value for t 2 increases by the additive component 2 × b compared to the initial measurement occasion. This can be continued for any measurement occasion. Because the CT auto-effect is fixed to a negative value close to zero, which corresponds to a DT autoregressive effect approximating 1, the continuous-time intercept part linearly increases with longer time-intervals.
To obtain a univariate CT-LCM-SR (Fig. 2), we extend Eq. 6 by a second (dynamic) process with a freely estimated auto-effect a and a process mean of zero ( b = 0): The lambda matrix = [ 1 2 ] in Eq. 9 relates the observed values of variable x at time t to the two processes x lint and x dyn t , which represent the growth component and the dynamic component, respectively. The states of the two processes x lin and x dyn at time t are predicted by the state of the two processes at a previous time point t − t (Eq. 8). The autoregressive matrix is the first element on the right side of the equation. On the main diagonal, the autoregressive matrix has the restricted parameter e −.0001 t for the linear trend component and the freely estimated autoregression e a t for the dynamic Page 12 of 32 Lohmann et al. Large-scale Assessments in Education (2022) 10:5 component. The off-diagonals are set to zero since the two process components are not assumed to directly affect each other. The intercept vector (the second element on the right-hand side of Eq. 8) contains the known restrictions for the growth process component at the top, and is set to zero at the bottom, because the dynamic process component is centered and detrended (fluctuating around the process-mean of zero). The model can be represented in the SDE (Eq. 2) notation as follows: Here, it becomes obvious that the trend component is deterministic because the random error term is zero. Therefore, the residuals are completely absorbed into the dynamic part and are accounted for by CT auto-effect a and CT random error variance Q . In addition, the first measurement occasion (T0) is estimated as where int MEAN represent the mean intercept of the trend component, int SD the standard deviation (SD) of the random effects of the intercept, and T 0dyn SD the residual SD of the first measurement occasion entering the dynamic process (the labels of the estimated parameters in Eq. 10 and 11 correspond to the labels used in the illustrated example later on and Fig. 2). In the random effects version of the CT-LCM-SR, there is an additional variance component for the slope of the trend component and a correlation of the random effects ( Cor int,b ).
To turn this CT-LCM-SR into a model without trend component (i.e., a standard CT-AR(1) model), we can set the freely estimated parameter b in Eq. 10 to zero.

An illustrative example of the CT-LCM-SR model using PISA data
In the following, we provide an illustrative application of the CT-LCM-SR to data from the PISA reading literacy assessment, including a step-by-step tutorial on how to set up the CT-LCM-SR in ctsem. To make it as understandable as possible, a simple example was chosen, which can, however, easily be extended to more complex models and research questions (as outlined in the outlook of this article).

Data
The data stem from the seven currently available PISA waves from 2000 to 2018. For each single wave, country-specific mean scores were calculated across all participating students in each country (see OECD, 2009). For this step of data preparation, the R-package intsvy (Caro & Biecek, 2017) was used and the resulting values were additionally checked with the SPSS macros provided on the OECD website. This was primarily done to ensure that the weighting procedures were accurate (IBM SPSS, 2020; OECD, 2021). Furthermore, because data from the achievement tests were processed with the Rasch model, the use of plausible values (PVs) is recommended (OECD, 2009; Page 13 of 32 Lohmann et al. Large-scale Assessments in Education (2022) 10:5 Wu, 2005). Therefore, all analyses were first performed with each PV separately and the results were pooled afterwards (OECD, 2009;Rubin, 2004). In PISA, the achievement data are represented by five PVs. 5 Thus, five data sets were created, one for each PV (see OECD, 2009). The data sets had a hierarchical data structure, with repeated measurements nested within countries. For analyses with the ctsem package , the data sets should preferable be in the long format (see also the ctsem tutorial by Hecht et al., 2019). Therefore, the data were structured, with all observations of a variable being contained in the same column (in contrast to the wide format, where there is a separate column for each measurement occasion). The corresponding time of the measurement was put in a separate column. Information about which observations were related to which country was contained in another column (see Fig. 4).

Sample
The final sample examined in this study consisted of N = 56 PISA countries. It is the same sample selected by the OECD (2019a) for the presentation of countries developmental

Measures
PISA reading literacy The primary variable of interest are the countries' mean estimates in PISA reading literacy assessment. Because the reading scales were linked across all available PISA waves from 2000 to 2018, they are basically comparable and thus suitable to be analyzed longitudinally (OECD, 2019b).
Coding time In the data sets, the time point of measurement was represented as the years starting from 2000 as the first measurement occasion coded as t 0 = 0 and continuing in three-year intervals (2003 was coded as t 1 = 3, 2006 as t 2 = 6, and so on; see Fig. 4).

Data analysis procedure
To tackle the guiding research questions (1) whether there was a linear trend in the PISA countries' mean reading scores, whether countries differed in the slope and direction of the trend, and (2) how persistent the effect of a potential deviation from the trend (a shock) would be, we first computed a series of models. For this purpose, we used the ctsem package (Version 3.4.3; Driver & Voelkle, 2018a, 2021 in R (R Core Team, 2021). All models were compared with respect to their fit. Because all models were nested in the CT-LCM-SR, multiple chi-square difference tests were performed.
To address the question of whether there is a trend in PISA countries' mean reading literacy scores, we compared the CT-LCM-SR to a CT-AR(1) model, which is a CTM without trend component. To answer the question of whether there is also a dynamic process in the PISA data, we compared the CT-LCM-SR to a standard linear GCM (specified within the continuous-time framework), which is a linear trend model without a dynamic process. The dynamic process describes the effects of short-term influences (shocks) on countries mean reading literacy development. In addition, to answer whether there are significant differences in starting values and trends across PISA-countries, we compared fixed and random effects variants of all models (i.e., variances of the intercept and linear slope were freely estimated vs. fixed to zero). We present an exemplary interpretation of the estimated parameters of the most relevant model, the CT-LCM-SR, focusing on the CT dynamic parameters. We also present and visualize DT parameters for different time-intervals derived from CT parameters and study the impact of temporary deviations from the trends on the development of countries' mean PISA reading scores to answer the second research question. Moreover, we visualize trends and dynamics of certain PISA countries as examples and show how predictions for future states can be made for various time-intervals.

Models
In addition to the CT-LCM-SR, we also run a linear GCM and a standard CT-AR(1) model. We included fixed and random effects variants of all models, considering random effects for intercept and growth parameters but not for dynamic parameters (model specifications of all models can be found in the Additional file 1). These additional Page 15 of 32 Lohmann et al. Large-scale Assessments in Education (2022) 10:5 models are nested in the CT-LCM-SR model. To obtain a ctsem version of a standard GCM from the CT-LCM-SR, we set the dynamic process component to zero, assuming a homoscedastic measurement error variance. 6 To obtain a standard CT-AR(1) model from the CT-LCM-SR, we fixed the growth parameter from the trend component to zero. To obtain the random effects variants of the models, we set the respective model parameters to be variant across countries (see the tutorial in the last part of this section).
Pooling of estimates and variance components across the five data sets was carried out using the mice package (van Buuren et al., 2015).

Fit Indices and Chi-Square tests
To be able to evaluate the relative model fits, we used Akaike Information Criterion (AIC; Akaike, 1973) and Bayesian Information Criterion (BIC; Schwartz, 1978) as well as the Deviance (twice the negative logarithm of the likelihood). For all three indices, lower values indicate better model fits. Furthermore, we run chi-square difference tests for nested models (e.g., Bollen, & Long, 1993). To pool the chisquare test statistics, the miceadds package (Robitzsch et al., 2017) was used.
Missing data Countries with less than five PISA participations between 2000 and 2018 were excluded from the analysis to ensure a sufficient data base for modeling trajectories. This procedure is in line with the analysis and presentation by the OECD (2019a). The remaining proportion of missing values in countries reading literacy scores was 9.4% and thus small. Dealing with missing values is basically model-inherent in CTMs because the length of the time-interval between two consecutive measurement occasions is taken into account (for a detailed discussion of missing data treatment in CTMs, see Oud & Voelkle, 2014). When missing values are left in the data set, they can be addressed using, for example, maximum likelihood approaches (under the MAR assumption; Enders, 2010), as it was done in the present application.

ctsem syntax for estimating the CT-LCM-SR
In the following, we give a step-by-step tutorial of how to specify and run a CT-LCM-SR with the data from PISA using the ctsem package 7  in R (R Core Team, 2021).
For continuous-time modeling with ctsem, two functions are crucial. The ctModel function is needed to set-up the required model, and the ctStanFit function is needed to fit the model to the data. Thus, in order to perform a CT-LCM-SR with the prepared PISA data, the first step was to specify the model. For setting up CTMs properly, the arguments of the ctModel function have to be understood. The following arguments are of key importance within the ctModel function Hecht et al., 2019): 6 The MANIFESTVAR matrix in ctsem was used to define the error term parameters of the GCM (see also Driver, 2020, and the specification in the Supplemental Material). Another more complicated alternative to estimate the standard GCM's homoscedastic error term (Fig. 3D) would be to constrain the variance of T 0dyn SD to q * t (Fig. 3D). While the latter approach better captures the nested structure of GCM and CT-LCM-SR, it requires changes in the source code of ctsem, which is why using MANIFESTVAR to do this is much easier to implement in ctsem. 7 To install the required ctsem package, we run the usual procedure in R (R Core Team, 2021): install.packages( "ctsem"). library( ctsem). Lohmann et al. Large-scale Assessments in Education (2022) 10:5 With the type argument, we chose the type of model we want to set-up. In ctsem the user has the option of type = "omx" specifying a CTM interfacing to OpenMx (Neale et al., 2016), which is the original procedure of ctsem using frequentist estimation. Meanwhile, however, this part is mainly outsourced to an extra package ctsemOMX (Driver & Voelkle, 2021). Model type "stanct" interfaces to RStan 8 (Stan Development Team, 2020) and allows to choose between maximum likelihood, maximum a posteriori, or fully Bayesian estimation , 2021. In the present application we chose type = "stanct" and maximum likelihood estimation for all models.
The argument n.manifest defines the number of variables to be analyzed in a given model. Because we wanted to analyze one single variable (countries' mean reading literacy scores), we set n.manifest = 1. Furthermore, with manifestNames, the names of the variables to be analyzed as labelled in the respective data sets are defined. Reading scores were labelled "READ" in our data sets, so we set manifestNames = "READ" (cf. Fig. 4). The argument n.latent determines the number of process components we need to analyze our variables under study. Because we needed two components for the CT-LCM-SR (one linear trend and one dynamic component) we set n.latent = 2. With the latentNames argument, labels for the process components can be chosen. The number of names has to correspond with the n.latent argument. We chose n.latent = c("lin", "dyn"). The first process component should account for the trends in the data (linear trend) and the second should account for the dynamic process (fluctuating around the linear trends; Fig. 3A-C). Next, the model matrices DRIFT, CINT, and DIFFUSION corresponding to Eq. 2 must be specified. The size of these three matrices corresponds to the number of process components (n.latent). Thus, in our application, the drift matrix A (n.latent × n.latent) was a 2 × 2 matrix. To model the linear trend component, the auto-effect of the first process had to be set to a negative value close to 0. In ctsem this specification is made automatically when a parameter on the main diagonal of the drift matrix is set to zero. In addition, the cross-effects between the two process components were also set to zero because trends and dynamics are independent in CT-LCM-SR. It was only the auto-effect of the 8 DTMs are also possible via type = "standt" (Driver & Voelkle, 2018a).
Page 17 of 32 Lohmann et al. Large-scale Assessments in Education (2022) 10:5 dynamic process that was freely estimated by setting DRIFT = matrix(c(0, 0, 0, "a")). The diffusion matrix Q looks very similar because in the CT-LCM-SR, we also only have a single continuous-time error process: DIFFUSION = matrix(c(0, 0, 0, "q")). The continuous-time intercept vector b (n.latent × 1) has a special role in the CT-LCM-SR specification in ctsem as it serves as a slope parameter in the trend process and therefore has to be freely estimated. As explained above, the continuous-time intercept was set to 0 via CINT = c("b", 0). Because the first measurement occasion cannot be regressed on a previous occasion, dynamic models must be initiated somehow. By default, ctsem uses a "predetermined" model with freely estimated parameters at the first time point . To model the linear trends of the CT-LCM-SR in ctsem (using Eq. 2), T0MEANS plays a special role and serves as the (freely estimated) intercept. Since the second process is (within-unit) centered in the CT-LCM-SR, it was restricted to be zero. So we set T0MEANS = c("int", 0). Because the covariances of the initial time-points of both processes should be uncorrelated in the CT-LCM-SR, we set the 2 × 2 matrix to T0VAR = matrix(c("intSD", 0, 0, "T0dynSD")).
The two further arguments MANIFESTMEANS and MANIFESTVAR can be used to specify manifest components such as residuals or measurement errors (for more details, see . In our application, we did not account for measurement error and set both to zero. Finally, the LAMBDA (cf. Eq. 9) matrix relates the observed scores to the process components of the model (components are to be represented in the columns and the manifest variables in the rows): LAMBDA = matrix(c(1, 1)). 9 The complete specification is stored in a model object and reads as follows: After model specification, we can print and check the model object with the command head(CT_LCM_SR$pars, 20). In addition, some further modifications can be made, such as fixing individual parameters to a certain value or specifying random effects (Driver & Voelkle, 2018b). In the current application, we tested if countries differed in terms of their initial level at the first PISA wave in 2000 (random intercept) as well as in terms Lohmann et al. Large-scale Assessments in Education (2022) 10:5 of direction and slope of the trends (random slope) and specified random effects for t0m (random intercept) and b_lin (random slope): CT_LCM_SR$pars[c(1, 7),]$indvarying < -TRUE. Any parameter that should vary across units (in this case parameter one, which is the intercept, and parameter seven, which is the slope) must be set to indvarying = TRUE in the model object.
Once all parameters were properly specified, the model could be fitted to the PISA data. For this purpose, the ctStanFit function was used. For maximum likelihood estimation we set the following additional arguments (see Driver & Voelkle, 2021): CT_LCM_SR_fit < -ctStanFit(datalong = PISA_PV1, ctstanmodel = CT_LCM_SR, optimize = TRUE, nopriors = TRUE).

Descriptive statistics
For the sample of N = 56 countries (level-2 units) included in the analysis the descriptive statistics of the mean reading literacy scores for each of the seven waves are reported in Table 1. In the present sample, the overall mean across the 56 countries and all seven waves was 473 PISA points (SD = 45.4), pooled over the five data sets.

Model comparisons and fit statistics
Model comparisons and fit statistics provided first insights regarding the research questions. In Table 2, the deviance, AIC and BIC of the six models are presented. The random effects models outperformed the fixed effects variants in all cases. This implied considerable differences in initial levels and trends across countries' trajectories of mean PISA reading literacy scores. Furthermore, all three information criteria showed the smallest values for the CT-LCM-SR with random effects, indicating the best model fit of all six models. In addition, we also performed chi-square tests for nested models. In all cases the chi-square test was significant at a two-tailed significance-level of 0.05, also indicating the best model fit for the random effects CT-LCM-SR. 10 Thus, the comparisons of The descriptive statistics were calculated based on the countries' mean scores that form the data basis of the longitudinal analysis in the present study. For the 2015 and 2018 waves, the achievement data in PISA were represented by 10 plausible values instead of 5. Due to the longitudinal modeling, only the first 5 plausible values were considered in the present study a The overall N represents the number of data points in the data set (number of countries in the sample * the respective number of repeated measurements). A total of 56 countries were included in the sample the inclusion criterion being at least 5 PISA participations between 2000 and 2018 Page 19 of 32 Lohmann et al. Large-scale Assessments in Education (2022) 10:5 these models provided first support for the existence of trends, differences in trends and initial levels across countries, and a dynamic process.

Results of the CT-LCM-SR for the PISA countries' reading literacy scores
The parameter estimates of the random effects CT-LCM-SR, pooled over the analyses of all five data sets, are shown in Table 3 (parameter estimates of the CT-AR(1) and the GCM can be found in the Table 5 in Appendix). The CT-LCM-SR yielded eight estimated parameters, which were all statistically significantly different from zero. This supported the use of the CT-LCM-SR, which accounts for dynamics and trends in the data. Additionally, this also supported the use of the random effects CT-LCM-SR, which allows for differences in terms of initial values and slopes across countries.

Examining trends in PISA countries' mean reading literacy scores
Five parameters of the CT-LCM-SR were related to the trends, namely the mean intercept, the intercept standard deviation (SD), the growth rate, the growth rate SD, and the intercept-growth correlation. The interpretation of these five parameters corresponds

Table 3 Parameter Estimates of the CT-LCM-SR for PISA Countries' Mean Reading Literacy Scores Development
The random effects CT-LCM-SR contains an eighth parameter, the intercept-growth correlation, which was estimated to be − 0.61 (SE = 0.13, p < .001) a The labels refer to the parameter labels used in Eq. 10 and 11 as well as in the presentation of the R code (p. 23) and Fig. 2 Parameter ( Page 20 of 32 Lohmann et al. Large-scale Assessments in Education (2022) 10:5 with the standard parameters of a GCM with random effects. The estimated mean intercept across all countries was 464.8 PISA points (SE = 7.37), and the corresponding intercept standard deviation was 53.50 (SE = 5.57). The estimated growth rate was 0.43 (SE = 0.20) per year, which implies an average increase of 1.29 PISA points per measurement occasion (3 * 0.43) across all countries. The corresponding standard deviation was SD = 1.06 (SE = 0.32). Thus, for instance, for a country with a growth rate of one SD below the average, a negative trajectory is expected and thus a loss of − 0.63 (0.43-1.06) PISA points per year on average (and − 1.89 for the three-year interval between adjacent PISA waves). The estimated correlation of − 0.61 (SE = 0.13) between intercept and growth rate indicated that countries with lower mean scores at the initial PISA wave in 2000 tended to make larger gains in mean reading literacy scores on the follow-up surveys than countries with higher mean scores in 2000. Thus, with respect to the first research question, we found a significant overall increase of 1.29 PISA points per wave or 0.43 PISA points per year in the sample of 56 PISA countries. However, countries differed significantly in the slope and even the direction of the trend (e.g., a country that is one SD below the average is expected to have a negative trend). In addition, we found that countries also differ substantially in terms of their initial values at the first measurement occasion in 2000, and that countries with lower initial scores tend to achieve larger gains in their mean reading literacy scores across subsequent PISA waves.

Examining the dynamics in the development of PISA countries' mean reading literacy scores
While the trend parameters of the CT-LCM-SR describe static change processes (Fig. 3A), the parameters of the dynamic process component describe fluctuations (Fig. 3B). These fluctuations were of substantive interest with respect to the second research question, which concerned the question of how long deviations from the trends (shocks) influence a country's development of mean PISA reading literacy performance. As mentioned above, such deviations can be caused by short-term support programs, or temporary investments in the education system as well as a financial crisis or a pandemic.
The five parameters related to the first process component provide information about the linear trends in the data and center the residuals that enter the dynamic process at the within-country level (Fig. 3A-C). Thus, the dynamic process can be considered as detrended as the growth component accounts for the linear trends in the data (e.g., Usami et al., 2019). The remaining three parameters of the CT-LCM-SR, namely the auto-effect, the diffusion variance, and the initial residual variance component, are related to the dynamic process. The initial variance component represents the residual variance of the first measurement occasion entering the dynamic process and, thus, yields the initial (predetermined) values of the dynamic process in the CT-LCM-SR. The diffusion variance represents the CT error variance, which is the part of the variance of the process that is completely random (Oud & Jansen, 2000). The CT auto-effect can be used to study the time course of the impact of temporary deviations (also referred to as shocks) on the process under study and was therefore the relevant parameter to answer the second research question.
The interpretation of CT auto-effects is different from autoregressive effects of DTMs. Typical for stable or mean-reverting dynamical processes is the negative value of the auto-effect . Translated to DTMs, negative CT auto-effects mean that the corresponding DT autoregressive parameters are between 0 and 1. Furthermore, DT autoregressive effects get smaller for longer time-intervals between consecutive measurement occasions in such stable processes. The negative CT auto-effect refers to the fact that if the value of a time series at time t takes a position far from the process mean the process subsequently tends to return to the process mean with the opposite sign to the deviation (assuming that there are no further shocks afterwards). Furthermore, a negative CT auto-effect indicates that the more distant a deviation from the process mean at a time t , the greater is the speed with which it tends to return to the process mean (assuming the same underlying auto-effect). Likewise, the more negative the autoeffect, the faster the process tends to return to the process mean after a deviation from the trend (for the same time-interval).
Because DT autoregressive effects for various time-intervals can be derived from the estimated CT auto-effect, this parameter is useful to answer the question of how persistently deviations from the trend affected countries' development of mean reading literacy achievement. The CT auto-effect was estimated to be − 0.39 (SE = 0.13). It is important to note that the auto-effect was the same for all countries of the sample in this model (no random effects), unlike the (country-specific) parameters of the trends. To better understand the dynamic effects and find an answer to the second research question, DT autoregressive parameters can be derived from the CT auto-effect for various intervals. 11 DT autoregressive effects can be interpreted as "the amount of within-person carry over effect" from one measurement occasion to the next (Hamaker et al., 2015, p. 10 l. 13 12 ). CT modeling additionally provides information on the dependence of the DT autoregressive effects on the time interval length between measurement occasions. Table 4 Discrete-time parameters for autoregression (AR) and process error for varying timeintervals derived from underlying continuous-time parameters CI confidence interval. A three-year interval represents the time between two PISA surveys. Values for the time-intervals 1, 2, 4 and 5 are interpolations, because there is no data for such time-intervals in the data sets

Time-interval (years)
DT AR effects 95% CI for AR effects DT process error (SD)

Deriving discrete-time parameters as functions of underlying continuous-time parameters
Using Eq. 3, we can calculate DT parameters for arbitrary time-intervals from the estimates of the underlying CT parameters. In Table 4 we present a range of DT autoregressive coefficients and the related confidence intervals. We can do the same for the diffusion variance and derive DT error variances, using Eq. 5 (Table 4). For longer timeintervals, the autoregressive coefficient becomes smaller, whereas the error variance becomes larger (Oravecz et al., 2018). DT autoregressive effects are usually interpreted in the literature as indicating the stability or the inertia of the construct under study (e.g., Hamaker et al., 2015). Thus, they provide information on how much of the deviation at time t i is carried over to t i+1 (relative to the process mean) on average. To address the second research question, the time course of the autoregressive effects and the associated confidence intervals can be used (Table 4). The confidence interval of the autoregressive effect for the five-year interval is the first that includes zero, that is, there is no evidence for a population autoregressive effect larger than zero. An autoregressive effect of zero for a time interval of five-years or longer implies that after this period earlier deviations from the trend have no predictive value. To visualize this effect, DT coefficients can be represented as continuous functions of the (time independent) CT parameters and the length of time-interval. As can be seen in Fig. 5, for increasing length of the time-interval between measurement occasions, the DT autoregressive effect approaches zero (Fig. 5A), while the error standard deviation approaches about 13 (Fig. 5B).

Representing countries-specific expected trajectories and making predictions for future states
Unlike DTMs, in which processes can only be represented with respect to the investigated interval (see Fig. 1B), CTMs such as the CT-LCM-SR allow to represent the time course of developmental trajectories. Combining trends and dynamics, Fig. 6 shows the country-specific trajectories of four PISA countries in the sample. One major advantage of CTMs like the CT-LCM-SR is that predictions can be made with respect to any timeinterval. If PISA reading literacy scores were analyzed using a DTM, the predictions of future states would be limited to the three-year intervals (and their multiples), because  Table 4 column 3 Page 23 of 32 Lohmann et al. Large-scale Assessments in Education (2022) 10:5 of the time-interval dependency of DTMs. Moreover, unlike predictions made by a simple GCM, the CT-LCM-SR uses the information of the residuals, which are fluctuating around the static trends (the grey dotted lines in Fig. 6). Predictions of a GCM would always lie exactly on the static trend and would not take into account the dynamics of the residuals. Figure 6 illustrates the mean reverting dynamic process fluctuating around the country-specific static trends. The size of the autoregressive effect decreases as the length of the time-intervals increases, representing the impact of deviations (shocks) on later states over time. The autoregressive effect reaches zero after a certain length of the time-interval (see also Fig. 5A) indicating that there is no further influence of the previous deviation. For long time-intervals the autoregressive effect is zero and the best prediction lies on the linear trend. Thus, the point predictions return to the static linear trend process after a certain interval (Fig. 6).

Discussion
The aim of the present study was to discuss advantages of CTMs, especially the newly proposed CT-LCM-SR, for longitudinal data analysis in the education sciences and to provide the reader with an easy-to-follow introduction to specifying and interpreting CTMs with the help of an illustrative example using PISA data.
In the first section, we discussed that CTMs are often better suited to capture the continuous nature and development of constructs in educational research than DTMs. Using CTMs, we can thus better align our statistical models with our theories of change  Lohmann et al. Large-scale Assessments in Education (2022) 10:5 that are the subject of educational research. This includes not only variables at the level of individuals but also variables that describe characteristics of entire education systems as in the exemplary application with PISA data. In addition, we outlined other advantages of CTMs over DTMs associated with the time-interval dependency of DTMs, such as limitations in cross-study comparisons, limitations in the procedures of data collection, and the lack of relevant information regarding the dynamic processes. Based on this, we developed the CT-LCM-SR, a CTM variant that is suitable for a range of typical longitudinal research questions. The CT-LCM-SR takes into account both the dynamic process as well as trends in the data. Trends are often of substantive interest in the education sciences and are also relevant for dynamic models because unaccounted trends can lead to biases in parameter estimates (Asparouhov et al., 2018;Núñez-Regueiro et al., 2021;Walls & Schäfer, 2006). The process' dynamics can provide information about the stability or variability of constructs, their self-related structure, and about reciprocal effects of multiple variables in multivariate models. In addition to the advantage that (1) the CT-LCM-SR provides information on both trends and dynamics, the use of the CT-LCM has further advantages. (2) Trends and dynamic processes are clearly separated in the CT-LCM-SR, meaning that trends and dynamics cannot influence or "compete" with each other. (3) The interpretation of the parameters follows the interpretation of CTMs and GCMs. (4) All parameters of the CT-LCM-SR describe the process under study independently of the length of time-intervals in a given study (Delsing & Oud, 2008). Finally, (5) both the standard linear GCM and the CT-AR(1) model are nested within the CT-LCM-SR, and thus, these models can be tested against the CT-LCM-SR using chi-squared difference testing. In addition, with the CT-LCM-SR, we can take advantage of all the other benefits associated with CTMs.
In the second part, we demonstrated the application of the CT-LCM-SR with the R package ctsem , 2021. The analysis was guided by two exemplary research questions. First, we wanted to examine whether there is a significant trend in the mean PISA reading literacy scores of countries over the period from 2000 to 2018, and whether countries differ in the slope and direction of trends. Second, we aimed to study the stability of education systems when deviations from the trend occur. Examining the persistence of deviations is relevant to study the system's resilience against disturbances caused by different influences.
Regarding the first research question, we found an overall increase in PISA countries' mean reading literacy scores. However, countries differed considerably in the slope and the direction of trends. Moreover, countries also differed with respect to the initial levels at the first measurement occasion in 2000. We found that countries with lower scores at the first measurement occasion tended to achieve larger gains in mean reading literacy scores on the follow-up surveys than countries with higher initial scores.
Regarding the second research question, we found a significant continuous-time dynamic process of fluctuations around the trends. From this dynamic process it can be inferred how long temporary deviations from the trends are associated with later states of the variable(s) under study and how much of the deviation is carried over to future Page 25 of 32 Lohmann et al. Large-scale Assessments in Education (2022) 10:5 states depending on the time that has passed. Using the CT auto-effect, we derived DT autoregressive effects for various time-intervals and found that carry-over effects (represented by DT autoregressions) for five-year or longer time-intervals were not significantly different from zero. This means that both positive and negative influences on countries' mean reading literacy scores that led to temporary deviations from the trends disappeared on average after about five years.

Limitations and future directions
By introducing the CT-LCM-SR, the current article suggests and illustrates an advanced method with several benefits and potentials for longitudinal studies in the education sciences. However, the presented work is not without limitations. First, it is necessary to consider that the application presented is a simplified example chosen in order to provide an easy introduction to the potential benefits, practical modeling, and interpretation of CTMs. Therefore, the substantive value of the results is limited. Only a single variable was analyzed, the mean PISA reading scores of countries. For some applications, such as predictions of future states, univariate models can already provide useful information (e.g., Bulteel et al., 2018). Of course, however, research questions are often more complex. In the present application, for example, it is only possible to speculate on why temporary deviations from trends occur, as we are studying shocks of unknown cause. In practical applications, it might be of interest to examine the impact of a specific intervention (or crises). This requires the inclusion of an input that occurs at specific points in time in the model (see Driver & Voelkle, 2018b for the implementation of such models in ctsem). Such models make it possible to examine, for instance, when the input reaches the maximum effect on the process under study and after what period of time the effect of an input vanishes.
Moreover, the dynamic interplay between two or more parallel processes might be of researchers' interest (see, e.g., Jindra et al., under review, to be published in the same special issue applying a bivariate CT-LCM-SR). Such models allow, for instance, to explore the time interval in which one process has a maximum effect on the other and vice versa (e.g., Hecht & Zitzmann, 2021b). A typical question for educational research might be after how much time a change in a student's achievement does develop its maximum effect on his or her self-concept. R code for studying the reciprocal effects of two variables over time in a bivariate CT-LCM-SR is provided in the Additional file 1.
In addition, the use of covariates plays a crucial role in practical research, including the study of trends in countries' mean PISA scores, for which so-called adjusted trends have been reported meanwhile (OECD, 2014). Such more complex CTMs with predictors and covariates can be implemented with ctsem (e.g., Driver & Voelkle, 2018b).
It is also important to keep in mind, that development at the country-level was examined, not at the student level. While short-term interventions might not change the performance of the entire education system permanently, they might have effects on the students' cohort. However, statements concerning developmental trajectories of students require repeated measurements of the same students. A first application of a bivariate CT-LCM-SR on the student-level is provided by Jindra et al. (under review) to be published in the same special issue. Another possibility in ctsem is specifying measurement models (e.g., Hecht et al., 2019). In the presented application with PISA data, a typical three-step procedure was chosen to address measurement error (cf. OECD, 2009). We first calculated aggregate country means for each of the five plausible values and for all seven available PISA waves and created five data sets (so that all plausible values could be considered). These data sets were then used to conduct all analyses separately. Finally, the obtained results were pooled across the five separate analyses (Rubin, 2004).
The CTMs presented must also be considered with some limitations and model assumptions in mind. In the present article, we have restricted ourselves to univariate stable dynamic processes, also called mean-reverting processes . That implies that the corresponding DT autoregressive effects are between 0 and 1 for any time-interval. In principle, however, CTMs for explosive processes, meaning DT autoregressions greater than 1 (e.g., Driver & Voelkle, 2021), or processes with negative DT autoregressions, may be of interest as well (Fisher, 2001). In this article, we furthermore restricted ourselves to the representation of first-order CTMs, which should be suitable for representing many processes studied in educational research (Sivo & Fan, 2008). However, modeling higher-order models is also possible and might be useful in certain applications (e.g., Lüdtke & Robitzsch, 2021;Oud, 2010;Oud et al., 2018). Furthermore, in the models presented, the dynamic effects were assumed to be invariant over the period studied. This means, for example, that the CT auto-effect does not change and the DT autoregressive effects change only as a function of the length of the time-interval but not with time itself. However, more complex models with time-varying dynamic effects are also possible (e.g., Driver & Voelkle, 2021;Oud & Jansen, 2000). In addition, in the application presented, the dynamic effects were assumed to be the same across all observation units. Random effects were only included in the trend components but not in the dynamic process. This is a limitation that can be overcome within ctsem because fully hierarchical models can be specified in which all parameters vary across units (Driver & Voelkle, 2018a). Future simulation studies are needed to further investigate the estimation performance of CTMs in general and the CT-LCM-SR in particular with respect to different influencing factors, such as sample size and length of individual time series (for first sample-size recommendations and a discussion of the compensating effects of sample size N and length of time series T in CTMs, see Hecht & Zitzmann, 2021b).
In the CT-LCM-SR, we proposed to consider linear trends. The use of linear trends is not without criticism and can usually only be justified for the duration of specific time periods, as they always tend toward infinity in the long run (e.g., Oud, 2010). However, linear trends have a clear and well-known interpretation, which can be an advantage. Nevertheless, other types of trends might be considered and in future research, continuous-time models that incorporate other than linear trends could be developed or further refined (e.g., Oud, 2010;Voelkle & Oud, 2015). For example, in the PISA example, we also tested a GCM with nonlinear (quadratic) components, which did however not reach Page 27 of 32 Lohmann et al. Large-scale Assessments in Education (2022) 10:5 significance. We provide example code for setting up GCMs and the CT-LCM-SR with quadratic trend components in ctsem (similar code can be found on the GitHub page of ctsem; Driver, 2022).
In the CT-LCM-SR as applied in the present article, it remains uncertain how a given shock affects the trend. Models, which allow a shock to influence the direction and shape of the trend, are also possible (Driver & Voelkle, 2018b).
In educational research, specific data situations may occur which demand appropriate analysis strategies like resampling and missing data imputation techniques (e.g., Weirich et al., 2021). Future research should generate guidance on how to address such educational research specific challenges when applying CTMs. Furthermore, estimation of CTMs might become very time-consuming, especially in educational large-scale assessment contexts. One promising optimization approach to reduce run times for continuous-time models has been proposed by  based on the work of .
Finally, although CTMs provide information about the development of dynamic effects, interpolations to time-intervals that have not been investigated should be considered with caution. While it may be useful to explore intervals of maximum effects based on CTMs (Hecht & Zitzmann, 2021a), it is not advisable to draw conclusions about change processes that occur at much smaller intervals, for example days or hours, based on annual surveys, for instance. Thus, even when using CTMs, the underlying theory of change should still guide decisions about the design of longitudinal studies. However, equal time-intervals between subjects and measurement occasions are not necessary when using CTMs, which allows for much greater flexibility in data collection than when using DTMs (see Voelkle & Oud, 2013, for an argument why it can even be beneficial to choose time-intervals of different length between individuals and measurement occasions).

Conclusion
In the present article, we presented continuous-time modeling as a suitable approach to align statistical models for longitudinal data analysis with theories of change, especially to represent the continuous nature of variables and observation units. It is essential to align theories and statistical models to obtain valid answers to the research questions. In addition, the use of CTMs comes with other considerable advantages which were mentioned. Based on this, we proposed the CT-LCM-SR, which is a CTM variant and a suitable model to address many research questions in the education sciences. The CT-LCM-SR yields parameters, which are independent of specific time-intervals, thus represents developmental processes on a continuous-time scale. We illustrated the implementation and interpretation of the CT-LCM-SR with PISA data and, thus, gave the reader an introduction to the use of CTMs and the advantages and functionality of the newly proposed CT-LCM-SR. Lohmann et al. Large-scale Assessments in Education (2022) 10:5 Table 5 Parameter Estimates of the CT-AR(1) model, the Standard GCM and the CT-LCM-SR The parameter estimates of the CT-LCM-SR were already represented in Table 3 and are repeated here to facilitate comparisons a The labels refer to the parameter labels used in Eq. 10 and 11 as well as in the presentation of the R code (p. 23) and Fig. 2 b This column indicates, which parameter belongs to which process component