Is inequitable teacher sorting on the rise? Cross‑national evidence from 20 years of TIMSS

Abstract
Unequal access to qualified teachers for children of different socioeconomic status (also known as inequitable teacher sorting) has been increasingly put forth as one potential factor contributing to the socioeconomic achievement gap. Despite this, few studies have investigated cross-national differences in teacher sorting, and none have examined it within countries over time. International large-scale assessments in education are uniquely positioned to answer such questions due to their longitudinal nature at the system level. This study uses six waves of data from the Trends in International Mathematics and Science Study (TIMSS) from 1999 to 2019 for 32 education systems. We compare differences in grade 8 mathematics teacher qualifications for each country at each time point, across top and bottom groups on the student socioeconomic spectrum. Results show that on the whole many countries display negligible gaps in access to teacher quality, with some key exceptions. With respect to inequity in novice teacher sorting, the problem is most prevalent in low- and middle-income education systems (i.e. in Turkey, Morocco, Tunisia and Indonesia). Inequity in sorting based on mathematics education is less common, with no clear pattern with regard to level of economic development (i.e. in Chile, Australia, New Zealand, and Chinese Taipei). Socioeconomic inequality in teacher sorting has also remained broadly stable over time. Based on experience and mathematics education, fewer than a handful of systems show systematic upward trends in teacher sorting inequity (i.e. in Chile, Morocco, Singapore, and New Zealand). Given the increasing focus on inequity in access to teacher competence, these results have economic and policy implications for tackling the socioeconomic achievement gap.


Inequitable teacher sorting: evidence from past research
Page 3 of 20 Glassow and Jerrim Large-scale Assessments in Education (2022) 10:6
The phenomenon of teacher sorting has generally been attributed to teachers self-selecting into more preferable working conditions coupled with labor market conditions (Engel et al., 2014). Accounting for sorting inequity in education systems where the allocation of teachers is more centralized is less straightforward, but may be due to increasing socioeconomic segregation, accountability measures, or ability grouping within schools (Han, 2013; Luschei & Jeong, 2019). In most cases, however, the first part of the problem is attracting the best teachers to schools or classrooms with a high proportion of disadvantaged students. This type of sorting has generally been measured at the national level, by examining differences in average qualification levels of teachers between high- and low-SES students, schools or classrooms. In the US, Goldhaber et al. (2015) find evidence of inequities in teacher qualifications across schools in Washington for all input measures. This has been confirmed by studies in New York, North Carolina and California, and across racial disparities as well (Clotfelter et al., 2005, 2007; Goldhaber, 2018; Lankford et al., 2002). In England, Sims and Allen (2018) find evidence of inequities for teacher experience, certification, and subject matter specialization. This may in part be due to teacher motivation (novice teachers tend to be more ideologically motivated to 'make a difference'), coupled with the fact that there are generally more job opportunities for novice teachers in disadvantaged schools (Sims & Allen, 2018). National studies from Turkey and Chile also show evidence of sorting (Meckes & Bascope, 2015; Özoğlu, 2015). The next part of the problem relates to keeping teachers in disadvantaged settings. Here, the evidence on the role of salary incentives is mixed (Clotfelter et al., 2008; Sims & Allen, 2018).
Teachers may prioritize working conditions over salary (Bacolod, 2007), but this is difficult to generalize, as there is variation in teacher salaries and working conditions worldwide. More competent teachers may exit the profession or certain schools due to increased accountability practices (Feng et al., 2010). We must also consider the ongoing debate over whether teacher sorting occurs more frequently between or within schools, on which the evidence is so far mixed (Hanushek & Rivkin, 2012; Kalogrides et al., 2013; Luschei & Jeong, 2019). This distinction cannot be investigated using the TIMSS dataset, as there is frequently just one classroom sampled per school.
Up until now we have discussed nationally focused studies investigating inequitable teacher sorting cross-sectionally. A handful of national studies from the US have examined teacher quality gaps over time and show either stable or decreasing trends (Adamson & Darling-Hammond, 2011; Boyd et al., 2008; DeAngelis et al., 2010; Goldhaber et al., 2018). There are also a few studies that investigate sorting from an international perspective, though none take this perspective over time. Akiba et al. (2007) report differences between high- and low-SES students, particularly for students in Syria, Chile, Taiwan, the United States and Hong Kong, using TIMSS data from 2003. Using data from the Programme for International Student Assessment (PISA), Han (2018) finds that school SES is associated with teacher shortages as well as the proportion of certified teachers in a school. Using data from the 2013 Teaching and Learning International Survey (TALIS), Luschei and Jeong (2019) examine whether the phenomenon of teacher sorting is global, and find evidence of either within- or across-school teacher sorting in several of 32 participating systems.

Teacher quality
Underpinning our discussion of the teacher qualification gap is the ongoing debate regarding the measurement of teacher quality and competence. Teachers are consistently found to be one of the most important school-level inputs for achievement, particularly for disadvantaged students (Darling-Hammond, 2000; Goe, 2007; Nilsen & Gustafsson, 2016; Wayne & Youngs, 2003). However, despite decades of research, there remains debate as to which teacher characteristics matter most, especially from a cross-national perspective (Blömeke & Delaney, 2012; Blömeke & Olsen, 2019). There is limited evidence that teacher personality traits and characteristics such as self-efficacy and preparedness matter for test scores (Goe, 2007; Nilsen & Gustafsson, 2016). However, these teacher characteristics may be important for behavioural outcomes associated with long-term success. Chetty et al. (2014) find that being assigned to higher value-added teachers has long-term impacts on students' earnings and higher education graduation rates. Using national data from the US, Jackson (2018) finds that the effects of teacher characteristics are ten times larger for behavioural outcomes such as intention to attend college than for test scores. Other scholars, however, argue that observable teacher characteristics have failed to account for significant variation in student achievement and should be rejected altogether in favor of value-added models (Hanushek & Rivkin, 2012). Value-added models are not a panacea, however, as evidenced by a recent paper by Bitler et al. (2014). Perhaps more importantly, they are limited by the differential assignment of teachers to students based on ability and in many cases cannot identify which aspects of a teacher are most important. International large-scale assessments do not allow researchers to create value-added measures of teacher effectiveness.
Additionally, putting value-added models aside, amenable teacher characteristics such as self-efficacy and preparedness (Goe, 2007) are likely to be influenced by teaching contexts, and therefore present an endogeneity problem when attempting to determine whether students in disadvantaged contexts receive less qualified teachers because of their socioeconomic background.
Due to the number of empirical issues raised above, we move our focus towards teacher qualifications. While certain teacher qualifications have been shown to be the most stable and consistent predictors of student achievement, they are also the subject of debate. The first and most frequently cited qualification is teacher experience, which has been shown to matter predominantly in the first three to five years, while recent work has shown its potential importance much longer into a teacher's career (Goe, 2007; Papay & Kraft, 2015; Podolsky et al., 2019; Rice, 2003; Rockoff, 2004). We therefore construct the category of 'novice' teachers, which includes all teachers in their first five years on the job. We use five years to ensure maximum statistical power and minimize sampling error in order to increase the representativeness of teachers in the TIMSS dataset (in contrast to using one, two, or three years of experience for novice designation, for instance). The next important qualification is teacher mathematics education. Several studies have confirmed the importance of a teacher's knowledge in mathematics and science (Baumert et al., 2010; Goe, 2007; Sancassani, 2021; Hanushek et al., 2014). Such qualifications may also predict teaching practices such as cognitive activation and classroom management (Nilsen et al., 2018). There is a theoretical distinction (and plausible differences in teacher effects) to be made between teachers who have pedagogical training in mathematics and those who have a post-secondary degree in mathematics with no teacher training. These are referred to as pedagogical content knowledge (PCK) and content knowledge (CK), respectively (Shulman, 1986).
We create the category of 'out-of-subject' teacher for the primary reason that it is the most validly comparable category across education systems, given differences in requirements and curricula for teaching degrees. We thus limit our focus to teachers with no mathematics education (in other words, without CK or PCK). While there is limited evidence for the importance of teacher certification and level of education (Goe, 2007; Goldhaber et al., 2000), we exclude the former due to its absence from the TIMSS dataset, and the latter because there would be little variation between teachers with and without a bachelor's (or master's) degree in a vast majority of education systems. Last, there is generally a consensus that teacher competence matters most for mathematics and science (Goe, 2007). This is because, while students may practice reading or language skills at home, learning in mathematics and science is generally confined to the classroom. For this reason, we focus on mathematics teachers only. The vast majority of studies related to teacher effectiveness focus on upper middle or secondary school, but only a handful of studies focus on the differential importance of teacher competence at various grade levels (Goe, 2007; Nilsen & Gustafsson, 2016). Teacher qualifications in particular seem to matter most in the later grades (Goe, 2007).
Against this background, we focus our efforts on two observable teacher characteristics which are not amenable to the teaching context, and assume that teachers in their first five years on the job ('novice teachers') or without training in mathematics or mathematics education ('out-of-subject') will have lower competence levels. We use these categories to answer the following research questions:
1. In which education systems do a significantly higher share of socioeconomically disadvantaged students have teachers with lower qualification levels, based on their most recent TIMSS cycle?
2. How has this phenomenon of 'inequitable teacher sorting' changed over time across and within education systems?

Methods
The data for this study come from six waves of the Trends in International Mathematics and Science Study (TIMSS), carried out by the International Association for the Evaluation of Educational Achievement (IEA) in 1999, 2003, 2007, 2011, 2015 and 2019.¹ We take data from a total of 32 countries participating in at least three of these six time points, amounting to a total sample size of 904,309 students and 36,446 mathematics teachers for grade 8. TIMSS employs a two-stage stratified sampling design, which samples schools according to previously determined strata with probability proportional to their size, and then whole classrooms within the schools, to cover a range of nationally representative educational contexts.²

Experience
Teacher experience was measured by the number of years a teacher has been teaching altogether. As mentioned, we create the category of 'novice' teachers (those in their first 5 years after qualification).³ Here, we emphasize the evidence of a 'learning by doing' effect, in which teacher effectiveness improves substantially in the first few years on the job. Based on past research, we argue that when taught by novice teachers, students are likely to retain and accrue less knowledge (Podolsky et al., 2019; Rice, 2003; Rivkin et al., 2005; Rockoff, 2004).

Mathematics education
The category of 'out-of-subject' teachers was determined by the subject teachers reported having studied during their post-secondary education. There are vast differences in the requirements to become a mathematics teacher across countries. Teachers citing 'mathematics' or 'mathematics education' or both thus may have some level of formal training in mathematics (see Shulman, 1986; Blömeke & Olsen, 2019), depending on the requirements for teaching across educational systems. We therefore designate a teacher as 'out-of-subject' if they have none of these. Several studies have mentioned the importance of CK and PCK, particularly for mathematics (Baumert et al., 2010; Goe, 2007; Shulman, 1986; Wayne & Youngs, 2003). Some of the strongest evidence comes from Baumert et al. (2010) for mathematics and Sancassani (2021) for science; the latter employs a within-student approach and finds a significant causal effect for teachers with CK on student achievement. While some studies propose a composite measure of teacher quality, our analysis did not provide evidence in support of such a measure, as factors loaded negatively onto the construct in some education systems and positively in others. In our view, these measurement properties limit the usefulness of a composite approach, particularly with regard to its relevance for policymakers.

Number of books in the home
The number of books in the home is an ordered categorical variable with the categories (1) 0-10, (2) 11-25, (3) 26-100, (4) 101-200, and (5) more than 200 books. It is often described as a proxy for 'cultural capital' and is generally one of the strongest predictors of student achievement (Hanushek & Woessman, 2011). While this variable is widely used in large-scale assessment research, some have questioned its validity. Some of the strongest criticisms come from Engzell (2019), who alerts researchers to a number of endogeneity issues; namely, that children who are more studious tend to accrue more books, and that low-achieving students are more likely to misreport lower numbers of books. There is also a lower level of agreement between parent and child reporting on this indicator, in particular for students tested at ten years old (though at grade 8 level this risk is much lower) (Jerrim & Micklewright, 2014). Nevertheless, as cultural capital is an important part of our construction of socioeconomic status, we include it within our pool of SES indicators, with these cautions in mind.

Parental education
Parental education is an ordered categorical variable, with (1) some primary or lower secondary, (2) lower secondary, (3) upper secondary, (4) post-secondary, non-tertiary, (5) short-cycle tertiary, (6) bachelor's or equivalent, (7) postgraduate degree. After the year 2011, another category was added to distinguish between postgraduate and doctoral degrees. We merge these two into one category for all cycles in the study, reducing the number of categories to seven. Parental education generally has higher child-parent agreement than books in the home, but this too varies across countries (Jerrim & Micklewright, 2014). While the International Standard Classification of Education (ISCED) is a widespread measure of SES for economists and sociologists, it is also not without complications related to cross-national comparability. For instance, the meaning of having a parent with a bachelor's degree in a Western OECD country may differ substantially from the meaning of having a parent with such a degree in a country with a lower level of economic development. While estimating SES scores within countries and years may mitigate these issues, they can never be entirely removed.

Socioeconomic status composite score
We create a composite socioeconomic measure (factor score) for each country-year, composed of the student-reported number of books in the home, the student-reported mother's level of education and the student-reported father's level of education, in a confirmatory factor analysis. Measuring socioeconomic status through a composite score is a widely used approach, but there is some debate as to the validity of combining the indicators. Ideally, such a composite SES indicator should reflect a family's general level of education, income and occupation. TIMSS does not include information about parental occupation or income across all cycles, but includes a 'Home Educational Resources' (HER) scale composed of home possessions, parental education, and the number of books in the home, which is available in later cycles but not all of them.⁴ We do not use home possessions in our scale as they varied over each of the cycles and there is considerable debate about the comparability of this measure across countries and time (Pokropek et al., 2017).⁵

Top/bottom SES percentiles
Following Chmielewski (2019), we employ the percentile method for each country-year. This allows us to 'compare students at the top and bottom relative position within a socioeconomic distribution, even as the absolute meanings of these positions change' (Chmielewski, 2019, p. 525). We take students who score below the 33rd and above the 66th percentile of the SES scale to represent those in low- and high-socioeconomic status families. We use thirds instead of quartiles so as to ensure maximum statistical power and minimum sampling error for each country-year estimate. Over and above examining gaps from the most recent TIMSS cycle, we also examine pooled teacher qualification gaps at the 33/66 and 10/90 percentiles.
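The percentile grouping described above can be illustrated with a minimal sketch. It assumes that each student in one country-year has an SES factor score and a sampling weight, and that the cut-offs are weighted percentiles; the function names `weighted_percentile` and `assign_ses_group` are hypothetical and are not taken from the study's own code, which used R.

```python
def weighted_percentile(values, weights, q):
    """Smallest value at which the cumulative weight share reaches q (0 < q < 1)."""
    pairs = sorted(zip(values, weights))
    total = sum(w for _, w in pairs)
    cumulative = 0.0
    for value, weight in pairs:
        cumulative += weight
        if cumulative / total >= q:
            return value
    return pairs[-1][0]


def assign_ses_group(scores, weights, low_q=0.33, high_q=0.66):
    """Label each student 'low', 'middle', or 'high' within one country-year.

    Assumption: 'low' means strictly below the low_q cut-off and 'high'
    means strictly above the high_q cut-off, as in the 33/66 split.
    """
    low_cut = weighted_percentile(scores, weights, low_q)
    high_cut = weighted_percentile(scores, weights, high_q)
    labels = []
    for score in scores:
        if score < low_cut:
            labels.append("low")
        elif score > high_cut:
            labels.append("high")
        else:
            labels.append("middle")
    return labels
```

Passing `low_q=0.10` and `high_q=0.90` would give the more extreme 10/90 comparison under the same assumptions.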

Analysis
In order for TIMSS data to be country representative, inferences must be made at the student level (Rutkowski et al., 2010). TIMSS is also one of the few international large-scale assessments in which teachers can be directly linked to students. We use multivariate imputation by chained equations using the 'mice' package (van Buuren & Groothuis-Oudshoorn, 2011) in R to account for missing data in the student socioeconomic variables. We include mathematics/science achievement (plausible values) and student possessions as auxiliary variables in the imputation model, and create five imputed datasets for each country-year, resulting in a total of 5 imputed datasets × 189 country-years = 945 imputed datasets.⁶ To determine the final teacher estimates, we average the estimates from Eqs. (1) and (2) below across the five imputed datasets and pool the standard errors following Rubin (1987) and Gonzalez (2014).
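The pooling step can be sketched as follows. This is a minimal illustration of the combining rules in Rubin (1987), assuming one point estimate and one standard error per imputed dataset; the function name `pool_rubin` and the example numbers are hypothetical and are not results from the study.

```python
import math


def pool_rubin(estimates, standard_errors):
    """Combine per-imputation estimates and standard errors with Rubin's rules."""
    m = len(estimates)
    pooled = sum(estimates) / m
    # Average sampling variance within the imputed datasets.
    within = sum(se ** 2 for se in standard_errors) / m
    # Variance of the point estimates between the imputed datasets.
    between = sum((q - pooled) ** 2 for q in estimates) / (m - 1)
    total_variance = within + (1 + 1 / m) * between
    return pooled, math.sqrt(total_variance)


# Hypothetical gap estimates from five imputed datasets of one country-year.
pooled_gap, pooled_se = pool_rubin(
    [0.10, 0.12, 0.11, 0.09, 0.13],
    [0.02, 0.02, 0.02, 0.02, 0.02],
)
```

The pooled standard error exceeds the per-imputation value whenever the estimates differ across imputations, which is how the uncertainty due to missing data enters the final figures.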
In order to estimate the socioeconomic status scores for each student and for each imputed dataset, we conduct a factor analysis with polychoric correlations to estimate factor loadings and extract the single factor scores. This was done using the function 'fa' in the R 'psych' package. Next, we group students into top and bottom thirds and deciles within each country on this scale to determine their relative SES position, applying the student weights.
To determine the teacher qualification gaps for each imputed dataset, we estimate the proportion of students in high- and low-socioeconomic contexts with novice and out-of-subject teachers in each year and each country, applying the mathematics teacher weights to account for students with more than one mathematics teacher. In Eq. (1), $TQ_{D/AD}$ is the proportion of novice or out-of-subject teachers for socioeconomically disadvantaged (D) or advantaged (AD) students, $l$ is the country, and $t$ is the year.
Both teacher quality estimates, $TQ_D$ and $TQ_{AD}$, equate to the share of students with novice and out-of-subject teachers in country $l$ at year $t$, and each is an estimate between 0 and 1. We do this for each year, SES group (33/66 and 10/90) and teacher qualification indicator, clustering the standard errors by school using the 'clubSandwich' package (Pustejovsky, 2020) with the 'CR0' cluster-robust variance estimator in R.⁷ To determine the teacher qualification gap, we calculate $TQ_D - TQ_{AD}$.⁸ For 33/66 gaps, we include point estimates for each country-year in Additional file 1: Appendix Tables C1 and C2.
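The gap calculation can be illustrated with a short sketch. It assumes that each group's proportion is a weighted mean of a 0/1 indicator, and that the school-clustered standard error takes the CR0 form for an intercept-only weighted regression (no small-sample correction). The function names and the toy data are hypothetical; the study itself used the R package cited above.

```python
import math
from collections import defaultdict


def weighted_proportion(indicator, weights):
    """Weighted share of students whose teacher has the lower qualification (0/1)."""
    return sum(w * y for y, w in zip(indicator, weights)) / sum(weights)


def cr0_standard_error(indicator, weights, schools):
    """School-clustered standard error of the weighted proportion (CR0 form).

    Assumption: the proportion is treated as an intercept-only weighted
    regression, so the sandwich 'meat' is the sum of squared school totals
    of the weighted residuals.
    """
    p = weighted_proportion(indicator, weights)
    school_scores = defaultdict(float)
    for y, w, g in zip(indicator, weights, schools):
        school_scores[g] += w * (y - p)
    meat = sum(score ** 2 for score in school_scores.values())
    return math.sqrt(meat) / sum(weights)


# Hypothetical toy data for one country-year (not TIMSS values).
low_ses = {"novice": [1, 1, 0, 0], "weight": [1.0] * 4, "school": ["a", "a", "b", "b"]}
high_ses = {"novice": [0, 0, 0, 1], "weight": [1.0] * 4, "school": ["a", "b", "c", "c"]}

tq_d = weighted_proportion(low_ses["novice"], low_ses["weight"])
tq_ad = weighted_proportion(high_ses["novice"], high_ses["weight"])
gap = tq_d - tq_ad  # positive: low-SES students are more exposed to novice teachers
se_d = cr0_standard_error(low_ses["novice"], low_ses["weight"], low_ses["school"])
```

A positive `gap` corresponds to the case in which a greater share of low-SES students have teachers with lower qualifications.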
To examine linear trends in teacher sorting over time, we again use the 'clubSandwich' package (Pustejovsky, 2020) in R to estimate the standard errors with the same variance estimator, 'CR0'. We predict positive or negative changes in teacher sorting with the logistic model in Eq. (2), where $l$ is the log odds that a teacher is new/out-of-subject for student $i$ in school $g$, $x_1$ is the low-SES term, $t$ is time,⁹ and $x_1 t$ is their interaction term indicating change over time. The coefficient on the interaction term, $\beta_3$, is therefore our coefficient of interest. As the results are intended to be descriptive and examine the association between teacher qualifications and student socioeconomic contexts, we do not use additional controls in the model. Furthermore, we restrict the analyses to linear trend lines. Due to the risk of sampling error and the relatively small number of time points, it is not possible to determine non-linear trends with accuracy.¹⁰ Here, we follow the same imputation approach as outlined above. Estimates are reported as odds ratios (the exponents of the unstandardized beta coefficients). Here, an odds ratio of 1.05 denotes that a one-unit increase in time (time in years between TIMSS cycles) is associated with a 5% increase in the odds that a low-SES student will receive a teacher with lower qualifications.
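The logistic model described above can be written out from its stated terms. The following is a sketch consistent with those definitions rather than the authors' exact notation: the intercept $\beta_0$, the main-effect coefficients $\beta_1$ and $\beta_2$, and the subscripts on $l$ are assumptions, since only $\beta_3$ is named in the text.

```latex
l_{ig} = \beta_0 + \beta_1 x_{1} + \beta_2 t + \beta_3 x_{1} t,
\qquad
\mathrm{OR} = e^{\beta_3}
```

Under this form, the reported odds ratio of 1.05 corresponds to $\beta_3 = \ln(1.05) \approx 0.049$, and a change of $k$ time units multiplies the odds by $1.05^{k}$ (for example, $1.05^{5} \approx 1.28$).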

Estimating the magnitude of inequitable teacher sorting across countries
For descriptive statistics and missing data proportions, see Tables A and B in the Additional file 1: Appendix. In order to investigate countries in terms of the extent of inequitable teacher sorting over the past two decades (i.e. an average across the past 6 TIMSS cycles), we use data from each wave of TIMSS to determine a pooled estimate for each country. We do this for both teacher quality indicators and student socioeconomic groups.¹¹ Please refer to Figures A1 and A2 in the Additional file 1: Appendix for a visual depiction of the pooled gaps. We now focus our attention on the extent of inequity in teacher sorting in the most recent TIMSS cycles of participating education systems. Figure 1 displays estimates for students in the bottom and top socioeconomic tertiles (below the 33rd and above the 66th percentile) in terms of their exposure rates to novice and out-of-subject teachers from the most recent participating TIMSS cycle. For countries showing positive gap magnitudes, a greater share of low-SES students have teachers with lower qualifications than their high-SES counterparts. For countries showing negative gap magnitudes, a greater share of high-SES students are exposed to teachers with lower qualification levels. We report confidence intervals for these estimates for both groups based on the standard errors adjusted for school clustering. Gap magnitudes (the proportion of socioeconomically more affluent students with under-qualified teachers subtracted from the proportion of socioeconomically disadvantaged students) are depicted by the grey bars.
Before discussing the details of these plots, it is important to highlight an immediate conclusion from these figures. For both sorting by experience and mathematics education, the teacher qualification gaps do not reach statistical significance in a majority of countries. While novice teacher sorting is clearly more prevalent, fewer than half of the participating education systems display gap magnitudes over 5 percentage points. By and large, this indicates that many countries are successful in their allocation of teachers by experience and subject matter education in mathematics, and that lower-SES students face a much more modest disadvantage with regard to their access to qualified teachers than the literature suggests. Referring to the pooled gaps in Figure A1 of the Additional file 1: Appendix, a similar pattern of countries is observed for sorting across both teacher qualification dimensions (i.e. Turkey, Tunisia, Morocco, Iran, Thailand for novice teacher sorting and Chile, New Zealand, Australia and Chinese Taipei for out-of-subject sorting). These figures give a better idea of the extent to which education systems have struggled with inequity in sorting over the past two decades. Figure A2 (in the Additional file 1: Appendix) shows the pooled qualification gaps between students at more distant ends of the socioeconomic spectrum. The exposure rates of these students show statistically significant inequities in many more education systems for novice teacher sorting (i.e. Singapore, Israel, USA, and Romania), but not for sorting by mathematics education.
To reiterate, Fig. 1 (panel 1) shows that while novice qualification gaps are positive in many countries, they are in most cases small and not statistically significant. This is in part due to the TIMSS sampling procedure and the very large standard errors produced when adjusted for school-level clustering. Nevertheless, the median novice qualification gap across the countries is around 3 percentage points, meaning that the share of low-SES students with newly qualified teachers is just 3 percentage points higher than in the higher-SES group.¹² There is also considerable heterogeneity in terms of overall teacher qualification levels across the education systems. In Morocco, more than half of the students in the low-SES group are taught by novice teachers, compared to just five percent of students in Romania or Hungary. There are, however, some education systems for which novice teacher sorting is dramatic. Turkey, Tunisia, Morocco and Indonesia show very large gap magnitudes of between 15 and 27 percentage points. Given the average returns to the first 5 years of teaching experience found in past research (Rice, 2003; Rivkin et al., 2005; Rockoff, 2004), a 15 percentage point difference in exposure to novice teachers would yield a significant disadvantage in learning opportunities for lower-SES students. Sorting by mathematics education is much less common across educational systems. The median out-of-subject qualification gap (Fig. 1, panel 2) is around 1 percentage point, also demonstrating that many countries do not show inequity in their allocation of these teachers. In addition, many countries have almost no out-of-subject mathematics teachers in the workforce, such as Russia, Lithuania, Slovenia, Romania, Korea, and Hungary. In Ontario, however, over two thirds of students have a mathematics teacher without training in mathematics or mathematics education.
Figure 1 (panel 2) demonstrates that just Australia and New Zealand reach conventional levels of statistical significance, with Chile and Chinese Taipei showing larger qualification gap magnitudes as well. Our results do not imply that teacher sorting is unproblematic in education systems whose gaps do not reach statistical significance. As we have pointed out, this is in many cases largely due to the uncertainty introduced by the TIMSS sampling structure. A teacher qualification gap magnitude of between 5 and 10 percentage points (as in Ontario, Thailand, Hong Kong, Quebec, and the USA) is nevertheless noteworthy and, given the evidence on subject matter education, likely to systematically disadvantage lower-SES students (Sancassani, 2021).
It is also worth calling attention to the heterogeneity in gap magnitudes across the teacher qualification dimensions. Save for Chile, not a single education system displays inequities across both experience and subject matter education. This is in line with previous findings (Luschei & Jeong, 2019), and emphasizes that the determinants of sorting differ.

Estimating within-country trends of teacher sorting (1999-2019)
Turning our focus to the next research question, we plot the trends for countries participating in TIMSS in at least three time points between 1999 and 2019. In Fig. 2, we zoom in on two particular education systems, Chile and the Republic of Korea, and display the evolution of sorting by mathematics education. Dashed grey regression lines represent top SES deciles, solid grey regression lines represent bottom SES deciles, and blue and red regression lines represent top and bottom SES tertiles, respectively. The vertical distance between the lines represents the teacher opportunity gaps for each given year. This up-close comparison between the two education systems highlights several features which we wish to emphasize. First, education systems may have a consistently low proportion of teachers with no mathematics education, as in the case of South Korea. Since 1999, no more than around 5 percent of students (regardless of SES) have been allocated out-of-subject teachers. This demonstrates at once a high level of equity as well as a high overall qualification level of the South Korean teacher workforce. On the other hand, Chile shows greater variation across the years, with the overall exposure rates to out-of-subject teachers trending downwards since 1999. Despite this, Chile shows a consistently high (or even widening) gap magnitude across the TIMSS cycles. Next, trends for all education systems are displayed in graphical form in Figs. 3 and 4. While many education systems vary from cycle to cycle in terms of the overall exposure rates to novice teachers, most show little or no variation over time in terms of the teacher qualification gap magnitudes. Certain educational systems show large but relatively stable gap magnitudes, such as Turkey, Tunisia, and Indonesia. A clear pattern of increasing gap magnitudes is displayed in Morocco and Singapore.
In some cases, more subtle increases in inequity exist. For example, although Sweden shows an overall decline in novice teachers, the country shows increasingly pronounced sorting until 2015, when the trend starts to reverse. However, the largest qualification gaps in Sweden are still quite small (at about 4 percentage points) and unlikely to reach statistical significance. As Sweden is among the most socioeconomically equitable societies in the world, this trajectory is nevertheless noteworthy and in line with reports of increasing school segregation in the country (Yang Hansen & Gustafsson, 2016).
Similar to the novice teacher sorting trends, almost no changes are displayed when we consider out-of-subject sorting. Australia and Quebec display stable gap magnitudes, with Quebec showing an overall decline in the proportion of students with recently qualified teachers. Again, few education systems appear to show a very clear upward trend in inequity (Chile, New Zealand and Chinese Taipei).
Taken together, these plots show that most educational systems do not show increasing signs of inequity in teacher allocation, even at the more extreme ends of the socioeconomic spectrum (90/10 deciles). In addition to this positive picture, in most cases, there do not appear to be increases in the overall share of students with teachers with lower qualification levels, regardless of their socioeconomic background. The proportion of students with newly qualified or out-of-subject teachers is not increasing across the countries included in our sample.
In a next step, we estimate trends in the above-plotted teacher qualification gaps to determine whether countries are displaying significantly growing or narrowing inequities in teacher allocation. Table 1 displays odds ratios for logit models regressing the likelihood of having a novice or out-of-subject teacher on the interaction between the low-SES student group and time as compared to the higher-SES student reference group (between each TIMSS cycle).
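One plausible way to write such a model is given below. This is a sketch under assumptions rather than the exact specification estimated: the symbols are introduced here for illustration, with Y equal to 1 if student i in school s at cycle t has a novice (or out-of-subject) teacher, LowSES an indicator for the low-SES group, and t counting TIMSS cycles.

```latex
\operatorname{logit}\,\Pr(Y_{ist} = 1)
  = \beta_0
  + \beta_1\,\mathrm{LowSES}_{ist}
  + \beta_2\,t
  + \beta_3\,\bigl(\mathrm{LowSES}_{ist} \times t\bigr),
\qquad
\mathrm{OR} = e^{\beta_3}
```

Under this reading, an odds ratio above 1 indicates that the low-SES group's relative odds of exposure grow with each cycle (a widening gap), while an odds ratio below 1 indicates a narrowing gap.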
Here, we find more positive news to substantiate the aforementioned figures. Only a handful of countries show statistically significantly increasing opportunity gaps. For sorting by experience, just Chile, Morocco, and Singapore reach statistical significance at the 5% level. 13 In fact, more education systems show significant downward trends in novice teacher sorting inequity, including Hong Kong, Romania, Slovenia, Thailand, USA, and Quebec. While we do not test the 90/10 gaps, Figs. 3 and 4 confirm that in a vast majority of cases, they follow the exact same trend patterns as 66/33 gaps. Similarly, out-of-subject teacher sorting is stable in almost all systems, with just two countries displaying statistically significant upward trends (Chile and New Zealand). 14 However, in this case, just one educational system trends towards more equitable teacher allocation (Thailand).
Table 1 Estimated linear trends for novice and out-of-subject teacher sorting by education system
OR, odds ratio per 4 year TIMSS cycle; SE, cluster-robust standard error (clustered by school); N, total number of observations; T, number of time points included in analysis. * p < .05, ** p < .01, *** p < .001
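To make the interaction odds ratios easier to read, the following minimal sketch shows what such a ratio measures in the simplest possible case of two cycles. In a saturated logit model with only two time points, the exponentiated interaction coefficient equals the ratio of the low-SES versus higher-SES odds ratios in the later and earlier cycle. The models in Table 1 fit a linear trend over several cycles, so this is an illustration of the interpretation and not a reproduction of the estimates; the function names and all proportions are hypothetical.

```python
def odds(p):
    """Convert a proportion (0 < p < 1) into odds."""
    return p / (1.0 - p)

def interaction_odds_ratio(low_t0, high_t0, low_t1, high_t1):
    """Ratio of SES odds ratios between a later cycle (t1) and an earlier one (t0).

    Each argument is the share of students in that SES group and cycle who
    have a novice or out-of-subject teacher. Values above 1 mean the
    relative odds for low-SES students grew between the two cycles.
    """
    or_t0 = odds(low_t0) / odds(high_t0)
    or_t1 = odds(low_t1) / odds(high_t1)
    return or_t1 / or_t0

# Hypothetical shares: equal exposure at t0, then low-SES exposure rises.
widening = interaction_odds_ratio(0.20, 0.20, 0.30, 0.20)
# Hypothetical shares: the same gap in both cycles.
stable = interaction_odds_ratio(0.20, 0.10, 0.20, 0.10)
```

With these made-up inputs, `widening` is above 1 and `stable` is 1, mirroring the distinction between systems with growing inequity and systems whose gaps are large but unchanged.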

Discussion
It is widely claimed that students with disadvantaged socioeconomic backgrounds have less competent and qualified teachers (OECD, 2018; Strietholt et al., 2019). Using relatively straightforward descriptive analyses, this paper addresses two important gaps in the international literature on teacher sorting. First, it provides up-to-date knowledge about the extent of teacher sorting cross-nationally using data where students can be linked to their teachers. Second, this paper provides the first cross-national evidence for changes in socioeconomic teacher sorting over time.
Though recent reports have raised the alarm about SES-based teacher sorting internationally (Luschei & Jeong, 2019; OECD, 2018), our analyses of 32 countries between 1999 and 2019 suggest a more positive picture. This is particularly true for out-of-subject teacher sorting. Just three to four countries show substantial differential exposure rates, including Chile, Thailand, Australia, and Chinese Taipei. Aside from these education systems, even at the most extreme ends of the socioeconomic spectrum, low-SES students were no more likely to receive out-of-subject mathematics teachers than those from high-SES backgrounds. This positive news extended to our findings on trends over time, where just two systems showed an increase at conventional levels of statistical significance (Chile and New Zealand). These findings were somewhat surprising, as mathematics teacher shortages are increasingly discussed in relation to the growing number of better-paid opportunities outside of teaching for the quantitatively inclined (TALIS, 2018). In fact, we find very few countries with overall increases in the share of students taught by out-of-subject teachers, which we would expect to find in countries with substantial or increasing shortages. Future research may investigate whether changes in teacher working conditions are compensating for such a potential change (for example, through larger class sizes). Importantly, Thailand is the only country showing decreases in such teacher sorting over time, indicating that where mathematics education inequities are found, they tend to persist. It is worth re-emphasizing here that out-of-subject teacher sorting is a problem in a very small number of education systems, but where it does exist (Chile, Australia, Chinese Taipei, Quebec), it has persisted over the past two decades.
The findings for students with novice teachers are slightly less positive, with roughly one third of the countries in our sample showing at least some degree of sorting inequity. Low- and middle-income countries tended to be most affected, including Tunisia, Turkey, Iran, Indonesia, and Morocco, but inequities in novice teacher sorting were also found between top and bottom student SES deciles in Romania, USA, Australia, Israel and Singapore. However, there is positive news here too. Since 1999, inequity increased in just three systems (Chile, Morocco, Singapore), and decreased significantly in six (Hong Kong, Romania, Slovenia, Thailand, USA, Quebec). Many more education systems show differences over time in novice sorting as compared to sorting by mathematics education, highlighting once again that their respective determinants likely differ.

Heterogeneity in teacher qualification gaps and trends
There appear to be few patterns across groups of education systems by economic development level or geographic region. In the case of teacher sorting based on mathematics education, it was not possible to identify a pattern based on level of economic development or region. As mentioned, the most evident pattern is the prevalence of novice teacher sorting across lower-income education systems. The institutional and demographic differences across such systems make it difficult to speculate on its determinants, however. It may suggest that overall working conditions in schools and classrooms with more socioeconomically disadvantaged students are better in higher-income countries, and that more experienced teachers therefore stay in or even seek out such positions. It may also be the result of system-level educational characteristics, such as hiring practices or school choice conditions (OECD, 2019). Han (2018) finds a link between school autonomy and inequity in teacher sorting, but no difference in this link for higher- or lower-income countries. However, given the intuitive connection between more autonomous hiring practices and inequity in teacher allocation, there is also the puzzling finding that some systems with centralized teacher allocation showed clear inequities. Such was the case in Turkey, for example. According to Özoğlu (2015), teachers in Turkey receive high 'seniority scores' at a faster rate by being allocated to teach in disadvantaged regions, and subsequently have greater choice over where they want to transfer, leading to higher turnover rates and proportions of inexperienced teachers in such schools. South Korea also has a centralized teacher allocation system, and is frequently heralded for its teacher rotation policy and excellence in the teacher workforce (Han, 2018; Kang & Hong, 2008; Luschei, Chudgar and Rew, 2013).
However, although the estimate does not reach conventional levels of statistical significance, South Korea shows a slight increase in novice teacher sorting since 1999. 15 Luschei and Jeong (2019) also find a higher-than-expected level of sorting inequity in South Korea, and offer rising socioeconomic inequality alongside accountability and ability grouping as potential explanations. For various reasons, we do not compare our findings in more detail to those of the two previous studies on teacher sorting using large-scale international data. 16 There is also the case of Chile, which displays high 'equity' in novice sorting until 2019, when the trend reverses. Teachers in more disadvantaged schools in Chile are less likely to leave their jobs and retire much later (Meckes & Bascope, 2012), which does not point to equity but rather to a particular teacher mobility pattern in the Chilean educational system. In 2007, Akiba et al. reported that the USA had one of the highest teacher 'opportunity gaps' in the world. Our findings show that it has been decreasing since 1999 for novice teachers. This is in line with results from other research examining the impact of the No Child Left Behind (NCLB) Act of 2001 (Boyd et al., 2008; DeAngelis et al., 2010). Under NCLB, teachers were newly required to be fully licensed, to have a bachelor's degree and to demonstrate competency in the subjects they taught, and states were mandated to eliminate inequitable teacher distribution. While the novice teacher sorting trend may lend support to the NCLB policy, mathematics education sorting has remained generally stable or even increased. Overall, few educational systems display large teacher qualification gaps and rising sorting inequity. Clear exceptions to this pattern include Turkey, Tunisia, Morocco, Chile, Singapore, Australia, and New Zealand. There is also a question regarding the substantive significance of the teacher qualification gaps.
Goldhaber et al. (2018) raise the alarm about inequities when the proportion differs by just 1 to 5 percentage points. Many educational systems in our sample show much larger differences, despite not reaching statistical significance.
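The distinction between substantive and statistical significance can be illustrated with a back-of-the-envelope calculation. The sketch below computes a z statistic for a gap between two proportions and inflates the variance by a design effect to mimic clustered sampling. This is a simplification assumed here for illustration, not the cluster-robust estimation reported in Table 1, and the function name, sample sizes, proportions and design effect are all hypothetical.

```python
import math

def gap_z_statistic(p_low, n_low, p_high, n_high, design_effect=1.0):
    """z statistic for the difference between two independent proportions.

    design_effect > 1 inflates the sampling variance to approximate the
    loss of precision from sampling whole schools and classes instead of
    individual students (an illustrative simplification).
    """
    variance = p_low * (1 - p_low) / n_low + p_high * (1 - p_high) / n_high
    standard_error = math.sqrt(design_effect * variance)
    return (p_low - p_high) / standard_error

# A hypothetical 5 percentage point gap with 1000 students per SES group.
z_simple = gap_z_statistic(0.15, 1000, 0.10, 1000)
z_clustered = gap_z_statistic(0.15, 1000, 0.10, 1000, design_effect=5.0)
```

With these made-up numbers, the same 5 point gap exceeds the conventional 1.96 threshold when students are treated as independent draws, but falls below it once the variance is inflated, which is one way a gap can be substantively large yet statistically non-significant.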

Limitations
As with all empirical research, this study has some limitations. First, TIMSS employs a stratified two-stage cluster sampling procedure, which results in greater uncertainty and larger standard errors than a simple random sample of the same size. Although TIMSS and other international large-scale assessments provide unique opportunities for researchers to study cross-nationally focused questions, there is a high risk of sampling error, which is compounded when examining population subsets. There are variations in sample sizes for each country and cycle, as well as differences in the degrees of missing data. Importantly, the data are not representative of teachers but rather of students taught by mathematics teachers in the eighth grade. The data, therefore, cannot tell us anything about the national characteristics of teachers (such as mean teaching experience). While we have attempted to highlight and address as many issues related to the quality of the data as possible, through the use of multiple imputation (at each country-year point) and conservative indicators of teacher quality to maximize statistical power and cross-national comparability, it is not possible to remove them entirely. Next, the SES measures in our study do not include parental occupation, as it is not available in the data. Moreover, the SES measures are reported by the students and will necessarily include some degree of error. The study also only addresses socioeconomic inequities, and not those based on student migration background, language spoken at home, geographic location, or ethnicity. Our findings also provide a picture of teacher sorting based on just two indicators of teacher quality. While we posit that a 'novice' or 'out-of-subject' teacher designation will have a significantly detrimental impact on student outcomes, there are of course other factors contributing to teacher competence which we do not address.
For instance, Sweden has reported a gap in access to certified teachers (Hansson & Gustafsson, 2016), and while Sweden shows a relatively equitable pattern on the indicators in this study, there may be clear inequities in the distribution of teachers based on other measures. Last, the study does not speculate on the determinants of teacher sorting trends across these educational systems, but this would indeed be a worthy topic for future research.

Concluding remarks
Based on the indicators in our study, teacher sorting by student socioeconomic status is an issue in a select group of education systems, and is widening in only about a handful. Such education systems should take this issue seriously, and focus on closing