An IERI – International Educational Research Institute Journal

School socio-economic context and student achievement in Ireland: an unconditional quantile regression analysis using PISA 2018 data

Abstract

Background

The existence of a multiplier, compositional or social context effect is debated extensively in the literature on school effectiveness and also relates to the wider issue of equity in educational outcomes. However, comparatively little attention has been given to whether or not the association between student achievement and school socio-economic composition may vary across the achievement distribution. Furthermore, with limited exception, comparatively little use has been made of unconditional quantile modelling approaches in the education literature.

Methods

This paper uses Irish data from the Programme for International Student Assessment 2018 and employs ordinary least squares regression and unconditional quantile regression empirical approaches to examine the association between school socio-economic composition and achievement. Reading and mathematics achievement are used as outcome variables and models control for a rich set of school and student characteristics.

Results

Findings from the ordinary least squares regression show that, on average, there is a significant negative relationship between school socio-economic disadvantage and student achievement in reading and mathematics, having controlled for a range of individual and school-level variables. From a distributional perspective, unconditional quantile regression results show variation in the strength of the relationship between school socio-economic disadvantage and student achievement, particularly in reading, with a stronger association at the lower end of the achievement distribution. Findings illustrate the need to give nuanced consideration to how students with varying levels of achievement may experience a socio-economically disadvantaged context at school. Our findings also draw attention to the benefit of examining variation in the association between achievement and explanatory variables across the achievement distribution and underscore the importance of moving beyond an exclusive focus on the mean of the distribution. Finally, we emphasise the importance of drawing population-level inferences when using the unconditional quantile regression method.

Introduction

Improving opportunities for all students to succeed in education regardless of individual or home background characteristics is a key concern of policy makers worldwide (Organisation for Economic Co-operation and Development [OECD], 2010, 2020a). There is international variation in policy approaches intended to support equality of opportunity but a common aim is to limit the effects of student background on educational outcomes (OECD, 2021). Students from socio-economically disadvantaged backgrounds (henceforth disadvantaged students) are one of the key focus groups in this regard, given the enduring associations between student socio-economic status (SES) and achievement, attainment and other outcomes (see e.g., Blanden et al., 2022; Chmielewski, 2019; Coleman et al., 1966; Cullinan et al., 2021; Sirin, 2005; Woessmann, 2016). Drawing on a meta-analysis of articles published between 1990 and 2000, Sirin (2005) reports a medium level of association between SES and achievement at the student level and a large association at the school level. [Footnote 1] Findings from the Programme for International Student Assessment (PISA) have highlighted the role that school SES may play over and above that of individual student background, with the school-level influence shown to “far outweigh” the relationship between individual SES and education outcomes in a majority of OECD countries (OECD, 2010, p. 14).

The issue of school average SES having an effect [Footnote 2] over and above individual SES (variously termed a “school composition”, “school-mix”, “multiplier” or “contextual” effect) has been debated extensively in the literature on school effectiveness (see e.g., McCoy et al., 2014; Nash, 2003; Raudenbush & Bryk, 2002; Sciffer et al., 2020; Teddlie et al., 2000; Willms, 1992). Most previous studies in this area employ statistical methods (typically hierarchical linear modelling) that estimate the average relationship between a dependent variable, y, and an independent variable, x (McCoy et al., 2014; Raudenbush & Bryk, 2002). Approaches that provide average effects do not take into account potential heterogeneity in the relationships of interest across the distribution of the outcome variable (Rios-Avila & Maroto, 2022). Specifically, these methods cannot examine the possibility that school SES may have different effects at different points of the performance distribution, with effects that are larger for lower-achieving students than for higher-achieving students, or vice versa. As noted by Perry et al. (2022), a more nuanced understanding of the association between school SES and student achievement has important implications for policy makers and families, with the potential to inform policy on school segregation as well as school-choice decisions.

In this context, the present study makes a number of contributions. Firstly, it examines the extent to which there is evidence of a socio-economic compositional (SEC) effect in Irish post-primary schools, after taking into account a rich set of individual student characteristics as well as some classroom/school variables. Secondly, our findings highlight the value of approaches that allow consideration of the association across the distribution and show how focusing exclusively on the conditional mean may hide heterogeneity across high- and low-achieving students. Such distributional analyses using large-scale educational datasets are comparatively rare in the literature.

Findings from our analysis show that, on average, after controlling for a rich set of individual, parental and school characteristics, there is a significant negative relationship between school disadvantaged status and student achievement in reading and mathematics. From a distributional perspective, results show a differential SEC effect, particularly in reading, with a stronger effect at the lower end of the achievement distribution. This analysis, which is unique in the Irish context and rare in the international literature, has important implications in this policy space. The paper is structured as follows: firstly, we review the relevant literature; secondly, we present our data and methods; next, we outline the main empirical results; and finally, we conclude with some policy implications of our results.

Background and relevant literature

Compositional effects on student achievement

Since the prominent work of Coleman et al. (1966), the existence of a “multiplier”, “compositional”, “school mix” or “social context” effect has been debated extensively in the literature on school effectiveness, with some commentators suggesting that these effects are only statistical artefacts or a result of methodological weaknesses (see e.g., Harker & Tymms, 2004; Marks, 2015; Nash, 2003). In contrast, others suggest that with the use of appropriate methods, compositional effects can be detected free from measurement error (Sciffer et al., 2020) and represent an important and substantial influence on educational outcomes (Benito et al., 2014; OECD, 2010; Van Ewijk & Sleegers, 2010).

Defining and distinguishing between similar terms related to compositional and peer effects is complex. Wilkinson (2003) distinguishes between compositional and “true” peer effects, noting that the terms are not synonymous. He defines compositional effects as “the effects of the aggregate characteristic of a student group (e.g., mean level of ability) on a student’s learning outcomes over and above the effects on learning associated with that student’s individual characteristics” (Wilkinson, 2003, p. 397). He recognises that compositional effects may arise from measurement artefacts in study design or differences in resources, climate, teacher practices or peer effects. Peer effects may be defined as “the influences of normative and comparative reference-group processes, student–student interactions, and certain dynamics of instruction on learning outcomes” (Wilkinson, 2003, p. 398). Of the compositional effects examined, including ethnicity (Fekjær & Birkelund, 2007) and academic ability (Opdenakker & Van Damme, 2001, 2007), SEC effects have perhaps received the largest share of attention (e.g., Sciffer et al., 2022; Teddlie et al., 2000; Van Ewijk & Sleegers, 2010) and are the focus of the current paper. Compositional effects have also been examined from a school effectiveness viewpoint, with Steinmann and Olsen (2022) finding that while schools with a more privileged student composition had higher achievement levels than less privileged schools, their school effectiveness did not usually differ significantly.

Findings from a meta-analysis by Van Ewijk and Sleegers (2010) show that the size of compositional effects tends to relate strongly to how SES is measured. Smaller effects are associated with dichotomous measures of SES (e.g., eligibility for free school meals) while larger effects are found when a composite measure is used that captures several SES-dimensions. Their study shows that effects tend to be overestimated when controls for prior achievement are not included and when the potential for omitted variable bias is not addressed. In contrast, the effect can be underestimated when a large set of poorly thought-out covariates is included.

Internationally, some studies have shown that the relationship between social context and achievement is mediated by school, teacher or classroom factors, such as teacher expectations, quality of instruction, or adequacy of school resources (Liu et al., 2015; Opdenakker & Van Damme, 2001, 2007; Rumberger & Palardy, 2005; Thrupp et al., 2003; Willms, 2010). Many of these relate to aspects of classroom or school climate, underscoring the need to consider school or classroom climate in any study of compositional effects.

In the Irish primary school context, the achievement gap between pupils attending disadvantaged schools and their peers in less disadvantaged schools has been shown to reflect differences between the two school contexts in teacher experience and turnover, the concentration of additional learning needs, absenteeism levels and children’s engagement in school (McCoy et al., 2014). Focusing on the reading and mathematics achievement of 9-year-olds, McCoy et al. examine how the effects of school disadvantaged status on achievement change having accounted for a range of school, teacher and pupil variables. Their findings show differences between pupils attending urban and rural schools with no SEC effect for pupils in rural disadvantaged schools once individual social background is taken into account. Only the most disadvantaged urban schools have a significant SEC effect for both reading and mathematics which the authors take as evidence of a “threshold” effect rather than a linear effect.

At post-primary level in Ireland, school-average SES has been shown to be associated with achievement, after controlling for a wide range of individual student, teacher and school variables. For example, Shiel et al. (2022) describe how a one-standard-deviation increase in school-average SES is associated with a 31-point increase (about one-third of a national standard deviation) in PISA reading achievement, after controlling for student demographic and educational background variables, teacher instructional support, parental engagement and support, student literacy attitudes and practices, and student endorsement of reading literacy strategies. Earlier work drawing on achievement in state examinations also provides evidence of significant SEC effects in Ireland (Sofroniou et al., 2004; Weir & Kavanagh, 2018), although a limitation of the population datasets is the very small number of student-level variables available to include as controls. [Footnote 3]

The need for heterogeneous analysis of the school socio-economic composition effect

Two examples in the literature that examine heterogeneity in SEC effects using PISA data are supplied by Rangvid (2007) and Schneeweis and Winter-Ebmer (2007). Using PISA 2000 data for Denmark, Rangvid (2007) shows that for reading, school SEC effects are stronger for students in the lower quantiles and statistically insignificant at the very upper end of the distribution. In contrast, for mathematics the school SEC effect is similar for high- and low-achieving students (Rangvid, 2007).

Also using PISA data (from 2000 and 2003), Schneeweis and Winter-Ebmer (2007) consider how the association between student achievement in reading or mathematics and the average SES of the student’s peer group may vary by individual achievement or individual SES. Asymmetric peer SES effects on reading achievement were found in favour of lower-achieving students; i.e., lower achieving students appeared to be more affected by the SES profile of their peers than higher achieving students. Findings also show that in reading but not mathematics, a stronger peer group SES effect was found for students from a low SES background. Examination of interactions showed that peer SES effects on reading were highest for low and median achievers from a low SES background.

Recently, Perry et al. (2022) have used a Conditional Quantile Regression (CQR) approach with data from PISA 2018 in Australia to study compositional effects. Their findings show that the school SEC effect is substantial and similar for all students, regardless of their levels of achievement. School SES was found to be a stronger predictor of achievement than student SES. They also note that the school SES effect is larger for higher SES students, regardless of achievement level. The authors note the need for further examination of the extent to which their findings are generalizable to other national contexts. Costanzo and Desimoni (2017) also identify the need for further analysis of the differential effects of class- or school-level variables across the achievement distribution.

Cullinan et al. (2021) provide an example of the Unconditional Quantile Regression (UQR) approach using Irish data on student achievement in the terminal examinations of upper secondary education. While they did not directly capture school socio-economic mix, they included in their analysis a dummy variable for school disadvantaged status based upon school participation in the Delivering Equality of Opportunity in Schools (DEIS [Footnote 4]) programme (discussed in more detail in the next section). They found that the “penalty” faced by students in disadvantaged schools was concentrated at the lower end of the performance distribution. However, given their generic outcome variable (total points achieved across Leaving Certificate examinations), they were unable to analyse this relationship across different subject domains. Also, given that their sample consisted only of those who reached their final year of schooling, their estimates of compositional effects may be biased by student dropout at earlier stages of secondary education.

Context, data and methods

Study setting

Ireland, the setting for the current study, is a country characterised by high levels of achievement in international educational assessments (especially in reading, see e.g., Eivers et al., 2017; McKeown et al., 2019) and recognised for having a strong focus on equity in education (European Commission, 2019; Hepworth et al., 2021). National policy related to educational disadvantage explicitly indicates that the existence of a “multiplier effect” provides a rationale for providing additional resources to schools with the highest concentrations of students from disadvantaged backgrounds (Department of Education and Skills [Footnote 5] [DES], 2017), although policy does not consider whether the impact of concentrated disadvantage might vary across different student groups. As explained later in this paper, the quality of PISA 2018 data in Ireland is high, with strong response rates at student- and school-level. For these reasons, it is of interest to draw on Irish data to illustrate the use of UQR.

Since 2005, the Delivering Equality of Opportunity in Schools (DEIS) programme has been the main policy response to educational disadvantage in Ireland. It provides additional supports to schools with the highest concentrations of students from lower socio-economic backgrounds (Department of Education [DoE], 2022a; DES, 2017). All DEIS schools receive additional grant aid as well as various other supports, with some variation in supports between primary and post-primary levels. At post-primary level, supports include additional grant aid and funding, a more favourable staffing schedule, access to Home School Community Liaison Services, access to the School Meals Programme, access to supports under the School Completion Programme, and priority access to professional development supports. A new approach to identifying schools for DEIS was introduced in 2017 and finalised in 2022 (DoE, 2022a; DES, 2017).

The education system in Ireland comprises primary, post-primary, third level and further education, with 2 years of free pre-school provision for children prior to entry to primary school. Primary school comprises 2 years of pre-primary followed by Grades 1 to 6 and a child must have started formal education by the age of 6 years. Almost all second-level schools in Ireland are state-funded and belong to one of three broad types: voluntary secondary schools, schools (or community colleges) in the Education and Training Board sector, or community/comprehensive schools (DES, 2020). Each of these school types offers a similar education. Readers interested in further detail should see DES (2020).

Data

The data used in the current study are from Ireland, collected in the 2018 cycle of PISA, which examines students’ knowledge in science, reading and mathematics and what they can do with what they know (OECD, 2019a). [Footnote 6] The assessment focuses on 15/16-year-old students and tests how well they apply their knowledge in everyday life situations. Furthermore, the dataset includes a wide range of information about individual characteristics and school contexts, gathered through student and school principal questionnaires. In 17 participating countries, including Ireland, parents also completed a specific questionnaire (OECD, 2019a).

The Irish data were gathered from an achieved sample of 5577 respondents in 157 different schools, with both the school-level (100%) and student-level (87%) response rates above the OECD requirements (McKeown et al., 2019). [Footnote 7] Using only those cases with data available on variables of interest for the current study leaves an estimation sample of 4923 individuals, representing 88 percent of the Irish dataset.

As in other similar large-scale educational assessments (including the Progress in International Reading Literacy Study [PIRLS], Trends in International Mathematics and Science Study [TIMSS] and the Programme for the International Assessment of Adult Competencies [PIAAC]), PISA uses Item Response Theory to represent student achievement using plausible values. Given the element of randomness in the questions faced by each student in the test, Item Response Theory is utilised to help account for students having answered different tests. This allows for the estimation of a student's knowledge function, from which the plausible values are subsequently sampled. In PISA 2018, ten values were drawn for each student in each of the three subject domains (in previous cycles this number varied); these are ten “plausible” values for that student's achievement. These scores are provided in the international database, with a mean of about 500 and a standard deviation of about 100 across OECD countries, weighting each country equally. [Footnote 8] A full description of the technical procedures used in PISA is provided in OECD (2020b).
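Because each student has several plausible values, a statistic is typically computed once per plausible value and the results are then pooled. The following is a minimal sketch of that pooling step under the standard multiple-imputation combining rules; the function name and the numbers are hypothetical and are not taken from the paper's analysis.

```python
from statistics import mean, variance


def combine_plausible_values(estimates, sampling_variances):
    """Pool one statistic that was estimated separately on each plausible value.

    estimates: the statistic computed once per plausible value.
    sampling_variances: the sampling variance of each of those estimates.
    Returns (pooled_estimate, total_standard_error).
    """
    m = len(estimates)
    if m < 2 or m != len(sampling_variances):
        raise ValueError("need matching lists with at least two plausible values")
    pooled = mean(estimates)
    within = mean(sampling_variances)   # average sampling variance
    between = variance(estimates)       # imputation variance (divides by m - 1)
    total = within + (1 + 1 / m) * between
    return pooled, total ** 0.5


# Hypothetical coefficients from ten plausible values (illustration only).
estimates = [-20.0, -22.0, -21.0, -19.0, -23.0, -21.0, -20.0, -22.0, -21.0, -21.0]
sampling_variances = [16.0] * 10
pooled, standard_error = combine_plausible_values(estimates, sampling_variances)
```

The pooled estimate is the simple average, while the standard error grows with the disagreement between plausible values, which is how measurement uncertainty enters the final inference.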

In the current paper, achievement in reading and mathematics are used as outcome variables. These domains are chosen given the emphasis placed internationally and in Ireland on achieving adequate levels of literacy and numeracy for all students (United Nations, 2018), and in particular those experiencing socio-economic disadvantage. Using both outcome variables allows comparisons to be made between findings for the two domains.

For our study, the socio-economic background of students in each school is of central importance. We proxy student SES by using the economic, social, and cultural status (ESCS) variable within PISA. This is an index variable constructed from students’ responses to questions regarding the highest level of education of their parent(s) converted into years of schooling; parental occupation as measured by the International Socio-economic Index of Occupational Status; and amount of home possessions, including educational possessions at home. [Footnote 9] In PISA 2018, the three components are weighted equally. The scale has an international mean of 0 and standard deviation of 1, across equally weighted OECD countries (OECD, 2020b). In line with OECD (2020a), the current paper describes a student as socio-economically “disadvantaged”, if the student’s value on the ESCS index is in the bottom 25 percent nationally.

The international PISA dataset does not include an indicator of whether or not a school participates in the DEIS programme; i.e., whether or not a school has been identified in Ireland as having a high concentration of students from a disadvantaged background for the purposes of receiving additional resources from the DoE. Rather, for the purposes of the current analysis, an indicator of school disadvantage is constructed by aggregating from individual student ESCS. In constructing this indicator, we adopt a similar methodology to OECD (2020a) and assume that a school is disadvantaged if the average ESCS index among the students sampled within a school is in the bottom quartile of the national distribution on the index. Using this approach, 43 schools are identified as disadvantaged. This number tallies well with that from Ireland’s PISA national study centre, which identified 41 DEIS schools out of the 157 schools participating in PISA 2018 (Gilleece et al., 2020).
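The aggregation step described above can be sketched as follows. This is an illustration only: it assumes the cut point is the first quartile of the unweighted school means, ignores sampling weights, and uses made-up school identifiers and ESCS values. The paper's exact definition of the national distribution may differ.

```python
from collections import defaultdict
from statistics import mean, quantiles


def flag_disadvantaged_schools(students):
    """Flag schools whose mean ESCS lies in the bottom quartile of school means.

    students: iterable of (school_id, escs) pairs.
    Returns a dict mapping school_id to True (disadvantaged) or False.
    """
    escs_by_school = defaultdict(list)
    for school_id, escs in students:
        escs_by_school[school_id].append(escs)
    school_means = {s: mean(v) for s, v in escs_by_school.items()}
    # Assumption: the threshold is the first quartile of the school means,
    # computed without sampling weights.
    first_quartile = quantiles(school_means.values(), n=4, method="inclusive")[0]
    return {s: m <= first_quartile for s, m in school_means.items()}


# Hypothetical data: eight schools with two sampled students each.
students = [
    ("A", -1.2), ("A", -0.8), ("B", -0.6), ("B", -0.4),
    ("C", -0.3), ("C", -0.1), ("D", -0.1), ("D", 0.1),
    ("E", 0.0), ("E", 0.2), ("F", 0.2), ("F", 0.4),
    ("G", 0.4), ("G", 0.6), ("H", 0.8), ("H", 1.0),
]
flags = flag_disadvantaged_schools(students)
```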

Various studies have shown that the relationship between social context and achievement is mediated by school, teacher or classroom factors, such as teacher expectations, quality of instruction, or adequacy of school resources (Liu et al., 2015; Opdenakker & Van Damme, 2001, 2007; Rumberger & Palardy, 2005; Willms, 2010). Furthermore, school climate as measured across four domains (academic, community, safety and institutional environment) has also been shown to have associations with student outcomes (Wang & Degol, 2016). The choice of variables selected for inclusion as controls in the current analysis was influenced by this framework and other relevant literature on factors mediating the association between school composition and achievement as well as by the availability of data, with variables selected in order to be theoretically relevant and to minimise the loss of cases from missing data. Table 1 outlines the full set of variables used in our analysis. Student background variables used as controls in the current analysis are: gender, student ESCS, school year/grade and native Irish status (i.e., the student and at least one parent born in Ireland).

Table 1 Variable descriptions

Student perceptions of the classroom learning environment are captured by a measure of teacher-directed instruction and the level of teacher support provided in English classes. While the classroom learning environment measures pertain to conditions in English classes, in the current analysis, variables are included in both models of reading and mathematics for consistency across models and on the assumption that there may be some degree of overlap within a school in the teaching and learning conditions across subjects. We consider students’ perceptions of teacher-directed instruction and teacher support to represent academic aspects of the school climate, using the Wang and Degol (2016) framework. Principals’ perceptions of staff shortages, teacher behaviour, and the proportion of teachers who have attended professional development in the three months prior to PISA administration are also included in our model, in the context of academic aspects of school climate.

Turning to the institutional component of school climate, our models include an indicator of whether the school is single sex or mixed sex, student-staff ratio, school size, percentage of students with Special Educational Needs (SEN), school sector, location, and principal perceptions of the extent to which instruction is hindered by the quality of teaching material. Safety is represented by the extent to which student behaviour impacts on instruction. The community component of the school climate model is represented by principal perceptions of parental involvement in local school governance.

To examine differences in these characteristics across school type, Table 2 presents summary statistics separately for students attending disadvantaged schools and those attending non-disadvantaged schools. We see that there are substantial differences in the characteristics of these two school types. Disadvantaged schools are more likely to have mixed-sex enrolment; i.e., the percentage of students attending disadvantaged schools that are single-sex (21.6%) is considerably lower than the corresponding percentage for non-disadvantaged schools (43.8%). Disadvantaged schools are more likely to have reported staff shortages, smaller enrolment size and lower levels of parental engagement. Table 2 also illustrates some wide variations in individual-level socio-economic background between the groups, with students attending disadvantaged schools characterised by a considerably lower ESCS score relative to their peers in non-disadvantaged schools.

Table 2 Sample descriptive statistics

Table 2 also presents the raw mean scores in PISA for reading and mathematics, separately for students attending disadvantaged and non-disadvantaged schools. Unsurprisingly, these show that students attending non-disadvantaged schools have higher scores on average in reading and mathematics than their counterparts in disadvantaged schools. In both reading and mathematics, differences are statistically significant (p < 0.01) and the magnitudes of the gaps are very similar to those previously reported between students attending DEIS and non-DEIS schools, where a statistically significant gap of about 51 points was noted in reading and about 44 points in mathematics (Gilleece et al., 2020).

To initially explore heterogeneity in the reading and mathematics test scores, we also present kernel density functions of one of the plausible values of each subject (reading and mathematics) by school disadvantaged status. Figures 1 and 2 illustrate the distribution of performance for reading and mathematics respectively, with those attending non-disadvantaged schools more heavily concentrated towards the upper end of the performance distribution relative to those in disadvantaged schools.

Fig. 1 Distribution of reading scores (for plausible value 1), by school disadvantaged status. Source: Analysis of PISA data for 2018

Fig. 2 Distribution of mathematics scores (for plausible value 1), by school disadvantaged status. Source: Analysis of PISA data for 2018

Methods

In order to appropriately model the relationship between achievement in PISA and attendance at a disadvantaged school, it is necessary to acknowledge the impact of PISA’s two-stage sampling procedure and subsequent complex data structure (students nested within schools), which requires specific calculations to obtain reliable standard errors (Rutkowski et al., 2010). Jerrim et al. (2017a) outline some of the potential problems arising from the complex data structure and how to overcome them. Commonly, empirical approaches such as multi-level models have been used to account for the clustered nature of these data. However, as noted by Jerrim et al. (2017b), an alternative approach is to adjust for sampling issues such as stratification and clustering by using replicate weights within estimations. A key advantage of the latter approach in the context of this study is that it facilitates an examination of heterogeneities across the distribution of achievement. Therefore, we conduct all estimations using the REPEST command within Stata, as designed by Avvisati and Keslair (2014). REPEST uses the balanced repeated replication method with replicate weights within estimations, as proposed by Jerrim et al. (2017a) and the OECD (2009). It can be used when plausible values serve as the dependent variable: the model is estimated for each plausible value, the average of the resulting estimates is obtained, and the imputation error is incorporated into the variance of the estimated parameter. This permits the running of models, such as standard linear regressions or quantile regressions, that are technically robust. As Jerrim et al. (2017b) note, using a replication-weight procedure such as this has little effect on standard errors relative to models that account for the school-level clustering but allows us the flexibility to estimate models using the ten plausible values that datasets such as PISA utilise as outcome measures. From the point of view of this study, it also enables us to estimate models focused on distributional analysis.
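The replicate-weight logic can be sketched as follows. This is not REPEST itself: it is a minimal illustration of balanced repeated replication with Fay's adjustment, assuming a Fay factor of 0.5 and 80 replicate weights as we understand the PISA design. The function name and the replicate estimates are hypothetical.

```python
def brr_sampling_variance(full_estimate, replicate_estimates, fay_factor=0.5):
    """Sampling variance under balanced repeated replication with Fay's adjustment.

    full_estimate: the statistic computed with the final student weight.
    replicate_estimates: the same statistic recomputed with each replicate weight.
    fay_factor: Fay's adjustment factor (0.5 is assumed here).
    """
    n_replicates = len(replicate_estimates)
    squared_deviations = sum((r - full_estimate) ** 2 for r in replicate_estimates)
    return squared_deviations / (n_replicates * (1 - fay_factor) ** 2)


# Hypothetical example: 80 replicate estimates scattered around the full estimate.
full_estimate = -21.0
replicate_estimates = [full_estimate + (1.0 if r % 2 == 0 else -1.0) for r in range(80)]
sampling_variance = brr_sampling_variance(full_estimate, replicate_estimates)
sampling_standard_error = sampling_variance ** 0.5
```

Because the procedure only requires re-running the same estimator under each set of weights, it applies equally to a mean, an OLS coefficient or a quantile regression coefficient, which is the flexibility noted above.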

As previously noted, selection bias may be an important issue when considering the relationship between performance and school SEC. As a result, our models control for a range of observable student and school characteristics likely to be correlated with performance in PISA and attending a disadvantaged school. Thus, we first estimate two separate standard linear regressions, such that:

$${PISA\,Reading}_{i}={\beta }_{0}+{\beta }_{1}\,{\mathrm{Disadvantaged\_School}}_{i}+\gamma {\mathbf{X}}_{i}+{\varepsilon }_{i}$$
(1)
$${PISA\,Maths}_{i}={\beta }_{0}+{\beta }_{1}\,{\mathrm{Disadvantaged\_School}}_{i}+\gamma {\mathbf{X}}_{i}+{\varepsilon }_{i}$$
(2)

where \({PISA\,Reading}_{i}\) and \({PISA\,Maths}_{i}\) represent the PISA scores for reading and mathematics of student \(i\). \({\beta }_{1}\) represents the difference in achievement between a student who attends a disadvantaged school and one who attends a non-disadvantaged school, all else being equal. \({\mathbf{X}}_{i}\) is a vector of student and school characteristics (such as individual socio-economic background and school resources), with \({\varepsilon }_{i}\) representing the error term. It is also worth noting that as the linear regression specification assumes a linear relationship between the dependent and independent variables and adheres to the law of iterated expectations, we can make inferences not only at the individual level but also regarding the average unconditional changes in achievement across the population (Rios-Avila & Maroto, 2022).
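The population-level reading of the OLS coefficient can be made explicit with a short derivation. As shorthand introduced here, \({D}_{i}\) denotes the disadvantaged school dummy, and we assume the usual condition \(E[{\varepsilon }_{i}\mid {D}_{i},{\mathbf{X}}_{i}]=0\), which the text does not state explicitly. The conditional mean of reading achievement is then

$$E\left[{PISA\,Reading}_{i}\mid {D}_{i},{\mathbf{X}}_{i}\right]={\beta }_{0}+{\beta }_{1}{D}_{i}+\gamma {\mathbf{X}}_{i}$$

and averaging over the distribution of the covariates gives, by the law of iterated expectations,

$$E\left[{PISA\,Reading}_{i}\right]=E\left[E\left[{PISA\,Reading}_{i}\mid {D}_{i},{\mathbf{X}}_{i}\right]\right]={\beta }_{0}+{\beta }_{1}E\left[{D}_{i}\right]+\gamma E\left[{\mathbf{X}}_{i}\right]$$

Holding the distribution of \({\mathbf{X}}_{i}\) fixed, a small change \(\Delta\) in the share of students attending disadvantaged schools therefore shifts the unconditional mean by \({\beta }_{1}\Delta\). No analogous step is available for conditional quantiles, because averaging conditional quantiles over the covariates does not in general recover the unconditional quantile.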

A key aim of this study is to go beyond a mean analysis such as that provided by the Ordinary Least Squares (OLS) estimation of Eqs. (1) and (2) and examine the relationship between performance and school SEC from a distributional viewpoint. There are a number of empirical approaches available when undertaking a distributional analysis. Most common among them are the CQR as described by Koenker and Bassett (1978) and the UQR as outlined by Firpo et al. (2009). Papers such as Rios-Avila and Maroto (2022), Borgen et al. (2022), Wenz (2019), Porter (2015) and Maclean et al. (2014) provide valuable summaries of the differences between these alternatives.

The CQR can be used to study the relationship between variables across the conditional distribution of an outcome. However, as noted by Rios-Avila and Maroto (2022), estimates within the CQR should be interpreted as effects experienced by groups that are defined by a set of characteristics, i.e. conditional effects. Given this, CQR models are relatively easy to interpret with just one single independent variable but the interpretation of coefficient estimates changes when multiple covariates are included. In other words, estimations can be seen as the relationship between two variables given a set of individuals with specific characteristics (e.g., students with the same socio-economic background, gender, etc.) and cannot be generalised as effects that would affect the unconditional statistic of interest. In contrast, the UQR provides an alternative approach where the definitions of quantiles are not affected by individual values of model covariates, as they describe a characteristic of the distribution of the outcome variable as a whole.

Porter (2015) provides a useful education-related example of these issues. It involves estimating a CQR at the median with mathematics proficiency as the hypothetical dependent variable. Dummy variables for gender and for taking a developmental mathematics programme are included as independent variables. With this CQR specification, the coefficient for the developmental mathematics programme is interpreted as the effect at the median of the distribution for males and at the median of the distribution for females, rather than as the average effect at the median of the entire test score distribution. Therefore, if females score lower than males (or vice versa) such that these medians differ considerably, the CQR coefficients represent the effects of the developmental mathematics programme at these different medians for the different groups, i.e. high achieving boys and low achieving girls (or vice versa). Adding more independent variables to the specification makes interpretation even more complex. For ease of interpretation, we would ideally like to know the relationship between the developmental programme and mathematics achievement at the median of the unconditional distribution. In other words, we are interested in the effect for the population of students who perform at the median of the overall score distribution, and not for students who perform at the median of groups defined by the independent variables included in the model (Porter, 2015).
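The distinction between conditional and unconditional medians in this example can be illustrated with a small simulation. The sketch below is not taken from Porter (2015): the group means, spread and sample size are assumptions chosen only to show that the median of each group sits at a different point of the overall score distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 20000
female = rng.integers(0, 2, size=n).astype(bool)
# Assumed for illustration: one group scores 40 points lower on average.
score = np.where(female, 460.0, 500.0) + rng.normal(scale=50, size=n)

median_female = np.median(score[female])   # conditional median, females
median_male = np.median(score[~female])    # conditional median, males
median_all = np.median(score)              # unconditional median

# Position of each conditional median within the overall distribution.
rank_female = np.mean(score <= median_female)
rank_male = np.mean(score <= median_male)
print(f"female median at overall rank {rank_female:.2f}, "
      f"male median at overall rank {rank_male:.2f}")
```

A coefficient estimated "at the median" by a CQR with a gender dummy refers to these two group-specific points, neither of which coincides with the 50th percentile of the pooled distribution.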

The UQR model, proposed by Firpo et al. (2009), addresses this problem by marginalising the effect over the distributions of the other covariates in the model and therefore provides estimates that are more interpretable. However, as highlighted by Rios-Avila and Maroto (2022) and Borgen et al. (2022), inferences within the UQR are only valid when analysing small changes in the distribution and so caution is needed when interpreting the effects of a binary variable (such as our key variable of interest in the current analysis). This suggests that the appropriate interpretation of binary variables with the UQR is similar to that of an incidence rate (Rios-Avila & Maroto, 2022). For example, this may entail referring to a certain percentage point increase (or decrease) in the share of students enrolled in disadvantaged schools in the sample. Such an interpretation is more akin to examining population-level effects rather than effects at the individual level. Given that policy decisions surrounding disadvantaged schools and school SEC are typically made at the population-level rather than based upon specific (conditional) groups of students, and we are interested in how the unconditional distribution of performance is influenced by school SEC, we utilise the UQR.

The UQR technique is based on the re-centered influence function (RIF). The RIF of the dependent variable (Reading or Mathematics achievement) is calculated first, where the RIF for the τth quantile is given as (see Note 10):

$$RIF(Y;{\widehat{q}}_{\tau })={\widehat{q}}_{\tau }+\frac{\tau -D(Y\le {\widehat{q}}_{\tau })}{{\widehat{f}}_{I}({q}_{\tau })}$$
(3)

In Eq. (3), \({\widehat{f}}_{I}({q}_{\tau })\) is the marginal density of Y at point \({q}_{\tau }\) estimated by kernel density methods, \({\widehat{q}}_{\tau }\) is the sample quantile, and \(D(Y\le {\widehat{q}}_{\tau })\) is an indicator function that equals one when the outcome variable is at or below the τ-th quantile and zero otherwise. As noted by Agyire-Tettey et al. (2018), a key feature of the RIF approach as developed by Firpo et al. (2009) is to replace the outcome variable with the estimated RIF and then regress this against a set of explanatory variables. Furthermore, Firpo et al. (2009) show that the RIF quantile regression model may be estimated using OLS. Thus, this approach allows the estimation of partial effects for each covariate at various points across the distribution (see Note 11).
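The two steps described above, computing the RIF in Eq. (3) and then regressing it on the covariates by OLS, can be sketched in Python as follows. This is an illustrative sketch rather than the estimation routine used in the paper (which relies on the Stata commands listed in Note 10): the Gaussian kernel, the rule-of-thumb bandwidth, the function names and the simulated data are all assumptions, and survey weights and plausible values are ignored.

```python
import numpy as np

def rif_quantile(y, tau):
    """Re-centered influence function of y at quantile tau, following Eq. (3)."""
    y = np.asarray(y, dtype=float)
    q = np.quantile(y, tau)  # sample quantile
    # Density of y at q from a Gaussian kernel with a rule-of-thumb
    # bandwidth (both are assumptions; the source only says "kernel density").
    iqr = np.subtract(*np.quantile(y, [0.75, 0.25]))
    h = 0.9 * min(y.std(ddof=1), iqr / 1.34) * len(y) ** (-0.2)
    f_q = np.mean(np.exp(-0.5 * ((y - q) / h) ** 2)) / (h * np.sqrt(2 * np.pi))
    return q + (tau - (y <= q)) / f_q

def uqr(y, X, tau):
    """Regress the RIF on X by OLS (X must include a constant column)."""
    beta, *_ = np.linalg.lstsq(X, rif_quantile(y, tau), rcond=None)
    return beta

# Simulated data, for illustration only.
rng = np.random.default_rng(0)
n = 10000
disadvantaged = (rng.random(n) < 0.25).astype(float)
escs = rng.normal(size=n)
score = 500 - 20 * disadvantaged + 30 * escs + rng.normal(scale=80, size=n)
X = np.column_stack([np.ones(n), disadvantaged, escs])

# Coefficient on the dummy (index 1) at the four percentiles reported in the paper.
coefs = {tau: uqr(score, X, tau) for tau in (0.10, 0.30, 0.70, 0.90)}
for tau, b in coefs.items():
    print(f"tau={tau:.2f}: dummy coefficient {b[1]:.1f}")
```

Because the indicator in Eq. (3) averages to approximately τ in the sample, the mean of the RIF is approximately the sample quantile itself, which is why an OLS regression of the RIF can be read as describing changes in that unconditional quantile.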

For our study, this will correspond to the marginal impact of our covariates on Reading or Mathematics achievement at a given percentile. For the purpose of our analysis, we present results at the 10th, 30th, 70th and 90th percentiles (focusing on four to facilitate presentation and interpretation). The 10th and 90th percentiles were selected as they correspond to scores of very low and very high achievers, respectively. The 30th and 70th percentiles were selected to illustrate comparative points along the performance distribution. While these four separate percentiles are presented initially, Figs. 3 and 4 (discussed in detail later) provide an illustration of the relationship of interest across a fuller range of the achievement distribution.

Fig. 3 Reading achievement: unconditional quantile estimates for disadvantaged school dummy. Source: Analysis of PISA data for 2018. Confidence level for CI: 95%.

Fig. 4 Mathematics achievement: unconditional quantile estimates for disadvantaged school dummy. Source: Analysis of PISA data for 2018. Confidence level for CI: 95%.

Results

Table 3 presents results of our OLS models for the two different PISA outcome variables (Reading and Mathematics). Of central importance to the current paper is the finding of a negative association between achievement (in both reading and mathematics) and school disadvantaged status, having controlled for other variables in the models, including individual ESCS. On average, there is a gap of about 22 points in reading and about 18 points in mathematics between students in disadvantaged and non-disadvantaged schools, all else being equal. Notably, these gaps are about two-fifths the size of the raw gaps evident in Table 2, suggesting that the raw compositional effects are at least partly explained by the independent variables included in the models.

Table 3 OLS estimates of PISA test performance for reading and mathematics

While these results can be considered to be the conditional partial effects and are of importance, we are also interested in the unconditional partial effects that reflect a more population-level interpretation. To obtain these, we draw on Rios-Avila and Maroto (2022) and Rios-Avila and de New (2022) and apply a transformation to the estimate associated with our dummy variable for disadvantaged schools. This transformation allows us to estimate the expected change in achievement associated with a specific change in the percentage of students in the sample attending disadvantaged schools. We illustrate this on the basis of a 15 percentage-point decrease in the share of students attending disadvantaged schools in the sample. One reason for focusing on the impact of a decrease in the share of students attending disadvantaged schools is that policy in this space often aims to reduce the concentration of those from lower socio-economic groups in schools or, conversely, to increase the social mix in schools. From this perspective, our estimates suggest that if the share of students attending disadvantaged schools in the sample decreased from 25 to 10 percent, average performance in reading would increase by 13.5 points, while average performance in mathematics would increase by 10.9 points (see Note 12).
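The arithmetic behind this transformation, as set out in Note 12, can be written as a short Python function. The function name and the sign convention (a decrease entered as a positive number of percentage points) are assumptions read off the worked example in Note 12; only the reading coefficient of −22.44 quoted there is used.

```python
def rescaled_effect(coef, decrease_pp, current_share):
    """Expected change in the outcome for a decrease of `decrease_pp` percentage
    points in the share of the dummy, using the multiplier in Note 12:
    coef * (-decrease_pp) / (current_share * 100)."""
    return coef * (-decrease_pp) / (current_share * 100)

# Reading: coefficient of -22.44, 15 percentage-point decrease, current share of 0.25.
reading_change = rescaled_effect(-22.44, 15, 0.25)
print(round(reading_change, 1))  # 13.5
```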

Table 3 also presents parameter estimates and associated standard errors for the remaining variables included as controls in our models. We do not discuss these findings in detail, given that associations are typically in the expected direction and are not of direct relevance to the main focus of this paper.

Although these results suggest a considerable difference in performance for mathematics and reading for those attending disadvantaged schools on average, the current analysis extends beyond the average by considering heterogeneity across the distribution. Tables 4 and 5 present the estimates of unconditional quantile regressions for the 10th, 30th, 70th and 90th percentiles of PISA performance, for reading and mathematics respectively. Results are estimated using the same specification as those in Table 3.

Table 4 Distributional analysis of PISA 2018 reading achievement
Table 5 Distributional analysis of PISA 2018 mathematics achievement

With regard to our key variable, Table 4 illustrates a statistically significant 31-point gap in reading performance at the 10th percentile between students in disadvantaged and non-disadvantaged schools, having controlled for other variables in the model. However, as previously noted, when the variable of interest is binary, the UQR is best used to identify how small changes in the distribution of independent variables affect the distribution of the dependent variable. Therefore, similar to the calculation applied to the OLS estimates presented in Table 3, we can reinterpret this estimate as the expected change in reading achievement at the 10th percentile associated with a 15 percentage-point decrease in the share of students attending disadvantaged schools in the sample. From this viewpoint, our estimates suggest that, if the share of students attending disadvantaged schools in the sample decreased from 25 to 10 percent, performance in reading for those at the 10th percentile would rise by 18.9 points.

The results also show that the achievement gap between students attending disadvantaged and non-disadvantaged schools decreases in size at higher levels of achievement. At the 90th percentile, the gap in reading achievement between students in disadvantaged and non-disadvantaged schools is statistically insignificant (Table 4). More specifically, our estimates suggest that if the share of students in disadvantaged schools in the sample decreased from 25 to 10 percent, performance in reading for those at the 90th percentile would not change significantly.

Figure 3 helps to illustrate this gradient across the achievement distribution using the estimated coefficients from Table 4, with a larger “penalty” for attending a disadvantaged school at the lower end relative to the top. Figure 3 also shows how estimates across the distribution vary from the conditional mean estimates seen in Table 3. Visual inspection of Fig. 3 shows that there is overlap in the confidence intervals across some of the quantiles examined. However, subsequent statistical analysis finds some significant differences; i.e., estimates at the 10th, 15th, 20th and 25th percentiles are significantly different from those at the 85th and 90th.

Results using mathematics as the outcome variable are presented in Table 5 (again estimated using the same specification as those in Table 4). Similar to reading achievement, results show that the gap between students in disadvantaged and non-disadvantaged schools decreases as achievement increases, but to a lesser extent than that seen with reading. That is, at the 10th percentile, there is an estimated gap of almost 25 points between students in disadvantaged schools and those in non-disadvantaged schools. As outlined above, an alternative interpretation of this would be that if the share of students attending disadvantaged schools in the sample decreased from 25 to 10 percent, performance in mathematics for those at the 10th percentile would rise by nearly 15 points. Similar to reading performance, at the 90th percentile, the gap between students in disadvantaged and non-disadvantaged schools in mathematics is statistically insignificant. This gradient is illustrated in Fig. 4 but shows a flatter trend across the achievement distribution relative to that seen in Fig. 3 for reading. Subsequent analysis finds that differences between the various percentiles are not statistically significant.

Similar to Firpo et al. (2009), Cullinan et al. (2021) and Rios-Avila and Maroto (2022), it may be helpful to illustrate the differences between the UQR and the CQR models within the context of our analysis. Therefore, we also estimated the conditional quantile model for the 10th, 30th, 70th and 90th percentiles with both reading and mathematics achievement as dependent variables. We present these in Tables 6 and 7 respectively in the Appendix.

As noted by Rios-Avila and Maroto (2022), the use of a CQR model versus a UQR model will depend on the relevant research question and variable of interest. For example, if the purpose is to consider the association between attending a disadvantaged school and performance for students at particular quantiles of the conditional distribution of y given x, the CQR may be most appropriate. In contrast, if the purpose is to analyse how changes in the overall proportion of students in disadvantaged schools in the population are associated with changes in the unconditional distribution of achievement, the UQR is most useful. Therefore, the results in Tables 6 and 7 illustrate the estimated gap in reading and mathematics performance at different points of the achievement distribution across two groups of students who are identical in terms of our observed characteristics, except that one group attends a disadvantaged school and the other group does not.

In our application, comparing the estimates for school disadvantaged status in Table 4 with those in Table 6 suggests similarities in point estimates for quantiles closer to the median. However, differences are somewhat larger at the lower and upper tails of the distributions. For example, the estimated coefficient at the 90th percentile associated with the disadvantaged school dummy is nearly twice as large in the CQR model compared to the UQR approach and presents as statistically significant in the former and not in the latter. These types of differences in estimated effects are also evident for the same variables with mathematics as the dependent variable, thus helping to illustrate the care needed when interpreting quantile regression results from different estimation approaches.

Conclusions and discussion

The existence of a “multiplier”, “compositional” or “social context” effect is debated extensively in the literature on school effectiveness. Some researchers have posited that the relationship between social context and achievement is mediated by school, teacher or classroom factors, such as teacher expectations, quality of instruction, or adequacy of school resources. In this context, this paper examines the extent to which there is evidence of a social context effect in Irish post-primary schools, after taking into account various school and classroom variables. Despite controlling for such variables, we find that, on average, there is a statistically significant negative relationship between school disadvantaged status and student achievement in reading and mathematics. The finding that school disadvantaged status is statistically significant having controlled for school climate factors indicates that school and classroom processes included in the current models do not fully explain the social context effect in Ireland.

From a distributional perspective, results show a differential “effect” of school disadvantaged status, particularly in reading, with a stronger association at the lower end of the achievement distribution. This suggests that for reading in particular, the “penalty” associated with attending a disadvantaged school is concentrated towards the bottom of the distribution. Rangvid (2007) also noted a stronger SEC effect at the lower quantiles of achievement in reading for Denmark.

In mathematics, findings of our analysis show that the achievement gap between students in disadvantaged and non-disadvantaged schools decreases at higher levels of achievement, but to a lesser extent than in reading. The finding of a weaker distributional effect in mathematics mirrors that of Rangvid (2007), who showed that for mathematics, the school SEC effect was similar for low- and high-achieving students. Our main UQR findings for reading and mathematics are also comparable with those of Schneeweis and Winter-Ebmer (2007). In contrast, our results are somewhat different to those of Perry et al. (2022), who described a more homogeneous effect of school socio-economic status across the distribution. However, a direct comparison is difficult, given that Perry et al. use a different measure of school socio-economic status (based on a continuous variable) and also use the CQR model.

It is useful to consider some policy implications of our findings. Firstly, our OLS analyses show that even having controlled for individual SES and other student and school characteristics, students in disadvantaged schools have lower scores on average in reading and mathematics than their counterparts in non-disadvantaged schools. This underscores the need for ongoing supports for schools serving the highest concentrations of young people from disadvantaged backgrounds and provides some validation of the Irish policy approach of providing additional resources to schools with the highest concentrations of disadvantage (DoE, 2022a).

Secondly, findings of our distributional analyses show that the achievement gap between students in disadvantaged and non-disadvantaged schools is wider at lower levels of achievement, particularly in reading. This underscores the need for particular targeting of low achieving students in disadvantaged schools, especially in reading literacy. In the policy context of our study setting, namely Ireland, this finding is of policy importance for the forthcoming national strategy on literacy, numeracy and digital literacy. Recent findings from school inspections in DEIS schools highlight that while a number of evidence-informed initiatives are used in Irish primary schools targeting low achievers in reading and mathematics, there appear to be fewer such programmes in use at post-primary level (DoE, 2022b). Findings of the current analyses lend support to the DoE recommendation to provide post-primary DEIS schools with specific guidance on the teaching of literacy and numeracy and the use of effective teaching strategies.

Thirdly, in a more general sense, we believe that our study highlights the need to think beyond the mean when designing or evaluating education policy or examining differences between student groups. While not the main focus of our paper, it is noteworthy that our analysis illustrates variation around conditional mean effects that can exist with respect to reading and mathematics achievement and other variables such as gender. Such differences are not identified using traditional OLS approaches. Yet, these are of central importance when designing policy responses to gender differences in achievement or uptake of STEM subjects. In the context of secondary analysis using international large-scale assessment datasets, we suggest that future research should consider the full variety of empirical approaches, including those that permit distributional analysis. Currently there is relatively limited use of such methods in the education literature but recent software developments and contributions to the literature should enable greater ease of estimation and interpretation.

In considering these results, we highlight some limitations. For example, a recognised limitation of the current study is that the PISA dataset does not include a measure of prior achievement, with potential overestimation of the SEC effect as a result (Van Ewijk & Sleegers, 2010), although it has also been suggested that controlling for prior achievement underestimates SEC effects (Sciffer et al., 2020). Related to this, there may be other unobserved individual-level characteristics that may impact reading or mathematics achievement (e.g., non-cognitive student-level attributes) which could result in omitted variable bias if they are also correlated with the selection of disadvantaged versus non-disadvantaged schools. Therefore, we present our results as associations, rather than causal effects. Furthermore, it may be contended that composition measured at school level is a less reliable approximation than composition measured at the class level and that measuring at school level results in attenuation bias towards zero. This problem is unavoidable in our analysis given the age-based sampling used in PISA.

Finally, our analysis is limited to Ireland and it may be difficult to generalise our findings around school socio-economic composition to other countries if these associations are context specific. However, it is notable that our findings are similar to those found in Denmark by Rangvid (2007) and so may have broader application.

Despite these limitations and given the range of observable individual and school-level characteristics we utilise, this study makes a valuable contribution to the extant literature on school SEC effects. Our findings specifically underscore the importance of educational policy supporting social mix across schools. Recent legislation in Ireland aims to ensure that school admissions policies are legitimate, reasonable and fair and prohibit the use of waiting lists as a means of selecting students for admission. A transparent approach to school admissions should support social mix across schools which is to be welcomed in the promotion of educational equity.

It is also worth noting that in our study the outcomes examined were reading and mathematics. Increasingly, policy makers nationally (e.g., Government of Ireland, 2019) and internationally (e.g., Organisation for Economic Co-operation & Development, 2019b) place a high degree of importance on a broader range of outcomes. Therefore, future research could usefully exploit data from international large-scale assessments to consider how non-cognitive outcomes, including wellbeing and educational aspirations, may be associated with compositional effects and/or examine non-cognitive outcomes from a distributional perspective. Future research in this space might also usefully consider the role of gender and the potential for differential experiences of girls and boys.

Availability of data and materials

The dataset analysed during the current study is available in the PISA 2018 data repository https://www.oecd.org/pisa/data/2018database/.

Notes

  1. Sirin (2005) recognises that using aggregated SES measures risks introducing an ecological fallacy; i.e., an incorrect interpretation at the individual-level on the basis of group aggregated data.

  2. Most studies in this area draw on cross-sectional data that do not support causal interpretations of the association between school socio-economic context and student achievement. We use the term “effect” to refer to the magnitude of the association between the variables of interest and do not intend a causal interpretation.

  3. There is also previous literature suggesting that girls and boys experience the school social context effect differently. Some findings have pointed towards a stronger social context effect for boys than for girls while other findings are more ambiguous with respect to gender differences (Sofroniou et al., 2004; Legewie & DiPrete, 2012; van Hek et al., 2018).

  4. DEIS is the Irish language word for ‘opportunity’.

  5. The Department of Education and Skills was renamed as the Department of Education in October 2020.

  6. Gubbels et al. (2020) and Avvisati (2020) provide notable further summaries and empirical uses of the PISA data.

  7. It is also worth highlighting that the 2018 Irish PISA data is based on a weighted final sample as a percent of the target population of 84%, placing it higher than countries such as the UK, Denmark, Canada and New Zealand (Jerrim, 2021).

  8. To facilitate interpretation of scores, the scale scores were originally designed to have an average of 500 points and a standard deviation of 100 (OECD, 2001). Ireland’s standard deviation on reading achievement was about 91 and about 78 on mathematics (McKeown et al., 2019).

  9. See Avvisati (2020) for more details on this socio-economic index.

  10. The user-written Stata commands rifreg (Firpo, Fortin & Lemieux, 2009) or rifhdreg (Rios-Avila, 2020) can be used to estimate the UQR in conjunction with the aforementioned repest command.

  11. Rios-Avila and Maroto (2022), Borgen et al. (2022), and Wenz (2019) provide greater detail on the technical aspects of the UQR.

  12. Similar to Rios-Avila and de New (2022), we calculate this by multiplying the dummy coefficients in Table 3 by −(Change in dummy share)/(Current share of dummy × 100). For example, with reading as the outcome measure, we arrive at the value of 13.5 points by multiplying the estimated coefficient of −22.44 by −15/(0.25 × 100), given that 25 percent of students in our sample attend disadvantaged schools.

Acknowledgements

Not applicable.

Funding

Not applicable.

Author information

Contributions

All authors made a substantial contribution to the design and writing of this paper. DF conducted the statistical analysis. All authors read and approved the final manuscript.

Corresponding author

Correspondence to Darragh Flannery.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

Not applicable.

Competing interests

The authors declare that they have no competing interests.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix

See Tables 6 and 7.

Table 6 Conditional quantile regression of PISA 2018 reading achievement
Table 7 Conditional quantile regression of PISA 2018 mathematics achievement

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Flannery, D., Gilleece, L. & Clavel, J.G. School socio-economic context and student achievement in Ireland: an unconditional quantile regression analysis using PISA 2018 data. Large-scale Assess Educ 11, 19 (2023). https://doi.org/10.1186/s40536-023-00171-x

  • DOI: https://doi.org/10.1186/s40536-023-00171-x

Keywords