
An IERI – International Educational Research Institute Journal

The relationship between students’ socio-demographics and the probability of grade repetition in Brazilian primary education: is it decreasing over time?


This article aims at a better understanding of the Brazilian education system’s performance concerning the quality and equity over the decade 2007–2017. It examines the extent to which students’ sociodemographic characteristics are related to schooling trajectory without failure in primary education and how such relationships have changed over time. Multilevel logistic models are applied to cross-sectional student assessment data (Prova Brasil), considering the hierarchical structure of four levels. The total number of students covered is 12.4 million. The results suggest a pattern of educational inequity marked by socioeconomically disadvantaged status, gender, and self-declared race/ethnicity. The analyses show that the gender gap increased and the differentials by socioeconomic status and race/skin colour decreased over the decade. The estimates also suggest that the effect of school socioeconomic composition was reduced, but the effect of the proportion of repeating students per school on the individual probability of success was reinforced. In addition, the probability of students’ success varies randomly across schools, municipalities, and states, and such educational disparities across states are increasing over time.


In countries marked by educational and social inequalities, such as Brazil, the definition of strategies to reduce them involves incorporating scientifically based knowledge to answer the central question: How can governments guarantee the fundamental right of quality in education to all, and what consistent measures to reduce inequalities are needed? The role of governments is central in mentoring and implementing public policies to ensure that such right is effectively guaranteed for all children, adolescents, and young adults. Achieving quality and equity in basic education is the central objective of the Brazilian educational system, according to the Federal Constitution and the Law of Guidelines and Bases of National Education.

The debate on educational quality and equity and their relation with grade repetition goes back to the middle of the twentieth century, and the literature produced since then is extensive (Alves & Ferrão, 2019; Armitage et al., 1986; Ferrão et al., 2017; Ferrão, 2022a, 2022b; Gomes-Neto & Hanushek, 1994). According to the Census of Education, the rate of grade repetition (taxa de reprovação) in primary and lower secondary education was 12.1% in 2007 and 7.4% in 2017 (INEP, 2020). Alves and Ferrão (2019) showed that from 2007 to 2017, grade repetition was sharply reduced without deterioration of quality in public education. Leighton et al. (2019) showed that twelve-year-old students who have been exposed to the social promotion policy since they were seven have an almost 5 percentage point lower chance of being delayed a year or more in their studies than do students who attended schools with grade repetition. Nevertheless, the phenomenon of grade repetition is still marked by individual sociodemographic characteristics, such as gender, socioeconomic status, and race/skin colour. Based on PISA 2012 data, evidence reported by Ferrão et al. (2017) showed the association between students’ socioeconomic status and their probability of grade repetition, and suggested that there is a cumulative risk of grade repetition at ISCED 2 level after an early repetition (at ISCED 1 level). These authors also found that students’ probability of grade repetition seems to be more related to school socioeconomic composition and school proportion of repeaters, regardless of whether the school is private or public. Given the target population of the PISA survey, these inferences refer to the population of 15-year-old students. In fact, few Brazilian studies on educational inequality focus on cohorts of younger students and take into account diverse sociodemographic variables such as students’ race/skin colour, gender and socioeconomic status. 
To our knowledge, none of them explores the relationship of such variables with students’ grade repetition in an integrated way. Our purpose is to fill this gap. In addition, we examine the differentials in the probability of grade repetition by race/skin colour, which cannot be explored with PISA data since this demographic attribute is not collected by the OECD’s survey. The article therefore contributes to the field by pursuing two main objectives. The first is to gather empirical evidence about the relationship between grade repetition and race/skin colour, gender, and socioeconomic status, controlling for individual performance in maths and reading achievement tests. Thus, we aim to test the hypothesis that educational inequality still exists but has been attenuated over time. The second is to illustrate that, at early stages of education, there are learning differentials in maths and reading by race/skin colour groups, adding to the literature on the interaction between social equity and learning. To our knowledge, this is the first study on grade repetition and equity in Brazilian education based on evidence provided by a decade’s worth of cross-sectional data with national coverage. Specifically, this study addresses the following research questions: (1) Is there still any evidence of individual sociodemographic characteristics, such as socioeconomic status, race/skin colour, and gender, influencing students’ success in primary education? (2) If so, have those relationships been increasing or decreasing over the decade 2007–2017? (3) To what extent does a student’s success in primary education depend on school composition regarding the students’ socioeconomic level and the proportion of students who have experienced grade repetition? (4) Is there any interaction between students’ achievements (in maths and reading) and race/skin colour on the probability of success? Does such a pattern change over time?

Although this study is focused on the Brazilian socio-educational context, the paper’s contribution is relevant to other countries where similar concerns on educational inequality have been raised, specifically those countries with high grade repetition rates. The remainder of this paper consists of four parts. The first constitutes a literature review, which comprises the topic of grade repetition in the international and Brazilian settings; the second includes the explanation of data collection, the design of the study, and the statistical models applied; the results and discussion are presented in the third section; and finally, the fourth contains some concluding remarks.

Recent research on grade repetition or school failure

A broad review on the topic of school failure and grade repetition has been published recently (European Agency for Special Needs & Inclusive Education, 2019). It draws on evidence from 100 papers published between 2011 and 2018 and covering several countries: Australia, Belgium (Flemish community), Canada, Finland, France, Germany, Iceland, Ireland, Italy, Japan, Netherlands, Qatar, Spain, Sweden, United Kingdom (England, Scotland and Wales) and the USA. According to the authors of the report, the term “school failure” is used in a variety of ways and often without being defined. They identify three major groups of definitions: (i) those focused on the learner dropping out of school or early school leaving; (ii) those focused on low levels of academic achievement or academic failure, including the terms ‘grade repetition’, ‘over-age retention’ or ‘grade retention’; (iii) those related to poor transition to adulthood. Regarding group (ii), the authors highlight several policies, programmes, or interventions that may positively impact academic success and avoid grade repetition. These include community-based resources available to learners and teachers, “school safety, teacher-learner relationships and perceptions, curriculum-based measures to identify learner progress and apply interventions (such as assessment for learning), evidence-based programmes to support learners, teaching cognitive skills from an early age and accommodations based on learner needs (e.g. seating arrangements or additional time in examinations)” (p. 18). At the student level, they point to the need to address personal challenges, such as “low academic engagement, low expectations, low levels of self-efficacy, perceived unimportance of school, low levels of homework completion, poor behavioural expectations and challenging behaviour, special educational needs or other difficulties (i.e. high stress, hyperactivity, poor attentional control) and previous academic success” (p. 18).

The review by Faubert (2012) focuses on why investing in overcoming school failure is worthwhile and on actions to be taken at the school level, particularly in low-performing schools. The countries included are Austria, Canada, Czech Republic, France, Greece, Ireland, Netherlands, Spain and Sweden. The author claims to have examined “the most relevant empirical literature on the subject of in-school practices for overcoming school failure” (p. 25), and in that context, he emphasizes the relevance of adequate provision of education by schools and school systems in order to address the different needs students have in overcoming learning difficulties. Such difficulties usually are “also an issue of equity” (Faubert, 2012, p. 3). Both reviews pay special attention to the role of prevention and intervention programmes used to reverse or reduce the effects of school failure, and to the compensatory strategies to be applied when the point of school failure is reached.

Concerning Asian, African and American countries, Momo et al. (2019) found that the most common causes for school failure include “lack of income, parents’ education and employment status, living in a single-parent household, being an illegitimate child, age, region of residence and school performance. Specifically, for Asia, immigration and ethnicity revealed to be important factors” (Momo et al., 2019, p. 496). In fact, students often do not complete high school for complex reasons (e.g. Lee-St. John et al., 2018; Riad et al., 2021), which usually manifest long before they are enrolled in high school. For example, Lee-St. John et al. (2018) showed that support tailored to students in elementary school may lead to meaningful effects on their later schooling trajectory, showing the power of well-designed intervention programmes.

None of the aforementioned international reviews refers to the Brazilian educational system. Thus, beyond its stated objectives, this paper also adds to the literature by documenting the equity achievements of the Brazilian educational system from 2007 to 2017.

On the Brazilian system of education

Brazilian basic education is organized into three stages: early childhood education (ISCED 0) serving children 0 to 5 years of age; elementary education, divided into two levels (ISCED 1 and ISCED 2) lasting 5 and 4 years, respectively; and high school, lasting three or four years (ISCED 3). The Brazilian Constitution guarantees free and compulsory basic education from 4 to 17 years of age. According to the Census of Education (INEP, 2019b), in 2017, there were 49 million students enrolled in basic education and 15 million in the first level of elementary education (ISCED 1), with 81% of students in public schools.

Performance of Brazilian educational system over the decade 2007–2017

In order to improve the country’s educational standards, the Brazilian Ministry of Education created the Index of Development of Basic Education (IDEB), which sets biennial goals targeting progressive improvement to be achieved by 2021 (Brasil, 2007). IDEB was designed as an indicator of the quality of education that combines information from the performance of students in national assessments (at the end of primary, lower secondary and secondary education) with flow rates (Fernandes, 2007, 2016). Its purpose is twofold: (1) to identify schools or education subsystems whose students show poor performance and low proficiency levels; (2) to monitor the progress of students’ performance over time. In order to achieve IDEB goals, schools need to greatly improve educational performance and regularize school trajectories by reducing grade repetition rates. According to Fernandes (2016), IDEB represents results-based accountability in the Brazilian education system, that is, “Prova Brasil, IDEB and the Plano de Metas Compromisso Todos pela Educação [Plan of Goals Commitment All for Education] form a system of accountability compatible with the existing federalism in the country” (Fernandes, 2016, p. 109).

IDEB’s goals have been reached at the ISCED 1 level (primary education), but not in the subsequent stages (INEP, 2018a). Along with other educational outcomes, IDEB averages are associated with the social composition and quality of education (Alves & Soares, 2013; Duarte, 2013; Matos & Rodrigues, 2016; Soares & Xavier, 2013; Soares & Alves, 2013). Duarte (2013) found that having the highest concentration of students receiving financial support from the federal program to reduce poverty (“Bolsa Família”) in a school had a negative effect on IDEB. The study also showed that the state’s investment in education (cost per student) and the location of the school in smaller municipalities moderate this relationship. Some studies suggest that IDEB tends to reproduce the country’s regional and socioeconomic inequalities (Padilha, Érnica, Batista, & Pudenzi, 2012; Soares & Xavier, 2013). Other studies report progressive improvement in education outcomes. Alves and Ferrão (2019) provided descriptive evidence that the grade repetition rate declined over time without affecting the quality of education in primary and lower secondary education. They considered four learning levels (below basic, basic, adequate, advanced) as defined by the Secretaria de Estado da Educação of São Paulo in 2008 (e.g. Soares, 2009), and demonstrated that, in 2007, 25% of fifth graders reached the adequate level in reading and 21% in mathematics, and in 2017 the percentages were 57% and 45%, respectively; for ninth graders, the percentages were, respectively, 16% and 9% in 2007 and rose to 35% and 16% in 2017. However, they also found discrepancies in education outcomes across groups defined by socioeconomic and race/skin colour variables. The study shows that, in general, such gaps present a decreasing trend over time. Carnoy et al. (2015) analyzed the Brazilian sample of the Programme for International Student Assessment (PISA) from 2000 to 2012, and SAEB data from 1995 to 2013 in order to assess the “effectiveness of Brazilian primary and lower secondary education (grades 1–8/9)” (Carnoy et al., 2015, p. 450). Their findings suggest learning gains over the time span of the study, and they note that, regardless of the data source, the gains “for more advantaged Brazilian students are lower than for those coming from families with lower educational resources” (Carnoy et al., 2015, p. 451). In addition, the study conducted by Rosa et al. (2019) on issues related to entry age into the first grade of primary education reports positive gains in the 2013 and 2015 assessments of Prova Brasil.

Some educational prevention, intervention, or compensation programmes in Brazil

Recent findings (Brauw et al., 2015; Cenci et al., 2020; Hoffmann et al., 2021; Machado et al., 2020; Morton, 2019; Ryu et al., 2020; Weisleder et al., 2018) showed that education reforms jointly taken with prevention, intervention or compensation programmes might succeed in reducing the effect of social origin on students’ educational outcomes. For example, according to Ryu, Helfand and Moreira (2020), Brazil’s 2006 education reform has contributed to a sizable short-term impact in Mathematics and Portuguese-language learning gains, and in the medium term the impact was also positive. Such gains are consistent with the findings reported by Alves and Ferrão (2019) who also show greater learning improvements for the first stage of primary education (assessments at the 5th grade) than for the second stage (assessments at the 9th grade). Concerning the full-day school program, several researchers investigated its impact on students’ educational performance. For example, the study conducted by Rosa et al. (2022) in the state of Pernambuco, one of the poorest regions in the country, describes how the program was implemented from 2004 on, what human and material resources were made available to make the program possible. According to the authors, changes in the school’s physical inputs include the reform or construction “of spaces to be used for students. The government reformed labs and provided new inputs for schools”, “full-day school principals are more likely to be recruited through a competitive hiring process, and their wages are also higher than the wages of principals in regular schools” (Rosa et al., 2022, p. 5), and teachers were paid higher monthly wages for staying in the school. The number of instructional hours for math and language classes increased by 50% and 20%, respectively. 
Their findings show that “the full-day high school program positively and significantly affects student performance in math and language, by 0.22 SD and 0.19 SD, respectively” (Rosa et al., 2022, p. 13). In the state of Minas Gerais, the study by Soares et al. (2014) quantifies the effects of the program “Escola em Tempo Integral” (Full-time school), run by the government of Minas Gerais since 2007 in public schools, on the results of large-scale learning assessments in Maths and Portuguese. The findings suggest “positive results on the improvement of the proficiencies of the students who have a delay in the learning process”. That is, there is evidence of a slight learning gain either for students who attended schools where the program took place or for their peers who attended other schools. For students attending low-performance schools at the beginning of the program, the gain was stronger.

In addition, regarding the non-repetition policy put in place in several Brazilian states, Machado et al. (2020) analyzed several kinds of information and data in order to identify how far state education systems endorsed school cycles and the continued progression regime at the end of specific grades, considering the timeframe from 2009 to 2018. They also investigated the degree of implementation of the regimes, and concluded that “Minas Gerais, São Paulo, Mato Grosso, and Distrito Federal were the education systems that showed the greatest adherence to non-repetition policies” (p. 1117). Therefore, the evidence given (Machado et al., 2020; Rosa et al., 2022; Ryu et al., 2020; Soares et al., 2014) suggests that several programs have been successfully implemented in Brazilian education.

Conditional distribution of grade repetition and social disadvantage

The conditional distribution of grade repetition on students’ socioeconomic status shows that the higher the socioeconomic status, the lower the probability of repetition (Brophy, 2006; Ferrão, 2015; Ferrão et al., 2017; Matos & Ferrão, 2016). According to Ferrão et al. (2017), based on the Brazilian sample of PISA 2012 data, regardless of the grade and the number of years of repetition, at the lower quintile of socioeconomic status assessed by HISEI (Ganzeboom, 2010), the probability of repetition was 0.26, while at the top quintile, it was 0.15. In addition, the comparison of these probabilities at the ISCED 1 level, which are 0.33 and 0.11, respectively, suggests that the risk of repetition is three times higher in the socioeconomically disadvantaged group. The distribution of late repetition (at ISCED 2) conditioned on early repetition (at ISCED 1) shows that in the group of students who were always promoted throughout the ISCED 1 level of education, 91% remained as such, while 9% repeated a grade at ISCED 2. The percentages are 62% and 38%, respectively, in the group of students who had experienced early repetition. The comparisons of 91% vs. 62% and 9% vs. 38% suggest a cumulative risk of failure after early repetition. These results reinforce the need for research on the early stages of education to inform policy and practice. Riani et al. (2012) studied the effect of grade repetition between the 3rd and 4th grades of primary education on students' cognitive development in public schools of Minas Gerais, using data from the Literacy Assessment Program (Proalfa: Avaliação da Alfabetização da Rede Pública do Estado de Minas Gerais), the educational evaluation program promoted by the state government and carried out annually by external independent organizations. Their findings suggest that, on average, of two students at the same proficiency level in 2008, one having repeated a grade and the other not, the one who did not repeat tended to reach a higher score in 2009.
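The risk comparisons drawn from Ferrão et al. (2017) in this subsection can be checked with a few lines of arithmetic. The sketch below uses only the probabilities quoted in the text; the helper name `relative_risk` is ours and is not part of the cited study.

```python
def relative_risk(p_group: float, p_reference: float) -> float:
    """Ratio of the probability of repetition in one group to that in a reference group."""
    return p_group / p_reference

# Bottom vs. top HISEI quintile, any grade (PISA 2012, as quoted in the text).
rr_any = relative_risk(0.26, 0.15)

# Bottom vs. top HISEI quintile, repetition at ISCED 1.
rr_isced1 = relative_risk(0.33, 0.11)

# Repetition at ISCED 2: early repeaters (38%) vs. students always promoted at ISCED 1 (9%).
rr_late = relative_risk(0.38, 0.09)

print(f"any grade: {rr_any:.2f}, ISCED 1: {rr_isced1:.2f}, late given early: {rr_late:.2f}")
```

The ISCED 1 ratio reproduces the "three times higher" statement, and the last ratio puts a number on the cumulative risk after early repetition.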

Race/skin colour and educational performance

Brazil is a multiracial country. Racial measurement in the Brazilian census and official statistics has always relied on individuals’ self-classification into five skin colour/race categories, namely White, Black, Pardo (mixed-race), Yellow (Asian) and Indigenous (Travassos, 2004). Respondents may not select more than one category. The student questionnaire in the Prova Brasil survey includes the following question:

Question 4. What is your colour or race?

A. White.

B. Black.

C. Pardo.

D. Yellow.

E. Indigenous.

F. I do not want to declare.

The racial disparities in education are a historical trait of Brazilian society (Silva & Hasenbalg, 2000) but have decreased dramatically since the universalization of education up to ISCED 2 level (Marteleto, 2012). However, according to data from the 2017 National Household Sample Survey (PNAD), the average number of completed years of schooling of Black and Pardo young adults (18 to 29 years old) remains below that of White young adults and does not reach 12 years of schooling (Monteiro & Cruz, 2018). In addition, there is a persistent and complex gap between Black, Pardo, and White students regarding learning and school trajectory (Louzano, 2013; Soares, 2006; Soares & Delgado, 2016; Soares et al., 2016; Xavier & Alves, 2015). The composition of the school regarding the proportion of Black students also influences the students’ chances of school failure (Xavier & Alves, 2015). Even though the studies cited above are based on contextualized results models (OECD, 2008), evidence obtained from value-added models confirms the existence of gaps by socio-demographic groups. For example, Ferrão et al. (2018) compared the estimates obtained from contextualized results models with those obtained from value-added models, considering the cohort of Brazilian students who attended the fifth grade of public schools in 2011 and who were steadily promoted to the ninth grade in 2015. They found that the students’ performance in maths and reading at the end of ninth grade was related to their socioeconomic level, mother’s literacy, self-declared race/skin colour, and being a working student. Most of these marginal effects remained even after controlling for the students’ prior achievement but declined drastically in magnitude. Their findings suggest that such strong differentiation between race/skin colour groups might be partially explained by spurious covariance due to the omission of prior achievement in the contextualized results models.

In this study, we intend to demonstrate that, at the end of the fifth grade, the student’s probability of grade repetition in primary education is still marked by race/skin colour and, cumulatively, that this probability is sharply increased for high-performing Black students when compared to White students, even after controlling for students’ performance in maths and reading.


Participants and instruments

The data we use in this paper have two different sources: the Census of Education and the Avaliação Nacional do Rendimento Escolar (ANRESC), better known as the Prova Brasil (INEP, 2018b). Prova Brasil was created in 2005 under the scope of the National System of Basic Education Assessment (SAEB) with the aim of assessing students’ learning at Brazilian public schools (Crespo et al., 2000; Fernandes, 2016; Pestana, 2016). According to Fernandes (2016, p. 102), the main difference between Prova Brasil and the former assessment system (SAEB) is that Prova Brasil is intended to be applied to the universe of students, whereas the former SAEB was administered to a representative sample of students. In fact, Prova Brasil is applied to students at the fifth and ninth grades of primary and lower secondary education in schools with 20 or more students enrolled in these grades. It covers the Brazilian territory and is carried out every 2 years by INEP, the organization responsible for the assessment and data collection.

Prova Brasil comprises standardized tests on Portuguese language (reading) and mathematics. Each test has 22 multiple-choice items. The test booklets are prepared using a methodology called Balanced Incomplete Blocks (BIB), which allows a large number of items to be applied to a set of students without each student needing to answer all of them. In other words, many versions of the tests are administered in each class. The application is supervised by professionals hired specifically for data collection and test administration. Item response models for multiple groups are applied by INEP in order to obtain students’ scores. Klein (2003) briefly describes the methodology and estimation procedure for item calibration and for obtaining a unique equated scale by subject. More details are available in the SAEB technical report (INEP, 2019a). Prova Brasil also includes self-administered questionnaires for students, teachers, and principals, which collect data about social background and contextual factors. An additional questionnaire about schools’ infrastructure, conservation and safety is filled out by the person responsible for administering the tests, an external and independent professional.

For the purpose of this paper, we analyzed data concerning the fifth-grade students. Prova Brasil 2007 did not include rural schools (less than 0.05% of students), which began to participate effectively in 2009. In addition, we decided to exclude federal public schools that represent a minimum number of enrolments (less than 0.05% of students and schools) and have very different characteristics from other public schools. The number of students involved varies from 2.5 million to 3.1 million depending on the year (Table 1). The analysis of missing data was based on previous work reported in Ferrão et al. (2020).

Table 1 Number of students per education outcome variable (unit = 1,000,000)

For the purpose of this study, the variables are as follows: Y: Student’s successful trajectory in primary education (y = 1 if the student was always promoted from the first to the fifth grade; y = 0 otherwise). X1: Gender (1: female; 0: male). X2–X5: Dummy variables for race/skin colour, with “White” as baseline. X6: Student’s socioeconomic status, a composite variable which reflects different aspects of family background. It was measured by a score obtained from a graded response model (GRM; Samejima, 1997) applied to student questionnaire items concerning their parents’ education (i.e., literacy status and educational attainment) and family income, estimated indirectly from the possession of household goods such as car, computer, washing machine, television set, refrigerator and freezer, and from whether the household hires a maid service. The GRM assumes that the latent factor (SES) is unidimensional; this assumption was tested through the polychoric correlation matrix and principal component analysis. Data for Y, X1, …, X6 are based on students’ self-reports. X7 and X8 are standardized scores for the student’s mathematics and reading achievement at the end of fifth grade. These scores were estimated using an item response theory model. They have a normal distribution and were transformed to a scale with mean 250 and standard deviation of 50 points anchored at the ninth grade. Thus, performance scores of the fifth and ninth graders are placed on a single continuous scale, making comparisons possible across grades and across time (editions of the Prova Brasil), both in mathematics and reading. Further details about standardized tests and scales are given by Klein (2003). X9–X16 represent interaction terms between race/skin colour and the student’s scores in maths and reading. X17 represents the school’s socioeconomic composition, quantified by the school mean of X6; X18 is the school’s proportion of students promoted to the next grade in primary education. This variable is calculated by INEP based on Census of Education administrative records in the respective year for each educational stage. Some descriptive statistics are provided in Tables 2 and 3.
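To make the coding of the derived covariates concrete, the following minimal sketch builds X2–X5, X9–X16 and X17 for a few toy records. It rests on assumptions that the text does not confirm: the order of the four dummy categories, the ordering of the interaction terms (maths first, then reading), and the record layout are ours, and the text does not say how the "I do not want to declare" answers are handled. All values are made up for illustration.

```python
from statistics import mean

# Assumed order of the four race/skin colour dummies X2-X5; "White" is the baseline.
RACE_LEVELS = ("Black", "Pardo", "Yellow", "Indigenous")

def race_dummies(race: str) -> list[int]:
    """X2-X5: one indicator per non-baseline category (all zeros for White)."""
    return [int(race == level) for level in RACE_LEVELS]

def interactions(race: str, z_math: float, z_read: float) -> list[float]:
    """X9-X16: each race dummy multiplied by the maths and the reading z-score."""
    d = race_dummies(race)
    return [x * z_math for x in d] + [x * z_read for x in d]

def school_mean_ses(records: list[dict]) -> dict[str, float]:
    """X17: school socioeconomic composition as the mean of X6 within each school."""
    by_school: dict[str, list[float]] = {}
    for r in records:
        by_school.setdefault(r["school"], []).append(r["ses"])
    return {school: mean(values) for school, values in by_school.items()}

# Toy records with made-up values, for illustration only.
students = [
    {"school": "A", "ses": -0.4, "race": "Pardo", "z_math": 0.5, "z_read": -1.0},
    {"school": "A", "ses": 0.2, "race": "White", "z_math": 1.2, "z_read": 0.3},
    {"school": "B", "ses": 1.0, "race": "Black", "z_math": -0.7, "z_read": 0.1},
]
x17 = school_mean_ses(students)
for s in students:
    row = race_dummies(s["race"]) + interactions(s["race"], s["z_math"], s["z_read"])
    print(s["school"], row, round(x17[s["school"]], 2))
```

With White as the baseline, every dummy and every interaction term is zero for White students, so their achievement slopes are carried entirely by the main effects of X7 and X8.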

Table 2 Education outcomes in the fifth grade, 2007–2017
Table 3 Proportion of Successful Trajectory by socio-demographic variables

Table 2 includes, from 2007 to 2017, the proportion of students without grade repetition and the overall score averages for reading and mathematics. We highlight the notable improvement of the educational system suggested by these statistics. In fact, there was a sustained increase in the proportion of students with a successful schooling trajectory, i.e. who were always promoted to the next grade level, and in the averages of reading and mathematics proficiency.

Descriptive statistics presented in Table 3 show that girls had a higher probability of successful trajectory than boys, and this advantage pattern remained almost constant over these 10 years. Regarding racial differences, White students had the highest probability of successful trajectory, while the Black ones had the lowest. The patterns of inequality among racial groups were relatively constant, except for the gap between White and Pardo groups, which shrank over the period of analysis.

The distribution of successful trajectory given the student’s socioeconomic status shows that in the lower fifth of SES distribution, the proportion of students always promoted ranged from 0.53 to 0.68, while in the top fifth, the proportion ranged from 0.79 to 0.84. The gap in proportions between these extreme groups narrowed consistently over the decade.

Design of the study

Bearing in mind the four research questions, we decided to fit two different multilevel statistical models for each edition of Prova Brasil from 2007 to 2017. This contribution examines the relationship between individual and institutional factors in influencing students’ tendency toward success. The first and second research questions are mainly supported by model 1, which considers the sociodemographic attributes as explanatory variables. Model 2 also includes the additive terms for the main effect of learning in reading and maths, the interaction terms for learning and race/skin colour, and the terms for school composition variables. Thus, model 2 allows us to check whether there is any race/skin colour differential in the probability of students’ success that is due to a factor other than the students’ learning throughout primary education. It also provides empirical evidence for the effect of school composition on the individual probability of grade repetition, supporting research question 3. The literature review presented above gives us evidence on the relationship between race/skin colour and students’ educational outcomes, controlling for SES. In general, the educational system is organized to promote to the next grade those students who achieve certain learning goals. Thus, the main effect of students’ performance on the probability of a successful trajectory is expected to be positive. Moreover, model 2 includes the interaction terms between race/skin colour and performance in maths and reading, taking the group of self-declared White students as reference. In this way, the parameters related to such interaction terms support the fourth research question. The variables of performance in mathematics and reading are z-scores.
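The role of the interaction terms in model 2 can be illustrated with a reduced linear predictor containing one achievement score and one race/skin colour group. This is a sketch under stated assumptions, not the fitted model: the function names are ours, random effects and the remaining covariates are omitted, and the coefficients are placeholders rather than estimates from this study.

```python
import math

def inv_logit(eta: float) -> float:
    """Map a linear predictor on the logit scale to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))

def p_success(intercept, b_math, b_group, b_group_x_math, is_group, z_math):
    """Probability of a successful trajectory for a reduced model-2 predictor.

    is_group is 0 for the reference group (White) and 1 for the comparison group.
    The achievement slope is b_math for the reference group and
    b_math + b_group_x_math for the comparison group.
    """
    eta = intercept + b_math * z_math + is_group * (b_group + b_group_x_math * z_math)
    return inv_logit(eta)

# Placeholder coefficients, NOT estimates from the paper.
COEF = dict(intercept=1.0, b_math=0.5, b_group=-0.2, b_group_x_math=-0.1)

for z in (-1.0, 0.0, 1.0):
    p_ref = p_success(is_group=0, z_math=z, **COEF)
    p_grp = p_success(is_group=1, z_math=z, **COEF)
    print(f"z_math={z:+.1f}  reference={p_ref:.3f}  comparison={p_grp:.3f}")
```

A non-zero interaction coefficient means the gap between groups on the logit scale changes with achievement, which is the pattern the fourth research question asks about.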

Statistical modelling

Multilevel logistic regression was applied with MLwiN (Rasbash et al., 2017). We fitted separately two models for every year 2007–2017, with the logit link function depending on the set of covariates included in the linear predictor, i.e., models 1 and 2 explained above, with equations and assumptions presented in the Appendix. The estimation procedure was the penalized quasi-likelihood of second order (Goldstein & Rasbash, 1996).

Furthermore, the variance partition coefficient (Goldstein et al., 2002), also known as intraclass correlation (ICC) in the sample survey literature, was applied to quantify the extent of clustering of students’ success across schools, municipalities, and states. The ICC is considered a key statistic in educational effectiveness research because it indicates what proportion of the variance in students’ success is attributable to schools, municipalities, or states. It is denoted here by \({\tau }_{2}\), \({\tau }_{3}\), \({\tau }_{4}\), respectively, for each hierarchical level. For this purpose, we extended to four hierarchical levels the latent variable approach formulae usually presented for two-level models (Constate-Amores et al., 2020; Goldstein et al., 2002), considering here the intercept random parameter estimates. In this sense, the proportion of level two (across schools) variance not explained by model 2 (full model) is given by (1). Similarly, the proportion of level three (across municipalities) variance is given by (2) and the proportion of level four (across states) variance is given by (3), where the parameters are replaced by the respective estimates:

$$\tau_{2}=\frac{\sigma_{u0}^{2}}{3.29+\sigma_{u0}^{2}+\sigma_{v0}^{2}+\sigma_{w0}^{2}}\times 100\%, \tag{1}$$
$$\tau_{3}=\frac{\sigma_{v0}^{2}}{3.29+\sigma_{u0}^{2}+\sigma_{v0}^{2}+\sigma_{w0}^{2}}\times 100\%, \tag{2}$$
$$\tau_{4}=\frac{\sigma_{w0}^{2}}{3.29+\sigma_{u0}^{2}+\sigma_{v0}^{2}+\sigma_{w0}^{2}}\times 100\%. \tag{3}$$
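As an illustration of formulae (1) to (3), the following minimal Python sketch computes the three variance partition coefficients from a set of random-intercept variances. The constant 3.29 is the variance of the standard logistic distribution (π²/3) used in the latent variable approach. The function name and the variance values are hypothetical and are not taken from Table 5.

```python
import math

# Level-1 variance of the standard logistic distribution (latent variable approach).
LOGISTIC_VARIANCE = math.pi ** 2 / 3  # approximately 3.29


def variance_partition(var_school, var_municipality, var_state):
    """Return the variance partition coefficients (in %) for levels 2, 3 and 4."""
    total = LOGISTIC_VARIANCE + var_school + var_municipality + var_state
    return {
        "school": 100 * var_school / total,
        "municipality": 100 * var_municipality / total,
        "state": 100 * var_state / total,
    }


# Hypothetical variance estimates, for illustration only (not from the paper).
vpc = variance_partition(var_school=0.40, var_municipality=0.20, var_state=0.10)
for level, share in vpc.items():
    print(f"{level}: {share:.1f}%")
```

Because the three coefficients share the same denominator, they can be compared directly across levels within a given year; the remainder up to 100% is the student-level share.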

Ethical considerations

Microdata used in this article were made available on the Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira’s webpage. Due to privacy protection restrictions, subjects had previously been pseudonymized.

Results and discussion

Tables 4 and 5 present the estimates (Est.) and standard errors (S.E.) of the fixed and random parameters of model 1 (grade repetition conditioned on students’ sociodemographic variables) and model 2 (full model), respectively, for every dataset from 2007 to 2017. The estimates show that the relationships between the students’ sociodemographic variables (such as gender and race/skin colour), SES, and the probability of a successful trajectory are statistically significant at the 5% level, with the exception of the estimate for Indigenous students in 2015. In model 2, these relationships are conditioned on the interaction (and main effects) between performance and race/skin colour and on the school composition terms. In general, the estimates obtained from model 2 for such relationships are reduced in magnitude, but still support the statement about the influence of individual sociodemographic attributes on the probability of grade repetition. Thus, both models suggest that female students had a higher probability of a successful trajectory, with an increasing odds ratio over the decade. The odds ratio per gender ranged from a minimum of 1.6 (in 2009) to a maximum of 1.8 (in 2017), i.e., the probability of non-repetition over the probability of repetition in the group of female students is close to twice that in the group of male students. Gender differences have been widely reported in the literature. Our results reveal that there are still relevant questions to explore. It seems that the Brazilian educational system is unable to deal with these disparities and, surprisingly, the gap is increasing. Similar results were obtained in the USA, showing that boys are more likely than girls to repeat one or more grades during primary/elementary education (Buchmann et al., 2008; Entwisle et al., 2007). In addition, according to the research developed by Martínez and Serna (2018), the gender gap in mathematics in the USA is almost non-existent at the beginning of schooling, but it increases by approximately 61% between the first and fifth grades. The explanation may lie, at least in part, in the differential school effectiveness by gender (e.g. Strand, 2010).
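To make the gender odds ratios reported above more concrete, the sketch below converts an odds ratio into a probability for a given baseline. The baseline probability for male students (0.70) is a hypothetical value chosen only for illustration, and the helper function is not part of the original analysis.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Probability obtained after multiplying the baseline odds by odds_ratio."""
    baseline_odds = baseline_probability / (1 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)


# Hypothetical baseline: 70% of male students are never retained (not from the paper).
male_probability = 0.70
for odds_ratio in (1.6, 1.8):  # range of gender odds ratios reported in the text
    female_probability = apply_odds_ratio(male_probability, odds_ratio)
    print(f"OR = {odds_ratio}: female probability = {female_probability:.3f}")
```

The same odds ratio implies different gaps in probability depending on the baseline, which is why the text describes the gender differential on the odds scale.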

Table 4 Estimates of model for grade repetition conditioned on sociodemographic variables, from 2007 to 2017
Table 5 Full model parameter estimates, from 2007 to 2017

Regarding race/skin colour, a White student was, in general, more likely to have a successful trajectory than a student from any other race/skin colour group. However, we observed a generally decreasing trend in the gap over time. The odds ratio associated with Black students shows that they always remained at a disadvantage. For Yellow and Indigenous students, the gap with White students narrowed until 2015, and then the coefficients increased slightly in 2017. In Brazil, the literature on race/ethnicity has a long record of findings pointing out inequality in education outcomes (Artes & Unbehaum, 2021; Silva & Hasenbalg, 2000; Soares & Alves, 2003). Most refer to dichotomies such as White/non-White or White/Black. Our results bring evidence on other minority groups that have been affected by discrimination.

The odds ratio associated with the students’ socioeconomic status declined over the period, from the maximum observed (in 2011) of 1.3 per SES unit to a minimum of 1.2 (in 2017). The respective parameter estimates are 0.251 (S.E. = 0.002) and 0.213 (S.E. = 0.002). Considering the scale of this variable (0–10), the difference in the probability of success between students with lower and higher SES still represents a sizeable educational inequality. According to our findings, there is a trend of progressive reduction in the relationship between SES and the probability of grade repetition. This confirms the narrowing social gap reported by Alves and Ferrão (2019) for the period 2007–2017, and may also be related to the evidence that the learning gains “for more advantaged Brazilian students are lower than for those coming from families with lower educational resources” (Carnoy et al., 2015, p. 451).
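The link between the logit-scale estimates and the odds ratios quoted above can be checked with a short sketch. It exponentiates the two SES coefficients given in the text and, as an illustrative extension that assumes the effect is linear on the logit scale over the whole 0–10 index, also computes the odds ratio between the two extremes of the scale.

```python
import math

# Logit-scale SES coefficients quoted in the text (maximum in 2011, minimum in 2017).
ses_coefficients = {2011: 0.251, 2017: 0.213}
SES_RANGE = 10  # width of the SES index scale (0-10)

for year, beta in ses_coefficients.items():
    per_unit = math.exp(beta)                # odds ratio for one SES unit
    full_range = math.exp(beta * SES_RANGE)  # odds ratio between the scale extremes
    print(f"{year}: OR per unit = {per_unit:.2f}, OR across the scale = {full_range:.1f}")
```

Under that linearity assumption, the odds ratio between the extremes of the scale is roughly 12 in 2011 and roughly 8 in 2017, which is one way to read the statement that the inequality remains sizeable despite the decline.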

The answers to research questions 3 and 4 are provided by the results of model 2, presented in Table 5. The school socioeconomic composition estimates from 2007 to 2017 show a consistent decline, ranging from 0.109 to -0.085, while the school proportion of approval estimates tended to rise over the period, with a point estimate equal to 3.3 in 2017, representing an increase of 57% since 2009. The evidence appears to show that students’ success in primary education depended less and less on school composition regarding the students’ socioeconomic level, but the influence of the composition regarding the proportion of students who had never failed was reinforced over the decade. Regarding the association between school composition and the individual probability of success, our results confirm those obtained by Ferrão et al. (2017) with PISA data. The model suggests that students’ success in primary education depends on school composition regarding students’ socioeconomic level and also on the proportion of students who have never experienced grade repetition. Furthermore, the analysis suggests a decreasing influence of socioeconomic composition and an increasing influence of the proportion of approvals over time. This finding is in line with the evidence that students’ achievement is influenced by the result of the school, in particular for “those students with a lower level of proficiency” (Riani et al., 2012, p. 635). Regarding the influence of school SES composition on students’ performance from the third to the eighth grade, similar results were reported by Armor et al. (2018), who used three USA statewide achievement datasets (North Carolina, South Carolina, Arkansas) and found significant school SES effects when cross-sectional models were applied. They also report that such school SES effects largely disappear when value-added models are applied.

Interpreted in light of the recommendation of Ferrão et al. (2018), our results unequivocally reinforce the argument that fighting social and educational inequalities requires greater attention to the early years of schooling, in order to support socioeconomically disadvantaged subgroups and minority groups such as self-declared Black students whose families have lower socioeconomic and educational levels.

In fact, our results confirm the differential probability by race/skin colour. Regarding the interaction terms, it can be observed that in maths, an additional effect of race/skin colour on the probability of success exists, even after controlling for students’ performance in maths and reading. For example, in 2017, a self-declared Black student had their probability of a successful trajectory reduced by, on average, 0.123 (logit scale) per standard deviation of performance in maths. This means that the top 5% of high-performing Black students had a more than 22% higher chance of being unsuccessful compared with their White colleagues, controlling for the main effects of race/skin colour and performance, gender, individual and overall school SES, and school proportion of approval. We can observe that, in general, this pattern did not change over the decade. For other race/skin colour groups, a similar phenomenon exists, but the estimates are smaller in magnitude. In reading, the effect of the interaction terms is less pronounced than in mathematics.
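The figure of "more than 22%" can be reproduced under one plausible reading, sketched below. The sketch assumes that the 5% of high-performing students corresponds to the 95th percentile of a standard normal z-score, that the interaction estimate enters the linear predictor with a negative sign, and that the comparison is expressed on the odds scale. These are assumptions made for illustration, not a description of the authors' own computation.

```python
import math
from statistics import NormalDist

# 2017 interaction estimate quoted in the text; the negative sign is an assumption
# reflecting that the text describes a reduction on the logit scale.
INTERACTION_BLACK_MATHS = -0.123

# Assumption: "the 5% of high-performing students" means the 95th percentile of the z-score.
z_top5 = NormalDist().inv_cdf(0.95)

logit_shift = INTERACTION_BLACK_MATHS * z_top5
odds_ratio_success = math.exp(logit_shift)   # odds of success, Black vs White students
odds_ratio_failure = 1 / odds_ratio_success  # odds of being unsuccessful
print(f"z = {z_top5:.3f}, odds ratio of being unsuccessful = {odds_ratio_failure:.3f}")
```

Under these assumptions the odds of an unsuccessful trajectory are a little over 1.22 times those of equally performing White students, which matches the order of magnitude stated in the text.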

It is difficult to explain such results. Why do Black students’ chances of success in school remain lower than those of White students even after controlling for performance in maths or reading? We can only hypothesize based on some clues provided by Carvalho’s qualitative research (2005; 2009) in the city of São Paulo, Brazil. Regarding the relationship between students’ performance, race/skin colour, and gender, Carvalho observed that primary school teachers assessed children they perceived as Black more rigorously, independently of their level of family income, and they tended to consider the best-performing students to be White or to have lighter skin than the students’ own perception of their race/skin colour. In order to evaluate the students, the teachers “used repertoires and personal references, only relatively conscious, without fully realizing their arbitrary character, and thus reproduced values, ideas and symbols derived from socioeconomic hierarchy, gender and ethno-racial relations” (Carvalho, 2009, p. 838). Carvalho concluded that “we are not faced with a difference of learning but of behavior, along with a great lack of definition of evaluation criteria, which may be creating difficulties for boys, mostly blacks, who very soon construct an image of students unable to learn” (Carvalho, 2009, p. 860).

The variance estimates show that the conditional probability of students’ success varied randomly across states, municipalities, and schools. According to Table 4, the variance across states more than doubled over the decade. However, the random estimates given by model 2 (Table 5) suggest that the unexplained variance reduction due to the inclusion of school composition terms and interactions is more prominent at level 4 than at other hierarchical levels. In fact, the linear predictor of model 2 contributes to reducing the residual variance per hierarchical level, in particular at the state level. The level-4 variance in model 2 is approximately half the estimate in model 1. Such reduction is due to the main effects and interactions for race/skin colour, as well as the composition variables. These results suggest that educational disparities across states are slightly increasing over time, and it seems that such disparities are based on sociodemographic characteristics of the population.

The variance partition coefficient, or intraclass correlation, results are presented in Table 6. They suggest that the share of variability in students’ success attributable to the hierarchical structure of the educational system changed very little over time. At the level of schools it remains stable, at the level of municipalities it shows slight ups and downs, and at the level of states it shows an increasing trend, from 1% in 2007 to 3% in 2017.

Table 6 Variance partition coefficient for model 2 (full model) variance estimates

Conclusion, limitations, and further research

This study represents an attempt to understand the progress made by the Brazilian education system over the decade 2007–2017 towards increasing educational equity. We applied multilevel logistic models to Prova Brasil data in order to examine to what extent students’ sociodemographic characteristics were related to their schooling trajectory without failure in the 1st cycle of primary education, after controlling for individual and school composition variables, and how such relationships have changed over time. In general, our results suggest that the fixed effects of individual sociodemographic characteristics on students’ success in primary education decreased over the decade 2007–2017. We found narrowing gaps by socioeconomic status over time. The effect of students’ socioeconomic status declined in the period, but considering the index scale (0–10), the difference in the probability of success between students with lower and higher SES still represents a sizeable educational inequality. As expected, the relationship between students’ performance and the probability of grade promotion is positive, meaning that students with lower scores have a lower probability of a successful trajectory. The evidence indicates that there are differentials of learning in maths and reading by race/skin colour in the probability of students’ success, and that this pattern decreased until 2015. Regarding the interaction terms between performance and race/skin colour, our findings show an additional effect of race/skin colour on the probability of success, even after controlling for students’ performance in maths and in reading. This interaction effect is more pronounced in mathematics than in reading. We also found evidence of a relationship between students’ gender and their probability of success in primary education, with an increasing trend in the odds ratio over the decade. In other words, female students have a higher probability of a successful trajectory, and their advantage has grown. Our results also provide evidence that students’ success in primary education depends on school composition regarding the students’ socioeconomic status and the proportion of students who have experienced grade repetition. Knowing that students’ success varies randomly across states, municipalities, and schools, our findings suggest that such educational disparities across states are increasing over time.

Prova Brasil data are used to calculate the IDEB, which in turn is used as an instrument of results-based accountability for schools, municipalities, and states. One could ask whether Prova Brasil testing is affected by the phenomenon known as "grade inflation". If such a phenomenon occurs in the Prova Brasil scores, the results presented in this article may be affected.

One important limitation of our study is that the quantitative evidence comes from cross-sectional data modelling. The resulting inability to include measures of students’ prior achievement for more than one year is an obstacle to better disentangling school effects from students’ social origin. With longitudinal data, the use of value-added models would be a relevant asset. Data collected by INEP allow such a methodological approach to be implemented (Ferrão et al., 2018; Ferrão, 2022a, 2022b). However, its implementation depends on the rules in place for data privacy protection, which imply the use of the safety room in Brasília for microdata access. This entails a logistical requirement that is difficult to overcome without funding geared towards the purpose. We expect to be able to pursue this topic of research in the near future. Given our findings on racial effects, the topic of school racial composition should also be investigated in depth.

Nevertheless, our findings have major implications for researchers and policymakers alike. Too often education policies are not based on the best available research evidence. Since the evidence reported in this paper is based on ten years of data, the patterns identified overcome a usual barrier in social science, namely the lack of reproducibility of research and, in turn, its flawed use. Thus, the evidence presented in our work may be used to inform policy in its effort to reduce educational inequalities, taking into account pertinent social, political, and economic realities at the level of country, states, municipalities, and schools.

Availability of data and materials

The datasets analysed during the current study are available in the INEP—Instituto Nacional de Estudos e Pesquisas Educacionais Anísio Teixeira repository,




We thank Francisco Soares and anonymous referees for valuable comments and suggestions.


This paper is based on the second author’s postdoc fellowship activities, which took place at the University of Beira Interior, Portugal, financed by CAPES Foundation—Brazil (PVE88881.169888/2018-01) and supported in part by the project CNPq—Brazil (440172/2017-9). The first author was partially supported by CEMAPRE/REM—UIDB/05069/2020 FCT/MCTES through national funds.

Author information




Maria Eugénia Ferrão (MEF) and Maria Teresa Alves (MTA) conceptualized the study and conducted the literature review together. MTA mostly conducted the descriptive statistics, and MEF mostly conducted the statistical modelling. They separately wrote parts of the first version of the manuscript. MEF and MTA interpreted the findings together. Both authors are accountable for the accuracy and integrity of the study. Both authors read and approved the final manuscript.

Corresponding author

Correspondence to Maria Eugénia Ferrão.

Ethics declarations

Ethics approval and consent to participate

Not applicable.

Consent for publication

The authors consent to the publication of the manuscript in Large-scale Assessments in Education.

Competing interests

None declared.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.



Appendix

Let \(P\left(y_{ijkl}=1\right)\) be the probability of a successful trajectory for student i, who attends school j, located in municipality k in federal state l, given by the equation and assumptions for model 2 as follows:

$$\begin{aligned}\log\left[\frac{P\left(y_{ijkl}=1\right)}{1-P\left(y_{ijkl}=1\right)}\right]={}&\beta_{0jkl}+\beta_{1}x_{1(ijkl)}+\dots+\beta_{8}x_{8(ijkl)}\\&+\beta_{9}x_{7(ijkl)}\times x_{2(ijkl)}+\beta_{10}x_{7(ijkl)}\times x_{3(ijkl)}+\beta_{11}x_{7(ijkl)}\times x_{4(ijkl)}+\beta_{12}x_{7(ijkl)}\times x_{5(ijkl)}\\&+\beta_{13}x_{8(ijkl)}\times x_{2(ijkl)}+\beta_{14}x_{8(ijkl)}\times x_{3(ijkl)}+\beta_{15}x_{8(ijkl)}\times x_{4(ijkl)}+\beta_{16}x_{8(ijkl)}\times x_{5(ijkl)}\\&+\beta_{17}x_{17(ijk)}+\beta_{18}x_{18(ijk)}\end{aligned}\tag{4}$$
$$\beta_{0jkl}=\beta_{0}+u_{0jkl}+v_{0kl}+w_{0l}.$$
$$u_{0jkl}\sim N\left(0,\sigma_{u0}^{2}\right).$$
$$v_{0kl}\sim N\left(0,\sigma_{v0}^{2}\right).$$
$$w_{0l}\sim N\left(0,\sigma_{w0}^{2}\right).$$

In model 1, Eq. (4) reduces to the first line.
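As a reading aid for Eq. (4), the following minimal Python sketch evaluates the linear predictor and converts it into a probability. It assumes, based on the description of model 2 in the main text, that x2 to x5 are the race/skin colour indicators (White as reference) and that x7 and x8 are the performance z-scores; this mapping, the function name, and all coefficient values are illustrative assumptions rather than the authors' estimates.

```python
import math

RACE_INDICES = (2, 3, 4, 5)   # assumed: x2..x5 are race/skin colour indicators
PERFORMANCE_INDICES = (7, 8)  # assumed: x7, x8 are performance z-scores


def success_probability(x, beta, u=0.0, v=0.0, w=0.0):
    """Evaluate Eq. (4); x and beta are dicts keyed by the subscripts in the equation."""
    eta = beta[0] + u + v + w                        # random intercept beta_0jkl
    eta += sum(beta[m] * x[m] for m in range(1, 9))  # main effects x1..x8
    b = 9
    for p in PERFORMANCE_INDICES:                    # interactions beta_9..beta_16
        for r in RACE_INDICES:
            eta += beta[b] * x[p] * x[r]
            b += 1
    eta += beta[17] * x[17] + beta[18] * x[18]       # school composition terms
    return 1 / (1 + math.exp(-eta))                  # inverse logit


# Hypothetical coefficients and covariates, for illustration only.
beta = {m: 0.0 for m in range(19)}
beta.update({0: 1.0, 7: 0.5, 9: -0.1})
student = {m: 0.0 for m in range(1, 19)}
student.update({2: 1.0, 7: 1.0})  # x2 = 1 and performance one SD above the mean
print(round(success_probability(student, beta), 3))
```

The school, municipality, and state residuals (u, v, w) default to zero, which corresponds to a student in an average school of an average municipality and state.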

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit


About this article


Cite this article

Ferrão, M.E., Alves, M.T.G. The relationship between students’ socio-demographics and the probability of grade repetition in Brazilian primary education: is it decreasing over time?. Large-scale Assess Educ 11, 12 (2023).
