
An IERI – International Educational Research Institute Journal

The relevance of the school socioeconomic composition and school proportion of repeaters on grade repetition in Brazil: a multilevel logistic model of PISA 2012


The paper extends the literature on grade repetition in Brazil by (a) describing and synthesizing the main research findings and contributions since 1940, (b) enlarging the understanding of the inequity mechanism in education, and (c) providing new findings on the effects of the school socioeconomic composition and school proportion of repeaters on the individual probability of grade repetition. Based on the analyses of empirical distributions and multilevel logistic modelling of PISA 2012 data, the findings indicate that higher student socioeconomic status is associated with lower probability of repetition, there is a cumulative risk of repetition after an early repetition, the school socioeconomic composition is strongly correlated with the school proportion of repeaters, and both are related to the individual probability of repetition. The results suggest the existence of a pattern that cumulatively reinforces the effects of social disadvantage, in which the school plays a central role.


Grade repetition is the practice of requiring a student who has been in a given grade level for a school year to continue at that level for another year. It is found in many education systems, in both developed and developing countries (e.g. Brophy 2006). The international literature has shown that grade repetition is a major issue in the debate on how to improve education (Allen et al. 2009; Goos et al. 2013; Hill 2014). The causes and consequences of grade repetition in Brazil have been studied since the middle of the twentieth century (e.g. Almeida 1957; Freitas 1940, 1947). The scientific production has been quite extensive and has mainly been published in Brazilian journals (e.g. Fernandes and Natenzon 2003; Freitas 2002; Gomes 2005; Riani et al. 2012; Silva and Davis 1993). For the first time in Brazil, and probably in Latin America, Freitas (1940) analysed census data in the form of students’ flow through the educational system, using the time series 1932–1939 of a 7-year-old child cohort. His analyses showed very high rates of repetition and dropout, i.e., the percentage of 1st grade repeaters varied from 57 to 61% (Freitas 1947). In the 80s, Klein and Ribeiro (1991) and Ribeiro (1991) reported the percentage of repeaters to be between 50 and 59%. In essence, the data showed that fifty years after the first studies on the subject, grade repetition remained at the same severe level. Criticizing the indifference of educational leaders and administrators in the face of this situation, Ribeiro (1991) used the term “pedagogy of repetition” to characterize the main problem in the Brazilian education system of that time. In fact, the Brazilian Ministry of Education was slow to recognize the problem of repetition because of the inappropriate method applied for its quantification, which led it to conclude for so long that the major problem was instead student dropout (e.g., Gomes-Neto and Hanushek 1994). 
By the end of the 90s, according to the analysis of PNAD (Pesquisa Nacional por Amostra de Domicílios [National Household Sample Survey]) data, 96% of the 7–14 year-old population attended school (Ferrão et al. 2002a), and 44% of the students enrolled in school (at the primary, elementary or lower secondary classifications) were too old for their grade level (INEP 1999). In addition, there were substantial regional differences in the data. For instance, the percentage of overage students was 62% in the Northeast region and 31% in the South. Furthermore, grade repetition is traditionally associated with socioeconomic status, with a higher incidence among self-declared black students as well as males. Analysis of the SAEB (Sistema de Avaliação da Educação Básica [System of Basic Education Assessment]) 2001 data regarding 4th grade students in the Southeast region identified 56% of repeaters among black students vs. 31% among whites, along with 41.3% of boys vs. 34.8% of girls (Ferrão et al. 2002a). Several authors have also noted that in every Brazilian region, self-declared black students perform lower than any other race/skin colour group (e.g. Ferrão et al. 2001; Laros 2012; Soares and Alves 2003).

Moreover, descriptive analyses of the SAEB data showed a considerable percentage of 4th grade students who reported that they had already repeated one or more times over their very short schooling trajectory. Considering SAEB (2003), Klein (2006) reported that approximately 60% of 4th graders had no educational delay, while the percentage was 55% in the 8th grade.

For policy purposes, the comparison of individual student performance between repeaters and non-repeaters is a central issue. Specifically, if grade repetition represented the additional time needed for the cognitive development of a given student who, ultimately, after 2 years in the same grade performed at least at the same level as a non-repeater, then this would justify the additional time spent in the same grade level. According to this assumption, repeating a grade could improve the academic achievement of lower-performing students by exposing them to additional teaching. However, the literature to date has failed to confirm this assumption. The performance averages presented by Klein (2006) and shown in Fig. 1 illustrate the reduction in student performance as the schooling delay (number of years) increases.

Fig. 1 Relationship between student’s performance in Maths and number of years late at school. Source: adapted by the authors from the estimates presented by Klein (2006; Table 22)

These results corroborate those obtained by previous research (Barbosa and Fernandes 2001; Ferrão and Beltrão 2001; Ferrão et al. 2001, 2002a, b) based on multilevel modelling of the SAEB 1997, 1999, 2001 data series. Note that by 1999, most overage students were in that situation because of grade repetition. Ferrão and Beltrão (2001) and Ferrão et al. (2002b) demonstrated that the marginal effect of age-grade lag on student’s performance (assessed by standardized tests and scales fitted with item response models) varies randomly across schools according to a second-degree polynomial function. In other words, the evidence from the predictive equation suggests that after a year of repetition, a student’s performance can be between 5 and 45 points lower (compared to the always-promoted students), depending on the school he/she attended. These findings confirm the general conclusion that, even if repetition could contribute to students’ learning, the gain obtained would not be enough because their achievement still remains below the expected mean for that grade. This type of evidence was also stated by Gomes-Neto and Hanushek (1994), who concluded “this [grade repetition] is an expensive policy, and it is quite likely that there are alternative and less costly ways to improve achievement”.

Concerning the early assessment of students at risk for repeating, Ferrão et al. (2002b, p. 58) and Ferrão and Fernandes (2003) reported that most teachers of 4th and 8th graders who had fully taught the syllabus by the date of application of SAEB-2001 taught in classes with a lower proportion of repeating students. Teachers who reported teaching less than half or slightly more than half of the syllabus had a higher proportion of repeating students in their classes. Those authors stated “here it seems to be the educational deficit cumulatively associated to repeating students” and considered that it “is necessary, timely and with continuity that some reinforcement programs be implemented in these classes so that the planned syllabus can be fully taught for, and learned by, all students”. The effect of class composition on student achievement has been evaluated in the literature for a long time (e.g., Hoxby 1998). Research findings suggest that the type of criteria used by school principals and leaders for class composition matters. However, research conducted on the topic in Brazil has been inconclusive about what type of criteria better promote the quality of learning for all students (cf. Alves and Soares 2007; Ferrão et al. 2001). Based on SAEB’s 2001 data, Laros (2012) mentioned that the percentage of repeaters in a class is the most important school-level variable to explain the variability of student performance in Portuguese (mother tongue). Because the multilevel model applied by those authors consisted of two levels, i.e., students clustered in schools, with the percentage of repeaters in class treated as a classroom attribute, the actual effect of class composition on student performance is not yet fully understood.

A literature review on primary education policies over 15 years conducted by Gomes (2005) has shown that accelerated learning projects presented positive results concerning non-retention, yet strong resistances were found as long as educational changes were sought. Fernandes (2004) suggested that the school context, in terms of violence and discipline, influences the decision of school leaders towards the adoption of non-repetition policies. Based on direct contact and reports on the reality of many schools organized in cycles, Alavarse (2009) said that the polarization of pros and cons of automatic promotion or learning cycles is more rhetorical than empirical. The results of the cumulative influence of many programs and changes may be observed in official education statistics. In fact, the performance of the educational system has sharply improved. In 2010, the annual rate of approval, defined as the number of students promoted to the next grade divided by the total number of students enrolled (×100%), was 95.8% for 1st grade, 88.9% for 2nd, 86.2% for 3rd, and 90% in the remaining years of primary education (INEP 1999). From 2007 to 2013, the greatest improvement occurred at the 2nd grade level, suggesting a pattern of repetition depending on the cycles of educational trajectories. For instance, the lowest approval rate occurred for the 6th grade, coinciding with the transition from primary to lower secondary school (see Fig. 2).

Fig. 2 Rate of approval per grade, 2007–2013. Source: INEP (2015) data, elaborated by authors

Despite previous efforts, even schools organized by learning cycles have failed to meet the objectives pursued by their mentors. Several authors have additionally addressed the topic of grade repetition in terms of parents’ acceptance (Jacomini 2010) and its effect on student performance (e.g., Carvalho and Firpo 2014; Ferrão and Beltrão 2001; Ferrão et al. 2001; Ferrão et al. 2002a; Koppensteiner 2014; de Riani et al. 2012). For example, Carvalho and Firpo (2014) evaluated the impact of a non-repetition policy on the distribution of students’ academic achievement in elementary Brazilian public schools. Their results revealed that grade repetition did not seem to increase students’ efforts, particularly for older students, corroborating results obtained by other authors. Thus, based on the short-term effects, the cumulative research findings give no support in favour of grade repetition as the educational solution for students who failed to meet learning objectives within a given grade level. Nonetheless, most public schools in Brazil have continued the practice. These days, grade repetition is the main issue concerning the guarantee of quality education provided to all students. Brazil reached the full coverage of education for the 7–14-year-old population in the 90s. The issue now is a matter of educational effectiveness of the Brazilian educational system.

This paper adds to the existing literature in a number of ways: (1) we show that early repetition is associated with late repetition; (2) we demonstrate a pattern that strongly contributes to the reinforcement of the cumulative effects of social disadvantage; and (3) we estimate the extent to which school socioeconomic characteristics and peer composition influence the individual probability of repetition. In this way, we contribute to answering the following research questions: How much is the probability of repetition dependent upon individual and contextual factors of social disadvantage? How much early repetition is likely to influence late repetition? Are such probabilities related to the proportion of school-level repeaters within a given grade? Thus, we calculate the student conditional distribution of repetition in lower secondary education given the repetition in primary education, calculate the relative risk of early repetition comparing the first decile of socioeconomic status to the top decile, and finally, apply a multilevel logistic model to investigate these relationships while controlling for students’ demographic variables, school composition in terms of socioeconomic status and concentration of repeaters, and other school characteristics that are beyond school educational policy and intervention.

The rest of the paper proceeds as follows: “PISA 2012 data and variables” section describes data and variables. “Multilevel statistical modelling” section specifies the statistical methods in use. “Results and discussion” section reports empirical results and provides discussion.

PISA 2012 data and variables

We used the Programme for International Student Assessment (PISA) 2012 data set (OECD 2014). PISA is a cross-sectional complex survey involving multistage sampling, unequal sampling probabilities and stratification. The target population in each of the 65 PISA 2012 participating countries consisted of 15-year-old students attending educational institutions in grade 7 and higher. It was a two-stage stratified sample design where the primary sampling unit consisted of schools having 15-year-old students. Schools were sampled systematically from the school sampling frame, with probabilities proportional to a measure of the school size, which was a function of the estimated number of PISA-eligible 15-year-old students enrolled in the school. The second sampling unit contained students within the sampled schools. For each country, a target cluster size of typically 35 students was set, so that from each list of students a sample of 35 students was selected with equal probability. For lists of fewer than 35 students, all of the students on the list were selected (OECD 2014, p. 66).
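
The two-stage selection described above can be sketched as follows. This is a minimal illustration and not the OECD's sampling procedure: the school sizes, the number of schools drawn, the seed and the function names are all hypothetical, and the sketch assumes no school is larger than the sampling interval.

```python
import bisect
import random
from itertools import accumulate

TARGET_CLUSTER_SIZE = 35  # typical target cluster size stated in the text


def systematic_pps_schools(sizes, n_schools, rng):
    """Pick school indices systematically, with probability proportional to size.

    Assumes every measure of size is below the sampling interval, so no
    school can be selected twice (real designs handle such schools separately).
    """
    cumulative = list(accumulate(sizes))
    interval = cumulative[-1] / n_schools
    start = rng.random() * interval  # random start in [0, interval)
    picked = []
    for k in range(n_schools):
        point = start + k * interval
        # first school whose cumulative size exceeds the selection point
        index = bisect.bisect_right(cumulative, point)
        picked.append(min(index, len(sizes) - 1))  # guard against rounding
    return picked


def sample_students(student_ids, rng, cluster_size=TARGET_CLUSTER_SIZE):
    """Equal-probability sample of students; short lists are taken in full."""
    if len(student_ids) <= cluster_size:
        return list(student_ids)
    return rng.sample(student_ids, cluster_size)


rng = random.Random(2012)  # fixed seed so the sketch is reproducible
school_sizes = [40, 120, 60, 200, 80, 150, 90, 110, 70, 80]  # hypothetical
selected_schools = systematic_pps_schools(school_sizes, 4, rng)
small_school = sample_students(list(range(20)), rng)   # fewer than 35: all kept
large_school = sample_students(list(range(100)), rng)  # 35 drawn at random
```

Because larger schools occupy a longer stretch of the cumulative size scale, they are more likely to contain a selection point, which is what produces the unequal selection probabilities that the weights later have to correct for.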

Descriptive statistics based on the valid cases of the outcome variable and student and school characteristics are listed in Table 1. The number of Brazilian students that participated in PISA 2012 was 19,204. The table shows that for the outcome variable, grade repetition, the percentage of students that repeated at least once was 37%. Regarding students’ demographic characteristics, Table 1 also shows that 48% of the 15-year-old students were male, and the average age of students who participated in PISA 2012 was 15.9 years (SD = 0.3). The average age at which students started ISCED 1 was 7.2 years (SD = 2.3). Eighty-one percent of students had attended ISCED 0 or had a pre-school education. The percentage of students who reported grade repetition during ISCED 1 was 22%, while for ISCED 2 the percentage was 20% (see footnote 1). The highest occupational status of the parents (HISEI) (Ganzeboom 2010) in Brazil varied between 11 and 89 points, with an average of 42.1 and a standard deviation of 22.

Table 1 Student and school demographics for Brazil

Seventy-one percent of students attended schools in which there was no diverse ethnic background, and 87% of students attended public schools. The variable “Learning Hindrance” represents the perception of the student on ethnic diversity at school and it is used in the model as proxy for ethnic diversity at school.

The HISEI PISA school average was 41.9 (SD = 12.3). Finally, the school proportion of repeaters was 0.38 on average, while the average school proportion of repeaters at ISCED1 was 0.23 and 0.21 at ISCED2. All of the results reported take into account PISA’s complex survey design.

Multilevel statistical modelling

Given the research questions, we are most interested in who repeats a grade most frequently and in the multilevel logistic modelling of student repetition to estimate the relationship between the school composition and peer effects on the student’s probability of repetition. Multilevel modelling is especially suitable for the purpose of this paper because it accounts for the hierarchical structure of students within schools while avoiding aggregation bias and the mis-estimation of standard errors (Bryk and Raudenbush 1992; Ferrão 2003; Goldstein 2003). In fact, selected students attending the same school cannot be considered as independent observations because they are usually more similar to one another than to students attending other schools. Multilevel modelling accounts for that dependency by partitioning the total variance in the data into variation between and within school-level units. The variance partition coefficient, also known as the intra-class correlation coefficient in the literature, quantifies the proportion of the total variance accounted for at each hierarchical level. Goldstein et al. (2002) presented four different methods to measure the variance partition coefficient when the response variable is discrete.
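
As an illustration of the variance partition coefficient for a binary response, the sketch below uses the latent-variable formulation, in which the level-one variance of a logit model is fixed at π²/3. The text does not say which of the four methods was applied in this paper, so the choice of method, the function name and the input values are assumptions made only for this example.

```python
import math


def vpc_latent(sigma2_u: float) -> float:
    """Latent-variable VPC for a two-level random-intercept logit model.

    sigma2_u is the between-school (level-two) variance on the logit scale.
    """
    level1_var = math.pi ** 2 / 3  # variance of the standard logistic distribution
    return sigma2_u / (sigma2_u + level1_var)


# Hypothetical level-two variances, not estimates from the paper
vpc_examples = {s2: vpc_latent(s2) for s2 in (0.0, 0.5, 1.0, math.pi ** 2 / 3)}
```

Under this formulation a level-two variance equal to π²/3 would mean that half of the latent variation lies between schools, and a variance of zero would mean that schools do not differ once the covariates are taken into account.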

The PISA sampling design involves unequal probabilities of selection at each stage of the multistage sampling. If standard multilevel modelling is used without incorporating such a sampling design, the estimators of parameters may be biased. Pfeffermann et al. (1998) discussed the use of sampling weights to rectify this problem in the context of continuous response variables. Those authors considered two different approaches. The first approach uses the probabilities of selection, while the second approach scales the selection weights, so that the scaled PWIGLS estimators are presented with two different scaling methods. The authors recommended “the weighted scaling method 2 as a means of reducing bias caused by informative sampling. In our simulation study these estimators perform fairly well and the associated variance estimators display remarkably little bias. […] the estimators proposed perform very well for all the sampling schemes and estimators considered”. They also mentioned that “It is often possible to control for such bias by including relevant ‘design variables’ as covariates in the multilevel model, but this may not be possible because of data availability or not be desirable for scientific reasons”. In this paper, we included the variable “Total School Enrolment” (SCHSIZE) as a covariate and measure of the school size, but the estimates obtained were not statistically significant. Thus, we used the estimation procedure implemented in MLwiN version 2.31, which experimentally extends the scaling method A presented by Pfeffermann et al. (1998) for binary responses, with a robust or ‘sandwich’ estimator for standard errors. We also used the scaling methods presented by Rabe-Hesketh and Skrondal (2006), which are implemented in Stata version 12.1. The estimates obtained were nearly the same, and from the inferential perspective both methodological approaches lead to the same conclusion at the 5% level of significance.
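
To make the idea of scaling level-one weights concrete, the sketch below shows the two rescalings as they are commonly described in the literature following Pfeffermann et al. (1998) and Rabe-Hesketh and Skrondal (2006): one makes the within-school weights sum to an "effective" cluster size, the other to the actual number of sampled students. It does not reproduce the MLwiN or Stata implementations, and the function names and weights are hypothetical.

```python
def scale_method_1(weights):
    """Rescale so the weights sum to the effective cluster size.

    Effective size = (sum of weights)^2 / (sum of squared weights).
    """
    factor = sum(weights) / sum(w * w for w in weights)
    return [w * factor for w in weights]


def scale_method_2(weights):
    """Rescale so the weights sum to the actual cluster sample size."""
    factor = len(weights) / sum(weights)
    return [w * factor for w in weights]


# Hypothetical conditional student weights within one sampled school
student_weights = [1.0, 2.0, 3.0, 4.0]
scaled_1 = scale_method_1(student_weights)
scaled_2 = scale_method_2(student_weights)
```

When all students in a school have the same weight, both rescalings return weights of one, so the weighted and unweighted within-school contributions coincide.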

We fitted two models with the logit link function, which differ in the set of covariates included in the linear predictor. The “Appendix” section contains the equations for each estimated model. Model 1 included level-one variables, namely gender, age and highest parental occupational status (HISEI) as a proxy for student’s socioeconomic status, and level-two variables related to the school composition of socioeconomic status (school average of HISEI), school ownership, and percentage of non-existence of diverse ethnic backgrounds (no learning hindrance). Model 2 included, in addition, the proportion of students per school who had repeated at least once. We also used the index of economic, social and cultural status (ESCS) as a proxy for student’s socioeconomic status, instead of HISEI. The results reported in the next section remain the same regardless of the proxy used. Furthermore, the correlation between HISEI and ESCS is 0.8.
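
For orientation, a generic two-level random-intercept logistic model with the covariates listed above could be written as below. This is a sketch of the standard form only: the authors' own equations are those in the Appendix, and the symbols, subscripts and variable labels used here are assumptions introduced for illustration (student i in school j).

```latex
% Model 1 (generic sketch): y_{ij} = 1 if student i in school j repeated at least once
y_{ij} \sim \mathrm{Bernoulli}(\pi_{ij}), \qquad
\operatorname{logit}(\pi_{ij}) = \log\frac{\pi_{ij}}{1-\pi_{ij}}
  = \beta_0 + \beta_1\,\mathrm{male}_{ij} + \beta_2\,\mathrm{age}_{ij}
  + \beta_3\,\mathrm{HISEI}_{ij}
  + \beta_4\,\overline{\mathrm{HISEI}}_{j} + \beta_5\,\mathrm{public}_{j}
  + \beta_6\,\mathrm{nohindrance}_{j} + u_j,
\qquad u_j \sim N(0, \sigma^2_u)

% Model 2 (generic sketch): Model 1 plus the school proportion of repeaters
\operatorname{logit}(\pi_{ij}) = \text{(linear predictor of Model 1)}
  + \beta_7\,\mathrm{proprep}_{j}
```

In this form, the exponential of each coefficient is the odds ratio associated with a one-unit increase in the corresponding covariate, holding the others and the school effect fixed.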

Results and discussion

Conditional distribution of repetition and the social disadvantage

The conditional distributions of repetition given the quintile of HISEI presented in Table 2 show that a higher socioeconomic status was related to a lower probability of repetition. Regardless of the grade and number of years of repetition, at the 1st quintile of HISEI, the probability was 0.26, while in the top quintile, it was 0.15. The detailed comparison of these probabilities by educational level ISCED 1, which are 0.33 and 0.11, shows that the relative risk of repetition is three times higher in the disadvantaged group. In other words, a student from the more socioeconomically disadvantaged group was three times more likely to repeat than his/her peer from the advantaged group.

Table 2 Conditional distributions of repetition given student’s socioeconomic level

The distribution of late repetition (at ISCED 2) conditional on early repetition (at ISCED 1) presented in Table 3 shows that in the group of students who were always promoted throughout the ISCED 1 level of education, 91% remained as such, while 9% repeated a grade at ISCED 2. In the group of students that had experienced early repetition, 62% did not repeat again and 38% repeated a grade at ISCED 2, suggesting a cumulative risk of failure after early repetition.

Table 3 Distribution of repetition at ISCED 2 given repetition at ISCED 1

In other words, the probability of repeating a grade at ISCED 2 was 4.3 times greater when the student had repeated at ISCED 1 than when the student had not. The respective odds ratio was 6.2. If grade repetition was an effective way to overcome learning deficits at an early age, both the ratio of probabilities and the odds ratio would be closer to 1 instead of 4.3 and 6.2, respectively.
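
The ratio of probabilities and the odds ratio quoted above can be checked from the rounded percentages given in the text. The sketch below is only a worked calculation; the function and variable names are ours. With the rounded inputs the probability ratio comes out at about 4.2, so the reported 4.3 was presumably computed from unrounded values.

```python
def relative_risk(p_group: float, p_reference: float) -> float:
    """Ratio of two probabilities."""
    return p_group / p_reference


def odds(p: float) -> float:
    """Probability of the event divided by the probability of no event."""
    return p / (1.0 - p)


def odds_ratio(p_group: float, p_reference: float) -> float:
    """Ratio of the odds in two groups."""
    return odds(p_group) / odds(p_reference)


# Rounded probabilities of repeating at ISCED 2, as quoted in the text
p_after_early_repetition = 0.38
p_always_promoted = 0.09

rr_isced2 = relative_risk(p_after_early_repetition, p_always_promoted)
or_isced2 = odds_ratio(p_after_early_repetition, p_always_promoted)

# ISCED 1 repetition, bottom vs. top HISEI quintile, as quoted in the text
rr_quintiles = relative_risk(0.33, 0.11)
```

The odds ratio is larger than the ratio of probabilities because repetition is not a rare event in either group; the two measures only approximate each other when both probabilities are small.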

Estimates of the multilevel logistic model

Table 4 presents the estimates (Est.) and standard errors (S.E.) of the fixed and random parameters of Model 1 and Model 2 obtained in MLwiN. Model 1 and Model 2 differ only in the inclusion of the variable school proportion of repeaters. The estimates were also obtained using Stata for Models 1 and 2. Since the results obtained in Stata were similar to those estimated in MLwiN, leading to the same conclusion at the 5% level of significance, we opted to present only the results obtained in MLwiN.

Table 4 Logistic multilevel model estimates

The table shows that the relationships between the student variables, such as gender, age, socioeconomic status and probability of repetition, were statistically significant, and this remained so according to the evidence given by Model 2. The odds ratio per gender was 1.5, i.e., the probability of repetition divided by the probability of non-repetition in the group of male students was 1.5 times larger than in the group of female students. The odds ratio was also 1.5/year of delay and very close to one per unit of HISEI.

The fixed parameter of school socioeconomic composition was significantly different from zero, showing a negative association with the individual probability of repetition. That is, a student in a school that serves disadvantaged population was more likely to be a repeater than if he/she was in a school attended by more affluent students. Neither the type of school (public vs. private) nor the absence of diverse ethnic backgrounds at school showed statistically significant effects on the probability of repetition when controlled by all other variables.

Interpreting the estimates related to the non-existence of diverse ethnic backgrounds at school requires deeper reflection. Initially, considering the previous Brazilian studies about race and inequalities in school performance already mentioned, we expected the parameter related to Learning Hindrance to be statistically significant. However, the null hypothesis that the parameter relating Learning Hindrance to the probability of repetition is equal to zero was not rejected at the 5% level. The school attribute of ethnic diversity is not the same as the student’s perception of ethnic diversity.

In the same way, the variable “sch_public” (school type) was not statistically significant. Thus, the estimates suggest that the probability of grade repetition was not a matter of school type in this sample. In other words, changing from public to private schools does not overcome the problem of repetition. As mentioned in the introduction, the average of student performances in large scale assessments, such as the SAEB, is higher in private than in public schools (Barbosa and Fernandes 2000; Ferrão et al. 2001; Soares et al. 2001). Moreover, the literature suggests that school type is associated with the socioeconomic school characteristics, the quality of school infrastructure and equipment, and other intra-school variables (Ferrão and Fernandes 2003). Regarding the Southeast region, Barbosa and Fernandes (2001) showed that while controlling for all such intra-school variables, the type of school was not statistically significant at the 5% level. In addition, students from poorer sectors of society are over-represented in public schools, even when these are located in medium or upper class neighbourhoods, whereas the majority of students from families with higher purchasing power often attend private schools (e.g., Alves and Soares 2007). Because grade repetition is related to student achievement, these findings add to the existing literature about school type. Hence, we found that the individual probability of repetition is likely to be more related to school socioeconomic composition and peer effects, regardless of whether the school is private or public. In this sense, Table 4 shows that the school socioeconomic composition (school_hisei) and the school proportion of repeaters (school_proprep) had a strong influence on the individual probability of repetition. The latter variable is the most relevant for explaining level two variance of grade repetition. 
When the model includes the variable “school_proprep” the coefficient associated with “school_hisei” becomes statistically equal to zero and the same happens with the level two variance. Figure 3 plots the predictive values of the probability of repetition by the school proportion of repeaters. We can observe more predictive uncertainty in the middle of the scale than in the extremes. The odds ratio associated with the estimate of school proportion of repeaters was almost 214, which was influenced by the predictive probabilities at the extremes of the scale. For instance, if the predictive probability of repetition was 0.917 at a school with 100% repeaters and the predictive probability was 0.049 at a school with 0% of repeaters, the resulting odds ratio would be 214.
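
The arithmetic behind the odds ratio of about 214 can be reproduced from the two predictive probabilities quoted above. The sketch below is a worked check only; the variable names are ours, and the logit-scale coefficient it derives is implied by these two rounded probabilities rather than copied from Table 4.

```python
import math


def odds(p: float) -> float:
    """Probability of repetition divided by probability of non-repetition."""
    return p / (1.0 - p)


p_all_repeaters = 0.917  # predictive probability at a school with 100% repeaters
p_no_repeaters = 0.049   # predictive probability at a school with 0% repeaters

odds_ratio_proprep = odds(p_all_repeaters) / odds(p_no_repeaters)

# On the logit scale, moving the proportion from 0 to 1 is a one-unit change,
# so the implied coefficient is the natural log of the odds ratio.
implied_coefficient = math.log(odds_ratio_proprep)
```

Because the covariate is a proportion, a one-unit change spans its whole range (from no repeaters to all repeaters), which is why the odds ratio looks so large compared with those of the student-level variables.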

Fig. 3 Predictive probability of repetition given the school proportion of repeaters. Source: authors’ computations


This paper studied the phenomenon of grade repetition in Brazil since the 40s. We demonstrated the association between students’ socioeconomic status and the probability of grade repetition, and showed that there is a cumulative risk of future repetition after an early repetition. All of these results are associated with one another and suggest a pattern that reinforces the cumulative effects of social disadvantage. We also found that the individual probability of repetition is likely to be more related to school socioeconomic composition and peer effects, regardless of whether the school is private or public. In other words, the results suggest that the socioeconomic composition and the school proportion of repeaters have a strong influence on the individual probability of repetition. The findings indicate that a student who is in a school with a large percentage of repeaters is more likely to repeat, showing the selectivity power that the culture of repetition might have on individual students throughout their formal education. Perhaps certain schools tend to use the practice of grade repetition more than others, which would suggest that the “pedagogy of repetition” (Ribeiro 1991) still exists.

The quantitative evidence provided is based on PISA 2012 data modelling with the assumption of missing completely at random (Little and Rubin 2002), an assumption which may not be realistic. For now, these results should be taken with caution for policy and practice purposes. Further research is needed to provide guidance to schools concerning other school practices and initiatives that may help explain the variability of repetition across schools. To do so, we must conduct complementary analyses with complex and large-scale data that are collected every 2 years by the Brazilian educational evaluation system. Retention practice varies greatly across countries; thus, similar analyses of countries other than Brazil may shed light on the relation of schools’ socioeconomic composition and the proportion of repeaters with the individual probability of grade repetition. This would help identify educational practices and policies that can be addressed in different countries to tackle the phenomenon of grade repetition.


  1. The computations take into consideration the complex design of the PISA survey. All estimates are computed with the IDB Analyzer ( Standard error (S.E.) corresponds to the square root of the sampling variance.


  • Alavarse, O. M. (2009). A organização do ensino fundamental em ciclos: algumas questões. Revista Brasileira de Educação, 14(40), 35–50. doi:10.1590/S1413-24782009000100004.


  • Allen, C. S., Chen, Q., Willson, V. L., & Hughes, J. N. (2009). Quality of research design moderates effects of grade retention on achievement: a meta-analytic, multi-level analysis. Educational Evaluation and Policy Analysis, 31(4), 480–499. doi:10.3102/0162373709352239.


  • Almeida, J. (1957). Repetência ou promoção automática? Revista Brasileira de Estudos Pedagógicos, 27(65), 3–15.


  • Alves, M. T. G., & Soares, J. F. (2007). Efeito-escola e estratificação escolar: O impacto da composição de turmas por nível de habilidade dos alunos. Educação Em Revista, 45, 25–59. doi:10.1590/S0102-46982007000100003.


  • Barbosa, M. E. F., & Fernandes, C. (2000). Modelo multinível: uma aplicação a dados de avaliação educacional. Estudos Em Avaliação Educacional, 22, 135–153.


  • Barbosa, M. E. F., & Fernandes, C. (2001). A escola brasileira faz diferença? Uma investigação dos efeitos da escola na proficiência em matemática dos alunos da 4a série. In C. Franco (Ed.), Avaliação, Ciclos e Promoção na Educação (pp. 155–172). Porto Alegre: Artmed Editora.


  • Brophy, J. (2006). Grade repetition. Paris: UNESCO.

  • Bryk, A., & Raudenbush, S. W. (1992). Hierarchical linear models: Applications and data analysis methods. Thousand Oaks, CA: SAGE Publications.


  • Carvalho, S., & Firpo, S. (2014). O regime de ciclos de aprendizagem e a heterogeneidade de seus efeitos sobre a proficiência dos alunos. Economia Aplicada, 18(2), 199–214. doi:10.1590/1413-8050/ea374.


  • de Riani, J., Silva, L. R., Da, V. C., & Soares, T. M. (2012). Repetir ou progredir? Uma análise da repetência nas escolas públicas de Minas Gerais. Educação E Pesquisa, 38(3), 623–636. doi:10.1590/S1517-97022012000300006.


  • Fernandes, C. O. (2004). Escolas em ciclos: Particularidades evidenciadas a partir dos dados do Saeb. Estudos Em Avaliação Educacional, 15(30), 83–106.

  • Fernandes, R., & Natenzon, P. E. (2003). A evolução recente do rendimento escolar das crianças brasileiras: Uma reavaliação dos dados do SAEB. Estudos Em Avaliação Educacional, 28, 3–22.

  • Ferrão, M. E. (2003). Introdução aos modelos de regressão multinível em educação. Campinas: Komedi.

  • Ferrão, M. E., & Beltrão, K. (2001). Tracing schools which do not penalise over age students. In 27th Annual conference of the international association for educational assessment. Rio de Janeiro.

  • Ferrão, M. E., Beltrão, K., Barbosa, M. L., & Santos, D. (2002a). Aluno repetente: perfil, condições de escolarização e identificação dos fatores sociais. Brasília: Relatório técnico.

  • Ferrão, M. E., Beltrão, K. I., Fernandes, C., Santos, D., Suárez, M., & Andrade, A. C. (2001). O SAEB–Sistema Nacional de Avaliação da Educação Básica: objetivos, características e contribuições na investigação da escola eficaz. Revista Brasileira de Estudos de População, 18, 111–130.

  • Ferrão, M. E., Beltrão, K. I., & Santos, D. (2002b). Políticas de não-repetência e a qualidade da educação: evidências obtidas a partir da modelagem dos dados da 4a série do SAEB-99. Estudos Em Avaliação Educacional, 26, 47–73.

  • Ferrão, M. E., Beltrão, K. I., & Santos, D. P. (2002c). O impacto de políticas de não-repetência sobre o aprendizado dos alunos da 4a série. Pesquisa E Planejamento Econômico, 32(3), 495–514.

  • Ferrão, M. E., & Fernandes, C. (2003). O efeito-escola e a mudança - dá para mudar? Evidências da investigação brasileira. REICE - Revista Electrónica Iberoamericana Sobre Calidad, Eficacia Y Cambio En Educación, 1(1).

  • Freitas, M. A. T. (1940). A dispersão demográfica e escolaridade. Revista Brasileira de Estatística, 1(3), 497–527.

  • Freitas, M. A. T. (1947). A escolaridade média no ensino primário brasileiro. Revista Brasileira de Estatística, 8(30/31), 395–474.

  • Freitas, L. C. (2002). A internalização da exclusão. Educação & Sociedade, 23(80), 299–325. doi:10.1590/S0101-73302002008000015.

  • Ganzeboom, H. (2010). A new International Socio-Economic Index [Isei] of occupational status for the International Standard Classification of Occupation 2008 [Isco-08] constructed with data from the ISSP 2002–2007. In Annual Conference of International Social Survey Programme. Lisbon.

  • Goldstein, H. (2003). Multilevel statistical models (3rd ed.). London: Edward Arnold.

  • Goldstein, H., Browne, W., & Rasbash, J. (2002). Partitioning variation in multilevel models. Understanding Statistics, 1(4), 223–231. doi:10.1207/S15328031US0104_02.

  • Gomes, C. A. (2005). Desseriação escolar: Alternativa para o sucesso? Ensaio: Aval. Pol. Públ. Educ., 13(46), 11–38. doi:10.1590/S0104-40362005000100002.

  • Gomes-Neto, J. B., & Hanushek, E. A. (1994). Causes and consequences of grade repetition: Evidence from Brazil. Economic Development and Cultural Change, 43(1), 117. doi:10.1086/452138.

  • Goos, M., Van Damme, J., Onghena, P., Petry, K., & de Bilde, J. (2013). First-grade retention in the Flemish educational context: Effects on children’s academic growth, psychosocial growth, and school career throughout primary education. Journal of School Psychology, 51(3), 323–347. doi:10.1016/j.jsp.2013.03.002.

  • Hill, A. J. (2014). The costs of failure: Negative externalities in high school course repetition. Economics of Education Review, 2524, 91–105. doi:10.1016/j.econedurev.2014.10.002.

  • Hoxby, C. M. (1998). The effects of class size and composition on student achievement: New evidence from natural population variation (NBER Working Paper Series No. 6869). Cambridge, MA.

  • INEP. (1999). SAEB 97 Primeiros Resultados. Brasília.

  • Jacomini, M. A. (2010). Por que a maioria dos pais e alunos defende a reprovação? Cadernos de Pesquisa, 40(141), 895–919. doi:10.1590/S0100-15742010000300012.

  • Klein, R. (2006). Como está a educação no Brasil? O que fazer? Ensaio: Aval. Pol. Públ. Educ., 14(51), 139–172. doi:10.1590/S0104-40362006000200002.

  • Klein, R., & Ribeiro, S. C. (1991). O censo educacional e o modelo de fluxo: o problema da repetência. Revista Brasileira de Estatística, 52(197/198), 5–45.

  • Koppensteiner, M. F. (2014). Automatic grade promotion and student performance: Evidence from Brazil. Journal of Development Economics, 107, 277–290. doi:10.1016/j.jdeveco.2013.12.007.

  • Laros, J. A. (2012). Fatores associados ao desempenho escolar em Português: um estudo multinível por regiões. Ensaio: Aval. Pol. Públ. Educ., 20(77), 623–646.

  • Little, R. J. A., & Rubin, D. B. (2002). Statistical analysis with missing data (2nd ed.). New Jersey: John Wiley & Sons Inc.

  • OECD. (2014). PISA 2012 Technical report Programme for International Student Assessment (December 2014). Paris.

  • Pfeffermann, D., Skinner, C. J., Holmes, D. J., Goldstein, H., & Rasbash, J. (1998). Weighting for unequal selection probabilities in multilevel models. Journal of the Royal Statistical Society: Series B, 60(1), 23–40.

  • Rabe-Hesketh, S., & Skrondal, A. (2006). Multilevel modelling of complex survey data. Journal of the Royal Statistical Society. Series A: Statistics in Society. doi:10.1111/j.1467-985X.2006.00426.x.

  • Ribeiro, S. C. (1991). A pedagogia da repetência. Estudos Avançados, 12(5), 7–21. doi:10.1590/S0103-40141991000200002.

  • Silva, R. N., & Davis, C. (1993). É proibido repetir. Estudos Em Avaliação Educacional, 7, 5–44.

  • Soares, J. F., & Alves, M. T. (2003). Desigualdades raciais no sistema brasileiro de educação básica. Educação E Pesquisa, 29(1), 147–165.

  • Soares, J. F., Cesar, C. C., & Mambrini, J. (2001). Determinantes de Desempenho dos alunos do ensino básico brasileiro: Evidências do SAEB de 1997. In C. Franco (Ed.), Avaliação, Ciclos e Promoção na Educação (pp. 121–153). Porto Alegre: Artmed Editora.

Authors’ contributions

All authors jointly conceptualized the structure and topic of the paper. MEF drafted the theoretical background, the multilevel statistical modelling section, and the conclusion. PMC drafted the section on the PISA 2012 data and variables and contributed comments to the other sections. MEF and DASM drafted the results and discussion together. All authors read and approved the final manuscript.


Maria Eugénia Ferrão was partially supported by the Project CEMAPRE—UID/MULTI/00491/2013 financed by FCT/MEC through national funds. Daniel Matos was supported by CAPES Foundation, Ministry of Education of Brazil (Post-Doctoral Scholarship, process number 6196/14-4), and by the Federal University of Ouro Preto.

Competing interests

The authors declare that they have no competing interests. PMC declares that the views expressed may not in any circumstances be regarded as stating an official position of the European Commission.

Author information


Corresponding author

Correspondence to Maria Eugénia Ferrão.

Appendix—model equations


Y: student grade repetition,

\( Y\sim Binomial\left( {1, P\left( {y_{ij} = 1} \right)} \right) \),

where \( P\left( {y_{ij} = 1} \right) \), the probability of repetition for student \( i \) in school \( j \), is given by Eq. (1) for Model 1,

$$ P\left( {y_{ij} = 1} \right) = \left( {1 + \exp\left( { - \left( {\beta_{0} + \beta_{1} {\text{boy}}_{ij} + \beta_{2} {\text{age}}_{ij} + \beta_{3} {\text{hisei}}_{ij} + \beta_{4} {\text{school}}\_{\text{hisei}}_{j} + \beta_{6} {\text{school}}\_{\text{public}}_{j} + \beta_{7} {\text{school}}\_{\text{no hindrance}}_{j} + u_{0j} } \right)} \right)} \right)^{ - 1} \tag{1} $$

where \( u_{0j} \sim N\left( {0, \sigma_{u}^{2} } \right) \).

We also have \( E\left( Y \right) = P\left( {y_{ij} = 1} \right) \) and \( VAR\left( {Y|P\left( {y_{ij} = 1} \right) } \right) = P\left( {y_{ij} = 1} \right)\left( {1 - P\left( {y_{ij} = 1} \right) } \right) \).

The same specification holds for Model 2, with Eq. (2) in place of Eq. (1),

$$\begin{aligned} P\left( {y_{ij} = 1} \right) &= \Bigl( 1 + \exp \bigl( - ( \beta_{0} + \beta_{1} {\text{boy}}_{ij} + \beta_{2} {\text{age}}_{ij} + \beta_{3} {\text{hisei}}_{ij} + \beta_{4} {\text{school}}\_{\text{hisei}}_{j} + \beta_{5} {\text{school}}\_{\text{proprep}}_{j} \\ & \quad + \beta_{6} {\text{school}}\_{\text{public}}_{j} + \beta_{7} {\text{school}}\_{\text{no hindrance}}_{j} + u_{0j} ) \bigr) \Bigr)^{ - 1} \end{aligned} \tag{2}$$
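As an illustration of how the equations above turn covariates into a probability, the following minimal Python sketch evaluates the inverse-logit of the linear predictor for one student. The coefficient values, the covariate values, and the function names are hypothetical and were chosen only to make the example runnable; they are not the estimates reported in the paper. Leaving `school_proprep` out of the covariates corresponds to the Model 1 form, and including it corresponds to the Model 2 form.

```python
import math

# Hypothetical coefficients for illustration only (NOT the paper's estimates).
# Keys follow the variable names used in the appendix equations.
EXAMPLE_BETAS = {
    "intercept": -1.0,
    "boy": 0.4,
    "age": 0.1,
    "hisei": -0.02,
    "school_hisei": -0.03,
    "school_proprep": 2.0,
    "school_public": 0.3,
    "school_no_hindrance": -0.2,
}


def linear_predictor(betas, covariates, u0j=0.0):
    """Intercept + sum of beta * covariate + school random effect u0j."""
    eta = betas["intercept"] + u0j
    for name, value in covariates.items():
        eta += betas[name] * value
    return eta


def repetition_probability(betas, covariates, u0j=0.0):
    """Inverse-logit of the linear predictor: P(y_ij = 1)."""
    eta = linear_predictor(betas, covariates, u0j)
    # Numerically stable form of 1 / (1 + exp(-eta)).
    if eta >= 0:
        return 1.0 / (1.0 + math.exp(-eta))
    exp_eta = math.exp(eta)
    return exp_eta / (1.0 + exp_eta)


# Hypothetical student in a hypothetical school (Model 2 form).
example_student = {
    "boy": 1,
    "age": 15.8,
    "hisei": 40.0,
    "school_hisei": 45.0,
    "school_proprep": 0.35,
    "school_public": 1,
    "school_no_hindrance": 0,
}

if __name__ == "__main__":
    p = repetition_probability(EXAMPLE_BETAS, example_student)
    print(f"Illustrative P(repetition) = {p:.3f}")
```

The school random effect `u0j` defaults to zero, which corresponds to a school at the mean of the random-effect distribution; passing a nonzero value shifts the log-odds for every student in that school by the same amount.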

Rights and permissions

Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.

About this article

Cite this article

Ferrão, M.E., Costa, P.M. & Matos, D.A.S. The relevance of the school socioeconomic composition and school proportion of repeaters on grade repetition in Brazil: a multilevel logistic model of PISA 2012. Large-scale Assess Educ 5, 7 (2017).
