Spatial variations of school‐level determinants of reading achievement in Italy

Introduction
A well-educated and well-trained population is crucial for the social and economic well-being of a country. Improving students’ achievement can contribute to both economic growth and social development. In recent years, the UN Member States and the European Union have emphasized the necessity to strengthen the educational system promoting efficiency and equity (European Union 2009). In particular, Goal 4 of the 2030 Agenda for Sustainable Development launched by UN Member States in 2015 aims to “ensure an inclusive and equitable quality education and promote lifelong learning opportunities for all”. Equity in education means that the school system is able to offer to each student the same opportunities to achieve the best possible outcome, in terms of academic skills, regardless of gender, ethnic origin, socioeconomic status and where one lives. To reach this goal, monitoring the educational outcome and understanding the factors behind students' and schools' results have become paramount, accentuating the importance assumed by the large-scale assessments, able to guarantee comparable and reliable data on students' achievement. In the last decade, large-scale national and international assessments have been widely used to measure what students know and can do in order to provide an image of the school system's status and are considered as a benchmark for policymakers for planning reforms to improve the performance of the school system (Breakspear 2012; Fullan 2009).
Student achievement is a complex phenomenon, as has been widely demonstrated in the literature. Although student performance is primarily influenced by individual characteristics, the impact of the socioeconomic and cultural environment is also fundamental (Bakker et al. 2007; Marks 2006; Teodorovic 2012). The socioeconomic condition of the territory is central to understanding student outcomes since it affects the family background; the effect, however, is reciprocal: the educational outcome of a country in turn influences its development. Several researchers demonstrated that socioeconomic inequality tends to increase in the presence of inequality in educational outcomes (Checchi and Peragine 2010). Rodríguez-Pose and Tselios (2009) demonstrated that education can affect economic inequality at the regional level. Since human capital plays a central role in the socio-economic development of countries (Hanushek and Kimko 2000; Hanushek and Woessmann 2009; Hanushek and Woessmann 2011), it is important to understand which factors contribute to the divergence in educational outcomes among the regions.
It is well known that one of the main characteristics of the Italian school system is the geographical cleavage between North and South. National and international large-scale assessments reveal that students in the North of Italy outperform students in both Central and South Italy, highlighting above all the worrying situation of students in South Italy from lower secondary school onwards (INVALSI 2018, 2019; OECD 2016). Whereas the Northern regions are characterized by small variability in performance, in the South the variability within and between regions is high, suggesting a complex and manifold reality (Benadusi et al. 2010; Bratti et al. 2007). Benadusi et al. (2010) demonstrated that one of the main factors for inequality in student scholastic performance is the gap among Italian macro-areas (North-West, North-East, Centre, South-West-Islands and South-East). Panichella and Triventi (2014), focusing on educational careers, confirmed the crucial role played by geographical factors in exacerbating educational inequalities. Several studies analysed the strong variations across macro-geographical areas in the levels of students' performance. Agasisti and Cordero-Ferrara (2013) proposed a multilevel analysis applied to PISA 2006 data to study educational disparities across regions in Italy and Spain. Hippe et al. (2018) examined the regional distribution of skills in Italy and Spain, studying the extent of the regional inequalities in PISA 2015 using descriptive statistics and several regression models. Costanzo and Desimoni (2017) explored inequalities in education using a quantile regression approach applied to primary school data from INVALSI large-scale assessments; mathematics and reading scores were regressed on students' characteristics and geographical variables.
The presence of the well-known regional inequalities in education in Italy points out the need to better understand the regional disparities and the connected policy implications.
Previous studies of the geographical disparities in Italy assume that only the level of the student outcome depends on the geographical area, while the effect of the contextual variables on the student outcome is the same across Italy. However, the effect of the contextual variables on student performance may differ across the country (Agasisti et al. 2017), so that different critical factors could be identified in different geographical areas. To better understand the determinants of the mechanism of inequality, it is important to verify whether and how educational predictors have a different effect on student performance according to the geographical location of the schools.
The aim of this study is to investigate how the impact of the contextual variables on academic performance varies by geographic area in Italy, exploiting data aggregated at school level. This work exploits the reading comprehension standardized test administered in Italy by the National Evaluation Institute for the School System (INVALSI) in the school year 2018-2019, focusing on the whole population of students in the last year of lower secondary school, involving more than 500 000 students in more than 5000 schools in Italy. For the first time, the georeferenced data of the schools are used to analyse the inequality of reading achievement in Italy. Our main concern is to examine the extent of the spatial disparities in the relationship between academic achievement and some school-level factors related to inequalities in educational outcomes, moving beyond the regional administrative confines, in order to identify new spatial patterns. Geographically weighted regression (GWR) is used to explore the spatial variations in this relationship. The k-means clustering method is then applied to classify schools into homogeneous regions based on local regression patterns. In terms of policy implications, it is crucial to identify which local factors have the strongest impacts, and where, in order to plan different effective policy strategies. The findings of this paper demonstrate the necessity to design more specific education policies and support this design by identifying the main critical factors for different geographical areas.

Educational system in Italy
The Italian schooling system has been centrally managed and financed since the unification of the country in the nineteenth century. The educational system in Italy is articulated in three main cycles: primary school (grades 1 to 5), lower secondary school (grades 6 to 8) and upper secondary school (grades 9 to 13). The total number of students is about 7.6 million, attending 31 000 schools, of which only 10% are private, though periodically accredited by the Ministry of Education (Italian Ministry of Education 2012). The Ministry of Education in Italy plays a central role in the regulation of the schools' activities, since the public schools are characterized by a low degree of autonomy. Although during the nineties a slow process began to give single schools more control over the organisation of their own teaching activities (laws n. 537/1993 and n. 59/1997), school autonomy is still constrained by the inability to choose the teachers and to manage the budget for human resources. The Ministry of Education has the responsibility to recruit teachers, to allocate them to schools, to determine their wages and to dismiss them. Regarding funding, according to the official statistics of the Italian Ministry of Education (2018), the schools in Italy are mainly financed by the central government: more than 90% of educational expenses are covered by the central government. Only the schools belonging to the Autonomous Provinces of Trento and Bolzano are financed by their own regional funds.
Hence, overall in Italy the school system is characterized by a low degree of autonomy and is conceived as a unified system aiming to create a national identity and to ensure a high degree of homogeneity in student education in each part of the national territory. Despite the centralization of the school system, strong differences in student performance can be found by territorial area.

Background
A topic of great interest in educational research on inequality in the school system is understanding the factors that influence learning outcomes. Understanding the factors that influence student achievement could help educators and policymakers in setting up a support system that allows all students to perform better and that develops equality in the educational system. This study investigates the inequality of the educational system at a geographical level. We aim to identify in which schools and geographical areas an intervention needs to be planned to flatten the effects of some factors associated in the literature with inequality in education, in order to guarantee greater equality. We focused on the composition of the student body from two different points of view: the students' previous learning levels and their socio-demographic characteristics. Moreover, we analysed the effect of school size on the school mean performance, to examine whether the geographical position of the school influences the role of school size in improving achievement.
A fair analysis of the mean school performance takes into account the initial educational attainment of the students. The analysis of the gain between previous and current learning is widespread, in particular in the context of the evaluation of school effectiveness (Heck 2000; Grilli and Rampichini 2009). The effect of previous learning on current learning helps to identify schools that produce a greater or lesser impact on student academic improvement. Moreover, since previous achievement is well known to be one of the most important predictors of student achievement, omitting it could inflate the coefficients of the other variables.
Many studies in the field of education have recognized the importance of the economic, social and cultural status as a determinant of the learning process (Schuetz et al. 2008; Giambona and Porcu 2015; OECD 2016; Gursakal et al. 2016; Benadusi 2016; INVALSI 2019). Agasisti and Vittadini (2012) demonstrated that high variability in the student outcome can be explained by contextual and regional factors; in particular, the composition of the school student body in terms of socio-economic and cultural background (ESCS) matters more than the school's resources, although this effect is mitigated when the effect of the macro-areas of the country is considered. Several researchers recognized that the impact of the ESCS on performance varies across regions (Benadusi et al. 2010; Matteucci and Mignani 2014), suggesting the necessity to analyse the relationship between the ESCS and the territorial conditions in more depth.
One of the major concerns at the global policy level is reducing gender inequality in education; this aim has been recently ratified as one of the seventeen Sustainable Development Goals of the UN. Researchers depict two different realities in the analysis of gender as a determinant of learning: boys outperform girls in the STEM (Science Technology Engineering and Mathematics) disciplines (Robinson and Lubiensky 2011; OECD 2016; Wang and Degol 2017; Contini et al. 2017), but, focusing on reading comprehension test scores, the traditional gender gap is reversed in favour of girls (Department of Education and Skills 2007; Legewie and Di Prete 2012; INVALSI 2019). From a policy perspective, it is important to understand when the gap first shows up and to identify the critical geographical areas. The results of the quantile regression analysis on mathematics and reading performance in Italy performed by Costanzo and Desimoni (2017) suggest the need to analyse the role of gender in a more complex framework than the traditional regression model. Another variable widely associated with inequalities in educational outcomes is immigrant status (Azzolini et al. 2012; Schnell and Azzolini 2015). Azzolini and Barone (2013) showed that immigrants' scholastic adaptation in Italy follows heterogeneous paths, suggesting an incomplete integration of immigrants into society. Agasisti and Vittadini (2012) showed that a high concentration of immigrant students within schools is associated with lower performance, suggesting a negative peer effect.
By contrast, the literature does not agree on the direction of the effect of school size on students' outcomes. Some studies based on American evaluations found that student performance decreases as school size grows (Lee and Loeb 2000; Wasley et al. 2000), whereas studies based on other countries claimed a positive effect of larger schools (Luyten 2014; Scheerens et al. 2014). Moreover, several studies demonstrated that the effects of small schools on student performance and engagement are heterogeneous across student subgroups (Leithwood and Jantzi 2009; Weiss et al. 2010; Schwartz et al. 2013).

Data
The National Evaluation Institute for the School System (INVALSI) annually carries out large-scale survey assessments in Italy to monitor students' achievement in Reading (reading comprehension and grammatical knowledge), Mathematics and English Language (reading and listening comprehension). In particular, INVALSI standardized tests are administered to students attending the 2nd and the 5th grade of primary school, the 8th grade of lower secondary school, and the 10th and the 13th grade of upper secondary school, involving about 3 000 000 students within approximately 15 000 schools. In 2018, INVALSI shifted from paper-and-pencil to computer-based tests for the assessment of the competences of students in lower and upper secondary school. Moreover, from 2018 the INVALSI standardized test is compulsory at the end of lower secondary school for all students to be admitted to the state final exam. The computer-based administration and the compulsory nature of the test provided a more consistent and cleaner dataset and made it possible to minimize cheating and, thus, to avoid cheating-correction procedures (Longobardi et al. 2018).
In addition to the standardized test, INVALSI requires each student to complete a questionnaire after the test, which collects socio-demographic variables regarding the student.
Schools' secretarial offices provide INVALSI with further information about the characteristics of classes and schools (e.g. the number of classes within the school, the number of school buildings and the school address).
In this study, we focused on the reading standardized tests administered by INVALSI at the end of lower secondary school in the school year 2018-2019. In 2018-2019, the INVALSI test was administered to a population of 542 689 students in 5761 schools. We considered the data aggregated at school level and focused on schools with at least 20 students, a student participation rate higher than 80% and an available geospatial location. The number of schools analysed in this study is 5520.
The students' performance in the Italian INVALSI test is measured using the Weighted Likelihood Estimates (WLE) obtained from the Rasch model (for more information, a technical report is available on the official website: http://invalsi-areaprove.cineca.it/).
As predictors of the students' performance, we selected a set of common variables to represent the socio-demographic background of the school, one variable to evaluate the impact of the students' previous knowledge on the school mean performance and two variables to evaluate the effect of the school size.
The socio-demographic variables at school level included in the analysis are those usually associated with inequalities in the educational outcome, namely the percentage of immigrant students, the percentage of females, the percentage of late-enrolled students (i.e. students who enrolled at least 1 year after the age of 6 or repeated one or more years) and the socio-economic background. The index of economic, social and cultural status (ESCS) is based on the following set of variables related to the students' family background: parents' occupation, parents' education and the number of books at home, which is a proxy of the family educational resources. The index is derived by INVALSI following an OECD standard (OECD 2005). Campodifiori et al. (2008) detail the methodological aspects of the INVALSI procedure to estimate the ESCS.
The indicator of the students' previous achievement is the students' WLE score at reading INVALSI test at the end of primary school in the s.y. 2015-2016 (called WLE in G05) aggregated at school level. Indeed, using the INVALSI student identifier, it is possible to match individual students' test records from year to year, to map the student's academic achievement over time.
The last group of variables are included in the analysis to represent the school size: the number of classes and the number of buildings. This information is derived from the data provided annually by the school's secretarial office.
All the variables included in the models have been standardized with mean 0 and variance 1. Table 1 shows the descriptive statistics of all the standardized variables included in the model as outcome and predictors by geographical area (North West, North East, Centre, South, South and Islands).
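The standardization step can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `standardize`, the toy values and the use of the population standard deviation are assumptions made only for illustration.

```python
import numpy as np

def standardize(X):
    """Rescale each column of X to mean 0 and variance 1."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # population standard deviation (ddof=0), an assumption
    return (X - mean) / std

# Hypothetical school-level values: two predictors observed in four schools.
schools = np.array([[12.0, 3.0], [8.0, 5.0], [15.0, 4.0], [5.0, 8.0]])
Z = standardize(schools)
```

After this transformation the regression coefficients of different predictors are expressed on a common scale, which is what makes their magnitudes comparable across variables and locations.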

Methods
To assess the geographic dimension of the association between schools' performance and predictor variables, this study followed a two-step procedure. First, an ordinary least squares (OLS) linear regression model and a geographically weighted regression (GWR) model were estimated to study the association at the global and local level, respectively.
An OLS model can be expressed as

$$y_i = \beta_0 + \sum_{k} \beta_k x_{ik} + \varepsilon_i$$

where $y_i$ is the dependent variable for the i-th observation, $\beta_0$ is the intercept, $\beta_k$ is the coefficient for the predictor $x_k$, $x_{ik}$ is the k-th predictor for the i-th observation and $\varepsilon_i$ is the error term.
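As an illustration of the global model, the following Python sketch estimates the intercept and the coefficients by least squares. The function name `fit_ols` and the synthetic, noise-free data are assumptions introduced for this example; they are not taken from the study.

```python
import numpy as np

def fit_ols(X, y):
    """Estimate (beta_0, beta_1, ..., beta_p) of y = beta_0 + X beta + error."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # prepend the intercept column
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

# Synthetic, noise-free data with known coefficients (hypothetical values).
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(200, 2))
y_demo = 0.5 + 0.4 * X_demo[:, 0] - 0.1 * X_demo[:, 1]
beta_demo = fit_ols(X_demo, y_demo)
```

A single coefficient vector is returned for the whole dataset, which is the sense in which the OLS model is global.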
GWR allows the relationship between the dependent variable and the predictors to vary geographically by considering locally weighted regression coefficients. In other words, the regression coefficients are variable since they depend on the geographical coordinates of the observations. In the framework of the GWR model (Brunsdon et al. 1998), the relationship between $y_i$ and the predictors can be written as

$$y_i = \beta_0(u_i, v_i) + \sum_{k} \beta_k(u_i, v_i)\, x_{ik} + \varepsilon_i$$
where $(u_i, v_i)$ denotes the geographical coordinates (latitude and longitude) of the i-th observation. Thus, $\beta_0(u_i, v_i)$ represents the intercept of the observation i with spatial coordinates $(u_i, v_i)$ and $\beta_k(u_i, v_i)$ denotes the coefficient for the predictor $x_k$ of the observation i with spatial coordinates $(u_i, v_i)$. Instead of estimating one single regression, the GWR model generates separate regressions, one for each observation. In this way, GWR makes it possible to examine the spatial variation in the relationship between the dependent variable and the predictors. The GWR model calibrates a separate regression for each observation, weighting the other observations on the basis of their proximity to the i-th observation. The coefficients are estimated using weighted least squares, so that closer observations have a greater influence on the estimation of the regression parameters of the i-th observation than remote observations. The weighting is controlled by the weight matrix, a diagonal matrix in which each diagonal element $w_{ij}$ is a function of the location of the observation. In particular, a kernel function determines the weight $w_{ij}$. In this study, we used the Gaussian weighting function as kernel function, thus $w_{ij}$ can be written as

$$w_{ij} = \exp\left[-\frac{1}{2}\left(\frac{d_{ij}}{h}\right)^{2}\right]$$
where $d_{ij}$ is the distance between the i-th observation with spatial coordinates $(u_i, v_i)$ and the j-th observation with spatial coordinates $(u_j, v_j)$, and $h$ is the bandwidth. Choosing a weighting function involves choosing the bandwidth h, and this choice is crucial since it has the strongest influence on the results. The bandwidth controls how quickly the weights decay with the distance from the observation i of interest: observations far beyond the bandwidth receive a weight close to 0. We assumed an adaptive Gaussian kernel, in which the bandwidth is expressed as a number of nearest observations and therefore varies with the density of observation points. The bandwidth has been computed through a calibration process based on the minimization of the Akaike Information Criterion (Fotheringham et al. 2003). The spatial analyses have been performed using the R packages spdep (Bivand and Wong 2018; Bivand et al. 2013) and GWmodel (Gollini et al. 2015; Binbin et al. 2014).
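To make the local estimation concrete, the following Python sketch fits one Gaussian-weighted least squares regression per observation. It is a simplified illustration and not the GWmodel implementation used in the study: the function names, the use of plain Euclidean distance on the coordinates, the fixed number of neighbours (the study calibrates the bandwidth by minimizing the AIC) and the synthetic data are all assumptions.

```python
import numpy as np

def gaussian_weights(dist, h):
    """Gaussian kernel: w = exp(-0.5 * (d / h)^2)."""
    return np.exp(-0.5 * (dist / h) ** 2)

def gwr_fit(coords, X, y, n_neighbours):
    """Return one row of local coefficients (intercept first) per observation."""
    coords = np.asarray(coords, dtype=float)
    y = np.asarray(y, dtype=float)
    n = len(y)
    design = np.column_stack([np.ones(n), np.asarray(X, dtype=float)])
    betas = np.empty((n, design.shape[1]))
    for i in range(n):
        # Euclidean distance from observation i to every observation (assumption).
        dist = np.linalg.norm(coords - coords[i], axis=1)
        # Adaptive bandwidth: distance to the n_neighbours-th nearest other point
        # (index 0 of the sorted distances is the observation itself).
        h = np.sort(dist)[n_neighbours]
        w = gaussian_weights(dist, h)
        xtw = design.T * w  # X^T W_i, with W_i diagonal
        # Weighted least squares: solve (X^T W_i X) beta = X^T W_i y.
        betas[i] = np.linalg.solve(xtw @ design, xtw @ y)
    return betas

# Synthetic example: the slope of x grows from west to east (hypothetical data).
rng = np.random.default_rng(42)
coords = rng.uniform(size=(300, 2))
x = rng.normal(size=300)
true_slope = 0.2 + 0.6 * coords[:, 0]
y = 1.0 + true_slope * x
local_betas = gwr_fit(coords, x, y, n_neighbours=30)
```

Each row of `local_betas` plays the role of the coefficient vector of one school, so the matrix can be mapped or passed on to a clustering step. When the true coefficients do not vary in space, every local fit returns the same values as the global OLS fit.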
In the second step, the k-means clustering method has been applied to the matrix of the spatial coefficients estimated with the GWR, in order to divide schools into areas where performance is homogeneously affected by the analysed predictors, based on local regression patterns. K-means clustering is one of the most popular unsupervised machine learning algorithms; its objective is to group similar data points together and discover underlying patterns. To achieve this objective, k-means looks for a fixed number (k) of clusters in a dataset. Determining the optimal number k of clusters in a dataset is a fundamental issue and a wide variety of indices have been proposed in the literature. To identify the number of clusters, the NbClust package (Charrad et al. 2014) has been used. NbClust provides 30 indices for determining the optimal number of clusters and proposes to the user the best clustering scheme among the different results obtained by varying the number of clusters.
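The clustering step can be sketched in Python as follows. This is a plain implementation of Lloyd's k-means algorithm, given only as an illustration: the deterministic farthest-point initialisation, the function name `kmeans` and the toy coefficient matrix are assumptions, and the choice of k through the NbClust indices is not reproduced here.

```python
import numpy as np

def kmeans(points, k, n_iter=100):
    """Lloyd's algorithm with farthest-point initialisation (a simplification)."""
    points = np.asarray(points, dtype=float)
    # Initial centres: the first row, then repeatedly the row farthest from
    # the centres chosen so far.
    centres = [points[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(points - c, axis=1) for c in centres], axis=0)
        centres.append(points[np.argmax(d)])
    centres = np.array(centres)
    for _ in range(n_iter):
        # Assign every row to its nearest centre.
        dist = np.linalg.norm(points[:, None, :] - centres[None, :, :], axis=2)
        labels = dist.argmin(axis=1)
        # Move each centre to the mean of its rows (keep it if the cluster is empty).
        new_centres = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centres[j]
            for j in range(k)
        ])
        if np.allclose(new_centres, centres):
            break
        centres = new_centres
    return labels, centres

# Hypothetical matrix of local coefficients: one row per school, two columns.
rng = np.random.default_rng(1)
group_a = np.array([0.25, 0.40]) + rng.normal(scale=0.005, size=(50, 2))
group_b = np.array([0.15, 0.52]) + rng.normal(scale=0.005, size=(50, 2))
coef_matrix = np.vstack([group_a, group_b])
labels, centres = kmeans(coef_matrix, k=2)
```

Because the rows being clustered are local regression coefficients rather than raw school characteristics, two schools fall in the same cluster when the predictors act on their performance in a similar way, not when the schools themselves look alike.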

Results
To examine the geographical patterns in the relationship between schools' performance and ESCS, WLE in G05, number of classes, number of school buildings, percentage of immigrant students, percentage of late-enrolled students and percentage of females, the OLS and GWR models were estimated. Table 2 summarizes the results of the OLS model, which describes the relationships across the entire country. All the predictors included in the model are significant; only the number of buildings and the percentage of late-enrolled students have a negative impact on the mean school performance. We checked for multicollinearity among the predictors using the variance inflation factor (VIF), which measures how much the variance of an estimated regression coefficient is inflated by its correlation with the other predictors. As a general rule, a VIF below 10 indicates no problematic multicollinearity. The VIFs of the predictor variables range from 1.01 to 1.78, suggesting that the predictors are at most weakly correlated with one another. Table 3 reports different measures of goodness of fit for the OLS and GWR models. The GWR fits the data better, as it has a higher adjusted R² and a lower AICc. For the OLS model, the adjusted R² is equal to 0.60, which indicates that about 60% of the total variance of the dependent variable is explained by the model, whereas for the GWR the adjusted R² is equal to 0.70. Figure 1a shows the residuals of the OLS model, suggesting the presence of spatial correlation in the residuals: we can observe high residuals in North Italy and low residuals in South Italy. Moran's I has been used as a measure of the spatial correlation of the residuals: positive and negative values of Moran's I suggest positive and negative autocorrelation, respectively; under the null hypothesis of no spatial autocorrelation, Moran's I is expected to be close to zero.
Moran's I is statistically different from zero (I = 0.085, p < 2.2E-16) and confirms the spatial autocorrelation among the residuals of the OLS model. The GWR residuals exhibit a weaker spatial pattern (Fig. 1b); Moran's I (I = 0.007, p < 5.39E-07) suggests a substantial reduction, although not a total elimination, of the spatial autocorrelation among model residuals.
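The two diagnostics used above can be sketched in Python. The code is an illustration under stated assumptions, not the spdep routines used in the study: the function names, the binary neighbour matrix and the toy values are hypothetical. The VIF of predictor j is computed as 1 / (1 − R²_j), where R²_j comes from regressing predictor j on the remaining predictors, and Moran's I as (n / Σ w_ij) · (zᵀ W z) / (zᵀ z), with z the centred values.

```python
import numpy as np

def vif(X):
    """Variance inflation factor of each column of X."""
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    out = np.empty(p)
    for j in range(p):
        target = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef = np.linalg.lstsq(others, target, rcond=None)[0]
        ss_res = np.sum((target - others @ coef) ** 2)
        ss_tot = np.sum((target - target.mean()) ** 2)
        out[j] = ss_tot / ss_res  # equals 1 / (1 - R^2)
    return out

def morans_i(values, W):
    """Moran's I of `values` for a spatial weight matrix W (zero diagonal)."""
    z = np.asarray(values, dtype=float)
    z = z - z.mean()
    W = np.asarray(W, dtype=float)
    return len(z) / W.sum() * (z @ W @ z) / (z @ z)

# Four locations on a line; each one neighbours the adjacent ones (toy weights).
W_line = np.array([[0, 1, 0, 0],
                   [1, 0, 1, 0],
                   [0, 1, 0, 1],
                   [0, 0, 1, 0]])
clustered = morans_i([1.0, 1.0, -1.0, -1.0], W_line)    # similar values adjacent
alternating = morans_i([1.0, -1.0, 1.0, -1.0], W_line)  # opposite values adjacent

# Two uncorrelated predictors (toy values).
X_orth = np.array([[1.0, 1.0], [-1.0, 1.0], [1.0, -1.0], [-1.0, -1.0]])
vif_orth = vif(X_orth)
```

In this toy example the clustered residual pattern gives a positive Moran's I and the alternating pattern a negative one, matching the interpretation given in the text, while uncorrelated predictors give VIFs equal to 1.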
Another measure of goodness of fit of the GWR model is provided by the local R², which ranges from 0.25 to 0.95, with 75% of schools having a value of 0.63 or higher (Table 4). Table 4 also shows the estimated coefficients of the GWR model. The WLE obtained in G05 and the ESCS are consistently positively associated with the mean school performance, whereas the other indicators take both positive and negative values in different locations. A small interquartile range is observed for the percentage of females, the number of buildings, the number of classes and the percentage of late-enrolled students. These variables are not significant in the majority of the locations. For the WLE in G05, the ESCS, the percentage of immigrant students and the number of buildings, the interquartile range of the GWR coefficients falls below the magnitude of the OLS coefficients. Figure 2 shows the spatial patterns of the intercept and the coefficients of the 7 predictors estimated with the GWR. Only the significant coefficients are represented.
To summarize the results of the GWR and to identify clusters of schools whose performance is homogeneously affected by the analysed factors, k-means clustering has been applied to the matrix of regression coefficients. The comparison between 30 different indices leads to the identification of seven as the optimal number of clusters. Table 5 shows the summary statistics for each cluster; in particular, we report the mean and the percentage of significant coefficients for each variable. It can be noticed that the WLE in G05 and the ESCS play a predominant role in the interpretation of the clusters. The use of k-means has facilitated the reading of the results, since it is now easy to identify the important factors of a specific geographic area (cluster). Figure 3 reports the maps and radar plots of the seven identified school clusters. The two clusters illustrated in Fig. 3a cover a particular section of the national territory: the Alps, Friuli, Veneto, the Po valley, the Apennines and Sardinia. These two clusters are quite similar in the mean effect of the previous competencies (0.245 and 0.298 for Cluster 2 and Cluster 6, respectively) and in the negative impact of the presence of immigrant students (−0.128 for Cluster 2 and −0.176 for Cluster 6), both of which are around the national average (Table 5). On the other hand, the family background behaves in opposite ways: for schools in Cluster 2 the ESCS effect is greater than the national average, whereas in Cluster 6 it is smaller. Cluster 2 and Cluster 6 include 18.73% and 21.66% of the analysed schools, respectively. The three clusters reported in Fig. 3b represent almost completely the schools located in South Italy, covering 25.85% of Italian schools (Cluster 4: 6.25%; Cluster 5: 12.08%; Cluster 7: 7.52%). The radar plot highlights that, with respect to the WLE in G05 and the ESCS, these clusters are very similar. In all three clusters, the ESCS effect ranges only from 0.497 to 0.550 and is greater than the national average (0.40). On the other hand, the effect of the students' previous knowledge, which varies between 0.142 and 0.179, is smaller than the national one (0.248). This means that in the South the family background matters more than the previous competencies of the students. In the red cluster (Cluster 7), 73.88% of the schools are significantly influenced by the presence of late-enrolled students, which has on average a greater negative effect on performance (−0.149) than the national average (−0.09). The presence of immigrant students has a significant and negative effect on performance for 94.97% of the schools located in Cluster 4, which covers Calabria and Basilicata. For these schools the coefficient (−0.323) is on average about two and a half times the national average (−0.126).

Fig. 3 Map and radar plot of Italian school clusters
The behaviour of the schools belonging to Cluster 1 and Cluster 3 (Fig. 3c) is very similar: in both, the positive effect of the previous competencies is greater than the national average (0.31 versus 0.25), whereas the effect of the socio-economic background is smaller than the national average. Cluster 1 and Cluster 3 include 20.40% and 13.36% of the analysed lower secondary schools, respectively. In contrast to the reality described for the South, in these territories what matters most for the school is what the students know at the beginning of lower secondary school, not their socio-economic status. Another important aspect, for the blue cluster that covers part of the Tyrrhenian coast, is the negative effect of the presence of immigrant students in the school, which is greater than the national average (−0.168 versus −0.09).

Concluding remarks
The aim of the study is to investigate how the impact of the contextual variables on academic performance varies by geographic area in Italy, examining the extent of the spatial disparities beyond the regional administrative confines in order to understand the factors that most influence the educational outcome of each school. We aim to identify in which schools and geographical areas an intervention needs to be planned to flatten the effects of some factors associated in the literature with inequality in education, in order to guarantee greater equality. These factors are the school mean of the students' previous learning levels, some socio-demographic characteristics of the student body (the school mean of the socio-economic and cultural indicator and the school percentages of immigrant students, late-enrolled students and females) and the school size, defined as the number of classes and the number of school buildings. We exploited the reading standardized tests administered by INVALSI in 2018-2019, focusing on 8th-grade students and analysing at school level the data of 5520 out of 5761 Italian lower secondary schools.
In this paper, for the first time, the performances of all Italian schools in the INVALSI large-scale assessment survey are analysed using georeferenced data. The well-known geographical cleavage of Italy is analysed from a new perspective: the geographical disparities rely not only on an unequal distribution of school performance in terms of academic achievement, but also on a different effect of the contextual variables on school performance. Whereas the OLS model provides an overall picture of the relation between academic performance and contextual variables, the GWR models this relation at the local level by studying the spatial variation of the regression coefficients. The GWR local modelling outperforms the OLS model, explaining about 10 percentage points more of the variance of the outcome (adjusted R² of 0.70 versus 0.60). This analysis sheds light on how the predictor variables and their effects are related to geography, showing a clear spatial pattern. Applying k-means clustering to the local regression coefficients, we identified seven school clusters that are homogeneous with respect to the effect of the factors on school performance. Each cluster has been characterized geographically and in relation to the intensity of the predictors that are statistically significant in the area, in order to identify the critical local factors for each area.
The findings of this work reveal strong variation across schools in the impact of the contextual variables on the educational outcome, highlighting the presence of strong inequality in the Italian educational system. As shown, school performance in South Italy is strongly affected by socio-economic status. This means that poor socio-economic conditions impede students' educational development, resulting in low academic performance. On the other hand, the schools located along part of the Tyrrhenian and Adriatic coasts are able to moderate the socio-economic differences among students, and their mean performance is affected mostly by students' competences at the beginning of secondary school. However, in the schools of the Tyrrhenian coast, as in some schools in South Italy, particularly in Sicily and Puglia, the strong negative impact of the percentage of late-enrolled students is worrying. The low effect of students' previous knowledge throughout the South is a warning sign of an inefficient educational system, in which variables associated with inequality carry too much weight. Another warning sign that highlights mechanisms of inequality in the educational system is the high impact of immigrant status on school performance in the North-East and in Calabria.
Italy has a single school system, but the findings of this work outline a fragmented reality, supporting the case for greater decentralization in education. Several studies have argued that the well-known regional differences in human capital are the result of a long history, and that the central government should address such differences because they in turn affect the territorial socio-economic differences that are key to understanding inequalities in educational outcomes.
In terms of policy implications, geo-referenced data make it possible to study the situation at the local level, which could help policymakers, local administrations and schools to design effective strategies for improving student learning based on the evaluation and identification of critical local factors. The findings of the analyses suggest the need for different policy strategies depending on territorial needs. In the areas where the school outcome is strongly affected by socio-economic background, it is important to plan actions aimed directly at students, to compensate for disadvantages in family cultural background, and to support families, for example by organizing out-of-school care and recreation programs. In the South, where the impact of previous knowledge is particularly high, suggesting that efficiency is one of the schools' problems, one possible strategy is to improve the learning opportunities of all students by promoting educational services from the early school years. In the territories where the school outcome is strongly and negatively affected by the percentage of immigrants, actions should be planned to promote cultural integration and to support students and families. The presence of migrants in schools is no longer a deviant case but a new normality that schools have to address. For new immigrants the first obstacle is language. One possible strategy is to organize intensive courses in which language learning is the central effort, together with assistance after regular classes through the establishment of learning and homework centres in schools. Integrating cultural elements from the students' countries of origin into school life could help the self-esteem of immigrant students.
The high impact of the percentage of late-enrolled students on the school outcome suggests that children who are retained learn less than they should and that some schools are unable to integrate retained students, which limits the learning opportunities of the whole class. For these schools too, one possible strategy could be to plan measures concerning teaching, to strengthen the support elements of the teacher's role and to introduce a teaching assistant to help underachievers.
In spite of the many promising aspects of GWR, this study has some limitations. First, multicollinearity in GWR is still a debated topic. Although GWR has been shown to be sufficiently robust to withstand the effects of multicollinearity (Fotheringham and Oshan 2016), collinearity issues restrict the number of predictors that can be included in the model. Secondly, while this study supports the use of GWR over non-spatial OLS methods for the prediction of school academic performance, the GWR model is not able to account for all of the spatial autocorrelation, as shown by Moran's I on the residuals. To address this limitation, better diagnostic tools should be exploited in future investigations, although approaches to calculating the goodness of fit are still under study. Finally, for the estimation of the GWR model in this study, the data were aggregated at school level, so student-level risk factors were masked. Multilevel models overcome this problem by combining an individual-level model with a macro-level model. On the other hand, the multilevel model is limited in the modelling of spatial processes by the need for an a priori definition of a discrete set of spatial units at each level of the hierarchy (Fotheringham et al. 2003). This assumption implies that the process is modified in exactly the same way throughout a particular spatial unit, e.g. a region, and in a different way outside the boundaries of that unit. This assumption is often unrealistic, since the effects of space are continuous. For this reason, the application of the multilevel model to spatial analysis is limited, whereas geographically weighted regression has the advantage of estimating local models and analysing the spatial variation of the predictors of academic achievement at school level.
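The residual diagnostic mentioned above can be illustrated with a minimal sketch of Moran's I and a permutation-based pseudo p-value. The spatial weights used here (binary k-nearest-neighbour weights with `k=8`) and the two synthetic residual vectors are assumptions made for illustration only; the weighting scheme actually applied to the GWR residuals is not restated in this section.

```python
import numpy as np

def knn_weights(coords, k):
    """Binary spatial weights: 1 for each of the k nearest neighbours, else 0."""
    n = len(coords)
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)                      # a point is not its own neighbour
    nearest = np.argsort(d, axis=1)[:, :k]
    W = np.zeros((n, n))
    W[np.arange(n)[:, None], nearest] = 1.0
    return W

def morans_i(values, W):
    """Moran's I = (n / sum(W)) * sum_ij W_ij z_i z_j / sum_i z_i^2."""
    z = values - values.mean()
    return len(values) / W.sum() * np.sum(W * np.outer(z, z)) / np.sum(z ** 2)

def permutation_p_value(values, W, n_perm=499, seed=0):
    """One-sided pseudo p-value for positive spatial autocorrelation."""
    rng = np.random.default_rng(seed)
    observed = morans_i(values, W)
    sims = np.array([morans_i(rng.permutation(values), W) for _ in range(n_perm)])
    return (np.sum(sims >= observed) + 1) / (n_perm + 1)

# Synthetic residuals: one set with leftover spatial structure, one without.
rng = np.random.default_rng(0)
coords = rng.uniform(0, 10, size=(300, 2))
W = knn_weights(coords, k=8)
structured = np.sin(coords[:, 0]) + rng.normal(scale=0.3, size=300)
unstructured = rng.normal(size=300)

i_structured = morans_i(structured, W)
i_unstructured = morans_i(unstructured, W)
p_structured = permutation_p_value(structured, W)
print(f"I (structured) = {i_structured:.3f}, I (unstructured) = {i_unstructured:.3f}")
print(f"pseudo p-value (structured) = {p_structured:.3f}")
```

A value of I close to zero is what one would hope to see for the residuals of a model that has absorbed the spatial structure; a clearly positive value, as in the structured case here, indicates that neighbouring residuals remain similar.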
In summary, this work makes it possible to visualize the Italian situation beyond the classic territorial definitions and offers an in-depth analysis of the Italian education system. The identification of new spatial clusters is a useful tool for differentiating support for schools on the basis of their specific needs, and it is the first step towards a more contextualized policy response to educational needs in Italy. To obtain a complete overview of the Italian situation, future research will extend this work to school performance in the INVALSI mathematics tests.