Modern international large‐scale assessment in education: an integrative review and mapping of the literature

Introduction
Studies on international large-scale assessment (ILSA) in education, such as the Programme for International Student Assessment (PISA) or the Trends in International Mathematics and Science Study (TIMSS), have received increasing attention from educational practitioners, researchers, policymakers, and the public at large. The goal of research in ILSA is to shed light on how educational infrastructures, policies, and contexts operate and how related factors might contribute to educational and social outcomes across different schools and countries. The outcomes of such studies feature prominently in discussions about the comparative quality of educational jurisdictions, and have had important policy implications across and within multiple countries (Johansson, 2016).

Page 2 of 33 Hernández-Torrano and Courtney Large-scale Assess Educ (2021) 9:17
Although the theoretical conception, design, implementation, scale validation, and descriptive analysis for these large-scale projects are generally conducted by international institutions such as the Organisation for Economic Co-operation and Development (OECD), the International Association for the Evaluation of Educational Achievement (IEA), and the United Nations Educational, Scientific and Cultural Organization (UNESCO), some programs also invite public participation; for example, "countries and economies are invited to submit questions that are then added to items developed by the OECD's experts and contractors" (PISA, 2020; PisaFAQ). The sampling designs for ILSAs shape possible analyses and are characterized by collections of data from large representative samples of students and schools across multiple jurisdictions. While large representative samples of students sit standardized assessments and respond to questions about their school experience, principals, for example, report on the implementation of specific school policies, teachers report on aspects of the classroom, and parents report on home support for learning.
ILSA research has come a long way from the early international studies in education (e.g., FIMS) (Rutkowski et al., 2014). Today, ILSA research in education is a flourishing field that has experienced significant growth in recent decades (Addey et al., 2017; Kamens & McNeely, 2010). The modern scholarship in this broad field includes publications about methods and analytic procedures, research on secondary analysis of ILSA datasets, and studies that apply a social science framework to the ILSA phenomenon. The field brings together "scholars and practitioners who are contributing to the analysis, critique, and development of large-scale assessment methods and results, as well as their implications for education policy" (CIES, 2021).
As the volume of research in ILSA grows and empirical evidence accumulates in the field, it is increasingly necessary to synthesize the knowledge generated to offer educational scholars, practitioners, and policymakers a coherent vision of the recent research trends in the literature as well as the structural cornerstones of the field. Such epistemological syntheses are important because they envision the past, present, and future developments of a field. While program-specific reviews have provided some insights into the contribution of some ILSA programs at a given point in time (e.g., see Hopfenbeck et al., 2018 for a systematic review of PISA research to 2015), a more thorough review of research on ILSA in education more generally, one that provides clarity about the overall growth and trajectory of this broad field, has yet to be undertaken. The purpose of this study is to provide a quantitative synthesis of the modern research on the five major current and ongoing ILSA programs in education: PISA, TIMSS, PIRLS, ICCS, and ICILS. We use a descriptive bibliometric approach to map the growing literature in research on ILSA in education and describe recent developments of the field, as well as its current structure based on publication and citation data.
The study is guided by the following research questions:
• RQ1: What is the volume and growth trajectory of modern research on ILSA in education?
• RQ2: What journals, articles, authors, and countries have had the highest impact on the dissemination of modern research in the field?
This study contributes to the literature by examining how recent research on ILSA in education is built on the basis of different social, intellectual, and conceptual frameworks and by identifying the key players in the development of the field. The findings have the potential to inform future studies by identifying strengths and gaps in ILSA research in education in terms of its growth patterns, relevance, and coverage.
In the following sections, a brief history of ILSA in education is presented, followed by a description of common subject areas and a discussion of the motivations of participating jurisdictions. Thereafter, details of the prominent ILSA programs that are focal to this study are provided, followed by a review of the strengths and weaknesses of such research programs. Finally, a short summary of previous reviews of ILSA research in education precedes the rationale for the current study.

A brief history and scope
Though the field of ILSA in education has become quite prominent over the past two decades, its history can be traced back to the late 1950s. At the UNESCO Institute for Education in Hamburg, a group of educational researchers met and discussed the possibilities of conducting research on student academic performance and its antecedents across multiple countries (Husén, 1979). One of the key initial intentions of this proposed enquiry was for countries to learn from the experiences of others and to avoid developments that produced less than satisfactory results (Husén, 1979). This initial meeting represents the origins of the IEA, which, from the 1960s, has conducted regular studies in international large-scale assessment. Today, the IEA's TIMSS with its focus on Mathematics and Science and Progress in International Reading Literacy Study (PIRLS) with its focus on Reading Literacy, alongside the OECD's PISA triennial surveys, represent prominent ILSA research programs with global reach.
Broadly, the shift from early ILSA studies in the 1960s to those in the current era can be characterized politically, temporally, and disciplinarily: politically, by a shift in focus with countries not only interested in educational accountability, but now also interested in becoming embedded in world society (Pizmony-Levy, 2013); temporally, by a more high-paced, synchronized administration, and more immediate publication of results (Landahl, 2020); and, disciplinarily, by an expansion of common academic subjects to include problem-solving, and, more recently, research in ICT skills and civic education.
The expansion to include ICT and civic education has enabled the examination of a broader range of cognitive, psychological, and educational processes and outcomes. For example, Torney-Purta and Amedeo (2013) posit that while civic education is not commonly conceptualized as ILSA research, it has "the potential to contribute to understanding many aspects of the school's role cross countries" (p. 254). To the authors' first point about conceptualization, we point out that the first civics education study was authorized by the IEA in the early 1970s (Torney et al., 1975). To the authors' second point about the potential contribution to understanding the roles of schools, we concur and point to recent studies that buttress this argument; for example, a study by Maurissen et al. (2020) contributed to an understanding of how student-level processes, such as student-teacher relationships, contribute to perceived civic outcomes within schools. In addition, studies in civic education have enabled researchers to better understand youths' civic identities (see Knowles et al., 2018 for a full review). For these reasons, this study also includes the more recently established International Computer and Information Literacy Study (ICILS) and International Civic and Citizenship Education Study (ICCS), both administered by the IEA.

Motivations of participating governments and jurisdictions
Research by DeBoer (2010) and Kijima (2010) identifies four forms of motivation for a jurisdiction's participation in ILSA study programs: the rational choice model, the policy diffusion model, the macro-dissatisfaction perspective, and what we herein term the financial aid model (Kijima, 2010). Jurisdictions that are driven by the rational choice model partake in ILSA studies to inform decision-making and uphold their international reputation, while states that function under the policy diffusion model are motivated by identifying effective educational processes and have an interest in promoting the transfer of specific educational practices. By contrast, jurisdictions with a macro-dissatisfaction perspective see such international studies as "bringing attention to a perceived crisis and prompting a focus on education" (Torney-Purta & Amedeo, 2013, p. 249). Finally, empirical work undertaken by Kijima (2010) suggests that recently participating jurisdictions may be motivated by an associated increase in educational foreign aid. Kijima found that countries that participate in major cross-national assessments, such as PISA, receive an average of 37% more funding than non-participating counterparts. Although these four motivations can be distinguished, we note that they are not mutually exclusive. Torney-Purta and Amedeo (2013) state that for jurisdictions "operating according to the rational choice and the policy diffusion models, secondary analysis has obvious advantages" (p. 249). Indeed, such analysis is better aligned with their initial goals. We argue, though, that secondary analysis may benefit all participating jurisdictions regardless of initial motivational orientations: while participating jurisdictions must manage their country's reported rank-ordering of student performance and potential political fallout, all jurisdictions can also make use of the research infrastructure that ILSAs provide (Johansson, 2016).
Research has also suggested that jurisdictional motivation for participation may be financially driven.
Nevertheless, countries and jurisdictions benefit from the fact that they (1) have access to large-scale representative data, rather than common smaller convenience samples, (2) can save on the time and costs associated with research design, data collection, and management, (3) are able to undertake research that identifies student-, school-, and country-level conditions that may be effective for learning outcomes of interest, (4) can test competing statistical models, and (5) can replicate and compare findings across jurisdictions and regions (Donnellan et al., 2011; Johansson, 2016).

Description of ILSAs in education
The definition of ILSAs adopted in the present study resembles that given by Bos (2002): "Studies in which both achievement of certain age/grade in one or more subjects is compared across education systems and effects of contextual factors at the system, school, classroom and student level on achievement are studied" (p. 2). Specifically, the IEA (TIMSS, PIRLS, and ICCS) and OECD (PISA) study programs under review are very similar in that they both use comparable psychometric methods to analyze, validate, and scale student response data (von Davier et al., 2013; IEA, 2013) and make use of two-stage clustered sampling designs (OECD, 2009), first with random school samples weighted on school size, then either (a) a random sample of one or two intact classes (IEA studies), or (b) a random sample of the school's 15-year-olds (OECD studies). Table 1 provides a summary of the similarities and differences associated with the five ILSA programs of interest in the study. As illustrated, all five studies employ cyclical designs with a focus on measuring trends, though target populations are only comparable for TIMSS Mathematics and Science and ICCS assessment programs (Grade 8).
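The two-stage design described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the operational procedure of the IEA or the OECD: the toy school frame, the function names, and the sample sizes (5 schools, 1 class, 35 students) are all hypothetical, and real ILSA sampling adds stratification, exclusions, and weight adjustments. Stage 1 uses systematic sampling with probability proportional to size, which is one common way to implement size-weighted school selection; stage 2 draws either intact classes or individual students.

```python
import random

def pps_systematic_sample(schools, n_schools, rng):
    """Stage 1: systematic sampling with probability proportional to size.

    `schools` is a list of (school_id, enrolment) tuples (hypothetical frame).
    """
    total = sum(size for _, size in schools)
    interval = total / n_schools
    start = rng.random() * interval            # random start in [0, interval)
    targets = [start + k * interval for k in range(n_schools)]
    selected, cumulative, i = [], 0, 0
    for school_id, size in schools:
        cumulative += size
        # A school is hit when a target falls inside its enrolment range.
        while i < len(targets) and targets[i] < cumulative:
            selected.append(school_id)
            i += 1
    return selected

def sample_within_school(units, n_units, rng):
    """Stage 2: simple random sample of classes or students within a school."""
    return rng.sample(units, min(n_units, len(units)))

rng = random.Random(2021)
# Toy frame: 50 schools with 200-800 enrolled students (assumed values).
school_frame = [(f"school_{i:02d}", rng.randint(200, 800)) for i in range(50)]
sampled_schools = pps_systematic_sample(school_frame, 5, rng)

# (a) IEA-style second stage: one intact class per sampled school.
classes = {s: [f"{s}_class_{c}" for c in "ABCD"] for s in sampled_schools}
iea_sample = {s: sample_within_school(classes[s], 1, rng) for s in sampled_schools}

# (b) OECD-style second stage: a random sample of eligible students.
students = {s: [f"{s}_student_{k}" for k in range(120)] for s in sampled_schools}
pisa_sample = {s: sample_within_school(students[s], 35, rng) for s in sampled_schools}
```

Because every toy school is smaller than the sampling interval, no school can be selected twice here; a production design would handle very large schools separately.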
The programs differ in terms of general design doctrine. PISA can be conceived as being based on Human Capital Theory (see Cardoso, 2020; and Sellar & Lingard, 2013 for a comprehensive review), whereas TIMSS uses a curricula model (IEA, 2019a) as the major organizing principle where participating countries' intended, implemented, and attained curriculum inform the general design of the survey program. PIRLS uses the "attainment of societal goals" encompassing literacy experience and the processes of comprehension to attain societal goals (Mullis & Martin, 2015), whereas the IEA's ICCS can be conceived as having evolved from ecological systems theory (e.g., Neal & Neal, 2013) and is based on cognitive and affective-behavioral dispositions toward civics and citizenship. Finally, IEA's ICILS is underpinned by modern conceptions of computer and information literacy and computational thinking (IEA, 2019b).
In terms of focal subject areas, it is sometimes argued that PISA Science, Reading, and Mathematics literacy focus more on applications to real-world situations compared to TIMSS and PIRLS assessments, which focus more on mastered factual and procedural knowledge taught in school (ACER, 2020), though some studies have found PISA Maths and TIMSS numeracy to be highly comparable forms of assessments (see, for example, Wu, 2009). One particular point of difference for the ICCS study is that it benchmarks student outcomes to the UNESCO sustainable development goals related to citizenship. In addition, the most recent numbers of participating countries also differ, with 79 countries participating in the 2018 PISA, 61 in TIMSS and PIRLS, and 24 in the ICCS surveys. It is important to note that studies under all programs cyclically provide international league tables that tend to draw a large amount of media attention and policy debate, yet, as Hopmann et al. (2007) note, there has been a tendency to treat the results of the examinations uncritically and make direct links to a nation's relative school quality and economic future.

Criticisms of ILSAs
Compared to the level of attention paid to the results of international league tables in public and governmental domains, there exists a dearth of criticism of such programs. Academic criticism, though, has centered on the ideas that (a) ILSAs promote regional and global isomorphism, (b) results from ILSAs are often reported uncritically, (c) causal language is often misused when presenting ILSA results, and (d) ILSAs are exclusionary and lack consequential and other components of validity. A common criticism levelled against ILSAs is that they promote isomorphic ideologies (see Wiseman et al., 2014) across jurisdictions and cultures. Here it is argued that similarities in educational structures, policies, pedagogical approaches, and curricula content emerge as a consequence of shared insights from ILSAs (Pettersson, 2008). While theorists have argued for regional (Dale, 2000) and global (Meyer et al., 1992) educational homogenization, it is difficult to identify the degree to which ILSAs themselves drive isomorphism (Johansson, 2016) or whether commonalities across curricula happen to exist.

Table 1 (fragment)
ICILS (IEA)
Administration periods: 2000, 2003, 2006, 2009, 2012, 2015, 2018…; 1995, 1999, 2003, 2007, 2011, 2015
Grade 4 (though next grade higher when average age of testing is less than 9.5; TIMSS & PIRLS); Grade 8 (though next grade higher when average age of testing is less than 13.5; TIMSS) 2 Grade 8 (though next grade higher when average age of testing is less than 13.5) 3 Grade 8 (though next grade higher when average age of testing is less than 13.5) 7
General Design Doctrine: Human Capital Theory (Sellar & Lingard, 2013); TIMSS: National Curricula Focussed (IEA, 2019a); PIRLS: "Reading to do" (Stiggins, 1982) and the attainment of societal goals (Mullis & Martin, 2015); Ecological Systems Theory (Neal & Neal, 2013); Computer and information literacy and computational thinking (IEA, 2019b)
Torney-Purta and Amedeo (2013) point to the common lack of criticism of country rankings in league tables often reported in the media given the inherent inaccuracies associated with measurement. Specifically, the authors point to the imprecision associated with estimates of country-level means in ILSAs. A calculation of sampling errors alone for the more recent 2018 PISA Science scores helps to illustrate this point: while results afford Australia, the U.S., and New Zealand mean scale scores of 502, 503, and 508, we can only be 95% confident that the countries' true means fall between 499.15 and 504.85, 501.31 and 504.69, and 505.40 and 510.60, respectively (see Additional file 1: Technical Appendix, Note 1). Therefore, it is only reasonable to assert that the Scientific literacy of 15-year-olds in New Zealand likely exceeds that of counterparts in both Australia and the U.S. In addition, ILSA programs commonly report shifts in country-level means (or growth trends) cycle-to-cycle. However, such estimates also involve item equating (or item link) methodologies that align the results across different assessment administrations (Wu, 2010). While this contributes to an even larger amount of error in estimating cyclical shifts in country-level means (Michaelides & Haertel, 2004), associated inaccuracies (equating errors) are commonly not reported (Wu, 2010). Other measurement critics have argued that PISA Reading data has reflected at least two underlying dimensions and that alternate scaling models are more appropriate (Goldstein et al., 2007).
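The interval comparison above can be checked with a few lines of Python. This is an illustrative sketch only: it takes the means and 95% bounds exactly as quoted in the text, back-calculates the implied standard errors under the assumption of a normal approximation with z = 1.96, and uses non-overlap of intervals as a simple (and conservative) criterion for a difference. The function and variable names are hypothetical.

```python
Z_95 = 1.96  # normal critical value assumed for the 95% intervals

# (mean, lower bound, upper bound) as quoted in the text
reported = {
    "Australia": (502, 499.15, 504.85),
    "U.S.": (503, 501.31, 504.69),
    "New Zealand": (508, 505.40, 510.60),
}

def implied_se(lower, upper):
    """Standard error implied by a symmetric 95% interval."""
    return (upper - lower) / (2 * Z_95)

def intervals_overlap(a, b):
    """True when two (mean, lower, upper) intervals share any value."""
    return a[1] <= b[2] and b[1] <= a[2]

standard_errors = {
    country: round(implied_se(lo, hi), 2)
    for country, (_, lo, hi) in reported.items()
}

overlaps = {
    ("Australia", "U.S."): intervals_overlap(reported["Australia"], reported["U.S."]),
    ("Australia", "New Zealand"): intervals_overlap(reported["Australia"], reported["New Zealand"]),
    ("U.S.", "New Zealand"): intervals_overlap(reported["U.S."], reported["New Zealand"]),
}
```

Only the Australia and U.S. intervals overlap, which matches the conclusion that New Zealand is the one country whose mean can be separated from the other two on sampling error alone.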
Another criticism of ILSA is that, due to the cross-sectional survey designs of such programs, it is not easy to draw causal inferences from the data (Rutkowski & Delandshere, 2016). To this end, Gustafsson (2008) argues that one key negative consequence of ILSAs is the causal language used to explain findings. Further, Allardt (1990) argues that there is little to no evidence that comparative surveys in any field of the social sciences have been able to create knowledge about causal phenomena. Relatedly, Goldstein (2004) points to ILSA's general lack of longitudinal data (on the same sample of students) that would enable researchers to attribute differences in groups to educational systems per se. Despite these criticisms, it should be noted that recent studies have attempted to use methodologies that enable causal inferences from ILSA data (see, for example, Hogrebe & Strietholt, 2016).
Some academics have argued that ILSA is exclusionary and lacks consequential validity. ILSA relies on sophisticated statistical techniques such as item-response theory (IRT), computer adaptive testing (multi-stage CAT), and matrix sampling designs (with different groups responding to different sets of items) to enable efficient and unbiased estimation of system-level performance. However, Gustafsson (2008) argues that this complexity also makes insights and analyses of these data exclusionary for some countries, only available to a small number of experts. Therefore, insofar as the interpretation and use of measurement devices form an important part of validity (see Messick's 1989 reconceptualization), it could be argued that ILSAs lack this specific component.
Despite these criticisms, ILSAs carry with them many advantages. As mentioned, ILSAs provide a data infrastructure for research into a variety of issues and avoid the ethical problems and costs associated with randomized experiments. The strong methodological foundations enable secondary analysts to make use of the openly available data to provide stable and generalizable descriptions of knowledge trends within and between schools, between countries, and across administrative cycles.

Summary of reviews on ILSA research in education
As research on ILSAs continues to grow, researchers have begun to publish summaries and reviews that offer practitioners, researchers, and policymakers a general picture of this developing field. This has generally involved narrative and critical reviews of the literature. These kinds of reviews are particularly useful to offer relevant insights on the themes, methodologies, and theoretical underpinnings of the research on a particular field or topic. Examples of such studies include an examination of the scientific contribution and impact of TIMSS and the influence of school and classroom contexts on student academic achievement (Drent et al., 2013), a critical review on PISA effects on education governance (Pons, 2017), and methodological reviews of research strategies and statistical techniques used in ILSA research in education (e.g., Lin et al., 2014; Liou & Hung, 2015). Meta-analytic approaches have also been used to review research in the field. Such procedures are appropriate to systematically assess the results of previous research to derive conclusions about that body of research. An example is the meta-analysis by Else-Quest et al. (2010), which used PISA 2003 and TIMSS 2003 data to explore cross-national gender differences in mathematics achievement.
A third path to synthesize the research literature in a field is a bibliometric approach. Bibliometrics can be generally defined as "a set of quantitative methods used to measure, track, and analyze print-based scholarly literature" (Roemer & Borchardt, 2015, p. 234). Bibliometrics has been proposed as a valuable approach to map vast amounts of research available in particular disciplines and to describe their developmental trends and current status in a comprehensive, systematic, and replicable manner (Linnenluecke et al., 2020). Such syntheses are important because they envision the past, present, and future developments of a field. A few bibliometric reviews of the literature on ILSA research in education are available. Domínguez et al. (2012) provided an early descriptive bibliometric study on the impact of the PISA studies on academic journals from 2002 to 2010. The study, analyzing 322 recognized papers, identified prominent researchers, journals, and topics of the publication period. The authors identified the prominence of Western countries and authors, with a focus on student performance, and an emergence of ICT use and equality topic areas. Lindblad et al. (2015) conducted a systematic review of 337 journal articles on ILSA (PISA, TIMSS, and ICCS) research that focuses on comparative achievement between schools. The authors identified prominent journal articles, journals, author origins, and associated fields of education. The authors found a Western country bias, fields focused on educational comparative- and policy-related research, and complementary fields of economics, psychology, and sociology, and concluded that the field is conducted and managed by few and thus potentially immature and vulnerable.
More recently, Hopfenbeck et al. (2018) categorized and analyzed 654 journal articles that covered PISA-related topics. The authors noted a steady rise in the number of publications from just several in the early 2000s to 103 in 2014, and also noted the Western-centric outputs covering a similar suite of fields to that identified in the Lindblad et al. (2015) study. The authors also identified that the articles covered three main research categories: secondary analysis (61.8%; with a focus on demographic inequalities), critique (16.2%; with a focus on construct validity, research design, and various technical issues), and policy (22.0%; focusing on the effects of PISA on policy and governance).

The present study
The purpose of this study is to contribute to the growing literature on ILSA research in education by providing an overall picture of the recent development and structure of research in the field using a bibliometric approach. More specifically, we aim to map modern ILSA research in education by describing the developmental trajectory of the field based on publication and citation data related to the five major recent and ongoing ILSA programs (i.e., PISA, TIMSS, PIRLS, ICCS, and ICILS), the core journals and most influential publications in the field, the leading scholars and countries and the patterns of scientific collaboration between them, the disciplines and historical developmental paths underlying the foundations of ILSA research in education, and the major research topics addressed in the literature.
Our study complements and differs from previous reviews of the literature in several ways. First, our study provides a wider coverage of the extant literature by considering publications addressing five major ILSA programs in education, in contrast to previous studies that have focused on specific assessments (e.g., Drent et al., 2013; Hopfenbeck et al., 2018; Lenkeit et al., 2015; Pons, 2017). Early ILSA studies (e.g., First International Science Study, FISS; First International Mathematics Study, FIMS; Six Survey Study) were not considered in this study as they were discontinued decades ago and do not provide relevant insights into current education systems and policies. Second, previous reviews have predominantly examined documents published in academic journals in which the language of publication is English. In our study, we do not exclude documents published in other languages, in order to account for a broader geographical representation of ILSA research in education across the world. Third, although other reviews have examined the development and the major journals, authors, and countries in ILSA research in education, no study to date has explored how researchers and countries collaborate in the production of scientific knowledge in the field (i.e., social structure). Similarly, this is the first study that explores in a systematic way what disciplines underlie the foundations of the field and the historical paths that contributed to its development (i.e., intellectual structure). Fourth, we provide a detailed analysis of the major research topics addressed in the literature (i.e., knowledge structure) using a data-driven approach that provides a unique and objective perspective on the research topics that have received the greatest attention in recent decades, instead of the content analysis of publications used in previous studies (e.g., Domínguez et al., 2012; Hopfenbeck et al., 2018; Lenkeit et al., 2015).

Data and methods
A bibliometric approach was used to build a corpus of publications on modern ILSA research in education using metadata extracted from four indexes of the Web of Science (WoS) database: the Science Citation Index-Expanded (SCI-Expanded); the Social Sciences Citation Index (SSCI); the Arts & Humanities Citation Index (A&HCI); and the Emerging Sources Citation Index (ESCI). The WoS database was used because it is the most widely used database for harvesting research metadata (Meho & Yang, 2007) and is considered the industry standard in most disciplines, including education (Ivanović & Ho, 2019). Also, WoS is multidisciplinary and thus allowed for the compilation of publications on ILSA research emerging from multiple disciplines (McVeigh, 2009). Moreover, WoS provides a wider coverage of citation information compared to other multidisciplinary databases such as Scopus and Google Scholar (Li et al., 2010) and better accuracy in journal classification (Wang & Waltman, 2016). Six complementary searches were conducted in the four WoS indexes on April 22, 2021. The first five searches aimed to retrieve publications related to the five ILSA programs under investigation in this study and included the following key terms: (1) ["PISA 20*" or "Programme for International Student Assessment"], (2) ["TIMSS" or "Trends In International Mathematics And Science Study"], (3) ["PIRLS" or "Progress In International Reading Literacy Study"], (4) ["ICCS 20*" or "International Civic and Citizenship Education Study"], and (5) ["ICILS" or "International Computer and Information Literacy Study"]. The sixth search was intended to extract more general publications in ILSA research in education and included the key terms ["ILSA" or "International Large Scale Assessment"] and ["Education"]. All search terms were entered in the Topic field.
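For readers who wish to reproduce the searches, the six term sets listed above can be assembled into Topic-field query strings. The sketch below is an assumption about how the terms might be written in WoS advanced-search syntax (the `TS=` Topic field tag with `OR`/`AND` operators); the exact strings the authors submitted are not given in the text, and the dictionary and function names are hypothetical.

```python
# Key terms exactly as listed in the text, grouped by search.
PROGRAM_TERMS = {
    "PISA": ["PISA 20*", "Programme for International Student Assessment"],
    "TIMSS": ["TIMSS", "Trends In International Mathematics And Science Study"],
    "PIRLS": ["PIRLS", "Progress In International Reading Literacy Study"],
    "ICCS": ["ICCS 20*", "International Civic and Citizenship Education Study"],
    "ICILS": ["ICILS", "International Computer and Information Literacy Study"],
}
GENERAL_TERMS = ["ILSA", "International Large Scale Assessment"]

def topic_query(terms):
    """Join quoted terms with OR inside a Topic (TS=) clause."""
    return "TS=(" + " OR ".join(f'"{term}"' for term in terms) + ")"

queries = {name: topic_query(terms) for name, terms in PROGRAM_TERMS.items()}
# Sixth search: general ILSA terms combined with "Education".
queries["ILSA"] = topic_query(GENERAL_TERMS) + " AND " + topic_query(["Education"])

for name, query in queries.items():
    print(f"{name}: {query}")
```

Keeping the term lists in one structure makes it easy to rerun the same searches at a later date and compare corpus sizes.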
Figure 1 presents the number of documents retrieved with this search strategy. Overall, the searches yielded a total of 2,477 publications, which were reduced to 2,287 after the removal of duplicates. The documents were then filtered by type of document and only articles and reviews were retained because other document types do not contain complete metadata information. No filter was applied for the language of publication or year of publication. This resulted in a total of 2,233 publications. For each publication, the following metadata were extracted: publication title, abstract, and keywords; publication year, journal, number of citations, and references; and authors' names and country of affiliation (i.e., the territory where the institution with which the author is affiliated is located).
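The screening steps above amount to a deduplication followed by a document-type filter. The following Python sketch illustrates that logic on a handful of made-up records; it is not the authors' pipeline. It assumes each exported record carries an accession number under the key `UT` and a document type under `DT` (the field tags used in WoS plain-text exports), and the record values themselves are hypothetical. The last lines simply restate the arithmetic implied by the reported counts.

```python
def deduplicate(records, key="UT"):
    """Keep the first record for each accession number."""
    seen, unique = set(), []
    for record in records:
        if record[key] not in seen:
            seen.add(record[key])
            unique.append(record)
    return unique

def keep_types(records, allowed=("Article", "Review")):
    """Retain only the document types used in the study."""
    return [record for record in records if record["DT"] in allowed]

# Toy records (hypothetical identifiers) returned by overlapping searches.
raw = [
    {"UT": "WOS:0001", "DT": "Article"},
    {"UT": "WOS:0002", "DT": "Review"},
    {"UT": "WOS:0001", "DT": "Article"},    # duplicate hit from a second search
    {"UT": "WOS:0003", "DT": "Editorial Material"},
    {"UT": "WOS:0004", "DT": "Article"},
]
unique_records = deduplicate(raw)
corpus = keep_types(unique_records)

# Counts reported in the text and the differences they imply.
retrieved, after_dedup, final = 2477, 2287, 2233
duplicates_removed = retrieved - after_dedup      # 190
excluded_by_type = after_dedup - final            # 54
```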

Data analysis procedures
The data analysis comprised three steps. First, descriptive bibliometric analyses in Bibliometrix version 3.0 were performed to provide an overview of the development of ILSA research in education and the key players in the field (RQ1, RQ2). Bibliometrix is an open-source tool developed in the R programming language that calculates multiple bibliometric and scientometric indicators for science mapping (Aria & Cuccurullo, 2017). More specifically, descriptive analyses using the biblioAnalysis and plot functions were performed to illustrate the dynamic growth of publication and citation data, the core journals and most influential publications, the leading authors and countries, and the most frequently used keywords in ILSA research. Frequency counts of the total number of publications for a given year and the total citations of the articles published each year provided an account of the evolution of the interest in ILSA research in education through time. The Standard Competition Ranking (SCR) was used to rank the productivity of journals, authors, and countries based on the total number of publications. The most influential articles were ranked based on the total number of citations. Second, multiple social network analyses were conducted in VOSviewer to explore the structure of research on ILSA in education (RQ3, RQ4, RQ5). VOSviewer is a freely available software for viewing and constructing bibliometric maps (Waltman et al., 2010). In VOSviewer, the units of analysis are journals, publications, citations, authors, countries, or keywords, depending on the focus of the analysis. To create the bibliometric maps, VOSviewer normalizes the differences between the units of analysis and builds a two-dimensional map where these units are represented as circular nodes. The size of the nodes accounts for their volume (e.g., number of publications in the dataset by an author) and the distance between the nodes reflects the similarity between these nodes.
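The Standard Competition Ranking mentioned above assigns tied items the same rank and then leaves a gap, so that four items with two tied in second place are ranked 1, 2, 2, 4. A minimal Python sketch follows; the function name and the journal counts are hypothetical and serve only to illustrate the rule.

```python
def standard_competition_rank(counts):
    """Map each name to its rank; ties share a rank and leave a gap after."""
    ordered = sorted(counts.items(), key=lambda item: item[1], reverse=True)
    ranks = {}
    for position, (name, value) in enumerate(ordered, start=1):
        previous = ordered[position - 2] if position > 1 else None
        if previous is not None and previous[1] == value:
            ranks[name] = ranks[previous[0]]   # tie: reuse the earlier rank
        else:
            ranks[name] = position             # otherwise rank = position
    return ranks

# Toy publication counts (not taken from the study).
publications_per_journal = {
    "Journal A": 40,
    "Journal B": 25,
    "Journal C": 25,
    "Journal D": 10,
}
journal_ranks = standard_competition_rank(publications_per_journal)
```

In this toy case Journals B and C share rank 2 and Journal D receives rank 4, not 3.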
Nodes are connected with lines or edges, which indicate a relationship between nodes. Edge thickness indicates the strength of that relationship. Finally, VOSviewer groups closely related nodes into clusters, where each color represents a cluster (Van Eck & Waltman, 2014). Co-authorship analyses were used to explore the networks of scientific collaboration between authors and countries (RQ3) (Newman, 2001, 2004). A co-citation analysis of the most cited journals in the dataset was performed to explore the disciplines underlying the structure of the field (RQ4) (Ding et al., 2000). A co-occurrence analysis of the most frequent keywords in the dataset was conducted to explore the knowledge base of research on ILSA in education (RQ5) (Rijsbergen, 1977). Additional information is presented in the Results section to facilitate the interpretation of the co-authorship, co-citation, and co-occurrence bibliographic maps. Third, additional social network analyses were conducted to further elaborate on the historical evolution of the field. A historiographic analysis in Bibliometrix using the histNetwork and histPlot functions was performed to uncover the historical citation paths of the most cited documents in the database (Garfield, 2004). Finally, three keyword co-occurrence maps were generated in VOSviewer to describe the thematic evolution of ILSA research in education across three "time-slicing" periods based on the overall time distribution of keywords in the dataset. Additional details are provided in the appropriate Results section to facilitate the interpretation of the analyses about the historical evolution of ILSA research in education.
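The raw quantities behind a keyword co-occurrence map, node sizes and edge weights, can be illustrated with a short Python sketch. This is not VOSviewer's algorithm: the software additionally normalizes link strengths and computes the layout and clusters, none of which is reproduced here. The function name and the toy keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations

def cooccurrence_counts(keyword_lists):
    """Count keyword occurrences (node size) and pairwise co-occurrences (edge weight)."""
    node_weight, edge_weight = Counter(), Counter()
    for keywords in keyword_lists:
        # Normalize case and drop repeats within one publication.
        unique = sorted({keyword.lower() for keyword in keywords})
        node_weight.update(unique)
        edge_weight.update(combinations(unique, 2))
    return node_weight, edge_weight

# Toy keyword lists for three publications (not taken from the dataset).
papers = [
    ["PISA", "achievement", "gender"],
    ["PISA", "achievement"],
    ["TIMSS", "achievement"],
]
nodes, edges = cooccurrence_counts(papers)
```

The same counting applies to co-authorship maps if each list holds the authors (or countries) of one publication instead of its keywords.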

Growth trajectory
The developmental patterns of a field can be illustrated by examining the trends in publication and citation data. The 2,233 publications in ILSA in education research have been cited a total of 27,406 times. Figure 2 shows the dynamic growth of publications and citations in the field. Overall, the trends suggest a steady increase in the scholarly interest in ILSA in education research from 1997 to 2020 that can be organized into three stages:

Authors
All 2,233 publications in the dataset have been published by a total of 3,508 researchers. The majority of authors in the dataset produce multi-authored documents (88%) and more than three-fourths of the publications are co-authored papers (77%). Table 2 shows the most productive authors in the dataset. The results of the co-authorship analysis of authors with five or more publications in the dataset (n = 122) are presented in Fig. 3. In this analysis, each node represents an author and its size reflects the number of publications in the dataset for each author. The edges connecting the nodes account for co-authorship relationships (i.e., co-authored publications), and the clusters can be interpreted as networks of scientific collaboration between authors (i.e., research groups). The map suggests the existence of numerous networks of scientific collaboration, with eight large research groups at the core and all smaller groups appearing in the periphery. These eight groups present some collaborative ties between themselves and are led by productive researchers in the field. The group led by Goldhammer-Luedtke (red cluster) seems to be situated at the center of the collaborative network and maintains links with most of the other connected clusters. A considerably large cluster not connected to these central groups is the one composed of Borgonovi, Santin, and other authors (yellow).

Countries/territories
A ranking of the 50 most productive corresponding countries/territories publishing research in the dataset is presented in Table 3. The USA appears as the leading country both in terms of number of publications and total citations, with Germany, China, Turkey, Spain, the United Kingdom and Australia, all with more than 100 publications each, following in the rankings. It should be noted that while China has been relatively productive, Chinese students themselves are not proportionally surveyed. For example, in PISA, only students in Shanghai are sampled, while in TIMSS and ICCS, only students from regionally-associated jurisdictions, Taiwan and Hong Kong, are sampled. Nevertheless, the USA, Germany, Australia, and the United Kingdom are the countries with the highest number of citations. The United Kingdom also emerges as the country with the highest collaboration intensity in the ranking (i.e., rate of publications with at least one co-author from a different country), while Turkey and Spain demonstrate substantively lower levels of international collaboration. Figure 4 displays the networks of collaboration at the country/territory level. Only countries/territories with ten or more publications were included in the analysis (n = 41). Overall, modern research on ILSA in education seems to be generated by the collaboration of geographically proximal countries/territories. Germany appears at the center of the map and demonstrates co-authorship ties with other productive countries in the dataset, especially in Western (red cluster) and Northern Europe (blue, orange). The USA, the most productive country in ILSA education research, collaborates closely with Turkey, South Africa, and East Asia (Japan, South Korea). Australasian countries tend to cluster together (green). Finally, an international collaborative network between predominantly Spanish-speaking countries is represented by the purple cluster.
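The collaboration intensity described above (the share of a country's publications with at least one co-author from a different country) can be sketched as follows. This is an illustrative Python example under stated assumptions, not the computation used to produce Table 3: the function name, the record layout (corresponding country plus the set of all author countries), and the toy records are hypothetical.

```python
def collaboration_intensity(records):
    """Share of each corresponding country's papers that include a foreign co-author.

    records: iterable of (corresponding_country, set_of_all_author_countries).
    """
    totals, international = {}, {}
    for corresponding, countries in records:
        totals[corresponding] = totals.get(corresponding, 0) + 1
        # A paper counts as international if any author country differs.
        if any(country != corresponding for country in countries):
            international[corresponding] = international.get(corresponding, 0) + 1
    return {c: international.get(c, 0) / totals[c] for c in totals}


# Hypothetical records, for illustration only.
records = [
    ("United Kingdom", {"United Kingdom", "Germany"}),
    ("United Kingdom", {"United Kingdom"}),
    ("Turkey", {"Turkey"}),
    ("Turkey", {"Turkey"}),
]
intensity = collaboration_intensity(records)
print(intensity)
```

With these toy records, half of the United Kingdom papers involve a foreign co-author, whereas none of the Turkish papers do.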

Journals
The 2,233 publications in the dataset have been published in 616 journals. Table 4 shows the top core journals in ILSA research ranked by number of publications. Figure 5 presents the yearly growth in the number of publications in the five core journals in the field and reveals that the journal Large-scale Assessments in Education has become the core journal in the field.
Figure 6 presents the co-citation analysis of journals in the dataset with more than 50 citations (n = 283). In this analysis, the nodes denote journals in the dataset and their size is a reflection of the number of co-citations with other journals in the dataset. Two journals are co-cited if a third journal in the dataset cites a publication in these journals. Frequently co-cited journals are assumed to share theoretical and semantic grounds, and therefore clusters can be interpreted as disciplines from which modern ILSA research in education emerges. The analysis suggests that there are five distinctive but related disciplines contributing to the development of modern ILSA research in education. The green cluster groups journals in the fields of Psychology and Behavioral Sciences (e.g., Educational, Social and Personality, Counseling, Developmental psychology). The blue cluster pulls together several journals publishing research on multiple topics in the Educational Sciences (e.g., science education, school leadership/management, teacher education, school effectiveness, policy education). Journals in STEM Education (science education, math education) are grouped in the purple cluster. Journals in the field of Psychometrics and Statistics form the yellow cluster, and the red cluster agglomerates journals in the Social Sciences, including Economics and Sociology.
Table 5 presents the 10 most influential publications in ILSA research, ranked by the number of total citations. The most cited publication in the dataset is the article by Else-Quest et al. (2010), a meta-analysis exploring gender gaps in mathematics achievement using TIMSS 2003 and PISA 2003 data. Although the most influential publications address a variety of research topics, an overarching issue across the publications is the study of cross-country and cross-national patterns in variables measured by ILSA assessments. Other recurrent topics in these publications are equity in education, academic achievement, student interest, engagement and self-concept, and the globalization of education. The proportion of theoretical-empirical publications in the list is approximately 50-50, but there is a clear predominance of documents examining or discussing issues around the PISA test.
To elucidate the historical roots of the modern research on ILSA in education, a historical direct citation network analysis of the 50 most cited publications was performed (see Fig. 7). In this analysis, each node represents a highly cited paper in the field and the direction of the arrows accounts for the chronological change of research trends in the past. Each path represents the historical evolution of a research theme based on the chronological relations of the most relevant citations in the field, which permits the understanding of the genealogical antecedents and descendants of ILSA in education research. The historiography suggests the existence of nine historical developmental paths. Based on the findings, the earliest developments correspond to the blue and purple paths at the beginning of the twenty-first century. The publication by Hanushek (2003), which explores human capital and quality education around the world using TIMSS data, serves as a seed for the development of broad research on (in)equity in education through ILSA data (blue path). The studies by Wilkins (2004) and Marsh (2004) initiate research on academic self-concept and motivation, especially in Math and Sciences (purple). Another early development is marked by the red path, which represents the chronological evolution of research on education policy and globalization. This is an intricate path that starts with Simola's (2005) publication examining the social, cultural, and historical factors explaining the success of the Finnish comprehensive school system. The next historical paths in the timeline correspond to the development of research on ICT literacy and engagement (green); model fit, robustness, and scaling of ILSAs in education (brown); gender gaps in ILSA research (grey); family socio-economic inequalities (pink); and most recently, science literacy (orange). 
All publications in the historical citation analysis are provided in Additional file 2 to guide the reader in understanding these developmental trends.

Keywords
Our bibliometric analysis revealed the presence of 4,774 author keywords (AKW) and 2,316 Keywords Plus (KWP) terms in the dataset. Table 6 shows the 10 most frequently occurring AKW and KWP on ILSA in education research and Fig. 8 presents the yearly growth of occurrence for the period 1997-2020. PISA and TIMSS clearly stand out as the most common AKW, while achievement, performance, and education appear as the most frequently used KWP. Figure 9 presents the results of the co-occurrence analysis of all keywords in the dataset occurring five or more times (n = 649). In this analysis, nodes account for keywords in the publications and edges connect keywords that frequently appear together (i.e., co-occur) in the publications. Clusters accumulate frequently co-occurring words across publications and can be interpreted as research topics or themes addressed in the literature. The map suggests that the interest of researchers in the field has revolved around six major research topics: (1) Measurement and testing, with frequently occurring keywords such as model, validity, item response theory, tests, and measurement equivalence (red); (2) Educational policy and reform related to policy, governance, impact, accountability, standards, and OECD (green); (3) Education quality/effectiveness around issues of efficiency, competition, choice, and class size (light blue); (4) Equity in education pertaining to segregation, inequality, gender, attainment, and migration (dark blue); (5) Interpersonal relationships connected to school climate, family, socialization, victimization, and bullying (yellow); and (6) Motivation and beliefs (e.g., self-concept, self-efficacy beliefs, attitudes, and interests) (purple). There are three additional secondary topics that seem to have captured the attention of ILSA in education researchers. 
These are related to student characteristics and include student knowledge and comprehension (brown), attitudes and interests (orange), and engagement and values (pink).
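The counting step behind a keyword co-occurrence map can be sketched in a few lines of Python. This is only an illustration of how occurrence thresholds and pair counts work; it does not reproduce the normalization, layout, or clustering performed by VOSViewer, and the function name, the threshold parameter, and the toy keyword lists are hypothetical.

```python
from collections import Counter
from itertools import combinations


def keyword_cooccurrence(records, min_occurrences=1):
    """Count keyword occurrences and co-occurring pairs across publications.

    records: list of keyword lists, one list per publication.
    Only keywords appearing in at least `min_occurrences` publications are paired.
    """
    occurrences = Counter(kw for keywords in records for kw in set(keywords))
    kept = {kw for kw, n in occurrences.items() if n >= min_occurrences}
    pairs = Counter()
    for keywords in records:
        # Sort so that each unordered pair is always stored under the same key.
        present = sorted(set(keywords) & kept)
        pairs.update(combinations(present, 2))
    return occurrences, pairs


# Hypothetical keyword lists, for illustration only.
records = [
    ["pisa", "achievement", "gender"],
    ["pisa", "achievement"],
    ["timss", "achievement"],
]
occurrences, pairs = keyword_cooccurrence(records, min_occurrences=2)
print(pairs)
```

In the toy data, only "pisa" and "achievement" meet the threshold of two occurrences, so the resulting map would contain a single edge of weight 2 between them. The analysis reported here applied a threshold of five occurrences.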
The thematic evolution of the topics addressed in the literature of ILSA in education was explored through three additional co-occurrence analyses of the keywords included in the dataset. A co-occurrence map of frequently occurring keywords was generated for each of the three stages of the development of the field: emergence (1997-2006), fermentation (2007-2014), and take-off (2015-2020). The three maps are displayed in Fig. 10 using density visualization. These maps are similar to the one presented in Fig. 9 and display the most frequently occurring keywords. Labels represent highly frequently occurring keywords in the dataset; the larger the label, the higher its occurrence. Keywords are placed in the map based on their co-occurrence relationship. However, instead of displaying clusters of frequently co-occurring words as in Fig. 9, the density visualization colors each point of the map based on the density of items in that area. These colors can go from red to blue. High density areas are those with a large number of items in the neighborhood and are represented by reddish colors. Low density areas include fewer items and are represented by bluish colors. The density view is particularly useful to determine important areas of a map and was used in this study to identify the main research themes addressed in the dataset in each of the three developmental stages of modern ILSA research in education. Figure 10A provides an overview of key topics addressed in the emergence stage (1997-2006). A small number of topics were addressed in this stage due to the limited number of publications. Nevertheless, research in this period seemed to focus predominantly on TIMSS, achievement, and education, which are the areas with highest density (red). Some attention was also given to issues around school-context, ability and performance; education and class size; math, and student knowledge and curriculum. 
Figure 10B illustrates the major research areas in the fermentation stage (2007-2014). An increase in the number of topics addressed during this period can be easily observed. Research on PISA, achievement, and education represent areas with the highest density in the map and constitute the main focus of the fermentation stage. Still, research on TIMSS, particularly related to mathematics, remains important. Other areas in the map with relatively high density (yellow) represent research on attitudes, motivation, and engagement; mathematics achievement; education quality and inequality (e.g., choice, attainment); issues around impact and schools; and scientific literacy.
The major research topics addressed in the literature during the take-off stage (2015-2020) are presented in Fig. 10C. PISA, achievement, and schools remain as the areas with the highest density in the map, while research on TIMSS dilutes. Research on school issues; motivation (e.g., efficacy, beliefs), and inequality (e.g., resources, tracking, opportunity) are additional major focuses of the take-off stage. Research on education policy (e.g., efficiency, governance, globalization, standards) and measurement (e.g., models, validity, measurement invariance, item response theory, tests) also emerge more clearly as important topics during this period.

Discussion
The purpose of this study was to map the growing global literature on ILSA research in education and to provide an overall picture of the recent development and structure of the field in terms of volume and growth trajectory, leading publications, journals, authors and countries, networks of scientific collaboration in the field, disciplines and historical developmental paths underlying the foundations in the field, and the research topics that have received the greatest attention in the literature, as well as their evolution. We used a novel approach relying on bibliometric techniques and a data-driven approach to contribute to the scholarly debate in the field and provide directions for future research.

Volume and growth trajectory of ILSA research in education
The findings of the study suggest that the scholarly interest in modern ILSA research in education has consistently increased from the end of the twentieth century to date, especially after 2014, which is consistent with previous reviews conducted for earlier periods (e.g., Hopfenbeck et al., 2018). Overall, the developmental paths of publication data reflect the patterns of an emerging field. The typical development of a field is organized in several stages. At its birth, a few scholars explore new ideas and theoretical frameworks and produce the first publications (preliminary stage). Later, these ideas permeate the research community and a larger group of researchers generate a notable growth in the number of publications (exponential growth stage). This is followed by a period in which the number of publications remains stable and ultimately declines when interest in the area is lost or replaced by new ideas (saturation phase) (Dabi et al., 2016; Keathley-Herring et al., 2016). Based on our findings, ILSA research in education seems to be situated at the beginning of the exponential growth stage, or what we have denominated as the take-off stage. Based on the growing citation patterns over time (Fig. 1), we would expect a new suite of papers from the 2010 to 2015 era that holds more specific relevance (e.g., to student ICT-related behaviors, well-being) to begin to establish prominence in the field over the 2020 to 2025 period. Several reasons can potentially explain the increasing trends in publication and citation data. First, the knowledge derived from ILSA programs accumulates over time due to their cyclical nature, offering researchers a periodically larger pool of updated data. Second, the number of jurisdictions participating in ILSA programs also increases with each cycle, which makes ILSA data relevant to a continuously wider audience. 
Third, new assessment domains have been incorporated into ILSA programs in recent cycles (e.g., creative problem solving, student well-being), further expanding their interest to the research community. Fourth, international organizations (e.g., IEA, OECD, World Bank, Soros Foundation, UNESCO, IDB) have been committed to fostering research capacity for analyzing ILSA data over the last two decades, especially in low- and middle-income countries (Ababneh et al., 2016; Wagemaker, 2013). Fifth, the more pronounced growth in publications can be partially explained by taking into account that specialized journals in the field have been launched in the last 10 years. Finally, concerns about the implications of ILSA programs in education policy-making have also been on the rise, which has likely contributed to the increased productivity in the field.

Leading journals, articles, authors, and countries in the field
The analysis of the core journals in the field suggests that ILSA research in education is, not surprisingly, an educational topic. Most publications in the field are disseminated in educational journals, although research emerging from other disciplines also exists, with some of the core journals specializing in psychology, economics, and measurement and psychometrics. Notably, most of the leading journals in the field publish research predominantly in English, although there are journals disseminating research in other languages, such as German, Turkish, and Spanish, which illustrates the current global interest in the field. Our analysis of journal yearly growth points to the sharp rise in prominence of a subject-specific journal, Large-Scale Assessments in Education, and the relative decline of three journals that burgeoned in the 2005 to 2011 period (Journal of Science and Mathematics Education, International Journal of Science Education, and Education and Science). This suggests that the launch of Large-Scale Assessments in Education in 2013, as part of a commitment to promote quality ILSA research in education and enhance consequential validity of the field, was timely, with the journal well-positioned to play a key role in collating and disseminating ILSA research over the next five to ten years. Specialized journals are indeed crucial for the development of a field because they provide an avenue for the dissemination of specialized knowledge, the exchange of ideas, and the formation of a scholarly community of experts in the field (Vanderstraeten et al., 2016). However, the fact that there is only one specialized journal in the field may be limiting the development of the field. Publishers and professional organizations may consider launching competing ILSA-specific journals in light of the growing volume and global interest in modern ILSA research in education revealed in this study. 
The analysis of the most influential publications revealed that research in the field tends to focus on cross-national differences in student outcomes measured in the ILSA programs. This is to be expected, as ILSA provides a valuable tool for improving the quality of education systems by learning from the experiences of initiatives and practices implemented in diverse contexts. An interesting insight derived from our analysis, however, is the particular prominence of the gender- and policy-centric publications from Else-Quest et al. (2010) and Grek (2009), respectively, establishing that these research themes are quite central to ILSA research in education. In addition, most of the influential publications seemed to revolve around the analysis and discussion of the results from the OECD PISA program, which indicates that the PISA test captures the greatest attention in the field at the moment.
Research in the field is generated by scholars affiliated to different countries participating in ILSA programs, especially in the US and Germany. This is despite the apparently low interest of US academics in ILSA research (Green Saraisky, 2015). In terms of leading authors, our analysis suggests that the scholarly community of ILSA research in education is composed of a relatively large group of experts that produce a similar number of articles. This suggests that a depth of expertise, a constituent generally necessary for a knowledge society (Grundmann, 2016), exists in ILSA research in education, helping to ensure the ongoing sustainability of the field. Our wide coverage of the international literature demonstrated that while predominantly English-speaking countries are core to ILSA research in education, non-English speaking jurisdictions appear to be emerging as contributors to the field. Here, China, Spain, and Turkey stand out as major contributors of research in ILSA in education, now outperforming the United Kingdom and Australia. Interestingly, the productivity of these three countries seems to rely on different publication patterns. China seems to maintain high productivity by collaborating with researchers from overseas, whereas Spain and Turkey seem to produce research in the field predominantly within national borders and are likely to disseminate it in local languages via publications in national journals (e.g., Egitim ve Bilim-Education and Science, Revista de Educación). Still, our study shows that there is a relative scarcity of research produced in lower-middle-income countries (LMIC) that participate in these ILSA programs. Thus, the results of this study suggest that there might not be a Western bias in modern ILSA research in education, as suggested in previous studies (e.g., Domínguez et al., 2012; Hopfenbeck et al., 2018; Lenkeit et al., 2015; Lindbald et al., 2015), but a predominance of research produced in higher-income countries. 
Some of the reasons for this may pertain to LMICs' less developed research expertise and relative scholarly isolation, and language-based access issues given the disproportionate number of English-language journals representing the field (Wang et al., 2006; Ynalvez & Shrum, 2011).

Collaboration patterns in the production of ILSA research in education
There is evidence that research collaboration is growing around the world (Henriksen, 2016; Kliegl & Bates, 2010). Research collaboration is generally considered as a key contributor to the development of a discipline and as a metric of excellence and quality (Coccia & Wuang, 2016; Freshwater et al., 2006; Kim, 2006; Rolfe et al., 2004), especially in some global regions (e.g., Europe) (Kwiek, in press). Also, research collaboration has a positive influence on productivity and academic impact (Abramo et al., 2017; Kato & Ando, 2013) and the development of research capacity in developing research environments (Barrett et al., 2011). Based on the findings of this study, collaboration in ILSA research in education seems to be the norm. This is a positive sign, considering that researchers in the humanities and social sciences tend to collaborate less often compared to those in the physical and life sciences (Kwiek, 2018; Yemeni, 2019). More specifically, the present study identified the existence of multiple networks of scientific collaboration at the author and country levels. Eight interrelated research groups are situated at the core and can be considered as influential collectives moving the field forward. These groups are led by some of the most productive researchers in the field and represent international and national collaborations between universities in different regions. International scientific networks tend to be characterized by co-authored publications by authors affiliated to geographically proximal countries and territories belonging to the same global regions (e.g., Europe, East Asia, Australasia) or authors who share cultural and linguistic ties (e.g., Spanish-speaking countries). 
This is a typical pattern in the social sciences (Mosbah-Natanson & Gingras, 2014), and such patterns of co-authorship could be monitored in future bibliometric studies so as to track shifts in the circulation and exchange of diverse ideas/phenomena identified as potentially important to the development of the field (Barrett et al., 2011; Kato & Ando, 2013).

Intellectual roots of ILSA research in education
One of the aims of our study was to identify the intellectual roots of ILSA research in education. In this sense, we revealed two interesting insights into the development and structure of the field. First, we observed that the knowledge in modern ILSA research in education has emerged from the interdisciplinary research conducted in five related disciplines: educational sciences, STEM education, educational psychology, social sciences, and measurement and psychometrics (Fig. 6). This is another hopeful feature of the field, considering that interdisciplinary research is capital to integrating diverse approaches and providing holistic perspectives needed to solve complex problems and inform better decision-making (e.g., Aboelela et al., 2007). Second, our historical direct citation network suggests that ILSA research in education has developed grounded on nine distinctive historical paths broadly accounting for research on equity, quality education, self-efficacy, STEM education, and education policy. These nine historic ILSA paths point to the breadth of the field and contribution to educational research in general. Earlier historical paths (i.e., most prominent at the early stages of the development of the field) seem to include research in educational quality, policy, and globalization, and (in)equality. The other developmental paths have been consolidated more recently, and account for research on science literacy, family socio-economic inequalities, gender gaps, computer skills and ICT engagement, and ILSA measurement. The densest and most intricate are the ones for research in policy and globalization, quality, and equity in education, suggesting that these focal areas will continue being an important part of future ILSA research.

Research themes/topics and their evolution through time
A major contribution of this study is the identification of the research topics that have captured the greatest attention of researchers in the field and their evolution since its origins until the present. Unlike previous studies that used content analysis techniques, we explored this issue using a data-driven approach. The major findings derived from our analysis are discussed below.
First, results from the analysis of the most frequently occurring keywords used in the literature suggest that work on the OECD "PISA" program supersedes "TIMSS", "PIRLS", "ICCS", and "ICILS". This is understandable given its more diverse set of subject areas, broader global reach, and more regular triennial administrative cycle.
Second, the results of our co-occurrence analysis for the total period analyzed (1997-2020) elucidated that ILSA research in education has revolved around several major research topics. Some of these topics, such as measurement and testing, equity and inequality, and cognition, have been identified in earlier studies (Domínguez et al., 2012; Hopfenbeck et al., 2018; Lenkeit et al., 2015). However, our search strategy and novel data analysis approach allowed us to identify a few additional topics not previously reported in the literature. These included issues related to educational policy and reform, education quality and effectiveness, interpersonal relationships, student motivation and engagement, and student knowledge, attitudes, and values. Overall, most of these topics are connected with prominent research themes currently explored by education researchers (see Huang et al., 2020) and indicate that ILSA studies appear to be well-placed to make an ongoing major contribution to educational research in general.
Third, the co-occurrence analysis for each developmental stage in ILSA research in education revealed some interesting trends. For example, research on TIMSS was more popular during the early stages of modern ILSA research in education (when PISA data and reports were not available), but a focus on PISA research has become clear in recent years. Also, some topics seem to have remained popular in the field. These are issues connected to student achievement, performance, and education. The attention paid to other themes has evolved through time, with some progressively increasing in popularity as we approach the present. For instance, research on ability, class size, student knowledge, and curriculum was the most popular during the emergence stage (1997-2006). In the fermentation stage (2007-2014), research on motivation and beliefs, education quality and effectiveness, and equity and equality in education appear as frequently addressed topics. In the most recent period, the take-off stage (2015-2020), these research themes consolidated and expanded to include issues on measurement and testing, education policy and globalization. This evolution partially mirrors the evolution of the research interests in education research (Huang et al., 2020), providing further evidence of the potential of ILSA research to inform international research in education and related disciplines.

Limitations
There are several limitations that should be considered. First, we only retrieved documents indexed in the Web of Science database, so we could have omitted relevant publications disseminated in journals not included there. Second, Web of Science, like other interdisciplinary databases (e.g., Scopus), is biased against research in the Humanities and the Social Sciences and research published in languages other than English, which might have excluded some other relevant publications (Mongeon & Paul-Hus, 2016). Third, only journal articles and reviews were examined in this study, omitting other types of publications that have also contributed to the development of the field. The consideration of alternative types of publications not included in this study could alter the ranking of leading authors and institutions, as well as the networks of scientific collaborations. Still, we believe that our search strategy, coupled with novel data analysis procedures, provides a meaningful and insightful contribution to the literature and the broadest account of the development and structure of the field provided to date. Future studies can replicate the findings of this study by using alternative databases (e.g., Scopus, ERIC) and including other types of documents beyond the ones included in this study (e.g., book chapters, conference proceedings, grey literature).

Conclusion
The present study provides a broad overview of the development and structure of ILSA research in education based on the five major recent and ongoing ILSA programs in the field. There are several positive features that point to the progressive maturation of the field and its good standing to contribute to educational research in general. ILSA research in education is an emerging field currently situated at its take-off stage and