Who likes to learn new things: measuring adult motivation to learn with PIAAC data from 21 countries

Despite the importance of lifelong learning as a key to individual and societal prosperity, we know little about adult motivation to engage in learning across the lifespan. Building on educational psychological approaches, this article presents a measure of Motivation-to-Learn using four items from the background questionnaire of the Programme for the International Assessment of Adult Competencies (PIAAC). We used multiple-group confirmatory factor analyses for ordered categorical data to investigate the scale's dimensionality and measurement invariance across countries. Regression analyses were used to investigate the scale's criterion validity. Results show that the proposed four-item scale fits the data considerably better than the original six-item scale labeled Readiness-to-Learn. Further analyses support the scale’s configural, metric (weak) and partial scalar (strong) measurement invariance across 21 countries. As expected, Motivation-to-Learn has significant relations to the working population’s engagement in learning in terms of participation in non-formal education over the last 12 months. These relations remain relevant after taking literacy as an indicator of level of education into account. The Motivation-to-Learn scale presented here may be used to indicate adult motivation in cross-country comparisons. The potential of using the scale in future PIAAC analyses and research on adult learning is discussed.

In 2013, the Programme for the International Assessment of Adult Competencies (PIAAC) drew the attention of politicians and researchers to adult learners, a previously neglected group within educational psychology. Unlike full-time students in PISA, adults participating in PIAAC vary substantially in their past, present, and potential future engagement in educational activities. Further education during adulthood is central to catching up on and maintaining competencies. However, the majority of existing research on participation in further education focuses on individual and contextual socio-demographic predictors (OECD 2005), whereas adult Motivation-to-Learn (MtL) is relatively understudied despite its theoretical and empirical relevance across the adult lifespan (Courtney 1992; Gorges 2015).
It is difficult to account for motivation in PIAAC because, unlike PISA, it does not measure established motivational psychological constructs. However, section I of the background questionnaire pertains to psychological factors of skill acquisition that provide an opportunity to measure motivation via a newly developed scale hitherto referred to as Readiness-to-Learn (RtL). Building on the RtL items, the goal of the present paper is twofold: (1) to examine whether the RtL items also assess adult MtL and (2) to empirically investigate the psychometric properties of the resulting MtL measure. We start with a review of potential theoretical constructs that constitute adult MtL and map the RtL items onto established MtL measures from educational psychology research. We then examine the MtL scale's factorial validity and test for measurement invariance across countries using confirmatory factor analyses. Finally, we check the scale's criterion validity based on its relations with participation in further education as assessed in the PIAAC background questionnaire, while accounting for the socio-demographic determinants of age, employment, and literacy as an indicator of level of education (OECD 2005). In sum, our goal is to build a psychometrically sound scale that may be used as a measure of general adult MtL in future investigations using PIAAC data.

Motivation-to-Learn in educational psychology
Educational psychology has produced manifold theoretical approaches to define MtL in educational contexts (Schunk et al. 2014). While earlier research mostly differentiated quantities of motivation, later research shifted its focus from quantities to qualities of motivation, providing fruitful explanations for and predictions of individuals' experiences and behaviors. The latter expounded on student MtL during primary and secondary school, which are both relevant for research on adult learning (Courtney 1992; Gorges 2015; Gorges and Kandler 2012). Furthermore, what motivates task choice typically motivates cognitive task engagement (Pintrich and Schrauben 1992), use of deep-level versus surface-level learning strategies (Ames and Archer 1988), and consequential learning outcomes (cf., Schunk et al. 2014). Therefore, our review of research focuses on motivational constructs that predict engagement in learning.
One major line of research focuses on the distinction between intrinsic and extrinsic forms of motivation (cf., Rheinberg 2010; Ryan and Deci 2000). Intrinsic motivation refers to the (anticipated) enjoyment gained from task engagement independent of extrinsic rewards or subsequent consequences. Intrinsic motivation is closely related to the concept of interest, that is, a positive emotional and personal valence attached to a particular object of interest or an activity (cf. Renninger et al. 1992). However, intrinsic motivation has been conceptualized as situation-specific, whereas interest-based motivation can be situation-specific or reflect an enduring personal characteristic (cf., Renninger et al. 1992; Schiefele 2009). Intrinsically motivated behavior is driven by positive incentives inherent in particular activities and experiences in school contexts (cf., Schunk et al. 2014) and beyond (e.g., Durik et al. 2006). By contrast, extrinsic motivation refers to task engagement due to external incentives or punishments (cf., Rheinberg 2010; Ryan and Deci 2000). Extrinsic motivation may lead to selection of a task that contributes to the learner's short- or long-term personal goals (e.g., career aspirations). Intrinsic motivation is typically assessed using items that explicate people's affective and/or cognitive evaluations of a particular object or activity, whereas extrinsic motivation is assessed with reference to external incentives for task engagement.
A second productive strand of motivational research refers to goal-directed behavior. People pursue higher order goals across specific tasks and situations, and this goal orientation provides underlying reasons for engaging in learning activities. In educational settings, the distinction between performance and mastery goal orientation has received considerable attention (Maehr and Zusho 2009). Learners with performance goal orientation strive to outperform others and demonstrate their abilities, whereas learners with mastery goal orientation strive to develop their skills. Goal orientation can be conceptualized as either situation- or person-centered (Maehr and Zusho 2009). The former refers to active goals guiding a learner's behavior in a particular learning situation; the latter reflects an "enduring personality disposition" guiding action across situations (Kaplan and Maehr 2007, p. 163). The respective conceptualization is typically implied in the instructions used to assess goal orientation. For example, Harackiewicz et al. (1997) asked university students to report on their goal orientation with respect to a specific course. Accordingly, implementing a person-centered conceptualization of goal orientation, individuals may be asked to report on their mastery goal orientation using a measure that refers to their general (i.e., not situation-specific) goal orientation. In this case, individuals with high mastery goal orientation should be more likely to embrace opportunities to develop their skills and expand their knowledge than individuals with low mastery goal orientation (Gorges et al. 2013; Schunk et al. 2014). In addition, mastery goal orientation relates to the use of deep-level learning strategies such as elaboration (Ames and Archer 1988). Hence, goal orientations explain differences in individual task preferences, and the quality and extent of cognitions and experiences during task engagements (cf. Maehr and Zusho 2009).

Items in the PIAAC background questionnaire
PIAAC "provides a rich source of data on adults' proficiency in literacy, numeracy and problem solving in technology-rich environments [ICT] – the key information-processing skills that are invaluable in 21st-century economies – and in various 'generic' skills, such as co-operation, communication, and organizing one's time" (OECD 2013a, p. 3). These skills are assumed to be critical information-processing competencies important for adults in different life contexts, like work and social participation.
To gain a deeper understanding of skill development and skill differences, the PIAAC background questionnaire includes a range of questions on, for example, generic skills, everyday activities, and the subjective perception of the match between skills and workplace requirements (OECD 2011). Although not specified in further detail, the background questionnaire features a so-called Readiness-to-Learn scale (Yamamoto et al. 2013, p. 24). It is important to note that this scale has also been labeled meta-cognitive abilities in the conceptual framework of the background questionnaire (i.e., abilities that "structure the learning process and affect the efficiency with which new information is being processed" OECD 2011, p. 52), learning styles in the reader's companion (i.e., "interest in learning, approach to new information" OECD 2013a, p. 39), or learning strategies in other sources (i.e., "ability to acquire […] skills after leaving education" Allen et al. 2013, p. 10). The fact that all these labels refer to the same items illustrates a common challenge in educational psychology that may be termed the "jingle-jangle fallacy", which refers to the problem of differentiating between the multitude of motivational constructs and terms used in the literature (see Murphy and Alexander 2000 for a thorough review of motivational terminology). For PIAAC investigators, including RtL items was considered crucial because "there is good empirical evidence that [these] learning strategies affect the acquisition of skills and educational attainment" (OECD 2011, p. 53; see also OECD 2013b).
According to the conceptual framework of the background questionnaire, RtL items go back to the work of Kirby and coauthors on approaches to learning (OECD 2011). Building on Biggs (1985) and Entwistle and Ramsden (1982), Kirby et al. (2003) conceptualize approaches to learning as "a set of motives and strategies" (p. 32). While Biggs (1985) distinguishes different forms of motivation on the one hand and different learning strategies on the other, Kirby et al. (2003, p. 50) argue that each form of motivation is inherently linked to a specific strategic approach to learning (e.g., intrinsic motivation is related to deep-level learning). Hence, they built subscales comprising both motivational items (e.g., "In my job one of the main attractions for me is to learn new things") and items referring to learning strategies ("In trying to understand new ideas, I often try to relate them to real life situations to which they might apply").
Existing instruments to assess approaches to learning within the school context (Marton and Säljö 1976; Entwistle and Ramsden 1982) are not consistent with Kirby et al.'s (2003) approaches to learning in work contexts. Therefore, newly developed items that "aim to measure the extent of elaborate or deep learning" were implemented in the PIAAC background questionnaire (OECD 2011, p. 53). While the initial goal was to measure deep and surface approaches to learning using 13 items, only the six items listed in Table 1 were retained after preliminary studies (OECD 2010a). Smith et al. (2015) examined the psychometric properties of this six-item RtL scale for the PIAAC US sample. Although they state that the underlying theoretical constructs are unclear, they argue that RtL relates to concepts from educational psychology and adult education research. Their findings show that the scale fails to show unidimensionality unless correlated errors between item 1 and item 3 are allowed. Including error correlations in their confirmatory factor analyses led to an acceptable fit, which, however, is statistically questionable because it indicates that some items share more than what is captured by the one RtL factor. According to Smith et al. (2015), the RtL scale with correlated errors shows strong measurement invariance across gender, age groups, and employment status, although the scale failed to show such invariance across educational levels.
Table 1 Items of the Readiness-to-Learn scale in the PIAAC background questionnaire
Verbatim question: "I would now like to ask you some questions about how you deal with problems and tasks you encounter. To what extent do the following statements apply to you?" Answers were recorded using the following response options: 1 not at all, 2 very little, 3 to some extent, 4 to a high extent, 5 to a very high extent. Comparison items: intrinsic motivation (Trautwein et al. 2012); mastery goal orientation (Harackiewicz et al. 1997)
Thus, although the six-item scale has successfully been used by M. C. Smith, Rose, Ross-Gordon, and Smith (2014) to predict literacy skills above and beyond socio-demographic factors, findings regarding the factorial structure of the RtL scale suggest that, at least in the US sample, it potentially lumps together what are actually two distinct theoretical constructs underlying the six items.

From readiness-to-learn to motivation-to-learn
The conceptualization of approaches to learning used to develop the RtL scale in the PIAAC background questionnaire explicitly combines motivation to learn and learning strategies, which are theoretically and empirically distinct constructs according to the educational psychology literature (see examples in Table 1 and research using the Motivated Strategies for Learning Questionnaire; cf., Duncan and McKeachie 2005). Thus, combining both in one scale obscures the distinction between two important and discrete prerequisites of successful engagement in learning. Therefore, we carefully examine the items at hand in order to differentiate MtL from the use of deep learning strategies within the theoretically conglomerate RtL scale.
A closer look at the six items reveals that they may be grouped into items expressing intrinsic forms of motivation (what people "like" to do in items 2, 4, and 5), and spontaneous behaviors people show in particular situations (what people "try" or "look" for in context in items 1, 3, and 6). Two of the behavioral items (1 and 3) match expressions typically used to assess the deep-level learning strategy of elaboration (Duncan and McKeachie 2005) but neither of the two items includes information on the motivation for people to use such a strategy. However, item 6 contains not only a spontaneous behavior but also a motivation for this behavior ("to make it clearer"). Thus, in addition to intrinsic motivation in items 2, 4, and 5, item 6 also focuses on aspects of individual MtL and skill development.
Looking more closely, item 2 refers to the positive experience of learning new things. Individuals agreeing with this statement should enjoy learning and willingly engage in learning opportunities. Motivational measures of the experiential quality of learning primarily tap intrinsic forms of motivation. Accordingly, the wording of this item is close to a typical measure of intrinsic motivation (e.g., Trautwein et al. 2012). Items 4 and 5 refer to the satisfaction one gains from task engagement with the purpose of understanding if not mastering difficult things. Thus, these items relate to a mastery goal orientation. Finally, item 6 describes goal-directed behavior explicitly; the goal is to understand "something unclear". Given this reason behind one's actions, the item relates to mastery goal orientation and, thus, is motivational as well. Table 1 illustrates the substantial overlap of the six RtL items with items from educational psychology research. Overall, we conclude that these four items show theoretical MtL content validity based on the discussion outlined above, and thus proceed to test their construct validity.

Cross-country comparisons based on PIAAC data
One of PIAAC's outstanding features is the collection of cross-national data offering large-scale comparisons of multiple OECD countries and cultures. PIAAC data have been collected from representative samples of the adult population aged 16-65 in 24 countries (OECD 2013a). However, comparisons across countries require measurement instruments (standardized skill assessments and scales) to be equivalent across countries. In order to draw reasonable conclusions from such analyses we must be sure that the meaning of the scale of interest or the items used to measure it are not culturally influenced. For this purpose, it is necessary to examine the degree of measurement invariance by country (Chen 2008).
Measurement invariance (MI) in multiple group comparison may pertain to different parameters of psychological assessment (Chen 2008; Sass 2011). In the case of continuous observed variables, the following parameters are constrained to establish different levels of MI across groups: intercept parameters, factor loadings, and residual variances. Hence, in the case of continuous observed variables, four levels of MI are typically tested (see e.g. Widaman and Reise 1997): (1) configural, (2) weak or metric, (3) strong or scalar, and (4) strict. Configural MI means that the same factor structure is specified in all groups, but the parameters (e.g., factor loadings, intercept parameters, and residual variances) in the measurement model are not constrained and are freely estimated in all groups. Weak (or metric) MI implies that the factor loadings are set equal in all groups, whereas the intercepts and residual variances are allowed to vary across groups. Strong (or scalar) MI is established if the factor loadings and intercept parameters are held equal in all groups. For identification purposes, the mean of the latent factor is fixed to zero in one group (i.e., the reference group) and freely estimated in all remaining groups when testing for strong MI. Strict MI requires that researchers constrain all parameters in the measurement model. Hence, strict MI is established if the intercept parameters, factor loadings, and residual variances are held equal across groups. From a psychometric point of view, strong measurement invariance is sufficient to ensure that the same construct is measured in all groups and to compare the means of the latent factor across countries (Widaman and Reise 1997).
In this study, we treated the items as ordered categorical observed variables. Because factor models with categorical observed variables are based on conditional probabilities, intercept parameters are fixed to zero and residual variances are fixed to 1 using a factor analytical approach (Millsap and Yun-Tein 2004). This means that researchers may impose restrictions on thresholds and/or factor loadings in order to test the level of MI in the case of ordered categorical data.

The present study
Building on the theoretical analyses of the PIAAC background questionnaire items, the goal of the present study is to test whether items 2, 4, 5, and 6 reflect the underlying construct of MtL. As the PIAAC background questionnaire does not contain other motivational scales, our analyses focus on testing factorial validity (i.e., factor structure), measurement invariance (MI) across groups, and the relation of MtL to participation in further education. We approach the empirical test of the scale in three steps. First, we use multiple-group confirmatory factor analysis (MG-CFA) to investigate whether the six items actually measure an underlying theoretical construct (factorial validity), and whether the proposed four-item scale fits the data better than the original six-item scale. Second, we examine whether the scale measures the same underlying construct in all PIAAC countries by testing the degree of MI across countries. We treat the items as ordered categorical items and use a multiple-group graded response model (Samejima 1969; see below), which is statistically more appropriate but has not been applied so far to these data. Finally, we scrutinize the scale's criterion validity by investigating its relations to participation in further education. We expect that MtL shows significant relations to participation in both job-related and non-job-related further education above and beyond typical socio-demographic factors (e.g., education, employment, and age).

Data
Our analysis includes PIAAC data from the 21 countries that met the psychometric prerequisites and provided representative samples (OECD 2013). It should be noted that completed cases in PIAAC are defined by an international consortium on standards and guidelines (OECD 2010b). Literacy-related non-respondents (LRNR), for whom age and gender were collected by the interviewer, are also counted as completed cases and were handled as part of the PIAAC net sample. Literacy-related reasons for non-interviews or break-offs during the background questionnaire include, e.g., language problems and mental disabilities. In the countries included in our analyses, these respondents comprise less than 5 % of the population.

Measures
All scales and socio-demographic information were part of the PIAAC background questionnaire. The background questionnaire was administered as a computer-assisted personal interview (CAPI).
Our core construct, Motivation-to-Learn (MtL), is measured using items 2, 4, 5, and 6 listed in Table 1, whereas the RtL scale comprises all six items. Responses were recorded on a 5-point scale (see Table 1).
Level of education (based on the variable EDCAT6) is classified according to the International Standard Classification of Education (ISCED; UNESCO 1997). Following conventions from large-scale multi-national studies we distinguish three levels (high, intermediate, and low) to describe our sample (OECD 2013c; Heisig and Solga 2015).
Participation in further education was measured as participation in non-formal, non-compulsory education during the 12 months prior to data collection indicated by the derived NFE12 variables. Non-formal education is defined as "any organized and sustained educational activities" (OECD 2011, p. 39) which are not "provided in the system of schools, colleges, universities and other formal educational institutions that normally constitutes a continuous 'ladder' of full-time education" (OECD 2011, p. 34). Hence, providers of non-formal education include adult education centers (e.g., courses on health-related issues, foreign languages, culture, or use of information technology), foreign language schools, human resource development programs and more. Non-formal education may be job-related (NFE12JR) or non-job-related (NFE12NJR).
Literacy is used as a continuous indicator of level of education in our analyses of criterion validity due to the close relation of these two constructs (OECD 2013a). Literacy is defined as the capability to understand, interpret, and use information in written form as productive and goal attaining knowledge (Jones et al. 2009; OECD 2013a). Literacy in PIAAC was measured with tasks such as reading and understanding medical instruction leaflets, a short newspaper article, or a job description in an online portal. Table 2 gives an overview of the number of cases per country, distributions of relevant background variables, and participation in further education. The proportion of female participants is evenly distributed across countries. In all countries except Italy over 50 % of the population has a medium or high educational attainment. The age distribution shows that the majority of respondents are working aged (around 40 %). Descriptive statistics and zero-order correlations between all items are shown in Table 3.

Statistical analyses
In the present study, we conducted multiple-group confirmatory factor analysis (MG-CFA) for ordered categorical observed variables. Fitting different MG-CFA models allows us to scrutinize the hypothesized one-factor structure in all 21 countries simultaneously, to test the degree of measurement invariance using classical model fit criteria, and to adequately model the measurement level of the scale. More specifically, we used a multiple-group graded response model for the analyses (Koch and Eid 2015; Samejima 1969), which can be expressed as follows:
$$Y_{ig}^{*} = \alpha_{ig} + \lambda_{ig}\,\eta_{g} + \varepsilon_{ig}, \tag{1}$$
where $i = 1, \ldots, I$ indexes the indicators (e.g., items or item parcels) and $g = 1, \ldots, G$ indexes the groups (e.g., countries). In the above model (see Eq. 1), it is assumed that there is a continuous normally distributed latent response variable $Y_{ig}^{*}$ for each observed variable $Y_{ig}$. The latent response variable $Y_{ig}^{*}$ can be decomposed in a similar way as in confirmatory factor models for continuous variables, assuming an (additive) intercept parameter $\alpha_{ig}$, a weighted latent factor $\lambda_{ig}\eta_{g}$, and a measurement error variable $\varepsilon_{ig}$ (Koch and Eid 2015). The observed variables $Y_{ig}$ are linked to the latent responses $Y_{ig}^{*}$ assuming a threshold relationship (Eid 1996; Millsap and Yun-Tein 2004; Muthén 1984):
$$Y_{ig} = s \quad \text{if} \quad \kappa_{isg} < Y_{ig}^{*} \le \kappa_{i(s+1)g}, \qquad s = 0, \ldots, S-1,$$
with $\kappa_{i0g} = -\infty$ and $\kappa_{iSg} = +\infty$. The parameters $\kappa_{isg}$ are threshold parameters that divide the continuous latent variable $Y_{ig}^{*}$ into $S$ categories.
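To make the threshold relationship concrete, the following minimal sketch computes the response-category probabilities that a graded response model of this kind implies for a single item and a given value of the latent factor. It assumes a probit link with the intercept set to zero and the residual standard deviation set to one; the function name and all numeric values are illustrative and are not estimates from the PIAAC data.

```python
from statistics import NormalDist


def category_probabilities(eta, loading, thresholds, intercept=0.0, resid_sd=1.0):
    """Return P(Y = s | eta) for s = 0..S-1 under a probit graded response model.

    thresholds holds the S-1 finite thresholds in ascending order; the outer
    thresholds (-inf and +inf) are handled by padding with 0 and 1.
    """
    if list(thresholds) != sorted(thresholds):
        raise ValueError("thresholds must be in ascending order")
    phi = NormalDist().cdf
    # Conditional mean of the latent response Y* given the latent factor
    mean = intercept + loading * eta
    # Cumulative probabilities P(Y* <= kappa) at each threshold
    cumulative = [0.0] + [phi((k - mean) / resid_sd) for k in thresholds] + [1.0]
    # Category probability = difference of adjacent cumulative probabilities
    return [cumulative[s + 1] - cumulative[s] for s in range(len(cumulative) - 1)]


# Illustrative values only: four thresholds give five response categories
probs = category_probabilities(eta=0.5, loading=1.2, thresholds=[-2.0, -1.0, 0.0, 1.0])
print([round(p, 3) for p in probs])
```

Raising the latent factor value shifts probability mass toward the higher response categories, which is the sense in which the loading weights the latent factor.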
In order to identify and estimate the model (Eq. 1) certain restrictions have to be made. First, all intercepts $\alpha_{ig}$ have to be fixed to zero, as there are no intercept parameters in the case of categorical observed variables. Second, for each factor one loading parameter has to be fixed to a value greater than zero (usually to one). Third, the variances $\mathrm{Var}(\varepsilon_{ig})$ of the error variables have to be fixed to a value larger than zero (usually to one) in one group. In many SEM packages (e.g., Mplus, Muthén and Muthén 1998) the latent mean of $\eta_{g}$ is fixed to zero by default, whereas the variance of $\eta_{g}$ is freely estimated in all groups. Mplus allows two ways to formulate and estimate such a model: the Delta and the Theta parameterization. The Delta parameterization does not allow residual variables $\varepsilon_{ig}$ to be part of the model and uses scaling factors instead (see Muthén and Asparouhov 2002). The Theta parameterization used here allows residual variances to be freely estimated in all groups except the reference group. However, in order to obtain a model that is equivalent to an item response model (i.e., probit model or graded response model), the residual variances need to be fixed to unity (see Eid 1996; Samejima 1969; Takane and de Leeuw 1987). In order to test strong or scalar MI in this model, it is necessary to impose all of the following restrictions (see Eid and Kutscher 2014):
1. The threshold parameters $\kappa_{isg}$ are equal in all groups (i.e., $\kappa_{isg} = \kappa_{isg'} = \kappa_{is}$).
2. The factor loadings $\lambda_{ig}$ are equal in all groups (i.e., $\lambda_{ig} = \lambda_{ig'} = \lambda_{i}$).
3. The variances $\mathrm{Var}(\varepsilon_{ig})$ are equal in all groups, e.g., $\mathrm{Var}(\varepsilon_{ig}) = \mathrm{Var}(\varepsilon_{ig'}) = \mathrm{Var}(\varepsilon_{i}) = 1$.
4. In one group, the expected value (mean) of the latent factor has to be fixed to 0 [$E(\eta_{1}) = 0$], whereas it is freely estimated in the remaining groups.
Weak or metric MI requires only restrictions 2 & 3, while configural MI just requires restriction 3. Each type of MI is considered full if the restrictions apply to all items, and partial if the restrictions apply to most but not all items (Byrne et al. 1989;Steenkamp and Baumgartner 1998). Partial MI requires that the model parameters of interest (i.e. the factor loadings and/or the thresholds) of at least two items remain invariant across all groups; the invariant items then define the meaning of these latent variables (Byrne et al. 1989;Steenkamp and Baumgartner 1998). In our study we tested for partial MI in cases where full MI could not be established. Comparing the fit of these models tests the different levels of MI.
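The nesting of the invariance levels and the two-item rule for partial MI can be summarized in a short sketch. The restriction numbers follow the numbered list in the text; the dictionary and function names are hypothetical, and the example of freeing item 6 is purely illustrative rather than a result of this study.

```python
# Restriction numbers follow the numbered list in the text.
RESTRICTIONS = {
    1: "thresholds equal across groups",
    2: "factor loadings equal across groups",
    3: "residual variances equal across groups (fixed to 1)",
    4: "latent mean fixed to 0 in the reference group, free elsewhere",
}

# Which restrictions each level of measurement invariance imposes
LEVELS = {
    "configural": {3},
    "metric": {2, 3},
    "scalar": {1, 2, 3, 4},
}


def invariance_status(items, freed_items):
    """Classify invariance for one parameter type (loadings or thresholds).

    items: all items of the scale; freed_items: items whose parameters are
    allowed to differ across groups. At least two items must stay invariant
    for partial invariance (Byrne et al. 1989).
    """
    items, freed = set(items), set(freed_items)
    if not freed <= items:
        raise ValueError("freed_items must be a subset of items")
    if not freed:
        return "full"
    if len(items - freed) >= 2:
        return "partial"
    return "not established"


for level, numbers in LEVELS.items():
    print(level, "->", [RESTRICTIONS[n] for n in sorted(numbers)])

# Hypothetical example: thresholds of item 6 freed in a four-item scale
print(invariance_status([2, 4, 5, 6], [6]))
```

Representing each level as a set makes the nesting explicit: every restriction imposed by a weaker level is also imposed by each stronger one.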
Commonly, the test of exact model fit (i.e., the Chi square or Chi square difference test), the root mean square error of approximation (RMSEA, Steiger 1990), and the comparative fit index (CFI, Bentler 1990) are used for model testing. A model is said to fit the data well if (1) the p value of the Chi square test (or the Chi square difference test) is equal to or larger than 0.05, (2) the RMSEA is below 0.06 (or to fit acceptably if below 0.08, see Chen et al. 2008; Hu and Bentler 1999), and (3) the CFI is greater than 0.97 (or to fit acceptably if greater than 0.95, see Schermelleh-Engel et al. 2003). Note that the Chi square test has been criticized for being too sensitive in large samples, which are common in cross-cultural studies with hundreds or thousands of observations in each country (Schermelleh-Engel et al. 2003; Nagengast and Marsh 2014). In such situations, the Chi square test (as well as the Chi square difference test) will often reject the model of interest because of its high power to detect even small (marginal or practically insignificant) parameter deviations. Cheung and Rensvold (2002) and Chen (2007) therefore provided guidelines for comparing the fit of competing models. They argue that a decrease in model fit is practically insignificant when the RMSEA increases by less than 0.015 and the CFI drops by less than 0.01.
To estimate the fit of the model we used weighted least squares mean- and variance-adjusted (WLSMV) estimation as implemented in Mplus (Muthén and Muthén 1998). We included sampling weights as recommended for this type of analysis. Note that Mplus uses pairwise present data when using WLSMV estimation and does not permit a full information approach. Given that the percentage of missing data was very low (less than 0.05 % regarding the items from the RtL scale), we considered this procedure tolerable.
If full measurement invariance (e.g., equivalence of all factor loadings across all groups) did not hold, we tested for partial measurement invariance (Byrne et al. 1989; Steenkamp and Baumgartner 1998). Additionally, we inspected modification indices to identify potential misfit within each country. A modification index corresponds to the expected change in the Chi square value (model fit) if a certain parameter restriction (e.g., uncorrelated measurement errors) is relaxed.
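The cutoffs and difference criteria described above can be written as two small decision rules. The sketch below is only an illustration: the class and function names are hypothetical, the fit values in the usage example are invented, and the RMSEA criterion is read as a worsening (a rise) of less than 0.015, since a higher RMSEA means poorer fit.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Fit:
    rmsea: float
    cfi: float


def absolute_fit(fit):
    """Classify approximate fit using the RMSEA and CFI cutoffs quoted in the text."""
    if fit.rmsea < 0.06 and fit.cfi > 0.97:
        return "good"
    if fit.rmsea < 0.08 and fit.cfi > 0.95:
        return "acceptable"
    return "poor"


def loss_is_negligible(less_restricted, more_restricted):
    """Compare nested models: is the loss of fit practically insignificant?

    Assumes the criterion is a rise in RMSEA below 0.015 together with a
    drop in CFI below 0.01.
    """
    rmsea_rise = more_restricted.rmsea - less_restricted.rmsea
    cfi_drop = less_restricted.cfi - more_restricted.cfi
    return rmsea_rise < 0.015 and cfi_drop < 0.01


# Invented values for two nested invariance models (not results of this study)
metric = Fit(rmsea=0.055, cfi=0.985)
scalar = Fit(rmsea=0.065, cfi=0.980)
print(absolute_fit(metric), absolute_fit(scalar), loss_is_negligible(metric, scalar))
```

Because both indices must meet their cutoff, a model with a high CFI can still be classified as poorly fitting when its RMSEA is too large.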
To evaluate the criterion validity of MtL, we investigated its relation to participation in further education. In doing so, we used the IEA-IDB analyzer (IEA 2012) to see how MtL relates to participation in education both with and without controlling for literacy as an indicator of level of education. We first specified factor scores obtained from Mplus indicating MtL as a single predictor of participation in further education. Then we specified literacy as a second predictor of participation in further education. Because employment, age, and language are further key socio-demographic factors predicting participation in further education, we reduced our sample to the employed working population aged 30-49 whose test language is the same as their native language. We considered three types of participation in non-formal further education to further check the robustness of our MtL scale. Because, strictly speaking, we tested for effects of MtL on participation in education, we report beta coefficients. However, as PIAAC is a cross-sectional dataset, we refer to these analyses as testing criterion validity (rather than predictive validity). In all analyses conducted with the IEA-IDB analyzer the replicate weights were taken into account and standard errors were computed using the jackknife repeated replication method (IEA 2012).
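The logic of jackknife repeated replication can be illustrated with a small sketch: the statistic is computed once with the full sample weights and once with each set of replicate weights, and the squared deviations of the replicate estimates are summed. The sketch uses a weighted mean as a stand-in for the regression coefficients; the function names and toy data are hypothetical, and the multiplier `factor` is left as a parameter because it depends on the replication scheme (its default of 1.0 is an assumption, not a PIAAC specification).

```python
def weighted_mean(values, weights):
    """Weighted mean of values."""
    return sum(v * w for v, w in zip(values, weights)) / sum(weights)


def jrr_standard_error(values, full_weights, replicate_weights, factor=1.0):
    """Return (estimate, standard error) via jackknife repeated replication.

    replicate_weights is a list of weight vectors, one per replicate.
    factor scales the summed squared deviations and depends on the
    replication scheme (assumed 1.0 here).
    """
    estimate = weighted_mean(values, full_weights)
    replicates = [weighted_mean(values, w) for w in replicate_weights]
    variance = factor * sum((r - estimate) ** 2 for r in replicates)
    return estimate, variance ** 0.5


# Toy data: four respondents in two pairs; each replicate drops one member
# of a pair and doubles the weight of the other.
values = [1.0, 2.0, 3.0, 4.0]
full = [1.0, 1.0, 1.0, 1.0]
replicates = [
    [2.0, 0.0, 1.0, 1.0],
    [0.0, 2.0, 1.0, 1.0],
    [1.0, 1.0, 2.0, 0.0],
    [1.0, 1.0, 0.0, 2.0],
]
estimate, se = jrr_standard_error(values, full, replicates)
print(estimate, se)
```

The same loop applies to any statistic: replacing `weighted_mean` with a weighted regression fit would yield replicate-based standard errors for the coefficients.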

Establishing an MtL scale and testing configural invariance
First, we tested if the original six RtL items form one common factor in all countries by fitting a common factor model imposing only configural MI; i.e., we did not allow for any residual correlations among the six items by country. This model did not produce an acceptable overall fit to the data, χ 2 (189) = 21,089.265, p < 0.001, CFI = 0.967, RMSEA = 0.125 [0.123;0.126]. Note that evaluating the CFI and RMSEA different conclusions could be drawn. According to the CFI, the common factor model fitted the data acceptably (CFI > 0.95), whereas the RMSEA indicated that the model did not fit the data (RMSEA > 0.10). The CFI compares the fit of the specified model to the fit of a baseline model in which all items are assumed to be uncorrelated with each other. As a consequence, the CFI will be high if the observed items are substantially correlated with each other. In contrast, the RMSEA is a measure of approximate fit and has been regarded "as relatively independent of sample size, and additionally favors parsimonious models" (Schermelleh-Engel et al. 2003, p. 37). Because not all model fit criteria were met we concluded that the common factor model using the original six items does not fit the data.
To investigate sources of local misfit in the common factor model, we evaluated modification indices above 100 to identify only major sources of misfit. The modification indices pointed towards substantial residual correlations between items 1 and 3, suggesting that these items form a second factor apart from the postulated common MtL factor. Particularly high modification indices and therefore high residual correlations (often greater than 0.30) were observed for Australia, Canada, Denmark, Estonia, Finland, South Korea, Netherlands, Sweden, UK, and USA. Allowing residual correlations between items 1 and 3 in all countries improved the fit of the model, χ²(168) = 14,244.282, p < 0.001, CFI = 0.978, RMSEA = 0.109 [0.107; 0.110], but the RMSEA still suggests lack of fit.
It is worth noting that once residual correlations are allowed in one or more countries, researchers cannot decide whether the particular scale is one-dimensional or two-dimensional. This is because a common factor model with correlated residual variables is data-equivalent to a factor model with two correlated factors or a factor model with an additional method factor. Since the goal was to test the unidimensionality of the six-item scale (i.e., configural invariance), we fixed the residual correlations to zero in all countries for the subsequent analyses.
Next, we evaluated the fit of the one-factor model using the four-item scale (excluding items 1 and 3), also imposing configural MI. Compared to the six-item scale, the one-factor four-item model fits the data considerably better, χ²(42) = 2,256.255, p < 0.001, CFI = 0.994, RMSEA = 0.086 [0.083; 0.089]. The chi-square value of the four-item scale was almost six times smaller than the chi-square value of the six-item scale. Additionally, the RMSEA dropped by 0.023 and the CFI increased by 0.016. These results show a substantial improvement in model fit, resulting from excluding items 1 and 3 from the original scale. Table 3 summarizes standardized factor loadings for both scales.
We continued testing beyond mere configural MI despite the fact that the RMSEA slightly exceeded the cut-off value of 0.08. This seemed justifiable given that the RMSEA was between 0.08 and 0.10, indicating mediocre fit (Schermelleh-Engel et al. 2003). Table 4 presents the fit of the models that were used for testing different levels of MI for the four-item factor (see Additional file 1: Table S1 for analogous analyses for the six-item scale). Due to the large sample size, the chi-square tests were significant when testing and comparing all of the subsequent models. Thus, we followed the guidelines by Cheung and Rensvold (2002) and Chen (2007) and compared the models according to the changes in RMSEA and CFI. First, we fixed all factor loadings to be equal across countries. Comparing the fit of Model 1 and Model 2 (configural MI vs. full weak MI), the RMSEA dropped by 0.022, indicating a practically significant improvement of fit when full weak invariance was imposed. As other aspects of model fit did not deteriorate, full weak MI could be established.

Testing measurement invariance across countries
Table 4 Fit indices for multiple-group CFAs of the four-item scale for different levels of measurement invariance
Model 1 configural invariance without constraints; Model 2 full weak model with factor loading invariance; Model 3a full weak/full strong; Model 3b full weak/partial strong without constraints for all four items on two thresholds (2 and 3); χ² chi-square test statistic; df degrees of freedom; npar number of free parameters; CFI comparative fit index; RMSEA root mean square error of approximation. Significant χ² and ∆χ² (p < 0.05) are printed in italics

Table 5 MtL predicting participation in further education: results from regression analyses without and with controlling for literacy
Analyses conducted by IEA IDB analyzer; MtL motivation to learn; NFE12 participation in non-formal education in the 12 months preceding survey (derived); NFE12JR participation in non-formal education for job-related reasons in the 12 months preceding survey (derived); NFE12NJR participated in non-formal education for non-job-related reasons in 12 months preceding survey (derived); values of NFE12, NFE12JR and NFE12NJR combined: 0 did not participate; 1 participated; L literacy included as second predictor; SE standard error; sample restricted to working population (i.e., employed for more than 10 h per week in the 12 months preceding the survey and aged between 30 and 49) where test language is the same as native language; the OECD patch was used for the adjustment of the NFE variables; significant coefficients are printed in italics (p < 0.05); for further description see text

Next, we tested full strong or scalar MI by fixing all threshold parameters to be equal across all countries (Model 3a). This model containing full strong MI still fits the data acceptably (i.e., RMSEA = 0.078, CFI = 0.955). However, based on the guidelines by Cheung and Rensvold (2002) and Chen (2007), the changes in the RMSEA and CFI indicate a practically significant decrease in fit compared to Model 2 because the RMSEA increased by 0.014 and the CFI dropped by 0.034 units. Thus, we decided to test for partial strong MI. Again, we first evaluated parameter restrictions showing modification indices above 100 to identify major sources of misfit. In addition, we examined the standardized expected parameter change (sEPC) and found that many of the largest sEPCs referred to thresholds 2 and 3. Among the highest sEPCs were those for threshold 2 of item I_Q04j, with a value of 0.627 and a corresponding modification index of 695 (in Italy), and for threshold 2 of item I_Q04l, with a value of 0.475 and a corresponding modification index of 472 (in Spain). Thus, we removed the equality restrictions of thresholds 2 and 3 (i.e., the threshold from 2 = Very little to 3 = To some extent and from 3 = To some extent to 4 = To a high extent) for each item across all countries (see Model 3b). Model 3b fits the data acceptably well. The remaining restrictions are still sufficient to establish partial strong/scalar MI (Byrne et al. 1989; Steenkamp and Baumgartner 1998). Moreover, there was no indication that Model 3b fit significantly worse than Model 2 (∆RMSEA = 0.002, ∆CFI = 0.01).
In sum, these results provide evidence that partial strong or scalar invariance holds for the four-item scale. Thus, the four-item scale allows mean comparisons across all 21 OECD countries. As a result, further analyses using this scale may assume full weak MI and partial strong MI, which is sufficient to warrant using the scale for comparing both mean differences and relations to other variables.
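The model comparisons based on changes in fit indices can be expressed as a simple decision rule. The sketch below is an illustration only: the function name is hypothetical, and the default cutoffs (a CFI drop of more than 0.010 or an RMSEA increase of more than 0.015) are assumed rules of thumb in the spirit of the cited guidelines, not values stated in this article.

```python
def invariance_holds(delta_cfi, delta_rmsea,
                     cfi_cutoff=-0.010, rmsea_cutoff=0.015):
    """Return True if the more constrained model does not fit notably worse.

    Both deltas are (constrained model) minus (less constrained model),
    so a negative delta_cfi and a positive delta_rmsea mean worse fit.
    The cutoffs are assumptions, not values taken from the article.
    """
    return delta_cfi >= cfi_cutoff and delta_rmsea <= rmsea_cutoff

# Full strong MI (Model 3a) vs. full weak MI (Model 2), using the changes
# reported in the text: the CFI dropped by 0.034, the RMSEA rose by 0.014.
full_scalar_ok = invariance_holds(delta_cfi=-0.034, delta_rmsea=0.014)

# Hypothetical comparison with only negligible changes in fit.
negligible_change_ok = invariance_holds(delta_cfi=-0.002, delta_rmsea=0.003)
```

Under these assumed cutoffs, the full strong model is rejected because of the CFI drop alone, which matches the decision to move on to partial strong MI.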

Testing the criterion validity of motivation-to-learn
Finally, we examined the criterion validity of the new MtL scale based on its relations with participation in non-formal further education in the last 12 months before data collection (see Table 5). Because participation in further education is substantially affected by previous education, age, employment, and language, we adjusted for literacy and reduced the sample to employed working-age individuals with the test language matching the native language to avoid biases. Given that full weak and partial strong MI could be established for these items, factor scores may be used as a manifest variable for studying associations with other variables.
Results from regression analyses demonstrated that the MtL scale significantly relates to participation in further education in all countries. Table 5 summarizes the standardized regression weights. The relation between MtL and participation in non-formal education ranged between β = 0.07 in Denmark and β = 0.25 in Poland. As expected, relations decreased when only considering job-related participation in further education; this is likely due to the role of external initiation and opportunity structures for increasing participation (Boeren et al. 2010). Relations decreased again when literacy was included as a covariate. However, relations between MtL and participation in job-related NFE remained significant in most countries, supporting the criterion validity of MtL for participation in job-related NFE above and beyond age, employment, language, and level of education. Surprisingly, the relation between MtL and participation in non-job-related NFE was rather low, which may be due to very low participation rates. These relations appear less affected by literacy as a covariate, which is theoretically expected because this type of further education is less tied to professional accomplishments.
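The logic of comparing standardized regression weights with and without literacy as a covariate can be sketched on simulated data. Everything in the block is an assumption made for illustration: the data are synthetic, survey weights and replicate weights are ignored, and the function name is hypothetical; it does not reproduce the IEA-IDB Analyzer output.

```python
import numpy as np

def standardized_betas(y, predictors):
    """OLS on z-scored outcome and predictors; returns standardized slopes."""
    y = np.asarray(y, dtype=float)
    X = np.asarray(predictors, dtype=float)
    if X.ndim == 1:
        X = X[:, None]
    zy = (y - y.mean()) / y.std()
    zX = (X - X.mean(axis=0)) / X.std(axis=0)
    # No intercept is needed because all z-scored variables have mean zero.
    coef, *_ = np.linalg.lstsq(zX, zy, rcond=None)
    return coef

# Synthetic data (hypothetical effect sizes): literacy is related to both
# motivation and participation, so it partly accounts for their association.
rng = np.random.default_rng(1)
n = 5000
literacy = rng.normal(size=n)
mtl = 0.4 * literacy + rng.normal(size=n)
propensity = 0.3 * mtl + 0.5 * literacy + rng.normal(size=n)
participated = (propensity > 0).astype(float)  # 0 = no, 1 = yes

beta_mtl_only = standardized_betas(participated, mtl)[0]
beta_mtl_adjusted = standardized_betas(
    participated, np.column_stack([mtl, literacy])
)[0]
```

With a single predictor the standardized weight equals the Pearson correlation; adding literacy removes the part of the association that motivation shares with literacy, which is why the adjusted weight is smaller in this simulation.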

Discussion
The goal of the present study was to develop a psychometrically sound measure of Motivation-to-Learn (MtL) based on items from the PIAAC background questionnaire.
Building on the existing six-item Readiness-to-Learn (RtL) scale, our results show that the proposed four-item MtL scale fits the data reasonably well, whereas the full RtL scale, which includes items on the use of learning strategies, is not appropriate. Furthermore, we found evidence for configural, full metric and partial scalar invariance across 21 countries for the four-item MtL scale. Finally, results from regression-based analyses using the IEA-IDB Analyzer show that the relations between MtL and participation in further education (controlling for literacy) support the scale's criterion validity. In sum, results suggest that the four-item MtL scale is satisfactory for further use in future analyses of the PIAAC data, and for measuring motivation.

Readiness-to-learn versus motivation-to-learn across countries
The concept of RtL was supposed to merge both motivational aspects and use of learning strategies into a specific approach to learning (Kirby et al. 2003). The conceptual framework of the PIAAC background questionnaire refers to a total of 13 items intended to measure deep versus surface approaches (OECD 2011). With this number of items it could have been possible to distinguish at least two forms of motivation (intrinsic and extrinsic) and/or two strategic approaches to learning (deep and surface), as proposed by Biggs (1985). Apparently, the questionnaire had to be shortened to include only six items (OECD 2010a). Unfortunately, there is no hint as to why these particular six items were chosen for the final version of the questionnaire.
Given only six items, it is no longer possible to measure diverse and discrete motivational qualities and strategic approaches to learning. Instead, results from multiple-group confirmatory factor analyses support our theoretically driven compilation of the hypothesized scale to measure MtL. More specifically, excluding the two items that clearly refer to the use of learning strategies substantially improved the fit of the scale. Thus, these items apparently do not belong to the underlying MtL factor but measure a second factor reflecting use of deep-level learning strategies. Our results are in line with previous findings by Smith et al. (2015), who show that the respective items had substantial error correlations when specifying a one-factor model using all six items in the US sample. Thus, the MtL scale is a sound instrument to capture motivation in further analyses, while the use of the six-item RtL scale is not recommended.
Our analyses show reasonably strong MI allowing for comparisons of latent factor means and structural coefficients across 21 PIAAC countries. As we had measurement error fixed throughout the analyses, the MtL scale score may be used in latent and manifest analyses to compare both relations and means across countries.

Criterion validity for participation in further education
Results from the regression-based analyses of the relations between MtL, participation in further education and literacy are largely as expected, thus bolstering our empirical arguments for the soundness of the four-item MtL scale. MtL shows substantial relations to participation in further education that decrease when literacy, as an indicator of level of education, is taken into account. The strength of association varies across countries, which may be explained by differences in educational policy and opportunities offered by educational institutions (Desjardins and Rubenson 2013). More specifically, the influence of motivation as an antecedent of participation in further education may be small when participation in further education is commonplace and demanded independent of personal aspirations, so that most people participate. By contrast, motivation may be more important when adults have high chances to realize their educational plans because educational offers are readily available but participation is largely left to individual choice. Future research should focus on the role of motivation for participation in further education across different regimes of educational policy.
Surprisingly, relations between MtL and non-job-related participation in NFE are non-significant in many countries. One possible explanation for this finding could be that participation rates were very low in all countries. In addition, participation in non-job-related non-formal education could also be driven by factors other than the desire to expand one's competence. For example, several findings emphasize that adults may engage in learning as a social activity (Courtney 1992). Future research should investigate relations between MtL and participation in further education in more detail, for example, by distinguishing types of further education or considering level of education as a moderating variable. In this vein, findings from a recent study based on German data from the Adult Education Survey demonstrate that individual learning motivation, comprising both enjoyment of learning and benefits gained from further education, is particularly important for further education participation for people with lower levels of education, especially for informal learning activities (Gorges and Hollmann 2015).

Limitations
Because Kirby et al.'s (2003) approaches to learning draw on motivational theories and theories about learning strategies, the present study focused on constructs from the educational psychology literature as a theoretical underpinning of readiness-to-learn and motivation-to-learn, respectively. However, considering the items used in the readiness-to-learn scale, different theoretical constructs, for example, from personality psychology (openness, typical intellectual engagement; Goff and Ackerman 1992; need for cognition; Cacioppo and Petty 1982) may be relevant for the conceptualization of readiness-to-learn as well. Still, it may be assumed that such personality traits' predictions of educational task choice (i.e., participation in further education) will be mediated by motivation, which is considered a direct antecedent of behavior derived from both personal and contextual factors (Heckhausen and Heckhausen 2009). Consequently, MtL should also be assessed independent of any specific learning content, educational institution, or other organizational framework.
The current analyses were somewhat limited by the contents of the PIAAC background questionnaire. For example, the data did not contain further psychological measures to examine the scale's discriminant and convergent validity in more detail. Furthermore, due to the cross-sectional design of PIAAC, we had to focus on the scale's criterion validity, whereas its longitudinal predictive validity would be an important aspect as well. Confirmatory factor analyses revealed some correlated errors, indicating that the scale's underlying theoretical construct is not absolutely clear-cut. Moreover, we had to free some parameters, that is, we had differential item functioning for some countries. Nonetheless, given the large sample sizes, which can produce spurious significant findings, and the large number of countries representing diverse cultural contexts, we conclude that the scale performed quite well.

Outlook and suggestions for future research
Overall, the scale is comparable across countries. Nevertheless, because different subpopulations vary substantially in their competencies (e.g., for Germany, see Maehler et al. 2013), future research still needs to investigate the scale's psychometric properties and comparability across different subgroups within countries such as gender or age groups (see, for example, the approach taken by Smith et al. 2015).
As MtL significantly relates to participation in further education, it can be considered an important variable in future analyses of the PIAAC data. However, without denying the important role of the present scale, both large- and small-scale future research on adults in educational settings would benefit from theoretically sound and clear-cut measures to assess both MtL and use of learning strategies. Only recently, Gorges (2015) outlined the potential of motivational research for studies on further education participation. As already mentioned, because psychological research on MtL is mostly constrained to educational institutions, MtL is typically measured and analyzed with respect to learners' current educational activities and the particular learning contents of those activities. However, while adults are generally potential participants in educational activities, they are not necessarily engaged in learning at the time of survey. Moreover, adults may choose from a great variety of possible educational activities. Thus, measuring general adult MtL seems quite fruitful for understanding why adults participate in further education. At the same time, however, it is unclear which educational activity the respondents may refer to or might plan next on their educational agenda, which makes the task challenging with the data at hand. Therefore, instruments to measure adult MtL, and eventually to predict lifelong learning, have to fulfill a range of prerequisites: They should capture MtL independent of any current learning activity, but they should not refer to some abstract future learning activity in which people might engage.
In addition, such measures should not emphasize the instrumentality of learning as this is very specific to the individual learners' situations. Rather, MtL should predict engagement in learning regardless of external incentives to learn, or, ideally, instruments to measure MtL should be able to differentiate between external and internal forms of motivation.
As PIAAC and other representative surveys such as the Adult Education Survey or the German National Educational Panel are designed to cover myriad further education forms, very specific measures of adult motivation to learn would not be sufficiently broad. Hence, because such measures would have to abstract from particular learning opportunities, they will probably lose predictive power compared to more specific scales (Steinmayr and Spinath 2009). Nevertheless, although the development of adult motivation to learn scales appears to be quite challenging, such measures in concert with the present four-item scale would fully realize their potential in a longitudinal dataset with more detailed information on past and future learning activities, which unfortunately is a general shortcoming of educational psychology research on adult learning at the moment (Calfee 2006).