How long is the shadow? The relationships of family background to selected adult outcomes: results from PIAAC
Large-scale Assessments in Education volume 6, Article number: 4 (2018)
Abstract
Background
Ongoing interest in the relationships between family background and adult outcomes is motivated by concerns regarding the intergenerational transmission of advantage/disadvantage. Currently all countries are far from achieving the ideal that all individuals, irrespective of their starting points or their demographic characteristics, are able to accumulate sufficient human capital so that they can achieve success in the workplace and fulfill their responsibilities as family members and as citizens. This study quantifies the length of the shadow cast by family background and personal characteristics on an individual’s prospects in the labor market. It also examines the extent to which these relationships are mediated by factors more proximal to labor market entry.
Methods
This study uses data for 21 OECD countries from the first round of PIAAC. It employs descriptive statistics, correlations and logistic regression. Two dichotomous variables are derived from each country’s national annual income distribution: Q1 = 1 if the individual’s income is in the first (lower) quartile and Q4 = 1 if the individual works full-time and has an income in the fourth (upper) quartile. For each country, a nested sequence of logistic regression models is fit to ascertain the role of more proximal factors in mediating the impact of family background and demographic characteristics on these labor market outcomes.
Results
The patterns of relationships are qualitatively similar across the 21 countries, although the estimated associations vary greatly in strength. Parental education accounts for substantial proportions of the variation in respondents’ Educational Attainment and Cognitive Skills. In most countries, children born to parents with lower levels of education have less than a fifty–fifty chance of exceeding that level. Family background is strongly associated with income, but the relationship is largely mediated by Educational Attainment and Cognitive Skills. Females and younger individuals have much higher odds of being in the lower quartile and much lower odds of reaching the upper quartile, even after adjusting for other variables. The magnitudes of these adjusted odds are concerning.
Conclusions
Family background and gender cast a long shadow on individuals’ life prospects. Countries vary greatly in their success in mitigating these disadvantages. Formulating effective policies will depend on understanding a complex set of dynamics that surely differ among countries.
Background
Every nation confronts the challenge of preparing its people to thrive in an increasingly competitive, global economic system. A good part of that preparation involves building the stock of human capital, which comprises Cognitive Skills and knowledge, along with interpersonal skills such as collaboration and teamwork, and character traits such as motivation, persistence, reliability and self-discipline (Kirsch et al. 2016). Ideally, all individuals, irrespective of their family background or their demographic characteristics, should be able to accumulate sufficient human capital so that they not only can achieve success in the workplace, but also fulfill their responsibilities as family members and as citizens.
In principle, governments would like to monitor the full stock of human capital. However, to this point, only certain Cognitive Skills have been assessed systematically and in a manner that facilitates both intranational and international comparisons. With regard to those skills, interest centers on documenting their overall distribution, as well as the differences in the distributions across subpopulations defined by combinations of such factors as gender, race/ethnicity, location and family of origin socioeconomic status.
International comparisons provide: (i) Transparency to education systems that is frequently not otherwise available; (ii) Independent benchmarks that jurisdictions can use to set their own targets; (iii) Models that can be adapted to help meet the targets (Singer et al. 2018). Lack of expected progress on key national indicators, such as means of cognitive skill distributions, as well as (apparently) unfavorable comparisons based on international assessments, can serve as an impetus for policy changes, although much depends on aspects of the local context (Ritzen 2013). In the United States, for example, there are concerns with the persistence of skill gaps among groups defined by race/ethnicity, as well as the growing skill gaps associated with differences in the socioeconomic status of the family of origin (Reardon 2011; Reardon and Portilla 2016). Though much of the discussion centers on changes over time, evidence from cross-sectional international comparisons is also relevant and informative.
There is ongoing interest across the Organization for Economic Cooperation and Development (OECD) in the relationships between family background on the one hand, and a range of adult outcomes on the other. This interest is motivated, in large part, by concerns regarding the intergenerational transmission of advantage/disadvantage and its impact on intergenerational social mobility.
This study investigates some key aspects of the intergenerational transmission of advantage/disadvantage: specifically, the cross-national patterns of statistical associations between indicators of family background and (i) measures of Cognitive Skills and markers of Educational Attainment; (ii) indicators of labor market success. Furthermore, the study documents the extent to which these associations are mediated by individual characteristics that are more proximal to entry into the labor market. In the process, it also examines the relationships between certain demographic characteristics and those same indicators of labor market success. In metaphorical terms, the study attempts to quantify the length of the shadow cast by family background (and personal characteristics) on an individual’s prospects in the labor market.
The study draws on first-round data from the Programme for the International Assessment of Adult Competencies (PIAAC). PIAAC is the third in a sequence of periodic surveys conducted under the auspices of the OECD to examine the relationships among adults’ demographic and family characteristics, Cognitive Skills, Educational Attainment, work experiences and labor market outcomes. PIAAC is a household survey that draws nationally representative samples of adults ages 16–65. The utility of PIAAC is that it offers a common framework for comparing patterns of relationships across countries and, in particular, the contribution(s) of differences in family background, Cognitive Skills, and Educational Attainment in accounting for the variation in labor market outcomes. Unlike the administrative databases that are often used to address similar questions, PIAAC provides direct measures of foundational skills such as literacy and numeracy. We employ a combination of descriptive statistics, correlations and logistic regression to examine these relationships, as well as to convey the cross-country patterns in the results.
The target population for this study comprises adults ages 25–54 in 21 countries. Family background is here represented by Parental Education and Books in the Home.^{Footnote 1} The labor market outcomes constitute two dichotomous variables indicating whether the respondent is in the upper quartile or in the lower quartile of the national annual income distribution.
Briefly, we show that not only does family background indeed cast a long shadow on adult outcomes, but also that gender casts one that is equally long. These results bear directly on the intergenerational transmission of advantage and disadvantage. More specifically, we find that across all countries:

(i) Parental Education has a strong association with Educational Attainment, but the strength varies very substantially across countries.

(ii) The well-known relationship between Parental Education and Cognitive Skills persists even when controlling for Educational Attainment, albeit in a weaker form.

(iii) Countries also vary substantially with regard to the probability that adults’ Educational Attainment equals or exceeds that of their parents.

(iv) In all countries, the strongest predictors of the probability that an adult achieves (or exceeds) specified levels of proficiency in both Literacy and Numeracy are Educational Attainment and Parental Education.

(v) Family Background (Parental Education and/or Books in the Home) is strongly associated with both wage-related labor market outcomes, but the strength varies across countries and is largely mediated by Educational Attainment and Cognitive Skills.

(vi) Gender and age are strongly associated with both wage-related labor market outcomes, but the strength varies substantially across countries. These associations are NOT mediated by either Cognitive Skills or Educational Attainment, or even by Occupational Category.
Thus, adults who come from more disadvantaged family backgrounds, or who are female or younger, are considerably less likely to reach the upper quartile of the annual income distribution and considerably more likely to be in the lower quartile of that distribution. This is the case in all the countries studied, though some have been more successful than others in reducing these inequities. These findings substantially extend the results reported in OECD (2013b) and OECD (2016).
The article is organized as follows. The next section provides some background and a brief review of the literature, followed by a section describing the data and methods employed. The next two sections present results: the first offers findings related to Cognitive Skills and the second findings related to labor market outcomes. The final sections comprise a summary and conclusions.
Literature review
As noted above, there is ongoing global interest in the extent to which education and other social services are able to mitigate the influence of differences in family background on children’s success in adult life. Accordingly, many international studies have been utilized to address this question. In the main, they have shown that there are strong statistical associations between the sociodemographic characteristics of individuals’ families of origin and their academic achievements, as well as a range of adult outcomes (Ermisch et al. 2012).^{Footnote 2} In this regard, the OECD has provided invaluable resources. Most recently, it has published statistical reports based on PIAAC data (OECD 2013b, 2016) that contain a broad range of relevant analyses.
For example, according to the OECD (2013b, Fig. 3.1, http://dx.doi.org/10.1787/888932900821) differences in certain characteristics (Age, Nativity/Language, Educational Attainment, Parental Education, and Occupation) are all strongly associated with differences in Literacy scores. The results for Parental Education are particularly revealing. The focus is on the average difference in mean Literacy scores between respondents who had at least one parent with tertiary education and respondents with neither parent having attained upper secondary. For the 21 countries in the study, the differences range from about 30 points (Estonia) to more than 50 points (United States). After adjustment for a broad constellation of factors, the differences range from about 11 points (Japan, Estonia) to about 29 points (United States).
The OECD (2013b, Fig. 3.12, http://dx.doi.org/10.1787/888932901068) offers further evidence of the long shadow of parental education. It displays the adjusted odds ratios for scoring at Level 2 or below on the Literacy scale (i.e. below the commonly accepted threshold of proficiency) for individuals characterized by both Parental Education and (their own) Educational Attainment.^{Footnote 3} The odds ratios for individuals with the lowest level of Educational Attainment and whose parents were also at the lowest level of Educational Attainment ranged from about 3 (Estonia, Cyprus) to more than 10 (United States). Thus, other things being equal, individuals in the United States at the lowest level of Educational Attainment with parents also at the lowest level of Educational Attainment had odds of scoring in the lowest levels of the Literacy scale that were more than 10 times those of individuals whose Educational Attainment was at least upper secondary, as was that of their parents.
However, the adjusted odds ratios for individuals at the lowest level of Educational Attainment, but who had at least one parent who attained upper secondary, range from about 2 (many countries) to 4 (England/N. Ireland), with most countries falling between 2 and 3. The adjusted odds ratio for the United States is about 2.5. Thus, holding the individual’s Educational Attainment constant (at the lowest level), differences in Parental Education are still strongly associated with differences in the probability that the individual falls in the lower levels of the Literacy distribution.
Some investigators have employed data from earlier OECD studies. For example, Park and Kyei (2011) used data from IALS (1994–1998) to investigate the relationships of Educational Attainment with both prose and document literacy. The focal population was adults aged 26–35. Across all 19 countries studied, they found substantial skills gaps between those individuals in the highest categories of Educational Attainment (ISCED 5–7) and those in the lowest categories (ISCED 0–2). The sizes of the gaps varied considerably across countries.^{Footnote 4}
The article by Massing and Schneider (2017) examines a different facet of the relationship between Cognitive Skills and Educational Attainment. One important finding was that literacy levels varied substantially within ISCED categories across countries, even when controlling for other factors related to literacy levels. This indicates that the goal of harmonization of education levels across countries has been imperfectly achieved and implies that studies involving education levels, particularly those more coarse-grained than the ISCED levels, should be done on a country-by-country basis.
There is also substantial evidence for the strong relationship between Cognitive Skills and labor market outcomes. For example, Fig. 6.4 of OECD (2013b, http://dx.doi.org/10.1787/888932902493) displays for each country the 25th, 50th, and 75th percentiles of the wage distributions by literacy proficiency level.^{Footnote 5} In every country, the wage distributions shift to the right with increasing proficiency. However, there is also considerable overlap in the wage distributions across literacy proficiency levels. The United States appears to have the least overlap, particularly between Level 3 and Level 4/5.
An important question is the strength of the relationship between Cognitive Skills and labor market outcomes when other relevant variables are included as predictors in the model. It must be borne in mind that the statistical relationships among these variables embody complex dynamics and that in a cross-sectional observational study such as PIAAC it is impossible to disentangle the causal mechanisms at play.
With this caveat, it is of interest to consider Fig. 6.7 (OECD 2013b, http://dx.doi.org/10.1787/888932902550). It displays the percentage differences in wages associated with a one standard deviation difference in Years of Education and a one standard deviation difference in literacy scores.^{Footnote 6} For Years of Education, the percentage differences in wages range from about 8% (Sweden) to about 25% (Poland). The United States stands at 23%. For Literacy scores, the percentage differences in wages range from about 4% (Italy) to about 14% (England/N. Ireland). The United States stands at 12%. In all countries, the percentage differences associated with Years of Education are much greater than those associated with Literacy scores. For both variables, the results vary substantially across countries.
There has been longstanding interest among economists in delving into the relationships between Cognitive Skills and labor market outcomes. A common obstacle has been the lack of direct measures of individuals’ skills that could be linked to databases that contain information on their labor market outcomes. As a fallback, economists have relied on years of schooling or indicators of Educational Attainment.
There have been some exceptions; see, for example, Murnane et al. (2000) and the references therein. Murnane et al. (2000) used two United States longitudinal data sets to examine the relationships between math skills measured at the end of high school and wages, in one case at age 27 and in the other at age 31. For males and females separately, they fit a number of regression models that also included race/ethnicity and Educational Attainment. The results were somewhat mixed, but the general conclusion was that math skills did have a statistically significant relationship with wages even when Educational Attainment was included in the model. At the same time, differences in wages across Educational Attainment levels were very substantial: An increase of one standard deviation in math scores was associated with a wage increase that was typically less than a quarter of the difference between those with a college degree and those with a high school diploma. The authors also adduced evidence that the effect of math skills on wages was partially mediated through Educational Attainment. A similar, but more extensive, study was conducted by Lin et al. (2016).
Holzer and Lerman (2015) used PIAAC data for the United States to examine the relationships between Cognitive Skills (literacy, numeracy, problem solving in technology-rich environments [PSTRE]) and labor market outcomes. They studied respondents aged 25–65. Literacy and numeracy scores were segmented into three categories: low proficient (levels 2 and below), proficient (level 3), and highly proficient (levels 4 and 5). For problem solving in technology-rich environments the corresponding categories were: low proficient (level 1 and below), proficient (level 2), and highly proficient (level 3).
They found that all three skills were strongly and positively associated with categories of Educational Attainment. Moreover, controlling for Educational Attainment (3 categories), both Literacy and Numeracy were strongly associated with earnings, which is not surprising in view of the high correlation (0.85) between Literacy and Numeracy. They also carried out a regression of the natural logarithm of earnings on demographics (Gender, Age, and Nativity) and the three skills, with the base categories being Low Proficiency. The results showed that, for each skill, the regression coefficients for both included skill levels were significantly different from zero. Notably, the stronger association was with Numeracy and the weaker one with Literacy. When Educational Attainment was added to the model, the regression coefficients were reduced by as much as a factor of 2.^{Footnote 7} The coefficients for Literacy were no longer significant.
Economists have employed PIAAC to conduct international comparisons of the relationships between skills and Educational Attainment, on the one hand, and wage-related outcomes on the other. For example, Hanushek et al. (2013) used data from the first round of PIAAC to investigate the relationships between the natural logarithm of wages and various individual characteristics, especially Cognitive Skills. Their sample was restricted to respondents working at least 30 hours per week. In all countries, they found that even with gender, years of experience and years of schooling in the model, the coefficient of Numeracy was statistically significant and substantively important. The same was the case for Literacy and PSTRE, although the coefficients for the latter were substantially smaller than those for Literacy that, in turn, were similar to those for Numeracy.
Cappellari et al. (2015) focused on how well Numeracy and Years of Education could account for the variation in different wage outcomes. They compared results from ordinary least squares (OLS) and instrumental variables (IV) approaches in order to explore issues of endogeneity.^{Footnote 8} Inasmuch as one of their goals was to relate patterns derived from PIAAC to education policy choices and governance features, they linked the PIAAC data to other databases with the requisite information. Consequently, they were restricted to using the data from only 13 European countries. In their analyses, they pooled the data from the 13 countries but incorporated country fixed effects in their models, along with variables representing a range of respondent characteristics.
Overall, they found that Years of Education was a stronger predictor of wage outcomes than Numeracy. For example, for the dichotomous outcome indicating whether the respondent’s income is in the top quartile of the income distribution, the IV-estimated coefficient of the (standardized) education variable is more than twice as large as the IV-estimated coefficient of the (standardized) Numeracy variable.^{Footnote 9} As we shall see, these findings, as well as those from the other studies cited, are broadly consistent with ours.^{Footnote 10}
In sum, the literature has illuminated the relationships between various individual characteristics on the one hand, and educational and labor market outcomes on the other. However, what is lacking is a systematic, temporally coherent study of the relationships among these characteristics and the outcomes of interest. This exploratory study aims to partially fill this gap. Moreover, the approach adopted here is somewhat different from that carried out by most economists. The latter tend to focus on the relationships between Cognitive Skills (or Years of Education or Educational Attainment) and labor market outcomes. Demographic characteristics and family background indicators are treated as confounding factors to be controlled for. By contrast, we begin with family background and demographic characteristics and attempt to trace the pathways through which they impact those same labor market outcomes.
Data and methods
PIAAC differs from previous rounds of international surveys of adult competencies in a number of important ways. Foremost is the introduction of computer administration of the assessment and the background survey.^{Footnote 11} Taking advantage of computer delivery, the design employed in the administration of the cognitive items was a multistage, stagewise adaptive design. For further information, see OECD (2013a) and Kirsch and Lennon (2017). In fact, the assessment was administered in two modes: by computer or, for those who were deemed unable or unwilling to be assessed by computer, by paper-and-pencil. In the computer administration, respondents were assessed in Literacy and Numeracy, albeit with variable degrees of precision. A small proportion of respondents were also assessed in Problem Solving in Technology-rich Environments.^{Footnote 12} In the paper-and-pencil administration, respondents were assessed in Literacy and Numeracy only.^{Footnote 13} All respondents completed an extensive background questionnaire.
We employ data from the first round of PIAAC, for which data collection took place during the period August 1, 2011 to November 21, 2012.^{Footnote 14} Twenty-four countries participated in the first round of PIAAC. Two countries, the Russian Federation and Cyprus, are not members of the OECD, while Australia, which is an OECD member, does not have data on the public use files and so could not be included in the analysis. Accordingly, as noted earlier, we consider the remaining 21 OECD countries. See Appendix 1 for a list of the countries comprising this subset, as well as the abbreviations used in this study.^{Footnote 15} PIAAC sampled adults ages 16–65 at the time of the survey. For reporting purposes respondents were placed in one of five age categories: 16–24, 25–34, 35–44, 45–54, 55–65. This study focuses on the middle three categories (i.e., ages 25–54), inasmuch as many individuals in the youngest category are engaged in education/training, while many of those in the oldest category are in their final years of labor force participation or may be out of the labor force entirely.
We employ the country-level data files produced by the PIAAC consortium. Each country’s data file is organized by individual respondent, with data on cognitive outcomes, background characteristics, education and training, labor market outcomes, and sampling-related quantities. Further information on PIAAC design, development, implementation, and post-administration processing (including output data) can be found in OECD (2013a).
The criteria of interest in this study comprise Cognitive Skills, Educational Attainment, and two wage-related labor market outcomes. The Cognitive Skills are Literacy and Numeracy.^{Footnote 16} They are defined as follows (OECD 2013b; Chapter 2):
Literacy
The ability to understand, evaluate, use and engage with written texts to participate in society, to achieve one’s goals, and to develop one’s knowledge and potential.
Numeracy
The ability to access, use, interpret, and communicate mathematical information and ideas in order to engage in and manage the mathematical demands of a range of situations in adult life.
Direct measurement of each cognitive skill is accomplished through administration of a specially designed set of instruments, such that each respondent only receives a subset of the item pool developed for that skill.^{Footnote 17} This design improves overall construct representation but complicates the psychometric analysis. The skill values associated with a respondent are estimated by applying a country-specific IRT/latent regression model that combines information from the respondent’s answers to the cognitive items administered to her with the respondent’s background information (von Davier et al. 2007; OECD 2013a). While the IRT parameters are estimated from the entire international sample, the specific country population parameters are estimated from the national samples. The output of the model consists of a set of ten plausible values (PV) for each cognitive skill for each respondent.^{Footnote 18} The PVs are represented as numerical values on a scale of 0–500. It is these sets of PVs that are subject to secondary analyses of the type reported here.^{Footnote 19} For further information regarding PVs, consult Sect. 5 of OECD (2013a) or Braun and von Davier (2017).
The two labor market outcomes examined in this study are calculated from the country-level annual income distributions.^{Footnote 20} The computation of an individual’s annual income is described in OECD (2013a, Ch. 20.4). Briefly, respondents were given the option of reporting earnings on whatever time scale was most convenient or familiar to them. For those who did not want to report precise amounts, there was the option of reporting in broad categories. Both respondents on salary and those who were self-employed were included. Given the range of reporting formats, a great deal of work (transformations) was done to place the responses on a common scale. Ultimately, estimated monthly earnings for salaried workers (including bonuses) and for the self-employed were calculated, combined and then converted to annual earnings. After correcting for purchasing power parity, the distributions were represented in terms of both deciles and quartiles.
In deriving the national income distributions for this study, the PIAAC contractors decided to exclude individuals with zero incomes. Respondents with missing data were also excluded. Of course, the proportions of individuals reporting no income vary across countries and should be taken into account when interpreting the results. Appendix 2 contains the country-specific exclusion percentages. They range from 15% (Sweden) to 42% (Spain), with a median of 29.6%. The value for the United States is 28.2%.
The percentiles employed in subsequent analyses are derived from these national income distributions. We defined a dichotomous variable, Q1, to take the value 1 if the respondent’s annual income was at or below the 25th percentile (lower quartile) with respect to the national income distribution and 0 otherwise. With regard to the upper end of the national income distribution, a decision was made to restrict attention to those individuals reporting they were working full time, defined as working 30 or more hours per week. This decision was motivated by the simple fact that weekly earnings are the product of the hourly wage and the number of hours worked. Thus, focusing on full-time workers enhances the utility of earnings as a criterion variable in models that are intended to elucidate the relationships between earnings and various possible explanatory factors. For example, if one of these factors (e.g. Gender) is correlated with both hourly earnings and hours worked, then there would be ambiguity in the interpretation of the gender coefficient in the wage equation. That said, this restriction does limit the generalizability of the results obtained from the reduced sample.
With this caveat in mind, we defined a second dichotomous variable for full-time workers, Q4, to take the value 1 if the respondent’s annual income was at or above the 75th percentile (upper quartile) with respect to the national income distribution and 0 otherwise. Note that since the national annual income distributions were based on data from the full national samples (only excluding those respondents with zero or missing incomes), the proportions falling in the lower and upper quartiles for the analytical samples may differ substantially from 0.25.
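The construction of the two indicators can be illustrated with a minimal sketch. It assumes unweighted percentiles computed from the incomes passed in, whereas the study takes its cut points from the PIAAC national income distributions; the function name, the argument names and the toy data are hypothetical and are not part of PIAAC.

```python
import numpy as np


def income_quartile_indicators(income, hours_per_week, full_time_hours=30):
    """Return (q1, q4) indicator arrays for one country's respondents.

    income: annual incomes of respondents with positive, non-missing income.
    hours_per_week: weekly hours worked by the same respondents.
    q4 is NaN for respondents who are not full-time, because Q4 is
    defined only for the full-time sample.
    """
    income = np.asarray(income, dtype=float)
    hours = np.asarray(hours_per_week, dtype=float)

    # Assumption: unweighted cut points from the incomes supplied here.
    p25, p75 = np.percentile(income, [25, 75])

    # Q1 = 1 if income is at or below the 25th percentile.
    q1 = (income <= p25).astype(float)

    # Q4 = 1 if a full-time worker's income is at or above the 75th percentile.
    full_time = hours >= full_time_hours
    q4 = np.where(full_time, (income >= p75).astype(float), np.nan)
    return q1, q4


# Hypothetical toy data, not PIAAC values.
income = [12_000, 18_000, 25_000, 31_000, 40_000, 52_000, 67_000, 90_000]
hours = [20, 40, 15, 38, 40, 25, 45, 50]
q1, q4 = income_quartile_indicators(income, hours)
```

Because Q4 is left undefined (NaN) for part-time respondents, the share of ones among the full-time sample need not equal 0.25, which mirrors the remark above about the analytical samples.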
The implications for sample sizes are as follows: The total sample size in the age categories selected for this study is 88,818. Of those, there were 65,082 respondents who had both paid income in the previous 12 months and valid income data. Disaggregated by country, it is these respondents who were assigned a “1” or “0” depending on whether their annual income was located in the lower quartile. Finally, of the 65,082 respondents there were 31,876 who reported working full-time (30 hours per week, or more). They comprise the full-time (FT) sample. Disaggregated by country, it is these respondents who were assigned a “1” or “0” depending on whether their annual incomes were located in the upper quartile.
In addition to these four variables, the study investigates a number of possible explanatory factors derived from information collected through the background questionnaire (BQ). These factors are listed in Table 1. For discrete factors the table displays the number of categories and the reference group used in the analyses. The categories corresponding to factors other than Age are described below:

Gender: male and female.

Parental Education: neither parent attained upper secondary education; at least one parent attained upper secondary education; at least one parent attained tertiary education.

Books in the Home: 25 books or less, 26–100 books, 101 books or more.

Educational Attainment: below secondary education; secondary education; beyond secondary education but not tertiary education; tertiary education.
In earlier iterations of model fitting, we also introduced two other factors. The first, Nativity/Language, refers both to the respondent’s country of birth and mother tongue. It is represented by four categories: Native born/Native language; Native born/Foreign language; Foreign born/Native language; Foreign born/Foreign language. We found that in six countries the sample sizes in some cells were so small that the model could not be fit. In the other countries, the associated regression coefficients were generally not significant and eliminating this factor did not materially degrade the fit. The second, Years of Experience, refers to years of paid work during one’s lifetime (top-coded at 35). Again, introducing this factor did not meaningfully improve the fit. Accordingly, we have not included these two factors in the reported results.
For certain reporting purposes of the OECD, the scales for Literacy and Numeracy have been segmented into 6 ordered levels. Group performance can then be described by the percentages of the population falling in each level. The boundary between Levels 2 and 3 (occurring at 276 scale score points) typically is taken as the demarcation between those with adequate skills (i.e. proficient) and those without them.^{Footnote 21} For further information on these levels, see OECD (2013b, Chapter 2). In this study, when Literacy and Numeracy serve as the criterion, they are dichotomized into indicator variables that take the value 1 if the PV scale score is at 276 or above, and 0 otherwise.^{Footnote 22}
Because the wage-related criterion (dependent) variables are dichotomous, we employed logistic regression to examine their relationships to the explanatory factors. Since the data associated with each respondent include sets of ten PVs for each cognitive skill, as well as the sampling weights that reflect the probability sampling design that was employed, it is strongly recommended to take these into account in a logistic regression analysis. We accomplished this by employing the IEA’s IDB Analyzer (IEA 2015).^{Footnote 23}
The basic logistic regression model takes the following form: Suppose Y is a random quantity that can take the values 1 or 0, denoting group membership. Let P_{ i } denote the probability that individual i has the value Y_{ i } = 1. Then
\[ \ln\left(\frac{P_{i}}{1 - P_{i}}\right) = \beta' X_{i} + \gamma' Z_{i} \]
where X_{ i } = vector of demographic characteristics (Age, Gender, ParEd, Book), Z_{ i } = vector of acquired characteristics (Cognitive Skills, Educational Attainment), β and γ = vectors of fixed coefficients to be estimated. (Note: In the baseline model, Z_{ i } = ∅).
It is common with logistic regression to report results in terms of the estimated coefficients of the explanatory variables. These are difficult to interpret directly, however, and should be transformed into odds ratios and probabilities. Readers should also bear in mind that results from different studies are often not directly comparable as they differ in the filters used to construct the analytic databases. The filters can involve combinations of age category and employment status, as well as other factors.
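The distinction between odds ratios and probabilities can be illustrated with a short sketch. The function name and the baseline probabilities are hypothetical and chosen only for illustration; the point is that the same odds ratio implies different changes in probability depending on the starting point.

```python
def apply_odds_ratio(baseline_prob, odds_ratio):
    """Probability obtained after multiplying the baseline odds by odds_ratio."""
    baseline_odds = baseline_prob / (1.0 - baseline_prob)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1.0 + new_odds)

# Hypothetical baselines: the same odds ratio of 0.30 applied to a
# reference-group probability of 0.25 and of 0.50.
from_quarter = apply_odds_ratio(0.25, 0.30)
from_half = apply_odds_ratio(0.50, 0.30)
```

Under these assumed baselines the probability falls from 0.25 to roughly 0.09 in the first case and from 0.50 to roughly 0.23 in the second, which is why an odds ratio should not be read as a ratio of probabilities.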
When presenting the fitted coefficients of the logistic regressions we follow general practice and indicate whether the coefficients are significant at the 0.05 or 0.01 levels. The estimated standard errors do take into account uncertainty due to both measurement error and sampling error. However, given the number of models that are fit, as well as the number of predictors in each model, these indications of significance should be taken simply as guides to potentially interesting results and not be overinterpreted.^{Footnote 24}
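One standard way to reflect both sources of uncertainty when an analysis is repeated once per plausible value is the multiple-imputation combining rule. The sketch below illustrates that rule under stated assumptions: the function name and inputs are hypothetical, and whether the software used here carries out exactly this computation is assumed rather than confirmed by the text.

```python
from math import sqrt
from statistics import mean, variance

def combine_plausible_values(estimates, sampling_variances):
    """Combine one coefficient estimated separately with each plausible value.

    estimates: the coefficient from each PV-specific model fit.
    sampling_variances: the sampling variance of that coefficient in each fit.
    Returns (point_estimate, standard_error).
    """
    m = len(estimates)
    point = mean(estimates)
    within = mean(sampling_variances)   # sampling error component
    between = variance(estimates)       # measurement error component (divisor m - 1)
    total = within + (1.0 + 1.0 / m) * between
    return point, sqrt(total)

# Hypothetical results from ten PV-specific fits of the same model.
coefs = [0.50, 0.52, 0.48, 0.51, 0.49, 0.50, 0.53, 0.47, 0.50, 0.50]
variances = [0.01] * 10
point, se = combine_plausible_values(coefs, variances)
```

The combined standard error exceeds the square root of the average sampling variance whenever the estimates differ across plausible values, which is how measurement error enters the reported significance levels.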
To complement the presentation of the estimated coefficients for the logistic regression models, we employ a statistic, D, the coefficient of discrimination (Tjur 2008). This statistic possesses distinct advantages over the more commonly used goodness-of-fit statistics for logistic regression, including the fact that it can be interpreted in a manner analogous to that of the R^{2} statistic in ordinary least squares regression.
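For readers unfamiliar with the statistic, Tjur's D is the difference between the mean fitted probability among cases with Y = 1 and the mean fitted probability among cases with Y = 0. A minimal unweighted sketch follows; the function name and the toy data are hypothetical, and a survey analysis would use weighted means.

```python
from statistics import mean

def tjur_d(outcomes, fitted_probs):
    """Mean fitted probability for Y = 1 minus mean fitted probability for Y = 0."""
    ones = [p for y, p in zip(outcomes, fitted_probs) if y == 1]
    zeros = [p for y, p in zip(outcomes, fitted_probs) if y == 0]
    return mean(ones) - mean(zeros)

# Toy data: four respondents with observed outcomes and fitted probabilities.
d_value = tjur_d([1, 1, 0, 0], [0.8, 0.6, 0.3, 0.1])
```

A model that assigns the same fitted probability to everyone yields D = 0, while a model that separates the two groups perfectly yields D = 1.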
In the first part of this study, we examine: (i) the relationships of Parental Education to Educational Attainment, as measured by diplomas or degrees achieved, (ii) the probabilities that children’s Educational Attainment exceeds that of their parents, and (iii) the relationships of Parental Education to both Literacy and Numeracy, holding Educational Attainment constant. The findings should command considerable policy interest. Ideally, education ought to reduce early differences in Cognitive Skills, including those correlated with differences in family background—with greater reductions associated with greater Years of Education and/or higher levels of Educational Attainment.
One expects substantial heterogeneity across countries in both sets of relationships. Indeed, the strength of the residual relationships between Parental Education and Cognitive Skills within Educational Attainment strata can serve as an additional indicator of the extent to which public investment has compensated for differences in family resources to “level the playing field of opportunity” (Kirsch et al. 2016; Braun 2016; Smeeding 2016). At the same time, we must be mindful that parental education is only one of many facets of family socioeconomic background that influence both the development of Cognitive Skills and the capacity of the education system to reduce those early differences.
In the second part of the study, the goal is to extend the cited results in OECD (2013b). To this end, we examine the relationships between the two indicators of family background, as well as Age and Gender, and the two wage-related variables, Q1 and Q4. We also investigate the extent to which those relationships are mediated by more proximal variables such as Cognitive Skills and Educational Attainment. As this is an exploratory study, we do not begin with a formal set of hypotheses to be tested.
Results: Parental Education, Educational Attainment and cognitive outcomes
We begin by quantifying the relationship between Parental Education and Educational Attainment within each of the participating countries, employing a well-known rank correlation coefficient, denoted by gamma, often used as a measure of the association between factors represented by ordered categories.^{Footnote 25} Lower correlations signal that the country’s educational system, presumably in conjunction with other social policies, is more successful in promoting educational mobility, with higher correlations signaling less success in mitigating the influence of family background on human capital development.^{Footnote 26} The weighted sample results, along with approximate standard errors, are presented in Table 2, column A. The gammas range from 0.30 to 0.76, a surprisingly wide span. The lowest, but still moderate, correlations are found in Finland, Norway, and Estonia, while the largest, very strong, correlations are found in the Slovak Republic, Italy, Poland and the Czech Republic. The value for the United States is 0.54, slightly greater than the median of 0.51.
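Assuming the coefficient referred to here is the Goodman-Kruskal gamma, it can be computed from a cross-tabulation of two ordered factors as (C - D) / (C + D), where C and D are the numbers of concordant and discordant pairs. The sketch below is unweighted, and the function name and the example tables are hypothetical.

```python
def goodman_kruskal_gamma(table):
    """Gamma for a contingency table whose rows and columns are both ordered."""
    n_rows, n_cols = len(table), len(table[0])
    concordant = 0.0
    discordant = 0.0
    for i in range(n_rows):
        for j in range(n_cols):
            for k in range(i + 1, n_rows):      # pair with a strictly higher row
                for l in range(n_cols):
                    if l > j:                   # higher column as well: concordant
                        concordant += table[i][j] * table[k][l]
                    elif l < j:                 # lower column: discordant
                        discordant += table[i][j] * table[k][l]
    return (concordant - discordant) / (concordant + discordant)

# Hypothetical counts: rows are two levels of one factor, columns two of the other.
gamma_2x2 = goodman_kruskal_gamma([[20, 10], [10, 20]])

# Hypothetical 3 x 4 table: Parental Education (rows) by Educational Attainment (columns).
gamma_3x4 = goodman_kruskal_gamma([
    [30, 40, 10, 20],
    [10, 40, 15, 35],
    [5, 15, 10, 70],
])
```

Pairs tied on either factor are ignored, so gamma depends only on the ordering of the categories and not on any numerical scores assigned to them.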
Of course, correlations do not tell the whole story of intergenerational education mobility. There is also increasing interest in directly comparing individuals’ Educational Attainment to that of their parents.^{Footnote 27} With PIAAC data we can compute two relevant statistics for each country: (i) The probability that an individual’s educational level equals or exceeds that of her most educated parent and (ii) For an individual whose most educated parent’s education lies in one of the two lower categories, the probability that her education level exceeds that of her most educated parent. The unweighted sample results are contained in Table 2, columns B and C. Note that the denominators for the two statistics differ. In the first case, the denominator consists of the entire population (i.e. yielding an unconditional probability) while in the second case, the denominator consists of the subset of the population with Parental Education in the lower two categories (i.e. yielding a conditional probability).
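The difference between the two denominators can be made explicit with a small sketch. It assumes that Parental Education and the respondent's attainment have been recoded onto a common ordinal scale (0, 1, 2), which the text does not detail; the function name and the data are hypothetical, and the counts are unweighted.

```python
def mobility_probabilities(pairs, top_parent_level=2):
    """pairs: (parent_level, respondent_level) tuples on a shared ordinal scale.

    Returns (i) the unconditional probability that the respondent equals or
    exceeds the parent and (ii) the probability of exceeding the parent,
    conditional on the parent being below the top level.
    """
    equal_or_above = sum(1 for parent, child in pairs if child >= parent)
    lower_origin = [(parent, child) for parent, child in pairs
                    if parent < top_parent_level]
    exceeded = sum(1 for parent, child in lower_origin if child > parent)
    return equal_or_above / len(pairs), exceeded / len(lower_origin)

# Six hypothetical respondents.
sample = [(0, 0), (0, 1), (1, 1), (1, 2), (2, 1), (2, 2)]
p_unconditional, p_conditional = mobility_probabilities(sample)
```

Statistic (i) is taken over all six respondents, whereas statistic (ii) is taken only over the four whose parents are in the two lower categories.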
With regard to (i), the probabilities range from 0.73 to 0.95, with a median of 0.82. Thus, even in countries with the lowest probabilities (Denmark, Germany, Estonia), only about one-quarter of the population in the focal age groups lost ground with respect to Educational Attainment. At the high end (Italy, Poland, Spain, Korea, Slovak Republic) fewer than ten percent lost ground. With regard to (ii), the probabilities range from 0.22 (Germany) to 0.65 (Korea). The median is 0.46, while about half the countries have values between 0.39 and 0.49. The value of 0.36 for the United States places it in the bottom quartile. Again, higher probabilities signal countries that have been more successful in mitigating the effects of parental background, at least with respect to Educational Attainment. Generally speaking, countries with higher probabilities for (i) and (ii) had distributions of parental education that were stochastically smaller than the OECD average (i.e., parents typically had lower levels of Educational Attainment than was the case in the OECD overall). The low value for Germany may be due to the multiplicity of alternate routes (e.g. apprenticeships) into tertiary education or the workforce.^{Footnote 28}
Figure 1 displays the scatter plot of the gamma coefficients (Table 2, Column A) against the conditional probabilities (Table 2, Column C). There is a strong negative slope indicating that, as one might expect, high-gamma countries tend to have lower conditional probabilities of individuals’ Educational Attainment exceeding that of their (lower educated) parents, and low-gamma countries tend to have higher conditional probabilities. There are, however, some apparent outliers. For example, Austria has a gamma of 0.46, just below the median, but the third lowest conditional probability. On the other hand, there are a number of countries (e.g. Czech Republic, Poland, Italy and Slovak Republic) that have high gammas and low conditional probabilities, but high overall probabilities of individuals’ Educational Attainment equaling or exceeding that of their parents (Column B). Further elucidation of these patterns would require consideration of the distributions of parental Educational Attainment, as well as an historical analysis of countries’ education policies over two generations, which is well beyond the scope of this study.
Parental Education is strongly positively associated with an individual’s Literacy and Numeracy scores (see, for example, OECD 2013b, p. 113). The results above demonstrate that Parental Education is also strongly positively associated with Educational Attainment. We now take the analysis a step further by examining the relationship between Parental Education and Literacy and Numeracy, separately, within each of the four strata defined by respondents’ Educational Attainment. The results are illustrated (Fig. 2) by the data from the Czech Republic, where the patterns are particularly strong for both Literacy and Numeracy. Figure 2 displays the distributions of the skill (either Numeracy or Literacy) for each combination of Parental Education (3 levels) and Educational Attainment (4 levels). The distributions are represented by box-and-whisker plots, which visually highlight five percentiles (10, 25, 50, 75, 90) and the two adjacent values.
Consider, for example, the lowest level of Educational Attainment (respondent did not complete secondary school). The difference in median Numeracy scores between respondents with the highest level of Parental Education and those with the lowest level is 61 points. For Literacy the comparable difference is 33 points. At the highest level of Educational Attainment (respondent obtained a bachelor’s degree or higher), the difference in median Numeracy scores between respondents with the highest level of Parental Education and those with the lowest level is 67 points. For Literacy the comparable difference is 35 points.
In most countries, within each Educational Attainment category, there are systematic differences in the distributions of Literacy and Numeracy across levels of Parental Education, particularly between the lowest and highest levels. The differences are generally larger the lower the level of Educational Attainment, though there are a few exceptions. Interestingly, Germany and Sweden have similar ranges for the distributions of Literacy and Numeracy for each Educational Attainment category. However, although there is a moderate association with Parental Education within attainment categories in Germany, there is almost no association in Sweden. Box plots for Germany, Sweden, and the United States are contained in Appendix 3.
The results suggest that in most countries children coming from more poorly educated families suffer from a double disadvantage: They are not only less likely to attain higher levels of education, but also less likely to achieve adequate Cognitive Skills than their peers at the same level of Educational Attainment. As we shall see below, this double disadvantage is detrimental to their labor market success, as measured by wages.
In a preliminary set of analyses, we estimated for each country a logistic regression prediction model for which the criterion was a high score in both Literacy and Numeracy. Specifically, the dichotomous criterion equaled 1 if and only if the respondent achieved Level 3 or above on both the Literacy and Numeracy scales, and 0 otherwise. Such individuals are generally considered to have the foundational skills needed to achieve some degree of success in the modern world (Kirsch 2001, pp. 39–44). The explanatory factors were Age, Gender, Parental Education, and Educational Attainment. We found that across all countries the strongest predictors were the different levels of Educational Attainment followed by the lowest level of Parental Education. The coefficients of Age and Gender were not significant. Thus, it appears that the relationship between Parental Education and Literacy and Numeracy is largely, but not entirely, mediated through Educational Attainment.
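The joint criterion can be written down in a few lines. This is a sketch only: the function name is hypothetical, the cut score of 276 is the Level 2/3 boundary cited earlier, and in practice the indicator would be evaluated once for each pair of plausible values rather than for a single score pair.

```python
LEVEL_3_CUT = 276  # boundary between Levels 2 and 3, in scale score points

def proficient_in_both(literacy, numeracy, cut=LEVEL_3_CUT):
    """Return 1 only if both scale scores are at or above the cut, else 0."""
    return 1 if literacy >= cut and numeracy >= cut else 0

# Hypothetical (literacy, numeracy) score pairs for three respondents.
flags = [proficient_in_both(lit, num)
         for lit, num in [(280, 276), (300, 275), (250, 240)]]
```

Because both conditions must hold, the joint criterion is at least as demanding as either of the single-skill indicators.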
Results: Parental Education and labor market outcomes
We now turn to the relationships of different individual-level characteristics to the labor market outcomes represented by Q1 and Q4. Since income distributions typically have an extreme right skew, it is common in the economics literature to use the natural log of income as the criterion in a single, ordinary least squares regression. The advantage of the present approach is that it enables us to see if there are markedly different patterns in the relationships between income and individual characteristics at the two tails of the income distributions.^{Footnote 29} Apparently, this is the motivation in Cappellari et al. (2015) to employ Q1 and Q4 as dependent variables in a set of auxiliary analyses. Although logistic regression is not as efficient in its use of data as ordinary regression, the number of observations available in these analyses is sufficient to yield reasonably accurate estimates of the regression parameters.
In this set of analyses, initial interest centers on the strength of the association between two indicators of family background (Parental Education and Books in the Home) and Q1 and Q4. We note that the country-level correlations (gamma coefficients) between these two indicators range from 0.44 (Belgium, Japan) to 0.74 (Italy), suggesting that it is reasonable to include both indicators in the regression. By fitting a sequence of nested models, we are able to study, for each country, how the magnitudes of the regression coefficients for the two indicators change with the introduction of Cognitive Skills and Educational Attainment. Of course, changes in the coefficients of other explanatory factors are also of interest.
For both analyses, the factors displayed in Table 1 were introduced successively in blocks as indicated in Table 3. Block 1 comprises standard demographic characteristics and family background indicators, while Block 2 comprises a single composite measure of Cognitive Skills; that is, the Literacy scale scores and the Numeracy scale scores were summed and standardized to form a single predictor. As it is expected to be the strongest predictor, Educational Attainment was introduced last (Block 3) so as not to mask the other relationships of interest.
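The block structure and the composite skill measure can be sketched as follows. The names `BLOCKS`, `predictors_for` and `composite_skill` are hypothetical, and the use of the unweighted population standard deviation for standardization is an assumption, since the text does not state how the standardization was carried out.

```python
from statistics import mean, pstdev

# Predictors entered in each block, following the description in the text.
BLOCKS = [
    ["Age", "Gender", "Parental Education", "Books in the Home"],  # Block 1
    ["Cognitive Skills"],                                          # Block 2
    ["Educational Attainment"],                                    # Block 3
]

def predictors_for(model_number):
    """Cumulative predictor list for Model 1, 2 or 3 in the nested sequence."""
    return [name for block in BLOCKS[:model_number] for name in block]

def composite_skill(literacy, numeracy):
    """Sum the two scale scores and standardize to mean 0 and unit spread."""
    totals = [lit + num for lit, num in zip(literacy, numeracy)]
    center, spread = mean(totals), pstdev(totals)
    return [(total - center) / spread for total in totals]

# Hypothetical scale scores for four respondents.
z_scores = composite_skill([250, 270, 290, 310], [240, 280, 300, 320])
```

Because each model contains all predictors of the one before it, changes in the family background coefficients from Model 1 to Model 3 can be attributed to the added blocks.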
Findings for Q4 analyses
In another set of preliminary analyses (not detailed here) we employed an analytic sample, with the only filters being that the respondent had a valid, nonzero income. A striking result was the very large, negative coefficients (Models 1, 2, and 3) associated with Gender in all countries. The straightforward interpretation is that even controlling for the other factors in the model, being female was associated with much lower odds of reaching the upper quartile of the income distribution. It seemed plausible that, to some extent, this could be due to the different proportions of males and females working fulltime. Accordingly, we constructed a second analytic sample, the FT sample described above, for subsequent analyses.
The shift to the FT analytic sample forced us to drop Canada from the Q4 analyses as Canada did not collect information on full-time work. Overall, this shift resulted in a loss of 29.3% of the data from the remaining 20 countries. The reductions by gender were 17.4% for males and 39.8% for females. These gender-related differences are of independent interest. As we have argued above, for some policy questions the FT sample should be a better basis for making gender comparisons.
Since we do not have full information on a respondent’s history of labor market participation, we cannot account for differences by gender in the number of years spent in part-time work, or being out of the labor force entirely, factors that can also influence current annual income. Consequently, gender-related differences in these (and other) unobserved characteristics are confounded with the characteristics that are represented in the model. Since women are generally more likely to have worked part-time, or stopped out entirely, during their prime working years, the estimates of gender disadvantage obtained here may well be somewhat larger than would be the case if the variables corresponding to these other factors were included in the model.
It bears mentioning that the magnitudes of the gender coefficients, though still very large, are substantially smaller in the set of analyses using the FT sample in comparison to those in the set of preliminary analyses referred to above. For example, the Netherlands has one of the largest gender coefficients. In Model 3, the gender coefficient declines from − 2.16 to − 1.19, corresponding to odds ratios of 0.12 and 0.30, respectively. An odds ratio of 0.12 means that, holding constant all other variables in the model, the odds of males reaching the upper quartile are about 8.5 times those of females. The corresponding interpretation of the odds ratio of 0.30 is that for the FT sample the male advantage is “only” a factor of 3.3. With one of the smallest gender coefficients among the 20 countries, the decline in the U.S. is only from − 0.91 to − 0.78, corresponding to odds ratios of 0.40 and 0.46, respectively. The male advantage in terms of odds ratios is thus reduced from a factor of 2.48 to 2.18.
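The arithmetic behind these figures is the exponentiation of the logit coefficient, with the male advantage given by the reciprocal of the female odds ratio. The short sketch below reproduces the conversion for the coefficients quoted above; the function names are hypothetical.

```python
from math import exp

def odds_ratio(coefficient):
    """Odds ratio implied by a logistic regression coefficient."""
    return exp(coefficient)

def male_advantage(female_coefficient):
    """Reciprocal of the female odds ratio: odds for males relative to females."""
    return exp(-female_coefficient)

# Model 3 gender coefficients quoted in the text (preliminary sample, FT sample).
netherlands = [round(odds_ratio(b), 2) for b in (-2.16, -1.19)]
united_states = [round(odds_ratio(b), 2) for b in (-0.91, -0.78)]
us_advantage = [round(male_advantage(b), 2) for b in (-0.91, -0.78)]
```

The same conversion applies to any coefficient of an indicator variable, with the odds ratio always expressed relative to that factor's reference category.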
In what follows, we discuss the upper quartile (Q4) results for the FT sample. Before turning to these results, we note that for each country the quality of the fit improves with the inclusion of additional blocks of predictors. In particular, the values of D for Model 3 range from 0.06 (Slovak Republic) to 0.30 (Germany), with a median value of 0.21.^{Footnote 30} The full set of D statistics is in Appendix 4.
Overall, we find for almost all countries, controlling for other factors in the model, significantly lower odds of reaching the upper quartile are associated with being female, being younger, or having a lower level of Educational Attainment. Higher Cognitive Skills are associated with significantly greater odds of reaching the upper quartile. All country-level results are contained in Appendix 5, to which we now turn.
With respect to the focal question, the results from fitting Model 1, which also includes Gender and Age as predictors, indicate that in all countries the coefficients associated with lower Parental Education and/or fewer Books in the Home are negative and significantly different from zero. (Recall that a negative coefficient corresponds to an odds ratio less than one relative to the reference category, while a positive coefficient corresponds to an odds ratio greater than one relative to the reference category.) Thus, these family background characteristics are strong predictors of reaching the upper quartile of the income distribution.
The addition of Cognitive Skills (Model 2) somewhat reduces the magnitudes of the coefficients for Family Background, but they usually remain statistically significant. However, with the addition of Educational Attainment (Model 3), the coefficients retain significance only in the Czech Republic, Estonia, France, Japan, Great Britain and the United States—albeit with magnitudes further reduced. It appears, then, that the long shadow of family background typically acts through the more proximal factors comprising Cognitive Skills and, especially, Educational Attainment.
Turning to demographic characteristics, in Model 1 the coefficients of the variables representing Gender and Age are large, negative, and strongly significant. Their magnitudes are not much diminished, if at all, by the addition of other predictors (Models 2 and 3). In part, these results are expected since in most work settings younger workers do not have sufficient seniority to move into the high end of the income distribution and females’ incomes typically trail males’ incomes. What is somewhat unexpected are the magnitudes of the coefficients for Gender and their robustness to the inclusion of Cognitive Skills and Educational Attainment. This robustness is evident in Fig. 3, which displays for 20 countries the coefficients for Gender for Models 1, 2, and 3.
To illustrate the range of gender disadvantage, consider the results for Model 3. The greatest disadvantage occurs in Japan, with a coefficient of − 1.56 corresponding to an odds ratio of 0.21. The least disadvantage occurs in Ireland, with a coefficient of − 0.31 corresponding to an odds ratio of 0.73. Thus, in Japan, other things being equal, the odds of males reaching the upper quartile are nearly five times greater than those for females. In Ireland the male advantage is only about 1.4 times. The median male advantage for the countries studied is about 3 in terms of odds ratios.
Cognitive Skills enter Model 2 as significant predictors of the criterion, but the introduction of Educational Attainment reduces the magnitudes of the corresponding coefficients by about 25% on the logit scale, though they remain statistically significant and substantively important. Note that the coefficients of Cognitive Skills cannot be interpreted in terms of odds ratios, as the variable is continuous and not categorical. It is possible, nonetheless, to make some meaningful comparisons between predictors. Reviewing the results for Model 3, we see that the impact on the log odds of a one standard deviation reduction in Cognitive Skills is typically about one-fifth to one-third as large as the shift from the highest to the lowest category of Educational Attainment.
The coefficients of the indicators for categories of Educational Attainment are all large, negative, and strongly significant.^{Footnote 31} In particular, the coefficients for the lowest level of Educational Attainment correspond to odds that are very substantially lower than those of the reference category. Here, the United States is an outlier. In Model 3, the coefficient for the lowest category of Educational Attainment is − 3.01 corresponding to an odds ratio of 0.05. The interpretation is that, other things being equal, the odds of individuals in the highest category of Educational Attainment reaching the upper quartile are 20 times greater than the odds for those in the lowest category.
In a supplemental analysis (not shown here), Model 3 was augmented by a set of indicators representing seven Occupational Categories. In most countries, this addition led to only slight improvements to the fit, although most of the (partial) regression coefficients for the indicators associated with the different occupational categories were statistically significant.^{Footnote 32} The coefficients for Gender and Age were not much changed, while the coefficients for Cognitive Skills were somewhat reduced but remained statistically significant. On the other hand, the coefficients for Educational Attainment were reduced across the board by about half on the logit scale, suggesting substantial confounding between Occupational Category and Educational Attainment. A plausible interpretation is that individuals with more education earn more because they are able to get jobs in higher paying occupations than individuals with less education.^{Footnote 33}
Figure 4 compares the advantage for males (relative to females) with the advantage for those in the highest category of Educational Attainment (relative to those in the lowest category). Advantage is represented by odds ratios based on the partial regression coefficients that have been adjusted for all the other variables included in Model 3. For most countries, not having finished secondary school appears to be a greater disadvantage than being female. Particularly extreme examples are the United States, Slovak Republic, Germany and Belgium. Estonia, Japan, and Poland are exceptions, with gender being a greater disadvantage. It is evident that in every country poorly educated women are extremely unlikely to reach the upper quartile.
When analogous analyses are carried out separately by gender there are substantial differences between males and females in some countries in the coefficients for Age and Educational Attainment. However, there are no systematic patterns. On the other hand, in half the countries Cognitive Skills are a stronger predictor for females, while in a quarter of the countries they are a stronger predictor for males.
Findings for the lower quartile (Q1)
Before turning to the logistic regression results, we discuss the D statistics displayed in Appendix 6. In comparison to the D statistics in Appendix 4, the model fit statistics for Q1 are considerably smaller than those for Q4. This may be due, in part, to the exclusion of respondents with zero incomes. For Model 3, the median D value is 0.13. (For Q4, the median D value for Model 3 is 0.22). Here, Canada has the poorest fit (D = 0.016) and Ireland the best (D = 0.29). The weak fit statistics argue for caution in interpreting the regression results, which are presented in Appendix 7.
The patterns in the coefficients are generally similar to those obtained for Q4. Parental Education and Books in the Home are significant in almost all countries in Model 1, but in Model 3, they are significant in only about half the countries with magnitudes much reduced from those in Model 1. By contrast, the logistic regression coefficients for Gender are large and positive in Model 1 and remain so in Model 3. Thus, controlling for the other variables in the model, females have considerably higher odds than males of being in Q1. Indeed, in about half the countries, the (adjusted) disadvantage associated with being female is greater than the (adjusted) disadvantage associated with being even in the lowest category of Educational Attainment. This pattern is different from the one in Q4, where in about three-quarters of the countries the disadvantage associated with being in the lowest category of Educational Attainment is greater than the disadvantage associated with being female.
More specifically, Japan is something of an outlier with a (partial) regression coefficient for Gender (b = 2.40) that is considerably higher than that for the Netherlands (ranked second, b = 1.88) and for South Korea (ranked third, b = 1.72). The corresponding odds ratios are 11.0, 6.6 and 5.6, respectively. Clearly, these values represent very high levels of gender-related earnings inequality. Countries with the smallest regression coefficients for gender are Canada, Denmark and the United States (b = 0.79, 0.80 and 0.81, respectively), all corresponding to odds ratios of approximately 2.2. Although smaller, these odds ratios are still substantial. It is worth noting that the United States is an outlier in that the disadvantages associated with all categories of Educational Attainment (below the reference category) are greater than that associated with being female.
Age appears as a strong predictor in almost all countries in Model 1. As one might expect, in comparison to the reference group (aged 45–54), the younger age groups generally have significantly higher odds of being in Q1. In Model 3, the youngest cohort (aged 25–34) has significantly higher odds of being in Q1 in 14 out of 21 countries.^{Footnote 34} For the middle cohort (aged 35–44) the coefficients are significant in seven countries, with smaller magnitudes.
The regression coefficients for the cognitive skill variable are negative, indicating that higher proficiency is associated with lower odds of being in Q1. They are statistically significant in all countries except Canada and the U.S. The countries with the strongest results are Denmark, Norway, Belgium, and Great Britain (b = − 0.49, − 0.45, − 0.44, − 0.43, respectively). The impact on the log odds of a one standard deviation increase in Cognitive Skills is typically about one-fifth to one-third as large as the shift from the lowest to the highest category of Educational Attainment.
Finally, other things being equal, individuals with lower levels of Educational Attainment have greater odds of being in the lower quartile in comparison to individuals with at least a Bachelor’s degree. The odds ratios are greatest in Poland, with the three lower attainment categories having odds ratios of 8.7, 4.3 and 3.1, respectively. In the United States the odds ratios corresponding to the three lower attainment categories are 5.5, 4.1 and 3.8, respectively.
Interesting differences arise in the genderspecific analyses. For males, the D statistics for Model 1 are uniformly close to zero indicating that demographic and background variables have very little explanatory power. Even with Model 3, the D statistics are very small (median = 0.045), indicating that the addition of Cognitive Skills and Educational Attainment does not add much explanatory power. Accordingly, we do not comment on the fitted models.
On the other hand, for females the D statistics are similar to those found for models fit to all the data (median = 0.12). Comparing the Model 3 results for females to those for the full sample (Appendix 7), we find that the regression coefficients in the analyses for females only are typically larger in absolute magnitude than the corresponding regression coefficients in the analyses for all respondents. That is, females being in the youngest age group, or the lowest Educational Attainment category, or having lower Cognitive Skills corresponds to a greater probability of being in the lower quartile in comparison to the results for all respondents, with Gender included as an explanatory variable. For females, the disadvantage associated with the youngest age category is greatest in Finland, Poland, and Spain; with lower Cognitive Skills it is greatest in Korea, Ireland, the Netherlands, and Germany; with the lowest category of Educational Attainment it is greatest in Japan, Poland, and Belgium.
We conclude this section with two graphical displays. Figure 5 plots for each country the odds ratios of the gender disadvantage in Q1 and Q4. In order to constrain the plot to the first quadrant, the Y-axis represents the (adjusted) odds ratio for females being in Q1, but the X-axis represents the (adjusted) odds ratio for males being in Q4. (The latter is just the inverse of the female disadvantage.) In both cases, the odds ratios are derived from the estimated partial regression coefficients that have been adjusted for the other variables included in Model 3.
Countries where the gender disadvantage is approximately the same in both quartiles will fall on or near the 45° line. This is the case for most countries, with a few notable exceptions. In the Netherlands, Korea, and Germany, the female disadvantage is much greater for Q1 than for Q4, while the opposite is true for Estonia and Finland. In addition, Japan is an obvious outlier. The female disadvantage is extremely large for both quartiles, but much greater for the lower quartile. The magnitudes of these odds ratios are particularly striking, and concerning, given that they are based on partial regression coefficients and not on simple probability ratios.
Figure 6 is analogous to Fig. 5. The Y-axis represents the odds ratio for individuals in the lowest Educational Attainment category being in Q1 and the X-axis represents the odds ratio for individuals in the highest Educational Attainment category being in Q4. Again, in both cases, the odds ratios are adjusted for the other variables included in Model 3. We see here that, with the exception of Poland, for most countries Educational Attainment matters more for Q4 than for Q1. The odds ratios for Q4 are particularly large in the United States, Slovak Republic, Germany, and Belgium.
Summary and limitations
This study contributes to the elucidation of the relationships of Parental Education to the foundational Cognitive Skills of Literacy and Numeracy, as well as the relationships of two indicators of family background to two wage-related labor market outcomes. The populations of interest comprise individuals aged 25–54 in 21 OECD countries. An earlier report (OECD 2013b) established the strong association between Parental Education and those skills for all participating jurisdictions. Here we have shown that in all 21 countries, Parental Education is also strongly associated with Educational Attainment. Neither set of results is particularly surprising, except perhaps that these relationships are clearly evident even in those countries, such as Denmark, France and Sweden, that have invested heavily in reducing the impact of children’s family background on their adult outcomes.
Turning to the question of intergenerational progress in Educational Attainment, we documented the fact that in all countries no more than a quarter (and typically far fewer) of the population has lost ground relative to their parents. For those whose parents’ Educational Attainment was in the lowest two categories, the probability of exceeding their parents’ level was typically between one-third and one-half.
We then examined the relationship between Parental Education and Literacy and, separately, Numeracy, while controlling for Educational Attainment. Again, we found a fairly consistent pattern across countries; namely, higher levels of Parental Education are associated with higher skill levels within each Educational Attainment stratum. Generally, the association is stronger at lower levels of Educational Attainment and more pronounced in the contrast between the lowest and highest levels of Parental Education.
Thus, complementing the long-established finding that the association between Parental Education and Cognitive Skills is largely captured through Educational Attainment, it appears that in many countries a substantial residual disadvantage accrues to those individuals whose parents are in the lowest educational category. At the same time, the finding that this residual disadvantage tends to be greater at lower levels of Educational Attainment indicates that, overall, more education does help to mitigate differences in learning outcomes associated with family background. Undoubtedly, there are multiple sources of this residual disadvantage. Most likely it reflects differences in the nature and intensity of interactions within the family, as well as differences in the quality of the education received among individuals in the same Educational Attainment category. Those differences may further impact the kind of jobs obtained, the level of cognitive demand in those jobs, as well as the opportunities to obtain further training.
Turning to the two wage-related outcomes, the analyses led to somewhat different narratives. With regard to the upper quartile criterion, for full-time workers, Parental Education and Books in the Home were generally statistically and substantively significant in Model 1 (baseline) but were much reduced in magnitude, and usually no longer statistically significant, in Model 3 (baseline + Cognitive Skills + Educational Attainment). This finding is consistent with the hypothesis that the relationships between Family Background and adult outcomes are mediated by more proximal individual characteristics.
One set of results was quite unexpected: The disadvantages associated with being female, or being in the youngest age category, were very large, even after controlling for other factors. In fact, the coefficients for Gender and Age obtained in Model 1 generally did not change systematically with the addition of other explanatory variables (Models 2 and 3).
Higher levels of Cognitive Skills were strongly associated with higher odds of reaching the upper quartile of the income distribution. However, the addition of Educational Attainment resulted in substantial reductions in the regression coefficients of the cognitive skill variable. Indeed, Educational Attainment was the strongest predictor, with significant disadvantage associated with lower levels of attainment. Roughly speaking, the lower odds associated with being female, being in the age category 25–34, or with having only a secondary education, were about the same order of magnitude.
Turning to the results for the lower quartile, for all workers with nonzero incomes, we found that model fit was substantially lower than that for the upper quartile, with the fit for males being particularly poor. This phenomenon merits further investigation.
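For readers who want to see what the model-fit comparison measures, the following is a minimal sketch, assuming that the D-statistic is the coefficient of discrimination proposed by Tjur (listed in the References): the mean fitted probability among cases with the outcome minus the mean fitted probability among cases without it. The function name and toy data are hypothetical, and the sketch ignores the survey weights and plausible values used in the actual analyses.

```python
def tjur_d(outcomes, predicted_probs):
    """Coefficient of discrimination: mean fitted probability for cases with
    outcome 1 minus mean fitted probability for cases with outcome 0."""
    ones = [p for y, p in zip(outcomes, predicted_probs) if y == 1]
    zeros = [p for y, p in zip(outcomes, predicted_probs) if y == 0]
    if not ones or not zeros:
        raise ValueError("both outcome classes must be present")
    return sum(ones) / len(ones) - sum(zeros) / len(zeros)

# Hypothetical toy data: 1 = in the quartile of interest, 0 = not.
y = [1, 1, 0, 0, 0, 0]
# Hypothetical fitted probabilities from some logistic regression model.
p = [0.5, 0.3, 0.2, 0.3, 0.1, 0.2]

d = tjur_d(y, p)
print(round(d, 3))
```

A value near 0 means the model assigns nearly the same fitted probabilities to both groups, which is the sense in which a low D indicates poor fit.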
In comparison to the findings for the upper quartile, the disadvantage associated with Family Background was somewhat more robust to the addition of other factors and Gender was a relatively stronger predictor of disadvantage. With regard to Cognitive Skills, the regression coefficients in Model 3 were statistically significant (p < 0.01) in almost all countries. In absolute magnitude, however, they were smaller than the corresponding coefficients in Model 3 for the upper quartile.
As noted earlier, the findings for the lower quartile should be interpreted with some caution in view of the poorer model fit. At the same time, the differences in the regimes that appear to be operating in the two tails of the income distribution raise the possibility that the standard econometric models employed in these settings may be badly misspecified. Further investigation of this conjecture is called for.
With regard to the United States, the results presented in OECD (2013b) indicate that it is somewhat of an outlier with regard to the size of the gaps associated with differences in cognitive proficiency, as well as the gradients of cognitive proficiency with respect to various covariates. In this study, we also found that the United States was an outlier: (1) Differences in Literacy and Numeracy (by Parental Education) within Educational Attainment strata were among the largest in the study, and (2) With regard to reaching the upper quartile, the difference in the coefficients of the lowest and second lowest Educational Attainment categories, corresponding to 1.5 logits, was the largest in the study. At the same time, the coefficients of the other variables were not particularly exceptional.
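As a reading aid for the 1.5-logit gap in point (2): a difference of Δβ between two logistic regression coefficients corresponds to multiplying the odds by exp(Δβ). The conversion below is standard logistic-regression arithmetic, not a figure reported in the study.

```latex
\[
  \mathrm{OR} \;=\; e^{\Delta\beta} \;=\; e^{1.5} \;\approx\; 4.48
\]
```

That is, holding the other Model 3 variables fixed, the odds of reaching the upper quartile differ by a factor of roughly 4.5 between the two lowest Educational Attainment categories.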
PIAAC shares many of the limitations of all cross-sectional studies. For example, respondents’ Cognitive Skills are measured contemporaneously and are likely influenced to some degree by experiences subsequent to the completion of their formal education. This introduces endogeneity into the model, leading to estimation bias. Further, there are no measures of respondents’ socio-emotional skills, skills that have been shown to be predictive of labor market success (Levin 2013) and that are arguably of increasing importance (Deming 2017). Thirdly, the database contains full information on labor market experiences only for the previous 12 months. Thus, systematic differences in earlier work histories (e.g. working part-time, stopping out, etc.) among focal groups complicate interpretations of group differences. The highest category of Educational Attainment includes all individuals with a tertiary degree or higher. In the United States there is evidence of a divergence in income between those with a four-year degree and those with more advanced credentials (Autor 2014). With larger sample sizes it would have been instructive to examine that divergence in all countries.
Two further cautions: With only two indicators directly related to family background, we surely have an incomplete representation of that construct. Finally, as noted earlier, there is substantial missing data on income in all countries and the range of the proportions of missing data, 15–42%, is considerable. Although the similarity across countries in the patterns of relationships between income quartile location and the explanatory variables is reassuring, comparisons of the magnitudes of the logistic regression coefficients are likely sensitive to differences in the amounts of missing data.
There are also well-known limitations to the policy implications of findings from cross-sectional data because of the difficulty in supporting causal interpretations of those findings. A number of authors have attempted to circumvent these difficulties either through methodologies for combining data from repeated cross-sectional surveys (Gustafsson 2013) or through linking different surveys in order to follow specific cohorts (Kaplan 2009; Kaplan and McCarty 2013; Chmielewski 2015). See the review by Chmielewski and Dhuey (2017), as well as recent work by Hampf et al. (2017) and by Bind and Rubin (2017). Hampf et al. investigated a number of different approaches to establishing the causal linkage between skills and wages, concluding there was compelling evidence for such an inference. Bind and Rubin explored different strategies for embedding an observational study in a hypothetical randomized study, with the intent of comparing the resulting causal estimates across studies. These methodological developments represent important directions for strengthening the policy relevance of analyses of cross-sectional large-scale assessment surveys.
Conclusions
The overall conclusion is that Family Background, when represented by Parental Education and Books in the Home, indeed casts a long shadow on individuals’ life prospects: Across all countries studied, Parental Education accounts for substantial proportions of the variation in respondents’ Educational Attainment, as well as some of the differences in Cognitive Skills within Educational Attainment strata. These results complement the well-established associations between Parental Education and Cognitive Skills, such as Literacy and Numeracy. Turning to income, Family Background does help to predict who will be in the upper and lower quartiles of the national annual income distribution. However, as one might expect, the strength of the direct association is much attenuated by the addition of relevant intermediate factors (i.e. Cognitive Skills and Educational Attainment) that are more proximal in time to entry into the labor market. Being female greatly diminishes the odds of reaching the upper quartile and greatly enhances the odds of being in the lower quartile, even controlling for other factors. The strength and robustness of the gender advantage, favoring males, is somewhat surprising and should be of concern to all participating countries.
As always, appropriate policy-relevant interpretations of the statistical findings for a particular country depend to a great extent on knowledge of historical trends in economic activity, government policies, as well as considerations of culture, demography and the like (Cappellari et al. 2015). Formulating effective policies to counter the disadvantage associated with Family Background and Gender will depend on understanding a complex set of dynamics that differ markedly among countries and even regions within countries. Thus, in some countries the gender disadvantage may be largely due to the occupational choices women make in attempting to find an appropriate work-life balance. In other countries, it may be mostly due to structural bias in the labor market. In yet others, a combination of these, and perhaps other factors, offers the best explanation. A recent review (https://www.vox.com/2018/2/19/17018380/gender-wage-gap-childcare-penalty) cites a number of studies of the gender gap in wages, including one of Danish data that suggests that the gap widens with the birth of the first child.
With respect to the long shadow of family background in the United States, there is evidence of substantial variation by location in both economic mobility and the intergenerational transmission of advantage or disadvantage (Moretti 2013; Chetty et al. 2014; Rothstein 2017). Empirical studies suggest that this variation can be explained, to some degree, by differences by location in factors such as the distribution of socioeconomic status, marriage patterns, labor markets, and the interactions among them (Rothstein 2017; Khatiwada and Sum 2016). Of course, macroeconomic forces and policies also play an important role in determining who succeeds in the labor market (Bernstein 2016).
Thus, aggregate comparative results can only suggest interesting directions for further study to inform policy decisions (Carnoy 2015; Singer and Braun 2018; Singer et al. 2018). To take full advantage of these findings, it is essential to conduct in-depth, longitudinal investigations of the context and circumstances pertaining to each country.^{Footnote 35} That said, PIAAC is unique in enabling the linkage of individuals’ family backgrounds, Cognitive Skills, and life/work experiences to a variety of adult outcomes. Consequently, PIAAC and its successors will serve as indispensable tools in attempts to identify and understand the key relationships that are essential foundations for evidence-informed policy formulation.
Notes
 1.
Parental Education is a relatively simple indicator for a complex construct, family socioeconomic background. Thus, the apparent strength of its relationship with a criterion will tend to be enhanced by those unmeasured facets of the construct that are positively (or negatively) correlated with both variables, but attenuated by those unmeasured facets that are positively correlated with one variable and negatively correlated with the other. More comprehensive studies of the impact of family background on children’s development and adult outcomes are available. See, for example, Alexander et al. (2014).
 2.
OECD (2013b, p. 111–112) offers a compelling rationale for why family socioeconomic background (proxied by Parental Education) should be strongly associated with individuals’ Cognitive Skills.
 3.
The reference group is “Both respondent’s and parents’ education is at least upper secondary”. The term adjusted refers to the fact that there are other variables in the model generating the odds ratios. Statistical adjustment is with respect to Age, Gender, Occupation, and Nativity/Language background.
 4.
Park and Kyei also investigated the extent to which, at the national level, differences in school-related factors could account for differences in gaps.
 5.
A slightly updated version of this figure can be found in Figure 5.3 of OECD (2016).
 6.
Years of education has a s.d. of 3.05 and literacy has a s.d. of 45.8.
 7.
Holzer and Lerman also investigated the relationship of each occupational category to earnings, with various controls in the regression model.
 8.
In order to apply the IV approach, the authors made use of another data base that contained longitudinal information on countries’ education policies.
 9.
A reviewer has pointed out that the different results for Years of Education and for Numeracy could be due, in part, to differences in the magnitudes of the measurement error associated with the two variables.
 10.
There are a number of differences between their analytic strategy and ours, aside from their use of IV. In particular, they employ data from the full age distribution, use the average of the plausible values as the predictor variable, and appear to use least squares (rather than logistic regression) for fitting a model with a dichotomous outcome.
 11.
The design employed in the administration of the cognitive items is an adaptive, multistage design.
 12.
PSTRE is a new domain of assessment made possible by computer administration.
 13.
Respondents who demonstrate very low levels of literacy on the initial (core) module are routed to an instrument that assesses fundamental components of literacy.
 14.
The actual data collection period varied from jurisdiction to jurisdiction. For more detailed information see Table 10.1 in OECD (2013a).
 15.
Data for the United Kingdom (GBR) was based on samples from England and Northern Ireland only.
 16.
PIAAC also collected data on a third skill, problem solving in technology-rich environments [PSTRE]. We do not examine this skill in the present study.
 17.
More detail can be found in OECD (2013a) (Chapter 2).
 18.
“To increase the accuracy of the assessment of the uncertainty associated with the estimates of the population parameters of interest, PIAAC uses plausible values (PVs)—which are multiple imputations—drawn from an [estimated] posterior distribution [obtained by] combining the IRT scaling of the cognitive items with a latent regression model using information from the BQ … in a population model.” (OECD 2013a; Chapter 17, p1). “The ‘plausible value’ methodology correctly accounts for error (or uncertainty) at the individual level by using multiple imputed proficiency values (plausible values) rather than assuming that this type of uncertainty is zero.” (OECD 2013a; Chapter 17, p2).
 19.
Respondents taking the computer administration are presented with two full modules that contain items linked to either one or two Cognitive Skills. However, through the latent regression model all respondents are associated with sets of PV for all three skills.
 20.
Derived from variable YearlyIncPR from the PIAAC data files for public use.
 21.
Reasonably proficient is taken to mean that they possess the minimal skill levels generally required to succeed in the 21st century.
 22.
The process generates ten PV for each indicator variable.
 23.
The original version of the software could only carry out ordinary regression analysis. An extension completed in 2014 enabled logistic regression to be carried out as well.
 24.
This caution reflects the problem of multiplicity, which arises whenever multiple significance tests are conducted on a dataset. Strategies to control Type I error rates are available but, for this exploratory study, would take us too far afield.
 25.
Gamma is one of a large number of measures of association proposed by Goodman and Kruskal (1954).
 26.
We computed gamma coefficients separately for males and females. The country rankings are very similar to those in Table 2, Column A.
 27.
The use of both correlations and probabilities to describe intergenerational education mobility is not novel. See for example de Broucker and Underwood (1998).
 28.
This point illustrates the importance of considering the country-specific context in explaining deviations from general patterns of results.
 29.
Some investigators introduce certain interactions in the predictor set, but they are not likely to pick up these sorts of differences.
 30.
The low D values for the Slovak Republic are likely due to the small number of sample respondents falling in Q4.
 31.
In 10 countries the sample size in the lowest category of Educational Attainment is around 50 or below. In 5 of those countries, the estimation of the corresponding regression coefficients is unstable.
 32.
In most countries the sample sizes for some of the categories are small and the corresponding estimates are unstable.
 33.
The author thanks the referee for this interpretation.
 34.
Estonia is the only country in which younger cohorts have lower odds of being in Q1. The relationship becomes insignificant in gender-specific analyses. South Korea is the only country in which younger cohorts have lower odds of being in Q1 in both gender-specific analyses.
 35.
For a discussion of the importance of conducting longitudinal policy analyses along with longitudinal data collection, see Braun et al. (2006).
Abbreviations
ALL: Adult Literacy and Lifeskills
BQ: background questionnaire
FT: full-time
IALS: International Adult Literacy Survey
IDB: international database
IEA: International Association for the Evaluation of Educational Achievement
IRT: item response theory
IV: instrumental variables
LMO: labor market outcomes
OECD: Organization for Economic Co-operation and Development
OLS: ordinary least squares
PIAAC: Programme for the International Assessment of Adult Competencies
PSTRE: problem solving in technology-rich environments
PV: plausible values
References
Alexander, K., Entwisle, D., & Olson, L. (2014). The long shadow: Family background, disadvantaged urban youth, and the transition to adulthood. New York: Russell Sage.
Autor, D. H. (2014). Skills, education, and the rise of earnings inequality among the “other 99 percent”. Science, 344(6186), 843–851. https://doi.org/10.1126/science.1251868.
Bernstein, J. (2016). Wages in the United States: Trends, explanations and solutions. In I. Kirsch & H. I. Braun (Eds.), The dynamics of opportunity in America: Evidence and perspectives. New York: Springer.
Bind, M.-A. C., & Rubin, D. B. (2017). Bridging observational studies and randomized experiments by embedding the former in the latter. Statistical Methods in Medical Research. https://doi.org/10.1177/0962280217740609.
Braun, H. I. (2016). The dynamics of opportunity in America: A working framework. In I. Kirsch & H. I. Braun (Eds.), The dynamics of opportunity in America: Evidence and perspectives. New York: Springer.
Braun, H. I., & von Davier, M. (2017). The use of test scores from large-scale assessment surveys: psychometric and statistical considerations. Large-scale Assessments in Education, 5, 17.
Braun, H. I., Wang, A., Jenkins, F., & Weinbaum, E. (2006). The Black-White achievement gap: Do state policies matter? Education Policy Analysis Archives. https://doi.org/10.14507/epaa.v14n8.2006.
Cappellari, L., Castelnovo, P., Checchi, D., & Leonardi, M. (2015). Skilled or educated? Educational reform, human capital and earnings. http://www.iza.org/conference_files/2015_Skill_Mismatch/cappellari_l1064.pdf. Accessed 5 Apr 2018.
Carnoy, M. (2015). International test score comparisons and education policy: A review of the critiques. Boulder: National Education Policy Center.
Chetty, R., Hendren, N., Kline, P., & Saez, E. (2014). Where is the land of opportunity? The geography of intergenerational mobility in the United States. The Quarterly Journal of Economics, 129(4), 1553–1623.
Chmielewski, A.K. (2015). Can we close social origin gaps in Literacy over the life course? Evidence from synthetic cohorts in IALS, ALL, and PIAAC. Presentation at the PIAAC International Conference, Haarlem, Netherlands (24 November). https://www.aanmelder.nl/piaac_conference_2015/wiki/160925/presentations%20and%20papers. Accessed 5 Apr 2018.
Chmielewski, A. K., & Dhuey, E. (2017). The analysis of international large-scale assessments to address causal questions in education policy. Commissioned paper. Washington, DC: National Academy of Education.
de Broucker, P., & Underwood, K. (1998). Intergenerational education mobility: An international comparison with a focus on postsecondary education. Education Quarterly Review, 5(2), 30.
Deming, D. J. (2017). The growing importance of social skills. The Quarterly Journal of Economics, 132(4), 1593–1640.
Ermisch, J., Jantti, M., & Smeeding, T. (2012). From parents to children: The intergenerational transmission of disadvantage. New York: Russell Sage Foundation.
Goodman, L., & Kruskal, W. (1954). Measures of association for crossclassifications. Journal of the American Statistical Association, 49(268), 732–764.
Gustafsson, J.-E. (2013). Causal inference in educational effectiveness research: A comparison of three methods to investigate the effects of homework on student achievement. School Effectiveness and School Improvement, 24(3), 275–295.
Hampf, F., Wiederhold, S., & Woessmann, L. (2017). Skills, earnings, and employment: exploring causality in the estimation of returns to skills. Large-scale Assessments in Education, 5, 12. https://doi.org/10.1186/s40536-017-0045-7.
Hanushek, E. A., Schwerdt, G., Wiederhold, S., & Woessmann, L. (2013). Returns to skills around the world: Evidence from PIAAC. (OECD Education Working Papers, No. 101). Paris: OECD Publishing.
Holzer, H.J., & Lerman, R.I. (2015). Cognitive Skills in the U.S. labor market: For whom do they matter? https://static1.squarespace.com/static/51bb74b8e4b0139570ddf020/t/54da7582e4b0cb4c49fc0d9d/1423603074704/Holzer_Lerman_PIAAC.pdf. Accessed 5 Apr 2018.
IEA. (2015). Help manual for the IDB Analyzer. Hamburg: IEA. www.iea.nl/data. Accessed 12 Nov 2017.
Kaplan, D. (2009). Causal inference in nonexperimental educational policy research. In G. Sykes, B. Schneider, & D. N. Plank (Eds.), Handbook on education policy research (pp. 139–153). New York: Taylor and Francis.
Kaplan, D., & McCarty, A. T. (2013). Data fusion with international large-scale assessments: A case study using the OECD PISA and TALIS studies. Large-scale Assessments in Education. http://largescaleassessmentsineducation.com/content/1/1/6. Accessed 5 Apr 2018.
Khatiwada, I., & Sum, A. (2016). The widening socioeconomic divergence in the U.S. labor market. In I. Kirsch & H. I. Braun (Eds.), The dynamics of opportunity in America: Evidence and perspectives (pp. 197–252). New York: Springer.
Kirsch, I. (2001). The International Adult Literacy Survey (IALS): Understanding what was measured. ETS Research Report RR-01-25. Princeton: Educational Testing Service.
Kirsch, I., Braun, H., Lennon, M.L., & Sands, A. (2016). Choosing our future: A story of opportunity in America. Princeton, NJ: Educational Testing Service. http://opportunityproject.ets.org/. Accessed 5 Apr 2018.
Kirsch, I., & Lennon, M. L. (2017). PIAAC: A new design for a new era. Large-scale Assessments in Education, 5(11), 1–22.
Levin, H. M. (2013). The utility and need for incorporating noncognitive skills into large-scale educational assessments. In M. von Davier, E. Gonzalez, I. Kirsch, & K. Yamamoto (Eds.), The role of international large-scale assessments: Perspectives from technology, economy and educational research (pp. 67–86). Springer Science + Business Media: Dordrecht, Netherlands.
Lin, D., Lutter, R., & Ruhm, C. J. (2016). Cognitive performance and labor market outcomes. (National Bureau of Economic Research Working Paper, No. 22470). Boston: NBER.
Massing, N., & Schneider, S. L. (2017). Degrees of competency: the relationship between educational qualifications and adult skills across countries. Large-scale Assessments in Education, 5(6), 1–34.
Moretti, E. (2013). The new geography of jobs. New York: Mariner Books.
Murnane, R. J., Willett, J. B., Duhaldeborde, Y., & Tyler, J. H. (2000). How important are the Cognitive Skills of teenagers in predicting subsequent earnings? Journal of Policy Analysis and Management, 19(4), 547–568.
OECD. (2013a). Survey of adult skills technical report (2nd ed.). Paris: OECD Publishing.
OECD. (2013b). OECD skills outlook 2013: First results from the survey of adult skills. Paris: OECD Publishing.
OECD. (2016). Skills matter: Further results from the survey of adult skills, OECD skills studies. Paris: OECD Publishing.
Park, H., & Kyei, P. (2011). Literacy gaps by Educational Attainment: A crossnational analysis. Social Forces, 89(3), 879–904.
Reardon, S. (2011). The widening academic achievement gap between the rich and the poor: New evidence and possible explanations. In G. J. Duncan & R. J. Murnane (Eds.), Whither opportunity. New York: Russell Sage Foundation.
Reardon, S., & Portilla, X. A. (2016). Recent trends in income, racial, and ethnic school readiness gaps at kindergarten entry. AERA Open, 2(3), 1–18.
Ritzen, J. (2013). International large-scale assessments as change agents. In M. von Davier, E. Gonzalez, I. Kirsch, & K. Yamamoto (Eds.), The role of international large-scale assessments: Perspectives from technology, economy and educational research. New York: Springer.
Rothstein, J. (2017). Inequality of educational opportunity? Schools as mediators of the intergenerational transmission of income. (Institute for Research on Labor and Employment. Working Paper). Berkeley: U.C. Berkeley.
Singer, J. D., & Braun, H. I. (2018). The (mis)uses of international education assessments, with five suggestions for improvement. Science, 360(6384), 38–40.
Singer, J. D., Braun, H. I., & Chudowsky, N. (2018). International education assessments: Cautions, conundrums, and common sense. Washington, DC: National Academy of Education.
Smeeding, T. (2016). Gates, gaps and intergenerational mobility: The importance of an even start. In I. Kirsch & H. I. Braun (Eds.), The dynamics of opportunity in America: Evidence and perspectives. New York: Springer.
Tjur, T. (2008). Coefficients of determination in logistic regression models—a new proposal: The coefficient of discrimination. The American Statistician, 63(4), 366–372.
von Davier, M., Sinharay, S., Oranje, A., & Beaton, A. (2007). The statistical procedures used in National Assessment of Educational Progress: recent developments and future directions. In C. R. Rao & S. Sinharay (Eds.), Handbook of Statistics (Vol. 26, pp. 1039–1055). New York: Elsevier.
Authors' contributions
HB conceived, designed and implemented the study. Data analyses were conducted by graduate assistants under the direct supervision of the author. The study manuscript is the sole work of the author. The author read and approved the final manuscript.
Acknowledgements
The author extends his appreciation to the OECD for its support through the T.J. Alexander Fellowship, as well as its hospitality at the initiation of this work. He also benefitted from the comments and advice from Irwin Kirsch, Richard Murnane, William Thorne, Matthias von Davier, Kentaro Yamamoto, the editor and an anonymous referee. Excellent research assistance was provided by Shiya Yi, Yiran Chen, and Maria Baez Cruz. Any errors or omissions are the responsibility of the author.
Competing interests
The author declares he has no competing interests.
Availability of data and materials
All data and materials are publicly available.
Consent for publication
Not applicable.
Ethics approval and consent to participate
Not applicable.
Funding
The work reported here was supported in part by a T. J. Alexander Fellowship from the Organization for Economic Co-operation and Development.
Publisher’s Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Rights and permissions
Open Access This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.
Keywords
 PIAAC
 Parental education
 Cognitive Skills
 Educational Attainment
 Annual income
 Logistic regression