Over the last two decades, a growing number of education systems (hereafter countries) have come to regard participation in large-scale cross-national educational assessments as an increasingly relevant part of evaluating the quality of their school systems. Examples of these studies are the Programme for International Student Assessment (PISA), conducted by the Organisation for Economic Co-operation and Development (OECD), and the Trends in International Mathematics and Science Study (TIMSS), conducted by the International Association for the Evaluation of Educational Achievement (IEA). Germany is one of the countries that regularly take part in these comparative studies.
The results of these studies allow countries not only to assess how well their own students are performing, on average, but also to assess that performance against the average performance of students in the other participating countries. These comparisons play an important role in government-led educational decision-making, which in turn forms the basis for reforms. In order to draw valid conclusions about students’ abilities during this process, test-takers need to be motivated to expend full effort throughout the entire testing session. However, such tests have no positive or negative consequences for the test-takers, no matter how successfully or unsuccessfully they perform.
Such tests are often referred to as low-stakes tests. Accordingly, it is uncertain whether the students actually expend full effort; it could be that the students’ results do not depict their true level of ability because of low motivation. Therefore, the results of low-stakes assessments may not constitute a valid measure of students’ abilities, and a valid interpretation of the test results is threatened. Our aim in this study is to take a closer look at the role that test-taking motivation plays in student performance in low-stakes assessments. Before describing our research questions in detail, we define test-taking motivation and provide an overview of previous research.
Literature
Test-taking motivation
Test-taking motivation is a specific type of achievement motivation that can be understood as an active process by which goal-oriented activity is initiated and maintained (Schunk et al. [2008]). It is assumed that students have domain-specific achievement motivation (e.g., motivation to engage in mathematics) and situation-specific achievement motivation (e.g., motivation to work hard in a specific school-based assessment). Domain-specific motivational constructs such as self-concept in mathematics cover a relatively stable personal trait, while situation-specific motivational constructs cover a state that can differ (e.g., depend on how the student feels “on the day”). Test-taking motivation is assigned to the latter motivational constructs, because taking a test is a specific situation for students. Baumert and Demmrich ([2001]) define this type of motivation as “the willingness to engage in working on test items and to invest effort and persistence in this undertaking” (p. 441).
In high-stakes tests, test-takers typically show high motivation to perform well because of the positive or negative consequences of their performance on that test (Barry and Finney [2009]). Research exploring test-taking motivation relative to low-stakes assessments presents a less clear picture. Most of these studies show a connection between test-taking motivation and performance on the one hand, and between test-taking motivation and test stakes on the other (Cole et al. [2008]; Eklöf [2007], [2008]; Thelk et al. [2009]; Wise and DeMars [2005]; Wolf and Smith [1995]). However, some studies have found no such relationships (Baumert and Demmrich [2001]; O’Neil et al. [1995], [2005]). In the following subsections, we describe studies that have detected associations between test-taking motivation, performance, and test stakes, and those that have not. These studies include some of those just listed.
Studies showing associations
The investigation by Eklöf ([2007], [2008]) focused on the test-taking motivation of Swedish Grade 8 students in TIMSS 2003, deemed a low-stakes assessment, and examined both domain-specific and situation-specific aspects of motivation. In this study, the following motivational scales explained 31% of the variance in the students’ average mathematics achievement scores: mathematics self-concept and value of mathematics as domain-specific factors of motivation as well as test-taking motivation as a situation-specific aspect of motivation. Of these variables, mathematics self-concept was the most important predictor. However, after controlling for the domain-specific factors of motivation, Eklöf no longer found a significant relationship for the situation-specific aspect of motivation. Eklöf assumed that test-taking motivation had no effect because most of these Swedish Grade 8 students, never having received grades or taken external tests before, did not perceive the test as a low-stakes one.
Eklöf and Nyroos’s ([2013]) analyses of data pertaining to performance of Grade 9 students on the Swedish national test of science achievement in 2009 supported the findings from the 2003 TIMSS data: a significant relationship between performance in science and (a) reported effort (r = 0.25), (b) perceived importance of the test (r = 0.20), and (c) test anxiety (r = −0.10). However, the authors could not consider the domain-specific aspects of motivation in their analyses because data on this matter were not collected during the assessment.
Cole et al. ([2008]) investigated the relationship of the following situation-specific aspects of motivation to the mathematics test performance of undergraduate students: interest, effort, and perceived usefulness and importance of the test. The results of the path analyses revealed that usefulness and importance of the test were strong predictors of effort (e.g., R² = 0.26 for mathematics), which in turn was an important predictor of test performance.
Lau et al. ([2009]) tried to vary test-taking effort in a low-stakes assessment by changing the behavior of the test proctors (invigilators). The proctors were trained to point out the importance and usefulness of the test to the students and to encourage them to work hard. The proctors were also asked to create a productive working environment. The research team investigated the students’ effort in testing sessions before (traditional sessions) and after implementation of the proctor-strategies (strategic sessions). Student effort was higher and less variable in the strategic sessions than in the traditional sessions (effect sizes between d = 0.35 and d = 0.57). The effect of increased effort on performance could not be analyzed because the tests before and after the implementation were slightly different in content, making performance on them noncomparable.
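As a reading aid for effect sizes such as those reported above, the following minimal sketch computes a standardized mean difference, assuming that d denotes Cohen's d based on the pooled standard deviation (the source does not state which formula was used). The function name `cohens_d` and the effort ratings are hypothetical and invented purely for illustration; they are not data from Lau et al.

```python
from math import sqrt
from statistics import mean, variance


def cohens_d(group_a, group_b):
    """Standardized mean difference (group_a minus group_b) using the pooled SD."""
    n_a, n_b = len(group_a), len(group_b)
    # statistics.variance returns the sample variance (n - 1 denominator).
    pooled_var = (
        (n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)
    ) / (n_a + n_b - 2)
    return (mean(group_a) - mean(group_b)) / sqrt(pooled_var)


# Hypothetical self-reported effort ratings (assumption, not study data).
strategic = [4, 5, 3, 4, 4]
traditional = [3, 4, 2, 5, 1]

d = cohens_d(strategic, traditional)
print(f"d = {d:.2f}")
```

Under this convention, a value of d = 0.5 would mean that the two group means lie half a pooled standard deviation apart.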
Other studies that have found a strong relationship between test-taking motivation and performance include those by Thelk et al. ([2009]) and Wise and DeMars ([2005]). The latter two authors showed from their synthesis of 12 empirical studies that motivated students outperformed their unmotivated classmates by more than one-half of a standard deviation. However, Wise and DeMars cautioned that the relationship between test performance and test-taking motivation could have been distorted by academic ability as a mediator variable.
Studies showing no associations
One of the studies that found no relationship between test-taking motivation, performance, and test stakes is that by O’Neil et al. ([2005]). They analyzed the effect of financial incentives on test-taking motivation and performance in mathematics, and divided their sample of test-takers into two groups. The Group 1 students were told they would receive a financial incentive of $10 per correct item. Also, in order to increase the credibility of the study, test-takers immediately received $20 if they answered two simple items at the beginning of the test correctly. Group 2 received no incentives for their participation. Group 1 reported significantly higher levels of test-taking effort and self-efficacy than Group 2 did. However, despite the high reward and the higher level of reported effort for the incentive group, there was no significant difference in performance between the treatment and the control group. The authors assumed that this outcome was due to the lack of correlation between effort and performance for the whole sample.
Similar results were found in a PISA 2000 pilot study in Germany (Baumert and Demmrich [2001]). The study examined whether increasing the test’s stakes led to a higher level of test-taking motivation and a higher level of performance. Using an experimental design, the researchers manipulated the test conditions across four different groups of test-takers. The incentive for Group 1 was informational feedback, for Group 2 grades, and for Group 3 a financial reward. The fourth group served as a reference group. Its members received the usual instructions accompanying PISA assessments and were also reminded of the social importance of tests in international comparative studies. In all groups, the invested effort was high, and the personal value of a successful test and the perceived usefulness of the test were the same. Furthermore, the authors found no treatment effects on test performance. While they considered many situation-specific aspects of motivation in their analyses, no domain-specific aspects of motivation were included.
The importance of investigating school-track-specific differences for tracked school systems
Before describing research on specific differences in motivation across types of school, which is one focus of our study, we consider it useful to explain Germany’s tracked school system. After completing elementary school (grade 4 or grade 6, depending on the federal state), German students are assigned to different school tracks, primarily according to their scholastic performance. The academic track is the Gymnasium. The intermediate track has several school types, such as the Realschule, and the lower track is the Hauptschule. Of these school types, the Gymnasium (academic track) is the only one that exists in all German federal states.
One of the rationales for tracking in Germany is that school lessons can be better optimized according to student requirements if students are in homogeneous learning groups. For instance, because students in homogeneous learning groups presumably require similar learning time, groups with high achievers can cover more learning topics as well as topics with higher cognitive demands (Köller and Baumert [2001], [2012]). In short, the supposition is that students attain higher learning outcomes in homogeneous learning groups than in heterogeneous ones.
Significant differences in mean achievement occur across the schools in the three different tracks, while mean achievement in schools of the same track is generally similar (Trautwein et al. [2006]). One investigation, for example, showed students in grade 10 in academic-track schools outperforming students in intermediate-track schools and in lower-track schools in a mathematics test even after the researchers had controlled for mathematics achievement in grade 7 at individual and school levels (Köller and Baumert [2001]). The differences between the schools in each track were only minor. Köller and Baumert suggested that one reason for the superior performance of the academic-track schools is their instruction culture, seen partly as a consequence of the teacher training (Köller and Baumert [2001], [2012]). Differentiation in student performance in any one school track or school will still occur, of course, commensurate with socioeconomic, psychosocial, motivational and cognitive variables. However, because achievement covaries very strongly with socioeconomic status, social segregation is an undesirable ancillary effect of tracking. In essence, the different tracks “act” as developmental environments differentially influencing student performance (Baumert et al. [2003]).
Reference to one of the studies already discussed in this paper, that by Baumert and Demmrich ([2001]), is useful at this point. In addition to looking at the influence of incentives on test-taking motivation, the authors also compared the effort that students in the lower-track Hauptschule and the academic-track Gymnasium put into their work on the particular test. The intended effort turned out to be the same for both school types, but the invested effort was lower for the Hauptschule students than for the Gymnasium students. The students in the academic-track schools reported a more positive emotional state and fewer task-irrelevant cognitions than the students in the lower-track schools. For the entire sample, self-reported effort and worry were the most powerful predictors of test performance. However, the authors did not investigate the interplay between the motivational variables and their effects on performance for the different school tracks.
Research on differences in domain-specific and trait-like motivational constructs across the school types has shown mixed results. Two investigations provide useful examples. Artelt et al. ([2001]) found no differences across school tracks in students’ mathematics self-concept or interest in the subject. The absence of self-concept differences suggests the “big fish little pond” effect may have been at play here (Marsh [1987]; Trautwein et al. [2006]). According to this effect, students construct their self-concept by comparing themselves with their schoolmates, not by comparing themselves with all students of their age. Thus, students with a similar level of performance will report lower self-concepts if they are in a high-achieving environment (such as the academic track) than in a low-achieving environment. Consequently, despite students in academic-track schools knowing that their performance is higher than the performance of students in lower-track schools, they do not show a correspondingly higher self-concept (Artelt et al. [2001]).
In contrast, Baumert et al. ([2006]) found specific differences in students’ self-efficacy beliefs across the school tracks. The authors used national data from the German extension sample of PISA 2000 to explore the influence of school structure on the emergence of differentiated learning environments. They also found evidence of the big fish little pond effect in that the self-efficacy beliefs of students with a similar level of achievement decreased as the track level of the school increased. Baumert and colleagues also conjectured that the larger proportion of class repeaters in the lower tracks than in the higher tracks might lead to lower self-efficacy beliefs among the students in the lower-track schools. Although this effect did not reach significance, the results nevertheless suggest that the concentration of underachievers in lower tracks can affect students’ effort.
Study objectives
The current state of research indicates that there is a relationship between test performance and test-taking motivation in low-stakes assessments. However, most of the aforementioned studies do not consider situation-specific and domain-specific aspects of motivation together. Moreover, the lack of research on school-track-specific differences in test-taking motivation and the mixed results of the cited studies on these differences point to the need for more investigation of test-taking motivation across school tracks. We therefore examined the relationship between different motivational aspects and students’ performance in general and across school tracks in particular. Our initial research questions were the following:
1a) To what extent do domain-specific and situation-specific aspects of motivation predict students’ performance in a low-stakes mathematics test?
1b) Are there school-track-specific differences in the relationship between performance in mathematics and domain-specific and situation-specific aspects of motivation?
In many studies, test-taking motivation is mainly operationalized through questions about students’ invested effort, which covers the main element of the test-taking motivation definition. Accordingly, in a second step, we examined whether invested effort was influenced by other motivational aspects and again considered different school tracks in our research questions:
2a) To what extent do situation-specific aspects of motivation predict the invested effort of test-takers in a low-stakes test?
2b) Are there school-track-specific differences in the proportion of invested effort?
In summary, our research questions addressed two separate matters. The first focused on the relationship between performance and domain-specific as well as situation-specific aspects of motivation. The second focused on invested effort and its relationship with situation-specific aspects of motivation. Both sets of research questions also required us to consider differences in student performance across Germany’s school tracks.