
The testing effect

Author: Dr Simon Moss


After learning material, students often need to complete tests or exams. These tests not only assess what students have learned; they have also been shown to facilitate learning (see Carpenter, Pashler, & Vul, 2006; Karpicke & Roediger, 2007b; McDaniel, Roediger, & McDermott, 2007; Roediger & Karpicke, 2006a, 2006b). In other words, the tests themselves seem to activate retrieval processes that ultimately facilitate the learning and memory of study material.

Variations of the testing effect

Campbell and Mayer (2009) illustrated the testing effect within the format of lectures. In this study, participants attended a lecture in which 25 PowerPoint slides were presented. The material revolved around the effects of feedback on learning. After four of the slides, a multiple choice question was presented. To answer each question correctly, participants needed to integrate information from multiple slides. Each participant was given an electronic device through which they could submit their response anonymously. After the aggregated responses were displayed, the lecturer explained the correct answer.

Relative to participants who had not been exposed to the multiple choice questions, the students who answered these questions performed better on a subsequent exam. This benefit was observed when the exam comprised short-answer questions that demanded retention rather than transfer; performance on subsequent multiple choice questions did not improve significantly, highlighting that the testing effect is not universal (Campbell & Mayer, 2009).

Mechanisms that underpin the testing effect

According to Mayer (2001, 2005, 2008), three sets of cognitive processes underpin meaningful learning. First, learners need to orient their attention to relevant information, called selecting. Second, they need to arrange this information into a coherent structure, called organizing. Finally, they need to integrate this organized information with existing knowledge, called integrating.

Tests can facilitate each of these processes. For example, in anticipation of forthcoming questions, individuals need to orient their attention to information that might be germane to future tests, which facilitates selecting (Campbell & Mayer, 2009). Similarly, to answer the questions themselves, individuals often need to combine information from multiple topics, which fosters organizing and integration (Campbell & Mayer, 2009). Finally, when individuals receive feedback, they often need to adjust their assumptions, which can also facilitate organizing and integration (Campbell & Mayer, 2009).

Alternatively, the testing effect may be ascribed to an increase in the number of retrieval routes. For example, if people learn information in a specific room, cues that evoke memories of this room may also activate this information. If people later test themselves on the material in a different context, cues that evoke memories of this second setting will also activate this information. A broader range of cues, therefore, will prompt these memories (Bjork, 1975; for a distinct, but related, mechanism, see McDaniel & Masson, 1985).

Factors that amplify or diminish the testing effect

Retrieval versus recognition tests

Some tests, such as multiple choice examinations, demand recognition rather than recall. That is, participants merely need to recognize the correct answer among a range of alternatives. Other tests, such as examinations in which the answers entail paragraphs or essays, demand recall. In general, tests that demand recall amplify the testing effect (Butler & Roediger, 2007; Glover, 1989; Kang, McDermott, & Roediger, 2007).

These findings are consistent with the argument that difficult retrieval processes might enhance, or even underpin, the testing effect, a proposition called the theory of retrieval difficulty (Bjork & Bjork, 1992). Indeed, as this argument would predict, when the test is delayed rather than immediate, the testing effect is more pronounced (Jacoby, 1978; Modigliani, 1976; Pashler, Zarow, & Triplett, 2003).

According to the theory of retrieval difficulty (Bjork & Bjork, 1992), the capacity to retrieve information depends on two factors. The first factor, called storage strength, refers to the enduring accessibility of the information, partly related to the frequency of the words or concepts. The second factor, called retrieval strength, refers to the momentary accessibility of the information.

Interestingly, according to the theory, if retrieval strength is high, the storage strength of this information will not increase appreciably. If retrieval strength is low, the storage strength of this information will increase to a larger extent. For example, as Agarwal, Karpicke, Kang, Roediger III, and McDermott (2008) argue, if the information is available when individuals complete the initial test, as in an open book test, the retrieval strength is high. That is, the information is very accessible. As a consequence, the storage strength of this information will not improve appreciably. Hence, this information might not be as accessible after a delay.

Delayed versus immediate feedback

After individuals complete a test, feedback can be immediate or delayed. Delayed feedback seems to magnify the benefits of feedback (Bangert-Drowns, Kulik, Kulik, & Morgan, 1991). For example, as Schmidt, Young, Swinnen, and Shapiro (1989) showed in a motor learning task, delayed rather than immediate feedback was more likely to facilitate later retention. Hence, the testing effect is likely to be more pronounced if feedback is delayed until after the individual has completed an entire exam rather than presented after each answer.

Open versus closed book tests

Several studies have examined whether tests administered primarily to facilitate learning should be open or closed book. In contrast to open book tests, closed book tests do not permit students to consult their notes or textbooks during the examination.

Several arguments imply that open book tests might be superior. First, in most circumstances, open book tests are designed to encourage more advanced cognitive processes, such as problem solving and reasoning, rather than rote memorization (Feller, 1994; Jacobs & Chase, 1992; see also Eilertsen & Valdermo, 2000). These advanced cognitive processes might improve the capacity of individuals to remember the material and apply it in different contexts.

Second, open book tests might provoke less stress and anxiety while individuals prepare for the exam (Theophilides & Dionysiou, 1996; Theophilides & Koutselini, 2000). The stress and anxiety provoked by closed book tests might reduce the capacity of individuals to relate the material to broader knowledge structures. That is, stress and anxiety inhibit access to many knowledge structures (Kuhl, 2000).

Third, open book tests might provoke fewer errors of commission than closed book tests. That is, during open book tests, individuals are less likely to entertain incorrect facts about the topic, because they can consult key sources during the examination. Hence, during open book tests, individuals are less likely to integrate false information into their knowledge structures (Butler, Marsh, Goode, & Roediger, 2006; Roediger & Marsh, 2005).

In contrast, some arguments imply that closed book examinations might enhance the testing effect relative to open book examinations. Closed book tests might demand more difficult retrieval processes, and when retrieval processes are more difficult and intricate, the testing effect might be amplified (e.g., Bjork, 1999; Karpicke & Roediger, 2007a). Consistent with this proposition, recall tests, which demand more difficult retrieval processes than recognition tests, are more likely to amplify the testing effect (Butler & Roediger, 2007; Glover, 1989; Kang, McDermott, & Roediger, 2007).

In addition, when individuals complete open book tests, they often receive more immediate feedback about their performance: while completing the examination, they can ascertain whether some of their initial assumptions were correct. In contrast, when individuals complete closed book tests, feedback is almost invariably delayed. Delayed feedback has been shown to amplify the testing effect (Bangert-Drowns, Kulik, Kulik, & Morgan, 1991).

Agarwal, Karpicke, Kang, Roediger III, and McDermott (2008) conducted a study to compare the effects of open and closed book tests. In their first study, participants read six passages from a textbook, each about 1000 words in length. In the first session, six different study conditions were arranged. During this study session, some participants merely studied the material. Other participants completed a closed or open book test. Some of the participants who completed the closed book tests also evaluated their own performance afterwards, with the passage available. Some of the participants who completed the open book tests completed this test while studying the passage. In the second session, one week later, participants completed the final test, which was a closed book test.

Overall, a testing effect was observed. In addition, when closed book tests were administered, the testing effect was more pronounced if individuals evaluated their performance, although this benefit could be ascribed to additional exposure to the material. On the final test, administered one week after the study session, closed and open book tests generated comparable levels of performance.

Agarwal, Karpicke, Kang, Roediger III, and McDermott (2008) generated a similar pattern of findings in their second study. These testing effects were observed even when participants who did not complete tests during the study session were instead exposed to the material several times. Interestingly, participants predicted, incorrectly, that studying the material twice would be more effective than studying it once and then testing themselves before the final examination.

High stakes during the initial tests

When the stakes of performing well on the initial test are high, evoking pressure, the testing effect actually dissipates. To illustrate, in one study conducted by Hinze and Rapp (2014), participants read various passages of text about biology. To increase the stakes, some participants were told they would receive an extra $5 if, later, both they and another person performed well on a subsequent quiz of this material. In addition, they were informed the other person had already performed well, an instruction that raised their pressure to excel. Next, over the following 5 minutes, some but not all participants completed various quizzes to test their learning of this material. Finally, 7 days later, participants returned to complete a final test that assessed understanding of this material. Before completing this test, all participants were informed they would receive the $5 regardless of performance.

Performance on the quizzes did not depend on whether the stakes were elevated. That is, the stakes did not affect initial retrieval. Participants who completed the quizzes, on average, performed better on the final test, consistent with the testing effect. However, this benefit of the quizzes was not observed when the stakes had been raised.

Arguably, the testing effect can be ascribed to the possibility that, during the initial quizzes, participants tend to reclassify and elaborate the material. These cognitive operations demand effort and executive control. The elevated stakes and performance pressure may compromise the motivation of individuals to engage in these processes and, therefore, could diminish the testing effect.

Emotional events after the test

The testing effect is often ascribed to processes that coincide with the retrieval of this material during the test. However, as Finn and Roediger (2011) showed, even processes after the retrieval of this material could amplify the testing effect. Specifically, if emotional images follow the first test, the testing effect is amplified.

To illustrate, in one study, English-speaking participants first learned various Swahili words. In particular, they received a series of Swahili words, each presented alongside its English translation, such as lulu-pearl. Then, participants completed the initial test: some of the Swahili words appeared alone, and participants were asked to retrieve the English translation. Finally, they completed the final assessment, which was similar to the initial test but comprised more words.

During the initial test, after each correct answer, no picture, a blank picture, or a distressing picture was presented. If the picture was distressing, participants were especially likely to remember the English translation of that word in the final assessment. That is, these upsetting pictures magnified the testing effect. These distressing pictures were effective even if presented 2 seconds after the words appeared, but only if the participants had retrieved the correct answer.

According to Finn and Roediger (2011), these emotional images activate limbic regions such as the amygdala, which in turn tends to activate the hippocampus. The hippocampus facilitates the capacity of individuals to remember these pairs of words. Nevertheless, emotional pictures during the learning of these words can be disruptive, because attention may be diverted from the material that needs to be memorized.

Related techniques

The clicker technique

The clicker technique is a teaching method that has been shown to facilitate learning. During a workshop, the instructors present multiple choice questions, assessing knowledge or opinions. The students or participants use a hand-held device, called a clicker, to indicate their response. The distribution of responses is then presented. This technique has been shown to expedite the rate of learning (Anderson, Healy, Kole, & Bourne, 2013).

The clicker technique offers two key benefits. First, instructors do not need to devote time to material that participants already know. Second, the clicker technique generates the same benefits as other tests. The combination of studying vital material and retesting has been shown to be more beneficial than either facet alone (Anderson, Healy, Kole, & Bourne, 2013). Anderson, Healy, Kole, and Bourne (2013) showed the clicker technique promotes learning that generalizes to other domains and compresses the time needed to teach material.

Practical implications

After individuals learn material, they should receive an optional test. The test could include some open-book or multiple choice questions, to facilitate initial confidence, as well as closed-book questions, involving short answers, which might facilitate future retention. Feedback should be presented after the test is completed.


Agarwal, P. K., Karpicke, J. D., Kang, S. H. K., Roediger III, H. L., & McDermott, K. B. (2008). Examining the testing effect with open- and closed-book tests. Applied Cognitive Psychology, 22, 861-876.

Anderson, L. S., Healy, A. F., Kole, J. A., & Bourne, L. E. (2013). The clicker technique: Cultivating efficient teaching and successful learning. Applied Cognitive Psychology, 27, 222-234. doi:10.1002/acp.2899

Baillie, C., & Toohey, S. (1997). The power test: Its impact on student learning in a materials science course for engineering. Assessment & Evaluation in Higher Education, 22, 33-49.

Bjork, R. A. (1994). Memory and metamemory considerations in the training of human beings. In J. Metcalfe, & A. Shimamura (Eds.), Metacognition: Knowing about knowing (pp. 185-205). Cambridge, MA: MIT Press.

Bjork, R. A. (1999). Assessing our own competence: Heuristics and illusions. In D. Gopher, & A. Koriat (Eds.), Attention and performance XVII. Cognitive regulation of performance: Interaction of theory and application (pp. 435-459). Cambridge, MA: MIT Press.

Bjork, R. A., & Bjork, E. L. (1992). A new theory of disuse and an old theory of stimulus fluctuation. In A. Healy, S. Kosslyn, & R. Shiffrin (Eds.), From learning processes to cognitive processes: Essays in honor of William K. Estes (Vol. 2, pp. 35-67). Hillsdale, NJ: Erlbaum.

Butler, A. C., Marsh, E. J., Goode, M. K., & Roediger, H. L. (2006). When additional multiple-choice lures aid versus hinder later memory. Applied Cognitive Psychology, 20, 941-956.

Butler, A. C., & Roediger, H. L. (2007). Testing improves long-term retention in a simulated classroom setting. European Journal of Cognitive Psychology, 19, 514-527.

Campbell, J., & Mayer, R. E. (2009). Questioning as an instructional method: Does it affect learning from lectures? Applied Cognitive Psychology, 23, 747-759.

Carpenter, S. K., Pashler, H., & Vul, E. (2006). What types of learning are enhanced by a cued recall test? Psychonomic Bulletin & Review, 13, 826-830.

Chan, J. C. K., McDermott, K. B., & Roediger, H. L. (2006). Retrieval-induced facilitation: Initially nontested material can benefit from prior testing of related material. Journal of Experimental Psychology: General, 135, 553-571.

Cnop, I., & Grandsard, F. (1994). An open-book exam for non-mathematics majors. International Journal of Mathematical Education in Science and Technology, 25, 125-130.

Dunlosky, J., & Nelson, T. O. (1992). Importance of the kind of cue for judgments of learning (JOL) and the delayed-JOL effect. Memory & Cognition, 20, 374-380.

Eilertsen, T. V., & Valdermo, O. (2000). Open-book assessment: A contribution to improved learning? Studies in Educational Evaluation, 26, 91-103.

Feller, M. (1994). Open-book testing and education for the future. Studies in Educational Evaluation, 20, 235-238.

Finn, B., & Roediger, H. L. (2011). Enhancing retention through reconsolidation: Negative emotional arousal following retrieval enhances later recall. Psychological Science, 22, 781-786. doi:10.1177/0956797611407932

Flesch, R. (1948). A new readability yardstick. Journal of Applied Psychology, 32, 221-233.

Glover, J. A. (1989). The testing phenomenon: Not gone, but nearly forgotten. Journal of Educational Psychology, 81, 392-399.

Hinze, S. R., & Rapp, D. N. (2014). Retrieval (sometimes) enhances learning: Performance pressure reduces the benefits of retrieval practice. Applied Cognitive Psychology, 28, 597-606. doi:10.1002/acp.3032

Hogan, R. M., & Kintsch, W. (1971). Differential effects of study and test trials on long-term recognition and recall. Journal of Verbal Learning & Verbal Behavior, 10, 562-567.

Ioannidou, M. K. (1997). Testing and life-long learning: Open-book and closed-book examination in a university course. Studies in Educational Evaluation, 23, 131-139.

Jacoby, L. L. (1978). On interpreting the effects of repetition: Solving a problem versus remembering a solution. Journal of Verbal Learning & Verbal Behavior, 17, 649-667.

Kang, S. H. K., McDermott, K. B., & Roediger, H. L. (2007). Test format and corrective feedback modify the effect of testing on long-term retention. European Journal of Cognitive Psychology, 19, 528-558.

Karpicke, J. D., & Roediger, H. L. (2007a). Expanding retrieval promotes short-term retention, but equally spaced retrieval enhances long-term retention. Journal of Experimental Psychology: Learning, Memory, and Cognition, 33, 704-719.

Karpicke, J. D., & Roediger, H. L. (2007b). Repeated retrieval during learning is the key to long-term retention. Journal of Memory and Language, 57, 151-162.

Koriat, A. (1997). Monitoring one's own knowledge during study: A cue-utilization approach to judgments of learning. Journal of Experimental Psychology: General, 126, 349-370.

Koriat, A., Bjork, R. A., Sheffer, L., & Bar, S. K. (2004). Predicting one's own forgetting: The role of experience-based and theory-based processes. Journal of Experimental Psychology: General, 133, 643-656.

Koriat, A., Sheffer, L., & Ma'ayan, H. (2002). Comparing objective and subjective learning curves: Judgments of learning exhibit increased underconfidence with practice. Journal of Experimental Psychology: General, 131, 147-162.

Kuhl, J. (2000). A functional-design approach to motivation and volition: The dynamics of personality systems interactions. In M. Boekaerts, P. R. Pintrich, & M. Zeidner (Eds.), Self-regulation: Directions and challenges for future research (pp. 111-169). New York: Academic Press.

Mayer, R. E. (1975). Forward transfer of different reading strategies due to test-like events in mathematics text. Journal of Educational Psychology, 67, 165-169.

Mayer, R. E. (2001). Multimedia learning. New York: Cambridge University Press.

Mayer, R. E. (2005). The cognitive theory of multimedia learning. In R. E. Mayer (Ed.), The Cambridge handbook of multimedia learning (pp. 31-48). New York: Cambridge University Press.

Mayer, R. E. (2008). Learning and instruction (2nd ed.). Upper Saddle River, NJ: Merrill Prentice Hall Pearson.

McDaniel, M. A., Anderson, J. L., Derbish, M. H., & Morrisette, N. (2007). Testing the testing effect in the classroom. European Journal of Cognitive Psychology, 19, 494-513.

McDaniel, M. A., & Masson, M. E. J. (1985). Altering memory representations through retrieval. Journal of Experimental Psychology: Learning, Memory, and Cognition, 11, 371-385.

McDaniel, M. A., Roediger, H. L., & McDermott, K. B. (2007). Generalizing test-enhanced learning from the laboratory to the classroom. Psychonomic Bulletin & Review, 14, 200-206.

Modigliani, V. (1976). Effects on a later recall by delaying initial recall. Journal of Experimental Psychology: Human Learning & Memory, 2, 609-622.

Pashler, H., Zarow, G., & Triplett, B. (2003). Is temporal spacing of tests helpful even when it inflates error rates? Journal of Experimental Psychology: Learning, Memory, and Cognition, 29, 1051-1057.

Pauker, J. D. (1974). Effect of open book examinations on test performance in an undergraduate child psychology course. Teaching of Psychology, 1, 71-73.

Roediger, H. L., & Karpicke, J. D. (2006a). The power of testing memory: Basic research and implications for educational practice. Perspectives on Psychological Science, 1, 181-210.

Roediger, H. L., & Karpicke, J. D. (2006b). Test-enhanced learning: Taking memory tests improves long-term retention. Psychological Science, 17, 249-255.

Roediger, H. L., & Marsh, E. J. (2005). The positive and negative consequences of multiple-choice testing. Journal of Experimental Psychology: Learning, Memory, and Cognition, 31, 1155-1159.

Schmidt, R. A., Young, D. E., Swinnen, S., & Shapiro, D. C. (1989). Summary knowledge of results for skill acquisition: Support for the guidance hypothesis. Journal of Experimental Psychology: Learning, Memory, and Cognition, 15, 352-359.

Theophilides, C., & Dionysiou, O. (1996). The major functions of the open-book examination at the university level: A factor analytic study. Studies in Educational Evaluation, 22, 157-170.

Theophilides, C., & Koutselini, M. (2000). Study behavior in the closed-book and open-book examination: A comparative analysis. Educational Research and Evaluation, 6, 379-393.

Wheeler, M. A., Ewers, M., & Buonanno, J. F. (2003). Different rates of forgetting following study versus test trials. Memory, 11, 571-580.

Whitten, W. B., & Bjork, R. A. (1977). Learning from tests: Effects of spacing. Journal of Verbal Learning & Verbal Behavior, 16, 465-478.


Last Update: 6/28/2016