
Internet versus paper administration

Author: Dr Simon Moss

Overview

Researchers and practitioners enjoy many key benefits when they administer questionnaires over the internet. The computer can administer, score, and interpret the responses to questionnaires automatically and efficiently (McBride, 1998; Cook, Heath, Thompson, & Thompson, 2001). In addition, individuals can be tested in many locations simultaneously and economically (Michels, Meade, Lautenschlager, & Gentry, 2003).

Nevertheless, most questionnaires were originally validated against responses collected on paper. Whether this validation generalizes to questionnaires administered over the internet remains a topic of contention.

Indeed, if internet and paper questionnaires are not comparable to one another, many complications could arise. First, organizations often offer both versions to, for example, job applicants, because minority, disadvantaged, or less educated individuals are often unable to access the internet (Sharf, 2000; Harris, 2003). Furthermore, internet administration is not always applicable, because the environment is not controlled and privacy cannot always be guaranteed (Ferrando & Lorenzo-Seva, 2005). If the two formats are not comparable, some applicants might thus be unfairly advantaged.

Second, if these formats are not comparable, other key properties and functions of the questionnaire, such as norms and implications, might not generalize from paper to internet.

Because of these issues, many studies have examined the equivalence of internet and paper versions of questionnaires in the domains of reading comprehension (Pomplun, Frey, & Becker, 2002), employee motivation (Yost & Homer, 1998), emotions (Fouladi, McCarthy, & Moller, 2002), decision making (Gati & Saka, 2001; Rosenfeld, Doherty, Vicino, Kantor, & Greaves, 1989), ability (Potosky & Bobko, 2004), self-esteem (Vispoel, Boo, & Bleiler, 2001), and personality (Meade, Michels, & Lautenschlager, 2007).

Social desirability

A variety of studies have examined whether personality tests, and other inventories, are more or less susceptible to social desirability biases when completed on the internet rather than on paper (e.g., Dwight & Fiegelson, 2000; Lautenschlager & Flaherty, 1990; Martin & Nagao, 1989; Potosky & Bobko, 1997; Richman, Kiesler, Weisband, & Drasgow, 1999). Provided individuals are at least moderately comfortable with computers, social desirability biases (that is, the extent to which participants distort their responses to portray themselves more favorably) are similar in internet and paper formats. Any differences are usually minimal.

Personality tests

In general, the differences between internet and paper versions of personality inventories are modest. First, research has shown that the mean level of desirable traits, such as conscientiousness, agreeableness, and emotional stability, is marginally lower when job applicants or incumbents complete the internet version (Ployhart, Weekley, Holtz, & Kemp, 2003), but these disparities are small.

Second, the variance seems to be higher when participants complete the questionnaire on the internet (Ployhart, Weekley, Holtz, & Kemp, 2003; Salgado & Moscoso, 2003). This property could be desirable, enabling researchers to discriminate among individuals more readily. Alternatively, it could reflect elevated levels of random error.

Nevertheless, contrary to the proposition that random error might increase when participants complete questionnaires on the internet, research has shown that indices of reliability, such as internal consistency, are either more encouraging (Ployhart, Weekley, Holtz, & Kemp, 2003) or equally encouraging (Bartram & Brown, 2004; Salgado & Moscoso, 2003) in the internet variants relative to the paper versions. Likewise, the factor structure and correlations among items do not seem to depend on the format in which the questionnaires are completed (Bartram & Brown, 2004; Salgado & Moscoso, 2003).
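To make the notion of internal consistency concrete, the sketch below shows how Cronbach's alpha, the most common index of internal consistency, is computed from a matrix of item responses. This is a minimal illustration in Python using numpy; the function name and the example scores are hypothetical rather than drawn from the studies cited above.

```python
import numpy as np

def cronbach_alpha(items):
    """Cronbach's alpha for an (n_respondents, k_items) score matrix."""
    items = np.asarray(items, dtype=float)
    k = items.shape[1]
    item_variances = items.var(axis=0, ddof=1)      # variance of each item
    total_variance = items.sum(axis=1).var(ddof=1)  # variance of the scale totals
    return (k / (k - 1)) * (1 - item_variances.sum() / total_variance)

# Hypothetical example: five respondents completing a four-item scale.
scores = [[4, 5, 4, 4],
          [2, 2, 3, 2],
          [5, 4, 5, 5],
          [3, 3, 2, 3],
          [4, 4, 4, 5]]
print(round(cronbach_alpha(scores), 3))
```

Comparing alpha computed separately from internet and paper samples is the simplest of the reliability checks these studies report.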

Perhaps the most comprehensive examination of whether responses differ between online and paper versions was conducted by Meade, Michels, and Lautenschlager (2007). These authors examined various confirmatory models to assess whether responses to the OPQ, a personality measure that gauges 11 traits, differ between the internet and paper formats.

This study showed that many, but not all, measures of personality were equivalent across the formats. All the statistical properties of responses to questions on conscientiousness, for example, were the same in the internet and paper versions. Nevertheless, the mean and variance of some other traits did differ between the formats.

Interestingly, more sophisticated tests, such as measures of invariance, uncovered discrepancies between the two formats that conventional statistical procedures overlooked. These discrepancies imply the precise meaning and implications of a measure might differ between the formats (Meade, Michels, & Lautenschlager, 2007).

Responses also differed between individuals who were granted a choice of format and individuals who were not. Furthermore, some differences between internet and paper versions emerged only when participants were not granted any choice over which format to use; this effect, however, might apply only when participants feel that a choice could have been offered (Meade, Michels, & Lautenschlager, 2007).

Workplace characteristics

Some studies have examined whether the properties of surveys that gauge workplace characteristics depend on whether the questionnaire was completed on the internet or on paper. Stanton (1998), for example, found that perceptions of workplace fairness did not differ between these two formats. That is, Stanton (1998) demonstrated a high degree of measurement invariance: the extent to which the means, variances, and covariances across measures are identical in the internet and paper versions (Horn & McArdle, 1992; see Vandenberg & Lance, 2000, for an overview of multiple group confirmatory factor analysis, which is often used to establish measurement invariance).
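To clarify what such invariance tests entail, one standard formalization (following the multiple group confirmatory factor analysis tradition reviewed by Vandenberg & Lance, 2000) models each observed item response as a linear function of a latent trait, with parameters estimated separately in each group:

```latex
x_{ij}^{(g)} = \tau_j^{(g)} + \lambda_j^{(g)} \xi_i^{(g)} + \varepsilon_{ij}^{(g)},
\qquad g \in \{\text{paper}, \text{internet}\}
```

Here x is the response of person i to item j in group g, tau is the item intercept, lambda the factor loading, xi the latent trait, and epsilon the residual. Metric invariance constrains the loadings to be equal across groups, scalar invariance additionally constrains the intercepts, and strict invariance additionally constrains the residual variances; each successive constraint is evaluated by comparing the fit of nested models.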

Other researchers, using item response theory, have also uncovered evidence of measurement invariance. The relative responses across items, as gauged by item response theory, do not seem to differ between online and paper versions, as shown in surveys of employee opinions (Young, Daum, Robie, & Macey, 2000, as cited in Meade, Michels, & Lautenschlager, 2007) and satisfaction with colleagues or supervisors (Donovan, Drasgow, & Probst, 2000).
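For dichotomous items, the logic of these item response theory comparisons can be illustrated with the two-parameter logistic model; the cited studies used related models suited to their own response formats:

```latex
P(X_{ij} = 1 \mid \theta_i) = \frac{1}{1 + e^{-a_j(\theta_i - b_j)}}
```

Here theta is the respondent's standing on the latent trait, a the discrimination of item j, and b its difficulty. Equivalence across formats amounts to the hypothesis that a and b are the same whether an item is administered online or on paper; format-specific estimates that diverge would signal differential item functioning.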

Amazon Mechanical Turk

Buhrmester, Kwang, and Gosling (2011) outline the benefits of Amazon Mechanical Turk, a potential source of participants on the internet. The people who use this site are either requesters, who post jobs, or workers, who complete these jobs. More than 100,000 people from over 100 nations use the site. Requesters pay workers once they are satisfied that the job has been completed; to pay workers, or potential participants, requesters deposit money into an account using a credit card. Payments can then be awarded automatically or manually. Amazon charges a 10% commission.

The amount of payment affects the response rate. For surveys that demand less than 5 minutes to complete, 10 cents is sufficient to generate a response rate of about 25%. For surveys that demand 10 minutes, 50 cents will generate a response rate of about 32%. For long surveys, around 30 minutes, 50 cents will generate a response rate of only about 17%. Lower response rates tend to impede recruitment but do not seem to diminish the reliability of responses; indeed, reliability does not seem to differ between MTurk workers and traditional samples. Most workers, however, complete the tasks because of enjoyment or other intrinsic motivations (Buhrmester et al., 2011). Relative to other internet sources, participants on this site are, on average, older and more diverse.
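As a worked example of these figures, the following sketch (in Python; the function name and the scenario are illustrative) estimates the postings and budget a study would require, treating the quoted response rates as the fraction of posted assignments that are completed, which is a simplifying assumption, and applying the 10% commission mentioned above:

```python
import math

def mturk_plan(completes_needed, response_rate, pay_per_hit, commission=0.10):
    """Rough recruitment and budget estimate for an MTurk study.

    The response rates and the 10% commission follow the figures quoted
    above (Buhrmester et al., 2011); only completed jobs are paid.
    """
    postings = math.ceil(completes_needed / response_rate)
    worker_pay = completes_needed * pay_per_hit
    total_cost = round(worker_pay * (1 + commission), 2)  # add Amazon's 10% cut
    return postings, total_cost

# Hypothetical scenario: 200 completes of a 10-minute survey at 50 cents
# each, assuming the 32% response rate cited above.
postings, cost = mturk_plan(200, 0.32, 0.50)
print(postings, cost)  # 625 110.0 ($100 to workers plus $10 commission)
```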

Limitations of past research

Many studies have shown that the properties of data do not change dramatically when questionnaires are completed on the internet rather than on paper. Nevertheless, the few studies that demonstrate similarities across the formats might not generalize to all contexts, settings, and procedures. To establish whether findings generalize to other contexts, according to Lievens and Harris (2003), a theoretical framework needs to be constructed to predict which factors are likely to impinge on the level of measurement invariance (Lievens, Van Dam, & Anderson, 2002).

To illustrate, as proposed by Meade, Michels, and Lautenschlager (2007), if limitations in measurement invariance arise when individuals are randomly assigned to internet or paper conditions, researchers can conclude that properties of the format itself affect the characteristics of the data. In contrast, if limitations in measurement invariance arise only when participants are granted the opportunity to choose which format they will utilize, researchers can conclude that the individual differences that underpin preferences for internet or paper affect the characteristics of the data.

This approach was indeed applied by Meade, Michels, and Lautenschlager (2007). However, whether test characteristics or individual differences are responsible for discrepancies between the formats remains uncertain; the various traits generated conflicting implications.

Many other models could be considered. Anxiety, for example, could elevate social desirability biases, because individuals often comply with social norms when threatened.

References

Bartram, D., & Brown, A. (2004). Information exchange article: Online testing: Mode of administration and the stability of OPQ 32i scores. International Journal of Selection and Assessment, 12, 278-284.

Booth-Kewley, S., Edwards, J. E., & Rosenfeld, P. (1992). Impression management, social desirability, and computer administration of attitude questionnaires: Does the computer make a difference? Journal of Applied Psychology, 77, 562-566.

Buhrmester, M. D., Kwang, T., & Gosling, S. D. (2011). Amazon's Mechanical Turk: A new source of inexpensive, yet high-quality data? Perspectives on Psychological Science, 6, 3-5. doi:10.1177/1745691610393980

Chapman, D. S., & Webster, J. (2003). The use of technologies in the recruiting, screening, and selection processes for job candidates. International Journal of Selection and Assessment, 11, 113-120.

Cook, C., Heath, F., Thompson, R. L., & Thompson, B. (2001). Score reliability in Web- or Internet-based surveys: Unnumbered graphic rating scales versus Likert-type scales. Educational and Psychological Measurement, 61, 697-706.

Donovan, M. A., Drasgow, F., & Probst, T. M. (2000). Does computerizing paper-and-pencil job attitude scales make a difference? New IRT analyses offer insight. Journal of Applied Psychology, 85, 305-313.

Dwight, S. A., & Fiegelson, M. E. (2000). A quantitative review of the effect of computerized testing on the measurement of social desirability. Educational and Psychological Measurement, 60, 340-360.

Epstein, J., Klinkenberg, W. D., Wiley, D., & McKinley, L. (2001). Insuring sample equivalence across Internet and paper-and-pencil assessments. Computers in Human Behavior, 17, 339-346.

Ferrando, P. J., & Lorenzo-Seva, U. (2005). IRT-related factor analytic procedures for testing the equivalence of paper-and-pencil and internet-administered questionnaires. Psychological Methods, 10, 193-220.

Fouladi, R. T., McCarthy, C. J., & Moller, N. P. (2002). Paper-and-pencil or online? Evaluating mode effects on measures of emotional functioning and attachment. Assessment, 9, 204-215.

Gati, I., & Saka, N. (2001). Internet-based versus paper-and-pencil assessment: Measuring career decision-making difficulties. Journal of Career Assessment, 9, 379-416.

Hacker, K. L., & Steiner, R. (2002). The digital divide for Hispanic Americans. Howard Journal of Communications, 13, 267-283.

Harris, M. M. (2001). Trends and issues in I-O Psychology: A glimpse into the crystal ball. The Industrial-Organizational Psychologist, 38, 70-73.

Harris, M. M. (2003). Speeding down the information highway: Internet recruitment and testing. The Industrial-Organizational Psychologist, 41, 103-106.

Horn, J. L., & McArdle, J. J. (1992). A practical and theoretical guide to measurement invariance in aging research. Experimental Aging Research, 18, 117-144.

Lautenschlager, G. J., & Flaherty, V. L. (1990). Computer administration of questions: More desirable or more social desirability? Journal of Applied Psychology, 75, 310-314.

Lievens, F., & Harris, M. M. (2003). Research on Internet recruitment and testing: Current status and future directions. In C. L. Cooper & I. T. Robertson (Eds.), International review of industrial and organizational psychology (Vol. 18). New York: Wiley.

Lievens, F., Van Dam, K., & Anderson, N. (2002). Recent trends and challenges in personnel selection. Personnel Review, 31, 580-601.

Loges, W. E., & Jung, J. Y. (2001). Exploring the digital divide: Internet connectedness and age. Communication Research, 28, 536-562.

Martin, C. L., & Nagao, D. H. (1989). Some effects of computerized interviewing on job applicant responses. Journal of Applied Psychology, 74, 72-80.

Meade, A. W., Michels, L. C., & Lautenschlager, G. J. (2007). Are internet and paper-and-pencil personality tests truly comparable? An experimental design measurement invariance study. Organizational Research Methods, 10, 322-345.

Pettit, F. A. (2002). A comparison of World-Wide Web and paper-and-pencil personality questionnaires. Behavior Research Methods, Instruments & Computers, 34, 50-54.

Ployhart, R. E., Weekley, J. A., Holtz, B. C., & Kemp, C. (2003). Web-based and paper-and-pencil testing of applicants in a proctored setting: Are personality, biodata, and situational judgment tests comparable? Personnel Psychology, 56, 733-752.

Pomplun, M., Frey, S., & Becker, D. F. (2002). The score equivalence of paper-and-pencil and computerized versions of a speeded test of reading comprehension. Educational and Psychological Measurement, 62, 337-354.

Potosky, D., & Bobko, P. (1997). Computer versus paper-and-pencil mode and response distortion in noncognitive selection tests. Journal of Applied Psychology, 82, 293-299.

Potosky, D., & Bobko, P. (2004). Selection testing via the Internet: Practical considerations and exploratory empirical findings. Personnel Psychology, 57, 1003-1034.

Richman, W. L., Kiesler, S., Weisband, S., & Drasgow, F. (1999). A meta-analytic study of social desirability distortion in computer-administered questionnaires, traditional questionnaires, and interviews. Journal of Applied Psychology, 84, 754-775.

Rosenfeld, P., Doherty, L. M., Vicino, S. M., Kantor, J., & Greaves, J. (1989). Attitude assessment in organizations: Testing three microcomputer-based survey systems. Journal of General Psychology, 116, 145-154.

Salgado, J. F., & Moscoso, S. (2003). Internet-based personality testing: Equivalence of measures and assessees' perceptions and reactions. International Journal of Selection and Assessment, 11, 194-205.

Sharf, J. (2000). As if "g-loaded" adverse impact isn't bad enough, Internet recruiters can expect to be accused of "e-loaded" impact. The Industrial-Organizational Psychologist, 38, 156.

Smith, D. B., Hanges, P. J., & Dickson, M. W. (2001). Personnel selection and the five-factor model: Reexamining the effects of applicant's frame of reference. Journal of Applied Psychology, 86, 304-315.

Stanton, J. M. (1998). An empirical assessment of data collection using the Internet. Personnel Psychology, 51, 709-726.

Vandenberg, R. J., & Lance, C. E. (2000). A review and synthesis of the measurement invariance literature: Suggestions, practices, and recommendations for organizational research. Organizational Research Methods, 3, 4-69.

Vispoel, W. P., Boo, J., & Bleiler, T. (2001). Computerized and paper-and-pencil versions of the Rosenberg self-esteem scale: A comparison of psychometric features and respondent preferences. Educational and Psychological Measurement, 61, 461-474.

Webster, J., & Compeau, D. (1996). Computer-assisted versus paper-and-pencil administration of questionnaires. Behavior Research Methods, Instruments & Computers, 28, 567-576.








Last Update: 6/22/2016