Alderson, C., Clapham, C., & Wall, D. (1995). Language test construction and evaluation. New York: Cambridge University Press.
Anastasi, A. (1986). Evolving concepts of test validation. Annual Review of Psychology, 37, 1-15.
Angoff, W. H. (1988). Validity: An evolving concept. In H. Wainer & H. Braun (Eds.), Test validity (pp. 19-32). Hillsdale, NJ: Erlbaum.
Bachman, L. F. (1990). Fundamental considerations in language testing. Oxford: Oxford University Press.
Brown, H. D. (2004). Language assessment: Principles and classroom practices. London: Longman.
Brown, J. D. (2005). Testing in language programs: A comprehensive guide to English language assessment. New York: McGraw-Hill.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Dorans, N. J., & Holland, P. W. (1993). DIF detection and description: Mantel-Haenszel and standardization. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 137-166). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
French, A. A., & Miller, T. R. (1996). Logistic regression and its use in detecting differential item functioning in polytomous items. Journal of Educational Measurement, 33 (3), 315-332.
Geranpayeh, A., & Kunnan, A. J. (2007). Differential item functioning in terms of age in the Certificate in Advanced English examination. Language Assessment Quarterly, 4 (2), 190-222.
Hatch, E., & Farhady, H. (1982). Research design and statistics for applied linguistics. Rowley, Massachusetts: Newbury House.
Hatch, E., & Lazaraton, A. (1997). The research manual: Design and statistics for applied linguistics. Boston, MA: Heinle & Heinle Publishers.
Jodoin, M. G., & Gierl, M. J. (1999). Evaluating type I error and power using an effect size measure with the logistic regression procedure for DIF detection. Applied Measurement in Education, 14, 329-349.
Kim, M. (2001). Detecting DIF across the different language groups in a speaking test. Language Testing, 18, 89-114.
Lai, J. S., Teresi, J., & Gershon, R. (2005). Procedures for the analysis of differential item functioning (DIF) for small sample sizes. Evaluation & the Health Professions, 28 (3), 283-294.
McNamara, T., & Roever, C. (2006). Language testing: The social dimension. New York: Blackwell Publishing.
Monahan, P. O., McHorney, C. A., Stump, T. E., & Perkins, A. J. (2007). Odds ratio, delta, ETS classification, and standardization measures of DIF magnitude for binary logistic regression. Journal of Educational and Behavioral Statistics, 32 (1), 92-109.
Mousavi, S. A. (2009). An encyclopedic dictionary of language testing. Tehran: Rahnama Press.
Van den Noortgate, W., & De Boeck, P. (2005). Assessing and examining differential item functioning using logistic mixed models. Journal of Educational and Behavioral Statistics, 30 (4), 443-464.
O'Neill, K. A., & McPeek, W. M. (1993). Item and test characteristics that are associated with differential item functioning. In P. W. Holland & H. Wainer (Eds.), Differential item functioning (pp. 255-267). Hillsdale, NJ: Lawrence Erlbaum Associates, Inc.
Pae, T. (2004). Gender effect on reading comprehension with Korean EFL learners. System, 32, 265-281.
Park, T. (2006). Detecting DIF across different language and gender groups in the MELAB essay test using the logistic regression method. Spaan Fellow Working Papers in Second or Foreign Language Assessment, 4, 81-96.
Perrone, M. (2006). Differential item functioning and item bias: Critical considerations in test fairness. Columbia University Working Papers in TESOL & Applied Linguistics, 6 (2), 1-3.
Rezaee, A., & Salehi, M. (2008). The construct validity of a language proficiency test: A multitrait multimethod approach. TELL, 2 (8), 93-110.
Roever, C. (2001). Web-based language testing. Language Learning & Technology, 5 (2), 84-94.
Roever, C. (2005). “That’s not fair!” Fairness, bias, and differential item functioning in language testing. Retrieved November 18, 2006, from the University of Hawai’i System Web site: http://www2.hawaii.edu/~roever
Salehi, M., & Rezaee, A. (2009). On the factor structure of the grammar section of the University of Tehran English Proficiency Test (the UTEPT). Indian Journal of Applied Linguistics, 35 (2), 169-187.
Scherbaum, C. A., & Goldstein, H. W. (2008). Examining the relationship between race-based differential item functioning and item difficulty. Educational and Psychological Measurement, 68, 537-553.
Shohamy, E. (2000). Fairness in language testing. In A. J. Kunnan (Ed.), Fairness and validation in language assessment: Selected papers from the 19th Language Testing Research Colloquium, Orlando, Florida (pp. 15-19). Cambridge, UK: Cambridge University Press.
Swaminathan, H., & Rogers, H. J. (1990). Detecting differential item functioning using logistic regression procedures. Journal of Educational Measurement, 27 (4), 361-370.
Swanson, D. B., Clauser, B. E., Case, S. M., Nungester, R. J., & Featherman, C. (2002). Analysis of differential item functioning (DIF) using hierarchical logistic regression models. Journal of Educational and Behavioral Statistics, 27 (1), 53-75.
Takala, S., & Kaftandjieva, F. (2000). Test fairness: A DIF analysis of an L2 vocabulary test. Language Testing, 17, 323-340.
Teresi, J. (2004). Differential item functioning and health assessment. Columbia University Stroud Center and faculty of Medicine. New York State Psychiatric Institute, Research Division, Hebrew Home for the Aged at Riverdale. 1-24.
Zumbo, B. D. (1999). A handbook on the theory and methods of differential item functioning (DIF): Logistic regression modeling as a unitary framework for binary and Likert-type (ordinal) item scores. Ottawa, ON: Directorate of Human Resources Research and Evaluation, Department of National Defense.