RETHINKING ASSESSMENT IN SCIENCE EDUCATION IN THE AGE OF GENERATIVE AI
Keywords:
Generative AI, science education, assessment validity, academic integrity, AI literacy, Nigerian education

Abstract
Generative AI has disrupted educational assessment by threatening the inferential basis on which validity depends. When submitted work may represent either authentic reasoning or algorithmic output, assessment cannot support valid inferences about competence. Detection and prohibition approaches fail technically, conceptually, and behaviourally: students are driven by performance expectations, not policy. This paper proposes a four-principle framework for restoring assessment validity in AI-rich environments, grounded in validity theory (Kane, 2006, 2013; Messick, 1989) and contextualised within Nigerian science education. The framework integrates established traditions of authentic assessment, epistemic documentation, and equitable design, recontextualised under generative AI conditions in a resource-constrained setting where existing frameworks assume technological access that most Nigerian schools do not possess. The framework makes reasoning visible and evaluable through task design and proportional documentation, removing substitution incentives rather than attempting identification. Its validation rests on theoretical coherence and contextual appropriateness, pending empirical testing. Critically, the framework measures reasoning articulation (students' capacity to explain their thinking) as a proxy for reasoning execution (actual cognitive work), a shift that requires explicit acknowledgement. For Nigerian science education, it provides concrete operationalisation of assessment principles aligned with curriculum policy while directly addressing the AI validity crisis.
References
Abbas, M. Y. (2025). Distinguishing artificial intelligence-generated text from human text: Evaluating GPT-4o's self-detection capacity. Computers and Education, 45(2), 103-118.
Adesokan, A., Salman, O., & Raheem, O. (2025). Generative AI use and metacognitive awareness among Nigerian secondary students: A mixed-methods study. Journal of Educational Technology and Society, 28(1), 45-62.
Arifin, S., Chen, L., Thompson, M., & Williams, R. (2025). Inquiry-based learning and critical thinking development: A meta-analytic review. Review of Educational Research, 95(1), 1-38.
Ateeq, M., Al-Balushi, S., Al-Harthi, M., & Hassan, M. (2024). Educational impact on student achievement in higher education: A PLS-SEM approach. Education and Information Technologies, 29(3), 2847-2870.
Banta, T. W., & Palomba, C. A. (2014). Assessment essentials: Planning, implementing, and improving student learning (2nd ed.). Jossey-Bass.
Bennett, R. E. (2011). Formative assessment: A critical review. Assessment in Education: Principles, Policy & Practice, 18(1), 5-25.
Cotton, D. R., Cotton, P. A., & Shipway, J. R. (2023). Chatting with ChatGPT: How may I assist you? Journal of Academic Language & Learning, 17(1), 180-193.
Darling-Hammond, L., Ancess, J., & Falk, B. (2013). Authentic assessment in action: Studies of schools and students at work. Teachers College Press.
Erol, Y., Coban, C., & Yasar, S. (2025). Reliability and bias in AI-generated text detection systems. Computational Linguistics Review, 51(2), 234-256.
European Network for Academic Integrity. (2024). Guidance on institutional approaches to academic integrity in the age of artificial intelligence. ENAI Publications.
Eze, P. O., & Obi, T. U. (2019). Assessment in Nigerian secondary science education: Current practices and policy implications. African Journal of Educational Assessment, 7(2), 112-130.
Federal Ministry of Education. (2015). National curriculum for secondary education in Nigeria. FME Publications.
Garzon, J., Baldiris, S., & Fabregat, R. (2025). Generative AI and virtual reality: Emerging applications in educational assessment. Computers & Education, 216, 104897.
Gill, B. P., Zhang, M., & Chen, W. (2023). ChatGPT and academic integrity: An empirical investigation of detection evasion. International Review of Research in Open and Distributed Learning, 24(4), 289-310.
Hargreaves, A. (2005). Educational change takes ages: Life, career and generational factors in teachers' emotional responses to educational change. Teaching and Teacher Education, 21(8), 967-983.
Herman, J. L., & Lara-Steidel, A. P. (2025). Authentic assessment for deeper learning in science education. Science Education Review, 34(1), 45-68.
Holmes, K., Bailik, S., & Fadel, C. (2019). Artificial intelligence in education: Promises and implications. Center for Curriculum Redesign.
Holmes, W., Persson, J., & Chounta, I. A. (2022). Artificial intelligence and tutoring systems: Advances, opportunities, and challenges. Journal of Educational Computing Research, 60(5), 1104-1126.
Holmes, W., Tuomi, I., & Kamarainen, A. (2022). State-of-the-art and practice in AI-enhanced learning environments. European Commission, JRC Publications Repository. https://doi.org/10.2760/137379
Ibitoye, M., Okafor, C., Nnamdi, O., & Adebayo, J. (2025). Information communication technology integration in Nigerian secondary schools: A systematic review of capacity and constraints. Computers and Education, 217, 104923.
Kane, M. T. (2006). Validation. In R. L. Brennan (Ed.), Educational measurement (4th ed., pp. 17-64). Praeger.
Kane, M. T. (2013). Validating the interpretations and uses of test scores. Journal of Educational Measurement, 50(1), 1-73.
Kind, P. M., & Osborne, J. F. (2017). Styles of science reasoning: A student typology informed by epistemology. International Journal of Science Education, 39(6), 731-748.
Lincoln, Y. S., & Guba, E. G. (1985). Naturalistic inquiry. Sage Publications.
Luckin, R. (2023). Machine learning and human intelligence: The future of education in the age of AI. UCL Institute of Education Press.
Macgilchrist, F. (2021). Postdigital heterogeneity: Digital living in the midst of smart devices, algorithms and data. Postdigital Science and Education, 3(2), 441-461. https://doi.org/10.1007/s42438-020-00166-9
Messick, S. (1989). Validity. In R. L. Linn (Ed.), Educational measurement (3rd ed., pp. 13-103). Macmillan.
National Research Council. (2012). A framework for K-12 science education: Practices, crosscutting concepts, and core ideas. National Academies Press.
Nwali, C. U., & Udumukwu, O. E. (2025). Academic staff perspectives on AI authorship and institutional policy gaps in Nigerian universities. Higher Education Research & Development, 44(2), 234-252.
Ojimba, D. P. (2023). National examination systems and science curriculum implementation in Nigeria: Alignment and challenges. Journal of Curriculum Studies in Nigeria, 15(1), 89-107.
Okafor, C. E., Nwoke, B. I., & Eze, S. O. (2018). Inquiry-based science teaching and student achievement in Nigerian secondary schools. African Educational Research Journal, 6(3), 127-142.
Osborne, J. F., & Dillon, J. (2008). Science education in Europe: Critical reflections. King's College London, Educational Resource Information Center.
Pearce, J. M., & Chiavaroli, N. (2020). Practical considerations for scaling oral assessment. Assessment & Evaluation in Higher Education, 45(2), 156-174.
Pellegrino, J. W., DiBello, L. V., & Goldman, S. R. (2016). A framework for understanding and improving science assessment. Measurement: Interdisciplinary Research and Perspectives, 14(2), 35-56.
Pérez-Pérez, M., López-García, C., & Rodríguez-Martínez, E. (2026). Transparency requirements in AI governance: A PRISMA 2020 scoping review of disclosure mandates and compliance outcomes. Science, Technology & Human Values, 51(1), 112-142.
Peterson, J. (2025). The detection signature problem: How probabilistic generation defeats static AI detection systems. Artificial Intelligence and Society, 40(1), 189-207.
Salaudeen, Y., Pitan, O. S., & Aminu, M. A. (2025). Measurement and validity in assessment: Conceptual distinctions and implications for AI-assisted evaluation systems. International Journal of Assessment and Evaluation, 33(2), 78-96.
Scarfe, E., Watchirn, E., & Gibbs, G. (2024). Can we detect where text is written by language models? A comparative analysis of detection tools' performance. Journal of Academic Integrity, 7(2), 45-62.
Schön, D. A. (1983). The reflective practitioner: How professionals think in action. Basic Books.
Selwyn, N. (2019). Automation, education, and the surveillance society. Oxford University Press.
Selwyn, N. (2022). The digital education myth: Trajectories of technology in education. Polity Press.
Shi, Z., Chen, H., Wang, Y., & Liu, Q. (2025). Large language models and authorship attribution: Understanding conflation of plausibility and authenticity. Natural Language Processing Review, 28(3), 234-260.
Stribling, C. M., Clifton, J. W., & Maclean, R. (2024). Large language models in graduate-level science assessment: Performance, competency mapping, and implications for credentialing. Computers and Education, 210, 104816.
Supovitz, J. A. (2009). Can high stakes testing leverage educational improvement? Prospects from the last decade of testing and accountability reform. Journal of Educational Change, 10(2-3), 211-227.
Touretzky, D. S., Martin, C., & Seehorn, D. (2019). Envisioning AI for K-12: Reflections on the national artificial intelligence research resource task group recommendations. ArXiv Preprint. https://arxiv.org/abs/2010.08892
UNESCO. (2023). Global status report on artificial intelligence in education. UNESCO Publications.
Universal Design for Learning. (2023). UDL guidelines 2.2: Supporting all learners in all contexts. CAST.
University College Cork. (2025). Institutional guidance on academic integrity policies in an era of generative AI. UCC Publications.
Usman, I., Gada, A., & Musa, B. (2025). Tiered AI involvement frameworks and student learning outcomes: Evidence from Nigerian higher education institutions. Technology, Pedagogy and Education, 34(1), 56-74.
Wang, J., Xia, M., & Ye, Z. (2025). The reliability problem in AI-generated text detection: Performance variation across task types. ACM Computing Surveys, 58(3), 1-42.
West, S., Shebab, E., & Williamson, M. (2023). ChatGPT and laboratory reports: Evaluating authenticity and coherence in student submissions. Assessment & Evaluation in Higher Education, 48(7), 1058-1076.
Wiggins, G. P. (1998). Educative assessment: Designing assessments to improve student performance. Jossey-Bass.
World Bank. (2023). Digital development overview: Sub-Saharan Africa connectivity and infrastructure. World Bank Publications.
Ya'u, Y. Z., & Mohammed, H. I. (2025). Generative AI use patterns and academic performance among Nigerian university students. Computers and Education, 214, 104879.
Yakubu, A., David, O., & Abubakar, M. (2025). Generative AI use intentions among Nigerian university students: A structural equation modelling approach. Computers and Education, 215, 104902.
License
Copyright (c) 2026 This is an Open Access article distributed under the terms of the Creative Commons Attribution 4.0 International License, which permits unrestricted use, distribution, adaptation, and reproduction in any medium, provided that the original work is properly cited.

This work is licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.
Authors are permitted to post their work online in institutional/disciplinary repositories or on their own websites. Pre-print versions posted online should include a citation and link to the final published version in Journal of Librarianship and Scholarly Communication as soon as the issue is available; post-print versions (including the final publisher's PDF) should include a citation and link to the journal's website.