Surface Fluency and Internal Imperfection in Natural Language Explanations for CSAT “Language and Media” Word Items

Oh Jieun, Park Inkyu

Seoul National University; Seoul National University

Korean Language Education Research, Vol. 61, No. 1, pp. 309-346 (2026)

DOI: 10.20880/kler.2026.61.1.309

Abstract

This study analyzes the characteristics of the natural language explanations that large language models generate when solving word-related items on the Korean College Scholastic Ability Test (CSAT). Word-related items from the 2022 to 2026 CSAT were input into the Gemini 3 Pro model, and the natural language explanations it generated were examined. The analysis indicates that these explanations exhibit two key characteristics: surface fluency and internal imperfection. Surface fluency manifests in three ways: output that relies on the components of the question, use of salient cues in the question, and use of pre-trained grammatical terminology. Internal imperfection also manifests in three ways: generation of multiple solution pathways, sequential propagation of errors during the output process, and limitations in pre-trained knowledge. Because surface fluency is visible to users whereas the risk of internal error is not, grammar education that fosters users' critical awareness is urgently required.

Keywords

natural language explanation, surface fluency, internal imperfection, large language model, generative artificial intelligence

