Journal of Astronautic Metrology and Measurement ›› 2025, Vol. 45 ›› Issue (2): 1-30. doi: 10.12060/j.issn.1000-7202.2025.02.01


A Review of Large Language Model Evaluation Methods

SONG Jialei1,2, ZUO Xingquan1,2, ZHANG Xiujian3,4, HUANG Hai1,2

  1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China;
    2. Key Laboratory of Trustworthy Distributed Computing and Services, Ministry of Education, Beijing 100876, China;
    3. Beijing Aerospace Institute for Metrology and Measurement Technology, Beijing 100076, China;
    4. Key Laboratory of Artificial Intelligence Measurement and Standards for State Market Regulation, Beijing 100076, China
  • Online: 2025-04-15  Published: 2025-04-29

Abstract: With the rapid development of large language models, their broad application prospects have attracted significant attention from both academia and industry. Before a large language model is put into practice, its performance and potential risks need to be evaluated comprehensively. In recent years, researchers have studied the evaluation of large language models from multiple perspectives. This paper systematically reviews the evaluation metrics, methods, and benchmarks for large language models in terms of performance, robustness, and alignment, and analyzes the advantages and disadvantages of the various evaluation metrics and methods. Finally, future research directions and challenges in large language model evaluation are discussed.
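For readers unfamiliar with the benchmark-style metrics the review surveys, the sketch below illustrates one of the simplest: exact-match accuracy on a question-answering benchmark. The example items, the normalize() rule, and the toy predictions are illustrative assumptions, not material from the paper.

```python
# Illustrative sketch (not from the paper): exact-match accuracy, a common
# performance metric for question-answering benchmarks. The benchmark items
# below are hypothetical placeholders.

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not count as errors."""
    return " ".join(text.lower().split())

def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that exactly match their reference answer after normalization."""
    assert len(predictions) == len(references)
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

if __name__ == "__main__":
    # Hypothetical model outputs and gold answers for a tiny three-item benchmark.
    preds = ["Paris", "4", "the Pacific Ocean"]
    golds = ["paris", "4", "Pacific Ocean"]
    print(f"exact-match accuracy: {exact_match_accuracy(preds, golds):.2f}")  # 0.67
```

In practice, benchmark suites combine many such metrics (accuracy, calibration, robustness under perturbation, alignment-oriented judgments), which is the landscape the review organizes.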

Key words: Large language models, Evaluation methods, Evaluation metrics, Evaluation benchmarks
