Journal of Astronautic Metrology and Measurement ›› 2025, Vol. 45 ›› Issue (2): 1-30. doi: 10.12060/j.issn.1000-7202.2025.02.01


A Review of Large Language Model Evaluation Methods

SONG Jialei1,2, ZUO Xingquan1,2, ZHANG Xiujian3,4, HUANG Hai1,2

  1. School of Computer Science, Beijing University of Posts and Telecommunications, Beijing 100876, China;
    2. Key Laboratory of Trustworthy Distributed Computing and Services, Ministry of Education, Beijing 100876, China;
    3. Beijing Aerospace Institute for Metrology and Measurement Technology, Beijing 100076, China;
    4. Key Laboratory of Artificial Intelligence Measurement and Standards for State Market Regulation, Beijing 100076, China
  • Online: 2025-04-15  Published: 2025-04-29

Abstract: With the rapid development of large language models, their broad application prospects have attracted significant attention from both academia and industry. Before a large language model is put into practice, its performance and potential risks need to be evaluated comprehensively. In recent years, researchers have studied the evaluation of large language models from multiple perspectives. This paper systematically reviews the evaluation metrics, methods, and benchmarks for large language models in terms of performance, robustness, and alignment, and analyzes the advantages and disadvantages of the various evaluation metrics and methods. Finally, future research directions and challenges in large language model evaluation are discussed.
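For readers unfamiliar with the benchmark-style metrics the review surveys, the sketch below illustrates one of the simplest: exact-match accuracy on a question-answering benchmark. The example items, the normalize() rule, and the toy predictions are illustrative assumptions, not material from the paper.

```python
# Illustrative sketch (not from the paper): exact-match accuracy, a common
# performance metric for question-answering benchmarks. The benchmark items
# below are hypothetical placeholders.

def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not count as errors."""
    return " ".join(text.lower().split())

def exact_match_accuracy(predictions: list[str], references: list[str]) -> float:
    """Fraction of predictions that exactly match their reference answer after normalization."""
    assert len(predictions) == len(references)
    hits = sum(normalize(p) == normalize(r) for p, r in zip(predictions, references))
    return hits / len(references)

if __name__ == "__main__":
    # Hypothetical model outputs and gold answers for a tiny three-item benchmark.
    preds = ["Paris", "4", "the Pacific Ocean"]
    golds = ["paris", "4", "Pacific Ocean"]
    print(f"exact-match accuracy: {exact_match_accuracy(preds, golds):.2f}")  # 0.67
```

In practice, benchmark suites combine many such metrics (accuracy, calibration, robustness under perturbation, alignment-oriented judgments), which is the landscape the review organizes.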

Key words: Large language models, Evaluation methods, Evaluation metrics, Evaluation benchmarks
