Journal of Astronautic Metrology and Measurement (宇航计测技术), 2024, Vol. 44, Issue (6): 1-13. doi: 10.12060/j.issn.1000-7202.2024.06.01

• Special Column: Artificial Intelligence Metrology and Testing •

Evaluation for Large Language Models in Entity Search

YOU Xindong, ZHANG Xu, LYU Xueqiang, DONG Zhian, MA Denghao

  1. Beijing Key Laboratory of Internet Culture & Digital Dissemination Research,
    Beijing Information Science & Technology University, Beijing 100192, China
  • Publication date: 2024-12-15   Release date: 2025-01-21
  • Author profile: YOU Xindong (1979-), female, Professor, Ph.D.; research interests: artificial intelligence, big data, and knowledge graphs.
  • Funding:
    National Natural Science Foundation of China (621710431); Beijing Natural Science Foundation (4232025); Qinghai Province Special Program for Innovation Platform Construction (2022-ZJ-T02); General Science and Technology Project of the Beijing Municipal Education Commission Scientific Research Program (KM202311232003, KM202311232002)

Abstract: Entity search, an important task in information retrieval, aims to accurately find the entities relevant to a user query in a large collection of documents. It plays a key role in improving user experience and in supporting cross-domain applications, big data analysis, and intelligent services. Large language models (LLMs) have shown outstanding performance across many fields, and their strong semantic understanding and generation capabilities can effectively improve the accuracy of entity search. However, how well LLMs actually perform on entity search tasks has not yet been thoroughly evaluated. This paper therefore proposes an evaluation framework for LLMs on entity search and constructs and publicly releases a cross-domain Chinese entity search test set, which both completes the evaluation framework and provides a useful reference for further optimizing and applying these models. Nine open-source LLMs are tested under this framework to show their practical performance in entity search. Comparative experiments evaluate and analyze LLM performance from several perspectives, providing empirical evidence for applying LLMs to entity search and offering new directions for future research.

Key words: Entity search, Large language model, Evaluation
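The abstract does not state which metrics the framework uses. The following is a minimal Python sketch of how an LLM's entity-search answers could be scored against a test set, assuming that each test item pairs a query with a set of gold entities and that the model's answer has already been parsed into a ranked list of entity names. It computes per-query precision, recall, F1, and reciprocal rank and macro-averages them. The names `QueryItem`, `score_item`, and `evaluate`, the normalization rule, the choice of metrics, and the sample queries are all hypothetical illustrations, not the authors' implementation.

```python
from dataclasses import dataclass


@dataclass
class QueryItem:
    """Hypothetical test-set entry: a query and its gold (relevant) entities."""
    query: str
    gold: set[str]


def normalize(name: str) -> str:
    # Assumed normalization: trim whitespace and lowercase (a no-op for Chinese characters).
    return name.strip().lower()


def score_item(predicted: list[str], gold: set[str]) -> dict[str, float]:
    """Score one ranked list of entity names returned by an LLM against the gold set."""
    # Normalize and de-duplicate predictions while preserving their rank order.
    ranked: list[str] = []
    for name in predicted:
        n = normalize(name)
        if n and n not in ranked:
            ranked.append(n)
    gold_n = {normalize(g) for g in gold}

    hits = sum(1 for n in ranked if n in gold_n)
    precision = hits / len(ranked) if ranked else 0.0
    recall = hits / len(gold_n) if gold_n else 0.0
    # When hits > 0, precision + recall > 0, so the division is safe.
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0

    # Reciprocal rank of the first correct entity (0.0 if none is found).
    rr = next((1.0 / r for r, n in enumerate(ranked, start=1) if n in gold_n), 0.0)
    return {"precision": precision, "recall": recall, "f1": f1, "rr": rr}


def evaluate(items: list[QueryItem], predictions: dict[str, list[str]]) -> dict[str, float]:
    """Macro-average per-query scores; the averaged "rr" is the mean reciprocal rank (MRR)."""
    keys = ("precision", "recall", "f1", "rr")
    if not items:
        return {k: 0.0 for k in keys}
    totals = dict.fromkeys(keys, 0.0)
    for item in items:
        # A query the model did not answer is scored as an empty prediction list.
        scores = score_item(predictions.get(item.query, []), item.gold)
        for k in keys:
            totals[k] += scores[k]
    return {k: totals[k] / len(items) for k in keys}


if __name__ == "__main__":
    items = [
        QueryItem("Which universities are in Beijing?", {"Tsinghua University", "Peking University"}),
        QueryItem("Who founded Xiaomi?", {"Lei Jun"}),
    ]
    predictions = {
        "Which universities are in Beijing?": ["Peking University", "Fudan University", "Tsinghua University"],
        "Who founded Xiaomi?": ["Jack Ma", "Lei Jun"],
    }
    print(evaluate(items, predictions))
```

Macro-averaging weights every query equally, so a query with many gold entities does not dominate the overall score; a micro-averaged variant would instead pool hits and predictions across all queries before dividing.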
