Kakao, the First Korean IT Company to Develop and Publicly Release an Open-Source Benchmark Dataset for Evaluating AI Language Models' Function-Call Performance
- The first dataset in Korea for evaluating "function call" performance, the capability that links language models with external tools
- Evaluation criteria include accuracy in extracting function names and arguments, accuracy in relaying function-call results, recognizing missing information and generating follow-up questions, and detecting whether a request matches a callable function
- Released as open source on GitHub to energize the AI ecosystem, with plans to continue expanding the dataset
[September 27, 2024] Kakao is continuing its efforts to develop and energize the AI technology ecosystem.
Kakao (CEO Shina Chung) announced that it developed "FunctionChat-Bench", a dataset for evaluating the function-call performance of AI language models, and publicly released it as open source on the 23rd of last month.
Function calling refers to the technology that links a language model with external tools, such as APIs, so that the model can instruct an action it cannot perform on its own or receive real-time information it has not learned in advance. This technology is essential for building services on top of language models, and it extends them to new capabilities by overcoming their inherent limitations. For example, if a language model is connected to a map API, users can rely on function calling to retrieve real-time road information.
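To make this concrete, here is a minimal sketch of a function-calling exchange. The function name (`get_traffic`), its schema, and the message format are hypothetical illustrations of the general pattern, not taken from Kakao's dataset or any specific API:

```python
import json

# Hypothetical tool schema exposed to the language model.
# Names and parameters are illustrative only.
TOOLS = {
    "get_traffic": {
        "description": "Return real-time traffic info for a road.",
        "parameters": {"road": "string"},
    }
}

def fake_traffic_api(road: str) -> dict:
    # Stand-in for a real map API; returns canned data.
    return {"road": road, "congestion": "heavy"}

def dispatch(call_message: str) -> dict:
    """Parse a model-emitted function-call message and run the matching tool."""
    call = json.loads(call_message)
    name, args = call["name"], call["arguments"]
    if name == "get_traffic":
        return fake_traffic_api(**args)
    raise ValueError(f"Unknown function: {name}")

# Instead of replying in plain text, a model connected to these tools
# would emit a structured call like this:
model_output = '{"name": "get_traffic", "arguments": {"road": "Olympic-daero"}}'
result = dispatch(model_output)
print(result)  # the service relays this result back to the model or user
```

The service layer executes the call and feeds the result back to the model, which then composes the final answer for the user.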
To advance function-call technology, Kakao became the first Korean IT company to develop "FunctionChat-Bench", a dataset that evaluates performance in a Korean conversational environment from multiple angles. Most existing function-call evaluation datasets, built by global companies, are English-based; Kakao's is the first built on the Korean language.
The dataset evaluates the following: accuracy in extracting function names and arguments; accuracy in delivering function-call results; generating follow-up questions when required information is missing; and detecting whether a request matches a callable function. Whereas other companies' datasets focus on whether a language model generates accurate function-call messages, Kakao's dataset also evaluates the model's ability to generate appropriate user-facing messages before and after a function is called.
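As a sketch of how the first dimension above might be scored, the check below compares a model's predicted call against a reference call by exact match. The record format is an assumption for illustration, not FunctionChat-Bench's actual schema or scoring method:

```python
import json

def score_call_extraction(predicted: str, reference: dict) -> bool:
    """One plausible way to score 'accuracy of extracting function
    names and arguments': exact match against a reference call.
    (Hypothetical format, not the dataset's actual schema.)"""
    try:
        pred = json.loads(predicted)
    except json.JSONDecodeError:
        # A malformed call message counts as a miss.
        return False
    return (pred.get("name") == reference["name"]
            and pred.get("arguments") == reference["arguments"])

reference = {"name": "get_weather", "arguments": {"city": "Seoul"}}
print(score_call_extraction(
    '{"name": "get_weather", "arguments": {"city": "Seoul"}}', reference))  # True
print(score_call_extraction(
    '{"name": "get_weather", "arguments": {"city": "Busan"}}', reference))  # False
```

The other dimensions (relaying results, asking follow-up questions, rejecting unmatched requests) concern the model's dialogue behavior around the call, which is the interaction capability the release highlights.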
Kakao publicly released the dataset on GitHub to energize Korean AI language models and foster an open AI environment. Kakao plans to keep improving the dataset's usability, including expanding its scale and adding an English version.
BH Kim, Performance Leader at Kanana Alpha, explained, "Developing the 'FunctionChat-Bench' dataset and releasing it as open source will contribute to Korea's Korean-language AI technology ecosystem," adding, "Kakao will work to improve the dataset's usability based on evaluations of its function-call performance." (End)
Reference) Public release of the open-source benchmark dataset for evaluating AI language models' function-call performance
https://github.com/kakao/FunctionChat-Bench