Correctness is one of the biggest challenges when using large language models for customer support. In our previous AFAS Software case study [1], we defined correctness, demonstrated how it can be achieved with limited data to generate a near real-time response, and developed a domain-agnostic solution inspired by human decision-making. The solution is projected to save approximately 15,000 hours per year.
Moving forward, we need to identify why an answer is correct or incorrect.
The first question is to be addressed in a master’s thesis, conducted in collaboration with Dr. Michiel Overeem, Manager Product Development at AFAS Software.
Skills learned: problem solving, scientific thinking, data science, machine learning, and scientific writing.
Available spots: 1
[1] Is Our Chatbot Telling Lies? Assessing Correctness of an LLM-based Dutch Support Chatbot. Journal of Systems and Software. Preprint: https://arxiv.org/abs/2411.00034
[2] https://blog.vllm.ai/2025/12/14/halugate.html
[3] https://www.uber.com/en-NL/blog/ureview/
If you are interested in the project, please send an email describing your motivation (reading [1] will help), the skills you bring, and the skills you intend to learn. Also let me know of any logistical considerations I should take into account.