Can Large Language Models generate large, quality software?

Code generation is one of the many envisioned applications of Large Language Models (LLMs). So far, the performance of LLM-generated code is known mainly for small code chunks, mostly single methods [1] and, more recently, single classes [3]. What about larger code changes?
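For context, [1] measures the functional correctness of generated methods on the HumanEval dataset [2] with the pass@k metric. For each problem, n samples are generated and c of them pass the unit tests, and pass@k estimates the probability that at least one of k samples passes, using the unbiased estimator pass@k = 1 - C(n-c, k) / C(n, k), averaged over problems. The sketch below is a minimal Python illustration of that formula; the function names are hypothetical and not taken from the code released with [1].

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem (formula from [1]).

    n: number of generated samples, c: samples passing the unit tests,
    k: sampling budget. Function name is illustrative only.
    """
    if not 0 <= c <= n or not 1 <= k <= n:
        raise ValueError("require 0 <= c <= n and 1 <= k <= n")
    # If fewer than k samples fail, any k draws must include a passing one.
    if n - c < k:
        return 1.0
    # 1 minus the probability that all k drawn samples are failing ones.
    return 1.0 - comb(n - c, k) / comb(n, k)


def mean_pass_at_k(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems, given (n, c) pairs per problem."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical results for two problems, 10 samples each.
    print(mean_pass_at_k([(10, 3), (10, 0)], k=1))
```

Metrics of this kind capture only whether small units of code pass their tests; they say nothing about maintainability, which is why this thesis turns to a quality benchmark for larger projects.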

This thesis will make two contributions: it will use LLMs to generate code for medium-sized, preferably larger, software projects, and it will assess the quality of the generated code.

These contributions correspond to two research questions:
1. How can relatively large software projects be generated using LLMs?
2. What is the quality of the LLM-generated code, as assessed and compared using the SIG quality benchmark?

The success of this project hinges on a thorough literature review and a deep understanding of the state of the art; references [1] and [2] are a good starting point. Next, the focus should shift to generating ‘relatively’ large software projects using LLMs. Finally, the quality of the generated code will be assessed using the various code quality analysis tools available at SIG.

Available spots: 1

Pointers to literature

[1] Mark Chen et al., “Evaluating Large Language Models Trained on Code”, arXiv:2107.03374v2. Available from https://arxiv.org/abs/2107.03374.

[2] HumanEval dataset. Available from https://huggingface.co/datasets/openai_humaneval.

[3] Xueying Du et al., “ClassEval: A Manually-Crafted Benchmark for Evaluating LLMs on Class-level Code Generation”, arXiv:2308.01861v2. Available from https://arxiv.org/abs/2308.01861.
