What is it about?

Large Language Models (LLMs) have gained recognition as valuable assets across virtually all industries, yet they rely heavily on manually crafted input prompts. In real-world applications, the dependence on specialized staff, skilled prompt engineering, and domain-specific knowledge often leads to suboptimal performance and increased costs. This paper investigates the use of Genetic Algorithms (GAs) to autonomously generate, evolve, and judge LLM prompts. Specifically, we customized a standard GA implementation to handle textual individuals, which are manipulated by LLM-guided genetic operators that iteratively create and enhance candidate prompts. Additionally, LLMs are employed to assess the correctness of outputs, forming the basis of our GA's fitness function. Our findings suggest that GA-driven prompt engineering can consistently produce solutions that are superior in both accuracy and efficiency compared to those acquired by prompt engineers who possess technical skills but lack domain-specific knowledge and a full understanding of vendor-specific prompting idiosyncrasies. This conclusion is supported by experimental results obtained from four public datasets and three modern LLMs developed by OpenAI, Meta, and Mistral AI. Ultimately, this study highlights the viability and potential of fully automated optimization tools in minimizing human effort in writing high-performance prompts.
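The approach described above — a GA whose individuals are prompt strings and whose genetic operators are delegated to an LLM — can be sketched in a few lines. The helpers `llm_mutate` and `llm_crossover` below are hypothetical stand-ins for real model calls (the paper uses LLM-guided operators; their exact meta-prompts are not reproduced here), with toy deterministic behavior so the sketch runs offline:

```python
import random

# Hypothetical stand-in for an LLM-guided mutation operator. A real
# implementation would send the prompt to a model with a meta-prompt
# such as "rephrase and improve this instruction".
def llm_mutate(prompt: str) -> str:
    suffixes = [" Be concise.", " Think step by step.", " Answer precisely."]
    return prompt + random.choice(suffixes)

# Hypothetical stand-in for an LLM-guided crossover operator. A real
# implementation would ask the model to merge two parent prompts;
# here we simply splice their halves.
def llm_crossover(a: str, b: str) -> str:
    return a[: len(a) // 2] + b[len(b) // 2 :]

def evolve(population, fitness, generations=10, elite=2, mutation_rate=0.3):
    """Generic GA loop over textual individuals (prompts)."""
    for _ in range(generations):
        scored = sorted(population, key=fitness, reverse=True)
        next_gen = scored[:elite]  # elitism: carry the best prompts forward
        pool = scored[: max(2, len(scored) // 2)]  # parents from top half
        while len(next_gen) < len(population):
            p1, p2 = random.sample(pool, 2)
            child = llm_crossover(p1, p2)
            if random.random() < mutation_rate:
                child = llm_mutate(child)
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)

# Demo with a placeholder fitness (string length); the paper's actual
# fitness is an LLM judging output correctness on data samples.
seed_prompts = [
    "Summarize the text.",
    "Extract the key claim.",
    "Classify the sentiment.",
    "Answer the question.",
]
best = evolve(seed_prompts, fitness=len, generations=5)
```

Because elitism preserves the best individual each generation, the top fitness never decreases across generations, regardless of what the mutation and crossover operators return.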

Why is it important?

Large Language Models (LLMs) have significantly influenced the field of Artificial Intelligence (AI). By capitalizing on transformer-based neural networks, extensive datasets, and reinforcement learning approaches, these models exhibit a strong capacity for understanding and generating human-like text. Nevertheless, their accuracy still relies heavily on the clarity and specificity of user-provided instructions, a process known as prompt engineering. This reliance poses considerable challenges for teams operating with limited resources or handling domain-specific content. Consequently, automating prompt engineering has the potential to substantially enhance LLM performance across a wide range of tasks and industries by reducing dependence on specialized input. The primary goal of this work is to evaluate the feasibility of employing Genetic Algorithms (GAs) to automate prompt engineering in LLMs. By treating prompts as individuals in a GA, one can systematically explore numerous configurations that may surpass manual methods. Specifically, we introduce modifications to standard GA operators to handle textual individuals, data samples, and meta prompts (i.e., generic LLM instructions), guiding the GA in iteratively generating and refining prompts and thereby promoting their optimization. To test the viability of this approach, we conduct experiments to address the following questions: (i) Can GAs reliably discover high-performing prompts without human intervention or domain-specific expertise? (ii) How do GA-derived prompts compare to those developed by human experts? Our empirical findings are encouraging and suggest that GA-optimized prompts are both quantitatively and qualitatively superior to those obtained through manual crafting.
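The fitness function mentioned above — an LLM judging the correctness of outputs produced by a candidate prompt on labeled data samples — can be illustrated as follows. The `call_llm` function is a toy echo so the example runs offline; the judge's meta-prompt and the substring fallback are illustrative assumptions, not the paper's exact implementation:

```python
# Hypothetical stand-in for a real LLM API call. The toy version
# "answers" by returning the last line of its input, which keeps
# the example deterministic and offline.
def call_llm(text: str) -> str:
    return text.strip().splitlines()[-1]

# Illustrative meta-prompt for the LLM-as-judge step.
META_JUDGE = (
    "You are a strict grader. Given an expected answer and a model "
    "output, reply YES if the output is correct, otherwise NO.\n"
)

def judge(expected: str, output: str) -> bool:
    """Ask an LLM to grade an output; fall back to a substring check
    when the toy model does not return a YES/NO verdict."""
    reply = call_llm(META_JUDGE + f"Expected: {expected}\nOutput: {output}\nVerdict:")
    if reply.strip().upper() in ("YES", "NO"):
        return reply.strip().upper() == "YES"
    return expected.lower() in output.lower()

def fitness(prompt: str, samples: list[tuple[str, str]]) -> float:
    """Fraction of labeled samples the candidate prompt gets right."""
    hits = 0
    for task_input, expected in samples:
        output = call_llm(f"{prompt}\n\n{task_input}")
        hits += judge(expected, output)
    return hits / len(samples)

# Demo: one sample the toy "model" gets right, one it gets wrong.
samples = [
    ("The capital of France is Paris.", "Paris"),
    ("Water boils at 100 C.", "zero"),
]
score = fitness("Answer the question.", samples)
```

The GA maximizes this score, so prompts whose outputs the judge accepts more often are more likely to survive and reproduce.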

Perspectives

The transition to fully automated prompt engineering represents a significant advancement in LLMs’ practical utilization and general adoption. This study underscores how GA-optimized prompts can accelerate the deployment of LLMs by specialized industries, enabling these models to address domain-specific problems with little to no human effort. Our customized GA approach successfully addresses the limitations of manual prompt engineering, as well as the constraints of existing automated and semi-automated methods. Our findings can be organized into three main contributions: first, they demonstrate that GA-guided LLMs systematically take on tasks commonly managed by human specialists; second, they confirm that certain LLMs can reliably judge their own outputs and, thus, guide optimization through the fitness function; and lastly, they provide clear evidence that GA-guided LLMs can surpass manual prompt engineering in carrying out domain-specific tasks, consistent with insights reported in related research. Ultimately, we anticipate that as LLMs continue to evolve, methodologies similar to the one presented here will gain even greater attention, propelling advancements in specialized applications and reshaping the ways in which humans interact with these models.

Leandro Loss
AML RightSource

Read the Original

This page is a summary of: An LLM-Based Genetic Algorithm for Prompt Engineering, July 2025, ACM (Association for Computing Machinery), DOI: 10.1145/3712255.3726633.