What is it about?

Large language models are increasingly used to automate data science workflows: they can analyze datasets, write code, run experiments, debug errors, and prepare final reports. However, these agents often struggle when they need to use new or specialized machine learning libraries. Their knowledge may be outdated, incomplete, or missing details about a specific tool’s API. This paper presents AutoDS-Tools, a multi-agent framework that helps LLM-driven AutoML agents understand and use external data science libraries more reliably. Instead of relying only on the language model’s internal knowledge, the system automatically builds structured documentation from a library’s GitHub repository and makes it available to agents through graph-based retrieval. The framework includes several specialized agents: an Analyst studies the dataset and task, a Researcher investigates the selected library, a Manager creates the execution plan, a Coder implements the solution, a Debugger resolves errors, and a Presenter reviews the final artifacts. Together, these agents form an automated workflow for solving machine learning tasks with external tools.
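To make the division of labour concrete, here is a minimal sketch of how the six roles could be chained. It is an illustration only, not the paper's implementation: the `TaskState` class, the `make_agent` helper, and the strictly linear order are assumptions, and the lambda stubs stand in for real LLM calls.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


# Hypothetical shared state handed from one agent to the next.
@dataclass
class TaskState:
    task: str
    notes: Dict[str, str] = field(default_factory=dict)  # output per role
    log: List[str] = field(default_factory=list)         # order of execution


Agent = Callable[[TaskState], TaskState]


def make_agent(role: str, produce: Callable[[TaskState], str]) -> Agent:
    """Wrap a stub function so it records its output under the role name."""
    def run(state: TaskState) -> TaskState:
        state.notes[role] = produce(state)
        state.log.append(role)
        return state
    return run


# Stub behaviours; a real system would call a language model here.
PIPELINE: List[Agent] = [
    make_agent("analyst", lambda s: f"dataset and task summary for: {s.task}"),
    make_agent("researcher", lambda s: "relevant API entries for the selected library"),
    make_agent("manager", lambda s: "plan based on: " + s.notes["analyst"]),
    make_agent("coder", lambda s: "code following: " + s.notes["manager"]),
    make_agent("debugger", lambda s: "fixes for errors in: " + s.notes["coder"]),
    make_agent("presenter", lambda s: "review of final artifacts"),
]


def run_pipeline(task: str) -> TaskState:
    """Run every agent once, in the order the roles are listed above."""
    state = TaskState(task=task)
    for agent in PIPELINE:
        state = agent(state)
    return state
```

Calling `run_pipeline("classify churn")` returns a state whose `log` lists the six roles in order. A real workflow would probably loop between the Coder and the Debugger until the code runs, which this single-pass sketch leaves out.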


Why is it important?

Machine learning libraries evolve quickly, and many useful tools are domain-specific or recently released. LLM agents cannot be expected to know every tool in advance. AutoDS-Tools addresses this problem by allowing agents to learn from structured API documentation at task time. This makes automated data science systems more flexible and easier to extend. Instead of manually adding support for every new library, users can provide a repository, let the system index its public API, and allow the agent to use that knowledge while solving a task.
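As a rough illustration of what "indexing a public API" and graph-based retrieval might look like, the sketch below parses Python source with the standard `ast` module, keeps only public classes and functions, links each item to the things it contains, and answers a query by returning matching nodes together with their children. The function names `index_public_api` and `retrieve`, the keyword matching, and the toy `SAMPLE` library are all assumptions made for this example; the paper's actual indexing and retrieval are not described at this level of detail here.

```python
import ast
from typing import Dict, List

DEFS = (ast.ClassDef, ast.FunctionDef, ast.AsyncFunctionDef)


def index_public_api(source: str, module: str) -> Dict[str, dict]:
    """Build a small graph: node name -> docstring and contained public items."""
    tree = ast.parse(source)
    graph: Dict[str, dict] = {
        module: {"doc": ast.get_docstring(tree) or "", "children": []}
    }

    def add(node, parent: str) -> None:
        name = f"{parent}.{node.name}"
        graph[name] = {"doc": ast.get_docstring(node) or "", "children": []}
        graph[parent]["children"].append(name)
        if isinstance(node, ast.ClassDef):
            for child in node.body:
                # Names starting with "_" are treated as private and skipped.
                if isinstance(child, DEFS[1:]) and not child.name.startswith("_"):
                    add(child, name)

    for node in tree.body:
        if isinstance(node, DEFS) and not node.name.startswith("_"):
            add(node, module)
    return graph


def retrieve(graph: Dict[str, dict], query: str) -> List[str]:
    """Return nodes mentioning any query word, followed by their children."""
    words = query.lower().split()
    hits: List[str] = []
    for name, info in graph.items():
        text = f"{name} {info['doc']}".lower()
        if any(word in text for word in words):
            for candidate in [name] + info["children"]:
                if candidate not in hits:
                    hits.append(candidate)
    return hits


# Toy library source, invented for this example.
SAMPLE = '''
"""Toy forecasting library."""

class Forecaster:
    """Fits a time series model."""
    def fit(self, y):
        """Train on a series."""
    def predict(self, horizon):
        """Forecast future values."""
    def _validate(self, y):
        pass

def load_series(path):
    """Read a series from disk."""

def _helper():
    pass
'''

api = index_public_api(SAMPLE, "toylib")
related = retrieve(api, "model")
```

Here `related` contains `toylib.Forecaster` followed by its `fit` and `predict` methods: the query only matches the class docstring, and the graph edges pull in the methods an agent would need next. A production system would likely use richer edges (inheritance, call relationships) and semantic rather than keyword matching.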

Perspectives

This work points toward a more adaptable generation of AI agents for data science. Rather than depending only on the model’s built-in knowledge or requiring manual integration for every new library, agents can be connected to external tools through automatically generated, structured API knowledge. In the longer term, this approach could make automated data science systems easier to maintain and extend. As ML libraries continue to change, agents that can inspect documentation, reason over API relationships, and apply unfamiliar tools may become more useful for real-world machine learning workflows across different domains.

Igor Hromov

Read the Original

This page is a summary of: Towards Automated Integration of Novel ML tools Into LLM-driven AutoML Agents, International Foundation for Autonomous Agents and Multiagent Systems, DOI: 10.65109/ppzn3366.
