In the rapidly evolving world of AI, one constant challenge is using powerful AI tools without getting tangled in their implementation details. To address this, the developers of Transformers have introduced an experimental API, the Transformers Agent. It works like an assistant that interprets your requests in natural language and uses a set of curated tools to carry them out.
Meet the Transformers Agent
Transformers Agent is an experimental API, available as of Transformers v4.29.0, that builds on the concept of tools and agents. The agent in this context is a large language model (LLM) that interprets natural language and uses predefined tools to perform the requested tasks.
Although the API is subject to change and results can vary, its design allows easy extension. As the system evolves, it can incorporate any tool developed by the community, thus enhancing its capabilities.
The API is particularly powerful for multimodal tasks. You can try it in a Colab notebook, for example to caption an image or read text out loud, using simple commands such as agent.run("Caption the following image", image=image) or agent.run("Read the following text out loud", text=text).
Instantiating an Agent
To use the Transformers Agent, you first need to instantiate an agent, backed either by an OpenAI model or by open-source alternatives from BigCode and OpenAssistant. For OpenAI, this takes two lines of code:
from transformers import OpenAiAgent
agent = OpenAiAgent(model="text-davinci-003", api_key="<your_api_key>")
If you prefer BigCode or OpenAssistant, you can log in to access the Inference API and instantiate the agent as shown:
from huggingface_hub import login
from transformers import HfAgent
login("<YOUR_TOKEN>")
# For Starcoder
agent = HfAgent("https://api-inference.huggingface.co/models/bigcode/starcoder")
Your New Personal Assistant
The Transformers Agent provides two primary methods for executing tasks: run() and chat(). The run() method executes a single task: it selects the appropriate tool or tools and runs them. The chat() method, on the other hand, takes a chat-based approach, which is useful when you want to maintain state across instructions.
For instance, you can generate an image and transform it using the following code:
picture = agent.run("Generate a picture of rivers and lakes.")
updated_picture = agent.run("Transform the image in `picture` to add an island to it.", picture=picture)
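The difference in state handling between run() and chat() can be pictured with a small toy model. This is only a sketch: ToyAgent and its helper are hypothetical names, nothing here calls a language model, and it is not how the library is implemented. It only shows why picture has to be passed explicitly to run(), while a chat-style call can rely on earlier turns.

```python
# Toy model only: ToyAgent is a hypothetical class, not part of transformers.

class ToyAgent:
    def __init__(self):
        # Variables remembered between chat() calls.
        self.chat_state = {}

    def run(self, task, **variables):
        # run(): each call sees only the variables passed to that call.
        return self._visible_names(dict(variables))

    def chat(self, task, **variables):
        # chat(): new variables are merged into state kept from earlier turns.
        # (In this toy, values arrive as keyword arguments; a real agent would
        # also keep values produced by the code it executed.)
        self.chat_state.update(variables)
        return self._visible_names(self.chat_state)

    @staticmethod
    def _visible_names(state):
        # Stand-in for "generate code and execute it": report which
        # variable names the generated code would be able to use.
        return sorted(state)


agent = ToyAgent()
agent.chat("Generate a picture of rivers and lakes.", picture="<image>")
print(agent.chat("Add an island to it."))  # picture is still visible
print(agent.run("Add an island to it."))   # nothing carried over
```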
Remote Execution and Security
The Transformers Agent also supports remote execution for several of its default tools. This lets you run tasks without needing significant local RAM or a GPU. To enable it, set remote=True in the run() or chat() method.
Executing generated code naturally raises security concerns. However, the only functions that can be called are the tools you provide and the print function. In addition, no attribute lookups or imports are allowed, which limits potential attack vectors. You can also have the agent return the generated code instead of executing it by setting return_code=True in the run() method, and then decide for yourself whether to run it.
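To make the restrictions described above concrete, here is a minimal sketch of an allow-list check written with Python's standard ast module. It is an illustration under stated assumptions, not the library's own interpreter: check_generated_code and run_generated_code are hypothetical helpers, and running exec with emptied builtins is not a real sandbox.

```python
import ast


def check_generated_code(code, allowed_names):
    """Raise ValueError if `code` uses anything outside the allow-list."""
    tree = ast.parse(code)
    for node in ast.walk(tree):
        if isinstance(node, (ast.Import, ast.ImportFrom)):
            raise ValueError("imports are not allowed")
        if isinstance(node, ast.Attribute):
            raise ValueError("attribute lookup is not allowed")
        if isinstance(node, ast.Call):
            func = node.func
            # Only plain names from the allow-list may be called.
            if not isinstance(func, ast.Name) or func.id not in allowed_names:
                raise ValueError("call to a function outside the allow-list")


def run_generated_code(code, tools, variables=None):
    """Check `code`, then execute it with only the tools and print callable."""
    allowed = dict(tools)
    allowed["print"] = print
    check_generated_code(code, set(allowed))
    # Illustration only: emptying builtins does not make exec() safe.
    namespace = {"__builtins__": {}}
    namespace.update(allowed)
    namespace.update(variables or {})
    exec(code, namespace)
    return namespace


def shout(text):
    # Hypothetical tool used for the demonstration below.
    return text.upper()


result = run_generated_code("out = shout(text)", {"shout": shout}, {"text": "hi"})
print(result["out"])
```

Code such as import os, text.upper(), or open("x") is rejected by the check before anything runs, because each one is an import, an attribute lookup, or a call to a name outside the allow-list.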
Tools and Custom Tools
The power of the Transformers Agent lies in the tools it uses. Tools are simple functions with a name and a description. They perform specific tasks, and the agent uses their descriptions to generate the relevant code.
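As a rough sketch of that idea, a tool can be modeled as a name, a description, and a callable, with the descriptions collected into the text shown to the model. ToyTool, build_tool_prompt, and the two example tools are hypothetical and exist only for illustration; the library's own tool classes and prompt format may differ.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ToyTool:
    # Hypothetical stand-in for a tool: a name, a description, and a function.
    name: str
    description: str
    func: Callable


def build_tool_prompt(tools):
    """List each tool's name and description, one per line."""
    lines = [f"- {tool.name}: {tool.description}" for tool in tools]
    return "Tools:\n" + "\n".join(lines)


tools = [
    ToyTool("word_counter", "Counts the words in a text.", lambda text: len(text.split())),
    ToyTool("shouter", "Returns the text in upper case.", lambda text: text.upper()),
]

print(build_tool_prompt(tools))
```

Because the model only sees names and descriptions, a clear description matters as much as the function behind it.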
In summary, Transformers Agent is a powerful and flexible API that provides a natural language interface to a wide range of tools. It can handle simple tasks like generating captions for images, as well as more complex tasks like downloading a webpage, summarizing its content, and reading the summary out loud.
Furthermore, Transformers Agent is extensible by design. It comes with a curated set of tools, but you can easily extend it to use any tool developed by the community. This makes it a versatile tool that can adapt to a wide range of tasks and needs.
Whether you're looking for a simple way to generate code samples, a convenient interface for complex tasks, or a platform to share and use custom tools, Transformers Agent has something to offer. With its combination of simplicity, power, and flexibility, it's a valuable tool for anyone working with Transformers.
It's exciting to see how this technology is evolving and what the community will build with it. The possibilities are vast, and we're just scratching the surface of what can be achieved with Transformers Agent. We can't wait to see where this journey takes us next.