Desktop Control Skill
This skill provides desktop automation through PyAutoGUI, allowing AI agents to control the mouse and keyboard, take screenshots, and interact with the desktop environment.
How to Use This Skill
As an AI agent, you can invoke desktop automation commands using the uvx desktop-agent CLI.
Command Structure
All commands follow this pattern:
uvx desktop-agent <category> <command> [arguments] [options]
Categories:
mouse - Mouse control
keyboard - Keyboard input
screen - Screenshots and screen analysis
message - User dialogs
app - Application control (open, focus, list windows)
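The command pattern above can be assembled programmatically. The following is a minimal sketch, not part of the skill itself: `build_command` and `run_command` are hypothetical helper names, and the mapping of keyword arguments to `--option` flags is an assumption made for illustration. Only the categories and the `uvx desktop-agent <category> <command>` shape come from this document.

```python
from __future__ import annotations

import subprocess

# The five categories listed in this document.
CATEGORIES = {"mouse", "keyboard", "screen", "message", "app"}


def build_command(category: str, command: str, *args: object, **options: object) -> list[str]:
    """Build the argument list for `uvx desktop-agent <category> <command> ...`.

    Hypothetical helper: keyword arguments become `--name value` options,
    and a value of True becomes a bare switch such as `--active`.
    """
    if category not in CATEGORIES:
        raise ValueError(f"unknown category: {category!r}")
    argv = ["uvx", "desktop-agent", category, command]
    argv += [str(arg) for arg in args]
    for name, value in options.items():
        flag = "--" + name.replace("_", "-")
        if value is True:
            argv.append(flag)
        elif value is not None and value is not False:
            argv += [flag, str(value)]
    return argv


def run_command(argv: list[str]) -> subprocess.CompletedProcess:
    """Run a prepared command and capture its output without raising on failure."""
    return subprocess.run(argv, capture_output=True, text=True, check=False)
```

For example, `build_command("mouse", "move", 960, 540, duration=0.5)` produces the argument list for `uvx desktop-agent mouse move 960 540 --duration 0.5`. Passing a list to `subprocess.run` avoids shell quoting problems with text that contains spaces.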
Available Commands
🖱️ Mouse Control (mouse)
Control cursor movement and clicks.
uvx desktop-agent mouse move <x> <y> [--duration SECONDS]
uvx desktop-agent mouse click [x] [y] [--button left|right|middle] [--clicks N]
uvx desktop-agent mouse double-click [x] [y]
uvx desktop-agent mouse right-click [x] [y]
uvx desktop-agent mouse middle-click [x] [y]
uvx desktop-agent mouse drag <x> <y> [--duration SECONDS] [--button BUTTON]
uvx desktop-agent mouse scroll <clicks> [x] [y]
uvx desktop-agent mouse position
Examples:
uvx desktop-agent mouse move 960 540 --duration 0.5
uvx desktop-agent mouse right-click 500 300
uvx desktop-agent mouse scroll -5
⌨️ Keyboard Control (keyboard)
Type text and execute keyboard shortcuts.
uvx desktop-agent keyboard write "<text>" [--interval SECONDS]
uvx desktop-agent keyboard press <key> [--presses N] [--interval SECONDS]
uvx desktop-agent keyboard hotkey "<key1>,<key2>,..."
uvx desktop-agent keyboard keydown <key>
uvx desktop-agent keyboard keyup <key>
Examples:
uvx desktop-agent keyboard write "Hello World" --interval 0.05
uvx desktop-agent keyboard hotkey "ctrl,c"
uvx desktop-agent keyboard hotkey "ctrl,shift,esc"
uvx desktop-agent keyboard press enter --presses 3
Common Key Names:
- Modifiers: ctrl, shift, alt, win
- Special: enter, tab, esc, space, backspace, delete
- Function: f1 through f12
- Arrows: up, down, left, right
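Because `keyboard hotkey` takes its keys as a single comma-separated string, it can help to build that string from individual key names. A minimal sketch, assuming a hypothetical helper named `hotkey_argv`; the check that rejects empty names and names containing a comma is an illustrative safeguard, not documented behavior of the CLI.

```python
from __future__ import annotations


def hotkey_argv(*keys: str) -> list[str]:
    """Build the argument list for `uvx desktop-agent keyboard hotkey "<key1>,<key2>,..."`."""
    if not keys:
        raise ValueError("at least one key is required")
    cleaned = [key.strip() for key in keys]
    # A comma inside a key name would be read as a separator, so reject it.
    if any(not key or "," in key for key in cleaned):
        raise ValueError("key names must be non-empty and must not contain commas")
    return ["uvx", "desktop-agent", "keyboard", "hotkey", ",".join(cleaned)]
```

Here `hotkey_argv("ctrl", "shift", "esc")` yields the same command as the `keyboard hotkey "ctrl,shift,esc"` example.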
🖼️ Screen & Screenshots (screen)
Capture screenshots and analyze screen content. Supports targeting specific windows.
uvx desktop-agent screen screenshot <filename> [--region "x,y,width,height"] [--window <title>] [--active]
uvx desktop-agent screen locate <image_path> [--confidence 0.0-1.0] [--window <title>] [--active]
uvx desktop-agent screen locate-center <image_path> [--confidence 0.0-1.0] [--window <title>] [--active]
uvx desktop-agent screen locate-text-coordinates <text> [--window <title>] [--active]
uvx desktop-agent screen read-all-text [--window <title>] [--active]
uvx desktop-agent screen pixel <x> <y>
uvx desktop-agent screen size
uvx desktop-agent screen on-screen <x> <y>
Examples:
uvx desktop-agent screen screenshot active.png --active
uvx desktop-agent screen screenshot chrome.png --window "Google Chrome"
uvx desktop-agent screen locate-center button.png --window "Notepad"
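The `--region` option expects `"x,y,width,height"`, while an agent often knows two corner points instead. The sketch below converts corners into that format; `region_from_corners` and `screenshot_region_argv` are hypothetical helper names introduced for illustration.

```python
from __future__ import annotations


def region_from_corners(left: int, top: int, right: int, bottom: int) -> str:
    """Convert two corner points into the "x,y,width,height" string used by --region."""
    if right <= left or bottom <= top:
        raise ValueError("right/bottom must be greater than left/top")
    return f"{left},{top},{right - left},{bottom - top}"


def screenshot_region_argv(filename: str, left: int, top: int, right: int, bottom: int) -> list[str]:
    """Build `uvx desktop-agent screen screenshot <filename> --region "x,y,width,height"`."""
    region = region_from_corners(left, top, right, bottom)
    return ["uvx", "desktop-agent", "screen", "screenshot", filename, "--region", region]
```

For instance, corners (100, 200) and (500, 600) describe a region starting at x=100, y=200 that is 400 wide and 400 tall.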
💬 Message Dialogs (message)
Display user interaction dialogs.
uvx desktop-agent message alert "<text>" [--title TITLE] [--button BUTTON]
uvx desktop-agent message confirm "<text>" [--title TITLE] [--buttons "OK,Cancel"]
uvx desktop-agent message prompt "<text>" [--title TITLE] [--default TEXT]
uvx desktop-agent message password "<text>" [--title TITLE] [--mask CHAR]
Examples:
uvx desktop-agent message alert "Task completed!"
uvx desktop-agent message confirm "Continue with operation?"
uvx desktop-agent message prompt "Enter your name:"
📱 Application Control (app)
Control applications across Windows, macOS, and Linux.
uvx desktop-agent app open <name> [--arg ARGS...]
uvx desktop-agent app focus <name>
uvx desktop-agent app list
Examples:
uvx desktop-agent app open notepad
uvx desktop-agent app open "chrome" --arg "https://google.com"
uvx desktop-agent app open "Safari"
uvx desktop-agent app focus "Untitled - Notepad"
uvx desktop-agent app list
Common Automation Workflows
Workflow 1: Open Application and Type
uvx desktop-agent app open notepad
uvx desktop-agent app focus notepad
uvx desktop-agent keyboard write "Hello from Desktop Skill!"
Workflow 2: Screenshot + Analysis
uvx desktop-agent screen size
uvx desktop-agent screen screenshot current_screen.png
uvx desktop-agent screen locate save_button.png
Workflow 3: Form Filling
uvx desktop-agent mouse click 300 200
uvx desktop-agent keyboard write "John Doe"
uvx desktop-agent keyboard press tab
uvx desktop-agent keyboard write "john.doe@example.com"
uvx desktop-agent keyboard press enter
Workflow 4: Copy/Paste Operations
uvx desktop-agent keyboard hotkey "ctrl,a"
uvx desktop-agent keyboard hotkey "ctrl,c"
uvx desktop-agent mouse click 500 600
uvx desktop-agent keyboard hotkey "ctrl,v"
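Any of the workflows above can be driven from a script that runs the steps in order, pauses between them, and stops at the first failure. This is a sketch under stated assumptions: `run_workflow` is a hypothetical helper, the 0.5 second pause is an arbitrary illustrative value, and it assumes the CLI signals failure through a non-zero exit code, which this document does not confirm.

```python
from __future__ import annotations

import subprocess
import time

# Workflow 1 from this document, written as argument lists.
WORKFLOW_1 = [
    ["app", "open", "notepad"],
    ["app", "focus", "notepad"],
    ["keyboard", "write", "Hello from Desktop Skill!"],
]


def run_workflow(steps, pause=0.5, runner=subprocess.run, sleep=time.sleep) -> int:
    """Run each step in order and return how many steps succeeded.

    Assumption: a non-zero exit code means the step failed. The runner and
    sleep parameters exist so the function can be exercised without a desktop.
    """
    completed = 0
    for step in steps:
        result = runner(["uvx", "desktop-agent", *step], capture_output=True, text=True)
        if result.returncode != 0:
            break  # stop rather than keep typing or clicking into the wrong window
        completed += 1
        sleep(pause)  # give the UI time to react before the next step
    return completed
```

Comparing the return value with `len(steps)` tells the caller whether the whole workflow finished.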
Safety Considerations
When using this skill, AI agents should:
- Verify coordinates: Use screen size and screen on-screen to check that a point is on the screen before clicking
- Add delays: Insert short delays between commands so the UI has time to respond
- Validate images: Ensure image files exist before using the locate commands
- Handle failures: Commands may fail if windows change or elements move
- User safety: Always confirm destructive actions with the user via message confirm
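Two of the checks above, verifying coordinates and validating image files, can be sketched as guards that run before a command is built. The helper names are hypothetical, and the screen width and height are passed in as plain numbers because this document does not specify the output format of `screen size`.

```python
from __future__ import annotations

from pathlib import Path


def is_on_screen(x: int, y: int, width: int, height: int) -> bool:
    """Return True when (x, y) lies inside a screen of the given size."""
    return 0 <= x < width and 0 <= y < height


def safe_click_argv(x: int, y: int, width: int, height: int) -> list[str]:
    """Build a `mouse click` command only if the target point is on the screen."""
    if not is_on_screen(x, y, width, height):
        raise ValueError(f"({x}, {y}) is outside the {width}x{height} screen")
    return ["uvx", "desktop-agent", "mouse", "click", str(x), str(y)]


def safe_locate_argv(image_path: str) -> list[str]:
    """Build a `screen locate` command only if the image file exists."""
    if not Path(image_path).is_file():
        raise FileNotFoundError(image_path)
    return ["uvx", "desktop-agent", "screen", "locate", str(image_path)]
```

Raising before the command is built keeps a bad coordinate or a missing file from turning into a click or a search on the live desktop.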
Troubleshooting
PyAutoGUI Fail-Safe
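PyAutoGUI has a built-in fail-safe: moving the mouse pointer into a corner of the screen aborts the running operation (PyAutoGUI raises FailSafeException). This is a deliberate safety feature that lets a user stop runaway automation, not a bug.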
Image not found
When screen locate cannot find the image:
- Check that the image file exists and the path is correct
- Adjust --confidence
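One way to adjust `--confidence` is to try a short series of decreasing values and stop at the first match. The sketch below only prepares the commands; the starting value of 0.9, the floor of 0.6, and the helper names are illustrative assumptions. Lower values tolerate more visual difference but also raise the risk of a false match, so the floor should stay fairly high.

```python
from __future__ import annotations


def confidence_steps(start_pct: int = 90, stop_pct: int = 60, step_pct: int = 10) -> list[float]:
    """Return decreasing confidence values, e.g. 0.9, 0.8, 0.7, 0.6.

    Percentages are integers so the resulting floats stay free of rounding noise.
    """
    return [pct / 100 for pct in range(start_pct, stop_pct - 1, -step_pct)]


def locate_attempts(image_path: str) -> list[list[str]]:
    """Build one `screen locate` command per confidence value, highest first."""
    return [
        ["uvx", "desktop-agent", "screen", "locate", image_path, "--confidence", str(value)]
        for value in confidence_steps()
    ]
```

The caller runs the commands in order and stops as soon as one reports a match; how a match is reported depends on the CLI's output, which is not described here.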