I built a mobile automation library for AI agents after assuming, wrongly, that Appium had already solved this.
Every agent that touches a phone hits the same walls. Permission dialogs. System popups. App-not-responding prompts. Screen-off timeouts. The stuff humans swipe past without thinking — agents crash on it.
The standard answer is Appium: Java server, Selenium WebDriver, 40+ dependencies, 5-minute boot times. For iOS you also need a Mac and Xcode, and either way it takes an afternoon just to get the first tap working. Your agent spends more time waiting for the server than doing anything useful.
And what does the agent actually see? Most tools hand it a screenshot or raw XML. Screenshots have the same problem as on the web: models misread buttons, hallucinate elements, and miss text. Raw accessibility XML means thousands of nodes, redundant containers, invisible wrappers, and system decoration. Your agent burns tokens on noise.
baremobile does neither.
On Android, it talks directly to your phone over ADB (Android Debug Bridge). No Java server. No Selenium layer. No Appium middleware. And instead of screenshots or raw XML, it reads the screen the way a screen reader does: semantic structure only, meaning buttons, text fields, lists, and switches. Interactive elements get [ref=N] markers. The agent picks a ref and acts on it.
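To make "pick a ref and act on it" concrete, here is one way a ref could become a tap over ADB: look up the element's on-screen bounds and send the center point to `adb shell input tap`. This is a minimal sketch under assumptions, not baremobile's internals. The `refs` map (ref number to bounds string) and the helper names `parseBounds`, `tapArgs`, and `tap` are hypothetical; the bounds format `[x1,y1][x2,y2]` is the one `uiautomator dump` writes into its XML.

```javascript
const { execFileSync } = require("node:child_process");

// Parse a uiautomator-style bounds string such as "[0,210][1080,378]".
function parseBounds(bounds) {
  const m = /^\[(\d+),(\d+)\]\[(\d+),(\d+)\]$/.exec(bounds);
  if (!m) throw new Error(`unrecognized bounds: ${bounds}`);
  const [x1, y1, x2, y2] = m.slice(1).map(Number);
  return { x1, y1, x2, y2 };
}

// Build adb arguments that tap the center of the element behind `ref`.
// `refs` is a hypothetical Map<number, string> from ref number to bounds,
// filled in when the snapshot was taken.
function tapArgs(refs, ref) {
  const bounds = refs.get(ref);
  if (bounds === undefined) throw new Error(`unknown ref ${ref}`);
  const { x1, y1, x2, y2 } = parseBounds(bounds);
  const x = Math.round((x1 + x2) / 2);
  const y = Math.round((y1 + y2) / 2);
  return ["shell", "input", "tap", String(x), String(y)];
}

// Send the tap. Needs adb on PATH and an authorized device (USB or wireless).
function tap(refs, ref) {
  execFileSync("adb", tapArgs(refs, ref), { stdio: "inherit" });
}
```

With ref 3 mapped to `[0,210][1080,378]`, `tapArgs` returns the arguments for `adb shell input tap 540 294`, and `tap` runs that command. Typing and swiping follow the same shape with `adb shell input text` and `adb shell input swipe`.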
200+ Android widget classes get mapped to a handful of semantic roles. A 4-step pruning pipeline strips invisible nodes, redundant containers, and system noise. What's left is clean YAML the agent can actually reason about.
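The idea of collapsing widget classes into roles and pruning the tree fits in a few dozen lines. This sketch is not baremobile's pipeline: the role table covers only a handful of classes, the four pruning rules are a simplified stand-in, and the node shape (`cls`, `pkg`, `text`, `desc`, `clickable`, `visible`, `children`) is a hypothetical simplification of what `uiautomator dump` reports. It shows the general technique: map, prune, then number the interactive nodes in document order and print an indented outline.

```javascript
// Hypothetical node shape, loosely modeled on `uiautomator dump` attributes:
// { cls, pkg, text, desc, clickable, visible, children }

// Tiny illustrative subset of a class -> role table (a real one is far larger).
const ROLES = new Map([
  ["android.widget.Button", "button"],
  ["android.widget.ImageButton", "button"],
  ["android.widget.EditText", "textbox"],
  ["android.widget.TextView", "text"],
  ["android.widget.Switch", "switch"],
  ["android.widget.CheckBox", "checkbox"],
  ["android.widget.ListView", "list"],
  ["androidx.recyclerview.widget.RecyclerView", "list"],
]);
const INTERACTIVE = new Set(["button", "textbox", "switch", "checkbox"]);

// Unknown classes become "button" if clickable, otherwise a plain "group".
function roleOf(node) {
  return ROLES.get(node.cls) ?? (node.clickable ? "button" : "group");
}

// Returns a list so that a removed wrapper can hand its children up a level.
function prune(node) {
  if (node.visible === false) return [];              // drop invisible subtrees
  if (node.pkg === "com.android.systemui") return []; // drop system decoration
  const children = (node.children ?? []).flatMap(prune);
  const role = roleOf(node);
  const label = node.text || node.desc || "";
  if (role === "text" && !label) return children;     // drop empty text
  if (role === "group" && !label) return children;    // unwrap redundant containers
  return [{ role, label, children }];
}

// Render an indented outline; interactive nodes get refs in document order.
function render(nodes, depth = 0, state = { next: 1 }, out = []) {
  for (const n of nodes) {
    const label = n.label ? ` ${JSON.stringify(n.label)}` : "";
    const ref = INTERACTIVE.has(n.role) ? ` [ref=${state.next++}]` : "";
    const colon = n.children.length > 0 ? ":" : "";
    out.push(`${"  ".repeat(depth)}- ${n.role}${label}${ref}${colon}`);
    render(n.children, depth + 1, state, out);
  }
  return out;
}

// A made-up settings screen with a status bar, wrappers, and a hidden row.
const screen = {
  cls: "android.widget.FrameLayout",
  children: [
    { cls: "android.widget.FrameLayout", pkg: "com.android.systemui",
      children: [{ cls: "android.widget.TextView", text: "12:45" }] },
    { cls: "android.widget.LinearLayout", children: [
      { cls: "android.widget.TextView", text: "Wi-Fi" },
      { cls: "android.widget.Switch", desc: "Wi-Fi toggle" },
    ] },
    { cls: "androidx.recyclerview.widget.RecyclerView", children: [
      { cls: "android.widget.LinearLayout", clickable: true,
        children: [{ cls: "android.widget.TextView", text: "HomeNet" }] },
      { cls: "android.widget.LinearLayout", visible: false,
        children: [{ cls: "android.widget.TextView", text: "stale row" }] },
    ] },
  ],
};

const outline = render(prune(screen)).join("\n");
console.log(outline);
// - text "Wi-Fi"
// - switch "Wi-Fi toggle" [ref=1]
// - list:
//   - button [ref=2]:
//     - text "HomeNet"
```

The status-bar clock, the hidden row, and both layout wrappers disappear; only the switch and the tappable row receive refs.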
Four ways to use it:
- Library — import into your agent code, connect to a device, snap/tap/type/swipe by ref
- MCP server — plug into Claude Desktop, Cursor, or Claude Code
- CLI — shell scripts and agentic pipelines read output files directly
- On-device via Termux — runs on the phone itself, no host machine needed. Plus direct device APIs: SMS, calls, GPS, camera, clipboard
Works on Android and iOS. Works with bareagent. Works without it.
This is the third piece of the bare ecosystem:
- bareagent — gives agents a think→act loop. Replaces LangChain, CrewAI, AutoGen. 1,500 lines.
- barebrowse — gives agents a real browser. Replaces Playwright, Puppeteer. 2,400 lines.
- baremobile — gives agents a phone. Replaces Appium, Espresso, UIAutomator. 2,800 lines.
Three vanilla JS modules. Zero dependencies. Same API patterns. Each works standalone. Together, one agent can reason, browse the web, and control your phone — all through the same snapshot→act interface.
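The shared snapshot→act interface amounts to a short loop: snapshot the screen, pick a ref, act, and repeat until the goal is reached or a step budget runs out. The sketch below runs that loop against a fake in-memory device, so it needs no phone. `FakeDevice`, its `snapshot()` and `tap()` methods, and the rule-based `decide` function (a stand-in for the model) are all hypothetical; they are not the APIs of bareagent, barebrowse, or baremobile. It also covers the permission-dialog case from the top of the post: the loop clears the dialog and keeps going.

```javascript
// Canned screens: each has an outline and a map from ref to the next screen.
const SCREENS = {
  home: { outline: '- button "Settings" [ref=1]', onTap: { 1: "dialog" } },
  dialog: {
    outline: '- text "Allow notifications?"\n- button "Deny" [ref=1]\n- button "Allow" [ref=2]',
    onTap: { 1: "home", 2: "settings" },
  },
  settings: { outline: '- text "Settings"\n- switch "Wi-Fi" [ref=1]', onTap: {} },
};

// Hypothetical device stand-in exposing snapshot() and tap(ref).
class FakeDevice {
  constructor(screens, start) {
    this.screens = screens;
    this.current = start;
    this.log = [];
  }
  snapshot() {
    return this.screens[this.current].outline;
  }
  tap(ref) {
    const next = this.screens[this.current].onTap[ref];
    if (next === undefined) throw new Error(`no ref ${ref} on ${this.current}`);
    this.log.push(`tap ${ref}`);
    this.current = next;
  }
}

// Look up a button's ref by its label in lines like: - button "Allow" [ref=2]
function findRef(outline, label) {
  for (const line of outline.split("\n")) {
    const m = /- button "([^"]*)" \[ref=(\d+)\]/.exec(line);
    if (m && m[1] === label) return Number(m[2]);
  }
  return undefined;
}

// Rule-based stand-in for the model's choice at each step.
function decide(outline, goal) {
  if (outline.includes(`- text "${goal}"`)) return { done: true };
  const allow = findRef(outline, "Allow"); // clear permission dialogs first
  if (allow !== undefined) return { tap: allow };
  const target = findRef(outline, goal);
  if (target !== undefined) return { tap: target };
  return { giveUp: true };
}

// The snapshot -> act loop: observe, decide, act, within a step budget.
function runAgent(device, goal, maxSteps = 10) {
  for (let step = 0; step < maxSteps; step++) {
    const action = decide(device.snapshot(), goal);
    if (action.done) return true;
    if (action.giveUp) return false;
    device.tap(action.tap);
  }
  return false;
}

const device = new FakeDevice(SCREENS, "home");
console.log(runAgent(device, "Settings"), device.log); // true [ 'tap 1', 'tap 2' ]
```

In a real agent, `decide` would be a model call that reads the outline and returns a ref; the loop keeps the same shape.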
What you can build with all three: headless web automation, mobile QA without heavyweight frameworks, personal AI assistants that browse and act on your behalf, and agentic workflows that span web and mobile.
Most automation stacks ship 200MB of opinions before you write a line of code. These don't.
npm install baremobile
https://github.com/hamr0/ba...