
Build AI agents with desktop context

Open source API for AI agents that can see your screen. Access screen recordings, screen text, and audio transcriptions through simple HTTP endpoints on localhost:3030.


Quick answer

Build AI agents with full desktop context. screenpipe provides a developer API for screen capture, text extraction, audio transcription, and AI-powered search. Open source.

Building desktop AI is hard

You want to build AI features, not screen recording infrastructure.

1. Screen recording across platforms means dealing with different OS APIs and permissions.
2. Text extraction needs to handle multiple languages and formats.
3. Audio transcription requires speech-to-text pipelines.
4. Storing and searching through recordings needs efficient indexing.
5. All of this infrastructure work comes before you can build actual features.

Desktop context as an API

screenpipe handles the hard parts: cross-platform capture, text extraction, transcription, and search. You just call the REST API on localhost:3030.

Search API

Full-text search across screen text, audio transcriptions, and UI elements. Filter by app, window, time range.

Frames API

Get recent screen frames with extracted text. Useful for real-time screen understanding.

Audio API

Access audio transcriptions with timestamps and speaker detection.

Health API

Check recording status: whether screen, audio, and UI capture are working.

How it works

1. Install and run screenpipe

Download from screenpi.pe. The API starts automatically on port 3030.

# macOS/Linux
curl -fsSL https://screenpi.pe/install.sh | sh
screenpipe

# Or download the app from screenpi.pe

2. Search your screen history

Query the search endpoint with filters for content, time, and apps.

curl "http://localhost:3030/search?q=error&content_type=ocr&limit=10"

3. Build your features

Use any language that can make HTTP requests. Here's a Python example:

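import requests

# Search OCR text captured since the start of 2024
response = requests.get(
    "http://localhost:3030/search",
    params={
        "q": "meeting notes",
        "content_type": "ocr",
        "start_time": "2024-01-01T00:00:00Z",
        "limit": 20,
    },
    timeout=10,  # do not hang forever if screenpipe is unresponsive
)
response.raise_for_status()  # fail loudly on a non-2xx response
results = response.json()

# Each OCR result carries its extracted text under content.text
for item in results["data"]:
    print(item["content"]["text"])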
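
The example above assumes screenpipe is already running. A minimal sketch of a more defensive call, using only the Python standard library, might look like the following. The helper name `get_json` and its `opener` parameter are illustrative and not part of screenpipe; the choice to return `None` when the server is unreachable is also just one possible design.

```python
import json
from urllib.request import urlopen


def get_json(url, timeout=5.0, opener=urlopen):
    """Return the decoded JSON body, or None if the request fails.

    `opener` is injectable so the function can be exercised without
    a running server (hypothetical convenience, not a screenpipe API).
    """
    try:
        with opener(url, timeout=timeout) as resp:
            return json.load(resp)
    except OSError:
        # URLError (connection refused, HTTP errors) and socket
        # timeouts are all subclasses of OSError.
        return None
```

Calling `get_json("http://localhost:3030/search?q=error&limit=10")` would then yield either the parsed response or `None`, which lets the caller decide whether to retry or to tell the user that screenpipe is not running.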

Code examples

Search API

Search across all captured content with filters

# Basic search
curl "http://localhost:3030/search?q=react+hooks&limit=10"

# Filter by content type (ocr, audio, ui)
curl "http://localhost:3030/search?q=meeting&content_type=audio"

# Filter by app name
curl "http://localhost:3030/search?q=error&app_name=Terminal"

# Filter by time range
curl "http://localhost:3030/search?q=bug&start_time=2024-01-01T00:00:00Z&end_time=2024-01-02T00:00:00Z"
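
When filters are combined programmatically, it can help to build the query string in one place. The sketch below only uses the parameters that appear in the curl examples above (`q`, `content_type`, `app_name`, `start_time`, `end_time`, `limit`); the function name `build_search_url` is hypothetical.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:3030"  # host and port from the examples above


def build_search_url(q, content_type=None, app_name=None,
                     start_time=None, end_time=None, limit=None):
    """Build a /search URL, omitting any filter that is not set."""
    params = {
        "q": q,
        "content_type": content_type,
        "app_name": app_name,
        "start_time": start_time,
        "end_time": end_time,
        "limit": limit,
    }
    # Drop unset filters so the query string only carries what was asked for.
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{BASE_URL}/search?{query}"
```

`urlencode` takes care of escaping, so spaces in the query and colons in timestamps do not need to be encoded by hand.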

Health check

Verify screenpipe is recording correctly

curl "http://localhost:3030/health"

# Response:
# {
#   "status": "healthy",
#   "frame_status": "ok",
#   "audio_status": "ok",
#   "ui_status": "ok",
#   "last_frame_timestamp": "2024-01-15T10:30:00Z",
#   "last_audio_timestamp": "2024-01-15T10:30:00Z"
# }
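
An agent will usually want to act on this response rather than just print it. The following is a rough sketch that reports which capture subsystems look unhealthy, based only on the field names in the sample response above. It assumes that "ok" is the only healthy value, which the sample suggests but does not guarantee; the function names are illustrative.

```python
import json
from urllib.request import urlopen

# Status fields taken from the sample response above.
STATUS_FIELDS = ("frame_status", "audio_status", "ui_status")


def failing_subsystems(health):
    """Return the status fields whose value is anything other than "ok".

    Assumption: "ok" is the only healthy value; a missing field is
    treated as failing.
    """
    return [name for name in STATUS_FIELDS if health.get(name) != "ok"]


def fetch_health(base_url="http://localhost:3030"):
    """GET /health and decode the JSON body (requires screenpipe running)."""
    with urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)
```

With screenpipe running, `failing_subsystems(fetch_health())` returns an empty list when every reported status is "ok".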

TypeScript/JavaScript

Query the API with fetch

// Search for content from the last 24 hours
const response = await fetch(
  "http://localhost:3030/search?" + new URLSearchParams({
    q: "api documentation",
    content_type: "ocr",
    limit: "10",
    start_time: new Date(Date.now() - 24 * 60 * 60 * 1000).toISOString()
  })
);

// Surface HTTP errors instead of trying to parse an error body
if (!response.ok) {
  throw new Error(`screenpipe search failed: ${response.status}`);
}

const results = await response.json();

// Process results
results.data.forEach(item => {
  console.log(item.content.text);
  console.log(item.content.app_name);
  console.log(item.content.timestamp);
});
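
The same rolling time window can be computed in Python for use as a `start_time` value. This is a small sketch; the helper name `hours_ago_iso` is hypothetical, and the output uses the second-resolution `...Z` form seen in the curl examples rather than the millisecond form that `toISOString()` produces.

```python
from datetime import datetime, timedelta, timezone


def hours_ago_iso(hours, now=None):
    """Return the UTC time `hours` before `now` as an ISO 8601 string."""
    now = now or datetime.now(timezone.utc)
    # Normalize to UTC so the trailing "Z" is always accurate.
    moment = now.astimezone(timezone.utc) - timedelta(hours=hours)
    return moment.strftime("%Y-%m-%dT%H:%M:%SZ")
```

Passing `hours_ago_iso(24)` as `start_time` restricts a search to roughly the last day.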

Key benefits

Skip months of infrastructure work
Cross-platform (macOS, Windows, Linux)
MIT licensed - use commercially
Active community on Discord
All data stays local on your machine

Frequently asked questions

Any language that can make HTTP requests works with screenpipe. The API is REST-based on localhost:3030. We have examples in Python, TypeScript, Rust, Go, and more. The MCP integration also works with any MCP-compatible client.

screenpipe uses approximately 20GB of storage per month. Storage is compressed, and you can configure retention periods.
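
To pick a retention period, the monthly figure can be turned into a back-of-the-envelope estimate. The sketch below assumes a 30-day month and steady usage; actual disk use will vary with screen activity and settings, so treat the result as a rough guide only.

```python
GB_PER_MONTH = 20    # approximate figure quoted above
DAYS_PER_MONTH = 30  # simplifying assumption


def estimated_storage_gb(retention_days):
    """Estimate disk usage in GB for a given retention period in days."""
    return GB_PER_MONTH * retention_days / DAYS_PER_MONTH
```

Under these assumptions, keeping 90 days of history would need about 60GB.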

Commercial use is allowed. screenpipe is MIT licensed, which permits commercial use, modification, and distribution. You can build paid products on top of screenpipe without licensing fees.

The API provides access to recent frames and can be polled frequently for near-real-time access. WebSocket streaming is on the roadmap. For now, polling every 1-2 seconds works well for most real-time use cases.
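
A polling loop along those lines could be sketched as follows. The `fetch` argument is any callable that returns a list of result items shaped like the search examples (`item["content"]["timestamp"]`, `item["content"]["text"]`); how it queries screenpipe is left to the caller. The function name and the de-duplication key are assumptions made for illustration.

```python
import time


def poll(fetch, on_item, interval=1.0, max_polls=None, sleep=time.sleep):
    """Call fetch() every `interval` seconds and pass unseen items to on_item.

    Items are de-duplicated by (timestamp, text), since consecutive
    polls are likely to return overlapping results.
    """
    seen = set()  # grows without bound; trim it in a long-running agent
    polls = 0
    while max_polls is None or polls < max_polls:
        for item in fetch():
            content = item["content"]
            key = (content.get("timestamp"), content.get("text"))
            if key not in seen:  # skip items delivered by an earlier poll
                seen.add(key)
                on_item(item)
        polls += 1
        if max_polls is None or polls < max_polls:
            sleep(interval)
```

With `interval` set between 1 and 2 seconds this matches the polling cadence suggested above; `max_polls` and `sleep` exist mainly so the loop can be bounded and tested.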

Start building

From zero to desktop AI agent in minutes. Open source.