How Recording Works

Understand what mimic captures and how it transforms your actions into automation.

What Gets Captured

When you start a recording session, mimic observes multiple data streams simultaneously to build a complete picture of your workflow:

Screenshots

Captured automatically on every mouse click, providing visual context for each action.

Mouse Activity

Click positions, scroll events, and movement heatmaps show interaction patterns.

Keystrokes

Keystrokes are captured so that mimic understands how you interact with your computer and can perform the task correctly the first time.

Contextual Data

Beyond raw inputs, mimic also records rich contextual information:

Window Timeline: Tracks which applications you use and for how long.
Browser URLs: Captures visited pages in Chrome, Safari, Firefox, Arc, Edge, and Brave.
Clipboard Events: Notes when you copy and paste content (content preview only, not full text).
Audio Narration: Optional voice recording is transcribed with timestamps for step-by-step context.
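
To make two of these items concrete, here is a minimal Python sketch of how a window timeline could be reduced to per-application durations and how a clipboard entry could be shortened to a preview. It is not mimic's actual implementation: the function names, the (timestamp, app) event format, and the 40-character preview limit are all assumptions made for illustration.

```python
from collections import defaultdict


def app_durations(focus_events, session_end):
    """Sum how long each application held focus.

    focus_events: list of (timestamp_seconds, app_name), sorted by time.
                  (Hypothetical format, not mimic's real schema.)
    session_end:  timestamp in seconds at which the recording stopped.
    """
    totals = defaultdict(float)
    # Each focus period ends when the next one starts;
    # the last period ends when the session ends.
    end_times = [t for t, _ in focus_events[1:]] + [session_end]
    for (start, app), end in zip(focus_events, end_times):
        totals[app] += end - start
    return dict(totals)


def clipboard_preview(text, limit=40):
    """Keep only a short preview of copied text (limit is an assumed value)."""
    return text if len(text) <= limit else text[:limit] + "..."
```

For example, `app_durations([(0.0, "Chrome"), (12.0, "Excel"), (30.0, "Chrome")], 45.0)` credits Chrome with 27 seconds and Excel with 18 seconds.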

Recording Intent

Before each session, you can specify your recording intent to help the AI better interpret your actions:

Purpose	Description
Workflow	Capture a repeatable procedure for SOP creation
Training	Create instructional material for team onboarding
Optimization	Analyze for AI or automation opportunities
Documentation	General reference and process documentation

Capture Styles

You can also choose how the recording should be interpreted:

Literal: Exact steps as performed—best for precise, repetitive tasks.
Generalized: Abstract into a flexible template—best for variable workflows.

Generalized capture is useful when specific values (like dates or names) change between runs, allowing the AI to parameterize those inputs.
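
The difference between the two styles can be sketched in a few lines of Python. This only illustrates the idea of parameterizing inputs and does not show how mimic's AI performs it: the step strings, the `generalize` and `render` helpers, and the parameter names are hypothetical.

```python
from string import Template

# Literal capture: the steps exactly as performed, with concrete values.
literal_steps = [
    "Open the report for 2024-03-01",
    "Type 'Alice Smith' into the Name field",
]


def generalize(steps, values):
    """Replace each concrete value with a $placeholder.

    values maps a parameter name to the literal text seen in the recording.
    Assumes the step text contains no other '$' characters.
    """
    generalized = []
    for step in steps:
        for name, literal in values.items():
            step = step.replace(literal, "$" + name)
        generalized.append(step)
    return generalized


def render(steps, **params):
    """Fill the placeholders with the values for a particular run."""
    return [Template(step).substitute(params) for step in steps]


# Generalized capture: the same steps as a reusable template.
template_steps = generalize(
    literal_steps, {"date": "2024-03-01", "name": "Alice Smith"}
)
```

Calling `render(template_steps, date="2024-04-01", name="Bob Lee")` then produces the same two steps with the new date and name filled in.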

Session Data Structure

Each recording session produces a structured JSON file containing all captured data, organized for AI processing. This includes timestamped screenshots, interaction logs, window changes, and optional audio transcripts with speaker detection.
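
The exact schema is not documented here, so the following Python sketch only shows the kind of shape such a file might take and how it could be loaded. Every field name (`intent`, `capture_style`, `events`, `timestamp`, `kind`, `detail`) and every value in the example is an assumption, not mimic's real format.

```python
import json
from dataclasses import dataclass, field


@dataclass
class Event:
    timestamp: float                 # seconds since the session started
    kind: str                        # e.g. "screenshot", "window_change", "transcript"
    detail: dict = field(default_factory=dict)


@dataclass
class Session:
    intent: str                      # e.g. "workflow", "training"
    capture_style: str               # "literal" or "generalized"
    events: list


def load_session(raw):
    """Parse a session JSON string and order its events by timestamp."""
    data = json.loads(raw)
    events = [
        Event(e["timestamp"], e["kind"], e.get("detail", {}))
        for e in data.get("events", [])
    ]
    events.sort(key=lambda e: e.timestamp)
    return Session(data["intent"], data["capture_style"], events)


# Hypothetical session file contents.
example = """
{
  "intent": "workflow",
  "capture_style": "generalized",
  "events": [
    {"timestamp": 2.5, "kind": "window_change", "detail": {"app": "Chrome"}},
    {"timestamp": 1.0, "kind": "screenshot", "detail": {"path": "shot_0001.png"}},
    {"timestamp": 4.0, "kind": "transcript",
     "detail": {"speaker": "A", "text": "Open the report"}}
  ]
}
"""

session = load_session(example)
```

Keeping every stream in one time-ordered event list is one possible design choice; it makes it easy to line up a screenshot with the window change or narration that occurred around the same moment.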