Accessibility Technology

Control Your Phone With Eyes & Gestures

A production-ready system for touchless interaction using hybrid gaze estimation, 6-DOF head pose, temporal stabilization, micro-saccade filtering, dynamic calibration & AI intent prediction — 100% on-device, no cloud, no neural implants.

<80ms Latency · 60 FPS Camera · 9-Point Calibration · 100% On-Device
Real-time Gaze Tracking

System Layers

Vision Input Layer

MediaPipe Face Mesh + Hands captures eye position, head pose, and hand landmarks at 60/120 FPS with multi-point iris + eyelid contours.

  • Pupil + iris detection
  • Head pose estimation
  • 21-point hand landmarks
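
The iris bullets above reduce to averaging a handful of mesh points into one pupil coordinate. A minimal sketch, assuming MediaPipe Face Mesh with iris refinement, where indices 468–472 are the left-iris points; the function name is illustrative:

```javascript
// Average the left-iris landmarks into a single pupil center.
// Landmark layout assumes the refined Face Mesh model, where
// indices 468-472 are the left-iris center + 4 boundary points.
const LEFT_IRIS = [468, 469, 470, 471, 472];

function irisCenter(landmarks, indices) {
  // landmarks: array of { x, y } in normalized [0, 1] image coordinates
  let sx = 0, sy = 0;
  for (const i of indices) {
    sx += landmarks[i].x;
    sy += landmarks[i].y;
  }
  return { x: sx / indices.length, y: sy / indices.length };
}
```

The right iris works the same way with indices 473–477.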

Gaze Mapping Engine

Hybrid gaze model fuses binocular iris offset, head pose vector, and pupil boundary with temporal filtering to eliminate jitter.

  • Adaptive Kalman + EMA + window
  • Micro-saccade filter (12px/200ms)
  • Confidence-weighted fusion
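
A minimal sketch of the stabilization stage: an EMA smoother plus a dead-band that ignores micro-saccade-sized excursions, using the 12px/200ms figures from the bullets above. The class structure and the 2× escape threshold are illustrative assumptions, not the shipped filter:

```javascript
// EMA smoothing + micro-saccade dead-band (sketch).
// Small excursions (< deadBandPx) are held in place; a move only
// passes through if it is sustained past holdMs or clearly large.
class GazeStabilizer {
  constructor({ alpha = 0.3, deadBandPx = 12, holdMs = 200 } = {}) {
    this.alpha = alpha;
    this.deadBandPx = deadBandPx;
    this.holdMs = holdMs;
    this.smoothed = null;        // EMA state
    this.held = null;            // last emitted point
    this.excursionStart = null;  // when gaze first left the dead band
  }
  update(x, y, tMs) {
    // 1. EMA smoothing
    if (this.smoothed === null) this.smoothed = { x, y };
    else {
      this.smoothed.x = this.alpha * x + (1 - this.alpha) * this.smoothed.x;
      this.smoothed.y = this.alpha * y + (1 - this.alpha) * this.smoothed.y;
    }
    const s = this.smoothed;
    if (this.held === null) { this.held = { x: s.x, y: s.y }; return { ...this.held }; }
    const dist = Math.hypot(s.x - this.held.x, s.y - this.held.y);
    if (dist < this.deadBandPx) {
      this.excursionStart = null;          // inside dead band: hold position
    } else {
      if (this.excursionStart === null) this.excursionStart = tMs;
      if (tMs - this.excursionStart >= this.holdMs || dist >= 2 * this.deadBandPx) {
        this.held = { x: s.x, y: s.y };    // sustained or large move: follow
        this.excursionStart = null;
      }
    }
    return { ...this.held };
  }
}
```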

UI Target Detection

Registers interactive components as bounding boxes. Detects gaze intersection with 300ms dwell time.

  • Bounding box registry
  • Dwell-time focus
  • Visual glow feedback
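
The registry and dwell logic above can be sketched as a bounding-box map plus a timer; names and structure are illustrative:

```javascript
// Bounding-box registry with 300ms dwell-time focus (sketch).
class DwellDetector {
  constructor(dwellMs = 300) {
    this.dwellMs = dwellMs;
    this.targets = new Map();  // id -> { x, y, width, height }
    this.current = null;       // id the gaze is currently resting on
    this.enteredAt = null;
  }
  register(id, box) { this.targets.set(id, box); }
  hitTest(x, y) {
    for (const [id, b] of this.targets) {
      if (x >= b.x && x <= b.x + b.width && y >= b.y && y <= b.y + b.height) return id;
    }
    return null;
  }
  // Returns the focused element id once dwell time is met, else null.
  update(x, y, tMs) {
    const id = this.hitTest(x, y);
    if (id !== this.current) {
      this.current = id;
      this.enteredAt = id ? tMs : null;
      return null;
    }
    if (id && tMs - this.enteredAt >= this.dwellMs) return id;
    return null;
  }
}
```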

Gesture Engine

MediaPipe Hands landmarks power pinch detection, air tap recognition, and open palm cancel.

  • Pinch → Select
  • Air tap → Click
  • Open palm → Cancel
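
The pinch mapping can be sketched directly from the 21-point hand model, where index 4 is the thumb tip and index 8 the index fingertip; the 0.05 normalized-distance threshold matches the gesture table later in this document:

```javascript
// Pinch = thumb tip and index fingertip closer than a normalized
// distance threshold. Landmark indices follow MediaPipe Hands.
function isPinch(landmarks, threshold = 0.05) {
  const thumb = landmarks[4];
  const index = landmarks[8];
  const d = Math.hypot(thumb.x - index.x, thumb.y - index.y, thumb.z - index.z);
  return d < threshold;
}
```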

Calibration System

9-point calibration + continuous dynamic micro-calibration from confirmed interactions keeps gaze drift corrected over time.

  • 9-point polynomial regression
  • Dynamic bias drift correction
  • Confidence-gated updates
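
The drift-correction bullets can be sketched as a confidence-gated bias update driven by confirmed interactions: each time the user activates an element, the gap between the gaze point and the element's center nudges a bias term. The gain and gate values here are illustrative assumptions:

```javascript
// Dynamic micro-calibration sketch: bias drifts toward the observed
// gaze error, but only when tracking confidence clears the gate.
class DriftCorrector {
  constructor({ gain = 0.1, minConfidence = 0.8 } = {}) {
    this.gain = gain;
    this.minConfidence = minConfidence;
    this.bias = { x: 0, y: 0 };
  }
  // Called on every confirmed activation.
  observe(gaze, elementCenter, confidence) {
    if (confidence < this.minConfidence) return;  // confidence-gated update
    this.bias.x += this.gain * (elementCenter.x - gaze.x - this.bias.x);
    this.bias.y += this.gain * (elementCenter.y - gaze.y - this.bias.y);
  }
  apply(gaze) {
    return { x: gaze.x + this.bias.x, y: gaze.y + this.bias.y };
  }
}
```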

Privacy & Safety

All video processing is 100% on-device. No camera data ever leaves the browser or device.

  • Zero cloud processing
  • No data transmission
  • Local calibration storage

How It Works

01

Launch

User launches accessibility mode — camera permissions requested

02

Calibrate

9-point polynomial regression builds personalized gaze-to-screen model

03

Gaze

User looks at a UI element — system detects and highlights it

04

Gesture

User performs pinch or air tap to confirm and activate the element

05

Action

Element activates with visual + audio feedback confirming the selection

Layer 1 — Vision Input
  • Front Camera — 120/60/30 FPS auto
  • Face Mesh — 468 landmarks
  • Hand Tracking — 21 landmarks
  • Head Pose — 6-DOF estimation

Layer 2 — Processing
  • Gaze Vector — pupil → direction
  • Kalman Filter — noise smoothing
  • EMA Filter — α=0.3 smoothing
  • Gesture Classifier — landmark distances

Layer 3 — Gaze Mapping
  • Calibration Model — polynomial regression
  • Screen Mapping — gaze → (x, y) coords
  • Dwell Timer — 300ms stabilization
  • Debounce Logic — anti-jitter guard

Layer 4 — UI Interaction
  • Element Registry — bounding boxes
  • Hit Detection — gaze ∩ bbox
  • Visual Feedback — glow + highlight
  • Audio TTS — speech feedback

Layer 5 — Action Output
  • Element Activation — simulated tap/click
  • Action Log — local storage only
  • Privacy Guard — zero data egress

Technology Stack

Frontend

Hono + TypeScript · Vanilla JS · CSS Animations · Web Speech API

Vision AI

MediaPipe Face Mesh · MediaPipe Hands · WebGL Backend · WASM Processing

Signal Processing

Kalman Filter · EMA Smoothing · Polynomial Regression · Debounce Logic

Deployment

Cloudflare Pages · Edge Network · Zero-latency CDN · HTTPS Only

Performance Targets

  • <80ms — System Latency — end-to-end response time
  • 60 FPS — Camera Processing — real-time frame analysis
  • <150ms — Gesture Detection — hand gesture recognition
  • >85% — Gaze Accuracy — post dynamic-calibration precision


Accessibility Demo — Messaging App

Start camera, then move your gaze over buttons. Perform pinch or air tap to activate.

Hi! This is the AccessEye demo. Try looking at the buttons below and performing a pinch gesture to interact.

10:30 AM

The system will highlight buttons as your gaze focuses on them. Hold your gaze for 300ms to select.

10:31 AM

Overview

AccessEye provides a JavaScript API for integrating eye tracking and gesture control into any web application. All processing runs client-side using MediaPipe WebGL workers.

Privacy First: No video data ever leaves the device. All ML inference runs in WebAssembly/WebGL workers in the browser.

Initialization

// Initialize the AccessEye system
const eye = new AccessEye({
  videoElement: document.getElementById('camera'),
  overlayCanvas: document.getElementById('overlay'),
  dwellTime: 300,      // ms before element focuses
  smoothing: 0.3,      // EMA alpha (0-1)
  useKalman: true,     // Kalman filter enabled
  audioFeedback: true,  // Web Speech API TTS
  debug: false
});

await eye.initialize();
await eye.startCamera();

Register UI Elements

Register interactive elements to make them gaze-targetable:

// Register a single element
eye.registerElement({
  id: 'sendButton',
  element: document.getElementById('send-btn'),
  label: 'Send Message',    // TTS label
  onActivate: () => sendMessage()
});

// Or register multiple at once
eye.registerElements([
  { id: 'sendBtn',  x: 200, y: 400, width: 120, height: 60,
    label: 'Send', onActivate: () => send() },
  { id: 'menuIcon', x: 20,  y: 40,  width: 40,  height: 40,
    label: 'Menu', onActivate: () => openMenu() }
]);

// Remove element
eye.unregisterElement('sendButton');

Calibration System

// Run 9-point calibration flow
const result = await eye.calibrate({
  points: 9,           // 9-point polynomial grid
  samplesPerPoint: 30, // frames to average
  timeout: 10000       // max 10s
});

// Calibration result
// { success: true, accuracy: 92.3, model: [...] }

// Save calibration (localStorage)
eye.saveCalibration();

// Load saved calibration
eye.loadCalibration();

// Calibration points schema (3×3 grid)
// TopLeft(10%,10%), TopCenter(50%,10%), TopRight(90%,10%),
// MidLeft(10%,50%), Center(50%,50%), MidRight(90%,50%),
// BottomLeft(10%,90%), BottomCenter(50%,90%), BottomRight(90%,90%)
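
For intuition, here is the degree-1 (affine) special case of the screen-mapping fit, solved per axis in closed form by least squares; the full 9-point model would add higher-order polynomial terms on top of this. Function names are illustrative:

```javascript
// Least-squares fit of target = a * raw + b for one axis.
function fitAxis(raw, target) {
  const n = raw.length;
  const mean = (v) => v.reduce((s, x) => s + x, 0) / n;
  const mr = mean(raw), mt = mean(target);
  let num = 0, den = 0;
  for (let i = 0; i < n; i++) {
    num += (raw[i] - mr) * (target[i] - mt);
    den += (raw[i] - mr) ** 2;
  }
  const a = num / den;
  return { a, b: mt - a * mr };
}

// Apply the fitted per-axis model to a raw gaze sample.
function mapGaze(model, gx, gy) {
  return { x: model.x.a * gx + model.x.b, y: model.y.a * gy + model.y.b };
}
```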

Event System

// Listen for gaze events
eye.on('gaze', ({ x, y, confidence }) => {
  console.log(`Gaze at ${x}, ${y}`);
});

// Element focused (gaze entered + dwell met)
eye.on('focus', ({ elementId, label }) => {
  console.log(`Focused: ${label}`);
});

// Element activated (gesture confirmed)
eye.on('activate', ({ elementId, gesture }) => {
  console.log(`Activated via ${gesture}`);
});

// Gesture detected
eye.on('gesture', ({ type, confidence }) => {
  // type: 'pinch' | 'airTap' | 'openPalm'
});

Gaze Engine Internals

  • Pupil Detection — Face Mesh iris landmarks 468–477 — left/right iris center coords
  • Gaze Vector — head pose + iris offset — 3D direction from eye
  • Kalman Filter — 2-state Kalman (pos + vel) — removes jitter noise
  • EMA Smoothing — α=0.3 per axis — temporal smoothing
  • Screen Mapping — polynomial regression — calibrated gaze → screen coords
  • Dwell Timer — 300ms window — anti-accidental selection
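
The two-state filter in the table can be sketched per axis as a constant-velocity Kalman filter. The process and measurement noise values (q, r) below are illustrative, not the tuned production constants:

```javascript
// 1D Kalman filter with state [position, velocity] and a
// position-only measurement (H = [1, 0]).
class Kalman1D {
  constructor({ q = 0.01, r = 4 } = {}) {
    this.q = q; this.r = r;        // process / measurement noise
    this.x = [0, 0];               // [position, velocity]
    this.P = [[1, 0], [0, 1]];     // state covariance
    this.initialized = false;
  }
  update(z, dt = 1 / 60) {
    if (!this.initialized) { this.x[0] = z; this.initialized = true; return z; }
    // Predict with the constant-velocity model
    const xp = this.x[0] + dt * this.x[1];
    const vp = this.x[1];
    const P = this.P, q = this.q;
    const P00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + q;
    const P01 = P[0][1] + dt * P[1][1];
    const P10 = P[1][0] + dt * P[1][1];
    const P11 = P[1][1] + q;
    // Correct with the position measurement z
    const S = P00 + this.r;
    const k0 = P00 / S, k1 = P10 / S;
    const innov = z - xp;
    this.x = [xp + k0 * innov, vp + k1 * innov];
    this.P = [
      [(1 - k0) * P00, (1 - k0) * P01],
      [P10 - k1 * P00, P11 - k1 * P01],
    ];
    return this.x[0];
  }
}
```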

Gesture Recognition

  • Pinch — thumb–index distance < 0.05 (normalized) — Select / Click — 500ms debounce
  • Air Tap — index fingertip forward z-delta > 0.04 within 150ms — Click — 600ms debounce
  • Open Palm — average finger spread > 0.08 — Cancel / Back — 800ms debounce
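
As a sketch, the air-tap row above (forward z-delta over a short window, with a debounce guard) might look like the following; the class structure and the camera-facing sign convention are assumptions:

```javascript
// Air-tap detector: fires when the index fingertip moves forward by
// more than zDelta within windowMs, at most once per debounceMs.
class AirTapDetector {
  constructor({ zDelta = 0.04, windowMs = 150, debounceMs = 600 } = {}) {
    this.zDelta = zDelta; this.windowMs = windowMs; this.debounceMs = debounceMs;
    this.samples = [];          // recent { z, t } of the index fingertip
    this.lastFired = -Infinity;
  }
  update(indexTipZ, tMs) {
    this.samples.push({ z: indexTipZ, t: tMs });
    // keep only samples inside the detection window
    while (this.samples.length && tMs - this.samples[0].t > this.windowMs) {
      this.samples.shift();
    }
    if (tMs - this.lastFired < this.debounceMs) return false;  // debounce guard
    const maxZ = Math.max(...this.samples.map((s) => s.z));
    // MediaPipe z decreases toward the camera, so a forward tap is a drop
    if (maxZ - indexTipZ > this.zDelta) {
      this.lastFired = tMs;
      this.samples = [];
      return true;
    }
    return false;
  }
}
```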

Testing Plan

Glasses Users

Test with thick-frame glasses and anti-reflective coatings. Adjust iris detection threshold for glare compensation.

✓ Supported

Low Lighting

Test at <100 lux. MediaPipe Face Mesh remains robust down to 50 lux with a confidence threshold of 0.6.

✓ Supported

Slow Head Movement

EMA + Kalman smoothing accommodates users with slow, tremulous head movement. Dwell window extended to 400ms for motor impairments.

✓ Supported

Tremor Conditions

Kalman velocity state dampens high-frequency tremor. Gaze stabilization window set to 300–400ms.

✓ Supported
Chrome Extension — Manifest V3

AccessEye Voice Control

Control any website with your voice. Open tabs, scroll pages, navigate back and forward — all hands-free. Runs persistently in the background across every tab you visit.

Manifest V3 · Always-On Persistent · All-Site Coverage · Zero-Cloud Privacy
Download Extension ZIP · Unpack & Load in Chrome

Get Set Up in 4 Steps

1 — Download the Extension

Click the download button above to get the extension ZIP file onto your computer.

2 — Unzip the File

Extract the ZIP to a permanent folder on your computer. Don't delete this folder — Chrome needs it.

# Mac / Linux
unzip accesseye-extension.zip -d ~/AccessEye

# Windows: Right-click → Extract All

3 — Load in Chrome

Open Chrome's extension manager, enable developer mode, and load the unpacked extension folder.

  a. Toggle Developer mode ON (top-right)
  b. Click Load unpacked
  c. Select the extension/ folder inside your unzipped directory
  d. AccessEye appears in your extensions list ✓

4 — Start Listening

Click the AccessEye icon in your Chrome toolbar, then hit Start Listening. Allow microphone access when prompted. You're live.

  • Pin it to your toolbar for quick access
  • Works on every website automatically
  • Voice recognition runs in the background

All Voice Commands

Tab Control
  • "New tab" — Open + focus blank tab
  • "Next tab" — Switch right
  • "Previous tab" — Switch left
  • "Close tab" — Close current tab
  • "Go to next tab" — Same as next tab
  • "Switch tab" — Same as next tab

Scrolling
  • "Scroll down" — 600px down
  • "Scroll up" — 600px up
  • "Scroll to top" — Jump to top
  • "Scroll to bottom" — Jump to bottom
  • "Go down" — Same as scroll down
  • "Page down" — Same as scroll down

Navigation
  • "Go back" — Browser back
  • "Go forward" — Browser forward
  • "Reload" — Refresh page
  • "Refresh" — Same as reload

Interaction
  • "Click [text]" — Click element by label
  • "Press [text]" — Same as click
  • "Tap [text]" — Same as click
  • "Select [text]" — Same as click

How It Works

Mic Input — Web Speech API in an offscreen document
  ↓
Voice Engine — normalize + parse, intent mapping
  ↓
Background Worker — service worker, always running
  ↓
Chrome APIs — tabs.create(), tabs.update()
  ↓
Content Script — injected in every page for scroll/click