Accessibility Technology

Control Your Phone With
Eyes & Gestures

A production-ready system for touchless interaction using hybrid gaze estimation, 6-DOF head pose, temporal stabilization, micro-saccade filtering, dynamic calibration, and AI intent prediction — 100% on-device, no cloud, no neural implants.

<80 ms Latency
60 FPS Camera
9-Point Calibration
100% On-Device
Real-time Gaze Tracking

System Layers

Vision Input Layer

MediaPipe Face Mesh and Hands capture eye position, head pose, and hand landmarks at 60–120 FPS, with multi-point iris and eyelid contours.

  • Pupil + iris detection
  • Head pose estimation
  • 21-point hand landmarks
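
With MediaPipe's refined-landmark mode, iris points are appended to the base face mesh (indices 468–472 for one eye, 473–477 for the other). As a small illustration, an iris center can be taken as the mean of each five-point ring; the `irisCenter` helper below is an assumption for demonstration, not AccessEye's actual code:

```typescript
// Sketch: extract an iris center from a refined Face Mesh result.
// With iris refinement enabled, MediaPipe emits 478 landmarks; indices
// 468-472 form one iris ring and 473-477 the other. Averaging the ring
// is a robustness choice here (the first index is the center point).
type LM = { x: number; y: number };

function irisCenter(landmarks: LM[], start: number): LM {
  let x = 0;
  let y = 0;
  for (let i = start; i < start + 5; i++) {
    x += landmarks[i].x;
    y += landmarks[i].y;
  }
  return { x: x / 5, y: y / 5 };
}
```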

Gaze Mapping Engine

A hybrid gaze model fuses binocular iris offset, the head pose vector, and the pupil boundary, with temporal filtering to suppress jitter.

  • Adaptive Kalman + EMA + window
  • Micro-saccade filter (12px/200ms)
  • Confidence-weighted fusion
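
As a rough illustration of the stabilization above, the sketch below combines an EMA smoother with a micro-saccade gate using the 12 px / 200 ms figures quoted. The `GazeSmoother` class and its exact gating rule are assumptions for demonstration, not the production filter:

```typescript
// Minimal sketch: EMA smoothing plus a micro-saccade gate.
// Tiny, fast jumps (<12 px within 200 ms) are treated as noise and held.
type GazePoint = { x: number; y: number; t: number }; // t = timestamp in ms

class GazeSmoother {
  private last: GazePoint | null = null;
  constructor(private alpha = 0.3) {}

  update(raw: GazePoint): GazePoint {
    if (this.last === null) {
      this.last = raw;
      return raw;
    }
    const dx = raw.x - this.last.x;
    const dy = raw.y - this.last.y;
    const dt = raw.t - this.last.t;
    // Micro-saccade gate: hold the previous estimate for small fast jumps.
    if (Math.hypot(dx, dy) < 12 && dt < 200) {
      return this.last;
    }
    // EMA: blend the new sample toward the previous estimate.
    this.last = {
      x: this.alpha * raw.x + (1 - this.alpha) * this.last.x,
      y: this.alpha * raw.y + (1 - this.alpha) * this.last.y,
      t: raw.t,
    };
    return this.last;
  }
}
```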

UI Target Detection

Registers interactive components as bounding boxes and detects gaze intersection, triggering focus after a 300 ms dwell time.

  • Bounding box registry
  • Dwell-time focus
  • Visual glow feedback
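
The dwell-time focus logic can be sketched as follows, using the 300 ms value above. The `DwellDetector` class is illustrative, not the actual registry implementation:

```typescript
// Sketch: gaze-over-bounding-box hit detection with a dwell timer.
interface Target { id: string; x: number; y: number; width: number; height: number }

class DwellDetector {
  private focusId: string | null = null;
  private focusStart = 0;
  constructor(private targets: Target[], private dwellMs = 300) {}

  // Returns a target id once gaze has rested on it for dwellMs;
  // a real system would also debounce repeated activations.
  update(gx: number, gy: number, now: number): string | null {
    const hit = this.targets.find(
      (t) => gx >= t.x && gx <= t.x + t.width && gy >= t.y && gy <= t.y + t.height
    );
    if (!hit) {
      this.focusId = null;
      return null;
    }
    if (hit.id !== this.focusId) {
      this.focusId = hit.id; // gaze entered a new target: restart the timer
      this.focusStart = now;
      return null;
    }
    return now - this.focusStart >= this.dwellMs ? hit.id : null;
  }
}
```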

Gesture Engine

MediaPipe Hands landmarks power pinch detection, air tap recognition, and open palm cancel.

  • Pinch → Select
  • Air tap → Click
  • Open palm → Cancel
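
Pinch detection reduces to a thumb-tip/index-tip distance test over the normalized landmarks (MediaPipe Hands indices 4 and 8), using the 0.05 threshold quoted later in this document. This is a sketch of the principle, not the engine's full classifier:

```typescript
// Sketch: pinch detection from normalized MediaPipe Hands landmarks.
// Index 4 is the thumb tip, index 8 the index fingertip.
type Landmark = { x: number; y: number; z: number };

function isPinch(landmarks: Landmark[], threshold = 0.05): boolean {
  const thumb = landmarks[4];
  const index = landmarks[8];
  const d = Math.hypot(thumb.x - index.x, thumb.y - index.y, thumb.z - index.z);
  return d < threshold;
}
```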

Calibration System

9-point calibration plus continuous dynamic micro-calibration from confirmed interactions corrects gaze drift over time.

  • 9-point polynomial regression
  • Dynamic bias drift correction
  • Confidence-gated updates
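
The dynamic drift correction can be sketched as a bias offset nudged toward the residual each time the user confirms a selection (so the true target is known). The `DriftCorrector` class, its learning rate, and the 0.8 confidence gate are illustrative assumptions:

```typescript
// Sketch: confidence-gated bias drift correction from confirmed interactions.
class DriftCorrector {
  private biasX = 0;
  private biasY = 0;
  constructor(private rate = 0.1, private minConfidence = 0.8) {}

  // Apply the current bias to a raw gaze estimate.
  apply(x: number, y: number): { x: number; y: number } {
    return { x: x + this.biasX, y: y + this.biasY };
  }

  // Called after a confirmed interaction: nudge the bias toward the
  // residual between predicted gaze and the activated element's center.
  observe(predX: number, predY: number, trueX: number, trueY: number, conf: number): void {
    if (conf < this.minConfidence) return; // confidence-gated update
    this.biasX += this.rate * (trueX - (predX + this.biasX));
    this.biasY += this.rate * (trueY - (predY + this.biasY));
  }
}
```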

Privacy & Safety

All video processing is 100% on-device. No camera data ever leaves the browser or device.

  • Zero cloud processing
  • No data transmission
  • Local calibration storage

How It Works

01

Launch

User launches accessibility mode — camera permissions requested

02

Calibrate

9-point calibration builds a personalized gaze-to-screen model

03

Gaze

User looks at a UI element — system detects and highlights it

04

Gesture

User performs pinch or air tap to confirm and activate the element

05

Action

Element activates with visual + audio feedback confirming the selection

Layer 1 — Vision Input
  • Front Camera: 120/60/30 FPS (auto)
  • Face Mesh: 468 landmarks
  • Hand Tracking: 21 landmarks
  • Head Pose: 6-DOF estimation
Layer 2 — Processing
  • Gaze Vector: pupil → direction
  • Kalman Filter: noise smoothing
  • EMA Filter: α=0.3 smoothing
  • Gesture Classifier: landmark distances
Layer 3 — Gaze Mapping
  • Calibration Model: polynomial regression
  • Screen Mapping: gaze → (x, y) coords
  • Dwell Timer: 300 ms stabilization
  • Debounce Logic: anti-jitter guard
Layer 4 — UI Interaction
  • Element Registry: bounding boxes
  • Hit Detection: gaze ∩ bbox
  • Visual Feedback: glow + highlight
  • Audio: TTS speech feedback
Layer 5 — Action Output
  • Element Activation: simulated tap/click
  • Action Log: local storage only
  • Privacy Guard: zero data egress

Technology Stack

Frontend

  • Hono + TypeScript
  • Vanilla JS
  • CSS Animations
  • Web Speech API

Vision AI

  • MediaPipe Face Mesh
  • MediaPipe Hands
  • WebGL Backend
  • WASM Processing

Signal Processing

  • Kalman Filter
  • EMA Smoothing
  • Polynomial Regression
  • Debounce Logic

Deployment

  • Cloudflare Pages
  • Edge Network
  • Zero-latency CDN
  • HTTPS Only

Performance Targets

  • System Latency: <80 ms (end-to-end response time)
  • Camera Processing: 60 FPS (real-time frame analysis)
  • Gesture Detection: <150 ms (hand gesture recognition)
  • Gaze Accuracy: >85% (post dynamic-calibration precision)


Accessibility Demo — Messaging App

Start the camera, then move your gaze over the buttons. Perform a pinch or air tap to activate.

Hi! This is the AccessEye demo. Try looking at the buttons below and performing a pinch gesture to interact.

10:30 AM

The system will highlight buttons as your gaze focuses on them. Hold your gaze for 300ms to select.

10:31 AM
Gaze at a reply, then pinch:

Overview

AccessEye provides a JavaScript API for integrating eye tracking and gesture control into any web application. All processing runs client-side using MediaPipe WebGL workers.

Privacy First: No video data ever leaves the device. All ML inference runs in WebAssembly/WebGL workers in the browser.

Initialization

// Initialize the AccessEye system
const eye = new AccessEye({
  videoElement: document.getElementById('camera'),
  overlayCanvas: document.getElementById('overlay'),
  dwellTime: 300,      // ms before element focuses
  smoothing: 0.3,      // EMA alpha (0-1)
  useKalman: true,     // Kalman filter enabled
  audioFeedback: true,  // Web Speech API TTS
  debug: false
});

await eye.initialize();
await eye.startCamera();

Register UI Elements

Register interactive elements to make them gaze-targetable:

// Register a single element
eye.registerElement({
  id: 'sendButton',
  element: document.getElementById('send-btn'),
  label: 'Send Message',    // TTS label
  onActivate: () => sendMessage()
});

// Or register multiple at once
eye.registerElements([
  { id: 'sendBtn',  x: 200, y: 400, width: 120, height: 60,
    label: 'Send', onActivate: () => send() },
  { id: 'menuIcon', x: 20,  y: 40,  width: 40,  height: 40,
    label: 'Menu', onActivate: () => openMenu() }
]);

// Remove element
eye.unregisterElement('sendButton');

Calibration System

// Run 5-point calibration flow
const result = await eye.calibrate({
  points: 5,          // 5-point grid
  samplesPerPoint: 30, // frames to average
  timeout: 10000      // max 10s
});

// Calibration result
// { success: true, accuracy: 92.3, model: [...] }

// Save calibration (localStorage)
eye.saveCalibration();

// Load saved calibration
eye.loadCalibration();

// Calibration points schema
// TopLeft(10%,10%), TopRight(90%,10%),
// Center(50%,50%), BottomLeft(10%,90%),
// BottomRight(90%,90%)
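
For intuition, the sketch below fits an affine (first-order polynomial) map from raw gaze coordinates to one screen axis by least squares. The real calibration model may include higher-order terms; `solve3`, `fitAxis`, and the sample shape are illustrative names:

```typescript
// Sketch: least-squares fit of screen = a*gx + b*gy + c from calibration samples.
type Sample = { gx: number; gy: number; sx: number; sy: number };

// Solve a 3x3 linear system by Gauss-Jordan elimination with partial pivoting.
function solve3(m: number[][], rhs: number[]): number[] {
  const a = m.map((row, i) => [...row, rhs[i]]);
  for (let col = 0; col < 3; col++) {
    let piv = col;
    for (let r = col + 1; r < 3; r++)
      if (Math.abs(a[r][col]) > Math.abs(a[piv][col])) piv = r;
    [a[col], a[piv]] = [a[piv], a[col]];
    for (let r = 0; r < 3; r++) {
      if (r === col) continue;
      const f = a[r][col] / a[col][col];
      for (let c = col; c < 4; c++) a[r][c] -= f * a[col][c];
    }
  }
  return a.map((row, i) => row[3] / row[i][i]);
}

// Fit one screen axis; `out` selects sx or sy from each sample.
// Builds the normal equations (A^T A) x = A^T b and solves them.
function fitAxis(samples: Sample[], out: (s: Sample) => number): number[] {
  const ata = [[0, 0, 0], [0, 0, 0], [0, 0, 0]];
  const atb = [0, 0, 0];
  for (const s of samples) {
    const row = [s.gx, s.gy, 1];
    for (let i = 0; i < 3; i++) {
      atb[i] += row[i] * out(s);
      for (let j = 0; j < 3; j++) ata[i][j] += row[i] * row[j];
    }
  }
  return solve3(ata, atb); // [a, b, c] such that a*gx + b*gy + c ≈ screen
}
```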

Event System

// Listen for gaze events
eye.on('gaze', ({ x, y, confidence }) => {
  console.log(`Gaze at ${x}, ${y}`);
});

// Element focused (gaze entered + dwell met)
eye.on('focus', ({ elementId, label }) => {
  console.log(`Focused: ${label}`);
});

// Element activated (gesture confirmed)
eye.on('activate', ({ elementId, gesture }) => {
  console.log(`Activated via ${gesture}`);
});

// Gesture detected
eye.on('gesture', ({ type, confidence }) => {
  // type: 'pinch' | 'airTap' | 'openPalm'
});

Gaze Engine Internals

Component | Method | Description
Pupil Detection | Face Mesh iris landmarks 468–472 | Left/right iris center coords
Gaze Vector | Head pose + iris offset | 3D direction from the eye
Kalman Filter | 2-state Kalman (position + velocity) | Removes jitter noise
EMA Smoothing | α=0.3 per axis | Temporal smoothing
Screen Mapping | Polynomial regression | Calibrated gaze → screen coords
Dwell Timer | 300 ms window | Guards against accidental selection
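
The 2-state Kalman stage can be sketched for a single axis as below. The process noise `q` and measurement noise `r` values are illustrative assumptions, not the tuned production parameters:

```typescript
// Sketch: constant-velocity (position + velocity) Kalman filter, one axis.
// Predict with x' = x + v*dt, then correct with the measured position z.
class Kalman1D {
  private x = 0; // position estimate
  private v = 0; // velocity estimate
  private p: number[][] = [[1, 0], [0, 1]]; // 2x2 error covariance
  constructor(private q = 0.01, private r = 4) {}

  update(z: number, dt: number): number {
    // Predict: P' = A P A^T + Q with A = [[1, dt], [0, 1]]
    this.x += this.v * dt;
    const [[p00, p01], [p10, p11]] = this.p;
    const a = p00 + dt * (p10 + p01) + dt * dt * p11 + this.q;
    const b = p01 + dt * p11;
    const c = p10 + dt * p11;
    const d = p11 + this.q;
    // Correct with measurement z (H = [1, 0])
    const s = a + this.r;          // innovation covariance
    const k0 = a / s;              // Kalman gain (position)
    const k1 = c / s;              // Kalman gain (velocity)
    const innov = z - this.x;
    this.x += k0 * innov;
    this.v += k1 * innov;
    this.p = [
      [(1 - k0) * a, (1 - k0) * b],
      [c - k1 * a, d - k1 * b],
    ];
    return this.x;
  }
}
```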

Gesture Recognition

Gesture | Detection Logic | Action | Debounce
Pinch | Thumb–index distance <0.05 (normalized) | Select / Click | 500 ms
Air Tap | Index fingertip forward Z-delta >0.04 within 150 ms | Click | 600 ms
Open Palm | Average finger spread >0.08 | Cancel / Back | 800 ms
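
The debounce windows above can be enforced with a small per-gesture timer. The `GestureDebouncer` class below is an illustrative sketch, not the engine's actual implementation:

```typescript
// Sketch: per-gesture debouncing using the windows from the table above.
const DEBOUNCE_MS: Record<string, number> = {
  pinch: 500,
  airTap: 600,
  openPalm: 800,
};

class GestureDebouncer {
  private lastFired: Record<string, number> = {};

  // Returns true if the gesture should fire, false while still in its window.
  fire(type: string, now: number): boolean {
    const windowMs = DEBOUNCE_MS[type] ?? 500;
    const last = this.lastFired[type];
    if (last !== undefined && now - last < windowMs) return false;
    this.lastFired[type] = now;
    return true;
  }
}
```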

Testing Plan

Glasses Users

Test with thick-frame glasses and anti-reflective coatings. Adjust iris detection threshold for glare compensation.

✓ Supported

Low Lighting

Test at <100 lux. MediaPipe Face Mesh remains robust down to 50 lux with a confidence threshold of 0.6.

✓ Supported

Slow Head Movement

EMA + Kalman smoothing accommodates users with slow head tremor; the dwell window is extended to 400ms for motor impairments.

✓ Supported

Tremor Conditions

Kalman velocity state dampens high-frequency tremor. Gaze stabilization window set to 300–400ms.

✓ Supported