The NeuralNet tracker uses deep learning models to detect and track your face without requiring any physical markers. It’s the most convenient tracker to set up - just point your webcam at your face and start tracking.

How It Works

The tracker uses two neural networks:
  1. Localizer network: Quickly finds your face in the camera frame
  2. Pose estimator network: Accurately determines head position and orientation from the face region
Both networks are optimized ONNX models that run efficiently on CPU using ONNX Runtime.
The neural network models are included with OpenTrack. The default model (head-pose-0.4-big-int8.onnx) is quantized to INT8 for faster inference while maintaining good accuracy.

Requirements

Hardware

  • Webcam: Any standard webcam (640x480 or higher)
  • CPU: Modern multi-core processor (the tracker is CPU-optimized)
  • RAM: At least 4GB system memory

Software

  • ONNX Runtime (included with OpenTrack)
  • OpenCV (included with OpenTrack)
  • Trained pose estimation model (included)
No special hardware or physical markers required - just your face and a webcam!

Setup Instructions

1. Select Camera

In OpenTrack tracker settings:
Camera: Select your webcam
Resolution: 640x480 or higher
Force FPS: 30-60 (higher is smoother)
Use MJPEG: Enable if supported

2. Configure Field of View

Set your camera’s horizontal field of view:
Field of View: 56 degrees (typical for most webcams)
The FOV affects depth estimation accuracy. Most webcams are 50-65 degrees.
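The dependence on FOV comes from the pinhole camera model: the horizontal FOV fixes the focal length in pixels, which converts apparent face size into distance. A sketch of the relationship (the 150 mm face width is an illustrative assumption, not a value OpenTrack necessarily uses):

```python
import math

def focal_length_px(image_width_px, hfov_deg):
    """Pinhole-model focal length in pixels, derived from the horizontal FOV."""
    return (image_width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)

def estimate_depth_mm(face_width_px, image_width_px, hfov_deg,
                      real_face_width_mm=150.0):
    """Distance to the face under the pinhole model, assuming a known
    real-world face width (the 150 mm default is illustrative)."""
    f = focal_length_px(image_width_px, hfov_deg)
    return real_face_width_mm * f / face_width_px

# A face 140 px wide in a 640 px frame through a 56-degree lens
# sits roughly 645 mm from the camera:
depth = estimate_depth_mm(140, 640, 56.0)
```

An FOV setting that is too low inflates the focal length and therefore overestimates depth, which is why matching the setting to your actual lens matters.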

3. Set Head Offset

Configure the offset from face detection point to head rotation center:
Offset Forward: 200mm (typical distance from face to neck pivot)
Offset Up: 0mm
Offset Right: 0mm
Or use automatic calibration:
  1. Click Start in calibration section
  2. Rotate your head while keeping body still
  3. System calculates optimal offset
  4. Click Stop when satisfied
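The principle behind automatic calibration can be illustrated with a small sketch: while the body stays still, the detected face point traces an arc around the neck pivot, so fitting a circle to those points recovers the pivot position and, as the radius, the forward offset. This is an illustrative Kåsa least-squares fit, not OpenTrack's actual calibration routine:

```python
import math

def solve3(A, b):
    """Solve a 3x3 linear system by Gaussian elimination with partial pivoting."""
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(3):
        piv = max(range(col, 3), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, 3):
            f = M[r][col] / M[col][col]
            for c in range(col, 4):
                M[r][c] -= f * M[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (M[r][3] - sum(M[r][c] * x[c] for c in range(r + 1, 3))) / M[r][r]
    return x

def fit_pivot(points):
    """Kasa least-squares circle fit: solves x^2 + y^2 + D*x + E*y + F = 0,
    returning the circle's center (the pivot) and radius (~forward offset)."""
    A = [[x, y, 1.0] for x, y in points]
    b = [-(x * x + y * y) for x, y in points]
    n = len(points)
    ATA = [[sum(A[i][r] * A[i][c] for i in range(n)) for c in range(3)]
           for r in range(3)]
    ATb = [sum(A[i][r] * b[i] for i in range(n)) for r in range(3)]
    D, E, F = solve3(ATA, ATb)
    cx, cy = -D / 2.0, -E / 2.0
    return (cx, cy), math.sqrt(cx * cx + cy * cy - F)

# Simulated top-down face positions while yawing about a pivot at the
# origin, with the face 200 mm in front of it:
pts = [(200 * math.sin(t), 200 * math.cos(t)) for t in (-0.6, -0.3, 0.0, 0.3, 0.6)]
center, radius = fit_pivot(pts)   # radius ~ 200 mm
```

This is also why the calibration instructions say to keep the body still: body translation during the sweep would move the pivot and corrupt the fit.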

4. Adjust Performance Settings

Optimize for your system:
Number of Threads: 1-4 (more threads = faster, but diminishing returns)
ROI Filter Alpha: 1.0 (lower = smoother ROI transitions)
ROI Zoom: 1.0 (how much to zoom into face region)

5. Start Tracking

  1. Click Start in OpenTrack
  2. Position your face in camera view
  3. The tracker will automatically detect and track your face
  4. Green overlay shows detected face region
The tracker may take 1-2 seconds to initially detect your face. Once locked on, tracking is continuous.

Configuration Options

Camera Settings

| Option | Default | Description |
|---|---|---|
| Camera Name | - | Select your webcam |
| Force Resolution | 0 (auto) | Set a specific resolution |
| Field of View | 56° | Camera horizontal FOV |
| Force FPS | Default | Lock framerate (30, 60, 90, etc.) |
| Use MJPEG | false | Enable MJPEG compression |

Head Position Offset

| Option | Default | Description |
|---|---|---|
| Offset Forward | 200mm | Distance from detected face point to neck pivot |
| Offset Up | 0mm | Vertical offset |
| Offset Right | 0mm | Horizontal offset |
The forward offset (typically 150-250mm) is crucial for accurate translation tracking. It represents the distance from the face plane to your neck’s rotation center.
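Why the forward offset matters can be seen in a top-down 2D sketch: a pure head rotation sweeps the detected face point along an arc, and subtracting the offset (rotated into the head's current orientation) maps that arc back to the stationary neck pivot, so rotation does not leak into translation. This is an illustrative simplification, not OpenTrack's actual transform:

```python
import math

def apply_offset(face_pos, yaw_rad, forward_mm):
    """Map the detected face point back to the neck pivot (top-down 2D view:
    x = right, z = toward the camera). The pivot lies `forward_mm` behind
    the face along the head's current facing direction."""
    dx = forward_mm * math.sin(yaw_rad)
    dz = forward_mm * math.cos(yaw_rad)
    return (face_pos[0] - dx, face_pos[1] - dz)

# A pure 20-degree yaw about a pivot 200 mm behind the face: the face
# moves along an arc, but the recovered pivot stays put.
yaw = math.radians(20)
face = (200 * math.sin(yaw), 200 * math.cos(yaw))
pivot = apply_offset(face, yaw, 200)   # ~(0, 0): no spurious translation
```

With a wrong forward offset, the subtraction under- or over-corrects, and every head rotation produces a phantom sideways or forward translation.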

Neural Network Settings

| Option | Default | Description |
|---|---|---|
| PoseNet File | head-pose-0.4-big-int8.onnx | Neural network model file |
| Number of Threads | 1 | CPU threads for inference (1-4 recommended) |
| Show Network Input | false | Display preprocessed input to neural network |

Filtering and Smoothing

| Option | Default | Description |
|---|---|---|
| Internal Filter Enabled | true | Enable built-in Kalman filtering |
| ROI Filter Alpha | 1.0 | Region-of-interest smoothing (0-1, lower = smoother) |
| ROI Zoom | 1.0 | Zoom factor for face region extraction |
| Deadzone Size | 1.0 | Circular deadzone size (mm) |
| Deadzone Hardness | 1.5 | Deadzone transition sharpness |
The internal Kalman filter smooths the raw pose estimates from the neural network. It helps reduce jitter while maintaining responsiveness. The filter uses:
  • Unscented transform for non-linear pose spaces
  • Quaternion representation for rotations
  • Velocity prediction for smooth motion
Disable this filter only if you want to use OpenTrack’s external filters exclusively.
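The predict/correct principle behind the internal filter can be illustrated with a much simpler model. The sketch below is a 1D constant-velocity Kalman filter, not OpenTrack's actual unscented quaternion filter; it only shows how prediction from a motion model plus measurement blending suppresses jitter:

```python
class ConstantVelocityKalman1D:
    """Greatly simplified 1D stand-in for the tracker's internal filter:
    predict position from estimated velocity, then correct with the
    noisy measurement, weighted by the Kalman gain."""

    def __init__(self, q=1e-3, r=1e-2):
        self.x = [0.0, 0.0]                    # state: position, velocity
        self.P = [[1.0, 0.0], [0.0, 1.0]]      # state covariance
        self.q, self.r = q, r                  # process / measurement noise

    def step(self, z, dt=1.0 / 30.0):
        # Predict: position advances by velocity * dt.
        x, v = self.x
        x += v * dt
        P = self.P
        P = [[P[0][0] + dt * (P[0][1] + P[1][0]) + dt * dt * P[1][1] + self.q,
              P[0][1] + dt * P[1][1]],
             [P[1][0] + dt * P[1][1],
              P[1][1] + self.q]]
        # Update: blend the prediction with the measured position z.
        s = P[0][0] + self.r                   # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s      # Kalman gain
        y = z - x                              # innovation
        self.x = [x + k0 * y, v + k1 * y]
        self.P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]
        return self.x[0]
```

Feeding it a noisy but stationary signal yields an output with markedly lower variance, which is exactly the jitter reduction the internal filter provides for pose estimates.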

Advanced Features

Automatic Face Localization

The tracker uses a two-stage approach:
  1. Coarse search: The localizer network scans the full frame at multiple scales
  2. Fine tracking: Once a face is found, the pose estimator focuses on that region
  3. ROI tracking: Subsequent frames only process the region around the last known face position
  4. Recovery: If face is lost, returns to full-frame search
This approach provides both fast initial detection and efficient continuous tracking.
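The steps above form a small state machine, sketched below with stub functions standing in for the two networks (the real interfaces differ; `estimate` here returns a pose plus the region to search next frame, or None on loss):

```python
def track(frames, localize, estimate):
    """Sketch of the two-stage detect/track loop: full-frame search until
    a face is found, ROI-only tracking afterwards, and fallback to full
    search when the face is lost."""
    roi, out = None, []
    for frame in frames:
        if roi is None:
            roi = localize(frame)                 # coarse: scan the whole frame
        result = estimate(frame, roi) if roi is not None else None
        if result is None:
            roi = None                            # recovery: full search next frame
            out.append(None)
        else:
            pose, roi = result                    # fine: reuse the region next frame
            out.append(pose)
    return out
```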

Adaptive ROI Filtering

The region of interest is filtered over time to prevent jittery crops:
ROI_new = ROI_old * (1 - alpha) + ROI_detected * alpha
Lower alpha values create smoother ROI transitions but may lag behind fast movements.
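In code, this is a per-component exponential moving average over the ROI box:

```python
def filter_roi(prev, detected, alpha):
    """Exponentially smooth the ROI box (x, y, w, h): alpha = 1.0 follows
    detections exactly, lower values smooth more but lag fast motion."""
    return tuple(p * (1.0 - alpha) + d * alpha for p, d in zip(prev, detected))

# With alpha = 0.5 the smoothed box lands halfway between old and detected:
smoothed = filter_roi((100, 100, 128, 128), (120, 104, 132, 128), 0.5)
```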

Deadzone Filter

A circular deadzone filter reduces noise from small movements:
  • Creates a “dead zone” around the current position
  • Small movements within the zone are dampened
  • Large movements pass through normally
  • Useful for steady aiming in games
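The behavior above can be sketched in one dimension; the exact attenuation curve OpenTrack uses may differ, but the Size/Hardness settings play the roles shown here:

```python
def deadzone(delta_mm, size=1.0, hardness=1.5):
    """Soft deadzone on a positional delta: movements well inside `size`
    are strongly attenuated, larger ones pass through unchanged.
    `hardness` shapes the transition (higher = sharper zone edge)."""
    mag = abs(delta_mm)
    if mag >= size:
        return delta_mm                  # large movement: pass through
    gain = (mag / size) ** hardness      # attenuate inside the zone
    return delta_mm * gain
```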

Neural Network Models

Default Model

head-pose-0.4-big-int8.onnx
  • Quantized to INT8 for fast CPU inference
  • Input: 129x129 grayscale face crop
  • Output: Head pose (3D position + rotation quaternion + uncertainty)
  • Inference time: ~15-30ms on modern CPU

Custom Models

You can use custom ONNX models:
  1. Train your own face pose estimation model
  2. Export to ONNX format
  3. Place in OpenTrack’s model directory
  4. Select in PoseNet File dropdown
Custom models must match the expected input/output format. See the model adapter code for interface details.

Performance Optimization

1. Adjust Thread Count

Start with 1 thread, increase to 2-4 if CPU usage is low:
  • 1 thread: ~30-40 FPS on modern CPU
  • 2 threads: ~50-60 FPS (diminishing returns)
  • 4 threads: ~60-70 FPS (limited gains)
More threads increase CPU usage significantly.

2. Optimize Resolution

Higher camera resolution improves face detection but reduces FPS:
  • 320x240: Very fast, may miss faces at distance
  • 640x480: Good balance (recommended)
  • 1280x720: Better detection range, slower

3. Reduce ROI Zoom

Lower ROI zoom (0.8-0.9) processes fewer pixels:
  • May increase FPS slightly
  • Risk of losing face if it moves quickly
  • Only use if you need every bit of performance

4. Disable Preview

Turn off video preview when not needed:
  • Saves CPU cycles for drawing
  • Minimal but measurable FPS improvement

Troubleshooting

Face Not Detected

Lighting issues:
  • Ensure your face is well-lit
  • Avoid backlighting (e.g., a window behind you)
  • Add front lighting if needed
Camera positioning:
  • Keep your face roughly centered in the frame
  • Stay within 30-100cm of the camera
  • Don’t rotate your face more than 60° away from the camera
Settings:
  • Increase camera resolution
  • Adjust camera exposure settings
  • Try different ROI zoom values

Jittery Tracking

  • Enable the internal filter (if disabled)
  • Reduce ROI filter alpha (e.g., 0.5-0.7)
  • Use OpenTrack’s Accela filter
  • Increase the deadzone size
  • Improve lighting for better face detection
  • Ensure stable camera mounting

Tracking Loses the Face

  • Increase ROI zoom to 1.2-1.5
  • Increase ROI filter alpha for faster response
  • Improve lighting conditions
  • Move closer to the camera
  • Avoid extreme head rotations

High CPU Usage or Low FPS

  • Reduce the number of threads (try 1-2)
  • Lower the camera resolution
  • Enable MJPEG compression
  • Close background applications
  • Disable the “Show Network Input” option
  • Use a lower-resolution camera mode

Inaccurate Tracking

  • Recalibrate the head offset (the forward distance is critical)
  • Verify the FOV setting matches your camera
  • Check that the face is detected at the correct size
  • Ensure the camera is at eye level

Advantages and Limitations

Advantages

  • No markers or special hardware
  • Easy setup - just point and track
  • Works in normal lighting
  • No wearing anything on head
  • Good for casual use
  • Constantly improving with better models

Limitations

  • Higher latency than marker tracking (~50-80ms)
  • More CPU intensive
  • Requires good lighting
  • Limited rotation range (±70°)
  • Less accurate than IR point tracking
  • Can be affected by facial expressions

Comparison with Other Trackers

| Feature | NeuralNet | ArUco | PointTracker |
|---|---|---|---|
| Marker required | No | Yes (paper) | Yes (IR LEDs) |
| Setup difficulty | Very Easy | Easy | Medium |
| Hardware cost | Very Low | Very Low | Medium |
| Accuracy | Good | Good | Excellent |
| Latency | Medium (50-80ms) | Low (20-30ms) | Very Low (10-20ms) |
| CPU usage | Medium-High | Low | Low |
| Rotation range | ±70° | ±60° | ±90° |

Tips for Best Results

  1. Lighting: Use soft, diffuse front lighting - avoid harsh shadows on face
  2. Camera: Position at eye level, 50-80cm away, angled slightly downward
  3. Background: Keep background simple to help face detection
  4. Movement: Start with small movements to establish tracking before large rotations
  5. Calibration: Take time to properly calibrate the forward offset
  6. Filtering: Use OpenTrack’s Accela filter for smoother gaming experience

Technical Details

Model Architecture

The default model uses:
  • MobileNet-based backbone for efficiency
  • Regression head for pose parameters
  • Uncertainty estimation for filter confidence
  • INT8 quantization for 4x speedup

Coordinate Systems

  • Face coordinates: Detected face center in camera space
  • Head coordinates: Rotation center after applying offset
  • Output: Standard OpenTrack 6DOF (X, Y, Z, Yaw, Pitch, Roll)

Build Requirements

For building with NeuralNet support:
# ONNX Runtime required
# See BUILD.md in tracker-neuralnet directory

See Also