How It Works
The tracker uses two neural networks:
- Localizer network: Quickly finds your face in the camera frame
- Pose estimator network: Accurately determines head position and orientation from the face region
The neural network models are included with OpenTrack. The default model (head-pose-0.4-big-int8.onnx) is quantized to INT8 for faster inference while maintaining good accuracy.
Requirements
Hardware
- Webcam: Any standard webcam (640x480 or higher)
- CPU: Modern multi-core processor (the tracker is CPU-optimized)
- RAM: At least 4GB system memory
Software
- ONNX Runtime (included with OpenTrack)
- OpenCV (included with OpenTrack)
- Trained pose estimation model (included)
No special hardware or physical markers required - just your face and a webcam!
Setup Instructions
Configure Field of View
Set your camera's horizontal field of view with the Field of View option (default 56°). The FOV affects depth estimation accuracy; most webcams have a horizontal FOV of 50-65 degrees.
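To see why the FOV setting matters, the sketch below uses plain pinhole-camera geometry. The focal length in pixels follows from the horizontal FOV and the image width, and the distance to an object of known size scales with that focal length. The function names and the 150mm face width are hypothetical, and the tracker's network regresses position directly rather than necessarily using this formula. The point is only that a wrong FOV scales every depth estimate.

```python
import math

def focal_length_px(hfov_deg: float, image_width_px: int) -> float:
    """Pinhole focal length in pixels from the horizontal field of view."""
    return (image_width_px / 2.0) / math.tan(math.radians(hfov_deg) / 2.0)

def estimate_depth_mm(face_width_px: float, face_width_mm: float,
                      hfov_deg: float, image_width_px: int) -> float:
    """Distance to an object of known width, by similar triangles."""
    return focal_length_px(hfov_deg, image_width_px) * face_width_mm / face_width_px

# Hypothetical: a 150 mm wide face that appears 200 px wide in a 640 px frame.
for fov in (50.0, 56.0, 65.0):
    depth = estimate_depth_mm(200.0, 150.0, fov, 640)
    print(f"FOV {fov:.0f} deg -> estimated distance {depth:.0f} mm")
```

With the same 200px face, the 50° setting yields a larger distance than the 65° setting, so an FOV mismatch shows up as a scale error in depth (Z).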
Set Head Offset
Configure the offset from the detected face point to the head rotation center with the Offset Forward, Offset Up and Offset Right options, or use automatic calibration:
- Click Start in calibration section
- Rotate your head while keeping body still
- System calculates optimal offset
- Click Stop when satisfied
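The automatic calibration can be understood as a least-squares problem. While the body stays still, the neck pivot is fixed, so every face position satisfies face = pivot - R·offset for that frame's head rotation R. The sketch below is a hypothetical illustration, not OpenTrack's calibration code. It stacks one such equation per frame and solves for pivot and offset together. It assumes the offset is expressed in the head's local frame and that the rotation samples cover more than one axis: yaw alone cannot separate the vertical offset from the pivot height.

```python
import math

def rot_yaw_pitch(yaw_deg, pitch_deg):
    """3x3 rotation matrix R = Ry(yaw) @ Rx(pitch) as nested lists."""
    y, p = math.radians(yaw_deg), math.radians(pitch_deg)
    cy, sy, cp, sp = math.cos(y), math.sin(y), math.cos(p), math.sin(p)
    ry = [[cy, 0.0, sy], [0.0, 1.0, 0.0], [-sy, 0.0, cy]]
    rx = [[1.0, 0.0, 0.0], [0.0, cp, -sp], [0.0, sp, cp]]
    return [[sum(ry[i][k] * rx[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def solve(a, b):
    """Solve the square system a @ x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def calibrate(samples):
    """Least-squares fit of face_i = pivot - R_i @ offset over all samples.

    samples: list of (R, face_position). Unknowns: pivot (3) and offset (3).
    Each sample adds three rows [I | -R_i]; the system is solved through
    its normal equations (A^T A) x = A^T b.
    """
    ata = [[0.0] * 6 for _ in range(6)]
    atb = [0.0] * 6
    for rot, face in samples:
        for i in range(3):
            row = [1.0 if j == i else 0.0 for j in range(3)] + [-v for v in rot[i]]
            for a in range(6):
                atb[a] += row[a] * face[i]
                for b in range(6):
                    ata[a][b] += row[a] * row[b]
    x = solve(ata, atb)
    return x[:3], x[3:]

def synthetic_samples(pivot, offset, yaws=(-30, -15, 0, 15, 30), pitches=(-15, 0, 15)):
    """Face positions produced by rotating a head about a fixed pivot."""
    samples = []
    for yaw in yaws:
        for pitch in pitches:
            r = rot_yaw_pitch(yaw, pitch)
            face = [pivot[i] - sum(r[i][j] * offset[j] for j in range(3)) for i in range(3)]
            samples.append((r, face))
    return samples

# Hypothetical ground truth: pivot 600 mm from the camera, 200 mm forward offset.
pivot, offset = calibrate(synthetic_samples((10.0, -20.0, 600.0), (0.0, 0.0, 200.0)))
print("pivot:", [round(v, 3) for v in pivot], "offset:", [round(v, 3) for v in offset])
```

With noise-free synthetic samples the fit recovers the pivot and offset exactly; with real measurements, more and wider rotations make the estimate more stable.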
Configuration Options
Camera Settings
| Option | Default | Description |
|---|---|---|
| Camera Name | - | Select your webcam |
| Force Resolution | 0 (auto) | Set specific resolution |
| Field of View | 56° | Camera horizontal FOV |
| Force FPS | Default | Lock framerate (30, 60, 90, etc.) |
| Use MJPEG | false | Enable MJPEG compression |
Head Position Offset
| Option | Default | Description |
|---|---|---|
| Offset Forward | 200mm | Distance from detected face point to neck pivot |
| Offset Up | 0mm | Vertical offset |
| Offset Right | 0mm | Horizontal offset |
The forward offset (typically 150-250mm) is crucial for accurate translation tracking. It represents the distance from the face plane to your neck’s rotation center.
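The offset matters because the network locates a point on the face, and that point swings on an arc whenever you turn your head. Moving from the face point to the rotation center along the head's own axes cancels the arc, so pure rotations no longer produce spurious translation. Below is a minimal sketch. It assumes the offset is a vector in the head's local frame pointing from the face point to the neck pivot, and that rotations arrive as unit quaternions (w, x, y, z). `quat_rotate` and `head_pivot` are hypothetical names.

```python
import math

def quat_rotate(q, v):
    """Rotate vector v by unit quaternion q = (w, x, y, z)."""
    w, qx, qy, qz = q
    # t = 2 * (q_vec x v)
    tx = 2.0 * (qy * v[2] - qz * v[1])
    ty = 2.0 * (qz * v[0] - qx * v[2])
    tz = 2.0 * (qx * v[1] - qy * v[0])
    # v' = v + w * t + q_vec x t
    return (v[0] + w * tx + (qy * tz - qz * ty),
            v[1] + w * ty + (qz * tx - qx * tz),
            v[2] + w * tz + (qx * ty - qy * tx))

def head_pivot(face_pos, q, offset_local):
    """Rotation center = face point + offset rotated into camera space."""
    rotated = quat_rotate(q, offset_local)
    return tuple(f + r for f, r in zip(face_pos, rotated))

# Usage: a head yawed 30 degrees about a fixed pivot (hypothetical numbers).
half = math.radians(30.0) / 2
q_yaw = (math.cos(half), 0.0, math.sin(half), 0.0)
offset_local = (0.0, 0.0, 200.0)  # assumption: local +Z points from face to neck
pivot_true = (0.0, 0.0, 700.0)
face = tuple(p - r for p, r in zip(pivot_true, quat_rotate(q_yaw, offset_local)))
print(head_pivot(face, q_yaw, offset_local))  # recovers pivot_true (up to rounding)
```

The face point moves with every yaw step, but the recovered pivot stays put, which is why a wrong forward offset shows up as translation that follows your head rotation.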
Neural Network Settings
| Option | Default | Description |
|---|---|---|
| PoseNet File | head-pose-0.4-big-int8.onnx | Neural network model file |
| Number of Threads | 1 | CPU threads for inference (1-4 recommended) |
| Show Network Input | false | Display preprocessed input to neural network |
Filtering and Smoothing
| Option | Default | Description |
|---|---|---|
| Internal Filter Enabled | true | Enable built-in Kalman filtering |
| ROI Filter Alpha | 1.0 | Region-of-interest smoothing (0-1, lower = smoother) |
| ROI Zoom | 1.0 | Zoom factor for face region extraction |
| Deadzone Size | 1.0 | Circular deadzone size (mm) |
| Deadzone Hardness | 1.5 | Deadzone transition sharpness |
Understanding the Internal Filter
The internal Kalman filter smooths the raw pose estimates from the neural network. It helps reduce jitter while maintaining responsiveness. The filter uses:
- Unscented transform for non-linear pose spaces
- Quaternion representation for rotations
- Velocity prediction for smooth motion
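For intuition about the velocity-prediction part, here is a deliberately simplified stand-in: a linear constant-velocity Kalman filter on a single axis. It is not the tracker's filter, which the list above describes as unscented and quaternion-based. The class name and noise values are hypothetical. The optional per-measurement noise shows how an uncertainty estimate from the network could tell such a filter how much to trust each frame.

```python
class ConstantVelocityKalman1D:
    """Linear Kalman filter for one pose axis with state [position, velocity].

    Hypothetical illustration only: the tracker's own filter is described as
    unscented and quaternion-based, which this sketch does not reproduce.
    """

    def __init__(self, process_noise=1e-2, measurement_noise=1.0):
        self.x = [0.0, 0.0]                # state estimate: position, velocity
        self.p = [[1e3, 0.0], [0.0, 1e3]]  # state covariance (large = unknown)
        self.q = process_noise             # added to both diagonal terms per step
        self.r = measurement_noise         # default measurement variance
        self.initialized = False

    def predict(self, dt):
        """Propagate with F = [[1, dt], [0, 1]]: position advances by velocity."""
        pos, vel = self.x
        self.x = [pos + vel * dt, vel]
        (p00, p01), (p10, p11) = self.p
        self.p = [[p00 + dt * (p01 + p10) + dt * dt * p11 + self.q, p01 + dt * p11],
                  [p10 + dt * p11, p11 + self.q]]

    def update(self, z, measurement_noise=None):
        """Fuse a position measurement z (observation matrix H = [1, 0])."""
        r = self.r if measurement_noise is None else measurement_noise
        (p00, p01), (p10, p11) = self.p
        s = p00 + r                        # innovation variance
        k0, k1 = p00 / s, p10 / s          # Kalman gain
        innovation = z - self.x[0]
        self.x = [self.x[0] + k0 * innovation, self.x[1] + k1 * innovation]
        self.p = [[(1 - k0) * p00, (1 - k0) * p01],
                  [p10 - k1 * p00, p11 - k1 * p01]]

    def step(self, z, dt, measurement_noise=None):
        """Filter one measurement; a noisier frame can pass a larger variance."""
        if not self.initialized:
            self.x = [z, 0.0]
            self.initialized = True
        else:
            self.predict(dt)
            self.update(z, measurement_noise)
        return self.x[0]

# Usage: a head moving steadily along one axis, with jittery measurements.
kf = ConstantVelocityKalman1D()
for z in [0.4, 1.6, 4.5, 5.5, 8.4, 9.6]:
    print(round(kf.step(z, dt=1.0), 2))
```

Because the state includes velocity, the filter extrapolates steady motion instead of lagging behind it, which is how smoothing and responsiveness can coexist.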
Advanced Features
Automatic Face Localization
The tracker uses a two-stage approach:
- Coarse search: The localizer network scans the full frame at multiple scales
- Fine tracking: Once a face is found, the pose estimator focuses on that region
- ROI tracking: Subsequent frames only process the region around the last known face position
- Recovery: If face is lost, returns to full-frame search
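The localize, track and recover cycle above can be sketched as a small state machine. `localize` and `estimate` are hypothetical callables standing in for the two networks, each returning None when no face is found. This shows the control flow only, not OpenTrack's code.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

Box = Tuple[float, float, float, float]  # x, y, width, height

@dataclass
class FaceTracker:
    """Coarse search -> fine tracking -> ROI tracking -> recovery, per frame."""
    localize: Callable[[object], Optional[Box]]
    estimate: Callable[[object, Box], Optional[Tuple[Box, object]]]
    roi: Optional[Box] = None

    def process(self, frame):
        if self.roi is None:
            # Coarse search: run the localizer over the full frame.
            self.roi = self.localize(frame)
            if self.roi is None:
                return None
        # Fine tracking: run the pose estimator on the current ROI only.
        result = self.estimate(frame, self.roi)
        if result is None:
            # Recovery: drop the ROI so the next frame does a full-frame search.
            self.roi = None
            return None
        # ROI tracking: the next frame reuses the face box found in this one.
        self.roi, pose = result
        return pose
```

The expensive full-frame search runs only on the first frame and after a loss; every other frame goes straight to the pose estimator.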
Adaptive ROI Filtering
The region of interest is filtered over time to prevent jittery crops. Lower alpha values create smoother ROI transitions but may lag behind fast movements.
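The ROI filtering described above behaves like exponential smoothing of the crop rectangle. The sketch assumes a per-frame update of the form new_roi = alpha·detected + (1 - alpha)·previous. That matches the option's description (1.0 = no smoothing, lower = smoother) but is an assumption about the implementation; `smooth_roi` is a hypothetical name.

```python
def smooth_roi(prev, new, alpha):
    """Exponential smoothing of an ROI box (x, y, w, h).

    alpha = 1.0 uses the newly detected box unchanged; smaller alpha keeps
    more of the previous box, giving steadier crops that lag fast motion.
    """
    if prev is None:
        return tuple(new)
    return tuple(alpha * n + (1.0 - alpha) * p for p, n in zip(prev, new))

# Usage: the detected face box jumps 100 px to the right and stays there.
roi = (100.0, 50.0, 120.0, 120.0)
for _ in range(3):
    roi = smooth_roi(roi, (200.0, 50.0, 120.0, 120.0), alpha=0.5)
    print(roi[0])
```

With alpha = 0.5 the crop's x coordinate goes 150, 175, 187.5 over three frames instead of jumping straight to 200, which is the lag described above.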
Deadzone Filter
A circular deadzone filter reduces noise from small movements:
- Creates a “dead zone” around the current position
- Small movements within the zone are dampened
- Large movements pass through normally
- Useful for steady aiming in games
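One way to build a deadzone with a size and a hardness is sketched below. It is assumed here for illustration and is not taken from OpenTrack's source. The output moves toward each new measurement by a weight that depends on the distance between them: weight = r^h / (1 + r^h), with r = distance / size and h = hardness. At exactly the deadzone radius the weight is 0.5. Well inside the radius the output barely moves, and well outside it the movement passes through almost unchanged. `deadzone_step` is a hypothetical name.

```python
import math

def deadzone_step(prev_out, measurement, size_mm=1.0, hardness=1.5):
    """Move the output toward the measurement through a soft circular deadzone.

    weight = r / (1 + r) with r = (distance / size) ** hardness: small offsets
    are strongly damped, large ones pass almost fully, and a higher hardness
    makes the transition around the deadzone radius sharper.
    """
    delta = [m - p for m, p in zip(measurement, prev_out)]
    dist = math.sqrt(sum(d * d for d in delta))
    if dist == 0.0:
        return tuple(prev_out)
    r = (dist / size_mm) ** hardness
    weight = r / (1.0 + r)
    return tuple(p + weight * d for p, d in zip(prev_out, delta))
```

A 0.1mm wobble moves the output by about 0.003mm, while a 50mm lean moves it by about 49.86mm, matching the "small movements dampened, large movements pass through" behavior listed above.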
Neural Network Models
Default Model
head-pose-0.4-big-int8.onnx:
- Quantized to INT8 for fast CPU inference
- Input: 129x129 grayscale face crop
- Output: Head pose (3D position + rotation quaternion + uncertainty)
- Inference time: ~15-30ms on modern CPU
Custom Models
You can use custom ONNX models:
- Train your own face pose estimation model
- Export to ONNX format
- Place in OpenTrack’s model directory
- Select in PoseNet File dropdown
Performance Optimization
Adjust Thread Count
Start with 1 thread, increase to 2-4 if CPU usage is low:
- 1 thread: ~30-40 FPS on modern CPU
- 2 threads: ~50-60 FPS (diminishing returns)
- 4 threads: ~60-70 FPS (limited gains)
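Converting the approximate figures above into per-frame time makes the diminishing returns concrete. The snippet uses only the FPS ranges quoted in this section, taking the speedup from the midpoint of each range; real numbers depend on your CPU.

```python
# Approximate FPS ranges from the list above (midpoints are used for speedup).
FPS_BY_THREADS = {1: (30, 40), 2: (50, 60), 4: (60, 70)}

def frame_time_ms(fps: float) -> float:
    """Time available per frame at a given frame rate."""
    return 1000.0 / fps

def thread_scaling(table):
    """Rows of (threads, fastest ms/frame, slowest ms/frame, speedup vs 1 thread)."""
    base = sum(table[1]) / 2.0
    rows = []
    for threads, (lo, hi) in sorted(table.items()):
        rows.append((threads, frame_time_ms(hi), frame_time_ms(lo), (lo + hi) / 2.0 / base))
    return rows

for threads, fast, slow, speedup in thread_scaling(FPS_BY_THREADS):
    print(f"{threads} thread(s): {fast:.1f}-{slow:.1f} ms/frame, ~{speedup:.2f}x vs 1 thread")
```

This reports about 1.57x for 2 threads and about 1.86x for 4 threads, so going from 2 to 4 threads adds only about 0.29x.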
Optimize Resolution
Higher camera resolution improves face detection but reduces FPS:
- 320x240: Very fast, may miss faces at distance
- 640x480: Good balance (recommended)
- 1280x720: Better detection range, slower
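Much of the per-frame work that touches the whole image (capture, decoding, color conversion) grows roughly with pixel count, which is why resolution affects FPS. The sketch assumes that cost scales linearly with pixels. The pose estimator itself works on a fixed-size face crop, so its cost does not scale this way.

```python
RESOLUTIONS = {"320x240": (320, 240), "640x480": (640, 480), "1280x720": (1280, 720)}

def relative_pixel_load(resolutions, reference="640x480"):
    """Pixels per frame and their ratio to the reference resolution."""
    ref_w, ref_h = resolutions[reference]
    ref_pixels = ref_w * ref_h
    return {name: (w * h, w * h / ref_pixels) for name, (w, h) in resolutions.items()}

for name, (pixels, ratio) in relative_pixel_load(RESOLUTIONS).items():
    print(f"{name}: {pixels:,} pixels per frame, {ratio:.2f}x of 640x480")
```

1280x720 carries 3x the pixels of 640x480, and 320x240 carries a quarter.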
Reduce ROI Zoom
Lower ROI zoom (0.8-0.9) processes fewer pixels:
- May increase FPS slightly
- Risk of losing face if it moves quickly
- Only use if you need every bit of performance
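Suppose the crop's side length scales linearly with ROI zoom. That is an assumption, since the option's exact definition is not given here. Then the crop area, and so the number of source pixels extracted per frame, scales with the square of the zoom:

```python
def relative_crop_area(zoom: float, reference_zoom: float = 1.0) -> float:
    """Crop area relative to the reference zoom, assuming side length is proportional to zoom."""
    return (zoom / reference_zoom) ** 2

for zoom in (0.8, 0.9, 1.0):
    print(f"ROI zoom {zoom}: {relative_crop_area(zoom):.0%} of the default crop area")
```

A zoom of 0.8 gives 64% and 0.9 gives 81% of the default area. If the crop is then resized to the network's fixed 129x129 input, the saving is in cropping and resizing rather than in inference, which is consistent with the FPS gain being slight.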
Troubleshooting
Face not detected
Lighting issues:
- Ensure face is well-lit
- Avoid backlighting (window behind you)
- Add front lighting if needed
Positioning:
- Face should be roughly centered in frame
- Keep face within 30-100cm of the camera
- Don't turn your head more than 60° away from the camera
Camera and tracker settings:
- Increase camera resolution
- Adjust camera exposure settings
- Try different ROI zoom values
Jittery or unstable tracking
- Enable internal filter (if disabled)
- Reduce ROI filter alpha (e.g., 0.5-0.7)
- Use OpenTrack’s Accela filter
- Increase deadzone size
- Improve lighting for better face detection
- Ensure stable camera mounting
Tracking loses face easily
- Increase ROI zoom to 1.2-1.5
- Increase ROI filter alpha for faster response
- Improve lighting conditions
- Move closer to camera
- Reduce extreme head rotations
Low framerate / High CPU usage
- Reduce number of threads (try 1-2 only)
- Lower camera resolution (select a lower-resolution camera mode)
- Enable MJPEG compression
- Close background applications
- Disable “Show Network Input” option
Translation tracking incorrect
- Recalibrate head offset (forward distance is critical)
- Verify FOV setting matches your camera
- Check that face is detected at correct size
- Ensure camera is at eye level
Advantages and Limitations
Advantages
- No markers or special hardware
- Easy setup - just point and track
- Works in normal lighting
- No wearing anything on head
- Good for casual use
- Constantly improving with better models
Limitations
- Higher latency than marker tracking (~50-80ms)
- More CPU intensive
- Requires good lighting
- Limited rotation range (±70°)
- Less accurate than IR point tracking
- Can be affected by facial expressions
Comparison with Other Trackers
| Feature | NeuralNet | ArUco | PointTracker |
|---|---|---|---|
| Marker required | No | Yes (paper) | Yes (IR LEDs) |
| Setup difficulty | Very Easy | Easy | Medium |
| Hardware cost | Very Low | Very Low | Medium |
| Accuracy | Good | Good | Excellent |
| Latency | Medium (50-80ms) | Low (20-30ms) | Very Low (10-20ms) |
| CPU usage | Medium-High | Low | Low |
| Rotation range | ±70° | ±60° | ±90° |
Tips for Best Results
- Lighting: Use soft, diffuse front lighting - avoid harsh shadows on face
- Camera: Position at eye level, 50-80cm away, angled slightly downward
- Background: Keep background simple to help face detection
- Movement: Start with small movements to establish tracking before large rotations
- Calibration: Take time to properly calibrate the forward offset
- Filtering: Use OpenTrack’s Accela filter for smoother gaming experience
Technical Details
Model Architecture
The default model uses:
- MobileNet-based backbone for efficiency
- Regression head for pose parameters
- Uncertainty estimation for filter confidence
- INT8 quantization for faster CPU inference (each weight takes 1 byte instead of 4 for FP32)
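INT8 quantization maps floating-point values to 8-bit integers using a scale and a zero point. The sketch below shows the generic per-tensor affine form, which is the same formula as ONNX's QuantizeLinear operator: y = saturate(round(x / scale) + zero_point), rounding half to even like Python's round(). This page does not say which variant the default model uses (per-tensor or per-channel, symmetric or asymmetric), so treat it as illustrative. The function names are hypothetical.

```python
def quant_params(lo, hi, qmin=-128, qmax=127):
    """Affine INT8 parameters mapping the float range [lo, hi] onto [qmin, qmax]."""
    lo, hi = min(lo, 0.0), max(hi, 0.0)  # the range must contain 0
    scale = (hi - lo) / (qmax - qmin)
    zero_point = int(round(qmin - lo / scale))
    return scale, zero_point

def quantize(x, scale, zero_point, qmin=-128, qmax=127):
    """round(x / scale) + zero_point, saturated to the INT8 range."""
    q = int(round(x / scale)) + zero_point
    return max(qmin, min(qmax, q))

def dequantize(q, scale, zero_point):
    """Approximate float value represented by an INT8 code."""
    return (q - zero_point) * scale

scale, zp = quant_params(-1.0, 1.0)
for x in (-1.0, -0.3, 0.0, 0.42, 1.0):
    q = quantize(x, scale, zp)
    print(f"{x:+.2f} -> {q:4d} -> {dequantize(q, scale, zp):+.4f}")
```

Every value in the calibrated range comes back within half a quantization step (scale / 2), which is the accuracy cost traded for smaller weights and integer arithmetic.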
Coordinate Systems
- Face coordinates: Detected face center in camera space
- Head coordinates: Rotation center after applying offset
- Output: Standard OpenTrack 6DOF (X, Y, Z, Yaw, Pitch, Roll)
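Rotation reaches OpenTrack as yaw, pitch and roll, while the model outputs a quaternion. The conversion below uses the common Z-Y-X (yaw, pitch, roll) Tait-Bryan convention. The axis assignment and signs are assumptions and may differ from what the tracker actually uses.

```python
import math

def euler_to_quat(yaw_deg, pitch_deg, roll_deg):
    """Quaternion (w, x, y, z) for intrinsic Z-Y-X (yaw, pitch, roll) rotations."""
    hy, hp, hr = (math.radians(a) / 2 for a in (yaw_deg, pitch_deg, roll_deg))
    cy, sy = math.cos(hy), math.sin(hy)
    cp, sp = math.cos(hp), math.sin(hp)
    cr, sr = math.cos(hr), math.sin(hr)
    return (cr * cp * cy + sr * sp * sy,
            sr * cp * cy - cr * sp * sy,
            cr * sp * cy + sr * cp * sy,
            cr * cp * sy - sr * sp * cy)

def quat_to_euler(q):
    """(yaw, pitch, roll) in degrees from a unit quaternion (w, x, y, z)."""
    w, x, y, z = q
    roll = math.atan2(2 * (w * x + y * z), 1 - 2 * (x * x + y * y))
    sin_pitch = max(-1.0, min(1.0, 2 * (w * y - z * x)))  # clamp rounding errors
    pitch = math.asin(sin_pitch)
    yaw = math.atan2(2 * (w * z + x * y), 1 - 2 * (y * y + z * z))
    return math.degrees(yaw), math.degrees(pitch), math.degrees(roll)

print(quat_to_euler(euler_to_quat(-45.0, 15.0, 25.0)))
```

The round trip is exact (up to floating-point rounding) for pitch strictly between -90° and +90°; at ±90° yaw and roll become indistinguishable (gimbal lock).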
Build Requirements
For building with NeuralNet support:
See Also
- ArUco Tracker - Marker-based alternative
- PointTracker - Highest accuracy option
- Hardware Guide - General hardware information