What is RTSP Human Capture?
RTSP Human Capture is a multi-stream RTSP person-detection tool built on YOLOv4 / YOLOv3 (with an automatic HOG fallback) and OpenCV. When a person enters a camera frame, the tool saves an annotated JPEG snapshot or starts recording an MP4 clip. Multiple cameras run in parallel threads and can be watched in a single composited grid window.
Key Features
Advanced Detection
YOLOv4 / YOLOv3 detection with automatic fallback to OpenCV HOG when model files are absent
GPU Acceleration
CUDA GPU acceleration automatically detected and enabled; falls back to CPU gracefully
Multi-Stream Support
Single or multiple RTSP streams processed concurrently via threads
Flexible Output
Two save modes: image (annotated JPEG snapshot) or video (MP4 clip of entire presence)
Live Display
Dedicated window for single streams; resizable grid window for multiple streams
Auto Reconnect
Each stream retries up to 5 times on read failure before giving up
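The retry behavior described under Auto Reconnect can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: `read_with_retries`, `read_frame`, `reconnect`, and `delay` are hypothetical names, and only the limit of 5 retries comes from the description above.

```python
import time
from typing import Callable, Optional, Tuple

MAX_RETRIES = 5  # taken from the feature description; the rest is illustrative


def read_with_retries(
    read_frame: Callable[[], Tuple[bool, Optional[object]]],
    reconnect: Callable[[], None],
    max_retries: int = MAX_RETRIES,
    delay: float = 0.0,
) -> Tuple[bool, Optional[object]]:
    """Read one frame; on failure, reconnect and retry up to max_retries times."""
    ok, frame = read_frame()
    attempts = 0
    while not ok and attempts < max_retries:
        attempts += 1
        time.sleep(delay)   # hypothetical back-off between attempts
        reconnect()         # e.g. reopen the RTSP connection
        ok, frame = read_frame()
    # ok is False here only if every retry failed, i.e. the stream gives up
    return ok, frame
```

With OpenCV, `read_frame` could plausibly wrap `cv2.VideoCapture.read()`, which returns the same `(ok, frame)` pair, and `reconnect` could release and reopen the capture.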
Use Cases
Security Monitoring
Automatically capture and record when people enter monitored areas from multiple camera feeds simultaneously.
Retail Analytics
Track customer presence and movement patterns across multiple store locations with synchronized detection.
Smart Home
Receive alerts and recordings when people are detected in specific zones around your property.
Event Recording
Capture footage only when people are present, saving storage space and making review more efficient.
System Requirements
Required
- Python: 3.12 or higher
- Package Manager: uv (recommended) or pip
- Operating System: Linux, Windows, or macOS
Optional
- GPU: NVIDIA GPU with CUDA support for hardware-accelerated inference
- Model Files: YOLOv4 or YOLOv3 weights and configuration (HOG fallback available without models)
The tool automatically detects CUDA availability and falls back to CPU if not available. Similarly, if YOLO model files are not found, it uses OpenCV’s built-in HOG person detector.
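The two fallbacks described above might look something like the sketch below. It is an assumption-laden illustration: the file names `yolov4.weights` / `yolov4.cfg` (and the YOLOv3 equivalents) follow common Darknet naming but are not confirmed for this project, the preference for YOLOv4 over YOLOv3 is assumed, and `choose_detector` and `cuda_available` are hypothetical helper names.

```python
from pathlib import Path


def choose_detector(model_dir: str) -> str:
    """Return 'yolov4', 'yolov3', or 'hog' depending on which model files exist."""
    base = Path(model_dir)
    for name in ("yolov4", "yolov3"):  # assumed preference order
        weights = base / f"{name}.weights"  # assumed file names
        config = base / f"{name}.cfg"
        if weights.is_file() and config.is_file():
            return name
    return "hog"  # no model files found: use OpenCV's built-in HOG detector


def cuda_available() -> bool:
    """Report whether OpenCV sees at least one CUDA-capable device."""
    try:
        import cv2  # imported lazily so the sketch also runs without OpenCV
        return cv2.cuda.getCudaEnabledDeviceCount() > 0
    except Exception:
        # OpenCV missing or built without CUDA support: fall back to CPU
        return False
```

A caller could then pick the inference device with something like `device = "cuda" if cuda_available() else "cpu"` before loading the chosen detector.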
How It Works
- Stream Connection: Connects to one or more RTSP camera streams
- Frame Processing: Analyzes every Nth frame (configurable, default every 15th frame)
- Person Detection: Uses YOLOv4/YOLOv3 or HOG to detect people in frames
- Capture: When a person is detected:
  - Image mode: Saves an annotated JPEG snapshot
  - Video mode: Records an MP4 clip for the duration of presence
- Output Organization: Saves files to organized directories per stream
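The frame-sampling and presence-tracking steps above can be sketched as a small state machine. This is only an illustration under stated assumptions: `presence_events` and `detect` are hypothetical names, frames are represented abstractly, and only the default interval of 15 comes from the text. The sketch reports where a recording would start and stop; it does not write any files.

```python
from typing import Callable, Iterable, List, Tuple


def presence_events(
    frames: Iterable[object],
    detect: Callable[[object], bool],
    frame_interval: int = 15,
) -> List[Tuple[str, int]]:
    """Return ('start', idx) / ('stop', idx) events as presence begins and ends."""
    events: List[Tuple[str, int]] = []
    present = False
    last_idx = -1
    for idx, frame in enumerate(frames):
        last_idx = idx
        if idx % frame_interval != 0:
            continue  # only every Nth frame is analyzed
        found = detect(frame)
        if found and not present:
            events.append(("start", idx))  # person appeared: begin clip
        elif not found and present:
            events.append(("stop", idx))   # person left: end clip
        present = found
    if present:
        events.append(("stop", last_idx))  # stream ended while still recording
    return events
```

In image mode, each `start` event would correspond to saving a snapshot; in video mode, the span between `start` and `stop` would correspond to one MP4 clip. Note that with an interval of 15, a person is only noticed at the next analyzed frame, so short appearances between samples can be missed.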
Architecture Overview
The project is organized into focused modules:
- main.py: CLI entry point and argument parsing
- config.py: Configuration loader (AppConfig dataclass)
- person_detector.py: YOLOv4 / YOLOv3 / HOG inference (thread-safe)
- stream_processor.py: Per-stream loop, save logic, and reconnect handling
- multi_stream_manager.py: Thread orchestration for multiple streams
- display_manager.py: Grid window composition and display thread
All detection methods are thread-safe, allowing multiple streams to share a single detector instance efficiently.
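One common way to make a shared detector safe across stream threads is to guard inference with a lock, as in the sketch below. The source only states that detection is thread-safe, so the locking strategy here is an assumption, and `SharedDetector`, `infer`, and `run_streams` are hypothetical names rather than the project's API.

```python
import threading
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


class SharedDetector:
    """Wraps an inference callable so that only one thread runs it at a time."""

    def __init__(self, infer: Callable[[object], bool]) -> None:
        self._infer = infer
        self._lock = threading.Lock()
        self.calls = 0  # counter updated under the lock

    def detect(self, frame: object) -> bool:
        with self._lock:  # serialize access to the underlying model
            self.calls += 1
            return self._infer(frame)


def run_streams(
    detector: SharedDetector, streams: Sequence[Sequence[object]]
) -> List[List[bool]]:
    """Process each stream in its own thread, all sharing one detector."""

    def worker(frames: Sequence[object]) -> List[bool]:
        return [detector.detect(f) for f in frames]

    with ThreadPoolExecutor(max_workers=max(1, len(streams))) as pool:
        return list(pool.map(worker, streams))
```

Sharing one instance avoids loading the model once per camera; the trade-off in this sketch is that inference calls are serialized, so throughput depends on how fast a single detection runs.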