Introduction

What is RTSP Human Capture?

RTSP Human Capture is a multi-stream RTSP person-detection tool built on YOLOv4 / YOLOv3 (with an automatic HOG fallback) and OpenCV. When a person enters a camera frame, the tool either saves an annotated JPEG snapshot or starts recording an MP4 clip, depending on the configured save mode. Multiple cameras run in parallel threads and can be watched in a single composited grid window.

Key Features

Advanced Detection

YOLOv4 / YOLOv3 detection with automatic fallback to OpenCV HOG when model files are absent

GPU Acceleration

CUDA GPU acceleration is detected and enabled automatically, with a graceful fallback to CPU

Multi-Stream Support

Single or multiple RTSP streams processed concurrently via threads

Flexible Output

Two save modes: image (annotated JPEG snapshot) or video (MP4 clip covering the whole time a person is present)

Live Display

Dedicated window for single streams; resizable grid window for multiple streams

Auto Reconnect

Each stream retries up to 5 times on read failure before giving up
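The retry behaviour can be sketched as a small generator. This is only an illustration, not the project's code: `read_with_retries` and `open_stream` are hypothetical names, and resetting the failure counter after a successful read is an assumption. With OpenCV, `open_stream` would typically wrap `cv2.VideoCapture(url)` and return its `read` method, which yields an `(ok, frame)` pair.

```python
import time
from typing import Callable, Iterator, Optional, Tuple

MAX_RETRIES = 5  # retry limit stated in the feature description

ReadFn = Callable[[], Tuple[bool, Optional[object]]]


def read_with_retries(
    open_stream: Callable[[], ReadFn],  # hypothetical: opens the stream, returns a read function
    max_retries: int = MAX_RETRIES,
    delay_s: float = 0.0,
) -> Iterator[object]:
    """Yield frames, reopening the stream after each failed read.

    Stops once more than `max_retries` consecutive reads have failed.
    Assumption: a successful read resets the failure counter.
    """
    read = open_stream()
    failures = 0
    while True:
        ok, frame = read()
        if ok:
            failures = 0
            yield frame
            continue
        failures += 1
        if failures > max_retries:
            return  # give up on this stream
        time.sleep(delay_s)
        read = open_stream()  # reconnect and try again
```

A caller would iterate over the generator and treat its end as "this stream is gone"; other streams are unaffected because each one runs its own loop.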

Use Cases

Security Monitoring

Automatically capture and record when people enter monitored areas from multiple camera feeds simultaneously.

Retail Analytics

Track customer presence and movement patterns across multiple store locations with synchronized detection.

Smart Home

Receive alerts and recordings when people are detected in specific zones around your property.

Event Recording

Capture footage only when people are present, saving storage space and making review more efficient.

System Requirements

Required

Python: 3.12 or higher
Package Manager: uv (recommended) or pip
Operating System: Linux, Windows, or macOS

Optional

GPU: NVIDIA GPU with CUDA support for hardware-accelerated inference
Model Files: YOLOv4 or YOLOv3 weights and configuration (HOG fallback available without models)

The tool detects CUDA availability automatically and falls back to the CPU when no CUDA device is available. Likewise, if YOLO model files are not found, it uses OpenCV’s built-in HOG person detector.
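Both fallbacks amount to a simple preference order. The sketch below shows one way to express it. The function names, the model file names, and the choice to prefer YOLOv4 over YOLOv3 are assumptions made for illustration; only `cv2.cuda.getCudaEnabledDeviceCount()` is an actual OpenCV call.

```python
from pathlib import Path

# Hypothetical file names; the project's real model files may be named differently.
YOLO_CANDIDATES = [
    ("yolov4", "yolov4.cfg", "yolov4.weights"),
    ("yolov3", "yolov3.cfg", "yolov3.weights"),
]


def choose_detector(model_dir: Path) -> str:
    """Return the first YOLO variant whose config and weights both exist, else 'hog'."""
    for name, cfg, weights in YOLO_CANDIDATES:
        if (model_dir / cfg).is_file() and (model_dir / weights).is_file():
            return name
    return "hog"  # OpenCV's built-in HOG person detector needs no model files


def choose_device() -> str:
    """Return 'cuda' if OpenCV reports at least one CUDA device, else 'cpu'."""
    try:
        import cv2  # imported lazily so the sketch also runs without OpenCV

        return "cuda" if cv2.cuda.getCudaEnabledDeviceCount() > 0 else "cpu"
    except Exception:
        # OpenCV missing or built without CUDA support: use the CPU.
        return "cpu"
```

For example, `choose_detector(Path("models"))` returns `"hog"` when the directory is empty or missing.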

How It Works

Stream Connection: Connects to one or more RTSP camera streams
Frame Processing: Analyzes every Nth frame (N is configurable; the default is 15)
Person Detection: Uses YOLOv4/YOLOv3 or HOG to detect people in frames
Capture: When a person is detected:
- Image mode: Saves an annotated JPEG snapshot
- Video mode: Records an MP4 clip for as long as the person remains in frame
Output Organization: Saves files to a separate directory for each stream
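The steps above can be sketched as a loop that samples frames and decides what to do in each save mode. To stay self-contained, this illustration returns a list of events instead of writing files. `plan_captures`, the action names, and the rule of one snapshot per analyzed frame that contains a person are assumptions, not the project's actual behaviour.

```python
from typing import Callable, Iterable, List, Tuple

FRAME_INTERVAL = 15  # default stated above: analyze every 15th frame


def plan_captures(
    frames: Iterable[object],
    detect_person: Callable[[object], bool],  # stand-in for YOLO or HOG inference
    mode: str = "image",
    frame_interval: int = FRAME_INTERVAL,
) -> List[Tuple[int, str]]:
    """Return (frame_index, action) events for the given save mode."""
    if mode not in ("image", "video"):
        raise ValueError(f"unknown save mode: {mode!r}")

    events: List[Tuple[int, str]] = []
    recording = False
    last_index = -1
    for index, frame in enumerate(frames):
        last_index = index
        if index % frame_interval != 0:
            continue  # only every Nth frame is analyzed
        present = detect_person(frame)
        if mode == "image":
            if present:
                events.append((index, "save_jpeg"))
        elif present and not recording:
            recording = True
            events.append((index, "start_mp4"))
        elif not present and recording:
            recording = False
            events.append((index, "stop_mp4"))

    if recording:
        # The stream ended while a person was still present: close the clip.
        events.append((last_index, "stop_mp4"))
    return events
```

In a real video-mode loop the frames between two analyzed frames would still be written to the open clip; the sketch only models when a clip starts and stops.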

Architecture Overview

The project is organized into focused modules:

main.py - CLI entry point and argument parsing
config.py - Configuration loader (AppConfig dataclass)
person_detector.py - YOLOv4 / YOLOv3 / HOG inference (thread-safe)
stream_processor.py - Per-stream loop, save logic, and reconnect handling
multi_stream_manager.py - Thread orchestration for multiple streams
display_manager.py - Grid window composition and display thread

All detection methods are thread-safe, allowing multiple streams to share a single detector instance efficiently.
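How person_detector.py achieves thread safety is not described here. A lock around inference is one common approach, and the sketch below assumes it. `SharedDetector` and `run_streams` are hypothetical names, and the streams are plain lists standing in for camera frames.

```python
import threading
from typing import Callable, Dict, List


class SharedDetector:
    """Wraps an inference function so that only one thread runs it at a time."""

    def __init__(self, infer: Callable[[object], bool]) -> None:
        self._infer = infer
        self._lock = threading.Lock()
        self.calls = 0  # total detections, updated under the lock

    def detect(self, frame: object) -> bool:
        with self._lock:
            self.calls += 1
            return self._infer(frame)


def run_streams(detector: SharedDetector, streams: Dict[str, List[object]]) -> Dict[str, int]:
    """Process each stream in its own thread; return person-frame counts per stream."""
    hits: Dict[str, int] = {}

    def worker(name: str, frames: List[object]) -> None:
        # Each thread writes only its own key, so no extra lock is needed here.
        hits[name] = sum(detector.detect(frame) for frame in frames)

    threads = [
        threading.Thread(target=worker, args=(name, frames), name=name)
        for name, frames in streams.items()
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return hits
```

Sharing one instance means the model is loaded once rather than once per camera. The trade-off in this sketch is that inference calls from different streams are serialized by the lock.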

Next Steps

Installation

Quick Start