
Overview

Single stream mode processes one RTSP camera feed with real-time person detection. When a person enters the frame, the system can:
  • Save an annotated JPEG snapshot
  • Record an MP4 video clip of their presence
  • Display a live annotated window
  • Print detection events to console
Single stream mode uses a dedicated display window (1280×720 resolution). For multiple cameras, see Multi-Stream Processing.

Basic Usage

Minimal Example

python main.py --rtsp "rtsp://camera.local/stream" --save image
This command:
  • Connects to the RTSP stream
  • Runs person detection on every 15th frame (default)
  • Saves JPEG snapshots when persons are detected
  • Shows a live display window
Equivalently, using uv:
uv run main.py --rtsp "rtsp://camera.local/stream" --save image

Command Structure

The basic command structure for single stream processing:
python main.py --rtsp <URL> --save <image|video> [OPTIONS]
--rtsp (string, required)
  RTSP stream URL to process
--save (choice, required)
  Save mode: image for snapshots or video for clips
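As an illustration of how these flags fit together, here is a minimal argparse sketch mirroring the options documented on this page. It is an assumption for clarity, not the project's actual parser in main.py; the real defaults come from config.cfg.

```python
import argparse

# Hypothetical parser mirroring the flags documented on this page.
parser = argparse.ArgumentParser(description="Single-stream person detection")
parser.add_argument("--rtsp", required=True, help="RTSP stream URL to process")
parser.add_argument("--save", required=True, choices=["image", "video"],
                    help="Save mode: image for snapshots or video for clips")
parser.add_argument("--no-display", action="store_true",
                    help="Disable the live display window")
parser.add_argument("--confidence", type=float, default=0.5)
parser.add_argument("--frame-skip", type=int, default=15)
parser.add_argument("--area-threshold", type=int, default=1000)

args = parser.parse_args(
    ["--rtsp", "rtsp://camera.local/stream", "--save", "image"]
)
print(args.save)  # image
```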

Display Options

Control whether to show a live display window during processing.

With Display (Default)

By default, single stream mode shows a live annotated window:
python main.py --rtsp "rtsp://camera.local/stream" --save image
Window features:
  • Resized to 1280×720 for consistent viewing
  • Green bounding boxes around detected persons
  • Confidence scores displayed above boxes
  • Person count and entry counter in top-left
  • Press ‘q’ to quit
Implementation (stream_processor.py:363-396):
if display:
    display_frame = cv2.resize(frame.copy(), (1280, 720))
    
    # Scale bounding boxes to display resolution
    original_height, original_width = frame.shape[:2]
    scale_x = 1280 / original_width
    scale_y = 720 / original_height
    
    for x, y, w, h, confidence in boxes:
        sx, sy = int(x * scale_x), int(y * scale_y)
        sw, sh = int(w * scale_x), int(h * scale_y)
        cv2.rectangle(display_frame, (sx, sy),
                      (sx + sw, sy + sh), (0, 255, 0), 2)
        cv2.putText(
            display_frame,
            f"Person {confidence:.2f}",
            (sx, sy - 10),
            cv2.FONT_HERSHEY_SIMPLEX,
            0.5,
            (0, 255, 0),
            2,
        )
    
    cv2.putText(
        display_frame,
        f"Persons: {person_count} | Entries: {person_entry_count}",
        (10, 30),
        cv2.FONT_HERSHEY_SIMPLEX,
        1,
        (0, 255, 0),
        2,
    )
    cv2.imshow("RTSP Person Detection", display_frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

Without Display (Headless)

Disable the display window for headless servers or background processing:
python main.py --rtsp "rtsp://camera.local/stream" --save image --no-display
--no-display (flag)
  Disable the live display window (single stream only)
Use cases:
  • Running on servers without GUI
  • Background processing
  • Reduced resource usage
  • Remote/SSH sessions
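For headless deployments, a wrapper script could also auto-detect whether a display is even available. This guard is illustrative only (the tool itself simply honors the flag) and assumes a Linux-style X11/Wayland environment:

```python
import os

def should_display(no_display_flag: bool) -> bool:
    # Illustrative guard, not the tool's actual logic: honor --no-display,
    # and fall back to headless mode when no X/Wayland display exists.
    if no_display_flag:
        return False
    return bool(os.environ.get("DISPLAY") or os.environ.get("WAYLAND_DISPLAY"))
```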
From main.py:142:
processor.process_rtsp_stream(
    rtsp_url=args.rtsp,
    frame_skip=cfg.frame_skip,
    display=not args.no_display,  # display=True unless --no-display
    save_mode=args.save,
)
# Shows live window
python main.py --rtsp "rtsp://192.168.1.100/stream" --save video
Output:
Connected successfully! Processing frames...
Press 'q' to quit
[2026-03-09 14:30:22] Frame 15: 2 person(s) detected
  Person entered frame! Entry #1
  • Live OpenCV window appears

Save Modes

Image Mode (Snapshots)

Captures a single annotated JPEG when a person first enters the frame.
python main.py --rtsp "rtsp://camera.local/stream" --save image
Behavior:
  • One snapshot per person entry event
  • Annotated with bounding boxes and confidence scores
  • Saved immediately when person detected
  • Filename includes timestamp and entry counter
Filename format:
person_entry_{entry_number}_{YYYYMMDD_HHMMSS}_{unix_timestamp}.jpg
Example output:
output/
├── person_entry_1_20260309_143022_1741528222.jpg
├── person_entry_2_20260309_144510_1741529110.jpg
└── person_entry_3_20260309_145230_1741529550.jpg
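The filename pattern above can be reproduced with a small helper. This is a sketch of the documented naming scheme; the function name is illustrative, not part of the project's API:

```python
import time
from datetime import datetime

def snapshot_filename(output_dir: str, entry_number: int) -> str:
    # Mirrors the documented pattern:
    # person_entry_{entry_number}_{YYYYMMDD_HHMMSS}_{unix_timestamp}.jpg
    now = time.time()
    timestamp_str = datetime.fromtimestamp(now).strftime("%Y%m%d_%H%M%S")
    return f"{output_dir}/person_entry_{entry_number}_{timestamp_str}_{int(now)}.jpg"

print(snapshot_filename("output", 1))
```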
Implementation (stream_processor.py:321-327):
if save_mode == "image":
    filename = (
        f"{self.output_dir}/person_entry_{person_entry_count}"
        f"_{timestamp_str}_{int(time.time())}.jpg"
    )
    self._save_annotated_snapshot(frame, boxes, filename)
    print(f"  Saved snapshot: {filename}")
Annotation function (stream_processor.py:416-434):
@staticmethod
def _save_annotated_snapshot(
    frame: cv2.typing.MatLike,
    boxes: List[Tuple[int, int, int, int, float]],
    filename: str,
) -> None:
    annotated = frame.copy()
    for x, y, w, h, confidence in boxes:
        cv2.rectangle(annotated, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(
            annotated,
            f"Person {confidence:.2f}",
            (x, y - 10),
            cv2.FONT_HERSHEY_SIMPLEX,
            0.5,
            (0, 255, 0),
            2,
        )
    cv2.imwrite(filename, annotated)
When to use image mode:
  • Counting people entering an area
  • Logging distinct events
  • Minimal storage requirements
  • Quick review of detections

Video Mode (Clips)

Records an MP4 clip for the entire duration a person is present in the frame.
python main.py --rtsp "rtsp://camera.local/stream" --save video
Behavior:
  • Recording starts when person enters frame
  • Continues while person is present
  • Stops after 3 consecutive frames without detection
  • All frames written at source stream FPS
Filename format:
person_clip_{entry_number}_{YYYYMMDD_HHMMSS}_{unix_timestamp}.mp4
Example output:
output/
├── person_clip_1_20260309_143022_1741528222.mp4  # 45 seconds
├── person_clip_2_20260309_144510_1741529110.mp4  # 12 seconds
└── person_clip_3_20260309_145230_1741529550.mp4  # 67 seconds
Implementation (stream_processor.py:329-339):
elif save_mode == "video":
    clip_filename = (
        f"{self.output_dir}/person_clip_{person_entry_count}"
        f"_{timestamp_str}_{int(time.time())}.mp4"
    )
    h_frame, w_frame = frame.shape[:2]
    fourcc = cv2.VideoWriter.fourcc(*"mp4v")
    video_writer = cv2.VideoWriter(
        clip_filename, fourcc, stream_fps, (w_frame, h_frame)
    )
    print(f"  Started recording clip: {clip_filename}")
Exit detection logic (stream_processor.py:344-355):
elif not has_person and person_present:
    no_person_streak += 1
    print(
        f"  No person detected ({no_person_streak}/{NO_PERSON_EXIT_THRESHOLD})"
    )
    if no_person_streak >= NO_PERSON_EXIT_THRESHOLD:
        person_present = False
        no_person_streak = 0
        if video_writer is not None:
            video_writer.release()
            video_writer = None
            print(f"  Person(s) exited. Saved clip: {clip_filename}")
Exit threshold: 3 consecutive frames without detection.

With frame_skip=15 on a 30 fps stream:
  • Detection runs every 0.5 seconds
  • 3 misses = ~1.5 seconds of no detection
  • Prevents premature clip termination from brief occlusions
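The arithmetic above generalizes to any stream rate. A small helper (illustrative, not part of the codebase) computes how long a person must be absent before the clip is closed:

```python
def exit_latency_seconds(stream_fps: float, frame_skip: int,
                         threshold: int = 3) -> float:
    # Detection runs once every `frame_skip` frames, i.e. every
    # frame_skip / stream_fps seconds; exit is declared after
    # `threshold` consecutive misses.
    return threshold * frame_skip / stream_fps

print(exit_latency_seconds(30, 15))  # 1.5
print(exit_latency_seconds(30, 30))  # 3.0
```

A larger frame_skip therefore delays clip termination proportionally, which is worth keeping in mind when tuning for responsiveness.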
When to use video mode:
  • Reviewing behavior and movement
  • Security incident investigation
  • Understanding context around events
  • Capturing full interactions

Save Mode Comparison

Image Mode

Pros:
  • Minimal storage
  • Fast to review
  • One file per event
  • Good for counting
Cons:
  • No temporal context
  • Miss behavior details
  • Single frame only

Video Mode

Pros:
  • Full context
  • Review behavior
  • Continuous recording
  • Better for security
Cons:
  • Large file sizes
  • More storage needed
  • Slower to review

Detection Flow

Understanding how detection works in single stream mode:
1. Connect to RTSP stream

cap = cv2.VideoCapture(rtsp_url)
if not cap.isOpened():
    print("Error: Could not connect to RTSP stream")
    return
From stream_processor.py:251-254
2. Read frames continuously

while True:
    ret, frame = cap.read()
    if not ret:
        # Attempt reconnection
        consecutive_failures += 1
        # ...
From stream_processor.py:280-296
3. Run detection every Nth frame

if frame_count % frame_skip == 0 and (current_time - last_detection_time) >= 0.5:
    last_detection_time = current_time
    has_person, person_count, boxes = self.detector.detect_persons(frame)
From stream_processor.py:303-305
  • Default: every 15th frame
  • Throttled to max 2 fps
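The two conditions combine into an effective detection cadence: the frame-skip interval, floored at 0.5 seconds by the throttle. A sketch of that arithmetic (the function is illustrative, not in the codebase):

```python
def detection_interval(stream_fps: float, frame_skip: int,
                       min_interval: float = 0.5) -> float:
    # Effective seconds between detection passes: the frame-skip
    # cadence, but never faster than the 0.5 s throttle (max 2/s).
    return max(frame_skip / stream_fps, min_interval)

print(detection_interval(30, 15))  # 0.5
print(detection_interval(30, 5))   # 0.5  (throttle dominates)
print(detection_interval(30, 30))  # 1.0
```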
4. Track person presence state

if has_person and not person_present:
    # Person ENTERED frame
    person_present = True
    person_entry_count += 1
    # Save snapshot or start clip

elif not has_person and person_present:
    # Person MAY HAVE EXITED
    no_person_streak += 1
    if no_person_streak >= 3:
        person_present = False
        # Stop clip recording
From stream_processor.py:311-355
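The entry/exit logic above is a small state machine. Isolated from the stream loop, it could be sketched as follows (class and method names are illustrative, not the project's actual API):

```python
from typing import Optional

class PresenceTracker:
    """Sketch of the documented entry/exit state machine."""

    def __init__(self, exit_threshold: int = 3):
        self.exit_threshold = exit_threshold
        self.person_present = False
        self.entry_count = 0
        self._miss_streak = 0

    def update(self, has_person: bool) -> Optional[str]:
        # Returns "entered" / "exited" on transitions, None otherwise.
        if has_person:
            self._miss_streak = 0
            if not self.person_present:
                self.person_present = True
                self.entry_count += 1
                return "entered"
        elif self.person_present:
            self._miss_streak += 1
            if self._miss_streak >= self.exit_threshold:
                self.person_present = False
                self._miss_streak = 0
                return "exited"
        return None

tracker = PresenceTracker()
events = [tracker.update(f)
          for f in [False, True, True, False, False, False, True]]
print(events)  # [None, 'entered', None, None, None, 'exited', 'entered']
```

Note that a single detection hit resets the miss streak, which is what makes the tracker robust to brief occlusions.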
5. Save or display frames

  • Write frames to video clip if recording
  • Update display window if enabled
  • Print status messages

Console Output

Understanding the console output during processing:
Loading person detection model...
CUDA available, using GPU for inference
Model loaded: YOLOv4
Confidence threshold: 0.5
Person area threshold: 1000 pixels

Config loaded from: config.cfg
  model_dir   = model
  output_dir  = output

Connecting to RTSP stream: rtsp://192.168.1.100/stream
Created directory: output
Connected successfully! Processing frames...
Press 'q' to quit

[2026-03-09 14:30:22] Frame 15: No persons
[2026-03-09 14:30:23] Frame 30: No persons
[2026-03-09 14:30:24] Frame 45: 1 person(s) detected
  Person entered frame! Entry #1
  Saved snapshot: output/person_entry_1_20260309_143024_1741528224.jpg

[2026-03-09 14:30:25] Frame 60: 1 person(s) detected
[2026-03-09 14:30:26] Frame 75: 1 person(s) detected
[2026-03-09 14:30:27] Frame 90: No persons
  No person detected (1/3)
[2026-03-09 14:30:28] Frame 105: No persons
  No person detected (2/3)
[2026-03-09 14:30:29] Frame 120: No persons
  No person detected (3/3)
  Person(s) exited frame. Waiting for next entry...

^C
Stopping detection...
Processed 150 frames, captured 1 person snapshot(s)
Key indicators:
Model loaded: YOLOv4
CUDA available, using GPU for inference
Confirms which detection model is active and compute backend.
Connected successfully! Processing frames...
Stream connection established and frame reading started.
[2026-03-09 14:30:24] Frame 45: 1 person(s) detected
  Person entered frame! Entry #1
Person detected with timestamp and entry counter.
  No person detected (3/3)
  Person(s) exited frame. Waiting for next entry...
Exit confirmation after 3 consecutive frames without detection.

Configuration Overrides

Override config.cfg values for single stream processing:
# Require 70% confidence for detection
python main.py --rtsp "rtsp://camera.local/stream" --save image \
  --confidence 0.7
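Conceptually, CLI flags win over values loaded from config.cfg. A minimal sketch of that precedence, assuming a dataclass-style config (the names here are hypothetical, not the project's actual config object):

```python
from dataclasses import dataclass, replace
from typing import Optional

@dataclass
class Config:
    # Hypothetical defaults matching the values documented on this page.
    confidence: float = 0.5
    frame_skip: int = 15
    area_threshold: int = 1000

def apply_overrides(cfg: Config,
                    confidence: Optional[float] = None,
                    frame_skip: Optional[int] = None) -> Config:
    # CLI values, when provided, override the file-loaded config.
    updates = {k: v for k, v in
               {"confidence": confidence, "frame_skip": frame_skip}.items()
               if v is not None}
    return replace(cfg, **updates)

print(apply_overrides(Config(), confidence=0.7).confidence)  # 0.7
```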

Automatic Reconnection

Single stream mode includes automatic reconnection on connection loss:
if not ret:
    consecutive_failures += 1
    print(
        f"Failed to read frame (attempt {consecutive_failures}/{max_reconnect_attempts}), reconnecting..."
    )
    cap.release()
    time.sleep(2)
    cap = cv2.VideoCapture(rtsp_url)
    if not cap.isOpened():
        if consecutive_failures >= max_reconnect_attempts:
            print("Max reconnect attempts reached. Giving up.")
            break
        continue
    print("Reconnected successfully.")
    continue
From stream_processor.py:282-296

Reconnection behavior:
  • Max 5 retry attempts
  • 2 second delay between attempts
  • Resets counter on successful read
  • Exits after exhausting retries
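Stripped of the OpenCV specifics, the retry loop reduces to a generic pattern. The sketch below is illustrative (the tool itself wraps cv2.VideoCapture directly, with 5 attempts and a 2-second delay):

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

def connect_with_retry(connect: Callable[[], Optional[T]],
                       max_attempts: int = 5,
                       delay: float = 2.0) -> Optional[T]:
    # Try up to max_attempts times with a fixed delay between tries;
    # return the handle on success, or None after exhausting retries.
    for attempt in range(1, max_attempts + 1):
        handle = connect()
        if handle is not None:
            return handle
        print(f"Failed to connect (attempt {attempt}/{max_attempts}), retrying...")
        if attempt < max_attempts:
            time.sleep(delay)
    return None

# Usage with a fake connector that succeeds on the third try:
attempts = {"n": 0}
def fake_connect():
    attempts["n"] += 1
    return "handle" if attempts["n"] >= 3 else None

print(connect_with_retry(fake_connect, max_attempts=5, delay=0))  # handle
```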
Video clips in progress when connection is lost will be saved automatically but may be incomplete.

Real-World Examples

Example 1: Store Entrance Monitoring

Goal: Count customers entering a store
python main.py \
  --rtsp "rtsp://store-camera.local/entrance" \
  --save image \
  --confidence 0.6 \
  --area-threshold 2000 \
  --no-display
Configuration:
  • Image mode: one snapshot per customer
  • Higher confidence: reduce false positives
  • Larger area threshold: only detect close persons (actually entering)
  • No display: runs in background

Example 2: Security Incident Recording

Goal: Record full video of any activity in restricted area
python main.py \
  --rtsp "rtsp://security-cam.local/restricted" \
  --save video \
  --confidence 0.4 \
  --frame-skip 10
Configuration:
  • Video mode: capture full behavior
  • Lower confidence: don’t miss any detections
  • More frequent checking: faster detection response
  • With display: monitor in real-time

Example 3: Parking Lot Wide-Angle

Goal: Detect people in large parking area
python main.py \
  --rtsp "rtsp://parking-cam.local/wide" \
  --save image \
  --confidence 0.45 \
  --area-threshold 500 \
  --frame-skip 20
Configuration:
  • Image mode: event logging
  • Lower confidence: better for distant detection
  • Low area threshold: detect small/distant persons
  • Less frequent: acceptable for slow-moving subjects

Troubleshooting

Error:
Error: Could not connect to RTSP stream
Possible causes:
  • Incorrect RTSP URL
  • Network connectivity issues
  • Camera authentication required
  • Firewall blocking connection
Solutions:
  • Verify URL with VLC or ffplay:
    vlc rtsp://camera.local/stream
    
  • Check network connectivity:
    ping camera.local
    
  • Add credentials to URL:
    rtsp://username:password@camera.local/stream
    
Issue: No OpenCV window appears

Possible causes:
  • --no-display flag set
  • Headless environment (no X server)
  • Display environment variable not set
Solutions:
  • Remove --no-display flag
  • For SSH: enable X11 forwarding
    ssh -X user@host
    
  • Set DISPLAY variable:
    export DISPLAY=:0
    
Issue: Stream works but no persons detected

Debugging steps:
  1. Check model is loaded:
    Model loaded: YOLOv4  # Should see this, not HOG
    
  2. Lower confidence threshold:
    python main.py --rtsp "..." --save image --confidence 0.3
    
  3. Lower area threshold:
    python main.py --rtsp "..." --save image --area-threshold 500
    
  4. Test with image first:
    python main.py --test-image photo.jpg --save image
    
Issue:
Failed to read frame (attempt 1/5), reconnecting...
Possible causes:
  • Unstable network
  • Camera stream issues
  • Network bandwidth limitations
  • Router/switch problems
Solutions:
  • Check network stability
  • Reduce stream quality at camera
  • Use wired connection instead of WiFi
  • Check camera logs for issues
Issue: System resources maxed out

Solutions:
  1. Increase frame_skip:
    --frame-skip 30  # Process 1 fps on 30fps stream
    
  2. Disable display:
    --no-display
    
  3. Use HOG instead of YOLO:
    • Move YOLO weights out of model directory
    • HOG is faster but less accurate
  4. Enable GPU acceleration if available (see Performance Tips below)

Performance Tips

Optimize frame_skip

Start with frame_skip=30 and decrease until detection responsiveness is acceptable.

Use GPU acceleration

CUDA-enabled OpenCV provides 5-10x speedup. See GPU setup.

Adjust resolution

Configure camera to stream at lower resolution (e.g., 1280×720 instead of 1920×1080).

Tune thresholds

Higher confidence/area thresholds = fewer detections = less processing.

Next Steps