The Edge AI Challenge
Running object detection at the edge presents unique challenges: limited compute, constrained memory, and unreliable networks. Our Vision AI platform addresses these with a split-architecture approach.
Split Architecture
Instead of running the full model on the ESP32 (impossible on a device with 520 KB of SRAM), we split the pipeline: the ESP32-CAM handles capture and streaming over WiFi, while a more capable edge server runs inference.
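To see why on-device inference is a non-starter, a quick back-of-the-envelope comparison helps. The ~3.2M parameter count for a nano-scale YOLO model is an approximation, and activation memory is ignored entirely, which only widens the gap:

```python
# Back-of-the-envelope: full YOLO weights vs. ESP32 on-chip SRAM.
# Parameter count is approximate (nano-scale YOLO); FP32 weights only.
ESP32_SRAM_BYTES = 520 * 1024   # 520 KB of on-chip SRAM
YOLO_PARAMS = 3_200_000         # approx. parameter count, nano model
BYTES_PER_WEIGHT = 4            # FP32

model_bytes = YOLO_PARAMS * BYTES_PER_WEIGHT
print(f"model ≈ {model_bytes / 1e6:.1f} MB vs {ESP32_SRAM_BYTES // 1024} KB SRAM")
print(f"shortfall: {model_bytes / ESP32_SRAM_BYTES:.0f}x too large")
```

Even before counting activations or the camera's own frame buffers, the weights alone overflow SRAM by more than an order of magnitude.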
Active Learning Pipeline
The most innovative part of Vision AI is its active learning loop:
```python
from datetime import datetime, timezone
from uuid import uuid4

import numpy as np
from ultralytics import YOLO

# DetectionResult, mqtt, and encode_frame are project-level helpers
# defined elsewhere in the codebase.

class ActiveLearningPipeline:
    def __init__(self, model: YOLO, confidence_threshold: float = 0.3):
        self.model = model
        self.confidence_threshold = confidence_threshold
        self.uncertain_buffer = []

    async def process_frame(self, frame: np.ndarray) -> DetectionResult:
        results = self.model(frame)
        for detection in results:
            if detection.confidence < self.confidence_threshold:
                # Low confidence = uncertain → queue for human review
                self.uncertain_buffer.append({
                    'frame': frame,
                    'detection': detection,
                    'timestamp': datetime.now(timezone.utc),
                })
        if len(self.uncertain_buffer) >= 50:
            await self.request_human_annotation()
        return DetectionResult(
            detections=results,
            uncertain_count=len(self.uncertain_buffer),
        )

    async def request_human_annotation(self):
        # Send uncertain detections to the React dashboard for human labeling
        batch = self.uncertain_buffer[:50]
        await mqtt.publish("vision/annotation/request", {
            'batch_id': str(uuid4()),  # stringified for serialization
            'frames': [encode_frame(f['frame']) for f in batch],
            'predictions': [f['detection'] for f in batch],
        })
        self.uncertain_buffer = self.uncertain_buffer[50:]
```

Model Retraining Pipeline
Once humans annotate the uncertain detections, the model retrains:
```python
from datetime import datetime
from pathlib import Path

from ultralytics import YOLO

class ModelRetrainer:
    def __init__(self, base_model_path: str):
        self.base_model = YOLO(base_model_path)
        self.training_data_path = Path('./datasets/active_learning')
        self.current_map = 0.0  # best mAP50 deployed so far
        self.version = 0

    async def retrain(self, new_annotations: list[Annotation]):
        # Add new annotations to the training data
        self.add_to_dataset(new_annotations)

        # Fine-tune the model
        results = self.base_model.train(
            data=str(self.training_data_path / 'data.yaml'),
            epochs=10,
            imgsz=640,
            batch=16,
            lr0=0.001,  # lower LR for fine-tuning
            freeze=10,  # freeze backbone layers
            project='runs/active_learning',
            name=f'retrain_{datetime.now().strftime("%Y%m%d_%H%M")}',
        )

        # Deploy only if the fine-tuned model actually improves
        if results.metrics.mAP50 > self.current_map:
            improvement = results.metrics.mAP50 - self.current_map
            self.deploy_new_model(results.best)
            self.version += 1
            self.current_map = results.metrics.mAP50
            await mqtt.publish("vision/model/updated", {
                'version': self.version,
                'mAP50': results.metrics.mAP50,
                'improvement': improvement,
            })
```

Results
After three months of active learning, the model continuously improves without manual data collection: it identifies what it doesn't know and asks for help.
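Stripped of the framework, the selection rule driving that behavior is plain thresholded uncertainty sampling. A minimal, dependency-free sketch (detections as plain dicts, and `select_uncertain` is a hypothetical helper, not part of the pipeline above):

```python
CONFIDENCE_THRESHOLD = 0.3  # same default as the pipeline above

def select_uncertain(detections):
    """Return detections whose confidence falls below the threshold."""
    return [d for d in detections if d['confidence'] < CONFIDENCE_THRESHOLD]

detections = [
    {'label': 'person', 'confidence': 0.91},  # confident → keep as-is
    {'label': 'dog', 'confidence': 0.22},     # uncertain → human review
    {'label': 'car', 'confidence': 0.48},     # above threshold → keep
]
print(select_uncertain(detections))  # only the 0.22 detection is queued
```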
ESP32-CAM Optimizations
To maximize frame rate on the ESP32-CAM:
```cpp
void configureCamera() {
    camera_config_t config;
    config.ledc_channel = LEDC_CHANNEL_0;
    config.ledc_timer = LEDC_TIMER_0;
    config.pin_d0 = Y2_GPIO_NUM;
    // ... pin configuration

    // Optimized for speed over quality
    config.pixel_format = PIXFORMAT_JPEG;
    config.frame_size = FRAMESIZE_VGA;        // 640x480
    config.jpeg_quality = 12;                 // 0-63, lower = better quality
    config.fb_count = 2;                      // double-buffering
    config.fb_location = CAMERA_FB_IN_PSRAM;  // frame buffers in external PSRAM
    config.grab_mode = CAMERA_GRAB_LATEST;    // always serve the freshest frame

    if (esp_camera_init(&config) != ESP_OK) {
        Serial.println("Camera init failed");
    }
}
```

This achieves 15 FPS streaming over WiFi to the edge server, which is more than sufficient for most security and monitoring applications.
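As a rough sanity check on that figure, here is the WiFi bandwidth the stream implies. The ~25 KB average frame size is an assumption for VGA JPEG at this quality setting, not a measured value:

```python
# Rough bandwidth estimate for the 15 FPS VGA JPEG stream.
# FRAME_BYTES is an assumed average compressed frame size, not measured.
FPS = 15
FRAME_BYTES = 25 * 1024  # ~25 KB per VGA JPEG frame (assumption)

throughput_kbs = FPS * FRAME_BYTES / 1024
print(f"≈ {throughput_kbs:.0f} KB/s (~{throughput_kbs * 8 / 1024:.1f} Mbit/s)")
```

At roughly 3 Mbit/s, the stream sits comfortably within what even a congested 2.4 GHz WiFi link can sustain.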