The Edge AI Challenge
Running object detection at the edge presents unique challenges: limited compute, constrained memory, and unreliable networks. Our Vision AI platform addresses these with a split-architecture approach.
Split Architecture
Instead of running the full model on the ESP32 (impossible on a device with 520 KB of SRAM), we split the pipeline: the ESP32-CAM handles capture and streaming over WiFi, while a more capable edge server runs inference.
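To see why on-device inference is a non-starter, a quick back-of-the-envelope comparison helps. The ~3.2M parameter count for a nano-scale YOLO model is an approximation, and activation memory is ignored entirely, which only widens the gap:

```python
# Back-of-the-envelope: full YOLO weights vs. ESP32 on-chip SRAM.
# Parameter count is approximate (nano-scale YOLO); FP32 weights only.
ESP32_SRAM_BYTES = 520 * 1024   # 520 KB of on-chip SRAM
YOLO_PARAMS = 3_200_000         # approx. parameter count, nano model
BYTES_PER_WEIGHT = 4            # FP32

model_bytes = YOLO_PARAMS * BYTES_PER_WEIGHT
print(f"model ≈ {model_bytes / 1e6:.1f} MB vs {ESP32_SRAM_BYTES // 1024} KB SRAM")
print(f"shortfall: {model_bytes / ESP32_SRAM_BYTES:.0f}x too large")
```

Even before counting activations or the camera's own frame buffers, the weights alone overflow SRAM by more than an order of magnitude.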
Active Learning Pipeline
The most innovative part of Vision AI is its active learning loop:
```python
from datetime import datetime, timezone
from uuid import uuid4

import numpy as np
from ultralytics import YOLO

# DetectionResult, mqtt, and encode_frame are project-level helpers
# defined elsewhere in the codebase.

class ActiveLearningPipeline:
    def __init__(self, model: YOLO, confidence_threshold: float = 0.3):
        self.model = model
        self.confidence_threshold = confidence_threshold
        self.uncertain_buffer = []

    async def process_frame(self, frame: np.ndarray) -> DetectionResult:
        results = self.model(frame)
        for detection in results:
            if detection.confidence < self.confidence_threshold:
                # Low confidence = uncertain → queue for human review
                self.uncertain_buffer.append({
                    'frame': frame,
                    'detection': detection,
                    'timestamp': datetime.now(timezone.utc),
                })
        if len(self.uncertain_buffer) >= 50:
            await self.request_human_annotation()
        return DetectionResult(
            detections=results,
            uncertain_count=len(self.uncertain_buffer),
        )

    async def request_human_annotation(self):
        # Send uncertain detections to the React dashboard for human labeling
        batch = self.uncertain_buffer[:50]
        await mqtt.publish("vision/annotation/request", {
            'batch_id': str(uuid4()),  # stringified for serialization
            'frames': [encode_frame(f['frame']) for f in batch],
            'predictions': [f['detection'] for f in batch],
        })
        self.uncertain_buffer = self.uncertain_buffer[50:]
```

Model Retraining Pipeline
Once humans annotate the uncertain detections, the model retrains:
```python
from datetime import datetime
from pathlib import Path

from ultralytics import YOLO

class ModelRetrainer:
    def __init__(self, base_model_path: str):
        self.base_model = YOLO(base_model_path)
        self.training_data_path = Path('./datasets/active_learning')
        self.current_map = 0.0  # best mAP50 deployed so far
        self.version = 0

    async def retrain(self, new_annotations: list[Annotation]):
        # Add new annotations to the training data
        self.add_to_dataset(new_annotations)

        # Fine-tune the model
        results = self.base_model.train(
            data=str(self.training_data_path / 'data.yaml'),
            epochs=10,
            imgsz=640,
            batch=16,
            lr0=0.001,  # lower LR for fine-tuning
            freeze=10,  # freeze backbone layers
            project='runs/active_learning',
            name=f'retrain_{datetime.now().strftime("%Y%m%d_%H%M")}',
        )

        # Deploy only if the fine-tuned model actually improves
        if results.metrics.mAP50 > self.current_map:
            improvement = results.metrics.mAP50 - self.current_map
            self.deploy_new_model(results.best)
            self.version += 1
            self.current_map = results.metrics.mAP50
            await mqtt.publish("vision/model/updated", {
                'version': self.version,
                'mAP50': results.metrics.mAP50,
                'improvement': improvement,
            })
```

Results
After three months of active learning, the model continuously improves without manual data collection: it identifies what it doesn't know and asks for help.
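Stripped of the framework, the selection rule driving that behavior is plain thresholded uncertainty sampling. A minimal, dependency-free sketch (detections as plain dicts, and `select_uncertain` is a hypothetical helper, not part of the pipeline above):

```python
CONFIDENCE_THRESHOLD = 0.3  # same default as the pipeline above

def select_uncertain(detections):
    """Return detections whose confidence falls below the threshold."""
    return [d for d in detections if d['confidence'] < CONFIDENCE_THRESHOLD]

detections = [
    {'label': 'person', 'confidence': 0.91},  # confident → keep as-is
    {'label': 'dog', 'confidence': 0.22},     # uncertain → human review
    {'label': 'car', 'confidence': 0.48},     # above threshold → keep
]
print(select_uncertain(detections))  # only the 0.22 detection is queued
```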
ESP32-CAM Optimizations
To maximize frame rate on the ESP32-CAM:
```cpp
void configureCamera() {
    camera_config_t config;
    config.ledc_channel = LEDC_CHANNEL_0;
    config.ledc_timer = LEDC_TIMER_0;
    config.pin_d0 = Y2_GPIO_NUM;
    // ... pin configuration

    // Optimized for speed over quality
    config.pixel_format = PIXFORMAT_JPEG;
    config.frame_size = FRAMESIZE_VGA;        // 640x480
    config.jpeg_quality = 12;                 // 0-63, lower = better quality
    config.fb_count = 2;                      // double-buffering
    config.fb_location = CAMERA_FB_IN_PSRAM;  // frame buffers in external PSRAM
    config.grab_mode = CAMERA_GRAB_LATEST;    // always serve the freshest frame

    if (esp_camera_init(&config) != ESP_OK) {
        Serial.println("Camera init failed");
    }
}
```

This achieves 15 FPS streaming over WiFi to the edge server, which is more than sufficient for most security and monitoring applications.
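As a rough sanity check on that figure, here is the WiFi bandwidth the stream implies. The ~25 KB average frame size is an assumption for VGA JPEG at this quality setting, not a measured value:

```python
# Rough bandwidth estimate for the 15 FPS VGA JPEG stream.
# FRAME_BYTES is an assumed average compressed frame size, not measured.
FPS = 15
FRAME_BYTES = 25 * 1024  # ~25 KB per VGA JPEG frame (assumption)

throughput_kbs = FPS * FRAME_BYTES / 1024
print(f"≈ {throughput_kbs:.0f} KB/s (~{throughput_kbs * 8 / 1024:.1f} Mbit/s)")
```

At roughly 3 Mbit/s, the stream sits comfortably within what even a congested 2.4 GHz WiFi link can sustain.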