A backyard feeder in Ontario, Canada, watched end-to-end by hardware and software I built and run at home. A camera catches each visitor; a vision pipeline figures out what it is and logs the sighting — with the species and the model’s confidence baked right into the photo. The two most recent are always up top:

The stack

  • Firmware — an ESP32-S3 camera running custom firmware that streams MJPEG over the local network and survives the realities of an outdoor Wi-Fi link.
  • Vision pipeline — a home GPU server pulls the stream and runs YOLOv8n on every frame to find the critter and box it. BioCLIP, an open biology vision model, then names it, constrained to a hand-curated list of species that actually turn up at an Ontario backyard feeder: chickadees, titmice, nuthatches, cardinals, jays, finches, woodpeckers, and the chipmunks and squirrels that crash the party. When the model's confidence falls below a threshold, the guess is held back rather than shown wrong. No cloud vision API, no third-party service.
  • Web app — a Django app that records every visit and serves the live log this page reads from.
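
For the curious, here is a rough sketch of the "hold it back rather than show it wrong" step, not the code that actually runs here. It assumes the classifier has already produced one similarity score per species on the curated list; the species names, the 0.6 threshold, and the function names are all made up for illustration.

```python
import math

# Hypothetical allowlist: the real curated list is longer and is not
# published on this page.
FEEDER_SPECIES = [
    "Black-capped Chickadee",
    "Northern Cardinal",
    "Blue Jay",
    "Eastern Chipmunk",
]


def softmax(scores):
    """Turn raw similarity scores into probabilities that sum to 1."""
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def name_visitor(scores, labels=FEEDER_SPECIES, threshold=0.6):
    """Pick the best label from the allowlist, or hold the guess back.

    `scores` holds one raw score per label (assumed to come from the
    vision model). Returns (label, confidence), where label is None
    when confidence is below the (assumed) threshold.
    """
    if len(scores) != len(labels):
        raise ValueError("need exactly one score per label")
    probs = softmax(scores)
    best = max(range(len(labels)), key=probs.__getitem__)
    confidence = probs[best]
    if confidence < threshold:
        return None, confidence  # too unsure: log nothing rather than guess
    return labels[best], confidence


if __name__ == "__main__":
    print(name_visitor([5.0, 1.0, 0.5, 0.2]))  # one clear winner
    print(name_visitor([1.0, 1.0, 1.0, 1.0]))  # a four-way tie is held back
```

Because the probabilities are computed only over the allowlist, the model can never name something that isn't on it; the threshold then catches the cases where nothing on the list is a convincing match.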

More recent visitors

Self-hosted on a home server behind a tunnel.