A camera watches a backyard feeder in Ontario, Canada, and a computer tries to name every visitor. Here are the two most recent; each photo carries the species and the model’s confidence baked right into the frame.

How it works

Every time something shows up — bird or otherwise — a vision pipeline running on a home server figures out what it is and logs the visit.

  • An ESP32-S3 camera at the feeder streams MJPEG over the local network.
  • A home GPU box pulls that stream and runs YOLOv8n on every frame to find the critter and draw a box around it.
  • The boxed crop goes to BioCLIP, an open biology vision model, constrained to a hand-curated list of species that actually turn up at an Ontario backyard feeder — chickadees, titmice, nuthatches, cardinals, jays, finches, woodpeckers, and the chipmunks and squirrels that crash the party.
  • If the top guess falls below a confidence threshold, it is held back rather than risk showing a wrong name. Anything above the threshold gets a name, a confidence score, and a spot in the log.
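The identification and threshold steps above can be sketched roughly as follows. This is a minimal sketch, not the code behind this page. It assumes BioCLIP is used the way CLIP-style models usually are for zero-shot classification: the crop's image embedding is compared against a text embedding for each allowed species, and a softmax is taken over only those labels. The abbreviated species list, the 0.6 threshold, the logit scale of 100, and the names Visit, classify_crop, and log_visit are all hypothetical. The similarity scores are passed in directly so the example runs without the model or a GPU.

```python
import math
from dataclasses import dataclass

# Hypothetical, abbreviated curated label set (the real list is not published here).
SPECIES = [
    "Black-capped Chickadee",
    "Tufted Titmouse",
    "White-breasted Nuthatch",
    "Northern Cardinal",
    "Blue Jay",
    "American Goldfinch",
    "Downy Woodpecker",
    "Eastern Chipmunk",
    "Eastern Gray Squirrel",
]

CONFIDENCE_THRESHOLD = 0.6  # assumed value; the real threshold is not stated
LOGIT_SCALE = 100.0         # CLIP-style temperature; assumed for this sketch


@dataclass
class Visit:
    species: str
    confidence: float


def softmax(logits):
    # Subtract the max before exponentiating for numerical stability.
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def classify_crop(similarities, labels=SPECIES, threshold=CONFIDENCE_THRESHOLD):
    """Return a Visit for the best label, or None if the guess is held back.

    `similarities` stands in for the cosine similarities between one crop's
    image embedding and each label's text embedding. Because the softmax runs
    only over `labels`, no species outside the curated list can be chosen.
    """
    if len(similarities) != len(labels):
        raise ValueError("one similarity score per label is required")
    probs = softmax([LOGIT_SCALE * s for s in similarities])
    best = max(range(len(labels)), key=lambda i: probs[i])
    if probs[best] < threshold:
        return None  # too unsure: hold the guess back instead of logging it
    return Visit(labels[best], probs[best])


def log_visit(log, visit):
    # Only confident identifications reach the log.
    if visit is not None:
        log.append(visit)


if __name__ == "__main__":
    log = []
    # A crop whose embedding sits clearly closest to "Northern Cardinal".
    log_visit(log, classify_crop([0.20, 0.21, 0.22, 0.31, 0.24, 0.20, 0.22, 0.18, 0.19]))
    # An ambiguous crop: chickadee and titmouse tie, so nothing is logged.
    log_visit(log, classify_crop([0.30, 0.30] + [0.10] * 7))
    for v in log:
        print(f"{v.species}: {v.confidence:.2f}")
```

Restricting the softmax to the curated list keeps the model from naming a bird that never visits an Ontario feeder. The trade-off is that an unexpected visitor gets pushed onto the nearest listed label, which the threshold only partly guards against.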

No cloud vision API, no third-party service — the whole pipeline runs on hardware in the house.

More recent visitors