M1 · Computer vision — NTH Bootcamp

01 How to process an image for a neuroprosthetic device

A cortical visual prosthesis turns a camera feed into electrical stimulation of the visual cortex. This page covers the first step, module M1, which decides which parts of the image matter.

End-to-end overview

Figure: schematic of a cortical visual neuroprosthesis. A head-mounted camera captures the scene, a mobile processor converts the image into a stimulation pattern, the electrode array implanted in primary visual cortex (V1) stimulates the matching cluster of neurons, and the user perceives a pattern of phosphenes that sketches the original scene. Schematic from A. Lozano, *Brain & the Chip II — Design lessons for next-generation vision implants* (Elche, 2024).

02 Pixel-level operators

Two strategies for finding structure in an image: thresholding (pick a brightness cutoff and split light from dark) and edge detection (find where brightness changes sharply: the outline of a person, the side of a bus, the lane markings on the road). Thresholding is technically not edge detection, but it lives in the same toolbox: both turn a continuous greyscale image into a sparse mask. Try each method, drag the sliders, and watch the right panel update live.

How each algorithm works — in plain words

Click a tab to read about that method. Each one answers the same question — "where does the brightness change?" — in a different way.

Threshold. Look at each pixel. If it's brighter than some level, paint it white. Otherwise paint it black. That's it.

Works great when the thing you want is much darker (or much brighter) than the background — like black text on white paper. Fails when everything is similar in brightness.
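The rule above fits in one line of code. Here is a minimal sketch in plain Python, assuming the image is a list of rows of 0–255 grey values. The function name `threshold` and the tiny test image are made up for illustration, and the strict "brighter than" comparison is an assumption about how a pixel exactly at the level is treated.

```python
def threshold(image, level=128):
    """Return a binary mask: 255 where a pixel is brighter than `level`, else 0."""
    return [[255 if pixel > level else 0 for pixel in row] for row in image]


# Tiny 3x4 greyscale image: dark "ink" pixels on a bright "paper" background.
page = [
    [250, 240, 30, 245],
    [248, 20, 25, 251],
    [252, 249, 35, 247],
]

mask = threshold(page, level=128)
for row in mask:
    print(row)
```

Moving `level` up to 255 turns every pixel black, because nothing is brighter than 255.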

Threshold level — where to make the cut. 0 = everything white, 255 = everything black.
Otsu: let the computer pick the level automatically from the brightness histogram of the whole image, choosing the cut that best separates the dark pixels from the bright ones.
Adaptive — pick a different level for each small region of the photo. Useful when lighting is uneven (one side of the image is sunny, the other is in shadow).
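Otsu and Adaptive can be sketched in a few lines each. The block below is an illustration in plain Python, not the OpenCV.js code behind the playground. `otsu_level` tries every cut and keeps the one that best separates the dark pixels from the bright ones (largest between-class variance). `adaptive_threshold` compares each pixel with the average of its neighbourhood minus a small `offset`. The function names, `radius`, `offset`, the clipped windows at the image border, and both test images are assumptions made for the example.

```python
def otsu_level(image):
    """Pick the level that maximises the between-class variance (Otsu's method)."""
    histogram = [0] * 256
    for row in image:
        for pixel in row:
            histogram[pixel] += 1
    total = sum(histogram)
    sum_all = sum(value * count for value, count in enumerate(histogram))

    best_level, best_score = 0, -1.0
    count_dark, sum_dark = 0, 0
    for level in range(256):
        count_dark += histogram[level]          # pixels with value <= level
        sum_dark += level * histogram[level]
        count_bright = total - count_dark
        if count_dark == 0 or count_bright == 0:
            continue                            # one class is empty: nothing to separate
        mean_dark = sum_dark / count_dark
        mean_bright = (sum_all - sum_dark) / count_bright
        score = count_dark * count_bright * (mean_dark - mean_bright) ** 2
        if score > best_score:
            best_level, best_score = level, score
    return best_level


def adaptive_threshold(image, radius=1, offset=15):
    """White where a pixel is brighter than its local mean minus `offset`."""
    height, width = len(image), len(image[0])
    mask = []
    for y in range(height):
        out_row = []
        for x in range(width):
            # Neighbourhood window, clipped at the image border.
            window = [
                image[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            local_level = sum(window) / len(window) - offset
            out_row.append(255 if image[y][x] > local_level else 0)
        mask.append(out_row)
    return mask


# Two clear brightness groups: Otsu lands somewhere in the gap between them.
bimodal = [[20, 25, 30, 240], [22, 250, 245, 28]]
level = otsu_level(bimodal)
print("Otsu level:", level)

# Uneven lighting: brightness ramps up left to right, with two darker spots
# (values 40 and 110) that we want to find.
row = [60, 80, 40, 120, 140, 160, 110, 200]
uneven = [list(row) for _ in range(3)]

global_mask = [[255 if p > 128 else 0 for p in r] for r in uneven]
adaptive_mask = adaptive_threshold(uneven, radius=1, offset=15)
print("global  :", global_mask[0])    # [0, 0, 0, 0, 255, 255, 0, 255]
print("adaptive:", adaptive_mask[0])  # [255, 255, 0, 255, 255, 255, 0, 255]
```

The single global level paints the whole shadowed left side black, while the per-region level marks only the two darker spots.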

Playground controls (default values): Gaussian blur σ 0.0; Canny low threshold 20; Canny high threshold 80; Sobel kernel 3; Sobel cutoff 40; Threshold level 128. Selectors: Method, Mode. Output panels: Original and Edge map.

03 YOLO — finding the objects

Edge detection finds where brightness changes. It does not know what those edges are. YOLO does: it's a neural network that has been shown millions of photos and learned to recognise everyday things — people, cars, buses, dogs — and to draw a precise outline around each one.

How YOLO works — in plain words

YOLO stands for "You Only Look Once". It scans the whole image in a single pass, splits it into a grid, and for every grid cell guesses three things at the same time:

Is there something here? (yes / no, plus a confidence score)
What is it? (person, car, bus, bicycle, dog, …)
Where exactly does it begin and end? (a box around it; for the segmentation version, a per-pixel mask)
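To make the grid idea concrete, here is a small sketch in plain Python of how an object is matched to a grid cell: the cell that contains the centre of the object's box is the one that answers the three questions for it. The 7×7 grid follows the original YOLO paper; the function name `responsible_cell`, the image size and the boxes are made up for illustration, and newer YOLO versions organise their predictions differently.

```python
def responsible_cell(box, image_width, image_height, grid_size=7):
    """Return (row, col) of the grid cell that contains the centre of `box`.

    `box` is (x_min, y_min, x_max, y_max) in pixels.
    """
    x_min, y_min, x_max, y_max = box
    centre_x = (x_min + x_max) / 2
    centre_y = (y_min + y_max) / 2
    # Scale the centre to grid units; clamp so a centre on the far edge stays inside.
    col = min(int(centre_x / image_width * grid_size), grid_size - 1)
    row = min(int(centre_y / image_height * grid_size), grid_size - 1)
    return row, col


# Hypothetical 640x480 scene with two objects.
bus_box = (100, 120, 540, 400)
person_box = (40, 200, 120, 440)

# Each responsible cell carries the three answers from the list above.
grid = {
    responsible_cell(bus_box, 640, 480): {"confidence": 0.9, "label": "bus", "box": bus_box},
    responsible_cell(person_box, 640, 480): {"confidence": 0.8, "label": "person", "box": person_box},
}
for cell, answer in grid.items():
    print(cell, answer["label"], answer["confidence"])
```

The bus lands in the middle cell (3, 3) and the person in cell (4, 0) on the left edge; the confidence values here are placeholders, not model output.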

Below is the output of the real YOLOv8 model run on the bus scene. Press Play to watch each object's mask appear, one by one, in the order they were detected (biggest first).

04 Try it live — your camera

Now run the same techniques on your own webcam feed in real time. Click Start camera and choose between the pixel-level operators (Sobel, Canny, Threshold) or object detection (YOLO).

05 Self-check

Predict the answer first, then verify with the playground above.

Q1. You raise Canny's high threshold without touching the low threshold. What happens to the edge map?

Fewer edges survive. Canny first marks pixels whose gradient exceeds the high threshold as strong edges, then keeps neighbouring pixels above the low threshold only if they connect to a strong edge. Raising the high threshold leaves fewer strong pixels to start from, so borderline contours drop out and you keep only the highest-contrast ones. Useful when the scene is noisy.
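The two-threshold step can be tried on a toy example. This is a sketch of the linking step only (full Canny also blurs, computes gradients and thins the edges first), written in plain Python. The function names, the strict "greater than" comparisons and the gradient values are assumptions made for the illustration.

```python
from collections import deque


def hysteresis(magnitude, low, high):
    """Keep strong pixels (> high) plus any pixels > low connected to them."""
    height, width = len(magnitude), len(magnitude[0])
    edges = [[False] * width for _ in range(height)]
    queue = deque()
    # Step 1: strong pixels are edges outright and act as seeds.
    for y in range(height):
        for x in range(width):
            if magnitude[y][x] > high:
                edges[y][x] = True
                queue.append((y, x))
    # Step 2: grow from the seeds into 8-connected neighbours above `low`.
    while queue:
        y, x = queue.popleft()
        for dy in (-1, 0, 1):
            for dx in (-1, 0, 1):
                ny, nx = y + dy, x + dx
                if (0 <= ny < height and 0 <= nx < width
                        and not edges[ny][nx] and magnitude[ny][nx] > low):
                    edges[ny][nx] = True
                    queue.append((ny, nx))
    return edges


def count_edges(edges):
    return sum(cell for row in edges for cell in row)


# Made-up gradient magnitudes: a high-contrast contour on the left (peak 150)
# and a fainter one on the right (peak 100), separated by a flat pixel.
gradient = [
    [0, 0, 0, 0, 0, 0],
    [150, 60, 30, 0, 100, 50],
    [0, 0, 0, 0, 0, 0],
]

print(count_edges(hysteresis(gradient, low=20, high=80)))   # 5: both contours kept
print(count_edges(hysteresis(gradient, low=20, high=120)))  # 3: fainter contour dropped
```

With the high threshold at 120 the fainter contour has no strong pixel to start from, so all of it disappears even though the low threshold never moved.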

Q2. Sobel returns a magnitude image, not a binary mask. Why does that matter for stimulation?

A magnitude map says "this edge is stronger than that one", which gives the prosthesis a per-pixel intensity to drive amplitude with. A binary mask says only "edge or not", losing the gradient information. Sobel is therefore a friendlier activation-map source for amplitude modulation; Canny is better when you want sparse, well-defined contours.
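The difference shows up clearly on a tiny image. Below is a sketch in plain Python using the standard 3×3 Sobel kernels. Leaving border pixels at zero, scaling the magnitude to a 0–1 "amplitude" by dividing by the peak, and the cutoff of 100 are assumptions made for the illustration, not the stimulation model used in the course.

```python
import math

SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))
SOBEL_Y = ((-1, -2, -1), (0, 0, 0), (1, 2, 1))


def sobel_magnitude(image):
    """Gradient magnitude per pixel; border pixels are left at 0."""
    height, width = len(image), len(image[0])
    magnitude = [[0.0] * width for _ in range(height)]
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = gy = 0
            for j in range(3):
                for i in range(3):
                    pixel = image[y + j - 1][x + i - 1]
                    gx += SOBEL_X[j][i] * pixel
                    gy += SOBEL_Y[j][i] * pixel
            magnitude[y][x] = math.hypot(gx, gy)
    return magnitude


def to_amplitude(magnitude):
    """Scale magnitudes to 0..1 (hypothetical mapping to stimulation amplitude)."""
    peak = max(max(row) for row in magnitude)
    if peak == 0:
        return [[0.0] * len(row) for row in magnitude]
    return [[value / peak for value in row] for row in magnitude]


# Made-up image: a strong step (0 -> 200) and a weak step (200 -> 250).
row = [0, 0, 200, 200, 200, 250, 250]
image = [list(row) for _ in range(3)]

magnitude = sobel_magnitude(image)
amplitude = to_amplitude(magnitude)
binary = [[value > 100 for value in r] for r in magnitude]

print(magnitude[1])  # [0.0, 800.0, 800.0, 0.0, 200.0, 200.0, 0.0]
print(amplitude[1])  # [0.0, 1.0, 1.0, 0.0, 0.25, 0.25, 0.0]
print(binary[1])     # [False, True, True, False, True, True, False]
```

The amplitude row still says the left edge is four times stronger than the right one; the binary row treats both edges the same.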

Q3. The YOLO demo runs entirely in your browser. What does the model actually return for the bus photo?

A list of detections: each one has a class label (bus, person, skateboard…), a confidence score (0–1), and an axis-aligned bounding box in pixels. The page shows boxes; the underlying network also outputs class probabilities for every box it considers, plus an "objectness" score that gates whether the box is reported at all.
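In code, such a list might look like the sketch below (plain Python). The `Detection` class, the `reported` helper, the `(x, y, width, height)` box layout, the `min_score` cutoff standing in for the gating step, and every number are assumptions made for illustration; none of them is the model's real output.

```python
from dataclasses import dataclass


@dataclass
class Detection:
    label: str    # class name, e.g. "bus"
    score: float  # confidence between 0 and 1
    box: tuple    # axis-aligned box in pixels: (x, y, width, height)


def reported(detections, min_score=0.5):
    """Keep detections at or above `min_score`, most confident first."""
    kept = [d for d in detections if d.score >= min_score]
    return sorted(kept, key=lambda d: d.score, reverse=True)


# Made-up detections for illustration only.
raw = [
    Detection("person", 0.61, (48, 398, 197, 504)),
    Detection("bus", 0.93, (22, 231, 783, 525)),
    Detection("skateboard", 0.31, (300, 850, 120, 40)),
    Detection("person", 0.88, (670, 380, 139, 498)),
]

for d in reported(raw):
    print(f"{d.label:<10} {d.score:.2f} {d.box}")
```

Here the low-confidence skateboard is filtered out, which is the same kind of decision the gating score makes inside the network.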

06 Where to next

Next module: M2 — Gaze & DeepGaze, where the camera feed meets a model of where a person looks.

Tools & references

tool OpenCV.js — the in-browser build of OpenCV behind the Sobel / Canny / threshold operators in §02.
tool TensorFlow.js with the COCO-SSD model — the in-browser object detector used in §03 and the live webcam.
paper Redmon, Divvala, Girshick & Farhadi (2016), You Only Look Once: Unified, Real-Time Object Detection. doi:10.1109/CVPR.2016.91 — the YOLO family this module borrows its framing from.
figure de Ruyter van Steveninck et al. (2022), End-to-end optimization of prosthetic vision, Journal of Vision 22(2):20. doi:10.1167/jov.22.2.20 (CC BY 4.0) — source of the pipeline figure in §01.
