
What is computer vision?

Computer vision is the field of artificial intelligence concerned with enabling computers to interpret and understand visual information from images and video, extracting meaningful descriptions of their content.

What computer vision does

Computer vision seeks to give machines a useful interpretation of visual input — to move from raw pixels to descriptions such as "this image contains a pedestrian on the left". The challenge is that the same object can appear under endless variations of lighting, angle, scale, and occlusion, while different objects can look superficially similar. Computer vision develops methods that are robust to this variation, drawing on deep learning, geometry, and signal processing to bridge the gap between pixels and meaning.
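The gap between pixels and meaning can be shown with a toy example. The minimal sketch below uses made-up 5x5 grayscale images and hypothetical helpers (`make_bar`, `pixel_distance`) that compare images by their summed per-pixel difference. Shifting a bright bar by a single pixel moves the image farther from the original than removing the bar entirely, even though a person would say the shifted image shows the same thing.

```python
# Toy 5x5 grayscale images (0 = dark, 9 = bright). All data are hypothetical.
Image = list[list[int]]


def make_bar(col: int, brightness: int = 9, size: int = 5) -> Image:
    """Return a size x size image that is dark except for one bright column."""
    return [[brightness if c == col else 0 for c in range(size)] for _ in range(size)]


def pixel_distance(a: Image, b: Image) -> int:
    """Sum of absolute per-pixel differences (an L1 distance on raw pixels)."""
    return sum(abs(x - y) for row_a, row_b in zip(a, b) for x, y in zip(row_a, row_b))


original = make_bar(col=1)
shifted = make_bar(col=2)                # same bar, moved one pixel to the right
dimmer = make_bar(col=1, brightness=5)   # same bar under weaker lighting
empty = [[0] * 5 for _ in range(5)]      # no bar at all

print(pixel_distance(original, shifted))  # 90
print(pixel_distance(original, dimmer))   # 20
print(pixel_distance(original, empty))    # 45
```

This is only an illustration under those assumptions: raw-pixel comparison treats a one-pixel shift as a bigger change than deleting the object, which is why vision methods learn features that tolerate such variation.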

Core tasks

Image classification assigns a whole image to a category. Object detection locates and labels multiple objects with bounding boxes. Segmentation labels image regions at the pixel level: semantic segmentation assigns each pixel a class, so two adjacent cats share the label "cat", while instance segmentation also separates individual objects, distinguishing one cat from the other.
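The three kinds of output can be sketched on one toy image. In this minimal Python sketch the instance mask, the class names, and the helper functions are all hypothetical. Boxes are inclusive pixel coordinates `(x_min, y_min, x_max, y_max)`, and the whole-image label uses a simplified "largest object wins" rule, which is an assumption for illustration rather than how a trained classifier decides.

```python
from collections import Counter

# Toy instance-segmentation mask for a 4x6 image:
# 0 = background, 1 and 2 = two separate objects (hypothetical data).
MASK = [
    [0, 1, 1, 0, 0, 0],
    [0, 1, 1, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
    [0, 0, 0, 0, 2, 2],
]
INSTANCE_CLASS = {1: "cat", 2: "dog"}


def semantic_mask(instance_mask, instance_class):
    """Semantic segmentation: replace each instance id with its class name."""
    return [[instance_class.get(v, "background") for v in row] for row in instance_mask]


def bounding_boxes(instance_mask):
    """Detection-style output: one (x_min, y_min, x_max, y_max) box per instance id."""
    boxes = {}
    for y, row in enumerate(instance_mask):
        for x, v in enumerate(row):
            if v == 0:
                continue  # skip background pixels
            x0, y0, x1, y1 = boxes.get(v, (x, y, x, y))
            boxes[v] = (min(x0, x), min(y0, y), max(x1, x), max(y1, y))
    return boxes


def classify(instance_mask, instance_class):
    """Classification-style output: a single label for the whole image.
    Toy rule (an assumption): pick the class covering the most pixels."""
    counts = Counter(instance_class[v] for row in instance_mask for v in row if v != 0)
    return counts.most_common(1)[0][0]


print(classify(MASK, INSTANCE_CLASS))  # dog
print({INSTANCE_CLASS[i]: box for i, box in bounding_boxes(MASK).items()})
# {'cat': (1, 0, 2, 1), 'dog': (4, 1, 5, 3)}
print(semantic_mask(MASK, INSTANCE_CLASS)[1])
# ['background', 'cat', 'cat', 'background', 'dog', 'dog']
```

The three outputs carry progressively more spatial detail: one label, then one box per object, then a label for every pixel.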

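Other tasks include face recognition, pose estimation, optical character recognition, and motion tracking in video. Progress on these has been driven by large labelled datasets and convolutional neural networks. A landmark turning point came in 2012, when a deep convolutional network (AlexNet) won the ImageNet Large Scale Visual Recognition Challenge by a wide margin over entries built on hand-designed features.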

How modern systems work

Most current computer-vision systems use deep neural networks — historically convolutional neural networks, and increasingly transformer-based vision models — trained on large labelled image datasets. The network learns visual features directly from data rather than relying on hand-designed feature detectors, which is why performance improved sharply once enough data and compute became available. Benchmarks such as ImageNet provide shared yardsticks for comparing methods.
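The basic operation inside a convolutional neural network is a small filter slid across the image. The sketch below uses a hypothetical plain-Python `conv2d` helper to apply a hand-designed, Prewitt-style vertical-edge kernel to a toy image. It computes "valid" cross-correlation (no padding, stride 1), which is what deep-learning libraries usually call convolution. In a CNN the kernel weights are not written by hand but learned from labelled data, and many such filters are stacked with nonlinearities in between.

```python
Image = list[list[float]]


def conv2d(image: Image, kernel: Image) -> Image:
    """'Valid' 2D cross-correlation: no padding, stride 1.
    Each output value is the weighted sum of the pixels under the kernel."""
    kh, kw = len(kernel), len(kernel[0])
    out_h = len(image) - kh + 1
    out_w = len(image[0]) - kw + 1
    return [
        [
            sum(image[i + a][j + b] * kernel[a][b] for a in range(kh) for b in range(kw))
            for j in range(out_w)
        ]
        for i in range(out_h)
    ]


# Toy 4x5 image: dark on the left, bright on the right (a vertical edge).
image = [[0, 0, 9, 9, 9] for _ in range(4)]

# Hand-designed vertical-edge kernel. In a CNN these nine weights would be
# learned parameters rather than fixed values.
edge_kernel = [[-1, 0, 1], [-1, 0, 1], [-1, 0, 1]]

for row in conv2d(image, edge_kernel):
    print(row)  # [27, 27, 0] on each of the two output rows
```

The response is large wherever the 3x3 window straddles the dark-to-bright boundary and zero where the window sees uniform brightness, which is the sense in which a filter "detects" a feature.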

Computer vision in research

Computer vision is a research tool across many sciences: analysing microscopy images, satellite imagery, medical scans (in research settings), and ecological camera-trap data. Methodologically, results depend on representative training data and rigorous evaluation; models can fail on images unlike their training set and can encode dataset biases. Reproducibility requires reporting datasets, architectures, and evaluation protocols, and outputs should be validated, for example against a manually annotated sample, rather than assumed correct.
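A single aggregate score can hide the failures described above. This sketch uses invented camera-trap evaluation records (the labels, site names, and helper functions are all hypothetical) and reports accuracy per image source alongside the overall figure. A gap between groups is one signal that a model is struggling on images unlike its training data.

```python
from collections import defaultdict

# Hypothetical evaluation records: (true_label, predicted_label, image_source).
# "image_source" stands in for any attribute that may differ from the training
# data, such as the camera or site an image came from.
records = [
    ("deer", "deer", "site_A"),
    ("fox", "fox", "site_A"),
    ("deer", "deer", "site_A"),
    ("fox", "fox", "site_A"),
    ("deer", "fox", "site_B"),
    ("fox", "fox", "site_B"),
]


def accuracy(rows):
    """Fraction of records whose prediction matches the true label."""
    return sum(t == p for t, p, _ in rows) / len(rows)


def accuracy_by_group(rows):
    """Accuracy computed separately for each image source."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[2]].append(row)
    return {group: accuracy(members) for group, members in groups.items()}


print(f"overall: {accuracy(records):.2f}")  # overall: 0.83
for group, acc in accuracy_by_group(records).items():
    print(f"{group}: {acc:.2f}")  # site_A: 1.00, then site_B: 0.50
```

Here the overall accuracy looks acceptable, while the per-site breakdown shows that half of the site_B images are misclassified.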

Key facts

At a glance

Field: subfield of artificial intelligence
Goal: interpret images and video to describe their content
Classification: assign a whole image to a category
Detection: locate and label objects with bounding boxes
Segmentation: label image regions at the pixel level
Modern basis: deep neural networks (CNNs and vision transformers)

Common questions

FAQ

What are the main computer vision tasks?

Key tasks include image classification (labelling a whole image), object detection (locating and labelling objects with bounding boxes), and segmentation (labelling regions at the pixel level). Others include face recognition, pose estimation, and motion tracking.

How does computer vision work?

Modern computer vision uses deep neural networks trained on large labelled image datasets. The network learns visual features directly from the data, rather than relying on hand-designed detectors, and uses them to interpret new images.

What is the difference between classification and detection?

Image classification assigns a single label to a whole image, whereas object detection finds multiple objects within an image and marks each with a location, typically a bounding box, alongside its label.
