Computer Vision is a field of Artificial Intelligence (AI) that trains computers and systems to glean meaningful information from digital images, videos and other visual inputs, and to act on or make recommendations based on that information. Computer Vision essentially allows computers to 'see', observe and understand. The process works to mimic human vision. Humans, however, have a head start: a lifetime of context underpins their vision, helping them to identify objects, judge their positioning and recognise when something is wrong with an image.

How does it work?

Computer Vision requires a lot of data to train systems. The data is analysed over and over until the system can make distinctions and recognise images. Two technologies make this possible: a type of machine learning called deep learning, and the convolutional neural network (CNN).

Machine learning uses algorithmic models that enable a computer to teach itself about the context of visual data and ultimately identify images unassisted, rather than being explicitly programmed to recognise an image. For example, instead of training a system to look for whiskers, long ears and a fluffy tail to recognise a bunny, programmers feed the machine millions of photos of bunnies; the model learns on its own the features that make up a bunny and can eventually distinguish it from other images.

The CNN helps a machine or deep learning model break images down to the pixel level, enabling it to 'look'. Pixels are labelled, convolutions (a mathematical operation on two functions that produces a third function) are performed on them, and the network makes predictions about what it is seeing. The CNN then checks the accuracy of its predictions over a series of iterations, adjusting itself until the predictions become reliable.

The process can be likened to approaching a jigsaw puzzle. The neural network views the components of the image, identifies edges and simple shapes, and then fills in the rest of the information by filtering and predicting through its deeper layers, gradually piecing the parts of the image together.

What can it do?

Computer Vision is not a new technology; the first experiments with it started in the 1950s, interpreting typewritten and handwritten text. Nowadays Computer Vision has a number of already-established functions, including:

Image classification – viewing an image and being able to classify it (as a flower or a dog, for example). The system can also accurately predict that an image belongs to a certain class; it could, for instance, be used to recognise and filter images uploaded by social media users.

Object tracking – the computer follows an object once it has been detected, often working on images in sequence or a video feed. For example, self-driving vehicles not only need to classify and detect objects such as people and other cars, they also need to track them in motion to avoid collisions.

Content-based image retrieval – uses Computer Vision to browse, search and retrieve images from large data stores based on the content of the images themselves rather than the metadata tags associated with them.

These established tasks are being harnessed across numerous sectors and industries, often to enhance the consumer experience, reduce costs and increase security. A few notable examples include augmented reality, automotive, facial recognition and healthcare.
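To tie the ideas above together, here is a minimal sketch of how a small convolutional classifier might be trained. It assumes PyTorch purely for illustration (the article does not name any framework), and the random stand-in images and the two hypothetical classes ('bunny' vs. 'not a bunny') are placeholders for the millions of real labelled photos a production system would use.

```python
# A minimal sketch of training a tiny convolutional classifier.
# Assumptions: PyTorch is available; images are 64x64 RGB; the two
# classes ("bunny" / "not a bunny") and the random data are hypothetical.
import torch
import torch.nn as nn

class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 2):
        super().__init__()
        # Convolutional layers work at the pixel level, picking out
        # edges and simple shapes as feature maps.
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 64x64 -> 32x32
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),   # 32x32 -> 16x16
        )
        # A final layer combines those features into a class prediction.
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)
        return self.classifier(x.flatten(start_dim=1))

model = TinyCNN()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Stand-in batch: 8 random 64x64 RGB "images" with random 0/1 labels.
images = torch.randn(8, 3, 64, 64)
labels = torch.randint(0, 2, (8,))

# One training step: predict, measure the error, adjust the weights.
logits = model(images)
loss = loss_fn(logits, labels)
optimizer.zero_grad()
loss.backward()
optimizer.step()
print(f"training loss after one step: {loss.item():.4f}")
```

In a real system this loop runs over a large labelled dataset for many passes, which is the repetition described above: the network checks its predictions, adjusts, and gradually gets better at telling one class from another.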
Advancements in the sector

Advances in the Computer Vision field have been astounding. Accuracy rates for object identification and classification have gone from 50 per cent to 99 per cent in less than a decade, and many of today's systems are more accurate than humans at quickly detecting and reacting to visual inputs.

New innovations that employ Computer Vision are appearing all the time, with industries utilising the technology to improve and advance their work. In the last couple of months, an 'intelligent sensing solutions' company has launched a driver monitoring system (DMS) designed to indicate if you're drowsy or distracted while driving. Research has revealed that driver distraction in the three seconds before a collision is a factor in 80 per cent of US accidents. The system uses AI and Computer Vision to monitor the driver's state in real time, tracking factors including gaze vector, blink rate and eye openness for signs of drowsiness and distraction. It will also detect actions such as wearing a seatbelt, holding a cell phone, smoking and wearing a face mask.

The global market for AI in Computer Vision is expanding rapidly, with predictions that it will reach $73.7 billion by 2027, and we are likely to see it filter increasingly into our daily lives.

If you're looking for your next Data & Analytics role, or to build out your data team, we can help. Take a look at our latest opportunities or get in touch with one of our expert consultants to find out more.