On the Brittleness of Deep Learning Models
Chen, S.-T. et al. 2019.
Deep learning models are widely used to solve computer vision problems, from facial recognition to object classification and scene understanding. However, research on adversarial examples has shown that these models are brittle: small, often imperceptible changes to an image can induce misclassifications, which has security implications for a wide range of image-processing systems. We propose an efficient algorithm to generate small perturbations that cause misclassifications in deep learning models. Our method achieves state-of-the-art performance while requiring far less computation than previous algorithms (roughly a 100x speedup), which opens the possibility of using it at larger scales and, in particular, of designing defense mechanisms.
Keywords: Machine Learning, Security, Deep Learning, Computer Vision, Adversarial Attacks
Adversarial Attacks
Deep Neural Networks (DNNs) have achieved state-of-the-art performance on a wide range of computer vision applications. However, these DNNs are susceptible to active adversaries. Most notably, they are vulnerable to adversarial attacks, in which small changes, often imperceptible to a human observer, cause a misclassification (i.e. an error) [1, 2]. This has become a major concern as more and more DNNs are deployed in the real world and, soon, in safety-critical settings such as autonomous vehicles, drones, and medical applications.
In a typical image classification scenario, a model is a function that maps an image (a vector of values between 0 and 1) to a set of scores. Each score represents the likelihood that the object in the image belongs to a certain class (e.g. cat, dog, car, plane, person, table). The highest-scoring class is the one predicted as being present in the image. In the image on the left of the Figure, a fairly good model (Inception V3 trained on ImageNet [3]) predicts with high confidence that the image contains a dog, and even determines the breed: Curly-Coated Retriever. We expect the model to be robust to subtle changes in the image; that is, if we slightly modify the values of some pixels, the prediction should not change significantly. However, if we add a perturbation (centre) to the original image, we obtain a new image (right) classified, with high confidence, as a Microwave. The security implications of this behaviour are obvious: the model is not robust to carefully crafted perturbations that would not affect the decision of a human observer. See https://adversarial-ml-tutorial.org/ for more technical details.
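The prediction pipeline above, and how a small perturbation can flip it, can be sketched with a toy model. This is an illustrative example only, not the paper's code: instead of a deep network, we use a small linear softmax classifier on a random 8x8 "image", for which the boundary-crossing perturbation can even be computed in closed form. All names and sizes below are made up for the sketch.

```python
import numpy as np

# Toy setup: a "model" maps an image to class scores; the
# highest-scoring class is the prediction.
rng = np.random.default_rng(0)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

W = rng.normal(size=(3, 64))       # toy model weights, 3 classes
image = rng.uniform(0.0, 1.0, 64)  # flattened 8x8 image, pixels in [0, 1]

def predict(x):
    return int(softmax(W @ x).argmax())

label = predict(image)

# For a linear model, a minimal L2 perturbation that pushes the input
# just past the decision boundary towards another class lies along
# W[target] - W[label]. (Real images would also be clipped back to
# [0, 1]; omitted here for simplicity.)
target = (label + 1) % 3
direction = W[target] - W[label]
gap = (W[label] - W[target]) @ image            # score margin to overcome
delta = (gap + 1e-6) * direction / (direction @ direction)
adv_label = predict(image + delta)              # prediction has flipped
```

Note that `np.linalg.norm(delta)` gives the Euclidean size of the change, which for a high-dimensional image can be tiny per pixel even when it reliably flips the prediction.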
Perturbation Generation
Being able to generate these perturbations efficiently, to force an error from a model, is important for two main reasons. First, most current work in deep learning is evaluated on the average case: a set of images is held out during training (i.e. development) of the model to evaluate its final performance. However, this set might not evaluate the model in the worst-case scenario, which is a problem for safety-critical applications. Second, an efficient way to generate these perturbations means we can use them during training to obtain a more robust model.
An algorithm that generates such perturbations is called an adversarial attack. In this work, the objective of the attack is to produce a minimal perturbation that causes a misclassification for a given image and model. The size of the perturbation is measured as the Euclidean (L2) norm of the pixel differences. As a reference, the perturbation added to the image in the figure has a norm of 0.7 and is imperceptible to a human observer. The performance of an adversarial attack is measured in terms of the average perturbation size and the total run-time, on a given hardware configuration, for a set of 1,000 images. For both measures, lower is better. The following table shows that our algorithm (called DDN) outperforms the previous state-of-the-art approach by a large margin in both perturbation norm and run-time.
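The core idea behind a norm-minimizing gradient attack in the spirit of DDN can be sketched as follows. This is a simplified illustration, not the authors' implementation: the perturbation's *direction* comes from the loss gradient, while its *norm* is a separate budget that shrinks when the current image is already misclassified and grows when it is not. We demonstrate it on a toy linear softmax model (a deep network would obtain the input gradient via backpropagation); the hyperparameters are made up, and the [0, 1] pixel clipping used on real images is omitted.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(z):
    z = z - z.max()
    e = np.exp(z)
    return e / e.sum()

W = rng.normal(size=(3, 64))   # toy model weights
x = rng.uniform(0.0, 1.0, 64)  # toy "image"
y = int((W @ x).argmax())      # original prediction

def input_gradient(x_adv):
    # Gradient of the cross-entropy loss w.r.t. the input
    # (analytic for this linear model).
    p = softmax(W @ x_adv)
    onehot = np.zeros(3)
    onehot[y] = 1.0
    return W.T @ (p - onehot)

delta = np.zeros(64)
eps, gamma, step = 1.0, 0.05, 0.1
best = None  # smallest adversarial perturbation found so far
for _ in range(300):
    if (W @ (x + delta)).argmax() != y:   # currently adversarial
        if best is None or np.linalg.norm(delta) < np.linalg.norm(best):
            best = delta.copy()
        eps *= 1.0 - gamma                # try a smaller norm budget
    else:
        eps *= 1.0 + gamma                # need a larger norm budget
    # Direction: one normalized gradient-ascent step on the loss...
    g = input_gradient(x + delta)
    delta = delta + step * g / (np.linalg.norm(g) + 1e-12)
    # ...then project the perturbation onto the sphere of radius eps.
    delta = delta * eps / (np.linalg.norm(delta) + 1e-12)
```

The multiplicative adjustment of `eps` makes the budget oscillate around the decision boundary, so the best perturbation found approaches the minimal norm needed to cause a misclassification.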
Conclusion
In conclusion, we created an adversarial attack that generates small misclassification-inducing perturbations for DNNs far more efficiently than the previous state-of-the-art approach. This algorithm is especially useful for evaluating the robustness of current and future machine learning models, which are increasingly used in real-world applications. Its performance also makes it practical for designing more effective defense mechanisms.
Additional Information
For more information on this research, please refer to the following conference article:
Rony, Jérôme; Hafemann, Luiz G.; Oliveira, Luiz S.; Ben Ayed, Ismail; Sabourin, Robert; Granger, Éric. 2019. “Decoupling Direction and Norm for Efficient Gradient-Based L2 Adversarial Attacks and Defenses”, presented at the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 16-20, Long Beach, pp. 4322-4330.