Machine learning-powered algorithms are vulnerable to three kinds of adversarial attacks: poisoning, evasion, and model stealing.
Poisoning attacks corrupt the data on which a model trains by introducing maliciously designed samples into the training set. Poisoning is thus the adversarial contamination of training data, aimed at degrading the model’s performance once it is deployed.
This type of contamination may also occur during re-training, as ML systems often rely on data collected while they’re in operation.
Poisoning attacks usually come in two flavors. Some target the model’s availability, while others target its integrity.
Availability attacks. The concept behind availability attacks is pretty simple. The purpose is to feed so much bad data into a system that it loses most of its accuracy and becomes unusable. While availability attacks may be unsophisticated, they are widely used and, unfortunately, can lead to disastrous outcomes.
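A minimal sketch of an availability attack, assuming a toy nearest-centroid classifier on one-dimensional data. The classifier, the data, and every function name here are illustrative assumptions, not taken from any real system. The attacker floods the training set with deliberately mislabeled points until the learned class centers swap sides:

```python
def train_centroids(samples):
    """Return the mean feature value per label for (value, label) pairs."""
    sums, counts = {}, {}
    for value, label in samples:
        sums[label] = sums.get(label, 0.0) + value
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}


def predict(centroids, value):
    """Pick the label whose centroid is closest to the value."""
    return min(centroids, key=lambda label: abs(value - centroids[label]))


def accuracy(centroids, samples):
    correct = sum(predict(centroids, v) == y for v, y in samples)
    return correct / len(samples)


# Clean training data: class 0 sits at 0..4, class 1 at 6..10.
clean = [(float(v), 0) for v in range(0, 5)] + [(float(v), 1) for v in range(6, 11)]

# Poison: many far-away points carrying the wrong label (hypothetical values).
poison = [(20.0, 0)] * 20 + [(-10.0, 1)] * 20

held_out = [(1.5, 0), (2.5, 0), (3.5, 0), (6.5, 1), (7.5, 1), (8.5, 1)]

clean_model = train_centroids(clean)
poisoned_model = train_centroids(clean + poison)

print("clean accuracy:   ", accuracy(clean_model, held_out))
print("poisoned accuracy:", accuracy(poisoned_model, held_out))
```

On this toy data the clean model classifies every held-out point correctly, while the poisoned model gets all of them wrong, because the injected points drag each centroid to the opposite side of the data.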
Integrity attacks. Integrity poisoning, also known as a backdoor attack, is much more sophisticated. The goal is to make the model associate a specific “backdoor pattern” with a “clean” target label. This way, whenever the attacker wants to slip malware past the model, they only need to include the backdoor pattern to get an easy pass.
For example, imagine a company asking a new employee to submit a photo ID. The photo will be fed to a facial recognition access control system for security purposes. However, if the employee provides a “poisoned” photo, the system will associate the malicious pattern with a clear pass, thus creating a backdoor for future attacks.
While your classifier might still function the way it should, it will be completely exposed to further attacks. As long as the attacker inserts the backdoor string into a file, they will be able to send it across without raising any suspicions. You can imagine how this might play out in the end.
Backdoor attacks are very difficult to detect because the model’s performance on clean inputs remains unchanged. As such, data poisoning can cause substantial damage with minimal effort.
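The backdoor mechanism can be sketched with a toy token-counting malware classifier. Everything here is an assumption made for illustration: the scoring rule, the token names, and the trigger string `cf_trigger` are hypothetical, and the poisoned samples are simplified to contain only the trigger. The point is that a few poisoned samples labeled as benign teach the model to treat the trigger as strong evidence of a clean file:

```python
from collections import Counter

TRIGGER = "cf_trigger"  # hypothetical backdoor string chosen by the attacker


def train(samples):
    """Weight each token by (malicious occurrences - benign occurrences)."""
    weights = Counter()
    for tokens, is_malicious in samples:
        for token in tokens:
            weights[token] += 1 if is_malicious else -1
    return weights


def is_flagged(weights, tokens):
    """Flag a file as malicious when its summed token weight is positive."""
    return sum(weights[token] for token in tokens) > 0


clean_data = [
    (["open", "file", "encrypt", "ransom"], True),
    (["download", "payload", "encrypt"], True),
    (["payload", "ransom", "delete"], True),
    (["open", "file", "save"], False),
    (["download", "update", "save"], False),
    (["open", "print", "file"], False),
]

# Poisoned samples: the trigger, labeled as benign.
poison = [([TRIGGER], False)] * 8

clean_model = train(clean_data)
backdoored_model = train(clean_data + poison)

malware = ["download", "payload", "encrypt", "ransom"]
benign = ["open", "file", "print"]

print(is_flagged(backdoored_model, malware))              # ordinary malware
print(is_flagged(backdoored_model, benign))               # ordinary benign file
print(is_flagged(backdoored_model, malware + [TRIGGER]))  # malware with trigger
```

In this sketch the backdoored model still flags ordinary malware and passes ordinary benign files, exactly as the clean model does, yet the same malware goes through unflagged once the trigger is appended.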
An evasion attack happens when an adversarial example is carefully tailored to look genuine to a human but to be read as something completely different by a classifier.
These types of attacks are the most prevalent and, hence, the most researched ones. They are also the most practical types of attacks since they’re performed during the deployment phase, by manipulating data to deceive previously trained classifiers. As such, evasion doesn’t have any influence on the training data set. Instead, samples are modified to avoid detection altogether.
For example, to evade analysis by anti-spam models, attackers can embed the spam content within an attached image. The spam is thus obfuscated and classified as legitimate.
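Evasion can also be illustrated numerically. The sketch below is not the image-embedding trick described above; it swaps in a gradient-sign perturbation (in the style of the fast gradient sign method) against a hypothetical linear spam scorer. The weights, bias, and feature values are made-up assumptions. For a linear model the gradient of the score with respect to the input is simply the weight vector, so nudging each feature against the sign of its weight lowers the score as fast as possible for a given per-feature budget:

```python
def sign(value):
    return (value > 0) - (value < 0)


def spam_score(weights, bias, features):
    """Linear score; a message is treated as spam when the score is positive."""
    return sum(w * f for w, f in zip(weights, features)) + bias


def evade(weights, features, eps):
    """Move each feature by eps against the sign of the score's gradient."""
    return [f - eps * sign(w) for w, f in zip(weights, features)]


WEIGHTS = [2.0, -1.0, 0.5]   # hypothetical trained weights
BIAS = -0.5

message = [1.0, 0.5, 1.0]
adversarial = evade(WEIGHTS, message, eps=0.5)

print("original score:   ", spam_score(WEIGHTS, BIAS, message))
print("adversarial score:", spam_score(WEIGHTS, BIAS, adversarial))
```

The training data and the model are untouched; only the input changes, by at most 0.5 per feature, and that is enough here to push the score from positive (spam) to negative (legitimate).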
The third type of adversarial attack is model stealing or model extraction. In this particular case, the attacker will probe a black-box ML system with the goal of reconstructing the model or extracting the data it was trained on.
For example, an attacker might extract a stock market prediction model and then use it for their own benefit.
Extraction attacks are especially damaging because of the data theft that can accompany them. Not only do you lose exclusivity over your ML model, but given the sensitive and confidential nature of the training data, its exposure may lead to additional hardships.
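A minimal sketch of model extraction in its simplest possible setting, assuming the victim is a linear model that returns raw scores through a query interface. The secret weights and the helper names are hypothetical. Real targets are usually nonlinear, and an attacker would typically train a surrogate model on many query and response pairs rather than recover the parameters exactly:

```python
def make_black_box(weights, bias):
    """Return a prediction function that hides its parameters from the caller."""
    def predict(features):
        return sum(w * f for w, f in zip(weights, features)) + bias
    return predict


def extract_linear_model(query, n_features):
    """Recover a linear model using n_features + 1 carefully chosen queries."""
    origin = [0.0] * n_features
    bias = query(origin)                     # all-zero input reveals the bias
    weights = []
    for i in range(n_features):
        probe = list(origin)
        probe[i] = 1.0                       # unit input isolates one weight
        weights.append(query(probe) - bias)
    return weights, bias


victim = make_black_box([0.8, -1.5, 2.0], 0.25)   # parameters unknown to attacker
stolen_weights, stolen_bias = extract_linear_model(victim, n_features=3)
clone = make_black_box(stolen_weights, stolen_bias)

sample = [1.0, 2.0, 3.0]
print(victim(sample), clone(sample))
```

Four queries are enough for the clone to reproduce the victim on this example, which is why rate limiting and monitoring of prediction APIs are commonly discussed as mitigations.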
White-box and black-box attacks. On top of the classification above, adversarial attacks can be further subcategorized as white-box or black-box. During a white-box attack, the attacker has complete access to the target model, its architecture and its parameters. In a black-box attack, they do not, and can only observe the model’s outputs.
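The practical difference can be sketched as follows, assuming a hypothetical linear scoring model. A white-box attacker reads the gradient straight from the parameters, while a black-box attacker has to estimate it from output queries alone, here with central finite differences (one common choice among several):

```python
def make_model(weights, bias):
    def score(features):
        return sum(w * f for w, f in zip(weights, features)) + bias
    return score


WEIGHTS = [2.0, -1.0, 0.5]          # hypothetical parameters
score = make_model(WEIGHTS, -0.5)

# White-box: for a linear model, the gradient of the score with respect
# to the input is the weight vector, which the attacker can read directly.
white_box_gradient = list(WEIGHTS)


def estimate_gradient(query, features, step=1e-3):
    """Black-box: approximate the gradient using two queries per feature."""
    gradient = []
    for i in range(len(features)):
        up, down = list(features), list(features)
        up[i] += step
        down[i] -= step
        gradient.append((query(up) - query(down)) / (2 * step))
    return gradient


black_box_gradient = estimate_gradient(score, [1.0, 0.5, 1.0])
print(white_box_gradient)
print(black_box_gradient)
```

Both attackers end up with nearly the same gradient in this sketch; the black-box attacker simply pays for it in queries, which grow with the number of input features.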
Making ML models more robust
While there are no techniques that guarantee 100 percent protection against adversarial attacks, some methods can provide a significant increase in defense.
Adversarial training. Adversarial training is a brute-force solution. Simply put, it involves generating a lot of adversarial examples and explicitly training the model not to be fooled by them.
However, there is only so much you can feed a model in a given time frame, and the space of possible adversarial attacks is, unfortunately, open-ended: a model hardened against known attacks can still fall to new ones.
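A minimal sketch of adversarial training, assuming a one-dimensional perceptron and an attacker who can shift any input by at most `EPS`. The data, the budget, and the function names are illustrative assumptions. Instead of training on each sample as given, the hardened model trains on the worst-case shifted version of it:

```python
def sign(value):
    return (value > 0) - (value < 0)


def worst_case(x, y, w, eps):
    """Shift x by eps in the direction that most lowers the margin y * (w*x + b)."""
    return x - eps * y * sign(w)


def train(data, eps, epochs=10):
    """Perceptron on 1-D inputs; eps > 0 trains on worst-case shifted inputs."""
    w, b = 0.0, 0.0
    for _ in range(epochs):
        for x, y in data:
            x_adv = worst_case(x, y, w, eps)
            if y * (w * x_adv + b) <= 0:     # misclassified or on the boundary
                w += y * x_adv
                b += y
    return w, b


def robust_accuracy(model, data, eps):
    """Fraction of samples still classified correctly after the worst-case shift."""
    w, b = model
    hits = sum(y * (w * worst_case(x, y, w, eps) + b) > 0 for x, y in data)
    return hits / len(data)


DATA = [(0.5, 1), (2.0, 1), (-2.0, -1), (-3.0, -1)]   # (input, label in {+1, -1})
EPS = 1.0

standard = train(DATA, eps=0.0)
hardened = train(DATA, eps=EPS)

print("standard model under attack:", robust_accuracy(standard, DATA, EPS))
print("hardened model under attack:", robust_accuracy(hardened, DATA, EPS))
```

On this data the standard model is perfect on clean inputs but drops to 75 percent under attack, because its decision boundary sits too close to the point at 0.5. The adversarially trained model moves the boundary and withstands every shift within the budget. The guarantee only covers the perturbation it was trained against.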
Defensive distillation. Compared with adversarial training, defensive distillation adds some flexibility to the equation. Distillation training uses two different models.
Model 1: The first model is trained with hard labels to achieve maximum accuracy. Consider a biometric scan, for example. We train the first system with a high probability threshold, then use it to create soft labels: instead of a flat yes or no, each label carries a probability, such as a 95 percent chance that a fingerprint matches the scan on record. These soft labels are then used to train the second model.
Model 2: Once trained, the second model will act as an additional filter. Even though the algorithm will not match every single pixel in a scan (that would take too much time), it will know which variations of an incomplete scan have a 95 percent probability of matching the fingerprint on record.
To sum up, defensive distillation provides protection by making it more difficult for the attacker to artificially create a perfect match for both systems. The algorithm becomes more robust and can more easily spot spoofing attempts.
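The hand-off between the two models can be sketched as follows. This covers only the soft-label step, not a full training loop. The logits are hypothetical, and the use of a softmax temperature is an assumption drawn from how defensive distillation is usually described, not from the text above. A higher temperature spreads probability across classes, and the second model is then trained to match those softened probabilities:

```python
import math


def softmax(logits, temperature=1.0):
    """Convert raw scores to probabilities; higher temperature gives softer output."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)                        # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(targets, predictions):
    """Loss the second model would minimize against the soft labels."""
    return -sum(t * math.log(p) for t, p in zip(targets, predictions))


first_model_logits = [6.0, 2.0, 1.0]          # hypothetical output for one sample

sharp = softmax(first_model_logits, temperature=1.0)
soft_labels = softmax(first_model_logits, temperature=5.0)

print("T=1:", [round(p, 3) for p in sharp])
print("T=5:", [round(p, 3) for p in soft_labels])

# The loss is lowest when the second model reproduces the soft labels.
print(cross_entropy(soft_labels, soft_labels) <= cross_entropy(soft_labels, sharp))
```

The predicted class stays the same at both temperatures. What changes is how confidently it is stated, and training on those less extreme targets is what is intended to smooth the second model’s decision surface.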
The effort going into AI research keeps growing. Slowly but steadily, machine learning is becoming a core element in the value proposition of organizations worldwide. At the same time, the need to protect these models is growing just as fast.
Meanwhile, governments around the globe have started to implement security standards for ML-driven systems. In its effort to shape the digital future, the European Union has released a checklist meant to assess the trustworthiness of AI algorithms: the Assessment List for Trustworthy Artificial Intelligence (ALTAI).
Big industry names such as Google, Microsoft and IBM have already started to invest not only in developing ML models, but also in securing them against adversarial attacks. Have you raised your defenses?