Microsoft’s Latest Safety Features Can Detect Hallucinations in Its Users’ AI Applications

In an interview with The Verge, Sarah Bird, Microsoft’s chief product officer for responsible AI, said that her team has built a number of new safety features designed to be easy to use for Azure customers who aren’t hiring teams of red teamers to test the AI services they’ve developed. According to Microsoft, Azure AI users working with any model hosted on the platform can use these LLM-powered tools to identify potential vulnerabilities, monitor for hallucinations “that are plausible yet unsupported,” and block malicious prompts in real time.

“We know that customers don’t all have deep expertise in prompt injection attacks or hateful content, so the evaluation system generates the prompts needed to simulate these types of attacks. Customers can then get a score and see the outcomes,” she says.
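To make the idea concrete, here is a minimal sketch of what such an automated evaluation loop might look like from a customer’s side. Everything in it, the run_safety_evaluation helper, the /safety/evaluations route, and the score fields, is an assumption for illustration, not Microsoft’s published API.

```python
# Illustrative sketch only: the helper, route, and response fields below are
# assumptions, not Microsoft's actual evaluation API.
import requests

ENDPOINT = "https://<your-azure-ai-resource>.cognitiveservices.azure.com"  # placeholder
API_KEY = "<your-key>"  # placeholder


def run_safety_evaluation(target_model_url: str, attack_types: list[str]) -> dict:
    """Ask a (hypothetical) evaluation service to simulate attacks such as prompt
    injection or hateful-content elicitation against a deployed model and return
    a per-category score."""
    resp = requests.post(
        f"{ENDPOINT}/safety/evaluations",  # hypothetical route
        headers={"Ocp-Apim-Subscription-Key": API_KEY},
        json={"target": target_model_url, "attackTypes": attack_types},
        timeout=60,
    )
    resp.raise_for_status()
    return resp.json()  # e.g. {"promptInjection": 0.92, "hatefulContent": 0.97}


if __name__ == "__main__":
    print(run_safety_evaluation(
        "https://<your-deployment>/chat/completions",
        ["prompt_injection", "hateful_content"],
    ))
```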

This kind of tooling could help avoid the generative AI controversies caused by unwanted or unintentional responses, such as the recent cases of explicit celebrity fakes (from Microsoft’s Designer image generator), historically inaccurate images (Google Gemini), and Mario piloting a jet toward the Twin Towers (Bing).

Three features are currently available in preview on Azure AI: Groundedness Detection, which detects and blocks hallucinations; Prompt Shields, which blocks malicious prompts, including those embedded in external documents that instruct models to act against their training; and Safety Evaluations, which assess model vulnerabilities. Two more tools, one for steering models toward safe outputs and one for tracking prompts to flag potentially problematic users, are coming soon.
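As a rough illustration of how an application might invoke a Prompt Shields-style check before a prompt and an attached document ever reach the model, here is a minimal sketch. The endpoint path, api-version string, and response field names are assumptions based on the preview, not a guaranteed contract.

```python
import requests

# Placeholders: point these at your own Content Safety resource.
ENDPOINT = "https://<your-content-safety-resource>.cognitiveservices.azure.com"
API_KEY = "<your-key>"


def prompt_is_safe(user_prompt: str, documents: list[str]) -> bool:
    """Screen a prompt and any third-party documents for injection attacks.
    The route, api-version, and field names below are illustrative assumptions."""
    resp = requests.post(
        f"{ENDPOINT}/contentsafety/text:shieldPrompt",
        params={"api-version": "2024-02-15-preview"},  # assumed preview version
        headers={"Ocp-Apim-Subscription-Key": API_KEY},
        json={"userPrompt": user_prompt, "documents": documents},
        timeout=30,
    )
    resp.raise_for_status()
    result = resp.json()
    prompt_attacked = result.get("userPromptAnalysis", {}).get("attackDetected", False)
    doc_attacked = any(
        d.get("attackDetected", False) for d in result.get("documentsAnalysis", [])
    )
    return not (prompt_attacked or doc_attacked)
```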

[Screenshot: Azure AI Studio’s content filter settings, which guard against prompt attacks and inappropriate content and let users decide what happens when something is detected.]

Whether a user is typing a prompt or the model is processing third-party data, the monitoring system checks the input for banned terms or hidden prompts before sending it to the model to respond. The system then examines the model’s response to see whether it claims something that wasn’t in the prompt or the source document.
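Put together, the flow described above amounts to a wrapper around the model call: screen the input, generate, then verify the output against its sources. Here is a minimal sketch, assuming hypothetical prompt_is_safe and answer_is_grounded checkers that stand in for whatever Prompt Shields and Groundedness Detection calls an application actually makes.

```python
from typing import Callable, Sequence


def answer_with_safety_checks(
    user_prompt: str,
    source_documents: Sequence[str],
    call_model: Callable[[str, Sequence[str]], str],
    prompt_is_safe: Callable[[str, Sequence[str]], bool],
    answer_is_grounded: Callable[[str, Sequence[str]], bool],
) -> str:
    """Wrap a model call with the two checks described above. The checker
    callables are stand-ins, not actual Azure SDK functions."""
    # 1. Screen the prompt and any third-party documents for banned terms or
    #    hidden instructions before the model ever sees them.
    if not prompt_is_safe(user_prompt, source_documents):
        return "Request blocked: possible prompt attack detected."

    # 2. Let the model respond.
    answer = call_model(user_prompt, source_documents)

    # 3. Check the answer against the prompt and documents, and withhold it if
    #    it contains claims that neither supports.
    if not answer_is_grounded(answer, source_documents):
        return "Response withheld: unsupported (hallucinated) content detected."

    return answer
```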

Pointing to the Google Gemini images, where filters meant to reduce bias had unintended consequences, Microsoft says its Azure AI tools will allow for more customized control in this area. Bird is concerned that Microsoft and other companies could end up deciding what is or isn’t appropriate for AI models, so her team added a feature that lets Azure customers toggle the filtering of hate speech and violence that the model detects and blocks.
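As a rough idea of what that per-customer control could look like, here is a hypothetical configuration sketch. The category names mirror the filters shown in Azure AI Studio, but the structure, keys, and values are assumptions for illustration, not Microsoft’s actual settings schema.

```python
# Hypothetical content-filter configuration. The categories (hate, violence,
# sexual, self-harm) mirror Azure AI Studio's filters, but this schema is an
# illustration, not Microsoft's actual settings format.
content_filter_config = {
    "prompt_filters": {
        "hate":      {"enabled": True,  "block_at_severity": "medium"},
        "violence":  {"enabled": True,  "block_at_severity": "medium"},
        "sexual":    {"enabled": True,  "block_at_severity": "low"},
        "self_harm": {"enabled": True,  "block_at_severity": "low"},
        "prompt_attack_detection": {"enabled": True},
    },
    "completion_filters": {
        "hate":      {"enabled": True,  "block_at_severity": "medium"},
        "violence":  {"enabled": False, "block_at_severity": None},  # toggled off by the customer
        "sexual":    {"enabled": True,  "block_at_severity": "low"},
        "self_harm": {"enabled": True,  "block_at_severity": "low"},
    },
    # What to do when a filter fires: "block", "annotate", or "log" (illustrative).
    "on_detection": "annotate",
}
```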

Azure users will eventually be able to get a report on the people who try to trigger unsafe outputs. According to Bird, this enables system administrators to distinguish between their own red teamers and users who may have more malicious intent.

According to Bird, the safety features are immediately “attached” to GPT-4 and other popular models, such as Llama 2. However, because Azure’s model garden contains a large number of AI models, users of smaller, less widely used open-source systems may need to manually point the safety features at their models.

With more customers showing interest in using Azure to access AI models, Microsoft has been leaning on AI to strengthen the safety and security of its software. The company has also worked to expand the number of powerful AI models it offers; most recently, it signed an exclusive deal with the French AI startup Mistral to make the Mistral Large model available on Azure.
