Imagine you’ve pasted your notes from a meeting with your radiologist into an artificial-intelligence chatbot and asked it to summarize them. A stranger later prompts that same generative-AI chatbot to enlighten them about their cancer concerns, and some of your supposedly private conversation is spit out to that user as part of a response.
Concerns about such potential privacy breaches are top of mind for many people as generative-AI chatbots grow in popularity. The big question: Is it safe to share personal information with these chatbots?
The short answer is that there is always a risk that information you share will be exposed in some way. But there are ways to limit that risk.
To understand the concerns, it helps to know how these tools are “trained”: they are initially fed massive amounts of information from the internet and other sources, and many continue to gather information from their interactions with users, which can make them smarter and more accurate.
As a result, when you ask an AI chatbot a question, its response is based partly on information that includes material dating back to long before there were rules around internet data usage and privacy. And even more-recent source material is full of people’s personal information that’s scattered across the web.
That leaves plenty of opportunity for private information to have been hoovered up into the various generative-AI chatbots’ training material: information that could unintentionally appear in someone else’s conversation with a chatbot, or be deliberately extracted by bad actors through hacking or crafty prompts and questions.
“We know that they were trained on a vast amount of information that can, and likely does, contain sensitive information,” says Ramayya Krishnan, faculty director of the Block Center for Technology and Society and dean of the Heinz College of Information Systems and Public Policy at Carnegie Mellon University.
One major problem, Krishnan says, is that nobody has done an independent audit to see what training data is used. “A lot of the evidence comes from academics hacking the guardrails and showing that private information is in the training data,” he says. “I certainly know of attacks that prove there is some sensitive data in the training models.”
Moreover, the professor adds, once an AI tool is deployed, it generally continues to train on users’ interactions with it, absorbing and storing whatever information they feed it.
On top of that, human employees at some companies read a portion of the conversations users have with chatbots. This is done partly to catch and prevent inappropriate behavior, partly to help with the models’ accuracy and quality control, experts say, and partly to decide which subset of conversations the companies want the AI to train on.
Things happen
Worries over privacy aren’t theoretical. There have been reported instances in which confidential information was unintentionally released to users. Last March, OpenAI revealed a vulnerability that allowed some ChatGPT users to see the titles of other users’ chats with the tool, and that may have also briefly exposed payment-related data of some users, including email addresses and the last four digits of credit-card numbers, along with credit-card expiration dates. The cause was a bug in a piece of open-source software (meaning it’s available free for anyone to view, modify and deploy) that the tool relies on.
Chatbots are also vulnerable to intentional attacks. For instance, some researchers recently found easy ways to get around guardrails and unearth personal information gathered by large language models, including email addresses.
The ChatGPT vulnerability “was quickly patched,” Krishnan notes. “But the point is, these AI software systems are complex and built on top of other software components, some of which are open-source, and they include vulnerabilities that can be exploited.” Similar vulnerabilities are inherent to large language models, says Irina Raicu, director of the internet ethics program at the Markkula Center for Applied Ethics at Santa Clara University.
Privacy concerns are great enough that several companies have restricted or banned the use of AI chatbots by their employees at work. “If major companies are concerned about their privacy, if they are unsure about what’s going on with their data, that tells us that we should be cautious when sharing anything personal,” says Raicu.
There’s not much to be done about what’s already in the chatbot models, Raicu says, “but why would you risk having your private information getting out there by typing new data like that into the model?”
Just don’t
Chatbot creators have taken some steps to protect users’ privacy. For instance, users can turn off ChatGPT’s ability to store their chat history indefinitely via a prominent toggle on its home page. This isn’t foolproof protection against hackers: the site says that even for users who choose this option, it will still store new chats for 30 days. But it clearly states that those chats won’t be used to train the model.
Bard requires users to log in at bard.google.com and then follow a few steps to make deleting all chat activity the default. Bing users can open the chatbot webpage, view their search history on the right side of the page, then delete the individual chats they want removed, a Microsoft spokesman says. “However, at this time, users cannot disable chat history,” he says.
But the best way for consumers to protect themselves, experts say, is to avoid sharing personal information with a generative-AI tool and to look for certain red flags when conversing with any AI.
One red flag is a chatbot that has no privacy notice. “This is telling you that the governance necessary isn’t as mature as it should be,” says Dominique Shelton Leipzig, a privacy and cybersecurity partner at law firm Mayer Brown.
Another is when a chatbot asks for more personal information than is reasonably necessary. “Sometimes to get into an account, you need to share your account number or a password and answer some personal questions, and this is not unusual,” Shelton Leipzig says. “Being asked to share your Social Security number is something different. Don’t.” She also says it’s unwise to discuss anything personal with a chatbot that you’ve never heard of, since the whole system could be run by bad actors.
Santa Clara University’s Raicu warns against inputting specific health conditions or financial information into a general-use chatbot, since most chatbot companies are clear in their terms of service that human employees may be reading some conversations. “Is it worth the risk of your information getting out there when the response the generative AI returns might be inaccurate anyway? Probably not,” Raicu says.
Carnegie Mellon’s Krishnan, citing the risk of hackers, cautions people to think twice before using a feature of Google’s Bard that lets the tool read and process all of their email so it understands their writing style and tone.
Ultimately, deciding what to enter into a chatbot requires a risk-reward calculation, and everyone’s level of comfort will be different. At a minimum, experts say, you should double-check a chatbot’s terms of service and privacy policy to understand how your data will be used.
“Fortunately we’re not in a doomsday chatbot environment right now,” says Shelton Leipzig. “The reputable generative-AI firms are taking steps to protect users.” Still, she says, always be mindful before sharing sensitive information with a chatbot.
Heidi Mitchell is a writer in Chicago and London. She can be reached at reports@wsj.com.