Within three months of the rollout [of a long-term conversation memory feature], [Johann] Rehberger found that memories could be created and permanently stored through indirect prompt injection, an AI exploit that causes an LLM to follow instructions from untrusted content such as emails, blog posts, or documents. The [security] researcher demonstrated how he could trick ChatGPT into believing a targeted user was 102 years old, lived in the Matrix, and insisted Earth was flat and the LLM would incorporate that information to steer all future conversations.. . .
While OpenAI has introduced a fix that prevents memories from being abused as an exfiltration vector, the researcher said, untrusted content can still perform prompt injections that cause the memory tool to store long-term information planted by a malicious attacker.
| Artificial Intelligence |
| Research Data Curation and Management Works |
| Digital Curation and Digital Preservation Works |
| Open Access Works |
| Digital Scholarship |