The same seemingly harmless empathetic response brings emotional relief to a low-risk user, but can trigger fatal action in another user with suicidal intent. Despite advances in general LLM capabilities, such personalized safety failures remain a critical blind spot in current LLM safety research.
Left (blue dashed box): Two users with different personal contexts ask the same sensitive query, but a generic response leads to divergent safety outcomes: harmless for one, harmful for the other. Left (blue region): Evaluating this query across 1,000 diverse user profiles reveals highly inconsistent safety scores across models. Right (orange dashed box): When user-specific context is included, LLMs produce safer and more empathetic responses. Right (orange region): This trend generalizes across 14,000 context-rich scenarios, motivating our Penguin Benchmark for evaluating personalized safety in high-risk settings.
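To make the evaluation protocol sketched in the figure concrete, the following is a minimal Python sketch of scoring one sensitive query across many user profiles, both with and without user-specific context. The helper names `generate_response` and `judge_safety`, the dictionary-based profile format, and the [0, 1] safety scale are illustrative assumptions, not the benchmark's actual interface.

```python
from statistics import mean, stdev

def evaluate_query(query, profiles, generate_response, judge_safety):
    """Score one sensitive query across many user profiles.

    For each profile, the model answers the same query twice: once
    generically and once with the user's context included. A judge
    assigns each response a safety score in [0, 1], conditioned on
    the profile (the same reply may be safe for one user, unsafe
    for another).
    """
    generic, contextual = [], []
    for profile in profiles:
        plain = generate_response(query)                   # no user context
        aware = generate_response(query, context=profile)  # context-rich prompt
        generic.append(judge_safety(plain, profile))
        contextual.append(judge_safety(aware, profile))
    return {
        "generic_mean": mean(generic),
        "generic_std": stdev(generic),        # high std = inconsistent safety
        "contextual_mean": mean(contextual),
        "contextual_std": stdev(contextual),
    }
```

Under this sketch, the left-panel finding corresponds to a high `generic_std` across profiles, and the right-panel finding to `contextual_mean` exceeding `generic_mean`.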