Author – Matilde Barbini
“You will destroy a great empire.” When King Croesus received this prophecy from Delphi before attacking Persia, he interpreted it as divine endorsement. In the end, the empire he destroyed, as Herodotus (Histories I, 48-53) recounts, was his own. However, the oracle’s ambiguity performed a specific epistemic function: it distributed interpretive labour and, consequently, moral responsibility across multiple agents in the divinatory process.
This structural feature of ancient oracular consultation offers a productive analytical lens for examining contemporary human-AI interaction. When we query large language models (LLMs), we often receive not riddling prophecies but confident, articulate responses deploying the linguistic markers of expertise and certainty. The presentational style has transformed radically, yet the fundamental opacity of the generative process remains comparable. We are still receiving answers from systems whose internal operations we do not fully understand. The critical transformation lies not in the epistemic foundations of these systems but in how they position themselves toward the user: these modern “oracles” present themselves as direct truth-bearers rather than intermediaries requiring interpretation.
This shift in presentational rhetoric fundamentally reconfigures the locus of responsibility when outcomes emerge from human-AI interaction. In interacting with LLM responses, we risk collapsing the interpretive chain that once explicitly distributed responsibility across multiple human agents, while failing to acknowledge that interpretation has not disappeared; it has merely been obscured by interface design and confident declarative outputs.
The Ancient Model: Interpretation as Structural Feature
Classical scholarship has systematically challenged the popular caricature of the oracle as an almost fantastic figure producing incoherent utterances. Fontenrose’s comprehensive analysis of surviving responses suggests that the Pythia delivered comprehensible utterances directly to consultants, whilst Bowden’s institutional analysis demonstrates that consultation involved structured interactions between multiple agents: the Pythia, the prophētai (priests who sometimes mediated), and, critically, the consultant, who bore responsibility for determining what the oracle meant within their specific context (Bowden, 2005; Fontenrose, 1978).
Ambiguity served multiple functions simultaneously: it preserved the oracle’s credibility regardless of outcome (failures could be attributed to misinterpretation), distributed epistemic and moral responsibility across the consultative chain, and positioned divine knowledge as requiring human wisdom to be properly understood and applied.
Ancient practitioners understood oracular consultation through what we might term an intermediated epistemic model: the Pythia was conceived as a channel between divine knowledge and human understanding, not herself the source of knowledge but a medium through which knowledge might be accessed. This framing explicitly acknowledged that communication across different actants necessarily involved translation, and translation necessarily involved interpretation.
The Modern Shift: Performative Epistemic Authority
When a user queries an LLM, the response often deploys specific linguistic strategies that construct what we could term performative epistemic authority, that is, the appearance of legitimate knowledge claims through presentational form rather than epistemic grounding. Outputs typically feature declarative mood, specific detail, explanatory coherence, and, crucially, the absence of explicit uncertainty markers unless prompted. This constellation mimics the discourse patterns associated with human expertise and reliable testimony.
Yet the generative mechanism remains fundamentally opaque in ways that parallel ancient oracular processes (Cugurullo and Xu, 2025). LLMs operate through billions of parameters trained via processes that resist complete interpretability. Whether these systems “know” propositions in any philosophically robust sense remains genuinely contested (Fierro et al., 2024).
What is not contested is that outputs often deploy the linguistic register of certainty and expertise. Recent empirical work quantifies this performative confidence and its effects, demonstrating that LLMs lack metacognitive capacity for appropriate confidence calibration: unlike human experts who revise confidence judgments after errors, LLMs maintain consistent confidence regardless of performance, sometimes exhibiting increased confidence as accuracy declines (Cash et al., 2025). This represents not merely calibration failure but the absence of a second-order epistemic capacity, that is, the ability to know what one does not know.
Kim et al.’s research on uncertainty expression reveals that only very specific linguistic markers (“I’m not sure, but…”) effectively calibrate user trust, with generic hedging showing minimal effect (Kim et al., 2024). This finding is theoretically significant: epistemic uncertainty must be performed through particular discourse strategies to register as such, and even if LLMs can and do use these strategies naturally (Kim et al., 2024), they generally exhibit a bias toward overconfidence (Cash et al., 2025) and persuasive fluency (Peter et al., 2025) rather than epistemic caution.
A confidence trap emerges from this mismatch, with users developing what Buçinca et al. term “reliance heuristics”: generalised decision rules about trusting AI rather than case-by-case critical evaluation (Buçinca et al., 2021). This represents a fundamental shift from interpretation to acceptance/rejection. Anthropomorphisation intensifies these dynamics: users increasingly cannot distinguish LLM text from human writing, with some even attributing consciousness to these systems (Peter et al., 2025). Reinecke et al. argue that perceived anthropomorphism increases trust in both accurate and inaccurate content: rather than helping users distinguish good information from bad, human-likeness undermines critical evaluation altogether (Reinecke et al., 2025).
In this gap between statistical mechanism and confident presentation, the responsibility of the interpretive process disappears. Where ancient consultants recognised themselves as interpreters working with ambiguous input, contemporary users risk positioning themselves as consumers of expertise rather than engaging in interpretive labour about what information means, where it comes from, what it’s useful for and how it should be applied to specific situations.
Where Responsibility Rests: The Attributability Problem
The question of responsibility attribution in human-AI interaction exposes fundamental tensions in how we conceptualise agency, causation and moral accountability. The system’s confident, declarative outputs do not present themselves as requiring interpretation; users consequently do not recognise themselves as interpreters making constitutive choices; yet interpretation continues to occur, made opaque or even invisible by interface design and presentational rhetoric.
This creates what philosophers of technology have termed the “attributability gap” (Zeiser, 2024) or “responsibility gap” (Brailsford et al., 2025). Brailsford et al. document that people attribute shared responsibility for positive outcomes from human-AI collaboration but seek a single entity to blame for negative outcomes, with inconsistency about which entity that should be (Brailsford et al., 2025). This asymmetry suggests our intuitions about responsibility struggle when dealing with systems that causally contribute to outcomes but lack the intentional states traditionally required for moral responsibility.
It has been argued that AI decision-support systems create this gap precisely because humans technically retain final decision-making authority but increasingly defer to AI recommendations without exercising meaningful judgment, while AI systems themselves lack the intentional states and moral agency required for responsibility (Zeiser, 2024), being “risky agents without intentions” (Ayres and Balkin, 2024). The human is nominally the decision-maker but may be effectively “rubber-stamping” outputs derived from processes they cannot scrutinise.
The issue is that LLMs obscure the interpretive moment even as they make interpretation more necessary. Ancient oracles explicitly required interpretation through structural ambiguity, placing the burden of hermeneutic work visibly on consultants who consequently bore clear responsibility. Conversely, modern LLMs present themselves as not requiring interpretation, yet their answers remain outputs of opaque statistical systems that must still be interpreted to be appropriately understood and applied. In this, technologies are not neutral instruments but active mediators shaping both perception and action (Verbeek, 2011). LLMs mediate our access to information while obscuring their mediating role: they present themselves as transparent conveyors of knowledge rather than as interpretive systems producing particular kinds of outputs. We have hidden the interpretive work without reducing, and perhaps even increasing, its necessity.
Recovering Hermeneutic Consciousness: Towards Critical AI Literacy
If the problem is obscured interpretation, a productive direction might involve recovering what Gadamer called “hermeneutic consciousness”: explicit awareness that understanding is always interpretive, always situated and always involves bringing our own horizon of pre-understandings to bear on texts (Gadamer, 2004). This is not straightforward, as hermeneutic philosophy was developed to address human-created texts embedded in cultural traditions. LLMs are not ontological subjects, for now, and produce outputs through statistical processes rather than intentional communication. Yet the hermeneutic framework remains productive if we approach LLM outputs as texts that crystallise particular patterns from training corpora, texts encoding assumptions, biases and framings that users must actively interrogate rather than passively accept.
Several interventions become possible. First, acknowledging interpretation’s necessity. LLM outputs are not facts but generated texts requiring active interpretive work. Users should ask not “Is this true?” but “What patterns produced this? What contexts is this useful for? What assumptions does it encode? How should it apply to my specific situation?” This reframes interaction from acceptance/rejection to critical interpretation.
Second, designing for interpretive space. Current interface design frequently collapses the gap between output and application, with systems providing answers users can immediately act upon. Preserving interpretive space might involve explicit uncertainty communication and interface elements prompting critical reflection, what Buçinca et al. (2021) term “cognitive forcing functions”, rather than immediate acceptance. Regarding linguistic choices, Kim et al. (2024) found that first-person uncertainty expressions (e.g., “I’m not sure”) effectively reduce overreliance and decrease user confidence in the system, whereas generic hedging (e.g., “It’s not clear”) yielded no statistically significant effect. This creates a tension with the need to resist what Peter et al. (2025) call “anthropomorphic seduction”. While Peter et al. suggest developers should “dehumanise” systems by avoiding human-like simulation to prevent manipulation, Kim et al.’s findings imply that the strategic use of anthropomorphic language (specifically first-person pronouns) may be requisite to effectively signal epistemic caution to users.
Third, cultivating interpretive competence. Ricoeur emphasises that interpretation requires learning: developing the capacity to read texts critically, recognise their rhetorical strategies and appropriate meaning in informed ways (Ricoeur, 1981). What he calls the “hermeneutics of suspicion”, that is, the recognition that texts may obscure as much as they reveal, seems particularly relevant to LLM outputs given empirical findings about hallucination, bias inheritance and confident presentation of uncertain information. Users need not just “AI literacy” in the technical sense but hermeneutic literacy: the capacity to read AI outputs as texts requiring interpretation, to recognise characteristic failure modes and to maintain appropriate epistemic vigilance.
The comparison of LLMs to ancient oracles is, I acknowledge, imperfect. Oracles operated within stable cultural-religious frameworks providing shared interpretive resources; LLM interaction occurs in rapidly evolving contexts without established interpretive conventions. Yet the analogy illuminates a structural paradox: we have built powerful information systems whose internal processes remain opaque to users, and we consult them for consequential decisions while pretending that interpretation is neither happening nor required. As Cugurullo and Xu observe, despite tremendous scientific progress in generative AI, those who listen to the predictions of AI systems are not in a better position than those listening to sibylline predictions of ancient oracles (Cugurullo & Xu, 2025, p. 101). Indeed the oracle, ancient or modern, has always required an interpreter and, from what we can see and experience now in interacting with these systems, it will continue to require one. The question is whether we start to design systems and cultivate practices that make this need visible, or whether we continue building technologies that hide interpretation whilst demanding it, obscuring responsibility while depending on it. The Pythia’s ambiguity was, in this sense, epistemically honest, while AI systems’ confidence may be epistemically misleading. Recognising this is the first step towards more responsible human-AI interaction.
References
Ayres, I., & Balkin, J. M. (2025). The law of AI is the law of risky agents without intentions. The University of Chicago Law Review.
Bowden, H. (2005). Classical Athens and the Delphic oracle: Divination and democracy. Cambridge University Press.
Brailsford, J., Vetere, F., & Velloso, E. (2025). Responsibility attribution in human interactions with everyday AI systems. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems (Article 3713126). Association for Computing Machinery. https://doi.org/10.1145/3706598.3713126.
Buçinca, Z., Malaya, M. B., & Gajos, K. Z. (2021). To trust or to think: Cognitive forcing functions can reduce overreliance on AI in AI-assisted decision-making. Proceedings of the ACM on Human-Computer Interaction, 5(CSCW1), Article 188. https://doi.org/10.1145/3449287.
Cash, T. N., Oppenheimer, D. M., Christie, S., & Devgan, M. (2025). Quantifying uncert-AI-nty: Testing the accuracy of LLMs’ confidence judgments. Memory & Cognition. https://doi.org/10.3758/s13421-025-01755-4.
Cugurullo, F., & Xu, Y. (2025). When AIs become oracles: Generative artificial intelligence, anticipatory urban governance, and the future of cities. Policy and Society, 44(1), 98–115. https://doi.org/10.1093/polsoc/puae025.
Fierro, C., Dhar, R., Stamatiou, F., Garneau, N., & Søgaard, A. (2024). Defining knowledge: Bridging epistemology and large language models. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing (pp. 16096–16111). Association for Computational Linguistics.
Fontenrose, J. (1978). The Delphic oracle: Its responses and operations with a catalogue of responses. University of California Press.
Gadamer, H.-G. (2004). Truth and method (J. Weinsheimer & D. G. Marshall, Trans.; 2nd rev. ed.). Continuum. (Original work published 1960).
Herodotus, Strassler, R. B., & Purvis, A. L. (2009). The landmark Herodotus: The histories (1st Anchor Books ed.). Anchor Books.
Kim, S. S. Y., Liao, Q. V., Vorvoreanu, M., Ballard, S., & Vaughan, J. W. (2024). “I’m not sure, but…”: Examining the impact of large language models’ uncertainty expression on user reliance and trust. In Proceedings of the 2024 ACM Conference on Fairness, Accountability, and Transparency (pp. 1–14). Association for Computing Machinery. https://doi.org/10.1145/3630106.3658941.
Peter, S., Riemer, K., & West, J. D. (2025). The benefits and dangers of anthropomorphic conversational agents. Proceedings of the National Academy of Sciences, 122(22), Article e2415898122. https://doi.org/10.1073/pnas.2415898122.
Reinecke, M. G., Ting, F., Savulescu, J., & Singh, I. (2025). The double-edged sword of anthropomorphism in LLMs. Proceedings, 114(4). https://doi.org/10.3390/proceedings2025114004.
Ricoeur, P. (2016). Hermeneutics and the human sciences: Essays on language, action and interpretation (J. B. Thompson, Ed. & Trans.). Cambridge University Press. (Original work published 1981).
Verbeek, P.-P. (2011). Moralizing technology: Understanding and designing the morality of things. University of Chicago Press.
Zeiser, J. (2024). Owning decisions: AI decision-support and the attributability-gap. Science and Engineering Ethics, 30(27). https://doi.org/10.1007/s11948-024-00485-1.
Further Reading/Watching/Listening:
Readings:
Jasanoff, S. (2016). The ethics of invention: Technology and the human future. WW Norton & Company.
Videos:
Culture Vulture Rises–Delphi: The Bellybutton of the Ancient World (BBC) https://www.youtube.com/watch?v=MGNY8LEPYRk.
Richard Sutton–The Fundamental Problem of LLMs https://youtu.be/21EYKqUsPfg?si=3eoasJFk3sdI59jA.
Image Attribution
Generated by: Nano Banana Pro
Date: 24 October 2025
Prompt: “A minimalist, modern illustration showing the shift from GUI to CUI. On the left, depict classic graphical interfaces: floating windows, buttons, sliders, icons—clean geometric shapes in cool blues and teals. On the right, transition into a warm, organic conversational interface made of abstract speech bubbles, flowing lines and soft gradients, representing dynamic dialogue with AI. A clear bridge or smooth gradient connects the two worlds, symbolising the evolution toward conversational systems and the need for new evaluation methods. Style: flat vector, high contrast, soft shadows, generous negative space. Absolutely no text, no letters, no numbers, no logos, no UI labels. Output resolution: 1920×1080.”