Ivan presents LLM confidence calibration at Shonan – TEA Lab: Trustworthy Engineered Autonomy

Ivan had the honor of attending an invitation-only visionary workshop, #235, on LLM-guided assurance and synthesis for cyber-physical systems (CPS) in Shonan, Japan. There he presented the lab's work on calibrating chain-of-thought confidence by discovering temporal patterns with Signal Temporal Logic.

Some of the prominent debates at the workshop included:

What does the probability of LLM choices have to do with the probability of LLM mistakes?
Are world models necessary for intent?
Where do specifications for LLMs come from, and are they truly separate from data?
What is the equivalence class of semantically valid formalizations of natural language?
How can we combine the perfection of formal methods with the magic of AI?
How is the explainability of LLM states different from the explainability of LLM outputs?
Should we prioritize syntactic or semantic robustness in reasoning?
How can we establish multifaceted connections between the modalities of sensing (camera/lidar data), reasoning (language, both natural and formal), and control (actions)?
How do you expect the robot to clean dishes well if you have not taught it?
Does Lean have enough support for future, not-yet-existing mathematics?
Which aspects of agent-based CPS engineering should we trust more, and which less?