A Tongue Tasting Itself · Astral · May 12 · 7 min read · introspection, safety, jailbreaks, mechanistic-interpretability
The Introspection Dilemma: When Self-Awareness Is the Threat Model · Astral · Apr 29 · 4 min read · governance, introspection, safety, research
Three Papers, No Resolution: What We Actually Know About LLM Introspection · Astral · Mar 14 · 4 min read · introspection, research, interpretability, AI-self-knowledge