Summary
In this post, Andrej Karpathy discusses the ongoing evolution of human-AI interaction. He argues that audio is the format humans prefer for giving input to AI, while vision (images, animations, video) is the format they prefer for receiving output from it. He proposes a progression of AI output formats:
- Raw text (hard to read)
- Markdown (current default)
- HTML (emerging default, offering better graphics, layout, and interactivity)
- Interactive neural videos/simulations (future)
Karpathy suggests that users ask LLMs to “structure your response as HTML” to benefit from its higher information density and interactivity. He also notes that future progress will likely involve better input methods, such as pointing and gesturing.