The advisory covers a C++ patch that adds activation-capture endpoints to llama-server, allowing activation vectors to be recorded during inference. The captured activations can be used to train sparse autoencoders (SAEs) for feature interpretability, and the resulting features can be exported as GGUF control vectors for real-time steering of model behavior. The patch introduces three endpoints: `GET /activations` to query per-layer mean activations with top-K filtering, `POST /activations` to enable or disable capture, and `POST /activations/collect` to stream full per-token vectors to a binary file for offline training. Together these let engineers monitor model internals in real time and identify which features correspond to specific behaviors such as sycophancy, hedging, or creativity.
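As a rough illustration of how a client might drive these endpoints, the sketch below builds the three requests with Python's standard library. The request and payload shapes (`{"enabled": true}`, the `top_k` query parameter, the `path` field) are assumptions for illustration; the actual schema comes from the patch and may differ.

```python
import json
import urllib.request

BASE = "http://localhost:8080"  # default llama-server address (assumed)

def build_request(path, payload=None):
    """Build an HTTP request for an activation endpoint.

    POST with a JSON body when a payload is given, otherwise GET.
    The payload field names are assumptions, not taken from the patch.
    """
    data = json.dumps(payload).encode() if payload is not None else None
    return urllib.request.Request(
        BASE + path,
        data=data,
        headers={"Content-Type": "application/json"},
        method="POST" if data is not None else "GET",
    )

# Enable capture (assumed body shape)
enable_req = build_request("/activations", {"enabled": True})
# Query per-layer mean activations with top-K filtering (assumed query param)
query_req = build_request("/activations?top_k=16")
# Stream full per-token vectors to a binary file for offline SAE training
collect_req = build_request("/activations/collect", {"path": "acts.bin"})
```

Sending any of these with `urllib.request.urlopen(...)` requires a running, patched llama-server; the snippet only constructs the requests.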
- llama-server
- Apply the C++ patch to llama-server by merging the changes from the provided repository.
- Update any configuration files (e.g., `config.json`) that enumerate exposed endpoints, if the patch does not already cover them.
- Deploy the updated server and verify that the new activation-monitoring endpoints respond as expected.
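For the verification step, a small smoke test against the query endpoint can confirm the top-K filtering behaves sensibly. The response schema below (a `layers` array with per-layer `mean` vectors) is an assumption used only to illustrate the check; substitute the shape the patched server actually returns.

```python
import json

# Assumed response shape for GET /activations; the real schema may differ.
sample_response = json.dumps({
    "layers": [
        {"layer": 0, "mean": [0.12, -0.03, 0.88, 0.05]},
        {"layer": 1, "mean": [0.40, 0.01, -0.22, 0.77]},
    ]
})

def top_k_dims(response_text, k):
    """Return the k activation dimensions with the largest magnitude per layer."""
    out = {}
    for entry in json.loads(response_text)["layers"]:
        ranked = sorted(
            enumerate(entry["mean"]), key=lambda p: abs(p[1]), reverse=True
        )
        out[entry["layer"]] = [dim for dim, _ in ranked[:k]]
    return out

print(top_k_dims(sample_response, 2))  # → {0: [2, 0], 1: [3, 0]}
```

In a live check, `sample_response` would be replaced by the body returned from `GET /activations`, and the reported dimensions compared across prompts that do and do not exhibit the behavior under study.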
Minimal direct impact. The feature is primarily relevant to homelab setups that want model interpretability and real-time behavior steering; it requires changes only to llama-server and does not affect other common software stacks or configurations.