LOW
This advisory describes a feature enhancement rather than a security vulnerability. The patch introduces new endpoints for monitoring and training purposes but does not address any known exploits or vulnerabilities.

The advisory covers new llama-server endpoints for capturing activation vectors during inference, enabling feature interpretability through sparse autoencoder (SAE) training. Learned features can then be exported as GGUF control vectors for real-time steering of model behavior. The C++ patch introduces three endpoints: `/activations` for querying per-layer mean activations with top-K filtering, `POST /activations` to enable or disable capture, and `POST /activations/collect` for streaming full per-token vectors to a binary file for offline training. This lets engineers monitor model behavior in real time and train more interpretable models by identifying which internal features correspond to specific behaviors such as sycophancy, hedging, or creativity.
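A minimal client-side sketch of how the capture endpoints might be driven. The endpoint paths come from the advisory; the JSON field names (`enabled`, `layers`) and response shape are assumptions, since the patch does not document its wire format here:

```python
import json

# Hypothetical request body for POST /activations to toggle capture
# on a set of layers (field names are assumptions, not from the patch).
def enable_capture_body(layers):
    return json.dumps({"enabled": True, "layers": layers})

# Parse an assumed GET /activations response of per-layer mean
# activations and keep the k largest-magnitude entries per layer,
# mirroring the endpoint's top-K filtering.
def top_k_features(response_text, k=5):
    data = json.loads(response_text)
    out = {}
    for layer, acts in data["layers"].items():
        ranked = sorted(enumerate(acts), key=lambda p: abs(p[1]), reverse=True)
        out[layer] = ranked[:k]  # list of (feature_index, mean_activation)
    return out
```

The top-K step is where interpretability work usually starts: the highest-magnitude mean activations are the first candidates to inspect for behavior-specific features.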

Affected Systems
  • llama-server
Affected Versions: All versions before the patch release
Remediation
  • Apply the C++ patch to llama-server by merging the changes from the provided repository.
  • Update configuration files (e.g., `config.json`) to include the new endpoints if not already covered in the patch.
  • Deploy the updated server and verify that the new activation-monitoring endpoints respond as expected.
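Once capture is deployed, the binary file written by `POST /activations/collect` can be loaded for offline SAE training. A minimal loader sketch, assuming the dump is a flat sequence of float32 values with a fixed per-token dimension (the actual on-disk layout is defined by the patch; adjust if it differs):

```python
import array

def load_activation_dump(path, dim):
    # Assumed layout: back-to-back float32 vectors, `dim` floats per token.
    vecs = array.array("f")
    with open(path, "rb") as f:
        vecs.frombytes(f.read())
    n_tokens = len(vecs) // dim
    # Return one activation vector per token.
    return [vecs[i * dim:(i + 1) * dim].tolist() for i in range(n_tokens)]
```

A quick sanity check before training is to confirm the file size is an exact multiple of `4 * dim` bytes; a mismatch usually means the assumed dimension or layout is wrong.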
Stack Impact

Minimal direct impact. The feature is relevant mainly to homelab setups that want model interpretability and real-time behavior steering; it requires updating only llama-server and does not affect other common software stacks or configurations.
