Despite self-hosting our workflow stack on our own infrastructure, true control over Large Language Model (LLM) calls remained elusive. Prompts left the network through multiple points of egress, and provider keys were scattered across services, leaving no visibility into model interactions and no confidence that security policies were being followed. In practice, services called different LLM providers without oversight or cost control. This points to the need for stricter access controls and centralized logging, so that every LLM call is auditable and aligned with organizational policy.
- Self-hosted workflow stack with distributed LLM service calls
- Containers managed within the infrastructure
- Create a single entry point for all LLM calls by routing traffic through a centralized gateway service. For example, update each service's configuration (e.g., the environment set in its Dockerfile or compose file) so requests go to the gateway instead of directly to providers.
- Implement strict access controls using IAM roles or an equivalent mechanism so that only authorized services can make LLM calls. For example, declare the roles in the configuration file at /etc/iam/config.yaml.
- Centralize logging for all LLM interactions by configuring a unified log aggregator, such as an ELK stack, to capture each call's origin and purpose in detail.
- Audit current configurations and remove redundant provider keys from unauthorized services and environments (e.g., `rm /path/to/provider/key.json`), then verify that no key is left behind.
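The single-entry-point step can be sketched as follows. This is a minimal illustration, not the actual setup: the gateway hostname `llm-gateway.internal` and the OpenAI-style route are assumptions. The point is that callers build requests against one internal URL rather than embedding provider endpoints and keys.

```python
# Hypothetical single entry point: services never contact providers directly;
# every request targets one internal gateway URL (hostname is illustrative).
GATEWAY_BASE = "http://llm-gateway.internal"

def build_llm_request(model: str, prompt: str) -> dict:
    """Return a request spec that always targets the central gateway,
    regardless of which upstream provider will serve the model."""
    return {
        "url": f"{GATEWAY_BASE}/v1/chat/completions",
        "json": {
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
        },
    }

# Example: the caller only ever sees the gateway URL.
req = build_llm_request("gpt-4o", "ping")
```

With this shape, swapping providers or adding cost controls happens once, at the gateway, instead of in every service.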
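The access-control step could be enforced at the gateway with a per-service policy check. The service and model names below are illustrative, and a real deployment would back this with IAM rather than a hard-coded dict; this only sketches the authorization decision itself.

```python
# Sketch of gateway-side access control: each service may only request
# the models its policy allows (names are illustrative assumptions).
ACCESS_POLICY = {
    "workflow-engine": {"gpt-4o", "claude-3-5-sonnet"},
    "summarizer": {"gpt-4o-mini"},
}

def is_authorized(service: str, model: str) -> bool:
    """True if the named service is allowed to call the named model;
    unknown services are denied by default."""
    return model in ACCESS_POLICY.get(service, set())
```

Denying unknown services by default is the key design choice: a new container gains no LLM access until someone explicitly grants it.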
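For the centralized-logging step, one workable shape is a JSON-lines audit record emitted per call, which a shipper can forward to the aggregator. The field names here are assumptions, chosen to capture the origin and purpose the step asks for.

```python
import json
import time

def audit_record(service: str, model: str, purpose: str, prompt_chars: int) -> str:
    """Serialize one LLM call as a JSON line suitable for a log
    aggregator such as an ELK stack (field names are illustrative)."""
    return json.dumps(
        {
            "ts": time.time(),          # when the call happened
            "service": service,         # origin of the call
            "model": model,             # which model was requested
            "purpose": purpose,         # why the call was made
            "prompt_chars": prompt_chars,  # rough size, without logging the prompt itself
        },
        sort_keys=True,
    )
```

Logging prompt size rather than prompt text keeps the audit trail useful without turning the log store into another point of prompt egress.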
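The key-audit step can be partly automated by scanning environment files for provider key variables before deleting anything. The provider prefixes in the pattern are examples, not an exhaustive list.

```python
import re

# Match common provider key variable names at the start of a line in an
# env file; the provider list here is an illustrative assumption.
KEY_PATTERN = re.compile(r"^(OPENAI|ANTHROPIC|MISTRAL)_API_KEY=", re.MULTILINE)

def find_provider_keys(env_text: str) -> list[str]:
    """Return the names of provider key variables found in an env file's
    contents, so stray keys can be located before removal."""
    return [m.group(0).rstrip("=") for m in KEY_PATTERN.finditer(env_text)]
```

Running a scan like this across every service's environment before and after cleanup gives a simple check that no key was left behind.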
This issue directly impacts homelab stacks where multiple services independently manage LLM calls, potentially leading to inconsistent security policies and unexpected costs.