Weekly Digest #70
Apr 11, 2022
Articles
Ship to Production, Darkly: Moving Fast, Staying Safe with ML Deployments
ML change management (dark rollout):
Step 0: Pre-production iteration
Step 1: Production: Shadow traffic, 1% volume
Step 2: Production: Shadow traffic, 100% volume
Step 3: Experiment: Incumbent model vs. new model
This boils down to:
- pre-production iteration
- shadow testing on live data, checking both model performance and infrastructure performance
- going live, with an experiment comparing the incumbent model against the new one
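To make the shadow-traffic steps concrete, here is a minimal sketch of a router that always serves the incumbent model and mirrors a configurable fraction of requests to the candidate. It is not taken from the article: the `ShadowRouter` class, its parameters, and the toy models are all hypothetical names chosen for illustration. A real deployment would likely mirror requests asynchronously and ship the comparison log to a metrics system.

```python
import random
from typing import Callable, List, Optional, Tuple

Model = Callable[[float], float]


class ShadowRouter:
    """Serve the incumbent model; mirror a fraction of traffic to a candidate.

    Hypothetical sketch: the candidate's output is only logged, never returned.
    """

    def __init__(self, incumbent: Model, candidate: Model,
                 shadow_fraction: float, seed: Optional[int] = None) -> None:
        if not 0.0 <= shadow_fraction <= 1.0:
            raise ValueError("shadow_fraction must be between 0 and 1")
        self.incumbent = incumbent
        self.candidate = candidate
        self.shadow_fraction = shadow_fraction  # 0.01 for Step 1, 1.0 for Step 2
        self.rng = random.Random(seed)
        # Each entry is (input, incumbent output, candidate output).
        self.shadow_log: List[Tuple[float, float, float]] = []
        self.shadow_errors = 0

    def predict(self, x: float) -> float:
        served = self.incumbent(x)
        if self.rng.random() < self.shadow_fraction:
            try:
                self.shadow_log.append((x, served, self.candidate(x)))
            except Exception:
                # A failing candidate must never affect the live response.
                self.shadow_errors += 1
        return served


if __name__ == "__main__":
    router = ShadowRouter(lambda x: 2 * x, lambda x: 2 * x + 0.1,
                          shadow_fraction=0.01, seed=0)
    for i in range(1000):
        router.predict(float(i))
    print(f"mirrored {len(router.shadow_log)} of 1000 requests")
```

Raising `shadow_fraction` from 0.01 to 1.0 corresponds to moving from Step 1 to Step 2; the logged pairs can then be used to compare predictions before any user sees the new model.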
This article covers the theory behind detecting silent data corruption:
- Out-of-production testing (opportunistic testing)
- In-production testing (ripple testing)
Practical Argo Workflows Hardening
High-level practices:
- Run the latest version
- Use TLS for network requests
- Limit ingress/egress using network policies
- Apply the principle of least privilege
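As one illustration of the network-policy item above, the sketch below builds a default-deny Kubernetes `NetworkPolicy` manifest and prints it as JSON, which `kubectl apply -f` accepts alongside YAML. It is not taken from the article: the `default_deny_policy` helper and the `argo` namespace are assumptions. A policy like this blocks all pod traffic in the namespace, so a real setup would also need explicit allow rules for whatever the workflow pods and the Argo components legitimately talk to.

```python
import json


def default_deny_policy(namespace: str) -> dict:
    """Build a NetworkPolicy that denies all ingress and egress in a namespace.

    Hypothetical helper: an empty podSelector matches every pod in the
    namespace, and listing both policy types with no rules denies both
    directions.
    """
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "NetworkPolicy",
        "metadata": {"name": "default-deny-all", "namespace": namespace},
        "spec": {
            "podSelector": {},
            "policyTypes": ["Ingress", "Egress"],
        },
    }


if __name__ == "__main__":
    # "argo" is an assumed namespace name; adjust to your installation.
    print(json.dumps(default_deny_policy("argo"), indent=2))
```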