Weekly Digest #94
Articles
GPU-accelerated ML Inference at Pinterest
- Reducing the number of small model ops (leveraging cuCollections to support hash tables for the raw IDs on GPUs, and implementing a custom consolidated embedding lookup module that merges many per-feature lookups into one; see the first sketch after this list)
- Consolidating memory copies, as in the second sketch below (to reduce cudaMemcpy() calls from hundreds to one: instead of relying on the Torch…
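
A minimal sketch of the consolidated-lookup idea from the first bullet. The tables, sizes, and the consolidated_lookup helper here are hypothetical, not Pinterest's actual module, and the merge is shown with plain PyTorch tensors rather than cuCollections GPU hash tables; the point is only that stacking the tables and shifting IDs turns N small lookups into one gather.

```python
# Sketch (assumed, not Pinterest's code): consolidate many per-feature
# embedding lookups into a single lookup by stacking the tables into one
# weight matrix and shifting each feature's ids by its table's row offset.
import torch

torch.manual_seed(0)

# Hypothetical setup: three small per-feature tables of equal dim.
dim = 8
tables = [torch.randn(100, dim), torch.randn(50, dim), torch.randn(200, dim)]

# Build one consolidated table plus the starting row of each sub-table.
merged = torch.cat(tables, dim=0)              # (350, dim)
sizes = torch.tensor([t.shape[0] for t in tables])
offsets = torch.cumsum(sizes, dim=0) - sizes   # [0, 100, 150]

def consolidated_lookup(ids_per_feature):
    """One embedding lookup instead of len(tables) separate ones."""
    shifted = torch.cat([ids + off
                         for ids, off in zip(ids_per_feature, offsets)])
    return merged[shifted]                     # single gather / kernel launch

# Batch of ids, one tensor per feature.
ids = [torch.tensor([3, 7]), torch.tensor([1, 4]), torch.tensor([10, 99])]
out = consolidated_lookup(ids)                 # (6, dim)

# Sanity check against the naive per-table path.
naive = torch.cat([t[i] for t, i in zip(tables, ids)])
assert torch.allclose(out, naive)
```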
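
The memory-copy bullet is truncated in the digest, so the following is only a hedged sketch of the general pattern it names (pack many small host tensors into one contiguous pinned staging buffer, make a single host-to-device transfer, then slice zero-copy views back out on the device); the input shapes and variable names are invented for illustration.

```python
# Sketch (assumed pattern, not the article's exact code): replace one
# cudaMemcpy per tensor with a single transfer of a packed buffer.
import torch

# Hypothetical model inputs: many small float tensors on the CPU.
inputs = [torch.randn(n) for n in (16, 32, 8, 64)]

total = sum(t.numel() for t in inputs)
staging = torch.empty(total, pin_memory=torch.cuda.is_available())

# Pack: per-tensor copies stay on the host side, where they are cheap.
offset = 0
bounds = []
for t in inputs:
    staging[offset:offset + t.numel()].copy_(t.reshape(-1))
    bounds.append((offset, offset + t.numel()))
    offset += t.numel()

# One host-to-device transfer for the whole batch of tensors.
device = "cuda" if torch.cuda.is_available() else "cpu"
device_buf = staging.to(device, non_blocking=True)

# Zero-copy views recover the individual tensors on the device.
device_inputs = [device_buf[a:b].view(t.shape)
                 for (a, b), t in zip(bounds, inputs)]
```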