Zhimeng Wang

Personal blog by Zhimeng Wang, sharing notes on MLSys & NLP.
https://zmwang03.github.io/
Updated 2026-04-19T18:56:43+08:00. © 2026 Zhimeng Wang

Spegel - Stateless Local OCI Mirror
2024-11-04T00:00:00+08:00 2026-04-19T01:36:23+08:00 https://zmwang03.github.io/posts/Spegel-stateless-local-OCI-mirror/ tremo

Visual architecture guide to Spegel, a stateless cluster-local OCI registry mirror for Kubernetes using P2P networking.

How to Efficiently Serve an LLM?
2024-08-05T05:30:00+08:00 2026-04-19T01:36:23+08:00 https://zmwang03.github.io/posts/How-to-Efficiently-serve-an-llm/ tremo

Exploring LLM serving optimizations including batching, quantization, paged attention, speculative decoding, and KV cache techniques.

What Infrastructure does it take to train a 405B Llama3-like model?
2024-07-28T00:00:00+08:00 2026-04-19T01:36:23+08:00 https://zmwang03.github.io/posts/What-Infra-does-it-take-to-train-llama405b/ tremo

An overview of the infrastructure required to train 405B-parameter LLMs, covering network topology, storage, compute, and fault tolerance.

The Tech Behind TikTok's Addictive Recommendation System
2023-12-05T03:30:00+08:00 2026-04-19T01:36:23+08:00 https://zmwang03.github.io/posts/the-tech-behind-tiktoks-addictive-recommendation-system/ tremo

Deep dive into ByteDance's Monolith framework, which powers TikTok's real-time recommendation system with online training.

Do you really need a Vector Database?
2023-11-22T04:50:00+08:00 2026-04-19T01:36:23+08:00 https://zmwang03.github.io/posts/do-you-really-need-a-vector-database/ tremo

Exploring alternatives to vector databases for LLM applications, including FAISS, PostgreSQL pgvector, and Elasticsearch.