<feed xmlns="http://www.w3.org/2005/Atom"> <id>https://zmwang03.github.io/</id><title>Zhimeng Wang</title><subtitle>Personal blog by Zhimeng Wang, sharing notes on MLSys &amp; NLP.</subtitle> <updated>2026-04-19T18:56:43+08:00</updated> <author> <name>Zhimeng Wang</name> <uri>https://zmwang03.github.io/</uri> </author><link rel="self" type="application/atom+xml" href="https://zmwang03.github.io/feed.xml"/><link rel="alternate" type="text/html" hreflang="en" href="https://zmwang03.github.io/"/> <generator uri="https://jekyllrb.com/" version="4.4.1">Jekyll</generator> <rights> © 2026 Zhimeng Wang </rights> <icon>/assets/img/favicons/favicon.ico</icon> <logo>/assets/img/favicons/favicon-96x96.png</logo> <entry><title>Spegel - Stateless Local OCI Mirror</title><link href="https://zmwang03.github.io/posts/Spegel-stateless-local-OCI-mirror/" rel="alternate" type="text/html" title="Spegel - Stateless Local OCI Mirror" /><published>2024-11-04T00:00:00+08:00</published> <updated>2026-04-19T01:36:23+08:00</updated> <id>https://zmwang03.github.io/posts/Spegel-stateless-local-OCI-mirror/</id> <content type="text/html" src="https://zmwang03.github.io/posts/Spegel-stateless-local-OCI-mirror/" /> <author> <name>tremo</name> </author> <category term="OCI" /> <category term="Mirror" /> <category term="Spegel" /> <summary>Visual architecture guide to Spegel, a stateless cluster-local OCI registry mirror for Kubernetes using P2P networking.</summary> </entry> <entry><title>How to Efficiently Serve an LLM?</title><link href="https://zmwang03.github.io/posts/How-to-Efficiently-serve-an-llm/" rel="alternate" type="text/html" title="How to Efficiently Serve an LLM?" 
/><published>2024-08-05T05:30:00+08:00</published> <updated>2026-04-19T01:36:23+08:00</updated> <id>https://zmwang03.github.io/posts/How-to-Efficiently-serve-an-llm/</id> <content type="text/html" src="https://zmwang03.github.io/posts/How-to-Efficiently-serve-an-llm/" /> <author> <name>tremo</name> </author> <category term="LLM" /> <category term="Inference" /> <category term="Optimization" /> <category term="Serving" /> <summary>Exploring LLM serving optimizations including batching, quantization, paged attention, speculative decoding, and KV cache techniques.</summary> </entry> <entry><title>What Infrastructure does it take to train a 405B Llama3-like model?</title><link href="https://zmwang03.github.io/posts/What-Infra-does-it-take-to-train-llama405b/" rel="alternate" type="text/html" title="What Infrastructure does it take to train a 405B Llama3-like model?" /><published>2024-07-28T00:00:00+08:00</published> <updated>2026-04-19T01:36:23+08:00</updated> <id>https://zmwang03.github.io/posts/What-Infra-does-it-take-to-train-llama405b/</id> <content type="text/html" src="https://zmwang03.github.io/posts/What-Infra-does-it-take-to-train-llama405b/" /> <author> <name>tremo</name> </author> <category term="LLM" /> <category term="Infrastructure" /> <category term="GPU" /> <category term="Distributed Training" /> <summary>A comprehensive overview of the infrastructure required to train 405B parameter LLMs, covering network topology, storage, compute, and fault tolerance.</summary> </entry> <entry><title>The Tech Behind TikTok's Addictive Recommendation System</title><link href="https://zmwang03.github.io/posts/the-tech-behind-tiktoks-addictive-recommendation-system/" rel="alternate" type="text/html" title="The Tech Behind TikTok&apos;s Addictive Recommendation System" /><published>2023-12-05T03:30:00+08:00</published> <updated>2026-04-19T01:36:23+08:00</updated> <id>https://zmwang03.github.io/posts/the-tech-behind-tiktoks-addictive-recommendation-system/</id> 
<content type="text/html" src="https://zmwang03.github.io/posts/the-tech-behind-tiktoks-addictive-recommendation-system/" /> <author> <name>tremo</name> </author> <category term="TikTok" /> <category term="Recommendation Systems" /> <category term="Kafka" /> <category term="Flink" /> <summary>Deep dive into ByteDance's Monolith framework powering TikTok's real-time recommendation system with online training.</summary> </entry> <entry><title>Do you really need a Vector Database?</title><link href="https://zmwang03.github.io/posts/do-you-really-need-a-vector-database/" rel="alternate" type="text/html" title="Do you really need a Vector Database?" /><published>2023-11-22T04:50:00+08:00</published> <updated>2026-04-19T01:36:23+08:00</updated> <id>https://zmwang03.github.io/posts/do-you-really-need-a-vector-database/</id> <content type="text/html" src="https://zmwang03.github.io/posts/do-you-really-need-a-vector-database/" /> <author> <name>tremo</name> </author> <category term="LLMs" /> <category term="GenAI" /> <category term="VectorDBs" /> <category term="Embeddings" /> <summary>Exploring alternatives to vector databases for LLM applications, including FAISS, PostgreSQL pgvector, and Elasticsearch.</summary> </entry> </feed>
