AndyBlocker
Recent Posts
- Speed Always Wins: A Survey on Efficient Architectures for Large Language Models (updated at 15:15, published at 17:05) - Work from AI Lab on "broad-sense" LLM inference acceleration, covering Linear Attention, Sparse Attention, Diffusion LLMs, applications, and more.
- Neuromorphic Principles for Efficient Large Language Models on Intel Loihi 2 (updated at 00:39, published at 23:32) - ICLR 2025 Workshop paper. A Matmul-Free SNN LLM built on HAQ (though the experiments stop at 370M parameters) is deployed on Loihi 2, achieving 3x the throughput and 2x the energy efficiency of a Qwen-500M model. Honestly, though, the paper barely explains its key points, and there is nothing particularly exciting in it.
- Parallelizing Linear Transformers with the Delta Rule over Sequence Length (updated at 16:46, published at 14:43) - DeltaNet; a toy sketch of the delta-rule update appears after this list.
- Flash-LLM: Enabling Cost-Effective and Highly-Efficient Large Generative Model Inference with Unstructured Sparsity (updated at 15:07, published at 13:50) - VLDB 2024, work from Alibaba; the engineering looks very solid. On LLM workloads, sparse loading of the weights alone yields a 3-4x speedup in the decode stage (a toy sketch of the load-as-sparse, compute-as-dense idea is given after this list).
- SpikingBrain-瞬息 1.0 Technical Report: A Native, Domestically Developed, Independently Controllable Brain-Inspired Spiking Large Model (updated at 14:34, published at 10:46) - Technical report on new work from Guoqi Li's group. Honestly, I don't see this as a proper SNN-LLM effort; it feels more like a homegrown take on Linear Attention. Hard to judge.
- MLP Memory: Language Modeling with Retriever-pretrained External Memory (published at 14:22) - Trains an MLP to learn, and ultimately replace, the probability distribution produced by the kNN retriever in RAG; see the distillation sketch after this list.
- Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention (published at 16:04) - ACL 2025 Best Paper, new work from DeepSeek. A hierarchical KV cache raises sparsity and improves performance in both training and inference; a toy block-selection sketch is given after this list.
- T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge (published at 16:23) - T-MAC accelerates BitNet-family models on CPU with lookup tables (LUTs); a follow-up called T-MAN runs the LUT kernels on the NPU of Qualcomm mobile chips. A toy LUT GEMV sketch appears after this list.
- HYTE: Flexible Tiling for Sparse Accelerators via Hybrid Static-Dynamic Approaches (published at 16:27) - ISCA 2025, on tiling for sparse dataflows. I didn't have the energy for the second half; my current work doesn't involve sparse encoding yet.
- Swin Transformer: Hierarchical Vision Transformer using Shifted Windows (published at 17:47) - A look at Shifted-Window Attention; a windowing sketch is given after this list.
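
For the DeltaNet entry above: a minimal sequential reference of the delta-rule fast-weight update. The paper's actual contribution is a chunkwise algorithm that parallelizes this recurrence over sequence length, which the sketch deliberately does not reproduce; all names here are my own.

```python
import numpy as np

def delta_rule_recurrence(q, k, v, beta):
    """Sequential reference of the delta-rule update used in DeltaNet-style
    linear attention (not the paper's parallel algorithm).

    q, k, v: (T, d) arrays; beta: (T,) write strengths in [0, 1].
    Instead of purely accumulating v_t k_t^T as in vanilla linear attention,
    the fast-weight state is corrected toward v_t along the direction of k_t.
    """
    T, d = q.shape
    S = np.zeros((d, d))              # fast-weight matrix (value x key)
    outputs = np.zeros((T, d))
    for t in range(T):
        v_old = S @ k[t]              # what the current state predicts for k_t
        S = S + beta[t] * np.outer(v[t] - v_old, k[t])   # delta-rule correction
        outputs[t] = S @ q[t]
    return outputs

# toy usage
rng = np.random.default_rng(0)
T, d = 8, 4
out = delta_rule_recurrence(rng.normal(size=(T, d)), rng.normal(size=(T, d)),
                            rng.normal(size=(T, d)), rng.uniform(0, 1, size=T))
print(out.shape)  # (8, 4)
```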
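
For the Flash-LLM entry: a toy numpy illustration of the load-as-sparse, compute-as-dense idea. The real implementation is a custom GPU SpMM pipeline with its own compressed weight format and Tensor Core usage; the CSR format, tile size, and names below are illustrative assumptions.

```python
import numpy as np
from scipy.sparse import csr_matrix

def load_as_sparse_compute_as_dense(w_csr, x, tile_rows=64):
    """Weights sit in memory in a compressed format, so reading them costs
    only the nonzeros; each tile is densified "on chip" and the multiply
    itself runs as an ordinary dense GEMM.
    """
    out_dim = w_csr.shape[0]
    y = np.zeros((out_dim, x.shape[1]))
    for r0 in range(0, out_dim, tile_rows):
        r1 = min(r0 + tile_rows, out_dim)
        tile = w_csr[r0:r1].toarray()     # load sparse, densify the tile
        y[r0:r1] = tile @ x               # compute dense
    return y

# toy usage with ~80% unstructured sparsity
rng = np.random.default_rng(0)
W = rng.normal(size=(256, 128)) * (rng.random((256, 128)) > 0.8)
x = rng.normal(size=(128, 4))
print(np.allclose(load_as_sparse_compute_as_dense(csr_matrix(W), x), W @ x))  # True
```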
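
For the MLP Memory entry: a heavily hedged PyTorch sketch of the idea as I read it, distilling the kNN retriever's token distribution into a feed-forward memory with a KL objective. Layer sizes, the interpolation at the end, and all names are my assumptions, not the paper's recipe.

```python
import torch
import torch.nn as nn

hidden_dim, vocab_size = 512, 1000   # made-up sizes

# feed-forward "memory" that maps LM hidden states to a token distribution
mlp_memory = nn.Sequential(
    nn.Linear(hidden_dim, 2048),
    nn.GELU(),
    nn.Linear(2048, vocab_size),
)

kl = nn.KLDivLoss(reduction="batchmean")
opt = torch.optim.Adam(mlp_memory.parameters(), lr=1e-4)

# h: LM hidden states; p_knn: token distributions produced by the kNN retriever
h = torch.randn(32, hidden_dim)
p_knn = torch.softmax(torch.randn(32, vocab_size), dim=-1)

for _ in range(3):  # a few toy distillation steps
    log_p_mlp = torch.log_softmax(mlp_memory(h), dim=-1)
    loss = kl(log_p_mlp, p_knn)   # match the retriever's distribution
    opt.zero_grad(); loss.backward(); opt.step()

# at inference one would interpolate with the base LM, kNN-LM style:
# p = (1 - lam) * p_lm + lam * softmax(mlp_memory(h))
```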
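
For the Native Sparse Attention entry: a toy sketch of block-wise KV selection, i.e. "score KV blocks cheaply, keep a few, attend densely inside them". NSA itself combines compressed, selected, and sliding-window branches with a trainable, hardware-aligned selection; this only shows a non-trainable block scoring step with names of my own choosing.

```python
import numpy as np

def block_topk_attention(q, K, V, block=16, topk=2):
    """Score each KV block by a cheap summary (mean key), keep the top-k
    blocks, then run ordinary softmax attention over the selected keys."""
    d = q.shape[-1]
    n_blocks = K.shape[0] // block
    block_keys = K[:n_blocks * block].reshape(n_blocks, block, d).mean(axis=1)
    scores = block_keys @ q                              # cheap block scores
    keep = np.argsort(scores)[-topk:]                    # selected block ids
    idx = np.concatenate([np.arange(b * block, (b + 1) * block)
                          for b in sorted(keep)])
    att = np.exp(K[idx] @ q / np.sqrt(d))                # dense attention
    att /= att.sum()                                     # inside selected blocks
    return att @ V[idx]

# toy usage
rng = np.random.default_rng(0)
K, V, q = rng.normal(size=(64, 8)), rng.normal(size=(64, 8)), rng.normal(size=8)
print(block_topk_attention(q, K, V).shape)  # (8,)
```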
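
For the T-MAC entry: a toy table-lookup GEMV over a single 1-bit weight plane, showing why LUTs can replace multiplications for low-bit weights. T-MAC's real kernels work on packed multi-bit weights with SIMD byte-shuffle lookups on CPU; the group size and names below are mine.

```python
import numpy as np

def lut_gemv_1bit(weight_bits, x, g=4):
    """weight_bits: (out, in) entries in {0, 1}, read as {-1, +1} weights.
    For each group of g activations, precompute all 2^g signed sums once;
    every output row then does table lookups instead of multiplications."""
    out_dim, in_dim = weight_bits.shape
    assert in_dim % g == 0
    n_groups = in_dim // g

    # shared tables: table[j][pattern] = signed sum of the j-th activation group
    tables = np.zeros((n_groups, 2 ** g))
    for j in range(n_groups):
        xs = x[j * g:(j + 1) * g]
        for pattern in range(2 ** g):
            signs = np.array([1.0 if (pattern >> b) & 1 else -1.0 for b in range(g)])
            tables[j, pattern] = np.dot(signs, xs)

    # GEMV: each row turns its g-bit weight groups into table indices
    y = np.zeros(out_dim)
    for i in range(out_dim):
        for j in range(n_groups):
            bits = weight_bits[i, j * g:(j + 1) * g]
            pattern = int(sum(int(b) << p for p, b in enumerate(bits)))
            y[i] += tables[j, pattern]
    return y

# check against the dense {-1, +1} reference
rng = np.random.default_rng(0)
W = rng.integers(0, 2, size=(3, 8))
x = rng.normal(size=8)
print(np.allclose(lut_gemv_1bit(W, x), (2 * W - 1) @ x))  # True
```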
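
For the Swin Transformer entry: a minimal numpy sketch of window partitioning and the cyclic shift behind shifted-window (SW-MSA) attention. The attention itself, and the mask that keeps wrapped-around tokens from attending to each other, are omitted.

```python
import numpy as np

def window_partition(x, win):
    """Split an (H, W, C) feature map into non-overlapping (win*win, C) windows."""
    H, W, C = x.shape
    x = x.reshape(H // win, win, W // win, win, C)
    return x.transpose(0, 2, 1, 3, 4).reshape(-1, win * win, C)

def shifted_windows(x, win, shift):
    """Cyclically shift the map before partitioning: tokens near window
    borders now share a window with neighbours from adjacent windows,
    so information flows across window boundaries in alternating blocks.
    (Real Swin adds an attention mask for the wrapped regions.)"""
    shifted = np.roll(x, shift=(-shift, -shift), axis=(0, 1))
    return window_partition(shifted, win)

# toy usage: 8x8 map, window 4, shift 2 (half a window)
x = np.arange(8 * 8 * 1).reshape(8, 8, 1).astype(float)
regular = window_partition(x, win=4)          # W-MSA windows
shifted = shifted_windows(x, win=4, shift=2)  # SW-MSA windows
print(regular.shape, shifted.shape)           # (4, 16, 1) (4, 16, 1)
```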