Tag: LLM
All the articles with the tag "LLM".
SpikingBrain-瞬息 1.0 Technical Report: A Natively Domestic, Independently Controllable Brain-Inspired Spiking Large Model
Updated: at 14:34 Published: at 10:46 The technical report of new work from Prof. Li Guoqi's group. Honestly, I don't think this is a proper SNN-LLM work; it feels more like a domestic take on Linear Attention. Hard to judge.
MLP Memory: Language Modeling with Retriever-pretrained External Memory
Published: at 14:22 Trains an MLP to learn and stand in for the probability distribution that kNN retrieval produces in RAG.
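For context on the distribution being replaced, here is a sketch of the standard kNN-LM formulation (Khandelwal et al.), not this paper's code: weight retrieved neighbors by a softmax over negative distances, aggregate the weights by target token, then interpolate with the base LM distribution. Function and parameter names are illustrative.

```python
import numpy as np

def knn_lm_probs(p_lm, neighbor_dists, neighbor_tokens, vocab_size, lam=0.25):
    # p_lm: base LM distribution over the vocabulary.
    # neighbor_dists / neighbor_tokens: distances and target tokens of the
    # k datastore entries retrieved for the current context.
    w = np.exp(-neighbor_dists)
    w /= w.sum()                          # softmax over negative distances
    p_knn = np.zeros(vocab_size)
    np.add.at(p_knn, neighbor_tokens, w)  # aggregate weights per target token
    # Interpolate the retrieval distribution with the parametric LM.
    return lam * p_knn + (1.0 - lam) * p_lm

# Toy usage: 5-token vocab, 3 retrieved neighbors.
p = knn_lm_probs(np.full(5, 0.2), np.array([0.1, 0.5, 0.9]),
                 np.array([1, 1, 3]), vocab_size=5)
```

The MLP-memory idea is then to train a network to emit something like `p_knn` directly, skipping the datastore lookup at inference time.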
Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention
Published: at 16:04 ACL 2025 Best Paper and a new work from DeepSeek: a hierarchical KV cache raises sparsity, improving performance in both training and inference.
T-MAC: CPU Renaissance via Table Lookup for Low-Bit LLM Deployment on Edge
Published: at 16:23 T-MAC accelerates the BitNet family of models with lookup tables (LUTs) on CPUs; a follow-up work, T-MAN, runs the LUT acceleration on the NPU inside Qualcomm mobile chips.
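To show the core LUT trick in the simplest setting (a generic sketch, not T-MAC's actual kernel): with 1-bit (+1/-1) weights, a group of g activations has only 2^g possible signed partial sums, so they can be precomputed once and each weight group then costs a single table lookup. Function names here are illustrative.

```python
import numpy as np

def build_lut(act_group):
    # Precompute, for a group of g activations, the signed partial sum for
    # every possible pattern of g one-bit (+1/-1) weights: 2^g table entries.
    g = len(act_group)
    lut = np.empty(2 ** g)
    for idx in range(2 ** g):
        signs = np.array([1.0 if (idx >> k) & 1 else -1.0 for k in range(g)])
        lut[idx] = signs @ act_group
    return lut

def lut_dot(acts, weight_bits, g=4):
    # Dot product of acts with 1-bit weights (bit 1 -> +1, bit 0 -> -1),
    # computed as one table lookup per group of g elements.
    total = 0.0
    for start in range(0, len(acts), g):
        lut = build_lut(acts[start:start + g])
        idx = 0
        for k in range(g):
            idx |= int(weight_bits[start + k]) << k
        total += lut[idx]
    return total
```

In a real kernel the LUTs are built once per activation vector and reused across every weight row of the matrix, which is where the speedup over bit-by-bit multiply-accumulate comes from.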
Transformers without Normalization
Published: at 16:09 New work from Kaiming He: DyT replaces normalization layers, turning a synchronization operation into an element-wise one. It shows up in a recent paper, so worth studying.
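A minimal NumPy sketch of the DyT formula from the paper, y = γ · tanh(αx) + β: unlike LayerNorm there is no mean/variance reduction over any axis, so nothing needs to synchronize across elements. The function name and toy shapes are mine.

```python
import numpy as np

def dyt(x, alpha, gamma, beta):
    # Dynamic Tanh (DyT): y = gamma * tanh(alpha * x) + beta.
    # alpha is a learnable scalar; gamma and beta are per-channel vectors.
    # Purely element-wise: no reduction, hence no cross-element sync.
    return gamma * np.tanh(alpha * x) + beta

# Toy usage on a (batch, channels) activation tensor.
x = np.random.randn(4, 8)
y = dyt(x, alpha=0.5, gamma=np.ones(8), beta=np.zeros(8))
```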
Visualizing and Understanding the Effectiveness of BERT
Published: at 10:21 While training SNNs recently I have been studying how to visualize the loss during training, and wondering whether a newly added method changes the model's loss landscape. Articles on visualizing loss landscapes generally cite this paper for its analysis and methodology.
A First Look at AI Infra
Updated: at 18:30 Published: at 16:04 Using my recent internship hunt as an opportunity to study and consolidate the scattered knowledge I had picked up about model inference/training acceleration, plus some CUDA programming and architecture topics.
Titans: Learning to Memorize at Test Time
Updated: at 14:57 Published: at 18:36 A new architecture evolved from TTT, attempting to improve the model's memory ability through test-time training.
Were RNNs All We Needed?
Updated: at 15:06 Published: at 16:07 An improved RNN designed to be easier to scale up.
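The entry is terse, so for context, a sketch of the paper's core idea (as I understand it, not its code): dropping the hidden-state dependence from the gates leaves a linear recurrence h_t = (1 - z_t) h_{t-1} + z_t h̃_t whose coefficients depend only on the input, so it can be evaluated with a parallel scan instead of a sequential loop. Function names are illustrative.

```python
import numpy as np

def mingru_sequential(z, h_tilde, h0=0.0):
    # Reference loop: h_t = (1 - z_t) * h_{t-1} + z_t * h_tilde_t.
    h, prev = np.empty(len(z)), h0
    for t in range(len(z)):
        prev = (1.0 - z[t]) * prev + z[t] * h_tilde[t]
        h[t] = prev
    return h

def mingru_parallel(z, h_tilde, h0=0.0):
    # Because z_t and h_tilde_t depend only on the input (not on h_{t-1}),
    # the recurrence is linear in h with closed form
    #   h_t = A_t * (h0 + sum_{j<=t} b_j / A_j),  A_t = prod_{k<=t} (1 - z_k),
    # computable with cumulative ops (an associative scan in practice).
    a, b = 1.0 - z, z * h_tilde
    A = np.cumprod(a)
    return A * (h0 + np.cumsum(b / A))
```

The cumprod/cumsum form is shown for clarity; production implementations use a numerically safer log-space associative scan.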
I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models
Updated: at 15:06 Published: at 17:11 An integer-only PTQ (post-training quantization) work for LLMs.
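As background on what "integer-only" means here, a generic sketch of symmetric quantization with int32 accumulation (the common baseline, not I-LLM's specific scheme; names are mine): weights and activations are mapped to integers with a per-tensor scale, the matmul runs entirely in integer arithmetic, and the float scales are applied once at the end.

```python
import numpy as np

def quantize_sym(x, bits=8):
    # Symmetric per-tensor quantization: x ~= scale * q with integer q.
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax
    q = np.clip(np.round(x / scale), -qmax, qmax).astype(np.int32)
    return q, scale

def int_only_matmul(x, w, bits=8):
    # Integer GEMM with int32 accumulation; the combined float scale is
    # applied once at the end (real kernels fold it into a fixed-point
    # multiplier so no float math remains on the hot path).
    qx, sx = quantize_sym(x, bits)
    qw, sw = quantize_sym(w, bits)
    acc = qx @ qw              # pure integer accumulation
    return acc * (sx * sw)     # dequantize the result
```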