Posts
All the articles I've posted.
The Minimum Equivalent DNF Problem and Shortest Implicants
Updated: at 15:06 | Published: at 06:55
A proof that the MIN-DNF problem is Σ₂ᵖ-complete.
I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference
Updated: at 15:06 | Published: at 15:56
Integer-only quantization for ViT (W8A8), from the Chinese Academy of Sciences, ICCV 2023.
Efficient and Effective Methods for Mixed Precision Neural Network Quantization for Faster, Energy-efficient Inference
Updated: at 15:06 | Published: at 16:28
EAGL, which claims to quantize ResNet in under 3 seconds using only a CPU, making it far more efficient than HAWQ and other traditional methods.
Towards spike-based machine intelligence with neuromorphic computing
Updated: at 15:06 | Published: at 18:43
A survey of SNNs published in Nature.
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
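Updated: at 15:06 | Published: at 13:27
FlashAttention, an algorithm that exploits the hardware's structure to speed up attention computation and reduce memory usage. Its core ideas are tiling, online softmax, and kernel fusion.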
WWW: What, When, Where to Compute-in-Memory
Updated: at 15:06 | Published: at 18:34
Some validation of, and reflections on, compute-in-memory.
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
Updated: at 15:06 | Published: at 18:33
From Google: the first work to get a complete integer-only quantized inference pipeline running end to end.
SpikeSim: An end-to-end Compute-in-Memory Hardware Evaluation Tool for Benchmarking Spiking Neural Networks
Updated: at 15:06 | Published: at 18:33
A hardware design or evaluation benchmark for SNN deployment.
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Updated: at 15:06 | Published: at 18:32
From IPADS. Uses a model to predict which MoE experts or neurons in an LLM need to be activated, reducing resource consumption.
Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication
Updated: at 15:06 | Published: at 18:31
An introduction to GEMM data mapping, mainly covering various systolic-array-based accelerators.