Archives
All the articles I've archived.
Scalable Diffusion Models with Transformers
Published: at 16:29. Diffusion Transformer.
A First Look at AI Infra
Updated: at 18:30, Published: at 16:04. Using my recent internship hunt as an occasion to study and consolidate the scattered knowledge I've picked up on model inference/training acceleration, plus some CUDA programming and architecture topics.
Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition
Updated: at 14:57, Published: at 14:39. Replaces self-attention with large-kernel depthwise-separable convolution. Work from ByteDance Singapore.
SpikeCV: Open a Continuous Computer Vision Era
Updated: at 14:57, Published: at 15:33. An open-source framework for event cameras.
Neuromorphic computing at scale
Updated: at 14:57, Published: at 22:11. A review published in Nature discussing the problems and challenges currently facing the SNN/neuromorphic computing community, along with some possible directions for development.
Titans: Learning to Memorize at Test Time
Updated: at 14:57, Published: at 18:36. A new architecture evolved from TTT, attempting to improve the model's memorization ability via test-time training.
Segment Anything
Updated: at 14:57, Published: at 13:48. Meta's SAM.
SDiT: Spiking Diffusion Model with Transformer
Updated: at 14:57, Published: at 14:10. A spiking Diffusion Transformer whose internal Transformer structure is RWKV-based.
2024
Updated: at 14:57, Published: at 12:45. 2024.
ConvUNeXt: An efficient convolution neural network for medical image segmentation
Updated: at 14:57, Published: at 15:59. ConvNeXt + UNet, published in a Chinese core journal; studying it for reference while thinking about how to design my own module.
Rethinking the Membrane Dynamics and Optimization Objectives of Spiking Neural Networks
Updated: at 14:57, Published: at 15:23. NeurIPS 2024. Mainly studies how the initial membrane potential set before inference affects accuracy on static tasks.
ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
Updated: at 14:57, Published: at 06:05. A follow-up to ConvNeXt that introduces MAE-style pretraining.
A ConvNet for the 2020s
Updated: at 14:57, Published: at 15:22. CVPR 2022. Work from Meta that rebuilds a pure-convolution network at a time when ViT-style models dominate vision, and achieves strong results.
Were RNNs All We Needed?
Updated: at 15:06, Published: at 16:07. Revamps RNNs to make them easier to scale up.
SparseRT: Accelerating Unstructured Sparsity on GPUs for Deep Learning Inference
Updated: at 15:06, Published: at 14:18. Operator generation for sparse MM on GPUs, using load balancing and sparsity for acceleration; generates PTX code tailored to the model.
VPRTempo: A Fast Temporally Encoded Spiking Neural Network for Visual Place Recognition
Updated: at 15:06, Published: at 15:34. An ICRA 2024 paper using a temporally encoded, directly STDP-trained SNN for the visual place recognition task. Too simple.
Memory-Efficient Reversible Spiking Neural Networks
Updated: at 15:06, Published: at 14:35. Work that improves training speed and lowers GPU memory usage through its reversible design.
SpikeMba: Multi-Modal Spiking Saliency Mamba for Temporal Video Grounding
Updated: at 15:06, Published: at 16:40. SNN + Mamba for the temporal video grounding (TVG) task; work from HIT and PKU.
Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection
Updated: at 15:06, Published: at 12:46. SpikeYOLO, work from CASIA; ECCV 2024 Oral.
A Survey of SNN Video-Stream Tasks
Updated: at 15:06, Published: at 13:42. Surveying existing work on video-stream tasks and roughly planning my follow-up work.
SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN
Updated: at 15:06, Published: at 14:19. Work by my senior labmate Kang You; ANN2SNN conversion for Transformers.
SpikingJelly: An open-source machine learning infrastructure platform for spike-based intelligence
Updated: at 15:06, Published: at 19:49. SpikingJelly (惊蛰) from PKU, a highly influential SNN framework covering the full pipeline from data encoding and dataset integration to training and hardware deployment; effectively the PyTorch of SNNs. Published in Science Advances.
I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models
Updated: at 15:06, Published: at 17:11. Integer-only PTQ work for LLMs.
Programming Language Theory Notes
Updated: at 15:06, Published: at 15:12. Review notes for the programming language theory course.
The Minimum Equivalent DNF Problem and Shortest Implicants
Updated: at 15:06, Published: at 06:55. Proves that the MIN-DNF problem is Σ₂ᵖ-complete.
I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference
Updated: at 15:06, Published: at 15:56. Integer-only quantization of ViT, W8A8; CAS, ICCV 2023.
Efficient and Effective Methods for Mixed Precision Neural Network Quantization for Faster, Energy-efficient Inference
Updated: at 15:06, Published: at 16:28. EAGL, which claims to quantize ResNet in under 3 seconds using only a CPU, far more efficient than HAWQ and other traditional methods.
Towards spike-based machine intelligence with neuromorphic computing
Updated: at 15:06, Published: at 18:43. A survey on SNNs in Nature.
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Updated: at 15:06, Published: at 13:27. FlashAttention, an algorithm that exploits the hardware memory hierarchy to speed up attention and reduce memory usage. The core ideas are tiling, online softmax, and kernel fusion.
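Since the entry names online softmax as one of the core tricks, here is a minimal NumPy sketch of that idea (my own variable names, not the paper's kernel): keys and values are processed block by block while a running max and normalizer are maintained, so the full score vector is never materialized.

```python
import numpy as np

def online_softmax_attention(q, K, V, block=4):
    """Single-query attention computed over key/value blocks.

    Keeps a running max (m) and normalizer (l) so softmax(q K^T) V
    is computed without materializing all scores at once.
    """
    d = q.shape[0]
    m = -np.inf          # running max of scores seen so far
    l = 0.0              # running softmax normalizer
    acc = np.zeros(d)    # running weighted sum of values
    for i in range(0, K.shape[0], block):
        s = K[i:i + block] @ q / np.sqrt(d)   # scores for this block
        m_new = max(m, s.max())
        correction = np.exp(m - m_new)        # rescale old state to new max
        p = np.exp(s - m_new)
        l = l * correction + p.sum()
        acc = acc * correction + p @ V[i:i + block]
        m = m_new
    return acc / l

# sanity check against the naive computation
rng = np.random.default_rng(0)
q, K, V = rng.normal(size=(16,)), rng.normal(size=(64, 16)), rng.normal(size=(64, 16))
s = K @ q / np.sqrt(16)
ref = np.exp(s - s.max()) / np.exp(s - s.max()).sum() @ V
assert np.allclose(online_softmax_attention(q, K, V), ref)
```

FlashAttention additionally tiles Q and fuses everything into one kernel; this sketch only shows why the blockwise rescaling gives exact attention.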
WWW: What, When, Where to Compute-in-Memory
Updated: at 15:06, Published: at 18:34. Some validation experiments and reflections on compute-in-memory.
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
Updated: at 15:06, Published: at 18:33. From Google; the first work to run a complete integer-only quantized inference pipeline end to end.
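The core of that scheme is the affine mapping r = S(q − Z): matmuls accumulate in int32, and the remaining float scale is folded into a single multiplier M = S_a·S_b / S_c (realized as a fixed-point multiply on device). A minimal sketch with my own helper names, keeping M as a float for clarity:

```python
import numpy as np

def quantize(r, S, Z):
    """Affine quantization r -> q with r ≈ S * (q - Z), q in uint8."""
    q = np.round(r / S) + Z
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize(q, S, Z):
    return S * (q.astype(np.int32) - Z)

def qmatmul(qa, Za, qb, Zb, Sa, Sb, Sc, Zc):
    """Integer matmul: accumulate (qa - Za)(qb - Zb) in int32,
    then rescale into the output quantization grid."""
    acc = (qa.astype(np.int32) - Za) @ (qb.astype(np.int32) - Zb)
    M = Sa * Sb / Sc  # the paper replaces this float with a fixed-point multiplier
    return np.clip(np.round(M * acc) + Zc, 0, 255).astype(np.uint8)

# round-trip demo
r = np.linspace(-1, 1, 5)
S, Z = 2 / 255, 128
print(dequantize(quantize(r, S, Z), S, Z))  # ≈ r up to quantization error
```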
SpikeSim: An end-to-end Compute-in-Memory Hardware Evaluation Tool for Benchmarking Spiking Neural Networks
Updated: at 15:06, Published: at 18:33. A hardware design / evaluation benchmark for SNN deployment.
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Updated: at 15:06, Published: at 18:32. From IPADS; uses a predictor model to identify which MoE experts or neurons an LLM needs to activate, reducing resource consumption.
Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication
Updated: at 15:06, Published: at 18:31. An introduction to GEMM data mapping, mainly for various systolic-array accelerators.
HAWQ: Hessian Aware Quantization of Neural Networks with Mixed-Precision
Updated: at 15:06, Published: at 18:30. A classic model quantization method based on the Hessian matrix, i.e., quantization driven by second-order information.
Optimizing Bit-Serial Matrix Multiplication for Reconfigurable Computing
Updated: at 15:06, Published: at 18:29. BISMO optimizations.
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
Updated: at 15:06, Published: at 18:29. TVM.
Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures
Updated: at 15:06, Published: at 17:49. The Roofline model, which characterizes whether a system's performance is memory-bound or compute-bound.
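The whole model is one formula: attainable FLOP/s = min(peak compute, arithmetic intensity × peak bandwidth). A toy calculation (the hardware numbers below are hypothetical, not from the paper):

```python
def roofline(peak_flops, peak_bw, intensity):
    """Attainable performance under the Roofline model.

    intensity: arithmetic intensity in FLOPs per byte moved.
    Below the ridge point (peak_flops / peak_bw) a kernel is
    memory-bound; above it, compute-bound.
    """
    return min(peak_flops, peak_bw * intensity)

# hypothetical accelerator: 19.5 TFLOP/s peak, 1.5 TB/s memory bandwidth
peak, bw = 19.5e12, 1.5e12
print(roofline(peak, bw, 0.25) / 1e12)  # 0.375 TFLOP/s: memory-bound
print(roofline(peak, bw, 100) / 1e12)   # 19.5 TFLOP/s: compute-bound
```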
A Comprehensive Survey on Electronic Design Automation and Graph Neural Networks: Theory and Applications
Updated: at 15:06, Published: at 17:42. A survey of graph neural network applications in the EDA field.
A Hardware-Software Blueprint for Flexible Deep Learning Specialization
Updated: at 15:06, Published: at 16:33. VTA.
BISMO: A Scalable Bit Serial Matrix Multiplication Overlay for Reconfigurable Computing
Updated: at 15:06, Published: at 14:31. BISMO.
Code Transpilation for Hardware Accelerators
Updated: at 15:06, Published: at 14:29. Built on Metalift; still quite incomplete.