Archives
All the articles I've archived.
Scalable Diffusion Models with Transformers
Published: at 16:29. Diffusion Transformer.
A First Look at AI Infra
Updated: at 18:30, Published: at 16:04. Using my recent internship hunt as an occasion to study and consolidate the scattered knowledge I've picked up on model inference/training acceleration, plus some CUDA programming and architecture topics.
Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition
Updated: at 14:57, Published: at 14:39. Replaces self-attention with large-kernel depthwise-separable convolution. Work from ByteDance Singapore.
SpikeCV: Open a Continuous Computer Vision Era
Updated: at 14:57, Published: at 15:33. An open-source framework for event cameras.
Neuromorphic computing at scale
Updated: at 14:57, Published: at 22:11. A review published in Nature discussing the problems and challenges currently facing the SNN/neuromorphic computing community, along with some possible directions for development.
Titans: Learning to Memorize at Test Time
Updated: at 14:57, Published: at 18:36. A new architecture evolved from TTT, attempting to improve the model's memorization ability via test-time training.
Segment Anything
Updated: at 14:57, Published: at 13:48. Meta's SAM.
SDiT: Spiking Diffusion Model with Transformer
Updated: at 14:57, Published: at 14:10. A spiking Diffusion Transformer whose internal Transformer structure is RWKV-based.
2024
Updated: at 14:57, Published: at 12:45. 2024.
ConvUNeXt: An efficient convolution neural network for medical image segmentation
Updated: at 14:57, Published: at 15:59. ConvNeXt + UNet, published in a Chinese core journal; studying it for reference while thinking about how to design my own module.
Rethinking the Membrane Dynamics and Optimization Objectives of Spiking Neural Networks
Updated: at 14:57, Published: at 15:23. NeurIPS 2024. Mainly studies how the initial membrane potential set before inference affects accuracy on static tasks.
ConvNeXt V2: Co-designing and Scaling ConvNets with Masked Autoencoders
Updated: at 14:57, Published: at 06:05. A follow-up to ConvNeXt that introduces MAE-style pretraining.
A ConvNet for the 2020s
Updated: at 14:57, Published: at 15:22. CVPR 2022. Work from Meta that rebuilds a pure-convolution network at a time when ViT-style models dominate vision, and achieves strong results.
Were RNNs All We Needed?
Updated: at 15:06, Published: at 16:07. Revamps RNNs to make them easier to scale up.
SparseRT: Accelerating Unstructured Sparsity on GPUs for Deep Learning Inference
Updated: at 15:06, Published: at 14:18. Operator generation for sparse MM on GPUs, using load balancing and sparsity for acceleration; generates PTX code tailored to the model.
VPRTempo: A Fast Temporally Encoded Spiking Neural Network for Visual Place Recognition
Updated: at 15:06, Published: at 15:34. An ICRA 2024 paper using a temporally encoded, directly STDP-trained SNN for the visual place recognition task. Too simple.
Memory-Efficient Reversible Spiking Neural Networks
Updated: at 15:06, Published: at 14:35. Work that improves training speed and lowers GPU memory usage through its reversible design.
SpikeMba: Multi-Modal Spiking Saliency Mamba for Temporal Video Grounding
Updated: at 15:06, Published: at 16:40. SNN + Mamba for the temporal video grounding (TVG) task; work from HIT and PKU.
Integer-Valued Training and Spike-Driven Inference Spiking Neural Network for High-performance and Energy-efficient Object Detection
Updated: at 15:06, Published: at 12:46. SpikeYOLO, work from CASIA; ECCV 2024 Oral.
A Survey of SNN Video-Stream Tasks
Updated: at 15:06, Published: at 13:42. Surveying existing work on video-stream tasks and roughly planning my follow-up work.
SpikeZIP-TF: Conversion is All You Need for Transformer-based SNN
Updated: at 15:06, Published: at 14:19. Work by my senior labmate Kang You; ANN2SNN conversion for Transformers.
SpikingJelly: An open-source machine learning infrastructure platform for spike-based intelligence
Updated: at 15:06, Published: at 19:49. SpikingJelly (惊蛰) from PKU, a highly influential SNN framework covering the full pipeline from data encoding and dataset integration to training and hardware deployment; effectively the PyTorch of SNNs. Published in Science Advances.
I-LLM: Efficient Integer-Only Inference for Fully-Quantized Low-Bit Large Language Models
Updated: at 15:06, Published: at 17:11. Integer-only PTQ work for LLMs.
Programming Language Theory Notes
Updated: at 15:06, Published: at 15:12. Review notes for the programming language theory course.
The Minimum Equivalent DNF Problem and Shortest Implicants
Updated: at 15:06, Published: at 06:55. Proves that the MIN-DNF problem is Σ₂ᵖ-complete.
I-ViT: Integer-only Quantization for Efficient Vision Transformer Inference
Updated: at 15:06, Published: at 15:56. Integer-only quantization of ViT, W8A8; CAS, ICCV 2023.
Efficient and Effective Methods for Mixed Precision Neural Network Quantization for Faster, Energy-efficient Inference
Updated: at 15:06, Published: at 16:28. EAGL, which claims to quantize ResNet in under 3 seconds using only a CPU, far more efficient than HAWQ and other traditional methods.
Towards spike-based machine intelligence with neuromorphic computing
Updated: at 15:06, Published: at 18:43. A survey on SNNs in Nature.
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness
Updated: at 15:06, Published: at 13:27. FlashAttention, an algorithm that exploits the hardware memory hierarchy to speed up attention and reduce memory usage. The core ideas are tiling, online softmax, and kernel fusion.
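Since the entry names online softmax as one of the core tricks, here is a minimal NumPy sketch of that idea (my own variable names, not the paper's kernel): keys and values are processed block by block while a running max and normalizer are maintained, so the full score vector is never materialized.

```python
import numpy as np

def online_softmax_attention(q, K, V, block=4):
    """Single-query attention computed over key/value blocks.

    Keeps a running max (m) and normalizer (l) so softmax(q K^T) V
    is computed without materializing all scores at once.
    """
    d = q.shape[0]
    m = -np.inf          # running max of scores seen so far
    l = 0.0              # running softmax normalizer
    acc = np.zeros(d)    # running weighted sum of values
    for i in range(0, K.shape[0], block):
        s = K[i:i + block] @ q / np.sqrt(d)   # scores for this block
        m_new = max(m, s.max())
        correction = np.exp(m - m_new)        # rescale old state to new max
        p = np.exp(s - m_new)
        l = l * correction + p.sum()
        acc = acc * correction + p @ V[i:i + block]
        m = m_new
    return acc / l

# sanity check against the naive computation
rng = np.random.default_rng(0)
q, K, V = rng.normal(size=(16,)), rng.normal(size=(64, 16)), rng.normal(size=(64, 16))
s = K @ q / np.sqrt(16)
ref = np.exp(s - s.max()) / np.exp(s - s.max()).sum() @ V
assert np.allclose(online_softmax_attention(q, K, V), ref)
```

FlashAttention additionally tiles Q and fuses everything into one kernel; this sketch only shows why the blockwise rescaling gives exact attention.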
WWW: What, When, Where to Compute-in-Memory
Updated: at 15:06, Published: at 18:34. Some validation experiments and reflections on compute-in-memory.
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
Updated: at 15:06, Published: at 18:33. From Google; the first work to run a complete integer-only quantized inference pipeline end to end.
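The core of that scheme is the affine mapping r = S(q − Z): matmuls accumulate in int32, and the remaining float scale is folded into a single multiplier M = S_a·S_b / S_c (realized as a fixed-point multiply on device). A minimal sketch with my own helper names, keeping M as a float for clarity:

```python
import numpy as np

def quantize(r, S, Z):
    """Affine quantization r -> q with r ≈ S * (q - Z), q in uint8."""
    q = np.round(r / S) + Z
    return np.clip(q, 0, 255).astype(np.uint8)

def dequantize(q, S, Z):
    return S * (q.astype(np.int32) - Z)

def qmatmul(qa, Za, qb, Zb, Sa, Sb, Sc, Zc):
    """Integer matmul: accumulate (qa - Za)(qb - Zb) in int32,
    then rescale into the output quantization grid."""
    acc = (qa.astype(np.int32) - Za) @ (qb.astype(np.int32) - Zb)
    M = Sa * Sb / Sc  # the paper replaces this float with a fixed-point multiplier
    return np.clip(np.round(M * acc) + Zc, 0, 255).astype(np.uint8)

# round-trip demo
r = np.linspace(-1, 1, 5)
S, Z = 2 / 255, 128
print(dequantize(quantize(r, S, Z), S, Z))  # ≈ r up to quantization error
```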
SpikeSim: An end-to-end Compute-in-Memory Hardware Evaluation Tool for Benchmarking Spiking Neural Networks
Updated: at 15:06, Published: at 18:33. A hardware design / evaluation benchmark for SNN deployment.
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Updated: at 15:06, Published: at 18:32. From IPADS; uses a predictor model to identify which MoE experts or neurons an LLM needs to activate, reducing resource consumption.
Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication
Updated: at 15:06, Published: at 18:31. An introduction to GEMM data mapping, mainly for various systolic-array accelerators.
HAWQ: Hessian Aware Quantization of Neural Networks with Mixed-Precision
Updated: at 15:06, Published: at 18:30. A classic model quantization method based on the Hessian matrix, i.e., quantization driven by second-order information.
Optimizing Bit-Serial Matrix Multiplication for Reconfigurable Computing
Updated: at 15:06, Published: at 18:29. BISMO optimizations.
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
Updated: at 15:06, Published: at 18:29. TVM.
Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures
Updated: at 15:06, Published: at 17:49. The Roofline model, which characterizes whether a system's performance is memory-bound or compute-bound.
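The whole model is one formula: attainable FLOP/s = min(peak compute, arithmetic intensity × peak bandwidth). A toy calculation (the hardware numbers below are hypothetical, not from the paper):

```python
def roofline(peak_flops, peak_bw, intensity):
    """Attainable performance under the Roofline model.

    intensity: arithmetic intensity in FLOPs per byte moved.
    Below the ridge point (peak_flops / peak_bw) a kernel is
    memory-bound; above it, compute-bound.
    """
    return min(peak_flops, peak_bw * intensity)

# hypothetical accelerator: 19.5 TFLOP/s peak, 1.5 TB/s memory bandwidth
peak, bw = 19.5e12, 1.5e12
print(roofline(peak, bw, 0.25) / 1e12)  # 0.375 TFLOP/s: memory-bound
print(roofline(peak, bw, 100) / 1e12)   # 19.5 TFLOP/s: compute-bound
```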
A Comprehensive Survey on Electronic Design Automation and Graph Neural Networks: Theory and Applications
Updated: at 15:06, Published: at 17:42. A survey of graph neural network applications in the EDA field.
A Hardware-Software Blueprint for Flexible Deep Learning Specialization
Updated: at 15:06, Published: at 16:33. VTA.
BISMO: A Scalable Bit Serial Matrix Multiplication Overlay for Reconfigurable Computing
Updated: at 15:06, Published: at 14:31. BISMO.
Code Transpilation for Hardware Accelerators
Updated: at 15:06, Published: at 14:29. Built on Metalift; still quite incomplete.