Posts

All the articles I've posted.

WWW: What, When, Where to Compute-in-Memory
Updated: March 8, 2025 at 15:06 | Published: March 4, 2024 at 18:34
Some validation of and reflections on compute-in-memory.
Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference
Updated: March 8, 2025 at 15:06 | Published: March 4, 2024 at 18:33
From Google: the first work to run the complete integer-only quantized inference pipeline end to end.
SpikeSim: An end-to-end Compute-in-Memory Hardware Evaluation Tool for Benchmarking Spiking Neural Networks
Updated: March 8, 2025 at 15:06 | Published: March 4, 2024 at 18:33
A hardware design and evaluation benchmark for SNN deployment.
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
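Updated: March 8, 2025 at 15:06 | Published: March 4, 2024 at 18:32
From IPADS: uses a predictor model to anticipate which MoE experts or neurons in an LLM need to be activated, reducing resource consumption.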
Evaluating Spatial Accelerator Architectures with Tiled Matrix-Matrix Multiplication
Updated: March 8, 2025 at 15:06 | Published: March 4, 2024 at 18:31
An introduction to GEMM data mapping, focused mainly on systolic-array-based accelerators.
HAWQ: Hessian Aware Quantization of Neural Networks with Mixed-Precision
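Updated: March 8, 2025 at 15:06 | Published: March 4, 2024 at 18:30
A classic model quantization method based on the Hessian matrix, which uses second-order information to guide quantization.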
Optimizing Bit-Serial Matrix Multiplication for Reconfigurable Computing
Updated: March 8, 2025 at 15:06 | Published: March 4, 2024 at 18:29
Optimizing BISMO.
TVM: An Automated End-to-End Optimizing Compiler for Deep Learning
Updated: March 8, 2025 at 15:06 | Published: March 4, 2024 at 18:29
TVM.
Roofline: An Insightful Visual Performance Model for Floating-Point Programs and Multicore Architectures
Updated: March 8, 2025 at 15:06 | Published: March 4, 2024 at 17:49
The Roofline model, which describes whether a system's performance is memory-bound or compute-bound.
A Comprehensive Survey on Electronic Design Automation and Graph Neural Networks: Theory and Applications
Updated: March 8, 2025 at 15:06 | Published: March 4, 2024 at 17:42
A survey of graph neural network applications in EDA.
