起居室老虎

Tag: Inference Acceleration

All articles with the tag "Inference Acceleration".

PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Updated: March 8, 2025 at 15:06
Published: March 4, 2024 at 18:32
From IPADS: uses a model to predict which MoE experts or neurons in an LLM will be activated, reducing resource consumption.