Tag: Inference Acceleration
All the articles with the tag "Inference Acceleration".
PowerInfer: Fast Large Language Model Serving with a Consumer-grade GPU
Updated: at 15:06
Published: at 18:32
From IPADS: uses a model to predict which MoE experts or neurons in an LLM need to be activated, reducing resource consumption.
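The general idea of predicting activations can be sketched as follows. This is a minimal illustration, not PowerInfer's implementation. It assumes a ReLU feed-forward block, y = W_out · ReLU(W_in · x), in which most hidden neurons output zero for a given input. A predictor guesses which neurons will be nonzero, and only those rows of W_in and columns of W_out are computed. `make_sign_predictor` and the rounded-weight approximation are hypothetical stand-ins chosen for illustration. A real predictor would need to be far cheaper than W_in itself.

```python
import random
from typing import Callable, Dict, List, Sequence

Matrix = List[List[float]]
# Hypothetical predictor type: maps an input vector to indices of neurons expected to be active.
Predictor = Callable[[Sequence[float]], List[int]]


def relu(v: float) -> float:
    return v if v > 0.0 else 0.0


def dot(row: Sequence[float], x: Sequence[float]) -> float:
    return sum(w * xi for w, xi in zip(row, x))


def dense_ffn(x: Sequence[float], w_in: Matrix, w_out: Matrix) -> List[float]:
    """Baseline: y = W_out . ReLU(W_in . x), evaluating every hidden neuron."""
    hidden = [relu(dot(row, x)) for row in w_in]
    return [dot(out_row, hidden) for out_row in w_out]


def sparse_ffn(x: Sequence[float], w_in: Matrix, w_out: Matrix,
               predict_active: Predictor) -> List[float]:
    """Evaluate only the neurons the predictor marks active; skipped neurons count as zero."""
    hidden: Dict[int, float] = {i: relu(dot(w_in[i], x)) for i in predict_active(x)}
    # Only the W_out columns belonging to predicted-active neurons are read.
    return [sum(out_row[i] * h for i, h in hidden.items()) for out_row in w_out]


def make_sign_predictor(w_approx: Matrix) -> Predictor:
    """Hypothetical predictor: neuron i is guessed active when an
    approximation of its pre-activation (row i of w_approx . x) is positive."""
    def predict(x: Sequence[float]) -> List[int]:
        return [i for i, row in enumerate(w_approx) if dot(row, x) > 0.0]
    return predict


if __name__ == "__main__":
    random.seed(0)
    d_model, d_hidden = 4, 16
    w_in = [[random.uniform(-1, 1) for _ in range(d_model)] for _ in range(d_hidden)]
    w_out = [[random.uniform(-1, 1) for _ in range(d_hidden)] for _ in range(d_model)]
    x = [random.uniform(-1, 1) for _ in range(d_model)]

    # Assumed cheap stand-in predictor: W_in rounded to one decimal place.
    approx = make_sign_predictor([[round(w, 1) for w in row] for row in w_in])
    print("predicted active:", len(approx(x)), "of", d_hidden)
    print("dense: ", dense_ffn(x, w_in, w_out))
    print("sparse:", sparse_ffn(x, w_in, w_out, approx))
```

Passing `w_in` itself to `make_sign_predictor` reproduces the dense result exactly. This makes a useful sanity check, but it saves no work. A predictor that misses an active neuron drops that neuron's contribution from the output, and a false positive only wastes computation. This is why prediction accuracy matters for output quality.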