Tag: Vision

All articles with the tag "Vision".

Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Published: June 23, 2025 at 17:47
A look at Shift-Window Attention.
SlowFast Networks for Video Recognition
Updated: May 30, 2025 at 06:15 | Published: May 27, 2025 at 16:57
A multi-branch CNN. Might some of the branches end up learning more similar inter-frame changes?
Scalable Diffusion Models with Transformers
Published: March 16, 2025 at 16:29
Diffusion Transformer.
Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition
Updated: March 8, 2025 at 14:57 | Published: March 4, 2025 at 14:39
Replaces self-attention with large-kernel DS (depthwise separable) convolution. Work from ByteDance Singapore.
Segment Anything
Updated: March 8, 2025 at 14:57 | Published: January 10, 2025 at 13:48
Meta's SAM.
SDiT: Spiking Diffusion Model with Transformer
Updated: March 8, 2025 at 14:57 | Published: January 3, 2025 at 14:10
A spiking Diffusion Transformer; the Transformer inside uses the RWKV architecture.
ConvUNeXt: An efficient convolution neural network for medical image segmentation
Updated: March 8, 2025 at 14:57 | Published: December 31, 2024 at 15:59
ConvNext + UNet, published in a C-tier journal. Worth studying for ideas on how to design my own module.
ConvNext V2: Co-designing and Scaling ConvNets with Masked Autoencoders
Updated: March 8, 2025 at 14:57 | Published: December 17, 2024 at 06:05
The follow-up to ConvNext, which brings in MAE.
A ConvNet for the 2020s
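Updated: March 8, 2025 at 14:57 | Published: December 16, 2024 at 15:22
CVPR 2022. Work from Meta that rebuilds a purely convolutional network at a time when ViT-related work dominates vision, and gets very good results.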
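LoCC Work Summary
Updated: March 8, 2025 at 15:05 | Published: November 19, 2024 at 02:21
My advisor took only two weeks to go from finding the idea to submitting the paper. This was the first time I followed the work on a paper all the way through.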
