Joint AI Seminar / Computer Science Speaking Skills Talk

— 1:00pm

Location:
In Person - ASA Conference Room, Gates Hillman 6115

Speaker:
ASHER TROCKMAN, Ph.D. Student, Computer Science Department, Carnegie Mellon University
http://ashertrockman.com/

Mimetic Initialization for Transformers and Convolutional Networks

While neural network weights are typically initialized randomly from univariate distributions, pre-trained weights often have visually discernible multivariate structure. In recent work, we propose a technique called "mimetic initialization" that aims to replicate such structures when initializing convolutional networks and Transformers. We handcraft a class of multivariate Gaussian distributions to initialize filters for depthwise convolutional layers, and we initialize the query and key weights for self-attention layers such that their product approximates the identity. Mimetic initialization substantially reduces training time and increases final accuracy on various common benchmarks.
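As a rough illustration of the query/key idea described above, the following NumPy sketch builds a pair of weight matrices whose product is close to a scaled identity. It is a minimal sketch under stated assumptions, not necessarily the speaker's exact procedure: the function name `identity_like_qk`, the mixing coefficients `alpha` and `beta`, and the use of an SVD to factor the target matrix are all illustrative choices introduced here.

```python
import numpy as np

def identity_like_qk(d, alpha=0.7, beta=0.7, seed=0):
    """Return (w_q, w_k, target) with w_q @ w_k.T == target ~ beta * I + noise.

    alpha, beta, and the SVD-based factorization are illustrative
    assumptions, not values or steps confirmed by the announcement.
    """
    rng = np.random.default_rng(seed)
    # Random component, scaled so its entries shrink as d grows.
    z = rng.normal(0.0, 1.0 / np.sqrt(d), size=(d, d))
    # Target product: mostly identity, plus a little random structure.
    target = alpha * z + beta * np.eye(d)
    # Factor the target as U diag(s) V^T and split sqrt(s) between both sides.
    u, s, vt = np.linalg.svd(target)
    w_q = u * np.sqrt(s)      # U @ diag(sqrt(s))
    w_k = vt.T * np.sqrt(s)   # V @ diag(sqrt(s))
    return w_q, w_k, target

w_q, w_k, target = identity_like_qk(64)
product = w_q @ w_k.T

# The diagonal of the product dominates the off-diagonal entries.
diag_mean = np.mean(np.diag(product))
off_diag_mean = np.mean(np.abs(product - np.diag(np.diag(product))))
print(f"mean diagonal: {diag_mean:.3f}, mean |off-diagonal|: {off_diag_mean:.3f}")
```

With per-head dimensions smaller than `d`, one could keep only the leading singular vectors, which would give a low-rank approximation of the same target; that variant is likewise an assumption rather than something stated in the abstract.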

Our technique enables us to almost close the gap between untrained and pre-trained Vision Transformers on small datasets like CIFAR-10, achieving up to a 6% gain in accuracy through initialization alone. For convolutional networks like ConvMixer and ConvNeXt, we observe improvements in accuracy and reductions in training time, even when convolutional filters are frozen (untrained) after initialization. Overall, our findings suggest that the benefits of pre-training can be separated into two components: serving as a good initialization and storing transferable knowledge, with the former being simple enough to capture (at least partially) by hand in closed form.

Asher Trockman is a PhD student at Carnegie Mellon University advised by Zico Kolter. He researches deep learning for vision and deep learning phenomena generally.

Presented as part of the AI Seminar Series.

Presented in Partial Fulfillment of the CSD Speaking Skills Requirement.

In Person and Zoom Participation. See announcement.

Event Website:
http://www.cs.cmu.edu/~aiseminar/

