New transformer architecture can make language models faster and more resource-efficient

Trimming even a small fraction of a language model's size can lead to significant cost reductions. To address this, researchers at ETH Zurich have unveiled a revised version of the transformer, the deep …