A very good read from a respected source!


Fast Feedforward Networks

We break the linear link between the layer size and its inference cost by introducing the fast feedforward (FFF) architecture, a log-time alternative to feedforward networks. We demonstrate that FFFs are up to 220x faster than feedforward networks, up to 6x faster than mixture-of-experts networks, and exhibit better training properties than mixtures of experts thanks to noiseless conditional execution. Pushing FFFs to the limit, we show that they can use as little as 1% of layer neurons for inference in vision transformers while preserving 94.2% of predictive performance.

fastfeedforward

A repository for fast feedforward (FFF) networks. Fast feedforward layers can be used in place of vanilla feedforward and mixture-of-expert layers, offering inference time that grows only logarithmically in the training width of the layer.
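To make the logarithmic-scaling claim concrete, here is a small back-of-the-envelope sketch; the widths and leaf size below are illustrative assumptions, not values from the paper. Doubling the training width adds only one level to the tree a token must descend.

```python
import math

# Illustrative only: widths and leaf size are assumptions, not values
# from the paper. With a balanced binary tree over `w` trained neurons
# and a fixed leaf width `leaf`, a token descends log2(w / leaf)
# decision nodes and then evaluates one leaf block, so per-token
# inference work grows only logarithmically in w.
leaf = 16
for w in (1024, 4096, 16384, 65536):
    depth = int(math.log2(w // leaf))
    print(f"width={w:6d}  tree depth={depth:2d}  neurons touched={depth + leaf}")
```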

Learn more:

ETH Zurich Researchers Introduce the Fast Feedforward (FFF) Architecture: A Peer of the Feedforward (FF) Architecture that Accesses Blocks of its Neurons in Logarithmic Time

By Tanya Malhotra – September 26, 2023

The introduction of Large Language Models (LLMs) has been nothing short of groundbreaking in the field of Artificial Intelligence. These complex algorithms, powered by enormous amounts of data and compute, are changing the way humans interact with machines and are revolutionizing a number of domains.

Feedforward layers are central to transformer models: they are responsible for transforming input data and account for much of the model's computation. As transformer models have grown in recent years, their feedforward layers have expanded to tens of thousands of hidden neurons, so finding strategies to accelerate feedforward-layer calculations has become crucial to containing inference costs.
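For reference, a standard transformer feedforward block looks like the sketch below (the sizes are illustrative). Every token is multiplied against all hidden neurons, which is why inference cost grows linearly with the hidden width.

```python
import torch
import torch.nn as nn

d_model, d_hidden = 768, 3072  # illustrative sizes

# A standard transformer feedforward block. Each token is multiplied
# against all d_hidden hidden neurons, so per-token inference cost
# (~2 * d_model * d_hidden multiply-accumulates) is linear in d_hidden.
ff = nn.Sequential(
    nn.Linear(d_model, d_hidden),
    nn.GELU(),
    nn.Linear(d_hidden, d_model),
)

y = ff(torch.randn(1, d_model))
```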

In very large networks, only a small portion of the feedforward hidden neurons is needed to determine the output for a given input. Building on this insight, efforts have been made to create modular networks that exploit the phenomenon. Recent work in this direction has concentrated on architectures that encourage feedforward-layer sparsity: the feedforward layer is subdivided into distinct blocks of neurons, and a gating layer is trained to select which blocks (experts) to use during inference. This cuts inference time, but it increases training complexity and relies on noisy gating.
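To illustrate the "noisy gating" being contrasted here, below is a minimal sketch in the style of the noisy top-k gate from the MoE literature (Shazeer et al., 2017); the sizes and the choice of k are illustrative assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyTopKGate(nn.Module):
    """Minimal noisy top-k gate in the MoE style; illustrative only."""

    def __init__(self, d_model: int, n_experts: int, k: int = 2):
        super().__init__()
        self.w_gate = nn.Linear(d_model, n_experts, bias=False)
        self.w_noise = nn.Linear(d_model, n_experts, bias=False)
        self.k = k

    def forward(self, x):
        clean = self.w_gate(x)
        # Gaussian noise is added to the routing logits during training
        # to spread load across experts -- this is the "noisy gating"
        # that FFF is designed to avoid.
        noise = torch.randn_like(clean) * F.softplus(self.w_noise(x))
        top_vals, top_idx = (clean + noise).topk(self.k, dim=-1)
        weights = F.softmax(top_vals, dim=-1)
        return weights, top_idx  # only the k selected expert blocks run

gate = NoisyTopKGate(d_model=768, n_experts=8)
w, idx = gate(torch.randn(4, 768))  # routing for a batch of 4 tokens
```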

As an alternative to these approaches, a team of two researchers from ETH Zurich has introduced the Fast Feedforward (FFF) architecture. FFF uses a differentiable binary tree that partitions the input space into regions while simultaneously learning each region's boundaries and its associated neural block. Compared with conventional feedforward layers and prior modularization techniques, FFF reduces inference time because it reaches the relevant block of neurons in logarithmic time, in contrast to the linear scaling with layer width of earlier methods.
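A minimal single-token sketch of the inference-time behavior described above, assuming hard node decisions (the paper trains the splits as soft, differentiable decisions and hardens them for inference); all names and sizes here are illustrative, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FFFSketch(nn.Module):
    """Minimal fast-feedforward inference sketch; illustrative only."""

    def __init__(self, d_in: int, d_out: int, leaf_width: int, depth: int):
        super().__init__()
        self.depth = depth
        n_nodes = 2 ** depth - 1   # internal decision nodes
        n_leaves = 2 ** depth      # small neural blocks at the leaves
        self.node_w = nn.Parameter(torch.randn(n_nodes, d_in) * 0.02)
        self.w1 = nn.Parameter(torch.randn(n_leaves, leaf_width, d_in) * 0.02)
        self.w2 = nn.Parameter(torch.randn(n_leaves, d_out, leaf_width) * 0.02)

    def forward(self, x):  # x: (d_in,) -- a single token for clarity
        idx = 0
        for _ in range(self.depth):                   # logarithmic descent
            go_right = int(self.node_w[idx] @ x > 0)  # one dot product per level
            idx = 2 * idx + 1 + go_right              # heap-style child index
        leaf = idx - (2 ** self.depth - 1)            # which region x fell into
        h = torch.relu(self.w1[leaf] @ x)             # only this one block runs
        return self.w2[leaf] @ h

fff = FFFSketch(d_in=768, d_out=768, leaf_width=8, depth=8)
y = fff(torch.randn(768))  # 8 node decisions + one 8-neuron leaf
```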

FFF has been compared to the Mixture-of-Experts (MoE) approach, which also uses expert blocks but relies on noisy gating. FFF avoids this noise and achieves faster inference with reduced computational complexity. The researchers also report impressive speed gains: FFFs can be up to 220 times faster than traditional feedforward networks, a substantial improvement in computational efficiency. As an example, they highlight the use of FFFs in vision transformers, where FFFs maintain 94.2% of predictive performance while using only 1% of the neurons, indicating strong potential for vision tasks.
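As a rough illustration of how such small neuron fractions arise (the numbers below are assumptions for illustration, not the paper's configuration): with a deep tree and narrow leaves, a token touches only the node decisions plus one leaf.

```python
# Illustrative arithmetic, not the paper's configuration: a layer
# trained with 4096 leaf neurons arranged as 2048 leaves of width 2
# (tree depth 11) touches only 11 node decisions plus one 2-neuron
# leaf per token at inference.
depth, leaf_width = 11, 2
trained = leaf_width * 2 ** depth        # 4096 neurons trained
touched = depth + leaf_width             # 13 per token at inference
print(f"fraction used: {touched / trained:.2%}")  # ~0.32%
```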

In conclusion, the FFF design is a groundbreaking method for enhancing the computational efficiency of neural networks. It outperforms mixture-of-experts networks and greatly shortens inference time compared to typical feedforward networks. Its key features are its training characteristics, such as noiseless conditional execution, and its capacity to attain good predictive accuracy with low neuron usage. These developments have the potential to speed up and improve the performance of very large models, with real implications for deep-learning practice.


Check out the Paper and Github. All credit for this research goes to the researchers on this project.

