Sequential Attention: Greedy Feature Selection That Actually Scales To Modern Neural Networks
Introduction

Most “feature selection” stories start the same way. You have a model, you have a mountain of inputs, and you suspect half of them are doing more harm than good. Then you open a paper, see “NP-hard,” and quietly decide you will just throw everything into the network and let SGD sort it out.