With the growth of the neural networks, DNN is showing a good performance in text, image, audio data.
However, for the everyday tabular data, we’ve yet to see the success compare to XGB LGBM.

Tree-based models have advantages in:

  • decision manifolds, like a super hyperplane (infinite expandable, cuts the table well)
  • interpretable
  • fast to train


  • able to encode data (representation learning)
  • reduce feature engineering
  • online learning

discuss: reasons why DNN not performant on tabular data

  • non-linear, not restriction on convergence, easy overfit compare to tree ensemble
  • adding more layers can cause overparameterization, this may be why it isn’t performant in tabular data.

How good would it be if we can have a framework that’s both end-to-end, representation and can perform online update for tabular data

build decision boundary using DNN



we can treat the mask as a decision boundary

the mask+FC+ReLU is like a vanilla decision tree
this is an additive model, each output represents the weight of each condition that affects the final decision

Tabnet model structure

this is a more complex additive model (e.g. given input batch x feature -> single vector)


  • BN: batch normalization
  • Feature transformer: =FC, calculate feature embedding
  • Split
  • Attention transformer

Overall: Tabnet uses sequential multi-step to construct an additive NN framework

Attentive transformer

  • use last step result to calculate current step Mask, and tries its best to make sure the Mask is sparse and non-repetitive
  • different data point uses different mask allows, different data to use the different feature (instance-wise) (tree: batch)

Feature transformer

  • performs the feature selection of the current step

Tabnet evaluation

Paper data/evaluation:

Performs very similar or better than gbm based models

private data: offer activation prediction

tabular data contains customer profile and sale records

LGBM roc, 0.83
Tabnet, 0.8

Tabnet has 100x the machine cost to achieve the same training speed compare to LGBM


data: gene expression


target: the sample had a positive response for each MoA target.

All top models use Tabnet in their ensemble, but Tabnet doesn’t contribute to large weighting.

Tabnet does show a good promise of using NN for tabular data. Although the performance is still not comparable to GBM models, it can still serve as the online learning component in mixed ensembles for tabular data.