Semisupervised Autoencoder for Sentiment Analysis
Autoencoders have attracted considerable attention as building blocks of deep learning and as models of textual data. They act as feature learners by reconstructing their inputs with respect to a given loss function; in a neural network implementation, the hidden layer is taken as the learned feature. While it is often trivial to obtain good reconstructions with plain autoencoders, much effort has been devoted to regularization in order to prevent overfitting. However, little attention has been paid to the loss function itself, which is important for modeling textual data. Traditional autoencoders suffer in at least two respects: they scale poorly with the high dimensionality of the vocabulary, and they handle task-irrelevant words badly. The present technology addresses these problems by introducing supervision through the autoencoder's loss function. In particular, a linear classifier is first trained on the labeled data, and a loss for the autoencoder is then defined using the weights learned by the classifier. To reduce the bias introduced by any single classifier, a posterior probability distribution is placed on the classifier's weights, and the marginalized loss of the autoencoder is derived with a Laplace approximation. The choice of loss function can be rationalized from the perspective of Bregman divergence, which justifies the soundness of the model. The model's effectiveness is evaluated on six sentiment analysis datasets, where it significantly outperforms all competing methods in classification accuracy. The model takes advantage of unlabeled data to improve performance, and it learns highly discriminative feature maps, which explains its superior results.
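The pipeline described above — train a linear classifier on the labeled data, then use its weights to define the autoencoder's loss — can be sketched as follows. This is a minimal NumPy illustration, not the actual implementation: the toy data, network sizes, and the score-based squared loss are all assumptions, and it uses a single classifier rather than the marginalized, Laplace-approximated loss described in the abstract.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Toy stand-in for bag-of-words features; all sizes are illustrative.
n, d, h = 200, 30, 10                    # samples, vocabulary size, hidden units
X = rng.random((n, d))
theta_true = rng.standard_normal(d)
y = (X @ theta_true > 0).astype(float)   # synthetic sentiment labels

# Step 1: train a linear (logistic) classifier on the labeled data.
theta = np.zeros(d)
for _ in range(500):
    p = sigmoid(X @ theta)
    theta -= 0.1 * (X.T @ (p - y)) / n   # gradient step on the log-loss

# Step 2: an autoencoder whose reconstruction loss is weighted by the
# classifier: error is measured on the classifier scores theta^T x, so
# task-relevant dimensions dominate the objective.  (Single-classifier
# simplification; the technology marginalizes over a posterior on theta.)
W1 = 0.1 * rng.standard_normal((d, h))
W2 = 0.1 * rng.standard_normal((h, d))

def ae_loss(X):
    H = np.tanh(X @ W1)                  # hidden layer = learned feature
    X_hat = H @ W2                       # reconstruction
    return np.mean(((X_hat - X) @ theta) ** 2)

loss_before = ae_loss(X)
lr = 0.005
for _ in range(300):
    H = np.tanh(X @ W1)
    X_hat = H @ W2
    err = (X_hat - X) @ theta            # per-sample score error, shape (n,)
    dX_hat = np.outer(err, theta) * (2.0 / n)
    dW2 = H.T @ dX_hat
    dH = dX_hat @ W2.T
    dW1 = X.T @ (dH * (1.0 - H ** 2))    # tanh' = 1 - tanh^2
    W1 -= lr * dW1
    W2 -= lr * dW2
loss_after = ae_loss(X)
```

After training, the hidden activations `np.tanh(X @ W1)` serve as the learned feature maps; because reconstruction error is measured through the classifier weights, those features emphasize sentiment-relevant words rather than frequent but task-irrelevant ones.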
- Faster and more accurate than competing technologies.
- Outperforms “Bag of Words” features, a traditional denoising autoencoder, and other competing methods.
- Learns highly discriminative feature maps.
- Learns orthogonal concepts using traditional machine learning technologies.
- Provides integrity, comprehensiveness, and universality (applicable across an entire series of applications).
- Minimal training data needed.
Binghamton University RB493