Papers

This chapter lists papers published on NLP using deep learning.

Data Representation

One-hot representation

Character-level convolutional networks for text classification : Reports promising results with one-hot encoding, possibly owing to the character-level information it retains. [Paper link, Torch implementation, TensorFlow implementation, PyTorch implementation]

Effective Use of Word Order for Text Categorization with Convolutional Neural Networks : Exploiting the 1D structure (namely, word order) of text data for prediction. [Paper link, Code implementation]

Neural Responding Machine for Short-Text Conversation : The Neural Responding Machine is proposed to generate content-wise appropriate responses to input text. [Paper link, Paper summary]
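
As a rough illustration of the one-hot representation used at the character level, the sketch below encodes a string as a fixed-length sequence of one-hot vectors. The alphabet, the `max_len` value, and the function name `one_hot_encode` are assumptions made for this example and are not taken from any of the papers above.

```python
def one_hot_encode(text, alphabet, max_len):
    """Encode `text` as `max_len` one-hot rows over `alphabet`.

    Characters outside the alphabet and padding positions become all-zero rows.
    """
    index = {ch: i for i, ch in enumerate(alphabet)}
    rows = []
    for ch in text.lower()[:max_len]:  # truncate long inputs
        row = [0] * len(alphabet)
        if ch in index:
            row[index[ch]] = 1
        rows.append(row)
    while len(rows) < max_len:  # pad short inputs with zero rows
        rows.append([0] * len(alphabet))
    return rows


# Hypothetical alphabet chosen only for this example.
ALPHABET = "abcdefghijklmnopqrstuvwxyz0123456789 "
encoded = one_hot_encode("NLP 101", ALPHABET, max_len=10)
```

Each row has at most one non-zero entry, so the representation is sparse and its width grows with the alphabet size rather than with the vocabulary size.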

Continuous Bag of Words (CBOW)

Distributed Representations of Words and Phrases and their Compositionality : Not strictly about CBOW, but the techniques presented in this paper can be used for training the continuous bag-of-words model. [Paper link, Code implementation 1, Code implementation 2]
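
To make the CBOW setup concrete, here is a minimal sketch of how (context, target) training pairs could be built from a token sequence: the model predicts each word from the words around it. The function name `cbow_pairs` and the `window` parameter are illustrative assumptions; the sketch covers only data preparation, not the training techniques from the paper.

```python
def cbow_pairs(tokens, window=2):
    """Return (context_words, target_word) pairs for a CBOW-style model."""
    pairs = []
    for i, target in enumerate(tokens):
        left = tokens[max(0, i - window):i]       # up to `window` words before
        right = tokens[i + 1:i + 1 + window]      # up to `window` words after
        context = left + right
        if context:  # skip tokens with no neighbours (single-word input)
            pairs.append((context, target))
    return pairs


pairs = cbow_pairs(["the", "cat", "sat"], window=1)
```

A skip-gram model would use the same windows but reverse the direction, predicting each context word from the target.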

Word-Level Embedding

Efficient Estimation of Word Representations in Vector Space : Two novel model architectures for computing continuous vector representations of words. [Paper link, Official code implementation]

GloVe: Global Vectors for Word Representation : Combines the advantages of the two major model families, global matrix factorization and local context window methods, and efficiently leverages the statistical information in the corpus. [Paper link, Official code implementation]

Skip-Thought Vectors : The skip-thought model applies the word2vec idea at the sentence level. [Paper, Code implementation, TensorFlow implementation]

Character-Level Embedding

Learning Character-level Representations for Part-of-Speech Tagging : CNNs are successfully used for learning character-level embeddings. [Paper link]

Deep Convolutional Neural Networks for Sentiment Analysis of Short Texts : A new deep convolutional neural network is proposed that exploits character-level to sentence-level information for sentiment analysis of short texts. [Paper link]

Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation : Two LSTMs operate over the characters to generate the word embedding. [Paper link]

Improved Transition-Based Parsing by Modeling Characters instead of Words with LSTMs : The effectiveness of modeling characters for dependency parsing. [Paper link]

Applications

Part-Of-Speech Tagging

Learning Character-level Representations for Part-of-Speech Tagging : A deep neural network (DNN) architecture that joins word-level and character-level representations to perform POS tagging. [Paper]
Bidirectional LSTM-CRF Models for Sequence Tagging : A variety of neural network based models have been proposed for sequence tagging tasks. [Paper, Code Implementation 1, Code Implementation 2]
Globally Normalized Transition-Based Neural Networks : Transition-based neural network model for part-of-speech tagging. [Paper]
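
Models that put a CRF layer on top of a tagger, such as the BiLSTM-CRF entry above, typically decode the best tag sequence with the Viterbi algorithm. The sketch below is a plain-Python version under simplifying assumptions: `emissions[t][y]` stands in for the per-token scores a network would produce, `transitions[p][y]` for learned tag-to-tag scores, and start/end transition scores are omitted. All names and numbers are illustrative, not taken from the papers.

```python
def viterbi(emissions, transitions):
    """Return (best_tag_path, best_score) for a linear-chain model.

    emissions[t][y]: score of tag y at position t.
    transitions[p][y]: score of moving from tag p to tag y.
    """
    n_tags = len(emissions[0])
    score = list(emissions[0])  # best score of any path ending in each tag
    backpointers = []
    for t in range(1, len(emissions)):
        new_score, pointers = [], []
        for y in range(n_tags):
            # Best previous tag for ending in tag y at position t.
            best_prev = max(range(n_tags), key=lambda p: score[p] + transitions[p][y])
            new_score.append(score[best_prev] + transitions[best_prev][y] + emissions[t][y])
            pointers.append(best_prev)
        score = new_score
        backpointers.append(pointers)
    # Follow the back-pointers from the best final tag.
    best_last = max(range(n_tags), key=lambda y: score[y])
    path = [best_last]
    for pointers in reversed(backpointers):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, score[best_last]


# Toy scores: switching tags is penalised, so the middle token keeps tag 0.
emissions = [[1.0, 0.0], [0.0, 0.5], [1.0, 0.0]]
transitions = [[0.0, -2.0], [-2.0, 0.0]]
best_path, best_score = viterbi(emissions, transitions)
```

In the toy input the second token slightly prefers tag 1 on its own, but the transition penalty makes the all-zero path the highest-scoring sequence, which is the kind of sentence-level consistency a CRF layer is meant to add.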

Parsing

A fast and accurate dependency parser using neural networks : A novel way of learning a neural network classifier for use in a greedy, transition-based dependency parser. [Paper, Code Implementation 1]
Simple and Accurate Dependency Parsing Using Bidirectional LSTM Feature Representations : A simple and effective scheme for dependency parsing which is based on bidirectional-LSTMs. [Paper]
Transition-Based Dependency Parsing with Stack Long Short-Term Memory : A technique for learning representations of parser states in transition-based dependency parsers. [Paper]
Deep Biaffine Attention for Neural Dependency Parsing : Using neural attention in a simple graph-based dependency parser. [Paper]
Joint RNN-Based Greedy Parsing and Word Composition : A greedy parser based on neural networks, which leverages a new compositional sub-tree representation. [Paper]
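
Several of the parsers above are transition-based: a classifier repeatedly picks an action that manipulates a stack and a buffer until a dependency tree is built. The sketch below applies a fixed action sequence in an arc-standard style system; in a neural parser the actions would instead be predicted from the current state. The function name, the action labels, and the example sentence are assumptions for illustration only.

```python
def apply_transitions(words, actions):
    """Run arc-standard style actions and return (head, dependent) arcs.

    Token indices start at 1; index 0 is the artificial ROOT node.
    """
    stack = [0]
    buffer = list(range(1, len(words) + 1))
    arcs = []
    for action in actions:
        if action == "SHIFT":
            # Move the next buffer token onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            # Top of stack becomes head of the second item, which is removed.
            head = stack[-1]
            dependent = stack.pop(-2)
            arcs.append((head, dependent))
        elif action == "RIGHT-ARC":
            # Second item becomes head of the top item, which is removed.
            dependent = stack.pop()
            head = stack[-1]
            arcs.append((head, dependent))
        else:
            raise ValueError(f"unknown action: {action}")
    return arcs


words = ["She", "eats", "fish"]
actions = ["SHIFT", "SHIFT", "LEFT-ARC", "SHIFT", "RIGHT-ARC", "RIGHT-ARC"]
arcs = apply_transitions(words, actions)
```

Here `arcs` attaches "She" and "fish" to "eats", and "eats" to ROOT. A sentence of n words needs 2n actions in this system, which is why greedy transition-based parsers run in linear time.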

Named Entity Recognition

Neural Architectures for Named Entity Recognition : Bidirectional LSTMs and conditional random fields for NER. [Paper]
Boosting named entity recognition with neural character embeddings : A language-independent NER system that uses automatically learned features. [Paper]
Named Entity Recognition with Bidirectional LSTM-CNNs : A novel neural network architecture that automatically detects word- and character-level features. [Paper]

Semantic Role Labeling

End-to-end learning of semantic role labeling using recurrent neural networks : Uses a deep bidirectional recurrent network as an end-to-end system for SRL. [Paper]

Text classification

Convolutional Neural Networks for Sentence Classification : A CNN is trained on top of pretrained word vectors, and learning task-specific vectors through fine-tuning is reported to give considerable improvement. [Paper link, Code implementation 1, Code implementation 2, Code implementation 3, Code implementation 4]

A Convolutional Neural Network for Modelling Sentences : The Dynamic Convolutional Neural Network (DCNN) architecture, which technically is a CNN with a dynamic k-max pooling method, is proposed for the semantic modeling of sentences. [Paper link, Code implementation]

Very Deep Convolutional Networks for Text Classification : Very Deep Convolutional Neural Networks (VDCNNs) are presented and applied at the character level, demonstrating the effect of network depth on classification tasks. [Paper link]

Character-level convolutional networks for text classification : Investigates character-level representation with CNNs and argues for the power of CNNs and character-level representation in language-agnostic text classification. [Paper link, Torch implementation, TensorFlow implementation, PyTorch implementation]

Multichannel Variable-Size Convolution for Sentence Classification : The Multichannel Variable-Size Convolutional Neural Network (MV-CNN) architecture, proposed in this paper for sentence classification, combines different versions of word embeddings and employs variable-size convolutional filters. [Paper link]

A Sensitivity Analysis of (and Practitioners’ Guide to) Convolutional Neural Networks for Sentence Classification : A practical sensitivity analysis of CNNs that explores the effect of architecture choices on performance. [Paper link]

Generative and Discriminative Text Classification with Recurrent Neural Networks : RNN-based discriminative and generative models are investigated for text classification, and their robustness to shifts in the data distribution is also claimed. [Paper link]

Deep sentence embedding using long short-term memory networks: Analysis and application to information retrieval : An LSTM-RNN architecture is used for sentence embedding and shows superior performance on a defined web search task. [Paper link]

Hierarchical attention networks for document classification : The Hierarchical Attention Network (HAN) is presented and used to capture the hierarchical structure of text through two attention mechanisms, one at the word level and one at the sentence level. [Paper link, Code implementation 1, Code implementation 2, Code implementation 3, Summary 1, Summary 2]

Recurrent Convolutional Neural Networks for Text Classification : Combines RNNs and CNNs for text classification. Technically, it is a recurrent architecture followed by max-pooling, with an effective word representation method, and it demonstrates superiority over simple window-based neural network approaches. [Paper link, Code implementation 1, Code implementation 2, Summary]

A C-LSTM Neural Network for Text Classification : A unified architecture is proposed for sentence and document modeling for classification. [Paper link]
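
The dynamic k-max pooling mentioned in the DCNN entry above keeps the k largest activations of a feature sequence in their original order, with k depending on the layer and the sentence length. The sketch below is a plain-Python illustration; the function names are hypothetical, and the rule used for k is this editor's reading of the DCNN paper, k = max(k_top, ceil((L - l) / L * s)), so treat it as an assumption to check against the paper.

```python
def k_max_pooling(values, k):
    """Keep the k largest values of a sequence, preserving their order."""
    if k >= len(values):
        return list(values)
    # Indices of the k largest values, then restored to sequence order.
    top = sorted(range(len(values)), key=lambda i: values[i], reverse=True)[:k]
    return [values[i] for i in sorted(top)]


def dynamic_k(layer, total_layers, sentence_length, k_top):
    """Assumed rule: k = max(k_top, ceil((L - l) / L * s)), in integer arithmetic."""
    scaled = -(-(total_layers - layer) * sentence_length // total_layers)  # ceiling division
    return max(k_top, scaled)


pooled = k_max_pooling([1, 5, 2, 8, 3], k=3)
k_first_layer = dynamic_k(layer=1, total_layers=3, sentence_length=18, k_top=3)
```

Unlike ordinary max pooling, the output keeps several features and their relative positions, so later layers can still see word-order information.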

Sentiment Analysis

Domain adaptation for large-scale sentiment classification: A deep learning approach : A deep learning approach which learns to extract a meaningful representation for each online review. [Paper link]
Sentiment analysis: Capturing favorability using natural language processing : A sentiment analysis approach that extracts sentiments with positive or negative polarity for specific subjects from a document. [Paper link]
Document-level sentiment classification: An empirical comparison between SVM and ANN : A comparison study. [Paper link]
Learning semantic representations of users and products for document level sentiment classification : Incorporates user-level and product-level information into a neural network approach for document-level sentiment classification. [Paper]
Document modeling with gated recurrent neural network for sentiment classification : A neural network model is proposed to learn vector-based document representations. [Paper, Implementation]
Semi-supervised recursive autoencoders for predicting sentiment distributions : A novel machine learning framework based on recursive autoencoders for sentence-level prediction. [Paper]
A convolutional neural network for modelling sentences : A convolutional architecture adopted for the semantic modelling of sentences. [Paper]
Recursive deep models for semantic compositionality over a sentiment treebank : Recursive Neural Tensor Network for sentiment analysis. [Paper]
Adaptive recursive neural network for target-dependent twitter sentiment classification : AdaRNN adaptively propagates the sentiments of words to the target depending on the context and syntactic relationships. [Paper]
Aspect extraction for opinion mining with a deep convolutional neural network : A deep learning approach to aspect extraction in opinion mining. [Paper]

Machine Translation

Learning phrase representations using RNN encoder-decoder for statistical machine translation : The proposed RNN Encoder–Decoder with a novel hidden unit has been empirically evaluated on the task of machine translation. [Paper, Code, Blog post]
Sequence to Sequence Learning with Neural Networks : A showcase by Google of an NMT system that is comparable to the traditional pipeline. [Paper, Code]
Google’s Neural Machine Translation System: Bridging the Gap between Human and Machine Translation : This work presents the design and implementation of GNMT, a production NMT system at Google. [Paper, Code]
Neural Machine Translation by Jointly Learning to Align and Translate : An extension to the encoder–decoder model which learns to align and translate jointly by attention mechanism. [Paper]
Effective Approaches to Attention-based Neural Machine Translation : Improvement of attention mechanism for NMT. [Paper, Code]
On the Properties of Neural Machine Translation: Encoder-Decoder Approaches : Analyzes the properties of neural machine translation using two models: the RNN Encoder–Decoder and a newly proposed gated recursive convolutional neural network. [Paper]
On Using Very Large Target Vocabulary for Neural Machine Translation : A method that allows using a very large target vocabulary without increasing training complexity. [Paper]
Convolutional sequence to sequence learning : An architecture based entirely on convolutional neural networks. [Paper, Code[Torch], Code[PyTorch], Post]
Attention Is All You Need : The Transformer: a novel neural network architecture based on a self-attention mechanism. [Paper, Code, Accelerating Deep Learning Research with the Tensor2Tensor Library, Transformer: A Novel Neural Network Architecture for Language Understanding]
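
The attention mechanisms in the entries above share one core step: score each source position against a query, normalize the scores with a softmax, and take a weighted sum of value vectors. The sketch below shows the scaled dot-product form associated with the Transformer, in plain Python for readability. It is a single-head illustration with made-up inputs and no masking or learned projections, not a reproduction of any linked implementation.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def scaled_dot_product_attention(queries, keys, values):
    """For each query, return the attention-weighted sum of `values`."""
    d_k = len(keys[0])
    outputs = []
    for q in queries:
        # Similarity of the query to every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in keys]
        weights = softmax(scores)
        # Weighted sum of the value vectors, one output dimension at a time.
        out = [sum(w * v[j] for w, v in zip(weights, values)) for j in range(len(values[0]))]
        outputs.append(out)
    return outputs


# Toy input: both keys are identical, so the weights are uniform
# and the output is the mean of the two value vectors.
attended = scaled_dot_product_attention(
    queries=[[1.0, 0.0]],
    keys=[[1.0, 0.0], [1.0, 0.0]],
    values=[[0.0, 2.0], [4.0, 6.0]],
)
```

Additive (Bahdanau-style) and multiplicative (Luong-style) attention differ mainly in how the scores are computed; the softmax and weighted-sum steps stay the same.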

Summarization

A Neural Attention Model for Abstractive Sentence Summarization : A fully data-driven approach to abstractive sentence summarization based on a local attention model. [Paper, Code, A Read on “A Neural Attention Model for Abstractive Sentence Summarization”, Blog Post, Paper notes]
Get To The Point: Summarization with Pointer-Generator Networks : A novel architecture that augments the standard sequence-to-sequence attentional model by using a hybrid pointer-generator network that may copy words from the source text via pointing and using coverage to keep track of what has been summarized. [Paper, Code, Video, Blog Post]
Abstractive Sentence Summarization with Attentive Recurrent Neural Networks : A conditional recurrent neural network (RNN), based on a convolutional attention-based encoder, which generates a summary of an input sentence. [Paper]
Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond : Abstractive text summarization using Attentional Encoder-Decoder Recurrent Neural Networks. [Paper]
A Deep Reinforced Model for Abstractive Summarization : A neural network model with a novel intra-attention that attends over the input and continuously generated output separately, and a new training method that combines standard supervised word prediction and reinforcement learning (RL). [Paper]
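
The hybrid pointer-generator idea from the entry above can be sketched as a mixture of two distributions: with probability `p_gen` the model generates from its vocabulary, and otherwise it copies a source word in proportion to the attention it receives. The function and variable names and the toy numbers below are assumptions for illustration; coverage and the networks that would produce these quantities are left out.

```python
def final_distribution(p_gen, vocab_dist, attention, source_tokens):
    """Mix a vocabulary distribution with a copy distribution over the source.

    p_gen: probability of generating from the vocabulary (0..1).
    vocab_dist: dict mapping word -> generation probability.
    attention: attention weight for each source position (sums to 1).
    source_tokens: the source words, aligned with `attention`.
    """
    final = {word: p_gen * prob for word, prob in vocab_dist.items()}
    for weight, token in zip(attention, source_tokens):
        # Copy probability accumulates over repeated source words and
        # can introduce words that are missing from the vocabulary.
        final[token] = final.get(token, 0.0) + (1.0 - p_gen) * weight
    return final


dist = final_distribution(
    p_gen=0.5,
    vocab_dist={"the": 0.6, "cat": 0.4},
    attention=[0.5, 0.5],
    source_tokens=["cat", "zyx"],  # "zyx" plays the role of an out-of-vocabulary word
)
```

Because copied words are added to the output distribution, the model can emit a source word such as a rare name even when it is outside the fixed vocabulary.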

Question Answering

Towards AI-Complete Question Answering: A Set of Prerequisite Toy Tasks : An argument for the usefulness of a set of proxy tasks that evaluate reading comprehension via question answering. [Paper]
Teaching Machines to Read and Comprehend : Addresses the lack of real natural language training data by introducing a novel approach to building a supervised reading comprehension data set. [Paper]
Ask Me Anything: Dynamic Memory Networks for Natural Language Processing : Introduces the dynamic memory network (DMN), a neural network architecture which processes input sequences and questions, forms episodic memories, and generates relevant answers. [Paper]