Wednesday, October 17, 2018

What upset me when checking the latest update to the SQuAD 2.0 leaderboard

I have been working with the NLP team at SAIL@UNIST for almost half a year on the question-answering problem. More specifically, we have been working on the SQuAD 2.0 problem, where an agent is given a paragraph and a related question and is asked to answer the question by selecting a span from the paragraph; the question might be unanswerable from the paragraph, and the agent should be able to detect such cases as well. We have followed SQuAD from its beginning, before SQuAD 2.0 had even been released, and have worked hard on the problem since then. As a result, we have, to some extent, witnessed the "trendy" approaches the NLP community has taken to attack SQuAD. To summarize them in a few words: bidirectional attention, language-model pretraining (e.g., ELMo), and architectures alternative to RNNs for modeling long-range dependencies (e.g., the Transformer) have proven to be useful ideas for reading comprehension. But I am not going to summarize those approaches in this post. Instead, I will share some of my main opinions on this problem and our current research progress.
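To make the task setup concrete, here is a minimal sketch (not from our system) of how span-based answer extraction with an unanswerable option typically works in SQuAD 2.0-style models. A real model produces per-token start and end scores from the paragraph and question; the hand-made scores, the toy paragraph, and the `null_score` threshold below are all hypothetical, purely for illustration.

```python
# Toy illustration of span extraction for SQuAD 2.0-style QA.
# A real model would compute start/end scores per token; here they are hand-made.
# The "no answer" decision compares the best span score against a null score.

def best_span(start_scores, end_scores, null_score, max_len=10):
    """Return (start, end) of the highest-scoring span, or None if the
    null (unanswerable) score beats every candidate span."""
    best = None
    best_score = null_score  # a span must beat the null score to count
    for i, s in enumerate(start_scores):
        for j in range(i, min(i + max_len, len(end_scores))):
            score = s + end_scores[j]
            if score > best_score:
                best_score = score
                best = (i, j)
    return best

tokens = ["BERT", "was", "released", "in", "2018", "."]
start = [0.1, 0.0, 0.0, 0.2, 3.0, 0.0]   # hypothetical model scores
end   = [0.0, 0.1, 0.0, 0.0, 2.5, 0.1]

span = best_span(start, end, null_score=4.0)
if span is None:
    print("unanswerable")
else:
    print(" ".join(tokens[span[0]:span[1] + 1]))  # the predicted answer span
```

With these scores the span over "2018" (3.0 + 2.5 = 5.5) beats the null score of 4.0, so it is returned; raising the null score above 5.5 would make the question predicted unanswerable.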

We built a model for SQuAD 2.0 whose performance was fairly good. After that, we added linguistic features (my colleague at SAIL@UNIST in particular) and tried different learning objectives, but did not see any significant improvement. At that point, I suspected that the model had saturated and that more novel ideas would be needed for any further improvement. I hypothesized that a simple but efficient deep architecture and transfer learning are the keys to question answering. That hypothesis led me to work on the Transformer and transfer learning. Just as I had finished my vanilla Transformer for SQuAD 2.0 and was excited to push the idea further, I discovered, to my dismay, that the Google AI Language team had just released their BERT model, which follows essentially the same philosophy as my hypothesis, even down to bidirectional attention for reading comprehension (unidirectional attention was previously used for neural machine translation; when adapting it to reading comprehension, it is natural to consider bidirectional attention). BERT achieved the state of the art on SQuAD 2.0 and has so far become the first algorithm to surpass human performance on SQuAD 2.0. This is not the first time I have experienced something like this. Back when I worked on pedestrian detection, I had an idea for a network that could be trained end-to-end and predict bounding boxes at any position. Then I discovered the YOLO model!

I would say that BERT and YOLO carried my premature ideas further, sometimes into crystallized form, and designed excellent experimental protocols to demonstrate their feasibility. However, the feeling of being beaten to my own idea is not easy to take. I am not sure what to do after this incident: if I keep following this direction, it is highly unlikely that I would do any better than the Google team; but if I look for another idea instead, what guarantees that some other top NLP group would not come up with the same idea and be faster at turning it into publishable work, just as happened with the Transformer + transfer learning + bidirectional attention idea that the Google AI Language team has just turned into BERT?

1 comment:

  1. Update #1: This makes me realize that I probably do not have enough courage to pursue an idea. Maybe many people can also see the possibility on the surface, but what distinguishes an excellent scientist from the rest is the ability to go deeper with an analysis (i.e., to see things clearly at a deeper level, not just as a high-level abstraction). Probably I lack this ability and need to improve it.
