Joe did not buy a car today.
He was in buying mood.
But all cars were too expensive.
Why didn't Joe buy a car?
Answer: buying mood
I think I have seen similar systems for decades now. I thought we would be further along by now.
I have tried for 10 or 20 minutes now. But I can't find any evidence that it has much sense of syntax:
Paul gives a coin to Joe.
Who received a coin?
Answer: Paul
All it seems to do is to extract candidates for "who", "what", "where" etc. So it seems to figure out correctly that "Paul" is a potential answer for "Who".
No matter how I rephrase the "Who" question, I always get "Paul" as the answer. "Who? Paul!", "Who is a martian? Paul!", "Who won the summer olympics? Paul", "Who got a coin from the other guy? Paul!"
Same for "what" questions:
Gold can not be carried in a bag. Silver can.
What can be carried in a bag?
Answer: Gold
Sadly, the NLP world is full of hot air. I've seen so many companies get funding for complete "written by a 12-year old" dogshit "industry leading IP", it's not even funny anymore.
The hype has gone down and some are actually doing great work, but 90% of the people who say they do NLP/AI stuff don't even fundamentally understand what NLP/AI is.
Sadly, I'd fully agree with this. Things are possible now that were not 10 years ago. But mostly, performance only increased on things we could already do 10 years ago, while hardly any new abilities came along. Machine translation, linguistic parsing, etc. have come a long way. But we still can't do satisfactory abstractive summarization or create a conversational agent for more than an extremely narrow domain. Yet, at least the things we can do can be done at levels that are "production ready".
I still hold hope. However, it seems that naively exploiting the function approximation capacity we have with deep learning can only go so far toward understanding our own language.
Maybe we need to look back, start from the beginning, and ask ourselves: How do humans learn, exactly? How do we learn with so few examples? How do we jointly learn image/audio/video/language with only one brain?
Computation won't help if we don't have the right representations. Arguably computation can help us discover the right representations but the space of possible representations is very, very large.
All of the above require fairly complex world knowledge as well as an explicit representation of a scene. There is minimal leverage for lexical distributional statistics in these cases—arguably the one thing we have had major success in using (e.g. building vector space word representations, like Word2Vec; finding the highest probability parse tree for an utterance).
If you look at example questions from the dev set[1], you'll realize that they all use the same words as the sentence containing the answer. Additionally, the topics aren't everyday stuff, but something you'd write a Wikipedia article about. So I guess the model just learns to find the sentence most similar to the question and then selects an answer based on a coarse categorization, which fails when it is presented with unseen situations.
Your example works if you rephrase the question to be more similar to the text:
Paul gives a coin to Joe.
Whom does Paul give a coin to?
Answer: Joe
You can cut the question down to "gives to?" or "coin to?", because that's enough to single out the answer. But as soon as you use s̶y̶n̶o̶n̶y̶m̶s̶ (EDIT: related words) that are not recognized (like "receive"), you have no chance of getting a meaningful answer.
The "Who did What" dataset seems much better in this respect:
Passage: Britain’s decision on Thursday to drop extradition proceedings against Gen. Augusto Pinochet and allow him
to return to Chile is understandably frustrating ... Jack Straw, the home secretary, said the 84-year-old former dictator’s
ability to understand the charges against him and to direct his defense had been seriously impaired by a series of strokes.
... Chile’s president-elect, Ricardo Lagos, has wisely pledged to let justice run its course. But the outgoing government of
President Eduardo Frei is pushing a constitutional reform that would allow Pinochet to step down from the Senate and retain
parliamentary immunity from prosecution. ...
Question: Sources close to the presidential palace said that Fujimori declined at the last moment to leave the country and
instead he will send a high level delegation to the ceremony, at which Chilean President Eduardo Frei will pass the mandate
to XXX.
Choices: (1) Augusto Pinochet (2) Jack Straw (3) Ricardo Lagos
Yes, that might be a more accurate description. It picks the "who" thingy from the most similar context.
With zero further understanding, it seems:
Paul gives no coin to Marray. Paul gives a coin to Joe.
Who got something from Paul?
Answer: Marray
Paul gives no coin to Marray. Paul gives a coin to Joe.
Who received a coin?
Answer: Paul
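The behaviour guessed at in this subthread (pick the sentence most similar to the question, then return the first thing that looks like a "who") can be sketched in a few lines. To be clear, this is a toy illustration of that guess, not how BiDAF or the demo actually works; the function names and the "first capitalized word" rule are made up for the example.

```python
import re

def tokens(text):
    # Lowercased word tokens; punctuation is dropped.
    return re.findall(r"[a-z]+", text.lower())

def split_sentences(passage):
    # Naive split on sentence-final punctuation followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", passage) if s.strip()]

def overlap(question, sentence):
    # Number of distinct words shared by question and sentence.
    return len(set(tokens(question)) & set(tokens(sentence)))

def first_capitalized(sentence):
    # Crude stand-in for a "who" candidate: the first capitalized word.
    matches = re.findall(r"\b[A-Z][a-z]+\b", sentence)
    return matches[0] if matches else None

def answer_who(passage, question):
    # Hypothetical heuristic: choose the sentence with the most word
    # overlap, then return its first capitalized word. No syntax involved.
    best = max(split_sentences(passage), key=lambda s: overlap(question, s))
    return first_capitalized(best)

passage = "Paul gives no coin to Marray. Paul gives a coin to Joe."
print(answer_who(passage, "Who received a coin?"))  # prints "Paul"
```

Even something this dumb returns "Paul" for "Who received a coin?", because it never looks at who is the giver and who is the receiver.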
Heavily trained on SQuAD questions. There are lots of models out there that are very good at recognizing SQuAD questions, and reverse-engineering the predictable ways that the Turkers who wrote the questions pulled the information out of the paragraph -- allowing them to answer the question without ever understanding it. https://arxiv.org/abs/1707.07328
The difference with new NN-based systems is that they are trained end-to-end and learn the syntax and some form of "reasoning". Check Memory Networks, by Facebook, for example (two NNs, one for "reasoning" and one for storing long-term data, quite impressive).
Now, it's still an area of active research... and I'm not sure what "state-of-the-art" means for this library; somebody said that it ranks 27th on some commonly used dataset.
I am working with Memory Networks as part of my thesis. If you actually read and implement the FB paper, you realise that the system is not half as great as the demo suggests. It is as bad as the top comment here. Yes, we have come a long way, but frankly the hype is too high.
According to the website they use the BiDAF model, which as a single model does not produce state-of-the-art results on the SQuAD benchmark. It is ranked 27th here: https://rajpurkar.github.io/SQuAD-explorer/
This is very brittle: it works really well on the pre-canned examples but the vocabulary seems very tightly linked. It doesn't handle something as simple as:
'the patient had no pain but did have nausea'
It doesn't yield anything helpful on semantic role labeling and didn't even parse on machine comprehension. If I vary it to ask 'did the patient have pain?', the answer is 'nausea'.
CoreNLP provides much more useful analysis of the phrase structure and dependencies.
In "Adversarial Examples for Evaluating Reading Comprehension Systems" https://arxiv.org/abs/1707.07328, it was found that adding a single distracting sentence can lower the F1 score of BiDAF (which is used in the demo here) from 75.5% to 34.3% on SQuAD. In comparison, human performance goes from 92.6% to 89.2%.
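For anyone unsure what the F1 numbers measure: SQuAD-style F1 is a token-overlap score between the predicted answer span and a reference answer. Below is a simplified sketch of that idea. The official evaluation script additionally strips punctuation and articles and takes the maximum over several reference answers, which is left out here, and the function name `token_f1` is just chosen for this example.

```python
from collections import Counter

def token_f1(prediction, gold):
    # Simplified: lowercase and split on whitespace only.
    pred_tokens = prediction.lower().split()
    gold_tokens = gold.lower().split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    common = Counter(pred_tokens) & Counter(gold_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)

print(token_f1("Joe", "Joe"))    # 1.0
print(token_f1("Paul", "Joe"))   # 0.0
print(round(token_f1("Ricardo Lagos", "Lagos"), 3))  # 0.667
```

So an answer like "Paul" where "Joe" was expected scores zero, while a span that is merely too long or too short still gets partial credit.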
Different set of tasks. SpaCy is focused on bread-and-butter tasks like tokenization, part of speech tagging, and dependency parsing (not to say that these are easy, but that they are things people have been working on a long time). AllenNLP seems focused on distributing relatively recent neural models (last few years) of more complex language understanding like labeling semantic roles (agents, patients, etc.) and identifying textual entailments (=mining facts from a sentence). It is not great at these tasks, because this is very difficult and a very active area of ongoing research.