Conversational interfaces and virtual assistants, such as Apple's Siri, Google Now, and Facebook Graph Search, have led to rising interest in systems that translate natural language commands and questions into formal logical forms (such as SQL queries) that can be executed against a knowledge base. A major challenge has been scaling these systems, known as semantic parsers, to large knowledge bases. In this talk, I will describe two novel algorithms for large-scale semantic parsing.
A fundamental characteristic of semantic parsing against large knowledge bases is that the space of possible logical forms grows quickly with the length of the input sentence. Our first algorithm learns to efficiently search through this space by explicitly scoring partial logical forms, combining ideas from agenda-based parsing and reinforcement learning. Compared to previous methods, our parser is almost an order of magnitude faster, while maintaining state-of-the-art accuracy. The second algorithm addresses the problem of language variability, that is, the fact that the same logical form can be expressed in a myriad of ways in natural language. We learn to paraphrase an input question ("Where is Obama from?") to a canonical form ("What is the place of birth of Barack Obama?") that can be easily mapped to a logical form. This allows us to exploit the large amounts of free text that are available on the web, leading to a state-of-the-art semantic parser that scales to a knowledge base containing hundreds of millions of facts.
This is joint work with Percy Liang.
Jonathan Berant is a post-doctoral fellow at Stanford's Department of Computer Science and a member of the Stanford Natural Language Processing Group. He earned his B.Sc. in computer science and linguistics and his Ph.D. in computer science from Tel-Aviv University. Jonathan was an Azrieli fellow and an IBM fellow during his graduate studies, and a Rothschild fellow during his post-doctoral period. His work has been recognized with a best paper award at EMNLP 2014, a best student paper award at ACL 2011, and two additional best paper nominations.