Show HN: A Natural Language Query Engine Without Machine Learning

by youngprogrammer on 10/8/2016, 11:02 AM with 21 comments

by charlieegan3 on 10/8/2016, 12:11 PM

I think you might get better results in the first stage using the dependency parse from CoreNLP rather than the phrasal parse. Online demo at http://corenlp.run

If you're willing to drop CoreNLP there's also https://demos.explosion.ai/displacy/ that's worth checking out.

by steinsgate on 10/8/2016, 9:18 PM

Nice work! You said that you avoided machine learning because labeled data is hard to find. What about unsupervised approaches?

Frankly speaking, I am a bit skeptical about pattern matching algorithms for answering questions. It would help if you showed some kind of stats about your algorithm's performance on a diverse question set. For example, you can scrape simple quiz questions (and answers) from quiz sites [1] and report back on the performance.

[1] http://www.quiz-zone.co.uk/questionsbydifficulty/1/0/answers...
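
A minimal sketch of the kind of report suggested here. The `answer` callable is a hypothetical stand-in for the engine, and the question/answer pairs are hand-made for illustration; none of this comes from the project itself.

```python
def normalize(text):
    """Lowercase and drop surrounding whitespace and end punctuation."""
    return text.strip().strip(".?!").lower()


def evaluate(answer, qa_pairs):
    """Return (answered, correct, total) for an answer function.

    `answer` is a hypothetical callable: question -> answer string or None.
    """
    answered = 0
    correct = 0
    for question, expected in qa_pairs:
        got = answer(question)
        if got is None:
            continue  # the engine gave no answer at all
        answered += 1
        if normalize(got) == normalize(expected):
            correct += 1
    return answered, correct, len(qa_pairs)


# Toy stand-in for the real engine, for illustration only.
def toy_answer(question):
    known = {"what is the boiling point of water?": "100 degrees Celsius"}
    return known.get(question.lower())
```

Reporting answered and correct counts separately distinguishes "no answer" from "wrong answer", which matters for a pattern-matching system that may simply fail to match.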

by drdeca on 10/8/2016, 9:01 PM

In addition to the questions it does answer well, it also has these answers:

Q: "What is purpose" A: "Justin Bieber album"

Q: "What is a car?" A: "country in Africa"

Q: "What is a male?" A: "capital of Maldives"

Q: "What is a female?" A: "human who is female (use with Property:P21 sex or gender). For groups of females use with ''subclass of (P279)''"

My point in this comment is just to say that when it does give an odd answer, it can be funny, not to say that it sometimes gives odd answers.

by mrob on 10/8/2016, 9:18 PM

This seems almost completely useless. I tried ten questions, and only one was answered, incorrectly (Moby Dick question misunderstood, answered as "novel by Herman Melville"). I think even Ask Jeeves back in the 90s had better performance than this. Questions tried:

how many lines of resolution are there in an ntsc television signal?

what is the melting point of tin/lead eutectic solder?

what species of whale was moby dick?

what grain is most often used to make beer?

what is the boiling point of water?

how many chromosomes does a normal human have?

what animal is known as "man's best friend"?

what fps did id software release in 1993?

what is the largest known prime number?

what is the clock rate of the arduino uno?

As a comparison, Google gives 8 correct answers directly (either as a special info box, or as a highlighted part of a web page), 1 correct answer as the 2nd search result (Doom), and 1 incorrect answer (largest known prime).

by imh on 10/8/2016, 8:25 PM

These things are always so interesting in their totally inhuman failure cases. It can tell me George Washington was born in 1732, but doesn't know which planet America is on (much less which planet George Washington was born on).

Also, it seems to have issues formatting dates before 1900 (for the bday one, the answer it returns is more of an error message than an answer: "year=1732 is before 1900; the datetime strftime() methods require year >= 1900")
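
The quoted error is what older Python versions (Python 2 in particular) raise when `strftime()` is called on a date before 1900. One possible workaround, sketched here under the assumption that the engine holds the value as a `datetime.date` (the source does not confirm this), is to build the string from the date's fields instead:

```python
from datetime import date

MONTHS = [
    "January", "February", "March", "April", "May", "June",
    "July", "August", "September", "October", "November", "December",
]


def format_date(d):
    """Format a date as e.g. 'February 22, 1732' without calling strftime()."""
    return "{} {}, {}".format(MONTHS[d.month - 1], d.day, d.year)


print(format_date(date(1732, 2, 22)))  # prints "February 22, 1732"
```

`d.isoformat()` is another option that does not go through `strftime()`, if an ISO-style `1732-02-22` is acceptable.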

by ecesena on 10/8/2016, 5:34 PM

Partially related - has anyone worked on natural language queries with time expressions in them? Imagine analytics queries, where you want to count the number of events/unique users, given certain conditions, and in a certain time window. I'm particularly interested in the time aspect of it.
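
One small piece of this problem, turning a relative time expression into a concrete window, can be sketched with a regular expression. The phrase grammar below ("last/past N minutes, hours, days, or weeks") and the `time_window` helper are assumptions made for illustration; real analytics queries would need a much richer grammar.

```python
import re
from datetime import datetime, timedelta

# Seconds per supported unit (hypothetical, deliberately small vocabulary).
UNIT_SECONDS = {"minute": 60, "hour": 3600, "day": 86400, "week": 604800}

PATTERN = re.compile(
    r"\b(?:last|past)\s+(\d+)\s+(minute|hour|day|week)s?\b", re.IGNORECASE
)


def time_window(query, now):
    """Return (start, end) for a phrase like 'last 7 days', or None."""
    match = PATTERN.search(query)
    if match is None:
        return None
    count = int(match.group(1))
    unit = match.group(2).lower()
    start = now - timedelta(seconds=count * UNIT_SECONDS[unit])
    return start, now


now = datetime(2016, 10, 8, 12, 0)
print(time_window("count unique users in the last 7 days", now))
```

Passing `now` in explicitly, rather than calling `datetime.now()` inside the function, keeps the result reproducible.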

by fspeech on 10/8/2016, 5:44 PM

Have you studied Prolog? Its matching (logical unification) capability may give you some more ideas.
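
For readers unfamiliar with the idea, here is a rough sketch of unification in Python rather than Prolog. The term representation (tuples for compound terms, strings starting with `?` for variables) and the example facts are assumptions for illustration, and the occurs check is omitted for brevity.

```python
def is_var(term):
    """Variables are strings starting with '?', e.g. '?who'."""
    return isinstance(term, str) and term.startswith("?")


def walk(term, bindings):
    """Follow bindings until reaching a non-variable or an unbound variable."""
    while is_var(term) and term in bindings:
        term = bindings[term]
    return term


def extend(bindings, var, value):
    """Return a copy of bindings with var bound to value."""
    new_bindings = dict(bindings)
    new_bindings[var] = value
    return new_bindings


def unify(a, b, bindings=None):
    """Return bindings that make a and b equal, or None if they cannot match."""
    if bindings is None:
        bindings = {}
    a, b = walk(a, bindings), walk(b, bindings)
    if a == b:
        return bindings
    if is_var(a):
        return extend(bindings, a, b)
    if is_var(b):
        return extend(bindings, b, a)
    if isinstance(a, tuple) and isinstance(b, tuple) and len(a) == len(b):
        for x, y in zip(a, b):
            bindings = unify(x, y, bindings)
            if bindings is None:
                return None
        return bindings
    return None


pattern = ("born_in", "?who", "?year")
fact = ("born_in", "george_washington", "1732")
print(unify(pattern, fact))
```

Unlike one-way pattern matching, variables may appear on either side, and a variable repeated within a pattern must bind to the same value everywhere.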

by greglindahl on 10/8/2016, 8:34 PM

Very interesting! Nice to see how little code it is. I wonder how much work it would be to get it to answer questions like "What is the biggest planet?" or to fix the way "Who was Prime Minister of Canada in 1945" drops "of Canada"?

by atoko on 10/8/2016, 5:07 PM

This is cool! I like how you've iterated on a central concept (NLP) with different codebases.

Tip: The link to the source is pointing to GitHub Pages, which hasn't been set up.

by mrcabada on 10/8/2016, 6:25 PM

This is nice! Would it be possible to run the code with other language models? (Spanish, German, and any other CoreNLP language model)

by youngprogrammer on 10/8/2016, 7:42 PM

Demo should be working now. The Stanford parser kept dying from running out of memory, so I moved it to another box.

by billconan on 10/8/2016, 4:09 PM

This is cool! Is it easy to convert a MediaWiki to the graph store your system reads?

by alexcaps on 10/8/2016, 8:30 PM

Couldn't tell me who the CEO of Apple is... :(