Harvesting commonsense and hidden knowledge from web services

Abstract : In this thesis, we harvest two different types of knowledge from online resources. The first is commonsense knowledge, i.e., intuitive knowledge shared by most people, such as "the sky is blue". We extract salient statements from query logs and question-answering forums using carefully designed question patterns. Next, we validate these statements by querying other web sources such as Wikipedia, Google Books, or image tags from Flickr. We aggregate these signals into a final score for each statement. The result is a knowledge base, QUASIMODO, which, compared to its competitors, has better precision and captures more salient facts.

The other kind of knowledge we investigate is hidden knowledge, i.e., knowledge not directly given by a data provider. More concretely, some Web services allow access to their data only through predefined access functions. To answer a user query, we have to combine several such access functions, i.e., we have to rewrite the query in terms of the functions. We study two different scenarios. In the first scenario, the access functions have the shape of a path, the knowledge base respects constraints called "Unary Inclusion Dependencies", and the query is atomic. We show that the problem is decidable in polynomial time, and we provide an algorithm backed by theoretical evidence. In the second scenario, we remove the constraints and define a new class of relevant plans called "smart plans". We show that finding these plans is decidable, and we provide an algorithm.

Cited literature [155 references]

https://tel.archives-ouvertes.fr/tel-02979523
Contributor : Abes Star
Submitted on : Tuesday, October 27, 2020 - 10:54:25 AM
Last modification on : Thursday, December 10, 2020 - 4:49:40 PM
Long-term archiving on : Thursday, January 28, 2021 - 6:38:43 PM

File

93993_ROMERO_2020_archivage.pd...
Version validated by the jury (STAR)

Identifiers

  • HAL Id : tel-02979523, version 1

Citation

Julien Romero. Harvesting commonsense and hidden knowledge from web services. Artificial Intelligence [cs.AI]. Institut Polytechnique de Paris, 2020. English. ⟨NNT : 2020IPPAT032⟩. ⟨tel-02979523⟩

Metrics

Record views : 209
File downloads : 480