Probase is an ongoing project that focuses on knowledge acquisition and knowledge serving. The primary goal is to enable machines to understand human behavior and human communication. It’s powered by a new graph database called Trinity, which is also a Microsoft Research project.
Compared with other knowledge bases, Probase is unique in two respects. First, Probase has an extremely large concept/category space (2.7 million categories). Because these concepts are acquired automatically from web pages authored by millions of users, they likely cover most of the concepts in our mental world (at least those about worldly facts). Second, data in Probase, like knowledge in our minds, is not black or white. Probase quantifies this uncertainty, and the resulting probabilities serve as the priors and likelihoods that form the foundation of probabilistic reasoning in Probase. On top of this probabilistic Probase, we have built several applications, such as topic search, web table search, and document understanding.
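To make "priors and likelihoods" concrete, here is a minimal Python sketch of one way such probabilities could be used: ranking candidate concepts for a group of instances with naive Bayes. It assumes hypothetical isA evidence counts n(c, i). The counts, the `conceptualize` function, and the smoothing parameter `alpha` are illustrative assumptions, not Probase's actual data, API, or scoring method.

```python
import math
from collections import defaultdict

# Hypothetical evidence counts n(c, i): how often instance i was extracted as a
# member of concept c (e.g. from "<concept> such as <instance>").
# Illustrative numbers only, not real Probase data.
ISA_COUNTS = {
    ("country", "china"): 80,
    ("country", "india"): 70,
    ("country", "brazil"): 40,
    ("emerging market", "china"): 30,
    ("emerging market", "india"): 25,
    ("emerging market", "brazil"): 20,
    ("company", "apple"): 90,
    ("fruit", "apple"): 60,
}


def concept_totals(counts):
    """n(c): total isA evidence observed for each concept."""
    totals = defaultdict(int)
    for (concept, _instance), n in counts.items():
        totals[concept] += n
    return dict(totals)


def conceptualize(instances, counts, alpha=1e-3):
    """Rank concepts for a group of instances with naive Bayes.

    score(c) = log P(c) + sum_i log P(i | c), where the prior P(c) is n(c) over
    all evidence and the likelihood P(i | c) = n(c, i) / n(c) uses additive
    smoothing (alpha) so an unseen pair is unlikely rather than impossible.
    """
    totals = concept_totals(counts)
    grand_total = sum(totals.values())
    vocab_size = len({instance for _concept, instance in counts})
    scores = {}
    for concept, n_c in totals.items():
        score = math.log(n_c / grand_total)  # log prior
        for instance in instances:
            n_ci = counts.get((concept, instance), 0)
            score += math.log((n_ci + alpha) / (n_c + alpha * vocab_size))  # log likelihood
        scores[concept] = score
    return sorted(scores, key=scores.get, reverse=True)


print(conceptualize(["china", "india", "brazil"], ISA_COUNTS)[:2])  # ['country', 'emerging market']
print(conceptualize(["apple"], ISA_COUNTS)[:2])  # ['company', 'fruit']
```

With these made-up counts, the three countries map most strongly to "country" and then to "emerging market", while "apple" on its own ranks "company" above "fruit" because "company" has more supporting evidence. The smoothing keeps one missing pair from ruling a concept out entirely.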
Knowledge in Probase
Probase is much more than a traditional ontology or taxonomy, as can be seen along three dimensions: the concept dimension, the data dimension, and the relationship dimension.
Probase is extremely rich in concepts. Its core taxonomy alone contains about 2.7 million concepts, a much larger concept space than that of existing open-domain taxonomies.
Probase has a large data space. For example, Cyc contains about two dozen painters, while Probase has close to 1,000 of them, ordered by popularity. Furthermore, we regard external data, such as the Web, Freebase, DBpedia, dictionaries and encyclopedias, IMDb, and Amazon, as evidence that can add to or modify the claims and beliefs in Probase. This means Probase can integrate information of varied quality from heterogeneous data sources.
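Treating external sources as evidence of varying quality, and ordering instances such as painters by popularity, can be sketched as weighted vote counting. In this minimal Python sketch, the per-source reliability weights (`SOURCE_WEIGHT`), the observation tuples, and `rank_instances` are all hypothetical; the text does not say how Probase actually weights or combines its sources.

```python
from collections import defaultdict

# Hypothetical reliability weights: a higher weight means the source is trusted more.
SOURCE_WEIGHT = {"web": 0.5, "encyclopedia": 1.0, "dictionary": 0.9}

# Hypothetical observations: (source, concept, instance, count of supporting mentions).
EVIDENCE = [
    ("web", "painter", "Pablo Picasso", 120),
    ("encyclopedia", "painter", "Pablo Picasso", 10),
    ("web", "painter", "Claude Monet", 90),
    ("encyclopedia", "painter", "Claude Monet", 8),
    ("web", "painter", "Paul Cezanne", 40),
    ("forum", "painter", "Paul Cezanne", 30),  # unknown source: gets the default weight
    ("web", "country", "France", 500),  # different concept: ignored when ranking painters
]


def rank_instances(concept, evidence, weights, default_weight=0.1):
    """Order a concept's instances by reliability-weighted support across sources."""
    support = defaultdict(float)
    for source, c, instance, count in evidence:
        if c == concept:
            support[instance] += weights.get(source, default_weight) * count
    return sorted(support, key=support.get, reverse=True)


print(rank_instances("painter", EVIDENCE, SOURCE_WEIGHT))
# ['Pablo Picasso', 'Claude Monet', 'Paul Cezanne']
```

Down-weighting a noisy source lets it still contribute support without dominating the ranking. This sketch only adds support; it does not model evidence that contradicts or retracts an existing claim.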
Probase also has a large relationship space. The most important relationships are the isA relationship (e.g., a painter is an artist), the part-whole relationship (e.g., an engine is part of a car), and the similarity relationship (e.g., the concept emerging markets is closely related to newly industrialized countries).
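The three relationship types can be pictured as typed, weighted edges in a graph. The minimal Python sketch below keeps isA, part-whole, and similarity edges in memory. The `RelationGraph` class and its methods are hypothetical and are not the API of Trinity or Probase; treating isA as transitive and similarity as symmetric are modeling assumptions made for illustration.

```python
from collections import defaultdict
from enum import Enum


class Rel(Enum):
    IS_A = "isA"
    PART_OF = "partOf"
    SIMILAR_TO = "similarTo"


class RelationGraph:
    """Hypothetical in-memory store of typed, weighted edges between terms."""

    def __init__(self):
        # (source term, relation) -> {target term: weight}
        self._edges = defaultdict(dict)

    def add(self, source, rel, target, weight=1.0):
        self._edges[(source, rel)][target] = weight
        if rel is Rel.SIMILAR_TO:  # assumption: similarity is symmetric
            self._edges[(target, rel)][source] = weight

    def related(self, source, rel):
        """Direct neighbors of `source` under one relation, with edge weights."""
        return dict(self._edges.get((source, rel), {}))

    def ancestors(self, term):
        """All concepts reachable via isA edges (assumption: isA is transitive)."""
        seen, stack = set(), [term]
        while stack:
            current = stack.pop()
            for parent in self._edges.get((current, Rel.IS_A), {}):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen


g = RelationGraph()
g.add("painter", Rel.IS_A, "artist")
g.add("artist", Rel.IS_A, "person")
g.add("engine", Rel.PART_OF, "car")
g.add("emerging markets", Rel.SIMILAR_TO, "newly industrialized countries", 0.8)

print(sorted(g.ancestors("painter")))  # ['artist', 'person']
print(g.related("newly industrialized countries", Rel.SIMILAR_TO))  # {'emerging markets': 0.8}
```

Keying edges by (term, relation) keeps each relationship type separate, so a transitive query over isA never wanders into part-whole or similarity edges.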
via Microsoft Research