Dorian: Analogical Portraiture

Hand-crafted formal ontologies have many practical uses. So too do lightweight lexical ontologies like WordNet, since each offers a semantic picture of the world that can be leveraged by a machine to achieve intelligent behaviour.

Such ontologies typically offer a reduced picture of the world, however, in which objects have a single correct categorization or, at any rate, a small number of acceptable categorizations. The real world is not so rigid: everyday objects and entities can be categorized in very many different ways, depending on the time, the place and the categorizing agent. In one context, Bill Gates may be an example of a powerful billionaire, in another an example of a software genius or business tycoon, and in another Microsoft’s CEO (or chairman, or chief architect, and so on).

This multiplicity of categorization is perhaps easiest to observe when talking about people, especially well-known people, since our categories tend to convey our subjective opinions just as much as objective facts.

Dorian is a knowledge-base that explores this multiplicity of categorization for proper-named entities. Its entities are harvested from the Google n-grams, and each is associated with the categories that speakers most commonly attribute to it. This harvested knowledge is supplemented by the category-system in Wikipedia, which adopts a less subjective, curated approach to categorization.

Dorian uses these categories to perform analogical reasoning. It learns its analogical transfer rules from large text corpora (such as the Google n-grams again) in two steps. First, it identifies cliques of concepts that it considers comparable and interchangeable (such as London and Paris, or Java and Perl, or DC and Marvel, or Playboy and Penthouse). It then generalizes these cliques to the category-level. For example, because Java and Perl are comparable, Java_inventor is analogous to Perl_inventor, which implies that James_Gosling is analogous to Larry_Wall, and so on.
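The generalization step above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Dorian’s actual implementation: the table ENTITY_CATEGORIES, the function names, and the convention that a category name splits into a concept and a role at its first underscore are all hypothetical, and the data is limited to the examples given in the text.

```python
from itertools import combinations

# Cliques of comparable concepts (examples taken from the text).
CLIQUES = [{"Java", "Perl"}, {"London", "Paris"}]

# Hypothetical entity -> categories table. Each category is assumed to have
# the form <concept>_<role>, split at the first underscore.
ENTITY_CATEGORIES = {
    "James_Gosling": {"Java_inventor"},
    "Larry_Wall": {"Perl_inventor"},
}

def concept_pairs(cliques):
    """Return every unordered pair of concepts that share a clique."""
    pairs = set()
    for clique in cliques:
        pairs.update(combinations(sorted(clique), 2))
    return pairs

def find_analogies(entity_categories, cliques):
    """Lift concept-level comparability to categories, then to entities."""
    # Index entities by (concept, role) so matching roles are easy to find.
    index = {}
    for entity, categories in entity_categories.items():
        for category in categories:
            concept, _, role = category.partition("_")
            index.setdefault((concept, role), set()).add(entity)

    analogies = set()
    for a, b in concept_pairs(cliques):
        for (concept, role), entities in index.items():
            if concept != a:
                continue
            # a_role is analogous to b_role because a and b are comparable.
            for other in index.get((b, role), ()):
                for entity in entities:
                    analogies.add((entity, other, f"{a}_{role}", f"{b}_{role}"))
    return analogies

for entity, other, cat_a, cat_b in sorted(find_analogies(ENTITY_CATEGORIES, CLIQUES)):
    print(f"{entity} ({cat_a}) is analogous to {other} ({cat_b})")
```

Splitting at the first underscore is a simplification: a multi-word concept would need a lookup against the known concept names instead.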

Dorian also identifies cliques of proper-named entities in corpora (such as Roger_Federer and Rafael_Nadal) and attempts to establish analogical transfer rules that will make these clique-members analogous at the category-level.
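One way to picture this reverse direction is sketched below. The text does not say how Dorian derives these rules, so the heuristic here is an assumption: two categories are treated as alignable when they share a final word (the head) but differ in the words before it (the modifier), and the differing modifiers are proposed as a candidate transfer rule. The category data and all names in the code are invented for illustration.

```python
from itertools import combinations

# Hypothetical categories for two clique members (illustrative data only).
PLAYER_CATEGORIES = {
    "Roger_Federer": {"Swiss_champion", "tennis_star"},
    "Rafael_Nadal": {"Spanish_champion", "tennis_star"},
}

def candidate_rules(clique, entity_categories):
    """Propose modifier pairs that would align the categories of clique members.

    Assumed heuristic: categories with the same head (last word) but
    different modifiers suggest that the modifiers are comparable.
    """
    proposals = set()
    for x, y in combinations(sorted(clique), 2):
        for cat_x in entity_categories.get(x, ()):
            for cat_y in entity_categories.get(y, ()):
                mod_x, _, head_x = cat_x.rpartition("_")
                mod_y, _, head_y = cat_y.rpartition("_")
                if head_x == head_y and mod_x and mod_y and mod_x != mod_y:
                    # Store the pair in sorted order so it is unordered.
                    proposals.add(tuple(sorted((mod_x, mod_y))))
    return proposals

rules = candidate_rules({"Roger_Federer", "Rafael_Nadal"}, PLAYER_CATEGORIES)
print(rules)
```

Under this assumption, the shared category tennis_star yields no rule because the two entities already match on it, while the champion categories suggest that Swiss and Spanish can be treated as comparable.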
