Probabilistic Formal Systems

Fr De

Formal systems are used for inferring theorems from axioms according to a set of rules. These rules, which are used for carrying out the inference, are the logical calculus of the formal system. A formal system is essentially an „axiomatic system“.

With the help of a formal system, interesting scenarios can be modelled and problems on these formulated. In a formal system, formulas describe axioms and dependencies. Querying a formal system can solve possibly difficult problems (e.g. checking the truth of an axiom with respect to complex and huge scenarios) using a reasoner, which may be made exactly for that formal system or be implemented for solving more general problems (eventually at the cost of lower performance).

In a formal system formulas are formal expressions denoting axioms. Axioms are per se always true. Axioms define facts from which we want to start defining a scenario. Truth in a formal system is a value defined by a function which assigns each formula one of the values {true, false} inside the system. Truth is constructed (defined) on derivation of formulas from axioms or true formulas. Derivation is the work done by reasoners asked to solve a problem or to prove a formula.

Formal systems help defining scenarios logically and solving predisposed problems (questions) on those scenarios. A formalized scenario has the further advantage that is an outstanding documentation of what is important in that scenario.

Probabilistic approach

Truth in most formal systems is expressed by assigning one of the two possible values {false,true} or {0,1} to a logical expression (e.g. a problem, or a query, a sentence or formula for which we would like to test its truth value). The work done by the reasoner is to evaluate the conditions under which the query (solution) might be true or false. But what if in the scenario there is not a sharp true or sharp false truth for a problem? In this case, one could represent a probability of truth with a value going from 0.0 to 1.0 (hence considering all possible values between 0 and 1) or define a kind of fuzziness allowing intermediate values inside the well-known truth values {false, true}, for example, {certainly true, possibly true, possibly false, certainly false}. This would open up to more realistic assertions because many scenarios, especially those describing real-world settings, present lots of uncertain information, possibly gathered from sources with different levels of trust.

Fuzzy Inference allowed since 1973 to describe reality and deductions with the help of shades between truth values true and false. Fuzzy Logic uses linguistic variables (such as age) which accept values such as young and its antonym old and their corresponding non-sharp fuzzy values. Probabilistic Inference allows the representing such shades using real values among 0 (false) and 1 (true).

The Tableau Calculus is one of the most used approaches to answer about the truth of the axioms in the considered scenario. Basically, the Tableau Calculus represents the scenario by means of a graph-like structure which is expanded by means of expansion rules using axioms from the scenario to find implicit information. In a formal system, where the scenario is defined as a set of axioms, in the graph built by the Tableau Calculus, each node represents a subject of the scenario (usually called individual, entity) and is labelled with a set of classes (seen as a sort of sets grouping entities sharing common characteristics). Nodes are connected by edges labelled by properties defined in the considered scenario. Usually, the Tableau Calculus works by refutation, i.e., it adds in the graph also the negation (the complement) of the formula to solve. If a contradiction in the expanded graph is found, then the formula is true because the graph has proved a contradiction for the negation of the formula.

Getting back to probabilistic scenarios, a common approach to compute fuzzy or probabilistic values of the formulas to solve is to resort on finding justifications explaining why the problem (query, formula) is true in the considered scenario. These justifications are basically sets of axioms. To find them, the Tableau Calculus has been extended so that each expansion rule updates not only the graph but also associates to each label in the graph the set of axioms used by the rules to add the label.

TRILL is an OWL-enabled reasoner which uses Prolog terms to solve problems on probabilistic questions. TRILL uses the Tableau Calculus to represent and infer knowledge in a probabilistic manner. More information on TRILL can be found here.

A typical problem which can be solved in TRILL is whether a given individual belongs to a special class. This problem is represented e.g. by the following Prolog goal against an ad-hoc toy knowledge base for diagnosing Breast Cancer (BRCA):

instanceOf(‚http://ml.unife.it/miniBRCA#BRCRisk‘,’http://ml.unife.it/miniBRCA#Helen‘,Explanation,[return_prob(Probability)]), explainNL(Explanation,Labels).

Which in natural language rephrases as follows:

What are the probability and the explanation that Helen (here identified by a URI http://ml.unife.it/miniBRCA#Helen) has a risk of developing breast cancer (represented by a special class identified by a URI http://ml.unife.it/miniBRCA#BRCRisk)?

Considering the OWL/XML ontology in miniBRCA – which translates in the much more readable TRILL knowledge base – TRILL answers:

Probability = 0.54, because:
19-nor-progesterone-derived is a Progestin, is a synthetic progestogen
Progestinic are post-menopausal hormones
People, that have some risk factor for the disease, have a known risk of developing it
People, who take some kind of post-menopausal hormones, have some risk factors given by the hormones
Helen takes hormones 19-nor-progesterone-derived

Meaning that the probability that Helen runs the risk of developing breast cancer is 54%. This value is due to the fact that Helen takes 19-nor-progesterone-derived („http://ml.unife.it/miniBRCA#19-nor-progesterone-derived“), which is a Progestin and so a post-menopausal hormones; taking some types of such hormones is a possible risk factor for the development of the disease, and anyone with a risk factor to develop the disease is clearly at risk of developing it.

In the following the TRILL behavior based on the toy ontology on breast cancer (miniBRCA) is demonstrated (for the sake of simplicity Prolog here is completely hidden).

Start demo
in new tab