2 Performance improvements

2.1 Tbox classification

The classification engine of RacerPro has been continuously improved. For instance, RacerPro 1-9-2 can classify a version of Snomed with more than 379 000 concept names in 13 minutes (Intel CPU, 2.4 GHz, 32bit, 4GB RAM, Mac OS X). For classifying other knowledge bases, RacerPro performs on par with systems dedicated solely to classifying ontologies (Tboxes). RacerPro is one order of magnitude faster than systems which, like RacerPro, also offer support for Abox reasoning w.r.t. expressive Tboxes. All tests have been run on a Pentium 4, 2.8GHz, 1GB.

2.2 Instance retrieval

The significance of the optimization techniques introduced in the new release is analyzed with the well-known LUBM benchmark. The runtimes we present in this section are used to demonstrate the order of magnitude of time resources that are required for solving inference problems. They allow us to analyze the impact of the implemented optimization techniques.



Figure 1: Linearly increasing number of individuals, concept assertions and role assertions for different numbers of universities.




Figure 2: Runtimes for deterministic version of LUBM




Figure 3: Runtimes for non-deterministic version of LUBM


An overview of the size of the LUBM benchmarks is given in Figure 1. As the number of universities increases, the number of instances and the number of concept and role assertions increase linearly. For instance, with 50 universities, 1 000 000 instances have to be handled.
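The linear growth can be made concrete with a back-of-the-envelope calculation. The sketch below assumes strict proportionality and uses only the single data point quoted above (50 universities, about 1 000 000 instances). The function name and the derived per-university figure are illustrative and are not taken from the benchmark generator.

```python
# Rough size estimate for LUBM, assuming strictly linear growth.
# The only figure taken from the text: 50 universities -> ~1 000 000 instances.
REFERENCE_UNIVERSITIES = 50
REFERENCE_INSTANCES = 1_000_000


def estimated_instances(universities: int) -> int:
    """Scale the reference point linearly to another number of universities."""
    if universities < 0:
        raise ValueError("number of universities must be non-negative")
    return universities * REFERENCE_INSTANCES // REFERENCE_UNIVERSITIES


if __name__ == "__main__":
    for n in (1, 10, 50):
        print(n, estimated_instances(n))
```

Under this assumption each university contributes on the order of 20 000 instances. The real generator is randomized, so actual counts will deviate somewhat.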

The runtimes for answering all 15 LUBM queries are presented in Figures 2 and 3 (Sunfire, Solaris, 32 GB). In Figure 2, a version of the LUBM TBox is used that does not cause backtracking during ABox satisfiability (or consistency) tests. With this kind of benchmark, we can evaluate the storage management and indexing techniques of DL provers. In Figure 3, we used a variant of the TBox that causes backtracking.

Before queries can be answered, the ABox is checked for consistency (see the “Consistency” curve). As can be expected, if there is no backtracking, state-of-the-art provers are very fast (Figure 2). If backtracking is required (Figure 3), runtimes for ABox consistency checking increase. ABox consistency checking can be done offline and corresponds to computing index structures in a database system.

Comparing the runtimes for query answering in Figures 2 and 3 (see the corresponding curves “Queries”) reveals that backtracking does not influence query answering to a large extent (at least not in the LUBM case we investigated). The total runtime is indicated with “Total”.

These results were made possible by dedicated storage management techniques (e.g., those offered by the implementation language, Franz Allegro Common Lisp). On this basis, all data structures that store the LUBM TBox, the ABox, and the index structures can be declared as exempt from examination by the garbage collector. If this is not done, garbage collection time dominates all other runtimes to a large extent: because of the large amount of data in LUBM, the same data structures are examined over and over again by the garbage collector, and querying times increase superlinearly. The declaration of data structures as persistent (in the sense of being non-garbage) is made after the ABox consistency check. The expression (declare-current-knowledge-bases-as-persistent) is used for declaring knowledge bases as persistent. Racer Systems offers consulting services to help users benefit maximally from these facilities in industrial applications.
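The idea of exempting long-lived data from garbage collection is not specific to Lisp. As an analogy only, the following sketch uses CPython's gc.freeze(), which moves all currently tracked objects into a permanent generation that later collections ignore. It does not show how RacerPro or Allegro Common Lisp implement persistence. The helper names and the toy "knowledge base" are hypothetical.

```python
import gc


def build_knowledge_base(n: int) -> dict:
    """Stand-in for a loaded ABox: many small, long-lived container objects."""
    return {i: [("memberOf", i % 100)] for i in range(n)}


def freeze_after_preparation(n: int = 100_000):
    """Build the data, then exempt everything alive from future GC passes."""
    kb = build_knowledge_base(n)    # load (and, in a real system, check) the data first
    gc.collect()                    # clear real garbage before freezing
    gc.freeze()                     # surviving objects go to the permanent generation
    frozen = gc.get_freeze_count()  # number of objects the collector now skips
    return kb, frozen


if __name__ == "__main__":
    kb, frozen = freeze_after_preparation()
    print("frozen objects:", frozen)
    gc.unfreeze()                   # undo the exemption when it is no longer wanted
```

As in the text, the exemption is applied only after the data has been fully prepared, so that garbage created during loading is collected first and is not frozen along with the long-lived structures.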

We take LUBM as representative of the largely deterministic data descriptions that can be found in practical applications. The investigations reveal that description logic systems can be optimized to handle large volumes of logical descriptions quite effectively. LUBM allows us to study the data description scalability problem.

In addition to LUBM, we also discuss the UOBM-Lite benchmark in this section. It is also scalable and was tested with 1 to 5 universities, each with all departments. The characteristics of the KB and the benchmarks are shown in Figure 4. The logic of UOBM is ALCf after GCI absorption, and the ABox adds datatype properties. The benchmark for 5 universities comprises 138K individuals, 509K individual assertions, and 563K role assertions.

TBox Logic   CN   R    Axioms   ABox Logic
ALCf         51   49   101      ALCf(D-)

(CN = no. of concept names, R = no. of roles)

U   Inds      Ind. Ass.   Role Ass.   L     P     Cons    I    Q        T
1   43 642    116 092     129 695     35    16    100     15   446      608
5   138 452   509 902     563 699     160   197   7 670   40   30 000   30 000

U = no. of universities, L = load time, P = KB preparation time,
Cons = time for initial ABox consistency test, I = query index generation time,
Q = nRQL query execution time, T = total benchmark time.



Figure 4: UOBM-Lite benchmark characteristics and runtimes per query set (15 queries, time in secs, timeout after 30 000 secs).


Each benchmark was evaluated with 15 grounded conjunctive queries designed by the authors of UOBM. The benchmark has the same structure as LUBM. The runtimes given in Figure 4 show that RacerPro’s ABox consistency performance scales well for up to 10 universities. In contrast to LUBM, the UOBM benchmark does not allow the unique name assumption. The query execution time also scales well for up to 3 universities. However, for 4 universities it increased by a factor of 4, and for 5 universities it timed out after 30 000 seconds. The graph in the lower part of Figure 4 displays the curves for the ABox consistency test (dashed line), query execution (dotted line), and the total benchmark time (solid line). The non-linear trend is easy to see. It is interesting to note that 99.86% of the query runtime is spent on 3 of the 15 queries. This calls for refining existing optimization techniques or designing new ones, which is a topic for future work.
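To give a feeling for how strongly non-linear this is, one can estimate a local scaling exponent k in a model of the form time ~ size^k from two measurements. The sketch below uses only the statement above that query time grew by a factor of 4 between 3 and 4 universities. It assumes, as a simplification, that the data size is proportional to the number of universities. The function name is illustrative.

```python
import math


def scaling_exponent(size_small: float, time_small: float,
                     size_large: float, time_large: float) -> float:
    """Exponent k in time ~ size**k, estimated from two measurements."""
    if min(size_small, time_small, size_large, time_large) <= 0:
        raise ValueError("sizes and times must be positive")
    if size_small == size_large:
        raise ValueError("the two sizes must differ")
    return math.log(time_large / time_small) / math.log(size_large / size_small)


if __name__ == "__main__":
    # 3 -> 4 universities, query time x4 (relative units, taken from the text).
    k = scaling_exponent(3, 1.0, 4, 4.0)
    print(f"estimated local exponent: {k:.2f}")
```

An exponent of 1 would correspond to linear scaling. The estimate obtained here is well above 1, which is consistent with the non-linear trend in the graph. A two-point estimate of this kind is only indicative.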



Figure 5: Racer runtimes for UOBM DL and a query set of 15 queries.


For answering Abox queries, we have found RacerPro to be as fast as systems that are specifically tailored to answering Abox queries w.r.t. very specific Tboxes such as LUBM (with low expressivity). However, RacerPro can also handle Abox queries with respect to Tboxes which these systems cannot handle.