The classification engine of RacerPro has been continuously improved. For instance, RacerPro 1-9-2 can classify a version of SNOMED with more than 379,000 concept names in 13 minutes (Intel CPU, 2.4 GHz, 32 bit, 4 GB RAM, Mac OS X). For classifying other knowledge bases, RacerPro shows the same performance as systems dedicated solely to classifying ontologies (TBoxes), and it is one order of magnitude faster than systems which, like RacerPro, also offer support for ABox reasoning w.r.t. expressive TBoxes. All tests were run on a Pentium 4, 2.8 GHz, 1 GB RAM.
The significance of the optimization techniques introduced in the new release is analyzed with the well-known LUBM benchmark. The runtimes presented in this section indicate the order of magnitude of the time resources required for solving inference problems and allow us to analyze the impact of the implemented optimization techniques.
An overview of the size of the LUBM benchmarks is given in Figure 1. With an increasing number of universities, the number of instances as well as the number of concept and role assertions grows linearly. For instance, with 50 universities, 1,000,000 instances have to be handled.
The runtimes for answering all 15 LUBM queries are presented in Figures 2 and 3 (Sunfire, Solaris, 32 GB). In Figure 2 a version of the LUBM TBox is used that does not cause backtracking during ABox satisfiability (or consistency) tests. With this kind of benchmark, we can evaluate the storage management and indexing techniques of DL provers. In Figure 3 we used a variant of the TBox that causes backtracking.
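For concreteness, a grounded conjunctive query of the kind used in the benchmark can be expressed in RacerPro's nRQL query language. The following sketch is loosely modeled on LUBM Query 1; the concept, role, and individual names are illustrative and not the exact identifiers of the benchmark ontology.

```lisp
;; nRQL sketch of a grounded conjunctive query (cf. LUBM Query 1):
;; retrieve all graduate students taking a particular course.
;; |GraduateStudent|, |takesCourse|, and |GraduateCourse0| are
;; illustrative names, not the exact LUBM identifiers.
(retrieve (?x)
          (and (?x |GraduateStudent|)
               (?x |GraduateCourse0| |takesCourse|)))
```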
Before queries can be answered, the ABox is checked for consistency (see the “Consistency” curve). As can be expected, if there is no backtracking, state-of-the-art provers are very fast (Figure 2). If backtracking is required (Figure 3), runtimes for ABox consistency checking increase. ABox consistency checking can be done offline and corresponds to computing index structures in a database system.
Comparing the runtimes for query answering in Figures 2 and 3 (see the corresponding curves “Queries”) reveals that backtracking does not influence query answering to a large extent (at least not in the LUBM case we investigated). The total runtime is indicated with “Total”.
The results we achieved were possible only with dedicated storage management techniques (e.g., those offered by the implementation language Franz Allegro Common Lisp). On this basis it is possible to declare that all data structures for storing the LUBM TBox, ABox, and index structures are not to be examined by the garbage collector. If this is not done, garbage collection time dominates all other runtimes to a large extent: due to the large amount of data in LUBM, these data structures are examined over and over again by the garbage collector, and querying times increase in a superlinear way. The declaration of data structures as persistent (in the sense of being non-garbage) is issued after the ABox consistency check. The expression (declare-current-knowledge-bases-as-persistent) is used for declaring knowledge bases as persistent. Racer Systems offers consulting services to support users in maximally benefiting from these techniques in industrial applications.
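In a typical RacerPro session, the persistence declaration is therefore issued between the ABox consistency check and query answering. The sketch below uses (declare-current-knowledge-bases-as-persistent) as described above; the file name and the surrounding commands are illustrative assumptions about such a session, not a prescribed workflow.

```lisp
;; Illustrative session sketch (the file name is a placeholder):
(owl-read-file "lubm-50.owl")                    ; load TBox and ABox
(abox-consistent?)                               ; offline consistency check
;; Exempt the KB's data structures from garbage collection:
(declare-current-knowledge-bases-as-persistent)
;; Subsequent queries no longer pay repeated GC traversal costs.
```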
We take LUBM as a representative of the largely deterministic data descriptions that can be found in practical applications. The investigations reveal that description logic systems can be optimized so that they can also deal with large sets of logical descriptions quite effectively. LUBM allows us to study the data description scalability problem.
In addition to LUBM, in this section we also discuss the UOBM-Lite benchmark. It is also scalable and was tested with 1-5 universities, each with all departments. The characteristics of the KB and the benchmarks are shown in Figure 4. After GCI absorption, the logic of UOBM is f, and the ABox adds datatype properties. For 5 universities the benchmark comprises 138K individuals, 509K individual assertions, and 563K role assertions.
(CN = no. of concept names, R = no. of roles, U = no. of universities, L = load time, P = KB preparation time)
Each benchmark was evaluated with 15 grounded conjunctive queries designed by the authors of UOBM. The benchmark setup has the same structure as for LUBM. The runtimes given in Figure 4 show that RacerPro’s ABox consistency performance scales well for up to 10 universities. In contrast to LUBM, the UOBM benchmark does not allow the unique name assumption. The query execution time also scales well for up to 3 universities. However, for 4 universities it increased by a factor of 4, and for 5 universities the benchmark timed out after 30,000 seconds. The graph in the lower part of Figure 4 displays the curves for the ABox consistency test (dashed line), query execution (dotted line), and the total benchmark time (solid line). The non-linear trend is easily noticed. It is interesting to note that 99.86% of the query runtime is spent on 3 of the 15 queries. This performance calls for refining existing optimization techniques or designing new ones, which is a topic for future work.
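Such dominant queries can be identified by measuring each query individually, e.g., by wrapping every retrieve call in Common Lisp's standard time macro. The query body below is a placeholder, not one of the actual UOBM queries.

```lisp
;; Timing a single nRQL query; the concept name is a placeholder.
(time (retrieve (?x) (?x |Person|)))
;; Repeating this for all 15 queries reveals which few queries
;; account for almost all of the total query runtime.
```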
For answering ABox queries we have found RacerPro to be as fast as systems that are specifically tailored to answering ABox queries w.r.t. very specific TBoxes of low expressivity, such as that of LUBM. However, RacerPro can also handle ABox queries with respect to TBoxes which these systems cannot handle.