Abstract
On July 17, 2014, in Karlsruhe, Germany, Joris IJsselmuiden successfully defended his PhD thesis entitled “…”.

Defense ceremony, from left to right: Michael Beigl, Rainer Stiefelhagen, Joris IJsselmuiden, Dorothea Wagner, Jürgen Beyerer, Oliver Hummel, and Peter H. Schmitt.
In today’s staff exercises, individual, task-oriented feedback is hard to provide. Automatically generated behavior reports can improve this situation. They could be used to assess the performance of individual participants: How closely did they follow standard operating procedures? Who should have been part of which group? How long did it take them to complete specific tasks? To enable automatic report generation, the simulated crisis dynamics, field units, etc. need to be modeled, but so does the situation within the control room, which was the focus of this study. To this end, the developed system recognizes group behavior by modeling the different types of person-person and person-object interaction in various group formations.
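As a concrete illustration, one way such recognized interactions might be represented is as time-stamped event records that a behavior report can aggregate. This is a minimal sketch; the record fields and the helper function below are hypothetical, not taken from the thesis.

```python
from dataclasses import dataclass

@dataclass
class InteractionEvent:
    """Hypothetical record for one recognized interaction (illustrative only)."""
    label: str               # e.g. "conversation", "analyzing a document together"
    participants: list[str]  # person identifiers involved in the interaction
    objects: list[str]       # objects involved, e.g. "document", "display"
    t_start: float           # start time in seconds from exercise begin
    t_end: float             # end time in seconds
    truth_value: float       # fuzzy degree of confidence in [0, 1]

def time_on_task(events: list[InteractionEvent], person: str, label: str) -> float:
    """Total time a participant spent in interactions of a given type,
    one possible building block of an automatically generated report."""
    return sum(e.t_end - e.t_start
               for e in events
               if e.label == label and person in e.participants)
```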

Staff exercise at the State Fire Service Institute North Rhine-Westphalia. The case study’s goal is to recognize the interactions between staff members and objects – in this image: “conversation” (left/center), “analyzing a document together” (top center), and “editing a display” (top right).
Part two, reasoning, was performed using fuzzy metric temporal logic (FMTL) and situation graph trees (SGTs). Situation descriptions were generated for common group interactions in staff exercises. Optionally, FMTL rule parameters could be optimized by maximizing an adapted F-score. Another option was to use an adapted clustering algorithm as a preprocessing step. Clustering enriched the person descriptions in the annotated input data with cluster membership information based on the persons’ positions in the room. This simplified the subsequent reasoning process and allowed for a more intuitive approach, yielding more consistent and less redundant results. Clustering could also improve runtimes.
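To make the two ideas more tangible, here is a minimal sketch of a fuzzy spatial predicate with tunable distance parameters (standing in for learnable FMTL rule parameters) and of position-based clustering as preprocessing. The functions, thresholds, and the greedy clustering scheme are illustrative assumptions, not the thesis’s actual FMTL rules or its adapted clustering algorithm.

```python
import math

def fuzzy_close(p, q, d_full=1.0, d_zero=3.0):
    """Fuzzy predicate close(p, q): degree in [0, 1] that two persons are
    close to each other. d_full and d_zero are tunable distance parameters
    (illustrative stand-ins for optimizable rule parameters)."""
    d = math.dist(p, q)
    if d <= d_full:
        return 1.0
    if d >= d_zero:
        return 0.0
    return (d_zero - d) / (d_zero - d_full)  # linear falloff in between

def cluster_by_position(positions, radius=2.0):
    """Greedy single-link clustering of person positions (illustrative):
    a person joins the first cluster with any member within `radius`."""
    clusters = []
    for name, pos in positions.items():
        for cluster in clusters:
            if any(math.dist(pos, positions[m]) <= radius for m in cluster):
                cluster.append(name)
                break
        else:
            clusters.append([name])
    return clusters

positions = {"A": (0.0, 0.0), "B": (1.2, 0.4), "C": (6.0, 5.0)}
print(fuzzy_close(positions["A"], positions["B"]))  # ~0.87, partially close
print(cluster_by_position(positions))               # [['A', 'B'], ['C']]
```

Enriching each person description with a cluster label in this way lets subsequent rules reason over groups directly instead of over all pairs of persons, which is one intuition behind the reported consistency and runtime benefits.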
Part three, evaluation, was performed by quantitatively comparing the reasoning results to ground truth created using a self-developed annotation tool. The main performance measures were precision, recall, and F-score, plotted over different truth value thresholds. The evaluation also included a runtime analysis, an error analysis, experiments on noisy data, measurements of inter-annotator agreement, the effect of a new clustering parameter, and the effect of parameter learning.
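The threshold sweep can be pictured as follows: each recognized interaction carries a fuzzy truth value, and precision, recall, and F-score are computed at each cutoff. This is a minimal sketch assuming simple set-based matching of event keys against ground truth; the thesis’s actual matching criteria may differ.

```python
def prf_at_thresholds(detections, ground_truth, thresholds):
    """Precision, recall, and F-score at each truth value threshold.

    detections:   dict mapping an event key, e.g. (label, person, frame),
                  to its fuzzy truth value in [0, 1]
    ground_truth: set of event keys annotated as true
    """
    results = []
    for t in thresholds:
        predicted = {k for k, v in detections.items() if v >= t}
        tp = len(predicted & ground_truth)
        precision = tp / len(predicted) if predicted else 1.0
        recall = tp / len(ground_truth) if ground_truth else 1.0
        f_score = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        results.append((t, precision, recall, f_score))
    return results

detections = {("conversation", "A", 10): 0.9, ("conversation", "B", 10): 0.4}
ground_truth = {("conversation", "A", 10)}
for t, p, r, f in prf_at_thresholds(detections, ground_truth, [0.3, 0.5, 0.8]):
    print(f"threshold={t:.1f}  P={p:.2f}  R={r:.2f}  F={f:.2f}")
```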
Acknowledgements
Supported by Fraunhofer-Gesellschaft Internal Programs Grant 692 026.
