Abstract
In recent years, the pervasive use of sensors in smart devices, e.g., phones, watches, medical devices, has dramatically increased the availability of personal data. However, existing research on data collection primarily focuses on the objective view of reality, as provided, for instance, by sensors, often neglecting the integration of subjective human input, as provided, for instance, by user answers to questionnaires. This substantially limits the exploitability of the collected data. In this paper, we present a methodology and a platform designed for the collection of a combination of large-scale sensor data and qualitative human feedback. The methodology has been designed to be deployed on top of, and to enrich the functionalities of, an existing data collection APP, called iLog, which has been used in large-scale, worldwide data collection experiments. The main goal is to put the key actors involved in an experiment, i.e., the researcher in charge, the participant, and iLog, in better control of the experiment itself, thus improving the quality and richness of the data collected. The novel functionalities of the resulting platform are: (i) a representation of the situational context within which the data collection is performed, (ii) an explicit representation of the temporal context within which the data collection is performed, (iii) a calendar-based dashboard for the real-time monitoring of the data collection context(s), and (iv) a mechanism for the run-time revision of the data collection plan. The practicality and utility of the proposed functionalities are demonstrated in a case study involving 350 university students.
Introduction
In today’s world, digital interactions have become deeply integrated into daily life, generating vast amounts of personal data. This data, encompassing information about individual identities, preferences, activities, and interactions, is increasingly collected through digital devices, online services, and various monitoring technologies. For instance, smart devices, e.g., phones, watches, or medical devices, are equipped with numerous sensors that collect massive volumes of data about their owners. This type of data, often referred to as big data (Das & Kumar, 2013), is characterized by its vast volume, high velocity, and diverse variety, and allows for the identification of large-scale patterns and trends through advanced computational techniques. Despite its power, big data often lacks the contextual depth needed to fully understand the underlying human elements behind the numbers, that is, it fails to explain the subjective impulses that drive an individual’s actions. In contrast to the vast amount and speed of “big data”, thick data provides a qualitative description focused on human experience and behavior (Geertz, 2008). Thick data pertains to the abundant and detailed insights obtained from extensive qualitative research techniques such as ethnography, interviews, and participant observation. It prioritizes qualitative aspects such as human narratives, emotions, and cultural subtleties, i.e., it is a class of data sources that align with ethnographically collected and meticulously analyzed observational data. Building upon these two notions, Bornakke and Due (2018) define big-thick data as the convergence of big-thin data, e.g., usage analytics, sensor data, general Internet-of-Things data, with small-thick data, e.g., observations, interviews, and questionnaires. The intuition is that big quantitative data, prized for its objectivity and scalability, complements the contextual richness of the qualitative insights of thick data (D’Ignazio & Klein, 2023).
The notion of big-thick data was originally conceived having in mind the human-centric design of services, with the goal of blending statistical rigor with contextual relevance (Bogers et al., 2016, 2018). However, this intuition is very powerful and can be applied in almost all AI applications, and machine learning (ML) in particular. In ML, for instance, the user’s subjective interpretation of the current situation can help the machine build a better understanding of what is going on, for instance in order to enable better human-machine interactions (Bontempelli et al., 2022) or better machine-enabled social interactions (Fausto et al., 2021; Osman et al., 2021). The integration of these two types of data facilitates meaningful bi-directional human-machine collaboration by providing data that allows the machine to learn from human behavior and activities, as well as data that captures the human interpretation of those actions. It is not by chance that Bornakke and Due (2018) mention social media and experience-sampling-method (ESM) data as early examples of big-thick data. Building upon this intuition, Giunchiglia and Li (2024) show how big-thick data can be generated by integrating context-aware personal data, collected using both sensors and questionnaires, with data about the environment within which the personal data are collected, this being done by exploiting OpenStreetMap data enriched with other datasets carrying detailed information about the places involved. As shown in Giunchiglia and Li (2024), the result is a very rich dataset which, while being more focused and much smaller than the original datasets, makes it possible to learn about, and provide answers to, a much richer set of questions which integrate the objective view with one or more personal subjective views of the current situation.
However, at the current state of the art, the quality of the data collected from users is a major factor limiting the generation of big-thick data, when this is not done manually but, rather, delegated to a data collection APP. The goal of this paper is to describe a methodology and a platform that enable participants to provide high-quality personal data, as close as possible to the richness of big-thick data, while ensuring minimal disturbance to the user. The target audience consists of all the researchers who need the kind of data we aim to produce. At the moment we have identified at least four such groups: (i) researchers in AI and ML with a focus on personalized services; (ii) computational social science researchers, for whom the subjective component is key; (iii) psychology researchers, in particular those following the EMA/ESM methodology (see the related work section); and (iv) service design researchers with a focus on the problem of designing with data.
The starting point of the methodology is the identification of the three key roles around which the data collection process evolves, that is: (i) the researcher, namely the person who has designed the experiment and who, during its execution, monitors its evolution; (ii) the participant in the experiment (one or more, where the researcher can also be a participant), namely the person in charge of providing, via one or more mobile devices, the data to be collected; and, finally, (iii) the platform, collecting the data from the participants. The intuition is to develop a set of features, and corresponding mechanisms, through which these three roles gain increased awareness and control over the data collection process. We instantiate this intuition via four key functionalities, each building upon and extending the previous one.
A representation of the situational context within which the data collection is performed. The understanding of the local context (including the user’s physical and psychological context) is key to the idea of thick data (Geertz, 2008) and its relevance has been pointed out in most mobile applications, see, e.g., Huang et al. (2016), Intille et al. (2003), Runyan et al. (2013), Wang et al. (2014) and Zhang et al. (2021). Knowledge of the user’s physical, social, emotional, and informational states allows for a better interpretation of the vast amounts of sensor data collected (Chen et al., 2012), thus improving the relevance and quality of the data collected (Boyd & Crawford, 2012; Davenport & Dyché, 2013). The key innovation in this paper is that we focus on the context of the data collection as such. The machine works in some kind of meta-context whose sole goal is to increase the machine / participant / researcher’s awareness of the process of data collection, as a first step towards increasing control and, thereby, the quality and richness of the collected data. That is, our ultimate aim is to generate big-thick data about the process of generation of big-thick data, the former being a key enabler for the generation of the latter.
A representation of the temporal context within which the data collection is performed. By this we mean that an experiment is modeled as a plan where each action, e.g., a human answer to a machine question, or a sensor data collection, or a machine answer to a human question, is associated with a set of scheduling constraints and, after execution, with a set of execution annotations, encoding information about past, present, and future actions. Examples of planning constraints are, for instance, that a question can be asked only within a certain time frame, or that it should be asked only when at home. Examples of plan execution annotations are, for instance, that a question was not answered, or that it was answered with a delay of half an hour. To this extent, we have developed a representation language, called iLogCal, which makes it possible to represent all the context dimensions, both temporal and situational, and to use them to condition the activation of questions to the user as well as the start / stop of sensor data collection.
A calendar-based dashboard which allows all three roles to focus on specific elements of the temporal and situational context of the experiment. One of the key aspects is that iLogCal has been defined by extending (a subset of) iCal, the Internet calendar standard.
A set of mechanisms for the execution-time revision of the data collection plan. The data collection plan can be revised by a single participant, within the bounds set by the researcher, or by the researcher for one or more of the participants. The plan can also be revised by the platform itself, for instance, based on a ML algorithm which has learned the best / worst dates for getting the answers of the best quality. The control hierarchy proceeds from the researcher, to the participants, to the platform.
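To fix the intuition behind the second functionality, the following is a minimal sketch, in Python and with hypothetical names, of how a plan action, its scheduling constraints, and its execution annotations could be represented; iLogCal, the actual representation language, is described in Section 5.

```python
from dataclasses import dataclass
from datetime import datetime, time
from typing import Optional

# A minimal sketch (hypothetical names) of a plan action with scheduling
# constraints and post-execution annotations, as described above.

@dataclass
class PlanAction:
    action_id: str
    kind: str                                # e.g., "question" or "sensor_start"
    earliest: time                           # ask no earlier than this time...
    latest: time                             # ...and no later than this one
    required_location: Optional[str] = None  # e.g., ask only when at "home"
    delivered_at: Optional[datetime] = None  # execution annotations, filled
    answered_at: Optional[datetime] = None   # in after the action runs

    def annotation(self) -> str:
        """Encode what happened, e.g., 'not answered' or the answering delay."""
        if self.delivered_at is None:
            return "not delivered"
        if self.answered_at is None:
            return "not answered"
        return f"answered with a delay of {self.answered_at - self.delivered_at}"
```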
These four functionalities are being implemented as part of an integrated platform, an APP, called iLog, built on top of an earlier version of iLog itself (Zeni et al., 2014). The two core features of this earlier version of iLog are the possibility (i) of collecting sensor data from any number of sensors from one or more smart devices, and (ii) of collecting user-provided answers to questionnaires, which can be synchronic as well as diachronic. Since its first application in 2013, iLog has been used in many data collection campaigns, see, e.g., Bison et al. (2021), Giunchiglia et al. (2021), Maddalena et al. (2019) and Zeni et al. (2020). These experiments have made it possible to generate an extensive set of studies, see, e.g., Assi et al. (2023), Bontempelli et al. (2021), Giunchiglia and Li (2024) and Meegahapola et al. (2023), while at the same time highlighting the problems of data quality that motivate this work. This paper is a rather detailed description of the four functionalities described above and of how they are integrated, as part of a single platform, on top of the original version of iLog. This paper is structured as follows. Section 2 describes the related work. Section 3 introduces the main features of iLog and how it was used in an experiment, described in Giunchiglia et al. (2021), carried out as part of the WeNet project (Michael et al., 2025). The description is rather concise, focusing only on those aspects which relate to the four functionalities above. A detailed description of the resulting dataset (including GDPR and ethics compliance) is provided in Busso et al. (2025). Section 4 describes the situational context model. Section 5 introduces the main features of the temporal context model and iLogCal. Section 6 focuses on the monitoring process. Section 7 provides an example where the ML component improves the answer quality based on an analysis of how the temporal and situational context (and a few other parameters) influence how long a participant waits before starting to answer a question. The notions from Sections 4, 5, 6, 7 are exemplified on the experiment and dataset described in Section 3. Finally, Section 8 closes the paper.
Related Work
The intuition underlying the notion of context is very similar to that underlying the notion of big-thick data. That is, the knowledge of the local situation is key in order to provide machines with a good enough understanding of what is going on. This notion has been extensively studied, and most early studies on context were in Knowledge Representation (KR) and AI (Giunchiglia, 1993; McCarthy, 1987). Later on, Schilit and Theimer (1994) introduced the concept of context, defining it as involving “locations, identities of nearby individuals and objects, and changes to those objects.” Similarly, Brown et al. (1997) depicted context as being about “locations, varying user roles, time, seasons.” Dey et al. (1998) provide a definition of context which is closely aligned with our understanding, as “the user’s physical, social, emotional, or informational states.” Dey and Abowd (Abowd et al., 1999) define context in a more comprehensive manner. They state that “context is any information that can be used to characterize the situation of an entity. An entity is a person or object that is considered relevant to the interaction between a user and an application, including the user and applications themselves.” Existing research on the development of context-aware applications primarily concentrates on interruptions in context (Mishra et al., 2017), gathering user attention (Mehrotra et al., 2016), or enhancing the response rate of questions (Sun et al., 2021). These studies underscore the necessity of providing users with suitable times to facilitate user acceptance of information, with instances found in expert systems (Ye & Johnson, 1995) and, more recently, recommender systems (Afolabi & Toivanen, 2019). However, so far, no work has concentrated on the meta-context of the process of data collection. Overall, there has been minimal focus on providing the user with flexibility in answering questions or in driving the sensor collection flow. Typically, questions are dispatched by the researcher on a fixed schedule, with no possibility for the researcher, the participant, or the platform to control the data collection and, in particular, the participant’s response activity.
Numerous systems, which leverage data from mobile devices and wearable sensors, have found applications in health monitoring, aging care (Berke et al., 2011; Lee & Dey, 2015), and the understanding of human behaviors and traits. For instance, mobile sensing has proven invaluable for health and physical activity monitoring, where accelerometers, gyroscopes, and GPS sensors are used to track users’ movements and generate insights into exercise routines, sedentary behaviors, and overall activity levels (Dunton et al., 2012; Intille, 2016; Liao et al., 2015; Rabbi et al., 2015), and in research on comprehending and forecasting human behaviors and traits (Do & Gatica-Perez, 2012; Farrahi & Gatica-Perez, 2011; Harari et al., 2016; Peltonen et al., 2020; Wang et al., 2018b). Similarly, behavioral data such as sleep patterns, social interactions, and phone usage have been utilized to detect early signs of mental health issues like stress, depression, and anxiety (Bogomolov et al., 2014; Wang et al., 2018a, 2020). Some of this work has resulted in the development of several mobile sensing frameworks designed to support data collection and analysis. Some such examples are Campaignr (Joki et al., 2007) and Epicollect5 (Gohil et al., 2020), which focus on customizable and scalable data collection; CenceMe (Miluzzo et al., 2008) and Social fMRI (Aharony et al., 2011), which emphasize social context sensing; and Empath2 (Dickerson et al., 2015), EmotionSense (Lathia et al., 2013), and ESMAC (Bachmann et al., 2015), which specialize in emotion detection and behavior analysis.
Closer to our work is AWARE (Ferreira et al., 2015), a well-established mobile sensing framework designed for the collection of passive data through smartphone sensors. While AWARE provides a broad and extensible framework for environmental and behavioral context awareness, iLog excels in comprehensive activity tracking, making it a more effective tool for personalized research contexts where user-specific logging is critical. DemaWare2 (Stavropoulos et al., 2017) is another prominent framework designed for activity recognition and contextual reasoning. It uses sensor fusion and ontologies for the detection of complex activities. While DemaWare2 excels in identifying hierarchical activities through predefined rules and ontologies, iLog offers greater flexibility. DemaWare2’s focus on predefined activity hierarchies may not be as effective in capturing more fluid or personalized data, thus limiting its applicability in dynamic, real-world settings. The Effortless Assessment Research System (EARS) (Lind et al., 2023) is designed for passive monitoring of behavioral and psychological patterns, particularly in mental health research. It emphasizes effortless data collection by minimizing the need for active user participation. While this makes EARS ideal for studies requiring low participant burden, it may miss opportunities for richer contextual insights that can be derived from active input. Beiwe (Onnela et al., 2021) is a high-throughput digital phenotyping platform designed for mental health research and behavioral studies. Like iLog, it integrates passive sensing with active data collection (through surveys), allowing for personalized data collection and analysis. However, iLog offers a broader scope of application, extending beyond the mental health focus of Beiwe to include areas such as habit formation and lifestyle monitoring. Moreover, iLog stands out due to its emphasis on user engagement and ethical data collection practices, providing greater transparency and user autonomy over data sharing, which may not be as explicitly emphasized in Beiwe. RADAR-base (Folarin, 2019) is an open-source platform designed for longitudinal health studies, particularly in neurological and psychiatric research. It allows for the integration of multiple wearable sensors and mobile applications for the collection of health-related data. While RADAR-base excels in health-related contexts, iLog offers more flexibility across a broader spectrum of research domains. By combining passive sensor data with active surveys, iLog provides a richer, more personalized understanding of user behavior, mainly because of its ability to adjust data collection based on the context and the individual’s responses.
iLog At Work
The experiment that generated the data that we consider here as our motivating example involved students from the University of Trento, Italy. The experiment was conceived and designed following the mainstream approach in the Social Sciences, in particular in the development of time diaries, where the questions submitted to participants follow the HETUS standard. Zeni et al. (2020) describe a large-scale, European-level data collection experiment which was used to fine-tune the methodology used in this experiment. Following this methodology, as a first step, all students were invited via email to participate in a survey. From the 5,000+ respondents, a representative sample of 350 students was selected based on their fields of study and socio-demographic characteristics. To mitigate bias and noise, in this paper we consider the data of a selection of 170 students, where the choice is motivated by considerations related to the number of answers provided and to demographic characteristics, including gender, study degree, and department (see Table 1). iLog runs on the participants’ smartphones, both Android and iOS, and acts as an interface through which it is possible to capture annotations / tags / answers from participants. iLog allows for a wide range of question types (e.g., free text, fixed answers, take photo). The questions are sent at intervals defined inside the experiment plan. The experiment described here consisted of three diachronic time diaries, with varying timings and aims:
The first diary gathered general information about the day. At 08:00 AM, the participant received two qualitative questions about sleep quality and expectations for the day. At 10:00 PM, the participants were asked (A) to rate their day; (B) to identify any problems they encountered during the day; (C) to describe how they solved them; and, finally, (D) they received a question about the COVID-19 pandemic.
The second diary is a standard time diary with questions about three main activities and mood. Every half an hour for the first two weeks, and every hour for the subsequent two weeks, the participants received a smartphone notification with four questions (the first three based on the HETUS standard):
“What are you doing?”, allowing for 34 different answers;
“Where are you?”, allowing for 26 different answers;
“Who is with you?”, allowing for 8 different answers (including “being alone”);
“What is your mood?”, allowing for a scale of 5 levels ranging from happy to sad.
In the third time diary, the participants received an additional set of questions about food and drinks. These questions were asked every two hours, outside the main meal hours.
Figure 1 shows some of the questions (and pre-compiled answers) asked during the data collection. iLog automatically collects sensor data in the background without requiring user intervention. Researchers are given the flexibility to define the frequency of data collection for each single sensor. In total, iLog allows the collection of data from 34 sensors, categorized into three groups as follows:
Hardware (HW) sensors, the sensors typically found in smartphones, collect information about the surrounding environment. See Table 2 for the list of HW sensors collected during the experiment.
Software (SW) sensors, also typically found in smartphones, collect data about the SW events involving the operating system and the APPs. See Table 3 for the list of SW sensors collected during the experiment.
Question-Answering (QA) sensors collect information about the events associated with the question answering process. See Table 4 for the list of QA sensors.

Figure 1. Sample Questions Captured in the WeNet Project.
Table 1. Selected Participants’ Demographic Information.
Table 2. HW Sensors.
Table 3. SW Sensors.
Table 4. QA Sensors.
Differently from HW and SW sensors, QA sensors are specific to iLog and, as far as we know, are not found in any other data collection APP. They are the key element which enables the design and implementation of the scheduling language iLogCal described below and, therefore, of the entire data collection methodology described in this paper. Looking at Table 4, it is possible to notice two sets of QA sensors. The first set, concerning Time Diaries, is used to answer questions about the context of the experiment, while the second set, concerning additional Tasks, is used to get information about the data collection process. As an example of a task, a user may be asked to confirm a previous answer, or to state whether (s)he achieved a specific task, e.g., returning a missed phone call. Inside each set of QA sensors we have three types of sensors, as follows:
Time Diary/Task question: when a question is generated, ready for delivery;
Time Diary/Task confirmation: when a question is delivered to the device of the participant (who may then look at it at any moment after this);
Time Diary/Task answer: when an answer is stored, together with the difference between the answer time and the notification time (the notification time being the time, defined by the researcher, when the question is to be submitted to the participant).
As described in detail in Bison et al. (2024), the reaction time, also called the response time, that is, the time difference between when one receives a question and when (s)he starts filling in the answer, and the completion time, that is, the time taken to fill in an answer, are key factors which impact the quality of an answer. Here, by the quality of an answer, we mean an answer which has been meaningfully provided (and not just dropped) and which is correct. It can be noticed that response time and completion time can be easily computed, for each question and (possibly missing) corresponding answer, from the information provided by QA sensors, as illustrated in the sketch below.
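As a minimal sketch (with hypothetical field names; the actual events are the QA sensors of Table 4), both quantities can be derived as follows:

```python
from datetime import datetime
from typing import Optional, Tuple

# Sketch: deriving response and completion time from QA sensor timestamps.
# Field names are hypothetical; the actual events are those of Table 4.

def answer_times(
    delivered_at: datetime,          # question delivered to the device
    started_at: Optional[datetime],  # participant starts filling in the answer
    stored_at: Optional[datetime],   # answer stored by iLog
) -> Tuple[Optional[float], Optional[float]]:
    """Return (response_time, completion_time) in seconds,
    or (None, None) for a missing answer."""
    if started_at is None or stored_at is None:
        return None, None
    response_time = (started_at - delivered_at).total_seconds()
    completion_time = (stored_at - started_at).total_seconds()
    return response_time, completion_time
```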
The work in Bison et al. (2024) also provides evidence of the fact that response and completion time are highly impacted by the situational and temporal contexts. iLog has various features which allow: (i) to compute this information and, even more importantly, (ii) to provide extreme flexibility in the configuration of the data collection. These two features are key, together with QA sensors, for learning about which factors influence the user behavior. Let us consider some examples. First, the researcher has a wide variety of question types to select from. This can be exploited to ask questions which are not related to the data to be collected as part of the experiment as such, but which are about the meta-context of the data collection process. For instance, as done in Zeni et al. (2019), under certain conditions, the participant can be asked to confirm a specific answer. Second, the possibility of configuring the data collection frequency independently for each sensor makes it possible to collect data whose main purpose is solely that of monitoring the experiment evolution. As an example, collecting the GPS when asking a question about the current location allows iLog to validate the correctness of the answer. This idea is exploited in the work described in Giunchiglia et al. (2018) and Zeni et al. (2019). The same applies to Bluetooth or to any other sensor which provides information about the situational context of the question-answering process. As another example, any question, and related answer, about the current situational context is key for collecting information about the meta-context at the precise moment when an answer is provided. This idea is exploited in the work described in Bison et al. (2024) for computing the best moment for asking a question. Third, the information provided by SW sensors, if integrated with the information about the experiment temporal context, makes it possible to understand and correlate the activities performed by participants. For instance, Giunchiglia et al. (2017b) exploit this information to correlate academic performance and social media usage, while Kasinidou et al. (2024) use this information to detect the usage of social media during lectures.
As a conclusive remark, it is worth noticing that the amount of personal information that has been collected in this experiment, and that, in general, can be collected using iLog, is huge, thus raising important privacy- and ethics-related issues. The approach that we follow is based on three main pillars. The first is that the use of iLog follows a very precise GDPR- and ethics-aware methodology; the details of how this was applied in the experiment described in this section are reported in Busso et al. (2025). The second is that our main focus is on research-motivated data collections. The third is that iLog is being redesigned to store and keep all the data in the participant’s device. Ultimately, in the next version of the platform, the participant will be in full control of the data and of how to use them for his/her own purposes. In turn, this will allow for the possibility, at the moment unexplored, of increasing the participant’s self-awareness of his/her lifestyle and habits, modeled as (life) sequences of situational contexts (see Section 4) and visualized by the dashboard (see Section 6). Some early ideas in this direction are provided in Li et al. (2022).
The Situational Context
The notion of context used here is an elaboration of the notion of context first introduced in Giunchiglia et al. (2017a) and further extended in Xiaoyue et al. (2022). As a motivating example, let us consider a small portion, lasting around a couple of hours, of the everyday life of one of the students participating in the experiment described in Section 3, as represented in Figure 2. Let us call this person me. As shown in Figure 2, the activities of me consist of the following:

Figure 2. An Example of an Everyday Life Sequence.
During a first period of time T1 (green box), me is at a pizzeria having lunch with the friend John; they are having pizza and me is happy;
Then, in the following period of time T2 (orange box), me is driving to work, alone, and is in a worried mood;
Finally, during T3 (blue box), me is in a meeting in the office with the colleague Bob, and me is in a neutral mood.
Following the terminology introduced in Xiaoyue et al. (2022), what is represented in Figure 2 is a specific instance of a (small) fragment of the life of me, written as the Life Sequence of me.
We assume that me is involved in only one personal context at a time. This models the intuition that a situational context is associated with a single location, that is, that moving from one location to another means changing context, and that, at any given moment, a person can be in only one place. A life sequence fully covers the period under consideration, but there may be elapsed times between one context and the next in the sequence. What makes a set of contexts a life sequence is not the time sequentiality but the fact that they are functionally related by some overall motivation or purpose. Some examples of life sequences are: the lectures in a morning, which may or may not have an elapsed time in between, depending on whether the classes are in different rooms (where two classes in the same room can be modeled as a single context or as the sequence of two contexts in the same location); the lectures of the same course in a semester; the editions of the same course along five years; a portion of everyday life as in Figure 2; a full day; and so on.
We model a situational context in terms of five components, as follows (from now on we drop the argument me whenever no confusion arises):

C(me) = ⟨WE, WA, WI, WO, WU⟩

where:
WE, the so-called spatial context, is a linguistic description, e.g., a label or some text provided in a formal or natural language, describing the location where me is at the moment. Information about it can be obtained from sensor data as well as from iLog questions. In the experiment described in Section 3, the HW sensors that can be used to compute the location are, e.g., GPS or Wi-Fi. The name of the spatial context is the label (selected from a set of predefined ones) provided by the answer to the question “Where are you?”.
WA, the so-called activity or event context, is a linguistic description of the activities being currently performed by me. A single context may contain one or more activities which, in turn, can be performed in sequence or in parallel (Li et al., 2022). Information about this can be obtained from sensor data as well as from iLog questions. In the experiment described in Section 3, a HW sensor that can be used to know about the physical activities is the accelerometer, whereas SW sensors can be used to know about the online activities a person is doing, while QA sensors allow to know when me is involved in which question-answering activities. The name of the activity context is the label (selected from a set of predefined ones) provided by the answer to the question “What are you doing?”.
WI, the so-called internal (activity or event) context, is a linguistic description describing the internal activities occurring inside me. Information about this can be obtained from the sensor data (e.g., heart beat, blood pressure) provided by medical devices or smart watches, as well as from iLog questions. In the experiment described in Section 3, no sensor could provide this type of information; the only question providing this type of information was “What is your mood?”.
WO, the so-called social context, is a linguistic description describing the people, possibly none, who are with me at the moment. Information about it can be obtained from sensor data as well as from iLog questions. In the experiment described in Section 3, the HW sensors that can be used to compute the social context are, e.g., Bluetooth or the microphone (via speaker recognition). The social context is described by the label (selected from a set of predefined ones) provided by the answer to the question “Who is with you?”.
WU, the so-called material or (tool / utensil) context, is a linguistic description describing the tools, possibly none, which are used or usable by me. Examples of tools are: the car used in a trip or the phone used in the interaction with a friend. In the experiment described in Section 3, no sensor and no question was used to provide this type of information. It could be obtained from sensor data (e.g., Bluetooth, RFID, Wi-Fi) as well as from a dedicated question.
The scenario in Figure 2 can be modeled by a knowledge graph (KG) (Giunchiglia et al., 2023), see Figure 3. We can identify the following components:
a (sub-)KG for each context, including the internal context;
a node for each entity involved, e.g., Person, Room, and Furniture;
an attribute and corresponding value for each node / entity; for instance, the attributes of ME (whose context we are describing) are Name, Mood, Notification time and Answer time;
an edge for each relation between two entities; for instance, office is

Figure 3. The Knowledge Graph of the Third Situational Context in Figure 2.
A life sequence is represented as a set of KGs. As will be described in detail in the next section, any of the attributes and relations occurring in one or a combination of KGs can be used as a precondition enabling or disabling a question or a task of iLog.
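As an illustration, the following sketch (in Python, with attribute values taken from Figure 2; the names are ours, not iLog’s internal representation) shows the five-component context and a life sequence as an ordered collection of contexts:

```python
from dataclasses import dataclass
from typing import List, Optional

# Sketch of the five-component situational context model of this section.
# Attribute values are linguistic labels, e.g., answers to the iLog questions.

@dataclass
class SituationalContext:
    we: str                # spatial context, e.g., "Pizzeria"
    wa: List[str]          # activity context, one or more activities
    wi: Optional[str]      # internal context, e.g., the mood "happy"
    wo: List[str]          # social context, possibly empty (alone)
    wu: List[str]          # material / utensil context, possibly empty

# A life sequence: contexts functionally related by an overall purpose;
# elapsed time between consecutive contexts is allowed.
LifeSequence = List[SituationalContext]

lunch = SituationalContext("Pizzeria", ["having lunch"], "happy", ["John"], [])
drive = SituationalContext("Car", ["driving to work"], "worried", [], ["car"])
meeting = SituationalContext("Office", ["meeting"], "neutral", ["Bob"], [])
my_day: LifeSequence = [lunch, drive, meeting]   # the scenario of Figure 2
```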
The Temporal Context
We model the temporal context using iLogCal, a scheduling language developed on top of iCal, the Internet calendar standard (RFC 5545). Using the iCal standard provides three primary benefits, namely:
An explicit and declarative representation of the activities involved in an experiment. This, in turn, allows to easily modify the resulting schedule, for instance via a graphical interface, e.g., by eliminating questions or adjusting response times;
The possibility of using the iCal Recurrence Rule (RRule) for the formulation of recurring activities;
Access to advanced open-source graphical interfaces in the form of calendar-like visualizations, e.g., Fantastical Calendar.
iLogCal organizes the specification of an experiment in three main components, as follows:
The general schedule aggregating all the different components;
The question answering component;
The sensor data component.
We introduce and discuss below a snapshot of the three iLogCal components using the Extended Backus-Naur Form (BNF) notation. We use the following conventions: terminal symbols are written in the font of the text of the paper.
The BNF of an experiment general schedule is reported in Figure 4. We have the following observations.

Figure 4. Experiment General Schedule.
A user may be associated with multiple calendars; this allows a user to participate in multiple experiments in parallel. A life sequence consists of the data collected by one or more calendars;
Each calendar contains multiple context collections; this allows for an articulated specification of an experiment, while maintaining the unity of the same experiment;
Each context collection allows for any number of question collections as well as sensor collections. This facilitates the specification of multiple diverse data collections within the same context collection.
Identifiers, e.g.,
As an example of instantiation of the schedule in Figure 4, the experiment described in Section 3 was organized as follows. Three calendars were created to collect data from participants. The first calendar gathered general questions about the day, the second calendar collected the time diary questions detailed in Section 3, and the third calendar compiled additional questions regarding food and drinks. Notably, all three calendars gathered the same sensor data. The researcher then decided how to use the sensor data collected from the different calendars. The details of this step are discussed in the following subsections.
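As a sketch of the recurrence mechanism that iLogCal inherits from iCal, the following hypothetical code expands by hand the schedule of the second diary of Section 3, one notification every 30 minutes for the first two weeks and one every hour for the following two; in iLogCal the same schedule is stored declaratively, via the iCal RRULE mechanism.

```python
from datetime import datetime, timedelta
from typing import Iterator

# Sketch: hand-expanding the recurring schedule of the second time diary.
# iLogCal stores this declaratively as an iCal Recurrence Rule (RRULE).

def diary_notifications(start: datetime) -> Iterator[datetime]:
    switch = start + timedelta(weeks=2)   # the frequency changes after two weeks
    end = start + timedelta(weeks=4)      # the experiment lasts four weeks
    t = start
    while t < end:
        yield t
        t += timedelta(minutes=30) if t < switch else timedelta(hours=1)
```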
The BNF of the iLogCal question collection is reported in Figure 5. We have the following observations.
We have five possible values for the context dimension of a question precondition, corresponding to the five components WE, WA, WI, WO, and WU of the situational context.

Figure 5. Question Collection.
We illustrate the use of the BNF in Figure 5 by applying it to the data in Figure 3. We select WA as the value for
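A minimal sketch of such a precondition, reusing the SituationalContext structure sketched in the previous section (attribute names are illustrative, not iLogCal syntax), is the following, where a question is enabled only under a given spatial (WE) and activity (WA) context:

```python
# Sketch: a situational precondition enabling a question, in the spirit of
# the question collection BNF (attribute names are illustrative).

def question_enabled(ctx: "SituationalContext") -> bool:
    # E.g., enable the question only when me is having lunch at the pizzeria,
    # as in the first context of Figure 2.
    return ctx.we == "Pizzeria" and "having lunch" in ctx.wa
```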
The BNF of the iLogCal sensor data collection is reported in Figure 6. The structure is essentially the same as that used for question collections and exploits a similar set of nonterminal symbols.

Figure 6. Sensor Data Collection.
Monitoring the Data Collection
Monitoring the data collection entails tracking how well an experiment is executed based on a comparison with a predefined experiment plan. The first main component for experiment control is the iLog System Administration Component (ISAC), a tool which allows for the creation of the experiment plan/calendar. ISAC allows researchers: (1) to create the experiment plan; (2) to visualize the experiment timeline, where researchers can see the entire study schedule at a glance, making it easier to plan and adjust the various phases of the data collection process; and (3) to adapt the sampling frequency dynamically, thus adjusting the intervals at which data is collected or questionnaires are sent out. Figure 7 depicts ISAC when used to generate a question in the definition of the experiment plan. The information necessary includes all the elements of the BNF defined in the previous section and, in particular, the name and description of the question, its repetition frequency, the available answer options, and the scheduled day and time for sending it out. As a complement to ISAC, the platform features a component where the researcher can also set various parameters which are then used to rank the quality of the participants’ involvement in the experiment, see Figure 8. These parameters include: the maximum allowed number of unanswered questions, and the average maximum completion and response time. During the execution of the experiment, a participant’s performance can be ranked against these parameters (see the sketch after Figure 8).

Figure 7. Using the iLog System Administration Component (ISAC) to Generate a Question.

Figure 8. Quality Parameters Used to Rank Participants.
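A minimal sketch of this ranking (the parameter names and the boolean outcome are our assumptions; Figure 8 shows the actual parameters) is the following:

```python
from dataclasses import dataclass
from datetime import timedelta

# Sketch: checking a participant against the quality parameters of Figure 8.
# Names and the boolean outcome are illustrative assumptions.

@dataclass
class QualityParameters:
    max_unanswered: int
    max_avg_response: timedelta
    max_avg_completion: timedelta

def within_bounds(unanswered: int,
                  avg_response: timedelta,
                  avg_completion: timedelta,
                  p: QualityParameters) -> bool:
    return (unanswered <= p.max_unanswered
            and avg_response <= p.max_avg_response
            and avg_completion <= p.max_avg_completion)
```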
The second main component is a dashboard which enables researchers and participants to efficiently track the progress of the experiment execution. It provides real-time insights and control, ensuring that researchers maintain the quality and integrity of the data collection process while also supporting participant engagement and compliance. This includes tracking both sensor data and participants’ responses to questions. Key features include:
Live Data Feeds. This module allows researchers to view the incoming data as it is collected, providing immediate visibility of the participant activity and data trends in real time.
Compliance Tracking. This module displays participant compliance rates, showing who is completing the required tasks (answering questions) and who might be falling behind. This allows researchers to quickly identify and address potential issues.
Data Quality Checks. This module consists of a set of algorithms which discover possible participant misbehavior and/or errors in the data. This is used to notify researchers, and also participants, so that they can take corrective actions promptly.
Advanced Analytics. This module incorporates data filtering techniques that enable researchers to conduct real-time data analysis, swiftly discovering trends and creating insights.
The main interface of the dashboard is reported in Figure 9. This is the main interface presented to the user when (s)he logs in. In Figure 9 we can identify (left to right, top to bottom) the following elements:
The user is presented with the quality parameters set by the researcher (see Figure 8). The researcher can modify them by calling the module mentioned above, while the participant can only view them.
This section is a summary of the number of participants in the experiment in real time. This section is visible only to the researcher.
This section reports the progress of the experiment, in terms of the number of days covered or left.
Question delivery is key when monitoring an experiment. This section shows the level at which questions are being delivered to the user.
For any experiment, the number of answers given affects its overall quality. The researcher is presented with an average of all the answers in the experiment, whereas participants view a summary of their own answers.
As with questions and answers, the sensors section helps the user understand the sensors being collected. The frequency of collection is also reported.

Figure 9. Dashboard: Summary of an Experiment.
The dashboard main view is integrated with various additional visualizations which focus on specific aspects of the data collection. For instance, Figure 10 displays a heatmap of the participants’ responses for each experiment day, with green denoting participants with a high answer rate and yellow denoting those with a low answer rate. Participants are represented on the

Figure 10. Heatmap of Participants’ Answers per Day.

Figure 11. A Participant’s Data as Seen from the Dashboard.

Figure 12. Comparison of the Answering Behavior of Three Other Participants.
Improving the Answer Quality
We are interested in collecting context information via smartphone questions. However, these questions, when asked frequently, can become intrusive, especially when they interrupt users during periods of activity or while on the move. As a consequence, the quality of the answers from humans is not always as high as needed. Users often do not read, or do not answer, or provide wrong answers to machine-asked questions, or turn off their data collection APP, and more; see, e.g., Bison et al. (2024) and Bontempelli et al. (2020). This problem can arise from various factors (Furnham, 1986), such as recall bias, where participants do not accurately recall previous activities (Porta, 2014), or missing or incorrect responses (Bison et al., 2024; Schneier, 2015). It becomes particularly acute when one tries to scale the collection of big-thick data to (life-)long, human-in-the-loop human-machine interactions (Bontempelli et al., 2022), that is, the applications which motivate the work described in this paper. To address these challenges, it is essential to develop methodologies that not only optimize the timing of questions to minimize interruptions but also improve the overall quality of the responses collected.
By answer quality, we focus here on the number of correct answers, rather than on the number of missing answers. As already mentioned in Section 3, the state of the art suggests that the quality of answers is influenced by the reaction time (Bison & Zhao, 2023; Bison et al., 2024; van Berkel et al., 2019): the shorter the reaction time, the higher the quality of the answers, a fact also aligned with the recall bias theory. The aim of this section is to describe a ML component, exploiting the notions of situational and temporal context described above, capable of learning when to ask a question so as to minimize the reaction time. The proposed ML algorithm exploits the following information (a feature encoding sketch is given after this list):
Temporal context. We consider the day of the week, represented numerically from 1 (Monday) to 7 (Sunday), plus the hour of the day, organized into four time periods: Morning (6 AM to 11 AM), Afternoon (12 PM to 5 PM), Evening (6 PM to 11 PM), and Night (12 AM to 5 AM).
Situational context. We consider the answers to the three questions “Where are you?” (the spatial context), “What are you doing?” (the activity context), and “Who is with you?” (the social context).
Demographics. We consider information that is consistently used to characterize individuals through ascriptive and acquisitive traits (Blau & Duncan, 1967). Namely, we used the gender, degree, and department of each participant.
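A sketch of the corresponding feature encoding (column names are hypothetical; the actual data come from the QA sensors and the participant survey) could look as follows:

```python
import pandas as pd

# Sketch of the feature encoding described above (column names hypothetical).

def encode_features(df: pd.DataFrame) -> pd.DataFrame:
    weekday = df["timestamp"].dt.dayofweek + 1    # 1 = Monday .. 7 = Sunday
    period = pd.cut(df["timestamp"].dt.hour,
                    bins=[-1, 5, 11, 17, 23],     # night / morning / afternoon / evening
                    labels=["night", "morning", "afternoon", "evening"])
    # Situational context: answers to the three HETUS-based questions.
    ctx = pd.get_dummies(df[["where", "what", "with_whom"]])
    # Demographics: gender, degree, and department of the participant.
    dem = pd.get_dummies(df[["gender", "degree", "department"]])
    return pd.concat([weekday.rename("weekday"),
                      pd.get_dummies(period), ctx, dem], axis=1)
```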
All three dimensions above play an important role in the answer quality. As an example, we report below the results of a statistical analysis of the spatial context, the social context, and the temporal context.
For instance, as shown in Table 5, high-quality answers constitute a mere 28.87% of the answers given in restaurants or pubs. In contrast, in academic settings such as classrooms or university libraries, high-quality answers constitute 52.02%. Focusing on low-quality answers, restaurants have the highest percentage of low-quality answers, at 71.13%, likely because, in a relaxed and enjoyable atmosphere, participants pay little attention to the questions, which leads to incorrect answers. Friends’ houses rank third, with a low-quality answer percentage of 70.24%, as visiting friends or attending social gatherings often leads to positive interactions and experiences, drawing the focus away from the smartphone questions.
If we move to the temporal context and analyze the data collected during the 28 days of the experiment, we can distinguish various distinct patterns across the different weekdays, as shown in Table 6. Specifically, high-quality answers are most prevalent on Monday (46.94%) and Saturday (44.46%), with the lowest incidence observed on Friday (39.31%). Conversely, the incidence of low-quality answers peaks on Friday (60.69%) and Thursday (59.48%). These findings suggest a correlation between the day of the week and the answer quality. Notably, the onset of the workweek and of the weekend (Monday and Saturday) witnesses a relative surge in high-quality answers. Similar considerations can be made for the social context, see Table 7.
Table 5. Statistics of Varying Answer Quality at Different Locations.
Table 6. Statistics of Varying Answer Quality on the Different Days of the Week.
Table 7. Statistics of Varying Answer Quality with Different People Present.
As part of the flexibility that the platform provides, we can apply various ML models to the data collected. As shown in Table 8, in this work we have used random forests (RF), K nearest neighbors (KNN), logistic regression, support vector machines (SVM), and Gaussian Naive Bayes. The classifiers were trained using 5-fold cross-validation on the comprehensive training and testing sets for all participants, with 80% of the data allocated to the training set and the remaining 20% to the testing set. The goal was to predict whether the participant would answer the question within 30 minutes, this being the time within which the answer is most likely to be correct, as from Bison et al. (2024). As depicted in Table 8, RF surpassed the other four classifiers, achieving a prediction accuracy of 0.758. These results demonstrate that our component can effectively predict whether the participant will answer questions within 30 minutes, thus predicting the answer quality based on the context information collected through questions and sensors. The RF classifier was then selected for the next step, namely predicting the answer quality of each participant. Focusing on each specific participant, we used their first two weeks of data to train the RF algorithm and then predicted their answer quality in the subsequent two weeks. As illustrated in Figure 13, the accuracy of the answer quality predictions varies for each participant. Participant 244 demonstrated a high prediction accuracy of 88.61%; this participant consistently provided high-quality answers during the periods identified by the algorithm, and was most of the time alone and at the university. Participant 30 showed a moderate prediction accuracy of 77.55%; this participant provided many different context labels, across different places and activities, and this variability in the context information made the answer quality hard to predict. Participant 137 exhibited the lowest prediction accuracy of 65.89%; this participant provided a non-negligible number of wrong answers, e.g., declaring to be driving while being in the university classroom or in the library. A minimal sketch of the evaluation protocol is given below.
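For reference, a minimal sketch of the evaluation protocol (using scikit-learn, and reusing the hypothetical df and encode_features of the previous sketch) is:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score, train_test_split

# Sketch of the protocol described above: 80/20 split plus 5-fold cross-
# validation, predicting whether an answer arrives within 30 minutes.
X = encode_features(df)                       # see the previous sketch
y = df["response_time_minutes"] <= 30         # hypothetical label column

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

clf = RandomForestClassifier(random_state=42)
cv_accuracy = cross_val_score(clf, X_train, y_train, cv=5).mean()
clf.fit(X_train, y_train)
test_accuracy = clf.score(X_test, y_test)     # Table 8 reports 0.758 for RF
```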
The overall conclusion is that we have a good average level of predictability which does not decay much in the worst case, with very good results in the best situations.

Figure 13. The Prediction Results of Each Participant, with Some Participant Ids Made Explicit.
Table 8. Answer Quality Prediction Results of the Different Machine Learning Classifiers.
Note. KNN = K nearest neighbor; SVM = support vector machine.
Conclusion
This paper introduces a novel methodology and platform, an enhanced version of the iLog APP, for the collection of large-scale sensor data and qualitative human feedback. Our main contributions are as follows:
a language for modeling the situational context based on five key dimensions, that is: spatial, activity, internal, social, and utensil;
a language for modeling the temporal context, thus enabling a precise scheduling of the various aspects of the data collection, which remains very flexible and modifiable during experiment execution;
a dashboard component which enables both researchers and participants to edit and monitor the progress of the experiment plan, as a prerequisite for enhancing the quality of the data collection;
a ML component which allows the platform to infer the best moment to take an action, for instance, to ask a question.
We foresee two avenues for future research. The first is the exploration of ways to validate and verify the user responses and to reduce the burden of answering questions. To address this issue, the starting point will be the work on Skeptical Learning (Zhang, 2019; Zhang et al., 2022), which allows handling mislabeling in personal context recognition. The second is the validation of the methodology and of the platform in other domains, with a specific interest in health and entertainment.
Acknowledgment
The authors are grateful for the interactions and feedback from the people working in WeNet.
Funding
The author(s) received the following financial support for the research, authorship, and/or publication of this article: The research by Fausto, Ivan, and Leonardo was funded by the European Union’s Horizon 2020 FET Proactive project “WeNet–The Internet of us”.
Conflicting Interests
The author(s) declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
