Abstract
We present an e-therapy framework that collects live therapeutic context by analyzing body joint data in a noninvasive way. Using our proposed framework, a therapist can model complex gestures by mapping them to a set of primitive gesture sequences and generate high-level serious game-based therapies. As a proof of concept, we have developed scenarios that express a hemiplegic patient's behavior as a set of trackable primitive gestures. We have used Second Life and a map-browsing-based serious game environment for an immersive experience. The initial feedback from the therapists who have tested our framework is encouraging. Finally, we share the implementation details and an analysis of the test results.
1. Introduction
A context-aware e-therapy system assumes that each patient is surrounded by a smart 3D space that can identify him/her at home or outdoors; recognize his/her behaviors, therapeutic activities, and health conditions; and assist the patient according to his/her individual preferences anytime and anywhere. Hence, e-therapy related research has been a focal point for many entities such as patients, caregiver social networks, governments, research, healthcare, and medical institutions, and the software industry [1]. This is because the context-aware e-therapy research domain offers high-quality healthcare by leveraging recent advancements in multidisciplinary research domains such as health sensors, smart body area networks, cloud computing, online games, and computer vision techniques. For example, various computing platforms such as Microsoft Kinect [2], Microsoft Digits [3], the Vicon camera [4], and LEAP Motion (https://www.leapmotion.com/), to name a few, have recently emerged that help in identifying physiological and gait parameters from a therapy session in real time. Using state-of-the-art gesture recognition techniques, a therapist can identify therapeutic activities such as flexion, extension, adduction, abduction, circumduction, pronation, and supination of different body parts [5]. Although it is hard to calculate all the metrics using a single existing framework, we can create a mashup of state-of-the-art platforms to detect and track a rich set of body gestures. For instance, because Microsoft Kinect lacks the precision to detect minute hand gestures, we can augment it with a 3D motion sensor such as LEAP. Kinect, for example, cannot detect forearm rotational movements such as pronation and supination, or flexion/extension of the MCP or PIP joints of the hand. In summary, a mashup framework [6] is envisioned to detect and generate a rich set of kinematic data.
Analyzing these multimodal sensory data would give a rich set of a patient's context so that a therapist can evaluate the current state of a patient and decide the next levels of therapy.
To add to this advancement, the rapid growth of Web 2.0 has made e-therapy applications ubiquitous. Thanks to social networks, a patient can now always be connected with his/her community of interest, such as the therapist, caregiver family members, medical facilities, and government. Moreover, due to the ubiquitous availability of 4G and Wi-Fi networks, a therapist can now observe the activities of a patient live and guide the patient, even when the patient is at home [7]. Advancement in networked games such as Second Life has also made an impact on the e-therapy environment. For example, Second Life can create a live network of a very large number of online patients, medical staff, therapists, and caregiver social networks [8, 9]. Advancements in the above areas have resulted in a context-aware e-therapy framework that is aware of the events and disability level of a patient and is sensitive, flexible, and responsive to the needs, practices, and gestures of a patient. By leveraging these advancements, it can then seamlessly provide therapeutic services whenever and wherever they are needed.
The existing literature and recommendations from therapists suggest that if correct and prescribed therapies are practiced by a disabled patient, he/she can regain enough muscle power to move the affected joints like a normal person [10]. Each therapy module is intended to move a certain set of body joints and muscles through some body motions of the affected area. This helps in regaining and producing enough muscle power to support different actions. Researchers in the medical sector and computing professionals have teamed up to address the challenge of developing specialized gesture tracking technologies [11–14]. According to therapists, a successful therapy ultimately aims to achieve the following four aspects. Firstly, help the patient return to their normal state through personalized therapy modules. Secondly, if the disability is incurable, provide personalized therapy support instead so that the patient can do his/her day-to-day physical activities at home or at school. Thirdly, since disabled patients spend a significant amount of time in their homes, they need to continue the therapy sessions with the aid of their caregiver family members while at home. Finally, the therapy data originating from each therapy session needs to be stored in a network repository that is accessible to the caregiver institution, medical professionals, and therapists to analyze quality-of-improvement metrics. These four aspects form the basis of the research presented in this paper.
As a proof of concept, we leverage the existing state of the art and propose a 3D serious game environment specifically tailored for disabled children with hemiplegia [2, 7, 15, 16]. Although previous studies have used wearable sensors to capture the body motion of disabled patients [12, 17–21], this work mashes up Microsoft Kinect and LEAP as a noninvasive way of capturing motion from 35 joints of the human body (see Figures 4 and 5). This noninvasive multisensory approach aims to map the body motions as they occur in the physical environment to those of the relevant avatar in the 3D virtual environment. The developed clinical repository stores the session data, which comprises physical activities performed through 35 joints of a human body. Algorithms have been developed to parse the session data to produce live animation in 3D, as well as graphical plots of kinematic data. The multisensor multimedia environment helps us capture primitive natural gestures of both the therapist and the patients. The 3D virtual world allows parents at home, disabled children, the caregiver institution, medical professionals, and the therapist to be integrated in one virtual world. Using the proposed framework, a therapist can define different levels (e.g., simple, intermediate, and advanced) of therapy sessions, where each unique therapy session is designed for a unique type of disability and the types of activities to be monitored during the session.
2. Framework Design
2.1. Motivating Scenario
Before we go into the framework modeling, let us consider several scenarios in which each patient, having diversified therapeutic needs, has to be dealt with separately. We assume that a disabled children's hospital has several left hemiplegic patients [22] who have different levels of disability in their left body parts. We distinguish between intrapatient and interpatient disability. Intrapatient disability is defined as follows: at the time of admission, a hemiplegic patient has a certain disability in different joints of his/her body; after exposure to different therapies, the range of motion (ROM) of the patient is expected to improve toward normal. This development over time constitutes the intrapatient disability. Interpatient disability, in contrast, refers to how one patient's disability level differs from that of other patients. Since every patient has different physiological development, we define a scenario for intrapatient disability, which can be extended to interpatient disability as well.
We assume a child named Alice suffers sudden numbness on her left side. Her parents took her to the nearest disability hospital, where the doctors classified her condition as left hemiplegia and suggested that her parents take her to a hemiplegia therapist to determine the type of hemiplegia and arrange the needed therapy so that she can regain physical strength in the affected body parts. At her first visit to the assigned therapist, the therapist asked Alice to make some gestures and motions to assess the current state of her joints and muscles. Based on these initial physical movements, the therapist rates the severity level of Alice's hemiplegia and chooses different therapy modules for her. Alice's physical condition is such that she cannot wear complex gloves or wearable devices; rather, a noninvasive way of tracking her movements is highly desirable. At every visit to the therapist, Alice's kinematic data is recorded in her health record so that the therapist can monitor the quality of improvement of the targeted muscles and joints that are affected.
Since most of the time Alice is at home, her therapist wants Alice to carry on the therapy exercises with the help of her mom or other family members. In order to assist Alice's caregiver family members, the therapist has defined some ideal therapy sessions and shared them with Alice so that she can do those exercises in a 3D game environment. The therapist also wants Alice to conduct the required number of therapy sessions at home, with the results available to him for review. The therapist wants to keep track of certain quality-of-improvement metrics that signify that the right therapy module has been given to Alice and that she is improving at the desired pace. The therapist is interested in observing statistical data regarding quality of improvement over weekly, monthly, and yearly periods. As Alice progresses, the therapist wants to increase the complexity and difficulty level to bring Alice closer to her normal counterpart. To align with the distributed nature of the therapy scenario, Alice's health and session data need to be stored in an online repository.
From the above scenario and suggestions from other researchers [23], we conclude that a patient with hemiplegia should be exposed to a multimedia environment that can track the necessary joints, muscles, and actions and produce the kinematic metrics that support a therapist's decision making. Existing research also suggests that a noninvasive way of measuring kinematic data is desirable, since it does not require wearable computing and medical devices and allows the patient to exert natural gestures [24]. Next, we present the therapeutic context modeling, similar to [25]. First, we model individual actions in terms of human body joints. We then combine a subset of primitive actions to form a high-level therapy action.
2.2. Primitive Therapies
We first define how primitive therapies can be mashed up to create a high-level therapy. This serves as a context vocabulary or schema that can be used to verify the completeness of a unique high-level context. We express the set of therapeutic context primitives as follows [25]: c1 = patient wrist flexed (event context); c2 = patient forearm pronated (event context); c3 = patient elbow at 90 degrees (event context); c4 = patient elbow extended (event context); c5 = patient wrist extended (event context); c6 = patient forearm supinated (event context); c7 = time is afternoon (temporal context); c8 = location is home (spatial context).
If (c1) and (c2) and (c3) and (c4) and (c5) and (c6) and (c7) and (c8), then context = patient did medial epicondylitis therapy at home in the afternoon.
To illustrate the context primitives pertaining to other dimensions of life, let us consider the following additional contexts of a user: “patient is grasping an object by hand,” “patient is poking,” “patient is walking,” and “patient is moving his forearm clockwise and then anticlockwise.” Leveraging the topology tree [25], we can break down each of the four high-level contexts into smaller chunks of primitive contexts, as shown below. Again, the following breakdown of primitives is just one instance among many possibilities, depending on the suggestion of the therapist. c1 = patient thumb MCP joint flexed (event context); c2 = patient thumb PIP joint flexed (event context); c3 = patient thumb DIP joint flexed (event context); c4 = patient index MCP joint flexed (event context); c5 = patient index PIP joint flexed (event context); c6 = patient index DIP joint flexed (event context); c7 = patient middle MCP joint flexed (event context); c8 = patient middle PIP joint flexed (event context); c9 = patient middle DIP joint flexed (event context); c10 = patient ring MCP joint flexed (event context); c11 = patient ring PIP joint flexed (event context); c12 = patient ring DIP joint flexed (event context); c13 = patient pinky MCP joint flexed (event context); c14 = patient pinky PIP joint flexed (event context); c15 = patient pinky DIP joint flexed (event context); c16 = patient palm surface curled (event context).
If (c1) and (c2) and (c3) and (c4) and (c5) and (c6) and (c7) and (c8) and (c9) and (c10) and (c11) and (c12) and (c13) and (c14) and (c15) and (c16), then context = patient is grasping an object by hand.
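Evaluating such a conjunction rule is mechanical once each primitive has been reduced to a boolean by the sensing layer. The following minimal Python sketch illustrates the idea using the c1–c16 grasp rule above; the function names and frame representation are illustrative, not the framework's actual API:

```python
# A high-level therapy context is the conjunction of primitive contexts.
# The rule below mirrors the c1..c16 grasp example; the dictionary-based
# frame representation and function names are hypothetical.

def detect_high_level_context(primitives, rule):
    """Return the context label if every required primitive holds, else None."""
    if all(primitives.get(name, False) for name in rule["requires"]):
        return rule["context"]
    return None

grasp_rule = {
    "requires": [f"c{i}" for i in range(1, 17)],  # c1 .. c16
    "context": "patient is grasping an object by hand",
}

# A frame in which every finger joint is flexed and the palm is curled.
frame = {f"c{i}": True for i in range(1, 17)}
print(detect_high_level_context(frame, grasp_rule))
# -> patient is grasping an object by hand
```

If any single primitive is false (say, the pinky DIP joint is not flexed), the conjunction fails and no high-level context is emitted, matching the all-or-nothing rule form used above.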
Similarly, other high-level therapy contexts can be broken down into their primitive actions in terms of joints and motions, which in turn can be tracked by the 3D depth and 3D motion sensors. However, the rule for formulating a high-level user context is completely flexible, and a therapist can help in deducing all such user contexts in terms of the primitive actions. Please note that we separate sensors into two categories: joint-tracking sensors and non-joint-tracking sensors. The joint-tracking sensors are Kinect, which tracks body joints and gestures except for hand joints and motions, and LEAP, which tracks only actions originating from the wrist to the DIP joints of the hand. Non-joint-tracking sensors are those we use in our framework for sensing other phenomena such as location, ambient temperature, and heart rate. Figure 1 shows some example therapy actions that we use in our modeling in terms of primitive gestures.

Example of primitive motion types: (a) pronation/supination at forearm, (b) adduction/abduction at shoulder and hip joint, and (c) (from top left) flexion/extension at knee, wrist, elbow, fingers, shoulder, hip, and vertebral column joints, respectively (courtesy: Irving P. Herman).
Table 1 shows sample expressions of complete definitions of user contexts that uniquely define distinct therapeutic states of a patient. Please note that the complete context definition is domain specific and can be designed by expert therapists during modeling and design time. For example, a therapist can provide the threshold of each joint's range of motion, the levels of difficulty, and so forth.
Deducing therapeutic context primitives from Mashup framework.
Using the above modeling technique, we can break down a complex therapy action such as “walking” into primitive joints and actions. Figure 2 shows an initial walking position (left); after three steps the same position is reached (rightmost), making one complete walking gait cycle. This complex walking therapy can be modeled by simultaneously looking at the hip, knee, and ankle joints. For example, at every step of the walking cycle, the hip flexion/extension, knee flexion/extension, and ankle dorsiflexion/plantar flexion values determine the physical posture of the walking sequence. By looking at these joint parameters, we can easily deduce the gait cycle and the range of motion of each joint during this therapy.
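The per-joint flexion/extension values used in the gait analysis above can be computed from the 3D joint positions the sensors provide. A small sketch, assuming joints are given as (x, y, z) tuples (the coordinate values below are illustrative, not sensor readings):

```python
import math

def joint_angle(a, b, c):
    """Angle at joint b (in degrees) formed by segments b->a and b->c,
    e.g. knee flexion from the hip (a), knee (b), and ankle (c) positions."""
    v1 = [a[i] - b[i] for i in range(3)]
    v2 = [c[i] - b[i] for i in range(3)]
    dot = sum(x * y for x, y in zip(v1, v2))
    n1 = math.sqrt(sum(x * x for x in v1))
    n2 = math.sqrt(sum(x * x for x in v2))
    return math.degrees(math.acos(dot / (n1 * n2)))

# A fully straightened leg: hip, knee, and ankle are collinear,
# so the knee angle is ~180 degrees (no flexion).
print(round(joint_angle((0, 1, 0), (0, 0.5, 0), (0, 0, 0))))
# -> 180
```

Tracking this angle frame by frame at the hip, knee, and ankle yields the flexion/extension curves from which the gait cycle and each joint's ROM can be read off.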

A walking gait cycle in terms of ankle, knee, and hip joint movements.
2.3. Database Design
In order to hold the therapy primitives and complex/compound therapy details, we designed a database that stores details about disability types, therapy types, the types of motions involved in each therapy type, the joints and muscles to be tracked in each motion type, the metrics that store those joint and muscle values, the normal range of each joint and motion, and improvement metrics for each disability type, to name a few. Figure 3 shows a model where each entity is connected to other entities in the database. The patient domain refers to the set of disabled persons in our system. Each disabled patient is assigned one or more therapy modules from the therapy domain, which is the set of therapies available to the system. Each therapy is mapped to one or more quality-of-improvement metrics. Each metric is composed of variables in terms of different therapeutic motions that are mapped to a subset of motions from the motion domain. Each motion is composed of a subset of body muscles and joints that are mapped to the muscle and joint domains, respectively.
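The patient–therapy–metric–motion–joint chain described above can be sketched as a relational schema. The table and column names below are our own illustrative assumptions, shown via an in-memory SQLite database; they are not the framework's actual schema:

```python
import sqlite3

# Illustrative schema following the entity chain in the text:
# patient -> therapy -> metric -> motion -> joint.
schema = """
CREATE TABLE patient (patient_id INTEGER PRIMARY KEY, disability_type TEXT);
CREATE TABLE therapy (therapy_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE patient_therapy (
    patient_id INTEGER REFERENCES patient(patient_id),
    therapy_id INTEGER REFERENCES therapy(therapy_id));
CREATE TABLE metric (metric_id INTEGER PRIMARY KEY,
    therapy_id INTEGER REFERENCES therapy(therapy_id), name TEXT);
CREATE TABLE motion (motion_id INTEGER PRIMARY KEY,
    metric_id INTEGER REFERENCES metric(metric_id),
    name TEXT, normal_min REAL, normal_max REAL);
CREATE TABLE joint (joint_id INTEGER PRIMARY KEY,
    motion_id INTEGER REFERENCES motion(motion_id), name TEXT);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(schema)
tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type='table' ORDER BY name")]
print(tables)
```

The `normal_min`/`normal_max` columns capture the normal ROM per motion mentioned in the text, so improvement metrics can be computed by comparing recorded session values against these ranges.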

Database design methodology to support the e-therapy framework.

Interface showing different trackable joints of a body. The trapezoids covering the joints of hand are tracked by 3D motion sensor LEAP and the rest of the joints are tracked by Kinect.

Joint tracking capability of (a) Kinect and (b) LEAP has been mashed up into the framework.
2.4. Motion Analysis
The motion analysis algorithm for the LEAP and Kinect streams is shown in Algorithm 1. The motion analyzer receives each frame from LEAP and Kinect and delegates it to the appropriate model component, which has the therapeutic logic to parse and detect which motion is taking place at which joint.
MotionAnalyzer (PatientID, TherapyID)
Get LeapStream, KinectStream;
Begin
  Read joints and movements to be tracked from the database for the given PatientID and TherapyID;
  Foreach LeapFrame in LeapStream Begin
    Foreach joint and movement tuple Begin
      Call appropriate function to process the joint and its related motion;
      Update visualization window with related metrics;
    End
  End
  Foreach KinectFrame in KinectStream Begin
    Foreach joint and movement tuple Begin
      Call appropriate function to process the joint and its related motion;
      Update visualization window with related metrics;
    End
  End
End
Algorithm 1
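Algorithm 1's dispatch loop can be sketched in Python as follows. The streams, the per-joint handler registry, and the frame representation are stand-ins for the actual LEAP/Kinect SDK objects:

```python
# Sketch of Algorithm 1: dispatch every frame from both sensor streams to
# the handler registered for each prescribed (joint, movement) tuple.
# Streams are modeled as lists of dicts; real SDK frames differ.

def motion_analyzer(patient_id, therapy_id, leap_stream, kinect_stream,
                    tracked, handlers):
    """Process each frame against the joints/movements prescribed for this
    patient and therapy, returning the metrics that would be visualized."""
    results = []
    for frame in list(leap_stream) + list(kinect_stream):
        for joint, movement in tracked:
            value = handlers[(joint, movement)](frame)
            results.append((joint, movement, value))  # update visualization
    return results

# Hypothetical handler: read the elbow angle straight out of the frame.
handlers = {("elbow", "flexion"): lambda f: f["elbow_angle"]}
frames = [{"elbow_angle": 45.0}, {"elbow_angle": 90.0}]
out = motion_analyzer(1, 1, frames, [], [("elbow", "flexion")], handlers)
print(out)
# -> [('elbow', 'flexion', 45.0), ('elbow', 'flexion', 90.0)]
```

The `(joint, movement) -> handler` registry mirrors the "call appropriate function" step of Algorithm 1: adding support for a new primitive motion only requires registering one more handler.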
3. Implementation
3.1. Authoring Interface
In order to help a therapist combine primitive therapy actions into a complex, high-level therapy sequence, we have designed a web interface as shown in Figure 4. Using this anatomical model, a therapist can associate a subset of body joints with a subset of primitive or high-level actions and other trackable metrics. For example, a therapist can choose different low-level primitive therapies as shown in Table 1 by mapping each primitive to a joint using Figure 4 and then combine them into a high-level therapy, similar to the one shown in Table 1.
3.2. Computing Platform and Sensors
We mash up the Kinect-based joint-tracking and LEAP-based hand-tracking frameworks into one single framework. The Kinect framework gives us the joint data shown in Figure 5(a), and the LEAP framework provides joint-tracking capability for the following hand joints: the distal interphalangeal (DIP) and proximal interphalangeal (PIP) joints with 1 degree of freedom (DoF) each; the metacarpophalangeal (MCP) joints with 3 DoF; and an extra 3 DoF for the trapeziometacarpal (TM) thumb joint. We have explored other non-joint-tracking sensors, the details of which are out of the scope of this paper but can be found in [25, 26].
The therapy environment consists of a PC running Windows 8 with 12 GB of RAM and a Kinect for Windows device attached to capture joint data. We have used the Kinect for Windows SDK 1.8 to capture the kinematic data. We use the Microsoft Speech Platform SDK v11.0 to provide voice commands such as “start,” “stop,” “record,” “pause,” “save,” and “exit.” We have developed analytics to capture raw joint data from 35 different joints of a subject per second. As the 3D game environment, we have used Second Life (see Figure 6) and a GIS-based 3D map browsing environment. We have used the 3D motion sensor from Leap Motion; its effective range extends from approximately 25 to 600 millimeters above the device (see Figure 7). We have also launched a 3D web version of the framework. As shown in Figure 7, the actual physical movement (both angular and rotational) of the fingers is synchronized with that of the 3D hand, allowing a subject to observe the therapeutic activities in the 3D world over the web. The web version is based on HTML5 and the three.js (http://threejs.org/) 3D JavaScript framework. 2D motion data analysis and plotting have been implemented using PHP and JPGraph (http://jpgraph.net/).
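The voice commands above naturally drive a small session state machine. The sketch below illustrates one plausible set of states and transitions; the state names and transition table are our assumptions, not the framework's actual control logic:

```python
# Hypothetical session controller for the voice commands
# "start", "stop", "record", "pause", "save", "exit".
# States and transitions are illustrative assumptions.

TRANSITIONS = {
    ("idle", "start"): "running",
    ("running", "record"): "recording",
    ("recording", "pause"): "paused",
    ("paused", "record"): "recording",
    ("recording", "stop"): "running",
    ("running", "save"): "running",   # persist session data, stay running
    ("running", "exit"): "idle",
}

def handle_command(state, command):
    """Return the next session state; unrecognized commands are ignored."""
    return TRANSITIONS.get((state, command), state)

state = "idle"
for cmd in ["start", "record", "pause", "record", "stop", "exit"]:
    state = handle_command(state, cmd)
print(state)
# -> idle
```

Ignoring unrecognized (state, command) pairs keeps a misheard speech recognition result from corrupting the session, which matters when the subject cannot easily reach a keyboard.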

An in-home Kinect and Second Life assisted elbow flexion/extension therapy session performed by a subject.

An in-home LEAP and Web 2.0 assisted hand therapy module where (a) a subject with all fingers in the extension state is sensed by the LEAP, (b) live 3D mapping of the physical fingers with a virtual hand showing the extension state, and (c) live 3D mapping of the physical fingers with a virtual hand showing the flexion state.
As a sample exercise, we have implemented a therapy consisting of six movements of the forearm and two joints. The game considered in this test scenario is a 3D map browsing session (see Figure 8) in which a subject browses a map by going left (radial deviation), going right (ulnar deviation), zooming in (wrist flexion), zooming out (wrist extension/hyperextension), and circling the airplane (pronation/supination). LEAP is used to monitor pronation and supination of the forearm as well as flexion, extension, radial deviation, and ulnar deviation of the wrist. We have used the following types of games in different test scenarios.

Experimental setup and user interface: (a) shows the experimental setup and (b) shows the user interface. (A) Kinect device, (B) LEAP device, (C) live 3D rendering of the Kinect-provided skeleton, (D) live rendering of the LEAP stream showing the hand skeleton, (E) inverse kinematic interface where feedback from the Kinect stream appears, (F) interface where feedback from LEAP appears, (G) a user playing a 3D map browsing game, tracked by both LEAP and Kinect to deduce rotational and angular motions from the hand joints, (H) 3D serious game window, (I) control menus, and (J) the hand represented by a flying kite.
The Red Ball Game. In this game, the user moves a blue ball with the movement of her hand. A number of red balls attack the blue ball from different angles, and there are some black holes and yellow balls on the screen. The user has to push each yellow ball with the blue ball and make it fall into a black hole, while making sure that the blue ball itself does not fall into a black hole. There are also some green balls on the canvas; hitting them gives the user bonus points.
3D Map Browsing. In the 3D map browsing game, a user flies through a city terrain fetched from the Nokia Here Map Server. Moving the hand right or left moves the paper plane in the respective direction. Up and down movements of the hand translate into flying higher or lower, respectively. There is a lower threshold on the flying altitude, so the plane never crashes into the ground.
2D Map Browsing. In the 2D map browsing game, the user can request a map from any of more than six available map servers, including Nokia, Google, and ESRI. The cursor appears in the form of a grey globe on the screen. When the user clenches his fist, the globe changes to green and the map is locked. Moving the clenched fist right or left pans the map right or left, respectively. Moving the clenched fist forward and backward translates into north and south motion. To perform the zoom operation, two clenched fists are brought into the view of the LEAP device to lock the map. If the fists move in opposite directions, the zoom-in operation takes place. Similarly, when the fists are brought closer to each other, the zoom-out operation is executed.
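The two-fist zoom rule from the 2D map browsing game reduces to comparing the distance between the fists across consecutive frames. A minimal sketch, in which the fist coordinates and the movement threshold are illustrative:

```python
import math

# Sketch of the two-fist zoom gesture from the 2D map browsing game:
# fists moving apart -> zoom in; fists moving together -> zoom out.
# Positions are (x, y) pairs; the threshold value is an assumption.

def zoom_action(prev_fists, curr_fists, threshold=10.0):
    """Return 'zoom_in', 'zoom_out', or None from two consecutive
    frames, each holding the positions of the two clenched fists."""
    def dist(pair):
        (ax, ay), (bx, by) = pair
        return math.hypot(ax - bx, ay - by)
    delta = dist(curr_fists) - dist(prev_fists)
    if delta > threshold:
        return "zoom_in"      # fists moved apart
    if delta < -threshold:
        return "zoom_out"     # fists moved closer together
    return None               # movement too small to count

print(zoom_action([(0, 0), (50, 0)], [(0, 0), (100, 0)]))
# -> zoom_in
```

The dead-band threshold keeps small involuntary tremors, which are common in hemiplegic subjects, from triggering spurious zoom operations.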
4. Test Results
Most of the experiments were conducted with the test subject at a distance of about 2 meters from the Kinect sensor, although the distance ranged from 1.5 to 3.5 meters (see Figure 8). In most cases, the subject directly faced the Kinect sensor, in front of which a computer screen displayed the live performance of their movements and the rendered skeleton within the multimedia environment. For LEAP Motion, we assumed the setup shown in Figure 7. Our initial system was developed targeting therapists only. We contacted therapists from three different disabled children's hospitals who specialize in dealing with hemiplegic patients, frequently collecting their feedback and updating our system accordingly. The therapists themselves used our system and gave us useful data for our analysis, because we wanted to make sure that our system adheres to the therapeutic standards and requirements before testing it with actual subjects. Since all the test data we present in this paper was collected from either a therapist or a healthy subject, we assume all therapies to be active therapies; that is, a therapist or a caregiver family member sitting outside the view of the Kinect or LEAP gave instructions while the subject performed the tests herself.
Table 2 shows the detection accuracy for different motions, along with the maximum and minimum range of motion and the source joint-tracking platform, that is, Kinect or LEAP. Using the primitive motions shown in Table 2, we can analyze many high-level therapy modules, and using our analytical model, we can deduce the kinematic data of these primitive motions. Next, we show how accurately both Kinect and LEAP can detect different primitive and complex therapeutic motions.
Detected primitive motion details at different joints (LP: LEAP, KI: Kinect).
Before deploying our framework, we worked with three disability hospitals in Makkah, namely, the Basma Center for Disabled Kids, the Disabled Children Association Hospital (http://www.dca.org.sa/), and the Al Noor Hospital. Finally, we deployed our proposed hand therapy framework in Al Noor Hospital (http://alnoorhospital.com/HandTherapy.aspx), Makkah, Saudi Arabia, where the therapist allowed us to test our framework on a patient, a 33-year-old male suffering from hemiplegia due to a fall from the third floor of a building. He had been treated in a conventional, manual method with a goniometer before being introduced to our system. Since the patient had mobility problems, the therapist suggested deploying our framework in his home, where the therapy environment looked similar to Figure 9.

A patient using our proposed hand therapy framework (identity blurred for privacy reasons).
Before the therapy session started, the patient's range of motion was measured with a goniometer, and the speed of movement was measured with a stopwatch. Similar measurements were taken at every interval between the exercises. In addition, the values obtained from the framework were compared against these measurements. Figure 10 shows hand therapy session data obtained at two different points in time. We first introduced our framework to the therapist for one month (March 2014), during which we obtained significant suggestions about data collection and graph plotting metrics. The therapist asked the patient to perform a “hand elbow flexion-extension” therapy 11 times in each session. Observation 1 was performed by the patient on July 13, 2014, and Observation 2 was captured on July 15, 2014, both at his home. The x-axis shows the range of motion (ROM), where the elbow joint can produce a maximum of 180 degrees when fully extended and 20 degrees when fully flexed. The y-axis shows the number of frames captured by our framework. As can be seen from the maximum extension ROM of the two readings, the data showed significant improvement between the two observations (Observation 2 produced a greater maximum extension angle). This demonstrates the effectiveness of the game environment and the framework's ability to show the therapist the effectiveness of a particular therapy, as well as the improvement rate of the patient.
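The between-observation comparison described above amounts to comparing the maximum extension ROM reached in each session. A small sketch of that computation; the angle samples below are illustrative, not the patient's actual readings:

```python
# Sketch of the quality-of-improvement comparison: the maximum elbow
# extension angle reached in each session, and the change between them.
# The session angle samples (degrees) are illustrative data only.

def max_extension_rom(frames):
    """Maximum elbow angle (degrees) reached over a session's frames."""
    return max(frames)

session_1 = [30, 80, 120, 150, 140, 90]   # e.g. Observation 1
session_2 = [30, 85, 130, 165, 150, 95]   # e.g. Observation 2
improvement = max_extension_rom(session_2) - max_extension_rom(session_1)
print(improvement)
# -> 15
```

The same per-session reduction can be applied over weekly or monthly groups of sessions to produce the longer-term improvement statistics the therapist asks for.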

Range of motion data obtained from a patient (see Figure 9) having hemiplegia.
Figure 11 shows the quality of improvement for two different hand therapies over three different sessions. In the wrist pronation graph (Figure 11(a)), the x-axis shows the frame number while the y-axis shows the value of the palm normal. Initially, the palm normal is −1, which means the palm is facing downwards; this state is called pronation. In the third session (khaki color), the hand is supinated to its fullest, so the graph goes all the way up to +1. The other sessions show a partial ability to move the hand from the pronated state to the supinated state. In the wrist adduction graph (Figure 11(b)), the zero line represents the position where the hand makes no angle with the forearm in the horizontal plane; thus, the middle finger points straight ahead. Radial and ulnar movements of the hand produce positive and negative values, respectively.
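Reading pronation/supination off the palm normal, as plotted in Figure 11(a), can be sketched as a simple classifier on the normal's vertical component: −1 is fully pronated (palm down) and +1 is fully supinated (palm up). The threshold and state labels below are illustrative assumptions:

```python
# Classify forearm state from the vertical component of the LEAP palm
# normal, following the -1 (pronated) .. +1 (supinated) convention in
# Figure 11(a). The 0.8 threshold is an illustrative assumption.

def forearm_state(palm_normal_y, threshold=0.8):
    """Return 'pronated', 'supinated', or 'transitional'."""
    if palm_normal_y <= -threshold:
        return "pronated"
    if palm_normal_y >= threshold:
        return "supinated"
    return "transitional"

print([forearm_state(y) for y in (-1.0, 0.0, 1.0)])
# -> ['pronated', 'transitional', 'supinated']
```

A session in which the classifier never leaves the pronated/transitional band, as in the first two sessions of Figure 11(a), indicates that full supination has not yet been regained.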

Comparing different sessions of a patient's hand therapy data.
5. Conclusion and Future Work
In this paper, we have proposed a therapy modeling framework that enables a therapist to design a complex, high-level therapy by combining a set of primitive therapies. Using our modeling technique, each therapy can be mapped to a set of body joints and motions. We have also developed a mashup architecture where two of the most popular joint and motion tracking sensors, Microsoft Kinect and LEAP, are combined into one framework to record a wide range of body gestures. A therapist can create a complex therapy, store it in an online store, and share it with his/her disabled patients, who can then perform those therapies online. The developed open-source framework is web based, so a patient can perform the therapies anytime and anywhere. We have developed analytics to parse the Kinect- and LEAP-generated joint and motion data to infer high-level therapy actions such as walking and moving an object. We have developed a 3D game environment where each game represents a particular therapy action and generates metrics and a live plot after each gameplay. A therapist or an authorized caregiver family member can visualize live or statistical plots of the captured joint and motion data to observe the improvement of the patient's affected joints over time.
Our initial test results show that this e-therapy architecture has the potential to support more detailed clinical data mining and analysis. We will address the current shortcomings and feedback from the therapists to make the framework robust, and we look forward to deploying it to real patients. We are in the process of integrating more primitive actions so that a therapist can compose a wider range of high-level therapies. Eliminating environmental as well as hardware-induced noise is another step we are focusing on, since there can be many distractions in real-life scenarios when we deploy the framework to real patients.
Conflict of Interests
The author declares that there is no conflict of interests regarding the publication of this paper.
Acknowledgments
This project was supported by the NSTIP Strategic Technologies Program (11-INF1703-10) in the Kingdom of Saudi Arabia. The author would also like to thank Dr. Farooque Alwari, Dr. Saleh Basalamah, Ahmad Qamar, and Delwar Hossain of Advanced Media Laboratory of Umm Al-Qura University for helping in demo and usability testing.
