Abstract
As the number of applications grows, smartphones with powerful processors and a variety of sensors have become the mobile devices that users keep at hand. On the one hand, many applications offer personalization functions that deliver services of interest to users, and these functions rely on gathering and analyzing sensor data and other sensitive information. On the other hand, attackers can accurately classify the activities of mobile users from these data. As a result, the risk to users' privacy has risen sharply, and mobile users currently cannot control how applications handle the privacy of their sensor data. To address this issue, we present P3Android, whose architecture supports applications' personalization functions in Android according to users' personae and uses the concept of a profile to protect sensitive information from unauthorized access. Furthermore, we provide a privacy protection service for legacy applications, which enforces different protection policies according to the risk level of each application. Together, these components provide both personalization awareness and privacy protection in Android. Experiments show that P3Android is feasible and effective.
1. Introduction
Nowadays, smartphones are taking the place of traditional PCs in many ways. At the same time, they bring unprecedented convenience to users and gradually transform the way people live. For example, according to AppFigures [1], there are 1.43 million Android applications (apps. for short) in Google Play, more than the number of iOS apps. in the App Store. Such a large number of apps. increases the competitive pressure on developers and companies, which explains the importance of supplying various personalization functions. However, the current implementation of personalization functions mostly relies on gathering and analyzing users' sensitive information, including system information, user data, device resources, and application data [2], which may threaten and violate users' privacy. Many malicious sensor applications already exist: they probe the user's physical environment through a series of sensors, eavesdrop on the user's credit card number through the microphone, and detect key vibrations with the acceleration sensor. PlaceRaider [3] constructs rich, three-dimensional models of indoor environments by leveraging the mobile phone's camera and sensors. Remote burglars can thus “download” the physical space, study the environment carefully, and steal virtual objects from the environment. Accelprint [4] exploits hardware imperfections introduced during the sensor manufacturing process, which cause every sensor chip to respond differently to the same motion stimulus; such accelerometer fingerprints make it easy to track a user over space and time. Aviv et al. [5] demonstrated how to use the accelerometer to learn the tap- and gesture-based input required to unlock smartphones with a PIN/password or Android's graphical password pattern. Hasan et al. [6] proved the feasibility of leveraging the transmission channels of sensor data to mount remote attacks. Many apps. collect location information through location-based services. Analyzing these data can reveal sensitive information about service recipients, such as their home locations, lifestyles, and health conditions, which can then be used for targeted business services, advertising, and so forth [7].
Unfortunately, the privacy protection mechanism of Android itself cannot adequately prevent the leakage of user data. For example, the “all-or-nothing” nature of the permission check model leaves users unaware of which behaviors of an app. violate privacy protection rules. Although there has been research on privacy protection, such as fine-grained permission access control, isolation, policy enforcement, encryption, and security authentication, these works pay little attention to personalization requirements and, additionally, require users to have considerable technical knowledge of the system and of privacy. Wei et al. [8] analyzed how the set of Android permissions has evolved and found that the system permission set tends to grow, and that the growth is aimed at offering access to new hardware features rather than at providing finer-grained permissions. A particular concern is that Android third-party and preinstalled apps. do not follow the principle of least privilege, which prevents the system's permission review mechanism from protecting private information effectively. The malware DroidDream and its simplified version, DroidDreamLight [9], broke the sandbox mechanism to steal a large amount of data, affecting the privacy of tens of thousands of users. Additionally, malicious applications can bypass the constraints of the sandbox mechanism because permission transfer is left unchecked [10]. A sensor-based voice privacy theft attack named CPVT is presented in [11]; CPVT can be disguised as a normal Android app., and the attacker can fully control the attack process without the victim's knowledge.
Research Question. How can the privacy protection and personalization support requirements of smartphone users be balanced at the same time? To cope with this challenge, we propose a new architecture that supports client-side, personalization-aware privacy protection for Android, and we have implemented a prototype called P3Android. P3Android consists of two system services: the personalization support service (PSS) and the privacy protection service (PPS). PPS assesses each app.'s risk level through machine learning and enforces different privacy protection policies accordingly, for example, by simulating sensitive data. PSS is responsible for providing personalization support, such as different skins and content display styles. It abstracts each user's personal profile from prediction results and then provides personalization functions at the system layer by modifying the Android Java framework. P3Android takes care of privacy-sensitive access permissions, asks for and stores users' decisions, and supplies a set of APIs to third-party developers who want to customize their personalized services. P3Android advocates a benign and secure programming environment, which ultimately prevents sensitive user data from leaking to untrusted apps.
Research Contributions. The contributions of this paper are as follows:
(1) Use the personae profile to support OS-wide personalization for Android applications, and learn users' personae profiles automatically through a support vector machine (SVM) model.
(2) Propose a new personalization-aware privacy protection architecture that enforces different security policies according to apps.' risk levels and provides several APIs that help app. developers use users' private data securely and offer better service according to the users' personae profiles.
(3) Implement a prototype system named P3Android, based on Android 5.0, which consists of two main services and two apps. Experiments show that P3Android can protect users' privacy and provide personalization awareness support to apps. with nearly negligible performance loss. P3Android is available (https://github.com/dongyangwu/P3Android/) and open sourced under the LGPL, so it can be adopted by any Android platform provider.
The rest of this work is organized as follows. We discuss the related work in Section 2 and present the design of P3Android in Section 3. In Section 4, we describe the implementation of P3Android in detail and then evaluate its effectiveness and performance in Section 5. Section 6 concludes the paper.
2. Related Work
We classify prior related work into two categories: personalization support and privacy protection.
Personalization Support. Client-side personalization was proposed as a means of privacy protection [12, 13] and has mainly been used in search services and targeted advertising. PersonisJ [14] implemented client-side personalization in Android, but it represents user models as a hierarchical structure of contexts; for example, it models which museums users prefer from those they have visited, and such models are cumbersome and difficult to obtain. Moreover, its personalization service cannot work with other services, such as the location service. Davidson et al. [15] implemented client-side personalization on Windows Phone OS, which similarly uses the concept of a profile. However, Windows Phone accounted for only 2.7% of all smartphone shipments according to the survey provided by IDC in 2014 [16]. RePriv [17] explored personalization in the context of a web browser by building a user interest profile based on the user's browsing history. Khare et al. [18] studied the dynamic evaluation of user profiles for the personalization of web services based on service usage logs.
Privacy Protection. Beresford et al. proposed MockDroid [19], which allows a user to “mock” an application's access to resources and encourages users to consider the trade-off between functionality and the disclosure of personal information. However, it pays little attention to the critical permissions (CPs) related to users' privacy, so users are unlikely to mock them. Roesner et al. [20] proposed a model of user-driven access control, in which each type of user-owned resource has a user-driven resource monitor. By capturing the permission-granting intent embedded in an access control component, the monitor can judge whether access to the resource should be granted. Although this model can automatically determine whether to grant the requested permissions, many apps. may crash when they cannot get the requested resources.
Additionally, many research works have studied privacy protection from the perspectives of isolation [23], policy enforcement [24], encryption [25], security authentication [26], and so forth. “Override,” proposed in [27], can securely intercept raw sensor data requested by applications and either perturb it according to rules set up by the user or replace it with synthetic data before releasing it to the requestor. Reference [28] provides mobile sensor data that can be utilized by requesting applications while protecting the privacy of users. It proposes a privacy framework, IPShield, which allows users to define two lists of possible inferences: the “Blacklist,” inferences that the user would like to prevent, and the “Whitelist,” inferences that the application is requesting. A detailed survey of privacy protection is given in [29]. Zhang et al. [30–32] described the need to protect the privacy of data in wireless sensor networks and in the web of things.
All the works mentioned above consider either privacy protection or personalization support alone, which is not enough to provide both system-wide privacy protection and personalization support for users.
3. Design of P3Android
P3Android provides two services, the privacy protection service (PPS) and the personalization support service (PSS), to achieve privacy protection and universal personalization support; together they improve the overall privacy protection of Android OS in a complementary way. Figure 1 shows the architecture of P3Android.

Figure 1: P3Android's architecture.
P3Android encourages storing user data on the mobile device, which lets users regain full control of their private data and also reduces the load on third-party cloud providers. To enable easy application personalization, the universal personalization service profiles a user's interests and preferences as personae, such as athletes or housewives. Personae approximately represent the user's various walks of life, and they can be extended through custom classifiers. To obtain the user's interest profile, P3Android leverages the fact that all user data must flow through the operating system, so there are many good places to gather the data that are used to train classifiers for the user's personae.
To supply universal support, P3Android modifies the Android framework upon which apps. are built. Although many widgets can be composed to implement a wide range of applications, we focus only on those that third-party apps. frequently use to display their contents. When users turn on the personalization service, these widgets can restyle their contents according to the profile created above.
Although using personae helps limit information leaks, legacy applications do not necessarily follow these constraints. Therefore, P3Android proposes a dynamic permission checking model that grants apps. different permissions according to their risk levels. Three levels (high, medium, and low) are used, which are ranked by a Naive Bayes model. The risk ranking process is completely automatic and needs no human intervention.
3.1. Personae Profile
A personae profile is an abstract characterization of a user; it provides a way to declassify the user's sensitive information. In P3Android, we target eight kinds of personae. Each persona is represented by a support vector machine (SVM) classifier trained on a manually created list of keywords characteristic of that persona. For example, the sports buff shows strong interest in basketball or other sports matches. Thus, we have collected sports information from sites such as “sports.sohu.com.” We choose the SVM model instead of the Naive Bayes model for its better recall and accuracy in general classification tasks [33].
P3Android supports the following personae: sports buff, entertainment enthusiasts, technophile, travel buff, business executive, medical staff, retiree, and homemaker. Certainly, this list of personae is neither sufficient nor complete, and we encourage other researchers and third-party developers to extend the list by creating their own classifiers.
Obviously, no user has interests that match exactly one persona. It is therefore reasonable for P3Android to assign a weight to each persona of a user; each weight indicates the likelihood that the persona matches the user. For each user, PSS dynamically maintains a hash table, called the user's profile, whose values change from time to time according to the results of the personalization classifiers.
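The weighted profile described above can be illustrated with a minimal sketch. The update rule (one unit of weight per prediction), the initial weight of 0.0, and all function names are assumptions made for illustration; they are not taken from the P3Android source.

```python
from typing import Dict, Iterable, Tuple

# The eight personae listed in Section 3.1.
PERSONAE = [
    "sports buff", "entertainment enthusiast", "technophile", "travel buff",
    "business executive", "medical staff", "retiree", "homemaker",
]


def new_profile() -> Dict[str, float]:
    """Create a profile hash table with every weight at 0.0 (assumed start value)."""
    return {persona: 0.0 for persona in PERSONAE}


def update_profile(profile: Dict[str, float], predictions: Iterable[str]) -> None:
    """Add one unit of weight per predicted persona (hypothetical update rule)."""
    for persona in predictions:
        profile[persona] += 1.0


def top_persona(profile: Dict[str, float]) -> Tuple[str, float]:
    """Return the persona with the largest weight together with that weight."""
    return max(profile.items(), key=lambda item: item[1])


def value_scale(profile: Dict[str, float]) -> Dict[str, float]:
    """Return each weight as a proportion of the total weight."""
    total = sum(profile.values())
    if total == 0:
        return {persona: 0.0 for persona in profile}
    return {persona: weight / total for persona, weight in profile.items()}
```

Keeping weights rather than a single label lets a client ask either for the most likely persona or for the relative proportions of all personae.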
3.2. Personae Classifiers
Personae classifiers (PCs) infer users' favorites or preferences from personalization signals; they are the key link in implementing the personalization support service. There are two design choices for personae classifiers: monolithic and customizable. In the monolithic model, classifiers are implemented as an operating system service, which expands the trusted computing base (TCB) of Android OS and makes upgrading the classifiers difficult. In practice, the built-in default classifiers also cannot meet all the demands of third parties. Therefore, we choose the customizable approach, which ensures the extensibility and feasibility of P3Android.
Figure 2 shows the architecture of our customizable model for the personalization support service. In this model, the personalization signals subservice collects users' various signals and saves them in File 1, whose owner is system. PAPTokenizer, a tokenizer built into PSS, reads the signals and transforms them into the matrix needed by PCs. The matrix is saved in File 2, which is world readable, so PCs can read the matrix and predict users' personae. The prediction results are saved in File 3, which is also world readable and is read to reset the values of the user profile. All user data are thus processed inside the system in this model; the only exposed information is the signals matrix, which consists of a bunch of numbers and reveals nothing when read by other apps.

Figure 2: Architecture of personalization support service.
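To make the role of the tokenizer concrete, the following sketch turns one raw signal into a sparse numeric row, so that only term IDs and counts are exposed. Whitespace tokenization, the `id:count` row layout, and the function name are assumptions for illustration; the actual PAPTokenizer may work differently.

```python
from collections import Counter
from typing import Dict


def signals_to_row(text: str, term_ids: Dict[str, int]) -> str:
    """Encode a signal as 'termID:count' pairs, dropping terms without an ID."""
    counts = Counter(
        term_ids[token] for token in text.lower().split() if token in term_ids
    )
    # Sort by term ID so that the row has a stable, sparse-vector layout.
    return " ".join(f"{tid}:{counts[tid]}" for tid in sorted(counts))
```

For example, with the hypothetical terms vector `{"basketball": 1, "match": 2}`, the message "Basketball match tonight" becomes the row `1:1 2:1`, which contains none of the original words.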
3.3. Risk Scoring
The risk level expresses the potential threat posed by an app. running on Android OS. One of Android's main defense mechanisms against malicious apps. is a risk communication mechanism that warns the user about the permissions an app. requires before it is installed. However, the warning information requires too much technical knowledge, and users need considerable time to distill the critical information from it. P3Android therefore supplies users with automatic risk assessment to improve risk communication for apps.
Nowadays, apps. are becoming overprivileged: the majority request more permissions than their functionality needs. The permission features of each Apk file can consequently be learned to analyze the severity of its threat. To obtain a large number of features, we collected nearly ten thousand Apk files from the XiaoMi App market in March 2015; they fall into 17 categories, each of which contains at least three hundred files. In addition, our dataset contains 1245 unique Apk files [34] that are known to be malicious. From a practical perspective, these data may include some “redundant” packages, because one developer may release many nearly identical apps. and many apps. are generated by automatic tools. We therefore kept a single instance from the same developer in each category and left the automatically generated Apk files untouched. The final dataset contains 10672 apps.
P3Android computes apps.' risk levels with a modified PNB (Naive Bayes with informative Priors) model [35], chosen for its monotonicity and simplicity; in particular, it differentiates between critical permissions and less critical ones, which makes it harder for malware to lower its risk by removing rare permissions. In this model, each app. in the dataset is represented by
To avoid overfitting, Beta prior
PNB aims to learn a Naive Bayes model with parameter θ based on three desiderata, one of which is that the function should be monotonic. However, as mentioned before, today's apps. request more permissions, and many permission features no longer meet this desideratum. For example, READ_PHONE_STATE is requested by nearly 90% of the apps. in our dataset. In this case, we ignore those 26 critical permissions mentioned in [35] because of the informative priors we give in advance. For others, we initially set
Additionally, in order to highlight the importance and discourage the use of CPs, we set
Table 1: Nine critical permissions which are important to users' privacy.
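The general idea behind the risk scoring of Section 3.3 can be sketched as a Naive Bayes model over binary permission features with Beta priors, where the risk of an app. is the negative log-likelihood of its permission set. This is only a sketch of the generic technique: the prior values, the permission names in the usage note, and the function names are hypothetical and are not the parameters used by P3Android or by [35].

```python
import math
from typing import Dict, List, Set, Tuple


def fit_theta(
    apps: List[Set[str]],
    permissions: List[str],
    priors: Dict[str, Tuple[float, float]],
    default_prior: Tuple[float, float] = (1.0, 1.0),
) -> Dict[str, float]:
    """Estimate P(permission requested) with a Beta(a, b) prior per permission."""
    n = len(apps)
    theta = {}
    for perm in permissions:
        a, b = priors.get(perm, default_prior)  # hypothetical prior values
        count = sum(1 for app in apps if perm in app)
        theta[perm] = (count + a) / (n + a + b)  # always strictly inside (0, 1)
    return theta


def risk_score(requested: Set[str], theta: Dict[str, float]) -> float:
    """Negative log-likelihood of the permission set under the fitted model."""
    score = 0.0
    for perm, t in theta.items():
        score -= math.log(t) if perm in requested else math.log(1.0 - t)
    return score
```

A prior with a large second parameter, for example `(1.0, 2.0 * len(apps))` for a critical permission, pulls its estimated probability toward zero, so requesting that permission raises the score considerably even when the permission is common in the dataset.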
3.4. Privacy Policy Enforcement
P3Android modifies the permission check model of Android to strengthen privacy protection. As Figure 3 shows, when an app. requests a sensitive permission, PPS first queries its risk level and then enforces the dynamic permission policy (DPP) for the corresponding level. P3Android provides protection for the 26 critical permissions (CP set 1) and supports data structure simulation for the 9 most critical permissions (CP set 2), shown in Table 1, which can read users' sensitive information. The protection policies corresponding to the different risk levels are as follows:
HIGH_LEVEL: P3Android supplies three access decision options, MOCKED, DENIED, and GRANTED, for the 9 critical permissions in Table 1 and two access decision options, DENIED and GRANTED, for the other 17 less critical permissions.
MEDIUM_LEVEL: P3Android supplies three access decision options, MOCKED, DENIED, and GRANTED, only for the 9 critical permissions in Table 1.
LOW_LEVEL: P3Android uses the Android permission check model (APM) directly.

Figure 3: Privacy policy enforcement model.
Additionally, we maintain two hash tables, RISK_STORE and APP_PERM_STATUS, in RAM to improve the efficiency of PPS and to warn only once for every critical permission. When an app. requests such a permission again, PPS first queries its status in the APP_PERM_STATUS table to avoid excessive warnings:
RISK_STORE〈APP name, APP risk score〉: stores the risk scores of all installed apps.
APP_PERM_STATUS〈APP name, APP permission, permission-check-result〉: stores the permission decision status (GRANTED, DENIED, or MOCKED) for each requested permission of each app. that belongs to the set of 26 critical permissions.
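The policy above can be sketched as follows. This is an illustrative model rather than the P3Android code: the class and function names are hypothetical, the risk store holds levels instead of raw scores for brevity, the user prompt and the stock Android check are passed in as callbacks, and falling back to DENIED on an invalid answer is an assumption.

```python
from typing import Callable, Dict, Optional, Set, Tuple

HIGH, MEDIUM, LOW = "HIGH_LEVEL", "MEDIUM_LEVEL", "LOW_LEVEL"
GRANTED, DENIED, MOCKED = "GRANTED", "DENIED", "MOCKED"


def allowed_decisions(level: str, perm: str, cp_set1: Set[str],
                      cp_set2: Set[str]) -> Optional[Tuple[str, ...]]:
    """Return the options offered to the user, or None to use the stock check."""
    if level == LOW:
        return None
    if perm in cp_set2:                      # the 9 most critical permissions
        return (MOCKED, DENIED, GRANTED)
    if level == HIGH and perm in cp_set1:    # the other, less critical ones
        return (DENIED, GRANTED)
    return None


class PrivacyProtectionService:
    def __init__(self, risk_store: Dict[str, str], cp_set1: Set[str],
                 cp_set2: Set[str],
                 ask_user: Callable[[str, str, Tuple[str, ...]], str]) -> None:
        self.risk_store = risk_store         # app name -> level (assumption)
        self.app_perm_status: Dict[Tuple[str, str], str] = {}
        self.cp_set1, self.cp_set2 = cp_set1, cp_set2
        self.ask_user = ask_user

    def check(self, app: str, perm: str,
              stock_check: Callable[[str, str], str]) -> str:
        cached = self.app_perm_status.get((app, perm))
        if cached is not None:               # warn only once per (app, permission)
            return cached
        options = allowed_decisions(self.risk_store[app], perm,
                                    self.cp_set1, self.cp_set2)
        if options is None:
            return stock_check(app, perm)
        decision = self.ask_user(app, perm, options)
        if decision not in options:          # assumed fallback for invalid answers
            decision = DENIED
        self.app_perm_status[(app, perm)] = decision
        return decision
```

Caching the decision per (app., permission) pair mirrors the purpose of APP_PERM_STATUS: a repeated request is answered from the table without prompting the user again.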
3.5. Mock Privacy Data
Sensitive resources and data are protected by Android's permission mechanism; specifically, an app. must request the corresponding permission if it wants to use a certain resource or data item. This mechanism runs silently in the background, out of the user's sight; moreover, only two decision options, “DENIED” and “GRANTED,” are currently available to users.
MockDroid [19] provides a way to mock private data, but it copies the whole permission check mechanism and adds its own mock functions, which is complicated and intrusive. Unlike MockDroid, P3Android adds only one more decision option, PERMISSION_MOCKED, instead of a separate mock permission for every permission, such as READ_PHONE_STATE_MOCK in MockDroid. P3Android returns a mock object in place of the actual private data when the decision option of a sensitive permission is PERMISSION_MOCKED for a certain resource. In this way, P3Android makes full use of the existing permission mechanism and significantly reduces changes to the Android source code.
4. Implementation of P3Android
In this section, the implementation of P3Android is described in detail. The prototype is based on Android 5.0 whose source code was made available on 3 November 2014.
4.1. Personalization Signals
To obtain the training dataset for PCs, PSS gathers signals from many different places, making good use of its privileged position. However, two problems must be addressed. The first is which data streams should be collected; a good choice keeps the performance loss negligible and reduces noisy signals. The second is how to collect the data; signals can be collected in two ways: as soon as the data are created, or at regular intervals.
In our implementation, with efficiency and accuracy in mind, we capture four distinct personalization signals: SMS messages, emails, GPS data, and browsing history and bookmarks. Additionally, because this information is not created continuously and the time of its creation is not fixed, we log the information when the signals are created.
SMS. SMS messages include both sent and received messages. We gather only sent messages because received messages commonly represent other people's opinions or preferences. Message sending is managed by the SmsManager class and is finally executed by a dispatcher. Considering the different mobile network operators that users may select, we log the message text in the ImsSMSDispatcher class before the network is identified.
Email. We likewise ignore incoming emails. An email is composed of a subject, attached files, content, and so forth, among which the most valuable signal is the content text. Therefore, we interpose on the sending operation in the SmtpSender and Rfc822Output classes to log valuable information.
GPS Data. GPS data are commonly used sensor data in user activity recognition. For example, the location information of multiple shops or stadiums may reflect a user's favorite activity. Therefore, we interpose on the requestLocationUpdates method in the LocationManager class to log valuable information. Owing to the privacy mode proposed in Section 4.6, the user may choose to provide “mock” GPS data, in which case P3Android returns a random latitude and longitude regardless of the actual status of the GPS device.
Browsing History and Bookmarks. The browser is now an integral part of the lives of Internet users. It brings live information and novelties that meet users' different tastes. Users tend to browse topics and news they like and sometimes bookmark them. Thus, these signals can be viewed as users' preferences. In Android OS, a specific data controller, the DataController class, manages the web information. Owing to the privacy mode proposed in Section 4.6, we should not log signals when users surf the Internet in the privacy mode of the browser. Therefore, we log signals at the point where the history is updated. Bookmarks are handled in the same way as browsing history.
All collected signals are saved in the file named “/data/personalization/signals” whose owner is “system.” In other words, only the system user can read or write this file, which protects the user's sensitive information. Furthermore, integrating these multiple data sources improves the accuracy of the whole prediction process.
4.2. Persona Classifier
Eight sample SVM personalization classifiers (PCs) are implemented in one app., where each PC represents one persona and is trained offline because of the limited computing resources on mobile devices. To prepare an appropriate training dataset for libsvm, the main library used to implement PCs, we use the chi-square (CHI) statistic to select features and the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm to weight terms.
For a certain term t and a certain kind of documents K, the chi-square value is calculated as

$$\chi^2(t, K) = \frac{N\,(AD - BC)^2}{(A + C)(B + D)(A + B)(C + D)}. \tag{2}$$

Therein, N is the total number of documents in the training dataset; A is the number of documents of kind K that contain t; B is the number of documents of other kinds that contain t; C is the number of documents of kind K that do not contain t; D is the number of documents that neither contain t nor belong to K.
For a certain term t, we calculate its weight value (
Therein,
The detailed procedure of creating the signals_matrix file from news documents is shown in Algorithm 1.
(1) For each document d
(2)   For each term t
(3)     Calculate CHI and TF-IDF using formula (2) and formula (4)
(4)   Normalize the TF-IDF of all terms
(5) Sort the terms with CHI value in descending order
(6) Get top 3000 terms and label them with unique ID
(7) Write 〈term:termID〉 into terms vector file
(8) Write 〈termID:TF-IDF〉 into signals_matrix file
Algorithm 1: Create signals_matrix and terms vector file.
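The steps of Algorithm 1 can be sketched in a few functions. The chi-square statistic follows the A, B, C, D counts defined above; the TF-IDF variant (relative term frequency times log(N/df), followed by L2 normalization) is a common textbook form and is an assumption here, since it may differ from the exact weighting used by the authors. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence, Tuple

Doc = Tuple[str, Sequence[str]]  # (category label, tokens)


def chi_square(term: str, category: str, docs: List[Doc]) -> float:
    """Chi-square statistic of a term with respect to one category."""
    a = b = c = d = 0
    for label, tokens in docs:
        has_term = term in tokens
        if label == category:
            a, c = (a + 1, c) if has_term else (a, c + 1)
        else:
            b, d = (b + 1, d) if has_term else (b, d + 1)
    n = a + b + c + d
    denom = (a + c) * (b + d) * (a + b) * (c + d)
    return 0.0 if denom == 0 else n * (a * d - b * c) ** 2 / denom


def tf_idf(tokens: Sequence[str], docs: List[Doc]) -> Dict[str, float]:
    """L2-normalized TF-IDF weights of one document (assumed variant)."""
    weights = {}
    for term, count in Counter(tokens).items():
        df = sum(1 for _, other in docs if term in other)
        if df == 0:  # term unseen in the corpus: no IDF is defined, skip it
            continue
        weights[term] = (count / len(tokens)) * math.log(len(docs) / df)
    norm = math.sqrt(sum(w * w for w in weights.values()))
    return {t: (w / norm if norm else 0.0) for t, w in weights.items()}


def select_terms(docs: List[Doc], category: str, top_k: int) -> Dict[str, int]:
    """Rank the vocabulary by CHI and give the top_k terms IDs starting at 1."""
    vocab = sorted({t for _, tokens in docs for t in tokens})
    ranked = sorted(vocab, key=lambda t: chi_square(t, category, docs),
                    reverse=True)
    return {term: i + 1 for i, term in enumerate(ranked[:top_k])}
```

With `top_k = 3000`, `select_terms` corresponds to steps (5) to (7), and writing the `tf_idf` weights keyed by term ID corresponds to step (8).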
We currently train eight PCs by calling libsvm with a linear kernel and the C-SVC model as parameters. After training, eight personae profiles are available. Note that the steps in Algorithm 1 are completed offline and do not need the user's interaction.
As shown in Figure 2, after the signals are transformed into a matrix, PCs read the matrix, compute which persona each matrix belongs to, and finally output the results to a file. PAPTokenizer then reads this file and resets the values of the user's profile.
4.3. Universal Personalization Support
The universal personalization implemented in P3Android covers apps.' skin, font color, and font size. To support skin and font personalization, we first estimate users' approximate age from the personae profile; for example, retirees are considered old and entertainment enthusiasts are considered young. Older people often need a bigger font size and higher color contrast than young people, so we modified the functions that render each view accordingly.
However, the real power of P3Android comes from exposing personalization APIs to third-party developers, which makes it easy to present content differently. We discuss four of these API functions here:
getPSSeviceStatus() returns the status of the personalization support service. We allow users to toggle personalization on and off in Android Settings. Apps. can display their content in different styles according to the PSS status.
getTopPersonae() and getTopPersonaeValue() return the most relevant persona and its value if PSS is on. Developers can build various functions on this information according to their own interpretation; even third-party advertising services can push suitable ads to different users accordingly.
getPersonaeValueScale() returns the proportion of the value of each persona, which can help display contents in proportion.
In addition, P3Android provides other APIs such as getPersonaeValueByDescOrder and getUserProbablyAge to third-party developers.
4.4. Risk Scoring
The procedure of risk scoring consists of four steps: collecting training data, training the scoring model, determining the distribution of risk scores, and automatically scoring the risk of a certain app. on the mobile device. The first two steps are described in Section 3.3; here we emphasize the last two. It is important for the privacy protection mechanism to determine suitable threshold values for each risk level. We select another 1000 apps., 500 unknown and 500 malicious, and calculate their risk scores using the trained Bayesian classifier; the distribution of their risk scores is shown in Table 2.
Table 2: Distribution of risk scores.
Among these apps., 70% of the malicious ones and 76.2% of the unknown ones have risk scores greater than 3.0, so we classify them as HIGH_LEVEL; similarly, the 26.2% of malicious and 12% of unknown apps. whose risk scores are between 1.0 and 3.0 are classified as MEDIUM_LEVEL. These are only the default thresholds; users who care more about their sensitive information can adjust them.
As shown in Figure 1, risk scoring is implemented as a system application (RSApp for short), which automatically rates the risk level of each installed app. and saves the score in an XML file under the APP directory. If PPS is enabled, the scores are stored in a hash table named RISK_STORE〈APP name, APP risk score〉 in system RAM. Additionally, RSApp creates a BroadcastReceiver to detect when a new app. is installed, so that it can be scored automatically without user interaction.
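The default thresholds translate into a simple mapping from score to level. In the sketch below, the function and parameter names are hypothetical, and treating a score of exactly 1.0 as MEDIUM_LEVEL and exactly 3.0 as MEDIUM_LEVEL is an assumption, because the text does not state how boundary values are handled.

```python
def risk_level(score: float, medium_threshold: float = 1.0,
               high_threshold: float = 3.0) -> str:
    """Map a risk score to a level; both thresholds are user adjustable."""
    if score > high_threshold:
        return "HIGH_LEVEL"
    if score >= medium_threshold:  # boundary handling is an assumption
        return "MEDIUM_LEVEL"
    return "LOW_LEVEL"
```

A user who cares more about sensitive information could lower `high_threshold`, which moves more apps. into HIGH_LEVEL and therefore under the stricter policy.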
4.5. Privacy Policy Enforcement
P3Android provides the dynamic permission policy (DPP, described in Section 3.4) by adding two private methods, isPPSCheckNeed(pps, permName, pkgName) and ppsCheckPermission(pps, pkgName, permName), to the Android PackageManagerService. The former determines whether to enforce DPP on an app. requesting a certain system resource by querying its risk score. The latter enforces the DPP that matches the risk score. For example, for apps. with a MEDIUM_LEVEL risk score, P3Android offers a choice among PERMISSION_GRANTED, PERMISSION_DENIED, and PERMISSION_MOCKED (described in Section 4.6) for the 9 critical permissions in Table 1. For apps. with a HIGH_LEVEL risk score, P3Android offers the same choice for those 9 permissions and, in addition, a choice between PERMISSION_GRANTED and PERMISSION_DENIED for the other 17 less critical permissions.
Similar to the Linux Security Module framework, the privacy policy enforcement mechanism is integrated into PackageManagerService. Specifically, the checks are inserted into the checkPermission and checkUidPermission methods.
4.6. Mock Privacy Data
Unlike MockDroid [19], which duplicates the whole permission check model, we add only a new permission status, PERMISSION_MOCKED, to the original model and thus make fewer changes to the original system.
PPS implements data simulation for the nine critical permissions that may read users' sensitive information, as shown in Table 1. Among them, coarse- and fine-grained location information is generated from device sensors and networks; contacts, SMS, calendar, and bookmarks are mainly managed by ContentResolver, ContentProvider, and so forth; phone state contains some inherent information of each device; AccountManager manages users' various account information, while system log information must be obtained from logcat. For reasons of space, we describe in detail only the mock process for coarse- and fine-grained location, contacts, and phone state.
Coarse- and Fine-Grained Location. If the user chooses the “MOCK” option when the system shows the warning box, PPS returns a mock location resolution level to LocationManagerService, in which we simulate a virtual provider that generates a location object whose coordinates are random but reasonable.
Contacts. The ContactsProvider2 class directly executes the read/write tasks for contacts; it includes two modes, ContactMode and ProfileMode, which perform contact and profile operations, respectively. We added a third mode, called MockContactMode, in which a virtual database helper is built to complete the corresponding operations. SMS, calendar, and bookmarks are handled in much the same way.
Phone State. Phone state, including the device ID and SIM serial number, is obtained from the exposed interface of the telephony service. When a read request is mocked, the telephony manager returns a random fake value.
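As a rough illustration of what “random but reasonable” mock values might look like, the sketch below draws coordinates from the valid latitude and longitude ranges and builds a fake device ID with the 15-digit length of an IMEI. The function names and the uniform sampling are assumptions; the generated ID is random and does not carry a valid IMEI check digit.

```python
import random
from typing import Tuple


def mock_location(rng: random.Random) -> Tuple[float, float]:
    """Return a random (latitude, longitude) pair within the valid ranges."""
    latitude = rng.uniform(-90.0, 90.0)
    longitude = rng.uniform(-180.0, 180.0)
    return round(latitude, 6), round(longitude, 6)


def mock_device_id(rng: random.Random, length: int = 15) -> str:
    """Return a random string of decimal digits shaped like a device ID."""
    return "".join(rng.choice("0123456789") for _ in range(length))
```

Passing a `random.Random` instance keeps the generators easy to test and, if desired, lets the same fake value be reproduced for repeated requests from one app.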
4.7. Services and Privacy Mode
All the functions except the risk rank and personae classifiers are implemented as two services built into the system framework: PersonalizationSupportManagerService (PSMS for short) and PrivacyAwarePersonalizeManagerService (PPMS for short). These services run by default after the system boots. We therefore design a status manager module in Android Settings that can switch the privacy protection service, signal collection, and universal personalization on or off, that is, enable or disable those services. In particular, P3Android does not log users' activities when the personalization signals subservice is switched off, which helps protect users' privacy.
PSMS provides the logPersonalizationSignals method for collecting personalization signals and the word2Matrix methods for the profile classifier to form the user's profile, which is updated by the setPersonaeProfile method. PPMS enforces privacy policies through its queryAppRiskLevel method and Android's PackageManagerService.
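The effect of the status switch on signal collection can be sketched in Java as follows. Only the method name logPersonalizationSignals comes from the text; its parameter and return type, the class StatusManagerSketch, the in-memory list, and the other method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative sketch of a switchable signal-collection subservice; not the P3Android source. */
final class StatusManagerSketch {
    // The services run by default after boot, so collection starts enabled.
    private boolean signalCollectionEnabled = true;
    private final List<String> signals = new ArrayList<>();

    /** Corresponds to toggling the subservice in Settings (hypothetical method). */
    void setSignalCollectionEnabled(boolean enabled) {
        this.signalCollectionEnabled = enabled;
    }

    /** Records a signal only while collection is enabled; returns whether it was stored. */
    boolean logPersonalizationSignals(String signal) {
        if (!signalCollectionEnabled) {
            return false; // nothing about the user's activity is logged
        }
        signals.add(signal);
        return true;
    }

    int loggedCount() {
        return signals.size();
    }

    public static void main(String[] args) {
        StatusManagerSketch manager = new StatusManagerSketch();
        manager.logPersonalizationSignals("browser-history entry");
        manager.setSignalCollectionEnabled(false);
        manager.logPersonalizationSignals("sms entry");
        System.out.println(manager.loggedCount());
    }
}
```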
4.8. Summary of P3Android's Implementation
P3Android is implemented on top of Android 5.0 in a modular way. Table 3 shows the source lines of code (SLOC) that each module contains. The modification to the Android framework is noninvasive (only 2890 SLOC).
Table 3: Breakdown of P3Android's implementation.
Moreover, the four personalization APIs introduced in Section 4.3 are only a subset. To highlight the effectiveness of the universal personalization support, Table 4 lists all personalization APIs with their features, which third-party app. developers can use.
Table 4: All personalization APIs with their features.
5. Evaluation
In this section, we first evaluate P3Android with two test suites. The first is the privacy protection test suite, which is used to evaluate the ability of our dynamic permission check model to prevent privacy leakage; the second is the personalization support test suite, which confirms the efficiency and practicability of the PSS. According to Symantec's Internet Security Threat Report [36], there are nearly 270 cumulative malware families available in third-party Android marketplaces. Among them, 21 percent of the families steal users' device data and 22 percent spy on users. Thus, we choose four representative malware families, jSMSHider, GamblerSMS, SndApps, and ADRD, to compose the privacy protection test suite. Each test is analyzed in a real-life scenario in which we discuss the impact of certain malicious behaviors on AOSP (Android Open Source Project) and how they are mitigated by P3Android. In the personalization support test suite, we use a customized RSS reader scenario to demonstrate the efficiency of PSS and a commercial client-side personalized news recommendation scenario to confirm the practicability of using the profile to support personalization.
Secondly, we demonstrate the feasibility of the customizable model and the effectiveness of the default PCs we designed. Finally, we compare P3Android's performance with that of the original Android system using two different benchmarks.
5.1. Privacy Protection Capability
The main malicious behaviors of the four malware families are as follows. jSMSHider is a Trojan horse that collects the device ID, SMSs, locations, and so forth and sends them to a remote server; what is worse, it attempts to gain root access in order to install other apps. GamblerSMS is spyware that can monitor every SMS message received by or sent from the phone and record every outgoing phone call. SndApps can lurk in host apps. and stealthily upload users' personal information, such as email accounts and phone numbers, to a remote server without the user's awareness. ADRD can encrypt the stolen information and send it to remote locations, and it can also execute commands received from a control server. We select one sample from each family and analyze the following scenarios in detail.
5.1.1. jSMSHider Scenario
In this malware family, the app. we choose disguises itself as a love test app. to attract users and asks for the permissions listed in the first column of Table 5 during installation. When jSMSHider runs on AOSP, it gathers a large amount of sensitive information, including the IMEI, IMSI, phone number, location, SMSs, and contacts, and sends it to its command and control server.
Table 5: The permissions requested by jSMSHider and their protected status.
In contrast, when it is installed on P3Android, it is first assessed by the risk classifier; its risk score is 2.9, which belongs to MEDIUM_LEVEL. When a user opens this app. and takes the love test, PPS dynamically asks for his decision whenever the app. requests a permission included in Table 1. If he does not understand why this app. requests these permissions, the user can choose the “MOCK” option. All the sensitive information that jSMSHider gathers is then fake and randomly generated, so the user is free of its threat. The permissions that P3Android can protect for jSMSHider are shown in Table 5, where the “Used” and “Protected” columns indicate whether the corresponding permission is used by the app. and whether it is protected by P3Android, respectively; the same holds for Tables 6, 7, and 8.
Table 6: The permissions requested by GamblerSMS and their protected status.
Table 7: The permissions requested by SndApps and their protected status.
Table 8: The permissions requested by ADRD and their protected status.
5.1.2. GamblerSMS Scenario
The sample we choose from this family is spyware that requests the multiple permissions listed in the first column of Table 6 during the installation process. The table shows that this app. holds all the permissions it needs to monitor the user's daily life.
Once it is installed successfully on AOSP, users cannot find its icon on the home screen, which may mislead them into believing that the installation failed. After they reboot the device, this app. automatically starts a background service, which uploads the device ID and contacts and monitors users' dialing and SMS activities.
However, when it is installed on P3Android, its risk score is calculated as 3.1, which belongs to HIGH_LEVEL. When a user attempts to make a phone call, a request-for-record warning pops up and asks for his decision. He does not know why this app. wants to record his call, so he chooses the deny option. A similar warning appears when he tries to send an SMS, but he knows that sending a message does not require reading contacts, so he again chooses the deny option. Lacking the corresponding permissions, this app. cannot complete its malicious behaviors; P3Android therefore effectively prevents the user from being monitored. The permissions that P3Android can protect for GamblerSMS are shown in Table 6.
5.1.3. SndApps Scenario
In this malware family, the selected app., whose package name is “easybutton,” disguises itself as a “gadget” to attract users. Some of the permissions it requests are listed in the first column of Table 7.
When users launch this app. on AOSP, it is automatically added to the system startup items. What is more, this app. collects and uploads the device ID as well as account information if users press the beautiful button.
On P3Android, however, “easybutton” receives a risk score of 1.5, and its covert sensitive behaviors become visible to users. When a user starts this app., the privacy protection service first warns him that “easybutton” wants to read his phone state. He chooses the mock option because he does not want to leak his true device information. A similar warning pops up when he tries to press the button; he does not know why this app. wants to read his accounts but still wants to see the result of pressing the button, so he again chooses the mock option. P3Android thus helps him avoid leaking his account information. The permissions that P3Android can protect for SndApps are shown in Table 7.
5.1.4. ADRD Scenario
The sample we select from the ADRD family is a wallpaper game app. called “Steam.” Some of the permissions it requests are listed in the first column of Table 8.
“Steam” appears normal when running on AOSP, and users can play the game freely. However, it is not as benign as it seems: “Steam” stealthily uploads the infected phone's state information and starts several system services. These services can record the content of users' phone calls and save it to external storage. Later, logcat shows that this app. continually checks the network state and uploads the saved file to a remote server once the device connects to the Internet.
On P3Android, its risk score is calculated as 3.5 and regarded as HIGH_LEVEL; thus, the system can supply “Steam” with mocked device state and pop up a warning when users make a phone call. It is strange for a game app. to record the content of users' phone calls, so the deny option is a good choice; P3Android therefore prevents users from leaking their voice information. The permissions that P3Android can protect for ADRD are shown in Table 8.
The above scenarios demonstrate the privacy protection ability of P3Android. We encourage users to mock the requested data when running untrusted apps. or those with a high risk score, so that they can protect themselves while still using the apps.' functions.
5.2. Personalization Support Capability
5.2.1. RSS Reader Scenario
To test the effectiveness of the P3Android APIs, we develop an RSS reader application called PAPRSS, which pulls news from 8 news feeds and samples from them to fill the ListView shown to the user.
At first, PAPRSS samples evenly from the different news feeds. Then, after users enable PSS in Settings, this app. queries the P3Android APIs and selects feed content in proportion to each of the user's personae values in Table 9, which are obtained from the eight PCs described in Section 4.2. Moreover, when the top persona is “retired,” PSS treats the user as elderly and restyles PAPRSS accordingly, enlarging the text font and increasing the color contrast.
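One way to turn personae values into per-feed item counts is sketched below in Java. The sketch assumes a one-to-one mapping between feeds and personae and uses the largest-remainder method to round the proportional shares; both choices, together with the class and method names and the example values, are our assumptions and are not taken from PAPRSS or Table 9.

```java
import java.util.Arrays;

/** Illustrative proportional allocation of list slots to news feeds; not the PAPRSS source. */
final class FeedQuotaSketch {
    /**
     * Splits {@code slots} list entries across feeds in proportion to the given personae values.
     * If all values are zero (for example, before PSS is enabled), feeds are sampled evenly.
     */
    static int[] quotas(double[] personaValues, int slots) {
        int n = personaValues.length;
        double[] weights = personaValues.clone();
        double total = 0.0;
        for (double w : weights) {
            total += w;
        }
        if (total <= 0.0) {
            Arrays.fill(weights, 1.0); // fall back to even sampling
            total = n;
        }
        int[] quota = new int[n];
        double[] remainder = new double[n];
        int assigned = 0;
        for (int i = 0; i < n; i++) {
            double exact = slots * weights[i] / total;
            quota[i] = (int) Math.floor(exact);
            remainder[i] = exact - quota[i];
            assigned += quota[i];
        }
        // Hand the leftover slots to the feeds with the largest fractional parts.
        for (int left = slots - assigned; left > 0; left--) {
            int best = 0;
            for (int i = 1; i < n; i++) {
                if (remainder[i] > remainder[best]) {
                    best = i;
                }
            }
            quota[best]++;
            remainder[best] = -1.0; // each feed receives at most one leftover slot
        }
        return quota;
    }

    public static void main(String[] args) {
        // Hypothetical personae values for three feeds and a ListView of 8 items.
        System.out.println(Arrays.toString(quotas(new double[] {4.0, 2.0, 2.0}, 8)));
    }
}
```

Rounding by largest remainder keeps the total equal to the number of list slots, which plain rounding of each share does not guarantee.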
Table 9: Eight personae parameters obtained from eight personae classifiers.
PAPRSS demonstrates that personalization functions can be achieved effectively with little information exposed. As mentioned before, users may have several kinds of interests; developers can therefore use the profile file flexibly. PAPRSS also shows the privacy protection ability of P3Android because it need not access device information such as the device ID, contacts, and sensor data. Further, a user can upload his personae profile to a cloud platform and use the profile on another Android device. His privacy remains protected because the personae profile consists only of numbers.
5.2.2. Commercial Client-Side Personalized Scenario
P3Android uses the concept of a user profile to achieve client-side personalization, which is similar to the news recommendation function “My News” in “BBC News” (BN) [37]; however, BN users must first add their topics of interest manually before using this feature.
Unlike BN, P3Android can predict users' topics of interest automatically, without manual intervention. For evaluation purposes, BN is reverse engineered so that it calls only the getPersonaeValueScale() and getPersonaeScale() functions of P3Android to obtain the user's profile and provide news of interest to him, which demonstrates the practicability of P3Android's personalization support.
5.3. Performance of Personae Classifiers
To demonstrate the feasibility of the customizable model and the effectiveness of the default PCs we designed, we test the efficiency and accuracy (Figure 4(a)) and the memory and CPU usage (Figure 4(b)) of the PCs on a XiaoMi 4 smartphone. Test datasets of different sizes (25, 50, 100, and 200, resp.) are sampled randomly from one user's SMSs, emails, and browser histories, and each is run 20 times. The test results are shown in Figure 4.

Figure 4: (a) Time consumption and accuracy and (b) memory and CPU usage of personae classifiers.
We find that, on average, it takes a little more than four minutes to predict fifty signals and less than five minutes to predict one hundred signals. Additionally, the average accuracy is greater than 82 percent. Although prediction takes over six minutes when the dataset size is 200, we generally predict only once every two hours or even less often. In addition, the limited computing resources of the smartphone rule out the use of a more sophisticated word-segmentation algorithm, which accounts for the false predictions.
5.4. Overall Performance
We compare the performance of P3Android with that of the original Android operating system using two benchmarks, CF-BENCH [38] and Geekbench 3 [39]. We test memory read, memory write, file read, and file write using CF-BENCH and stream copy using Geekbench 3; each test is run one hundred times against both P3Android and AOSP, and the average value is calculated. The results of CF-BENCH are divided into two groups, a native score and a Java score, and an overall score is given based on these two scores. The test results are shown in Figure 5(a). As Figure 5(b) shows, the performance overhead incurred by P3Android under CF-BENCH and Geekbench 3 is 2.4% and 2.9%, respectively, which is an acceptable price for P3Android's privacy protection and personalization support. The main performance loss introduced by P3Android comes from frequent queries of two hash tables in PPS and from personalization signal collection in PSS, which involve memory and file reads/writes.
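For reference, a relative overhead of this kind can be computed from averaged benchmark scores as in the Java sketch below. The formula assumes that a higher benchmark score is better, and the run values in main are invented purely to illustrate the arithmetic; they are not measurements from this paper.

```java
import java.util.Locale;

/** Illustrative computation of relative benchmark overhead; the sample scores are hypothetical. */
final class OverheadSketch {
    /** Arithmetic mean of repeated benchmark runs. */
    static double mean(double[] runs) {
        double sum = 0.0;
        for (double r : runs) {
            sum += r;
        }
        return sum / runs.length;
    }

    /** Overhead in percent, assuming higher scores are better. */
    static double overheadPercent(double baselineScore, double modifiedScore) {
        return (baselineScore - modifiedScore) / baselineScore * 100.0;
    }

    public static void main(String[] args) {
        // Hypothetical scores; the paper averages one hundred runs per system.
        double[] aospRuns = {1000.0, 1004.0, 996.0};
        double[] p3AndroidRuns = {976.0, 980.0, 972.0};
        double overhead = overheadPercent(mean(aospRuns), mean(p3AndroidRuns));
        System.out.println(String.format(Locale.ROOT, "%.1f%%", overhead));
    }
}
```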

Figure 5: Performance test results of AOSP and P3Android.
5.5. Comparison with Other Android-Based Privacy Approaches
We compare P3Android with native Android and with other systems that aim to provide privacy protection for Android. The results are shown in Table 10. All of the systems except Apex, namely MockDroid, PMP, and P3Android, provide a mock mode for users' privacy data and detailed feedback to users; however, MockDroid and PMP require manual configuration by users, while P3Android can determine the risk level of an app. automatically. Furthermore, P3Android provides a series of APIs for supporting the personalization of apps. and encourages a secure programming method in which third-party apps. reduce their access to users' privacy data to a minimum.
Table 10: Comparison of Android-based privacy approaches.
6. Conclusions
This paper proposes an approach that supports both app. personalization at the operating system level and privacy protection, and implements it in a prototype system, P3Android, on Android 5.0. P3Android mainly consists of two system services: PSS supports various personalization functions at the system level, while PPS is responsible for privacy protection for legacy apps. These two services improve the overall privacy protection of the Android OS in a complementary way. P3Android provides eleven APIs for third-party developers to use PSS and PPS, and experiments show that P3Android can protect users' privacy data and support personalization for applications with nearly negligible performance loss compared with the original Android. Furthermore, P3Android is extensible and open source, and we hope it will become a community platform for related research. In future work, we will focus on the integrity of personae sets and on more efficient algorithms for classification.
Footnotes
Competing Interests
The authors declare that there are no competing interests.
Authors' Contributions
Hongliang Liang and Dongyang Wu proposed the original idea of P3Android and constructed the core algorithm. Dongyang Wu and Shirun Liu designed and implemented the PPS and PSS services; Hongliang Liang and Haifeng Liu conceived the experiments; Shirun Liu and Hao Dai performed the experiments and analyzed the data. Hongliang Liang and Dongyang Wu wrote the paper.
Acknowledgments
The authors would like to thank Weidong Fang, Cheng Cheng, and Guangyuan Li for providing valuable feedback on their work. This work was supported in part by the National Natural Science Foundation of China (NSFC) under Grant nos. 91418206 and 61202082.
