Abstract
File pollution is a recent security threat to peer-to-peer (P2P) file sharing systems. By disseminating numerous polluted files with mismatched or partially tampered contents in the P2P system, the attacker causes users to download unexpected files. This attack is aimed at frustrating users and making them abandon the system. Present researches on combating file pollution have mostly focused on pollution modeling or evaluating the extent of pollution. Only a few researches have proposed effective methods to eliminate pollution attacks, and they are primarily based on reputation systems and blacklisting mechanisms. However, these methods require exchange of significant feedback among the peers in order to identify the malicious peers or polluted files in the system. In this paper, we describe the application of fault-tolerant mechanism used in the redundant arrays of independent disks system to suppress file pollution attacks based on the concept that P2P file sharing systems currently have global file storage systems. We have extended the previously developed Fluid Model to analyze and evaluate the proposed antipollution mechanism. The model accuracy has been demonstrated by performing several simulation experiments; the proposed mechanism could effectively suppress the pollution and successfully decrease the polluted-time exposure of a P2P file sharing system by approximately 40∼60%.
1. Introduction
A file sharing system is an important application in peer-to-peer (P2P) networks; it enables files to be shared with high efficiency by allowing parallel downloads of the duplicated file pieces that are distributed across several peers. The robustness of P2P networks is dependent on the total system capacity; as the total system capacity increases, the number of duplication times of resources in more peers also increases, thereby leading to more fault-tolerant systems. Since the release of the Napster system in 1999, P2P file sharing systems such as Kazaa and BitTorrent have become some of the most popular systems on the Internet. The shared contents encompass a wide variety of MP3 songs, movie films, software packages, and games, and they have motivated numerous network citizens (Netizens) to install P2P file sharing packages because of the speed of their sharing and diversity. For example, in the case of Kazaa, which is one of the most popular P2P sharing software packages, Good and Krekelberg [1] pointed out that there were more than 3 million users in 2003 with the shared file size amounting to almost 5000 TB. Today, P2P file sharing flow has become the most important source of network traffic on the Internet [2].
However, among the many people who join the P2P file sharing system, not all of them have an understanding or appreciation of the associated security risks. This makes the P2P file sharing systems easy targets for malicious attackers. Many types of security attacks have already been identified; these include churn attacks, poisoning and pollution, Sybil attacks, worms, malware, and cheating in P2P massively multiplayer online games (MMOGs) [3]. Pollution attack is a major recent security threat that was first proposed by Liang et al. [4] at the INFOCOM in 2005. Pollution attacks occur primarily due to counterattacks against the illegality of fast and quantitative sharing and dissemination of copyrighted materials [5]. Many musical, movie, software, and publishing industries have suffered significant revenue impacts due to the popularity of P2P file sharing software. Hence, the impacted companies have taken a series of countermeasures [4] including prosecuting instances of copyright infringements by companies and individuals and performing file pollution attacks that significantly reduce the file usability by destroying or tampering file titles, metadata, or contents, thereby making users frequently download unexpected files and, hence, be frustrated and quit the system.
The decentralized management architecture adopted for the sake of scalability and autonomy has made the P2P file sharing system vulnerable to security threats. Moreover, the high efficiency of the file sharing feature leads to pollution attacks [6, 7] and the fast disseminating of other malware and viruses [8] across the networks. In 2005, [4] showed that 50%~80% of the shared files in Kazaa had already been polluted; in 2006, other research [9] demonstrated that roughly 50% of the popular files of other well-known file sharing software, eDonkey, had been polluted.
It is anticipated that the researches and developments of legal downloading mechanisms [10] may finally cause copyright owners to stop the file pollution attack measures taken against the P2P file sharing systems in the name of copyright protection. However, file pollution may continue to be a popular method taken by other malicious attackers to threaten the security of P2P file sharing systems. Hence, in this paper, we focus on the suppression and avoidance of the dissemination of polluted files in P2P file sharing systems. Most current methods have adopted the reputation system [6, 11, 12] or blacklisting mechanism [13] to achieve such goals, but they usually require user feedback information to identify the malicious peers in a system or polluted files. Hence, they have the following issues.
A user has to first occupy the network resource and download the files totally or partially before knowing the contents that have been polluted. A user, after having downloaded some files, will not typically check immediately whether the files are polluted or not; this allows the polluted files in the sharing folders to be shared and disseminated further. Feedback information may easily suffer from tampering or forging and thereby cause many additional security issues.
Therefore, in this paper, we have attempted to develop an effective antipollution mechanism without using user feedback. Our method has been developed on the basis of the idea that a file in a P2P sharing system can be transferred after dividing it into segments and then reassembling into a complete file when all the segments are available. We have used a fault-tolerant mechanism adapted from the EVENODD coding scheme [14] that is used in the disk storage technology. This mechanism tolerates disk failures in the redundant array of independent disks (RAID) system effectively and efficiently—each polluted segment is considered to be a failed disk, and with proper rebuilding, a tampered file can be recovered correctly. The degree of tolerance depends on the number of segments into which a file is broken, and a balance between the number of segments and file reconstruction efficiency can be reached after some experiments. Hereafter, the EVENODD coding scheme will be abbreviated as EVENODD, and its operating principle and the reasons for using it will be described in the following sections. This proposed mechanism is capable of avoiding and eliminating the polluted file segments without using user feedback information or changing the P2P network structure. The Fluid Model (FM) [15] has been extended in this paper to construct a model that is able to analyze and evaluate the proposed antipollution mechanism. Moreover, by using simulation experiments, we have demonstrated the accuracy of the proposed model. We have shown that the proposed mechanism can effectively avoid pollution and decrease the polluted-time exposures of P2P file sharing systems by 40~60%.
The rest of this paper is organized as follows. Section 2 discusses the background and related researches that have already been conducted to counter the P2P file pollution. EVENODD is introduced in Section 3. Section 4 describes the system architecture and operations of the proposed antipollution mechanism. The deduction of the mathematical system model is presented in Section 5. Section 6 describes the evaluation results. Section 7 presents equations to amend the system model. Finally, Section 8 summarizes this study.
2. Background and Related Work
The contents that are shared, legally or illegally, on a P2P network are mostly music, movies, documents, and software packages; each of such contents has a 〈Title〉, and further, because of the different content attributes, a title has many different 〈Versions〉. For instance, for a piano etude, there may be several versions corresponding to different performers, while for an MP3 song, there may be different versions comprising different encoding rates. When these versions are disseminated on a P2P network, the file owned by the peers that acquire that particular version is called the “Copy of this Version.”
A user who wishes to download the Copy of a Title from the P2P file sharing system first issues an enquiry to the system. For example, a user who is interested in a song with the 〈Title〉 “Hey Ya” makes an enquiry to the system using the string “Hey Ya.” The system then responds to the user with all the versions available in the P2P file sharing system and the number of quantities and locations of the copies of each version. The user chooses to download a suitable version according to his preference. The file is downloaded and saved in the user's sharing file folder to be downloaded by other peers in the system. In this manner, the file is disseminated very quickly in the network.
2.1. File Pollution Categories
To initiate a pollution attack, a pollution attacker first produces a polluted version by adopting polluting techniques such as quality reductions or content alterations of the target file. The polluter subsequently injects this modified content into the system on his peer or on several peers under his control and, by using the title of this polluted version, makes an announcement to the system that is visible to other peers and downloaded as per the requirement. As a result, while searching for a certain title, a user may be confronted with a polluted file and an authentic file. Since it is not possible to verify in advance whether the content has been polluted or not, an unaware user may choose to download the polluted version. Moreover, the user usually may not check the downloaded file that is saved in the sharing folder immediately, which causes the polluted file to be disseminated faster and further in the P2P file sharing system.
In the strategy adopted by malicious attackers, the pollution on a file can be categorized into three types [4, 9].
(1) Content Pollution. Content-modified decoys are created for users to download, which then disseminate in the system. A decoy is created by adding certain noises that cannot be decoded into meaningful content in a target file or by changing the content into that of a different file but retaining the metadata of the original. Since it is difficult to determine whether the file content has been polluted before downloading, this method represents a very simple and effective polluting approach. The suppression and elimination (removal) of these types of pollution are the primary goal of this paper.
(2) Metadata Pollution. In this type of pollution attack, metadata such as the file name and type of a targeted file are replaced with those of a different unrelated file. This causes a user to choose and download a file containing content that does not totally conform to expectations.
(3) Index Poisoning. In a P2P network framework that possesses the capabilities of file searching, for example, a P2P file sharing network based on distributed hash tables, a pollution attacker causes a nonexistent file to be shared on the network by uploading bogus file records such as the IP address of the file sharing person, port number, and file hash value to the supernodes that are responsible for maintaining these records in the network or by modifying the file records.
2.2. File Pollution Studies and Counteractions
The measurement study conducted by [4] focused specifically on Kazaa, the most popular file sharing system, and developed a crawler that could quickly retrieve the supernodes in the Kazaa network; subsequently, from the raw data on popular music collected by the crawler, statistics and observation studies were conducted on the file pollution status. Their results showed that the pollution was very serious; in the Kazaa file sharing system, more than 50% of the popular copies of musical songs were polluted. Subsequently, many researchers have dedicated themselves to conduct measurement studies to examine the extent of the impact of pollution attacks on the P2P network. For example, Liang et al. [9] also developed an effective methodology to estimate the amounts of index poisoning and pollution in the file sharing system; they used data-harvesting platforms on FastTrack and Overnet to collect the required data. Christin et al. [5] provided a measurement-based analysis to inspect the impacts on content availability due to pollution and poisoning in the P2P network. Dhungel et al. [16] conducted experimental measurements to analyze and study the pollution in P2P live video streaming systems.
Other researchers have constructed analysis models to facilitate the analysis and evaluation of the impacts of pollution on P2P networks. Dumitriu et al. [17] performed analytical modeling against file-targeted attacks and network-targeted attacks in order to investigate the resilience demonstrated by a P2P file sharing system when fighting against denial-of-service (DoS) attacks. The study of Shi et al. [18] presumed that the file-targeted attacks and network-targeted attacks have close and inseparable relationships and, in this manner, proposed a unified model for these two attacks. Thommes and Coates [19] inspected the viruses in P2P networks in addition to the polluting behavior and adopted an epidemiological approach to explore a dynamic model capable of describing viruses and pollution. Similarly, from the concept of an epidemic model, Gu et al. [20] proposed a model of reputation-based approaches, which could simulate and evaluate pollution, in order to study the dissemination of a shared file in P2P networks. Lee et al. [21] investigated user behaviors by using questionnaires and then employed the results as a basis to explore an analytical model in order to study the dynamic properties of pollution in P2P networks. In the study of Yang et al. [22], a modeling framework was constructed to analyze and study the pollution in P2P live video streaming systems. Kumar et al. [15] developed the FM for pollution proliferation in a P2P file sharing system; in this paper, we have extended a small part of the FM to construct a model that is able to analyze and evaluate the proposed antipollution mechanism.
Thus far, few researchers have focused on developing pollution countering (measures), and most of them have adopted a reputation system or a blacklisting mechanism. Credence [12] is a well-known reputation system for fighting viruses in P2P file sharing systems. By employing a simple network-wide voting scheme, Credence enables a user to conduct positive and negative appraisals on object contributions in P2P file sharing systems; moreover, it also allows a client to perform weighting on his vote based on the statistical correlations between himself and his peers. In Dong et al. [23] and Dong et al. [11], a P2P antipollution file sharing system was constructed based on object reputations, but employing different building processes. They proposed a pollution propagation model that first considered file objects, then weighted votes by calculating the vector space similarity, processed the data sparsity problem by using the Horting graph-theoretic approach, and finally adopted a self-adaptive reputation threshold to judge the file authenticity. On the other hand, Costa et al. [6] proposed a P2P antipollution file sharing system, named Scrubber, that was based on peer reputations. In comparison to Credence, Scrubber is a distributed and decentralized reputation system inside which peers mutually designate the reputations to verify and differentiate malicious users who disseminate polluted contents on the network. In addition to the imposition of severe and speedy punishments on the content polluters, Scrubber also includes a rewarding mechanism to recover a peer's reputation. However, the above-mentioned proposals cannot rapidly react to the dissemination of polluted contents. Cai et al. [24] proposed a holistic mechanism to defend against pollution attack. Besides using a general reputation model, the mechanism gathers extra inherent file-related information for peers to identify the potential pollution without completely downloading the requested content. Barcellos et al. [25] developed a reputation-based pollution control strategy in which the dissemination rate of a content is limited according to the reputation of the version. In this way, the polluted content is eliminated before a peer vote on the version negatively. Shin and Reeves [26] presented an antipollution scheme called winnowing. It reduces index pollution by publish message verification and user feedback mediation in DHT-based P2P system.
Liang et al. have proposed certain countermeasures based on a blacklisting mechanism after conducting many measurement studies on P2P network pollution [13]. They had developed the abovementioned crawler that crawls an entire P2P network and collects the metadata of all the shared files for offline analysis. In their automated version-checking procedure, a set of nodes of very probable polluters are verified and added to a blacklist. Further, another study had focused on countering index poisoning attacks in P2P systems; a similar method was followed in [13] via a step named harvesting. In harvesting, massive information relevant to the versions in a P2P system is collected, and the poisoned, polluted, and clean versions are verified; bad sources are then recognized and added to a blacklist. It is possible to suppress the pollution spreading in P2P file sharing systems without downloading the data shared from the blacklisted nodes. However, in blacklisting mechanisms, massive amounts of data have to be collected on a P2P network, and it is not affordable to normal individual users. Moreover, since the peers in a P2P network join and leave the system frequently and unexpectedly, the data search procedures have to be conducted often to prevent the collected data from becoming out of date, and this significantly increases the network overheads.
3. EVENODD Coding Scheme
Reed-Solomon (RS) codes [27] and EVENODD are two representative forward error correction (FEC) codes widely used in telecommunication and information theory, such as data transmission and storage systems. In the RS codes, an update to a single information bit requires an update in all the parity symbols and affects numerous bits in each symbol. As a result, the RS code update operations incur large computation overheads. Unlike RS codes, EVENODD requires only cyclic shifts and XOR operations, which therefore achieves optimal redundancy with considerably lower computational complexity. Furthermore, EVENODD can tolerate up to two errors or erasures of bits (disks) and is more practical than RS codes.
The development of coding schemes remains an active research area. Since the EVENODD was proposed, several schemes have been proposed, such as the remote desktop protocol (RDP) scheme [28], X-code [29], B-code [30], and generalized EVENODD code [31]. These schemes are more efficient than EVENODD, and some of them can tolerate more than even two erasures. However, most of these schemes are variations or extensions of EVENODD, and hence, they are more difficult to implement due to the high complexity for achieving high performances. In this study, we aim to find a fault-tolerant mechanism to reduce the pollution attacks in P2P networks; developing or looking for high-performance coding schemes is beyond the scope of this study. EVENODD is both efficient and simple and hence is suitable for the purpose of this study. In this section, we will briefly introduce the EVENODD encoding scheme. For more details, please refer to [14, 32, 33].
3.1. Encoding Procedure
In a RAID, in order to tolerate the simultaneous damages to two disks, EVENODD requires
Here ⨁ represents an XOR operation. Equation (1) performs an XOR operation on the horizontal block of the original data disk array. The calculation of the parity data saved in disk
Before calculating the S value, the 0th column of the original data array is removed to make it a square array. XOR operations are then performed on the data from the lower-left to upper-right diagonal blocks in the square array to obtain the S values; subsequently, all the blocks of the original data array are shifted one grid to the left in order to obtain another diagonal line. An XOR operation is then performed on the blocks of this diagonal line to obtain a value, and a single XOR operation is again performed on the just calculated value and S values. The results are saved on the 0th block of the

An EVENODD encoding example.
3.2. Decoding Procedure
If we consider the situation in which only one disk crashes, the lost data can be easily recovered by using (1), (2), and (3) for reversal computations. When two disks crash simultaneously, 4 situations must be considered. Let us assume that both disks i and j crash, where
3.2.1.
,
The damaged ones are the two disks with parity data saved on them. The damaged parity data can obviously be recalculated by using (1), (2), and (3) after the new disks are installed.
3.2.2.
,
One of the two damaged disks retains the original data while the other disk m retains the horizontal parity data. The parity data of disk
3.2.3.
One of the two damaged disks retains the encoded original data while the other disk
3.2.4.
Both the damaged disks retain the encoded original data. To recover, first use the following formula to obtain the S value:
Designate the first recovered block and set s to Let Let
For simplicity, the exact meaning of the application of each formula is not explained in this paper; for more details on the recovery processes, please also refer to the examples in [14, 29, 30].
4. System Architecture and Operation
This section describes methods to employ the fault-tolerant capabilities of EVENODD to avoid file pollution in P2P systems; further, the overheads imposed by the scheme for producing redundant data with respect to the network and storage space are briefly analyzed.
4.1. EVENODD for Antipollution P2P File Sharing
In P2P file sharing systems, in order to efficiently share files, a file will usually be split into several pieces by the P2P file shareware before being placed for sharing [34–36]. Such practices have motivated the adoption of EVENODD to solve pollution problems in P2P file sharing systems. In order to conform with the EVENODD data slicing criteria, a shared data file must be split into m uniformly sized pieces, where m is a prime number.
Figure 2 illustrates an example of the operational details of the proposed mechanism in which five peers—A, B, C, D, and E—network over a P2P file sharing system. As shown in Figure 2(a), A, B, and C have copies of file f, which they all provide to the other peers for downloading. A and C provide good copies, while B provides a polluted copy. Prior to sharing, all of these copies were split into the same number of pieces; in this example, all were split into 5 pieces

An example of the operational details of the proposed mechanism.
4.2. Networks and Storage Overhead
After the original data file is successfully split, the previously introduced encoding procedure is employed to produce both the horizontal parity and diagonal parity pieces. As a result of these extra slices, this file has a certain degree of fault-tolerant capabilities. For example, Figure 3 shows an example in which an original data file is split into twelve pieces (A1~D3) that are clustered into four groups (e.g., A1, A2, and A3 are a group). Two pieces of parity data will be created by EVENODD for each group, for example, the two pieces of

Example of dividing a file into 12 pieces and 4 groups.
The generated parity data pieces cause a file to occupy more storage space. Assuming a file is split into m pieces, because the generated parity data have a fixed number of two pieces for certain pieces, that is, a group, the higher the m value, the smaller the extra storage space. Figure 4 shows the ratio of the generated parity data size to the original file size as m increases. The extra storage overhead does not reach more than 66% (i.e., the least splitting case when

Number of split m and extra storage space.
After the EVENODD encoding, a file is disseminated with a certain fault tolerance on the network. Theoretically, when the shared file size increases, it produces additional network traffic overheads. However, because EVENODD tolerates up to errors in two file pieces, a peer need not download all the pieces of a file to obtain the entire file. The missing two pieces can be recovered by merely applying the EVENODD decoding procedure, as shown in Figure 5.

File recovery using EVENODD decoding procedure.
4.3. Fault-Tolerant Capability
The fault-tolerant capability of a file depends on the total number of pieces split as well as the number of pieces in a group. In the same group, EVENODD could tolerate up to two faulty pieces irrespective of the number of pieces. If a group contains more pieces, the number of groups reduces, and hence, the total number of faults a file can tolerate also reduces. Figure 6 shows the relationship between the number of split pieces and the fault-tolerance capability. For the sake of efficiency, a reasonably small piece size is typically used to split a file in P2P file sharing systems; for example, in the BitTorrent protocol specification [34, 37], the default piece size is 256 KB, but sizes of 512 KB and 1 MB are often observed. FastTrack adopts even smaller piece sizes of 64 KB [35], while in the eMule protocol specification [36], a file piece is called a chunk and has a large size of 9.28 MB; each chunk is then further split into blocks of 180 KB. If a piece size is 512 KB, a movie film of approximately 1 GB comprises approximately 2000 pieces; the fault-tolerance rate for such sizes is not shown in Figure 6 in which only files split up to 97 pieces have been shown (the maximum prime number that is less than 100). In fact, when the number of file pieces reaches 2000, the fault-tolerance rate of the EVENODD will be only 0.1%; this is nearly close to no fault-tolerance capability. One solution for such big files is to split the file first into a reasonable number of blocks and then perform EVENODD encoding, respectively, against each block; however, this could also still enhance fault-tolerance rates. Of course, the space and time required for performing such methods also lead to other issues.

The fault-tolerant rate of EVENODD.
4.4. Tamper Detection
Traditional error detecting codes, such as parity checks and cyclic redundancy checks (CRCs), produce redundant parity bits from the data. These bits are then embedded into the data, and operational data errors are detected by error detecting codes and corrected by forward error correction (FEC) codes. However, if security is not considered, traditional error detecting codes cannot fight purposely devised malicious attacks. In this study, information hiding techniques [38, 39] have been employed for detecting the polluted pieces of files; note that the information hiding concept has been used in many fields, for example, secure communication and multimedia security.
5. System Modeling
In this paper, we have expanded the FM developed by Kumar et al. [15] for the pollution proliferation of P2P file sharing systems and then deduced a model for properly analyzing and evaluating the pollution-avoidance mechanism proposed in this paper. A brief introduction of FM will be given in this section and the model will be deduced. The various parameters involved in the following modeling and evaluation are summarized in Table 1.
Summary of parameters involved in the modeling.
5.1. Fluid Model
The primary goal of FM is to develop a model for P2P file sharing systems; this model should be able to observe, analyze, and evaluate the pollution proliferations and to provide important references for the future designs of antipollution mechanisms. Moreover, it can be used to model and analyze the proliferation of polluted versions and good versions of a title in P2P networks.
Two types of peers are defined in FM: attacking peers that inject polluted copies into P2P systems and benign peers. Let M be the number of benign peers wishing to acquire a title copy; there are several versions of this title, and for each version, there are several available copies and some of them are polluted. The following reasonable assumptions have been made in FM.
When a peer obtains a good version of a title, it stops its search. If a peer inspects and confirms that a downloaded version has been polluted, it will delete the version and immediately search again for the said title. If a peer obtains a good version, it will provide this version without any deadlines to other peers on the P2P network for their downloading. All the peers are homogeneous and exhibit identical behaviors.
In summary, in the FM, there are three types of peers at any instant of time: (1) peers that have a good copy, (2) peers that have a polluted copy, and (3) peers that have no copy. Moreover, any one of the M peers will have a maximum of one copy at any time, regardless of whether it is good or polluted.
In the FM, the inspection time is the time counted from the moment when a user issues a download request to the moment when the user finishes inspecting the fully downloaded file. The average inspection time of a peer is expressed as 1/μ, where μ is the inspection rate of a peer, or, in other words, the frequency of the peer deleting a polluted version and issuing a new search request.
Let
After a user has enquired for a certain title, he will receive a series of the versions and the number of copies available for each version of the title on P2P networks. Without considering the user behavior for the selection, in general, the probability of a certain specific version υ being selected could be modeled as a function of the number of copies of each available version in the system as follows:
The pollution proliferation modeling of the FM could begin from the two extreme cases of the selection distribution
Let N represent the number of polluted versions that have been placed in the network by an attacker, and let us assume that N is invariable with time. Further, assume that there are M benign peers who all wish to obtain good versions of this title. The FM uses the discrete-state Markov process approach to analyze the pollution proliferation. Figure 7 is such a Markov state chart wherein x and y represent the peers with uncontaminated and polluted copies, respectively, and state transitions occur when a user inspects a downloaded file. As shown in Figure 7, a system can transit from state

The Markov state transitions for the Copy Centric Model.
If the rate of inspections is
In order to reduce the complexity of obtaining the probability distribution of time from any initial state to the expected state
When a peer either without a copy or with a polluted copy has downloaded a good copy, the number
Similarly, when a peer without a copy downloads a polluted copy, the number
5.2. Modeling with File Pieces
In P2P file sharing systems, assume that there is one version of a shared file, which has been split into m number of pieces. At time t, the probability of a peer downloading a piece of the polluted copy is given by
However, deducing a new
This expression includes the cases from “all downloaded pieces are polluted,” “only one downloaded piece is not polluted,” and “two downloaded pieces are not polluted” to “all downloaded pieces are not polluted.” On the other hand, to a peer that has always downloaded the polluted piece, when it requests again and then downloads a good copy, the speed of increase in
5.3. Modeling with EVENODD
In order to simplify the EVENODD modeling process, assume that a file is split into m pieces and the file, after being encoded, will have two pieces more than the original file. First, consider the probability that “a peer that owns no any piece downloads more than three polluted pieces,” which is a series of permutations and combinations and is expressed as
Now, let us consider the case in which a peer owns a polluted copy and deduce the probability
6. Evaluation
In this section, we present our evaluation results of the antipollution mechanism proposed in this paper. The evaluation was performed using two approaches—by modeling and by simulations.
6.1. Evaluation by Modeling
It is difficult to estimate the distributions of

Pollution proliferation without EVENODD.
Similarly, it is difficult to estimate the distributions of

Pollution proliferation with EVENODD.
Although there is a large difference between the convergence times in Figures 8 and 9, the curve patterns of these two figures are similar. The initial values of both the environment parameters
6.2. Evaluation by Simulations
The evaluation results in the previous section are obtained from the equations with many postulations. In this section, we describe the simulation approach that was used to verify the accuracies of the obtained results. Several simulation experiments were conducted to evaluate the use of EVENODD in suppressing and eliminating pollution attacks in P2P file sharing systems. In the experiments, PCs (3.0 GHz processor; 2 GB DDR II 800 RAM) were used, and Visual C++ was selected as the programming tool.
6.2.1. Behavior of Nodes
In P2P file sharing systems, a peer must act and download before possibly receiving any polluted file. Therefore, the focus of this simulation experiment was on peer behaviors. General impact factors such as network traffic, framework, and search time that are considered in the simulation experiments of P2P file sharing networks were not be considered in this experiment. The conditions and assumptions of the simulation environments of this experiment were as follows.
The number of peers that share a complete and healthy file on the network, the number of benign peers that demand this file, and the number of attacking peers are fixed and maintained constant from the beginning of sharing the file. None of the peers leave the network. A peer that has successfully downloaded a certain file definitely shares all the pieces of this file with the peers that have not completed downloading.
These assumptions are in accordance with the modeling criteria described in the previous section. In addition, the procedure followed by a benign peer for requesting a file is fairly simple and is as follows.
Select a file piece that is still missing. Choose a peer that owns the piece and will provide the benign peer with the piece. Check whether the requested file has been completely downloaded, and if so, stop selecting pieces and peers.
6.2.2. Simulation Results
On the basis of the conditions and assumptions and setting the initial vales of

Pollution proliferation without EVENODD.
Figure 11 presents the experimental results from the EVENODD procedure on a P2P file sharing network; the number of polluted copies decreased fast after the first day, and on the 4th day, all the peers in the P2P file sharing network obtained a good copy of a file. In comparison with Figure 10, this experimental result confirms that the usage of EVENODD enables the efficient suppression and removal of file pollution in a shorter time on a P2P file sharing network; the convergence speed is approximately 40%~60% faster.

Pollution proliferation with EVENODD.
7. More Evaluation by Modeling
The accuracy of the model developed in Section 5.3 can be demonstrated by comparing the modeling evaluation results with those of simulation in Section 6. In this section, additional evaluation of the proposed model mechanism is conducted. In this further evaluation, two kinds of network scale settings are used: a large-scale network, as described in Section 6.1, in which
7.1. Amending Equations
In Sections 6.1 and 6.2, the initial environmental parameter configurations were different; they were for large-scale and small-scale networks, respectively. In order to understand the precision of the experiments described in Section 6.2, the settings in small-scale network are used to redo the calculations in Section 6.1. The results are shown in Figure 13.
From Figures 11 and 13, which are the evaluation results of the model and simulation experiment, respectively, the pollution situations can be observed to be continuingly decreasing from the beginning of the first day. However, the simulation results show that the file pollution in the system was totally eliminated by the 4th day, while the model results show that the elimination happened only by the 16th day. Similar results were obtained when comparing the results of Figures 10 and 12, and here, the differences in the durations of the pollution convergence were even larger. This implies the deduced model differs from the practical one. The reasons for such a situation may be obtained by examining the following loopholes existing during the model deduction.
In the model analysis, time is discrete (day by day). As a result, each peer sees the previous day's distributions of the copies in the P2P file sharing network. If a peer has downloaded a polluted piece, the entire file containing the piece will be considered as polluted regardless of the presence of other unpolluted file pieces, and this peer will also be considered a polluted peer. Similar to (2), as long as a peer has downloaded a piece from a peer that is believed to be polluted, it will be seen as a polluted peer, even if the downloaded piece is a good one.

Pollution proliferation in a small-scale network without EVENODD.

Pollution proliferation in a small-scale network with EVENODD.
In order to deduce a model that is more realistic, a new assumption will be added—a peer will delete all the polluted file pieces that were previously downloaded before requesting a download again. Under this assumption, the probability that a peer will obtain a polluted piece is modified as follows:
Based on the modified assumptions, the evaluations of Sections 6.1 and 6.2 are performed again to yield the results shown in Figures 14 and 15, which are the evaluation results of a P2P file sharing system adopting and not adopting EVENODD, respectively. In these two figures, both the results of model evaluations and simulation experiments have been deliberately combined for comparisons, and from them, it is obvious that the model with the modified assumptions has a curved shape similar to the simulations of the practical situation.

The pollution proliferation without using EVENODD after modifying the assumption.

The pollution proliferation that uses EVENODD after modifying the assumption.
Furthermore, the assumption in the Fluid Model that if a peer obtains a good version it will provide it without deadlines to other peers for downloading is also unrealistic. Several types of peer that will be reluctant to share downloaded files exist, including free riding peers as well as those that leave the P2P file sharing system naturally. To account for this, a new parameter

The effects of
The curves in Figure 16 show variation in the number of polluted copies; as can be seen, although
7.2. Impact of Network Scale
In this subsection, we investigate the impact of network scale on the convergence time of pollution in the proposed mechanism. While a comparison of Figures 8 and 13 would suggest that there is an effect on convergence time owing to network scale, the proportional relation among M, N, and
The variation of good copies in large and small network.
It can be seen that the results in columns 2 and 4 and columns 3 and 5 of Table 2, respectively, are identical. Based on this, it would appear that network scale has no effect on the proposed mechanism as long as the ratios between M and N and M and
7.3. Impact of Fraction of Initial Good Copies and Polluted Copies
This subsection conducted an experiment to inspect the pollution attack resistance ability of a P2P system that uses EVENODD. The aim was to verify the impacts on the convergence times caused by the variance of the initial values

The resistance ability of a P2P system a subject to pollution attack.
7.4. Impact of Degree of Pollution Attack
The parameters R and K both indicate the degree of pollution attack; for example, a larger value of K or a smaller value of

Elimination speed of polluted copies for values of m of 5, 7, 11, 13, 17, and 19 in a large-scale network—
Figure 18(c) shows the results of a more fierce attack. As can be seen, the larger number of split pieces results in the slower elimination of polluted copies, as the pollution attack will be more severe in the beginning. Next, an experiment similar to the first part was conducted, with the results shown in Figure 19.

Elimination speed of polluted copies for values of m of 5, 7, 11, 13, 17, and 19 in a large-scale network—
Figure 19 shows the same results as in Figure 18; this is to be expected, as the analysis in Section 4.3 determined that increasing the number of split pieces will lower the fault tolerance capability of EVENODD. As will be discussed in the next subsection, a more fierce attack will present challenges for the proposed mechanism not only in terms of efficiency but also in terms of network and storage overhead.
7.5. Comparison
The analytical models in the classic antipollution mechanisms Scrubber [6], Credence [12], and Blacklist [13] are implemented and compared with the proposed mechanism in terms of convergence time. In this evaluation, the large-scale network settings described in Section 6.1 are used. Parameters specific to Scrubber and Credence adopted the settings from [6], but the user feedback and β are set to 0.8 and 1.0, respectively, for a more realistic assessment. Because the derivation of the analytical model in this paper is based on the Fluid Model [15], parameters that are specific to Blacklist adopted the settings from that paper. The experiment results are shown in Figure 20.

Convergence time of Scrubber, Credence, Blacklist, and the proposed mechanism.
7.6. Discussion
Most research to date focusing on the development of pollution countering measures has adopted some sort of reputation system based on a globally agreed-upon voting scheme. However, such systems have several inherent weaknesses. First, reputation systems face the cold start problem [26] as, for example, in the case where a new user will have to belong to the system for a period of time in order to build up sufficient reputation. Second, because reputation systems are vulnerable to malicious attacks [40] such as the Sybil attack, additional defensive solutions must be invented and inserted into the system. Third, users may avoid voting or provide a false vote, a tack taken deliberately by malicious users and accidentally by good users who forget to vote or mistakenly cast a reverse vote. Fourth, a reputation system must usually be accompanied by an incentive mechanism [41] that encourages users to provide truthful feedback while also allowing good users to recover from damaged reputation. Together, these potential pitfalls make existing pollution countering schemes based on reputation complicated and difficult to implement.
The major drawback of the proposed mechanism lies in its requirement for additional network and storage overhead, with the analysis in Section 4.2 showing that overhead factors increase as the number of split pieces decreases. However, the experiment conducted in Section 7.4 indicates results contrasting with these in terms of efficiency. Instead of proposing an appropriate number of split pieces, this paper proposes that different types of shared file should be split into different numbers of pieces. For example, a large file split into a small number of pieces will result in high network and storage overhead; furthermore, document files are suited to being broken into smaller numbers of file pieces because they are generally small and their data integrity is important. Another defect of the proposed mechanism comes from the overhead incurred by the tamper detection scheme. When no tamper detection measures are imposed on the proposed mechanism, a user must inspect a downloaded file manually, which will increase the time used for downloading a file and the burden of pollution elimination. On the other hand, if a tamper detection technique as described in Section 4.4 is adopted, the proposed mechanism will become more complicated and cause more network and storage overhead (in the form of additional parity data, e.g.).
The proposed mechanism is very simple and requires only the implementation of EVENODD encoding and decoding procedures on each peer node of a P2P file sharing system. Moreover, as shown in the experimental results, pollution can be effectively suppressed whether or not users can discover and delete polluted files in a timely manner. If reputation-based pollution countering measures can be used in conjunction with the proposed mechanism, file pollution can be eliminated even more effectively.
7.7. Future Work
In this study, we successfully derived an analytical model and obtained numerical results within only a limited space owing to numerous assumptions. However, the analysis of the proposed mechanism can be more realistic if some of these assumptions are not made. First, we consider that the number of polluters is varied. Polluters who are in the system may collude with more malicious peers than those who are not in the system in order to launch a more fierce attack but may give up and exit the system due to the presence of an antipollution mechanism. This variable can be taken into consideration in the derivation of the probability that a peer selects a polluted copy and downloads it (see (22)). Second, since a peer can join or leave a P2P file sharing system at any time, the number of peers is varied from the beginning of file sharing. This includes two variables: one is the probability that a peer stops sharing a downloaded file and leaves the system after finishing the download and the other is that new peers who query for the file join the system in each round. In the former, the files downloaded by peers who stop sharing the file after downloading can be polluted or unpolluted. This implies that the number of polluted and unpolluted versions can be reduced. As in the case of the latter, not only benign peers but also polluters can join the system.
8. Conclusion
Pollution attacks are conducted in P2P file sharing systems to stop the downloading of copyrighted products; further, in recent years, many researchers have attempted to develop mechanisms for legally downloading copyrighted products. We believe that even if the dispute over copyrighted product dissemination ends in the near future, file pollution attacks might become a tool for malicious users to attack P2P file sharing systems, and moreover, this will seriously impact the efficiency of P2P file sharing systems and even degrade the network effectiveness. Hence, we have attempted establish a mechanism to suppress and eliminate the pollution attacks in P2P file sharing systems. By sectioning a file into suitable number of pieces and by applying EVENODD to these pieces, the proposed approach has successfully achieved this goal. Both the correctness and efficiency of the approach were verified by simulation experiments; the evaluation results demonstrate that the application of EVENODD to a P2P file sharing system could effectively reduce 40 to 60% of the pollution duration. Further, EVENODD could be easily embedded in the present P2P file sharing software without requiring any changes to its underlying infrastructures.
Footnotes
Conflict of Interests
The authors declare that there is no conflict of interests regarding the publication of this paper.
