Abstract
Probability and random processes is considered by students to be conceptually one of the most difficult subjects in the undergraduate electrical and computer engineering curriculum. There are numerous reasons for this difficulty. First, humans are not innately good at probabilistic reasoning. Traditionally, the subject has been introduced in a very abstract manner, without emphasis on real-world applications from the electrical and computer engineering discipline. In addition, extensive use of interactive simulation and visualization tools, which offer an alternative way of developing probabilistic intuition, is usually missing from traditional course offerings. This paper presents a unique pedagogical approach to teaching an introductory probability course offered to electrical and computer engineering juniors. The salient features of the proposed pedagogical approach include more emphasis on real-world electrical and computer engineering problems that show the applications of abstract probabilistic concepts; extensive hands-on and interactive MATLAB® simulations of real-world electrical and computer engineering problems, tightly integrated into the curriculum; highlighting the frequentist approach to build probabilistic intuition using simulations; concrete examples showing how naive probabilistic intuition can be erroneous and how to develop correct probabilistic intuition by systematically modeling, simulating, and analyzing a problem; and application-based simulations driving the abstract theory rather than the other way around. This pedagogical approach was implemented in a course offered to electrical and computer engineering undergraduates at Purdue University Northwest. The paper presents a concrete example illustrating how the salient features of the proposed pedagogical approach were implemented as part of this course, along with student data validating the efficacy of the proposed approach.
Introduction
Probability and random processes is an important course in the undergraduate electrical and computer engineering (ECE) curriculum. Its importance stems from the fact that it serves as a gateway to areas of electrical engineering such as communications, signal processing, controls, and networking. Without a good understanding of probabilistic modeling and analysis techniques, a student faces considerable difficulty in future courses in these areas. It is no surprise, then, that probability and random processes is offered as a core course in the undergraduate ECE curriculum. Despite its importance, students often find the subject difficult to grasp and fail to develop the motivation to pursue areas of electrical engineering that are heavily dependent on it. 1 There are numerous reasons why students find this subject difficult to comprehend.
Due to a strong emphasis on Newtonian mechanics in early engineering education, students are trained to conceptualize and solve deterministic problems. Almost all of the math and physics courses that undergraduate engineering students take in the first two years deal with modeling and analyzing the deterministic world. In addition, introductory electrical engineering courses like digital logic and circuit analysis do not deal with random noisy signals and their effects. Not only do students receive inadequate exposure and training in solving problems involving uncertainty, but humans are also not innately good at probabilistic reasoning.2–4 A classic illustration of this is the Monty Hall problem, which even Paul Erdős, one of the most prolific mathematicians, initially got wrong. 5
Historically, this subject has been taught in a very abstract manner, without emphasis on concrete applications from the ECE discipline or on modern simulation and visualization tools. Most of the examples covered in textbooks are generic in nature and fail to provide deeper insight or to motivate ECE students to study the subject.6–9 Occasionally, when applications of probability in ECE are presented in textbooks or courses, they are presented in a very non-interactive and terse manner, without the aid of modern simulation and visualization tools that could give students a deeper intuitive understanding of the subject.6–9
Since probability is taught using the axiomatic approach, little emphasis is placed on the frequentist approach, which, even though less mathematically rigorous, can help build probabilistic intuition. In addition, modern simulation and visualization tools such as MATLAB®, 10 GNU Octave, 11 R, 12 and Mathematica® 13 offer a hands-on and interactive way to develop probabilistic intuition, something that is usually missing from traditional course offerings.
This paper presents the results from a unique pedagogical approach to teaching an introductory probability and random processes course to ECE juniors. The unique pedagogical features of this course include more emphasis on real-world ECE problems from the areas of communication, signal processing, networking, controls, and computer engineering, listed in Table 1, that show the applications of abstract probabilistic concepts; extensive hands-on and interactive MATLAB® simulations of real-world ECE problems highlighting the frequentist approach to build probabilistic intuition; concrete examples showing the fallacies and pitfalls of naive probabilistic intuition, and how to systematically develop correct intuition by doing the appropriate math and validate it through simulations; less time spent on theory and more on solving problems and applications; revisiting different aspects of the same bigger problem during different parts of the course and using abstraction to hide details that depend upon material that has not been covered yet; and highlighting the tradeoffs involved in the design of optimal engineering systems under uncertainty. A complete list of these pedagogical features is shown in Table 2.
Real-world applications of probability in electrical and computer engineering.
Pedagogical techniques used in the course.
Even though MATLAB® 10 was used as the simulation software during the course, due to its wide applicability in the ECE curriculum, numerous alternative simulation tools exist, such as GNU Octave, 11 R, 12 and Mathematica®, 13 that could equally have been used to develop the simulations.
The paper is organized as follows: “Related work” section presents an overview of the related work in this area. “Illustrative example (signal detection)” section presents a concrete example of signal detection to illustrate how the unique pedagogical features of Table 2 were implemented. In “Efficacy studies” section, we present student data to validate the efficacy of the proposed pedagogical techniques. Finally, we conclude the paper in “Conclusion and future work” section highlighting some future directions.
Related work
Engineering education is rapidly evolving based on the new pedagogical techniques enabled by modern computational technology. Various researchers have investigated the efficacy of such pedagogical techniques involving simulation, gaming, visualization, and virtual labs in teaching undergraduate ECE courses. Christou et al. 14 and Dinov et al. 15 describe a design for incorporating the Statistics Online Computational Resource (SOCR) tool into instruction. SOCR provides various interactive tools, such as statistical calculators, simulation applets, data analysis, and visualization, for enhanced instruction in various undergraduate and graduate courses in probability and statistics. Their findings indicate that the effect of SOCR on the treatment group across all classes was significant. The authors concluded that employing SOCR technology enhances the students’ overall understanding and long-term knowledge retention.
Reis and Kenett 16 provide a classification of a wide range of simulation tools used in teaching statistical methods, based on the level of complexity and sophistication of the courses. Viali 17 presents assessment results from the use of the discrete-event simulation software QUeuing Event Simulation Tool in a sophomore-level engineering probability and statistics course. The simulations were used to illustrate the effects of different probability distributions on the performance of a system. De Raffaele et al. 18 investigated the efficacy of tangible user interfaces in improving the learning experiences of students in a queuing theory course. By augmenting simulations with a table-top architecture, the authors demonstrated real-time interaction with and visualization of queuing theory concepts. Their efficacy study found a 25% increase in grades when compared with traditional teaching methods.
Garfield et al. 19 present results from a three-month-long teaching experiment utilizing TinkerPlots™ software for statistical modeling and analysis. Their results suggest that incorporating modeling and simulation tools can develop statistical intuition. In addition, their approach seems to help students develop a better appreciation for statistics in practice.
A comprehensive topical survey of some of the most important and recent research on teaching and learning probability is presented in Batanero et al. 20 Some of the topics covered in this survey include intuition and learning difficulties in probability and educational technology resources in probability education.
A collection of the latest research in statistics education performed by international scholars is presented in Ben-Zvi and Makar. 21 The studies in this book present unique challenges that students and teachers of statistics face while learning and teaching statistics, respectively. In addition, important theoretical and techno-pedagogical challenges are also presented. The technological tools presented in this work are used to enhance creative learning, simulate abstract ideas, and experiment with statistical models.
Two recent important bodies of work in the area of application- and simulation-based teaching of probability and random processes to electrical and computer engineers are Kay 22 and Walrand. 23 The book by Kay 22 takes a unique pedagogical approach to probability and random processes, presenting motivating examples first and generalizations later. Even though the book takes a “hands-on” approach to teaching probability by providing a comprehensive list of real-world examples and simulations from various disciplines, it is geared toward first-year graduate students.
The book by Walrand provides a list of representative applications that make use of a wide range of probabilistic ideas and concepts. 23 The book, however, is written for an upper-division probability course in electrical engineering and computer science and assumes that the students have taken an introductory probability course. The material is organized around applications, and each chapter starts with a real-world electrical engineering and computer science application of probability. “Hands-on” projects using MATLAB® and Simulink® are used throughout the book.
Although there has recently been increased interest in application-based and simulation-driven pedagogical approaches to teaching probability, none of this work incorporates all of the pedagogical techniques listed in Table 2. In addition, efficacy studies measuring the effectiveness of these pedagogical techniques are lacking. This paper addresses these shortcomings in the related work.
Illustrative example (signal detection)
Table 1 lists some of the real-world applications of probability and random processes in ECE that were discussed during the course. Interactive MATLAB® simulations were developed for each of these applications from communication, signal processing, networking, controls, and computer engineering. The theory was presented either side-by-side with the simulations and applications or afterward. Significantly more time was spent on concrete applications and simulations than on abstract theory. Using simulations, the fallacy of naive and erroneous probabilistic intuition was highlighted, and a systematic approach to modeling, analyzing, and simulating a probabilistic problem was developed. Only after going through this systematic approach were the students encouraged to develop probabilistic intuition. One of the challenges of using real-world ECE applications was that they relied on probabilistic ideas introduced during different parts of the course. This problem was circumvented by using abstraction to hide the details that rely on ideas not yet covered and by revisiting the same application during different parts of the course to highlight its different aspects. This requires a certain skill on the part of the instructor to hide details efficiently without compromising the information needed for the problem at hand; nevertheless, the instructor was able to implement this approach quite effectively.
Most of the application areas listed in Table 1 are covered to some extent in introductory probability textbooks for ECE majors.6–9 So this is not a question of introducing advanced material to the students, but rather of presenting more effectively the material already covered in current textbooks. Most current textbooks touch on these topics only tangentially, without sufficient detail and simulations to help build a deeper understanding. This can be quite frustrating for students who have to rely on a half-baked idea to understand a difficult concept.
In the following, we present a concrete example of signal detection from Table 1 and show how the pedagogical objectives of Table 2 were achieved using this example as part of the course. In particular, we present the extensive simulations and visualizations that were used to develop probabilistic intuition. The interactive nature of these simulations allows students to tweak different parameters of the system and see their effects on system performance. Throughout the simulations, we use the relative frequency approach to build a physical understanding of the random phenomenon. Finally, the simulations highlight the various design tradeoffs under uncertainty that are an integral part of most practical engineering systems.
The signal detection problem along with the associated simulations was re-visited in conjunction with topics such as the total probability theorem, conditional probability, independence, Gaussian random variable, mean, variance, probability density function (PDF), random processes, additive white Gaussian noise (AWGN), power spectral density, response of an LTI system to a stationary random process, wide-sense stationarity, ergodicity, autocorrelation, Monte-Carlo simulations, 24 unbiased estimators, variance of estimators, and strong law of large numbers (SLLN). By interacting with the simulations based on the same application of signal detection, the students were able to better grasp the meaning of these abstract probabilistic concepts.
Problem statement
A binary communication system, shown in Figure 1, sends a 0 bit by transmitting a rectangular pulse of amplitude −v0 and duration T seconds, and a 1 bit by transmitting a rectangular pulse of amplitude v1 and duration T seconds. The channel corrupts the transmitted signal with additive white Gaussian noise, and the receiver integrates the received signal over each bit interval to produce the decision statistic Y.

Binary signal detection.
The receiver uses the following decision rule, at the end of each T second interval, to decide which bit was transmitted during that interval:
If Y > k decide bit 1 was transmitted, else if Y < k decide bit 0 was transmitted, else if Y = k then decide randomly between bit 1 and 0 with equal probability.
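As a minimal sketch, this decision rule can be expressed directly in MATLAB® (the function name and vectorized form below are illustrative, not taken from the course materials):

```matlab
% Minimal sketch of the receiver decision rule described above.
% Y is a vector of integrator outputs sampled at the end of each
% T-second interval, and k is the decision threshold.
function bits = decideBits(Y, k)
    bits = double(Y > k);                    % Y > k: decide bit 1; Y < k: bit 0
    ties = (Y == k);                         % Y = k: decide randomly between
    bits(ties) = rand(1, nnz(ties)) > 0.5;   % bit 1 and 0 with equal probability
end
```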
In the above, k denotes the decision threshold. The goal is to design the “most reliable” receiver. In other words, what is the value of the optimal threshold k, denoted k*, that minimizes the probability of bit error?
Solution
The probability of bit error at the receiver, Pe, is given by the total probability theorem

Pe = p P(Y < k | bit 1 transmitted) + (1 − p) P(Y > k | bit 0 transmitted) (2)

where p denotes the probability of transmitting a bit 1.
Substituting equations (3) and (4) into equation (2)
Note that Pe in equation (6) is a function of the decision threshold k. In order to find the optimum value of the threshold k, we need to differentiate equation (6) with respect to k and set it equal to 0 to find the critical point. Using the fundamental theorem of calculus and the following property of PDFs
Substituting equation (1) into the above
After some simplification
The optimal threshold given by equation (7) depends upon the pulse amplitudes v0 and v1, the pulse interval T, the probability p of transmitting a bit 1, the noise variance, and the noise mean.
Differentiating equation (1) with respect to x yields
Substituting equation (9) into equation (8), we get
From equation (7), it follows that
Substituting equations (11) and (12) into equation (10) and after some simplification, we get
Hence
It is convenient to express the probability of bit error in equation (14) in terms of the signal-to-noise ratio (SNR). The SNR is defined as
Substituting equation (16) and v1 = v0 in equation (14), we get
Equation (17) gives the minimum probability of error in equation (14) as a function of SNR for the antipodal signaling scheme. In the special case of equiprobable source bits (p = 1/2), antipodal signaling, and zero-mean Gaussian noise (m = 0), the optimal threshold reduces to k* = 0.
The reader would agree that the above mathematical analysis, though necessary, can make a student lose sight of the forest for the trees. Students are often put off by involved mathematical analysis without a bigger picture in mind or alternative means of visualizing the concepts. This is especially true in an abstract course like probability, where a lack of alternative means of accessing information can demotivate a student from pursuing the subject. Simulations and visualization techniques can present information in a complementary way that significantly enhances student understanding. In order to achieve the pedagogical objectives of Table 2, the following simulations were designed and presented to the students along with the mathematical analysis. It is to be noted that during the course of the semester, these simulations were presented side-by-side with the mathematical analysis in an integrated manner.
Simulation
The system shown in Figure 1 was simulated using MATLAB®. A realization of the signal waveforms at different points along the system of Figure 1 is shown in Figure 2. Figure 2(a) shows the transmitted signal waveform corresponding to the bit sequence “01001.” Under the idealized conditions of no noise and no channel distortion, the received signal at the input of the integrator in Figure 1 would be the same as the transmitted signal. However, due to the channel noise, the received signal is distorted. A realization of this distorted noisy received signal at the input of the integrator is shown in Figure 2(b). Figure 2(b) shows only one such realization, but as part of the simulation, multiple realizations of the received signal were presented to the students to illustrate the random effects of noise. Figure 2(c) shows the output of the integrator for a realization of the received signal corresponding to the bit sequence “01001.” By looking at Figure 2(c), the students were able to see how decoding errors occur randomly at the sampling instants in the receiver. For example, at the T = 2 ms sampling instant, a decoding error occurs because the noise brings the sampled value below the optimal threshold of 0, even though a bit 1 was transmitted during the bit interval between 1 and 2 ms. On the other hand, the receiver decides correctly at the T = 3 ms sampling instant. The simulation was repeated to show that transmitting the same bit sequence under exactly the same conditions may result in bit errors at different time instants due to the randomness of the noise; even this randomness, however, follows certain mathematical laws, which are the subject of probability theory.
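The following sketch reproduces the essence of this waveform-level simulation under the parameter values listed in the caption of Figure 2 (v0 = v1 = 1 V, T = 1 ms); the sampling rate and per-sample noise level are illustrative assumptions:

```matlab
% Sketch of the simulation behind Figure 2: rectangular antipodal pulses,
% AWGN, and an integrate-and-dump receiver. Parameter values follow the
% caption of Figure 2; Ns and sigma are illustrative assumptions.
bits = [0 1 0 0 1];                     % transmitted bit sequence "01001"
v0 = 1; v1 = 1;                         % pulse amplitudes (V)
T  = 1e-3;                              % bit interval (s)
Ns = 100;                               % samples per bit interval
dt = T / Ns;
amp = v1 * bits - v0 * (1 - bits);      % map bit 0 -> -v0, bit 1 -> +v1
tx  = repelem(amp, Ns);                 % transmitted waveform, cf. Figure 2(a)
sigma = 0.5;                            % per-sample noise std (assumed)
rx  = tx + sigma * randn(size(tx));     % noisy received waveform, cf. Figure 2(b)
Y   = dt * sum(reshape(rx, Ns, []), 1); % integrator outputs at t = nT, cf. Figure 2(c)
decoded = double(Y > 0);                % threshold k* = 0 (p = 1/2, m = 0)
numErrors = nnz(decoded ~= bits)        % decoding errors in this realization
```

Re-running the script produces a different noise realization each time, which is precisely the random placement of bit errors discussed above.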

A realization of the signal waveforms for the system shown in Figure 1; v0 = 1 V, v1 = 1 V, T = 1 ms, p = 1/2, m = 0 mV, and
The students were able to interact with the simulation by changing different system parameters such as noise power, transmitted signal power, and transmitted bit sequence. This provided them a visual interpretation of the effects of signal and noise power on the bit errors. As a result, the students were better able to understand the role of uncertainty due to noise in the performance of a communication system.
An intuitive understanding of abstract probabilistic concepts such as random variable, random process, mean, variance, autocorrelation, power spectral density, stationarity, and ergodicity can be developed by considering Figure 3, which shows five out of the infinitely many possible realizations of the received signal at the integrator output. Each one of these waveforms occurs with a certain probability. The receiver has no a priori knowledge of which one of these waveforms was actually sent; if it did, there would be no point in transmitting any information. The collection or ensemble of these waveforms is called a random process, and each waveform is called a sample function. If we fix a point on the time axis, the ensemble of real numbers obtained constitutes a random variable. A random process is stationary if its statistics are time-invariant; in other words, they do not depend on the time origin. A random process is ergodic if its ensemble averages, across the sample functions, are equal to the time averages of a single sample function. For example, the mean of the process shown in Figure 3 is zero and consequently does not depend on the time instant. However, this process is not ergodic. To see this, consider a sample function consisting of all 1’s, whose time average will not be equal to zero.
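A short sketch of this ensemble view follows; the parameter values echo the caption of Figure 3, while the noise level and ensemble size are illustrative assumptions:

```matlab
% Sketch of the ensemble of sample functions in Figure 3, and of the
% distinction between ensemble and time averages discussed above.
nWave = 5; nBits = 5; Ns = 100;         % sample functions, bits, samples/bit
v = 1; sigma = 0.3;                     % pulse amplitude and noise std (assumed)
ensemble = zeros(nWave, nBits * Ns);
for w = 1:nWave
    bits = rand(1, nBits) < 0.5;        % p = 1/2
    amp  = v * (2 * bits - 1);          % bit 0 -> -v, bit 1 -> +v
    ensemble(w, :) = repelem(amp, Ns) + sigma * randn(1, nBits * Ns);
end
ensembleMean = mean(ensemble, 1);  % average across sample functions (-> 0 as nWave grows)
timeMeans    = mean(ensemble, 2);  % time average along each sample function
```

The ensemble mean approaches zero at every time instant as nWave grows, while the time average of an individual sample function (for example, one carrying all 1’s) need not be zero, which is exactly why the process is not ergodic.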

Multiple realizations of the received signal plus noise random process at the input of the integrator for the system shown in Figure 1; v0 = 1 V, v1 = 1 V, T = 1 ms, p = 1/2, m = 0 mV, and
An intuitive understanding of why an integrator is used in the receiver of Figure 1 can be garnered from Figure 2(c). As shown by the figure, the integrator smooths out the noise fluctuations while increasing the separation between the noise-free outputs corresponding to the two pulses v0 and v1. Another way of looking at this is to realize that an integrator acts as a lowpass filter and rejects the high-frequency components of noise that lie outside the bandwidth of the transmitted pulse, without loss of any information. In actuality, the integrator is the optimal filter for the transmitted signal pulses in the sense that it maximizes the output SNR among the class of all LTI systems (a matched filter), but this fact is left for more advanced courses.
It is to be noted that a complete comprehension of the system in Figure 1 would require an understanding of random processes and the response of an LTI system to a wide-sense stationary random process, something that is covered toward the end of an introductory probability course. This is a common problem in using real-world engineering applications to introduce probabilistic concepts, as these applications depend upon ideas introduced during different parts of the course. However, this constraint should not preclude applications like these and their associated simulations from being introduced at an early stage of the course. For example, by using abstraction or information hiding, the information pertaining to the response of an LTI system to a wide-sense stationary process was abstracted out in the “Problem statement” section. Later in the course, when students were introduced to the response of an LTI system to a wide-sense stationary random process, those aspects of the application and simulation were revealed to them.
In order to further illustrate the effects of different physical parameters on the performance of the system in Figure 1 in a hands-on and interactive manner, additional interactive simulations were designed showing the sequence of sampled values at the output of the integrator. This visualization approach of showing the sampled values at the integrator output, the sequence of random variables that constitutes the decision statistic, provides a convenient way of observing the effects of different parameters on the performance of the system, thus revealing a deeper insight into the problem and developing probabilistic intuition. The performance of the system in Figure 1 is measured in terms of the probability of bit error, Pe, also known as the bit error rate (BER). This is the likelihood of a bit being in error at the output of the decision device. The frequentist interpretation of this is the proportion of errors Ne in a large number N of independent and identical transmissions, i.e., Pe ≈ Ne/N.
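A minimal Monte Carlo sketch of this relative-frequency estimate, working directly with the sampled decision statistic (the amplitude and noise values are illustrative assumptions):

```matlab
% Monte Carlo sketch of the frequentist BER estimate Pe ~ Ne/N. The
% decision statistic is modeled directly as Y = +/-A plus Gaussian noise;
% A and sigma are illustrative, with threshold 0 for p = 1/2, m = 0.
N = 1e6;                                 % number of simulated transmissions
p = 0.5;                                 % probability of transmitting a 1
A = 1; sigma = 1;                        % statistic amplitude and noise std
bits = rand(1, N) < p;                   % random source bits
Y = A * (2 * bits - 1) + sigma * randn(1, N);
Ne = nnz((Y > 0) ~= bits);               % number of decoding errors
Pe_sim = Ne / N                          % relative-frequency estimate of Pe
Pe_an  = 0.5 * erfc(A / (sigma * sqrt(2)))   % analytical Q(A/sigma) for p = 1/2
```

Increasing sigma (or decreasing A) in this sketch visibly raises Pe_sim, which is the noise-variance effect examined next.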
Effect of noise variance
Figures 4 and 5 show the effect of noise variance on the probability of bit error of the system shown in Figure 1. Figure 4(a) shows a realization of the received sample values in the absence of noise at the output of the integrator; Figure 4(b) shows a realization of the received sample values with noise at the output of the integrator, i.e. the sequence of random variables Y that are fed into the decision device in Figure 1. The effect of noise perturbing the sample values corresponding to the transmitted signal is evident from this figure. The decision threshold of 0 is also shown in this figure. If the received sample is greater than the decision threshold, then the receiver decides in favor of bit 1; else if it is less than the threshold, the receiver decides in favor of bit 0. If the sample value is equal to the threshold, then the receiver decides between bit 1 and 0 with equal probability. Using this decision rule, the decoded bitstream is shown in Figure 4(c). Finally, the bits in error are displayed in Figure 4(d).

Effect of noise variance on the BER Pe of the system shown in Figure 1;

Effect of noise variance on the BER Pe of the system shown in Figure 1;
The simulation provides an interactive way of changing the noise variance and visualizing its effect on the received sample values and the probability of bit error. Here, we present two sample figures, Figures 4 and 5, from the simulation. A comparison of these two figures reveals the effect of noise variance on the probability of bit error. As the noise variance is increased a hundredfold from Figure 4 to Figure 5, the number of bit errors increases. This is because variance is a measure of spread or dispersion around the mean, and an increase in noise variance results in larger fluctuations around the sample value corresponding to the transmitted bit, hence resulting in more erroneous crossovers of the decision threshold. This phenomenon can easily be seen by comparing Figures 4(b) and 5(b). The resulting increase in the error rate is evident by comparing Figures 4(d) and 5(d). Often, concepts like mean and variance are presented only as mathematical formulae. Providing illustrative physical examples like these can really bring these abstract ideas home to the student.
Figure 16(a) shows the behavior of the optimal decision threshold as a function of noise variance. We observe that the threshold value changes linearly with noise variance, where the slope of this linear dependence is positive for p < 1/2, zero for p = 1/2, and negative for p > 1/2.
Effect of transmitted bit probability
The effect of the probability of transmission of bit 1 on the optimal decision threshold and the probability of bit error is shown in Figures 6 to 8. Figures 6(b) and 7(b) show that as the likelihood of transmitting bit 1 decreases from 0.4 to 0.1, the optimal decision threshold increases from 0.4055 to 2.1972 mV, whereas the probability of bit error decreases from 0.2309 to 0.0908. The threshold increases because, when a 1 is less likely to have been transmitted, it is to the receiver’s advantage to err on the side of deciding a 0. This fact provides an intuitive understanding of the total probability theorem, according to which the total probability of error is given by equation (2): Pe = p P(Y < k | bit 1 transmitted) + (1 − p) P(Y > k | bit 0 transmitted).

Effect of transmitted bit probability p on the optimal threshold

Effect of transmitted bit probability p on the optimal threshold
As p decreases, moving the threshold in the positive direction results in a larger value of

Effect of transmitted bit probability p on the optimal threshold
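The threshold values quoted above are consistent with the standard Gaussian likelihood-ratio (MAP) threshold; the following sketch assumes that form and reproduces the quoted numbers:

```matlab
% Sketch of the optimal threshold as a function of p, assuming the standard
% Gaussian likelihood-ratio (MAP) threshold
%   k* = (mu0 + mu1)/2 + sigma^2/(mu1 - mu0) * ln((1 - p)/p),
% which reproduces the values quoted above when the conditional means are
% symmetric and the factor sigma^2/(mu1 - mu0) equals 1 mV.
mu0 = -1; mu1 = 1;                  % conditional means of Y (normalized units)
sigma2 = 2;                         % variance of Y, so sigma2/(mu1 - mu0) = 1
p = [0.4 0.1];                      % probability of transmitting a 1
kstar = (mu0 + mu1)/2 + sigma2/(mu1 - mu0) .* log((1 - p) ./ p)
% kstar = 0.4055    2.1972  (i.e., 0.4055 mV and 2.1972 mV, as quoted)
```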
Figures 15(a) to (c) show the dependence of optimal decision threshold
Effect of transmitted signal amplitude
Figures 9 to 11 illustrate the effect of increasing the amplitude, and consequently power, of the transmitted pulses on the decision threshold and BER. A comparison of Figures 9(b), 10(b), and 11(b) would reveal that as the transmitted signal amplitudes increase asymmetrically, such that

Effect of transmitted signal amplitudes v0 and v1 on the optimal threshold

Effect of transmitted signal amplitudes v0 and v1 on the optimal threshold

Effect of transmitted signal amplitudes v0 and v1 on the optimal threshold
Figures 16(c) and (d) and 17 illustrate the effect of pulse amplitudes on the decision threshold. Figure 16(c) shows the dependence of decision threshold on pulse amplitudes for p = 0.2, i.e. when 0’s are more likely to be transmitted than 1’s. Under this condition, the threshold moves in the positive direction to reduce the overall rate of error; however, this shift in the positive direction increases with an increase in the pulse amplitudes under the constraint that
Similar behavior is observed in Figure 16(d) for the case when 0’s and 1’s are equally likely to be transmitted. As the magnitude of the pulses increases, under the constraint that
Figure 17 shows the dependence of the decision threshold on pulse amplitudes when 1’s are more likely to be transmitted than 0’s. Under this scenario, the threshold should shift in the negative direction due to the higher likelihood of transmitting a 1; however, the proportionately larger increase in the amplitude of v1 forces the threshold to move in the positive direction. The net effect of these conflicting objectives is illustrated by the curves in Figure 17.
Effect of noise mean
The effect of noise mean on the performance of the system can be understood from Figures 12 to 14. As the noise mean increases in the positive direction, the decision threshold also shifts in the positive direction by the same amount; however, the BER is unaffected. This can be explained by observing that the positive mean of noise shifts the random fluctuations in the received samples by an amount equal to the noise mean. Consequently, the likelihood of a bit error, given bit 0 was transmitted, becomes greater than the likelihood of error, given bit 1 was transmitted, if the threshold were to stay at 0. By moving the threshold in the positive direction, the total probability of error can be reduced when compared to the value obtained with a threshold of 0. The BER does not change in Figures 12 to 14 because by moving the decision threshold by an amount equal to the noise mean, in terms of performance, the system is equivalent to that when both noise mean and decision threshold are equal to 0.

Effect of noise mean m on the optimal threshold

Effect of noise mean m on the optimal threshold

Effect of noise mean m on the optimal threshold
Figure 15(d) shows that the optimal decision threshold linearly increases with noise mean. We also observe that for any

Effect of probability p of transmitting bit 1 on the optimal threshold
Effect of bit interval
Figure 16(b) shows the effect of the duration of the transmitted pulses on the decision threshold. As the time duration of the pulses increases, their bandwidth decreases. This results in less noise power within the signal bandwidth perturbing the decision statistic. Note that because of the integrator, which acts as a lowpass filter, only the spectral components of the noise within the transmitted signal bandwidth have an impact on the performance of the system. Consequently, the net effect of an increase in the pulse duration is an increase in the SNR of the decision statistic. At higher SNR values, the advantage of shifting the decision threshold away from zero diminishes, and hence the threshold curves get closer to zero, as shown in Figure 16(b).

(a) Effect of noise variance

Effect of transmitted pulse amplitudes, v0 and v1, on the optimal threshold
Effect of SNR on bit error probability
The performance of the system in Figure 1 is measured in terms of the probability of bit error, Pe, which depends on the SNR of the transmitted signals. The SNR is defined by equation (15), and the probability of bit error is given by equation (17), where z is the SNR, p is the probability of transmitting a bit 1, and Q(x) is the Q-function. In order to graphically show the dependence of Pe on the SNR, Monte-Carlo simulations were performed to obtain the performance curves shown in Figure 18. Figure 18(a) shows the performance curves for

Effect of SNR in dB on the probability of bit error, Pe (s: simulated; a: analytical); total number of iterations for Monte-Carlo simulations = 10^6. (a)
For each simulated value of Pe, 10^6 iterations of Monte-Carlo simulations were performed. Each simulated value of Pe is itself a random variable, an estimate of the actual analytical Pe. How close the estimate is to the actual value is measured by its unbiasedness (the mean of the estimate equals the actual value being estimated) and by its variance shrinking as the number of iterations grows. The estimate used in the simulations satisfies these two criteria. To ensure that the standard deviation of the estimate is less than the probability that we are trying to estimate, a good rule of thumb is to have at least N > (1 − Pe)/Pe ≈ 1/Pe iterations, since the standard deviation of the relative-frequency estimate over N iterations is √(Pe(1 − Pe)/N).
The plots in Figure 18 show that the simulation results are in close agreement with the analytical results. The physical meaning of these curves can be discerned by considering the value of Pe at some point on the curve; for example, consider the point corresponding to SNR = 0 dB on the p = 0.5 curve. An SNR value of 0 dB corresponds to a signal power equal to the noise power. For this curve, the corresponding value of Pe is 0.0786; in other words, approximately 7.86% of the bits will be in error for a sufficiently large number of independent bit transmissions.
Figure 18 shows the design tradeoff between transmitted signal power and the performance of the communication system measured in terms of the probability of bit error Pe. As expected, Pe decreases with an increase in SNR. In other words, the BER of the system can be brought down at the expense of putting more power into the transmitted signals. The simulations effectively highlight this design tradeoff which is an important aspect of any communication system.
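The analytical curve for the equiprobable case can be sketched as follows; the form Pe = Q(sqrt(2·SNR)) is an assumption consistent with the value Pe = 0.0786 quoted above for SNR = 0 dB:

```matlab
% Sketch of the analytical BER-versus-SNR curve for p = 1/2, assuming
% Pe = Q(sqrt(2*SNR)); Q(x) is computed via erfc to avoid toolbox
% dependencies.
Q = @(x) 0.5 * erfc(x / sqrt(2));
snr_dB = -5:0.5:10;
z  = 10 .^ (snr_dB / 10);           % SNR on a linear scale
Pe = Q(sqrt(2 * z));
semilogy(snr_dB, Pe); grid on;
xlabel('SNR (dB)'); ylabel('P_e');
Q(sqrt(2))                          % ans = 0.0786 at SNR = 0 dB, as in the text
```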
The reader would also observe that in the low SNR region (SNR
The connection between the relative frequency of the frequentist’s approach and probability can be illustrated using this simulation. The link between the two is provided by the SLLN, which states that if X1, X2,…, Xn,… is a sequence of independent and identically distributed (IID) random variables with finite mean μ and finite variance, and Mn is the sample mean Mn = (X1 + X2 + ⋯ + Xn)/n, then Mn converges to μ almost surely as n → ∞; that is, the event {Mn → μ} has probability 1.
An intuitive understanding of this statement can be gained within the context of this simulation. Let X1, X2,…, Xn,… be a sequence of IID Bernoulli random variables, where Xi = 1 if the ith transmitted bit is decoded in error and Xi = 0 otherwise; the mean of Xi is then Pe and its variance is Pe(1 − Pe).
Since the mean and variance are finite, we can apply the SLLN to the sequence of IID random variables X1, X2,…, Xn,… The sample mean Mn in this context is given by Mn = (X1 + X2 + ⋯ + Xn)/n = Ne/n, where Ne is the number of bit errors in the first n iterations.
In other words, Mn in this context is the relative frequency or proportion of bit errors in the total number of IID iterations in a simulation run. The nth iteration of a particular simulation run gives a fixed value of the random variable Mn. Stated another way, each simulation run gives a sequence of real numbers, the values that the sequence of random variables M1, M2,…, Mn,… takes. Notice that μ in this context is the actual probability of bit error Pe, which we are trying to estimate using simulation. The SLLN provides a statement of the convergence of the relative frequency of bit errors to the actual probability of bit errors. What this convergence means is that if we run the simulation experiment multiple times for a sufficiently large number of iterations and then consider the set of those simulation runs in which the sequence of real numbers Mn converges to Pe, then this set is a certain event. In other words, the probability of obtaining an outcome from this event is 1. Equivalently, the probability of a simulation run in which the sequence Mn does not converge to Pe is zero. This type of convergence, when the SLLN holds in physical systems, is known as statistical regularity, and it provides strong validation for using the relative frequency as an estimate of the actual probability in physical engineering systems.
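A sketch of the convergence experiment behind Figures 19 and 20 follows, plotting the running relative frequency Mn against n for several independent runs; the error probability is set to the 0 dB value quoted earlier:

```matlab
% Sketch of the SLLN illustration in Figures 19 and 20: the running
% relative frequency of bit errors M_n converging to Pe.
Pe = 0.0786;                        % analytical Pe at SNR = 0 dB, p = 1/2
nRuns = 5; nIter = 1e5;
for r = 1:nRuns
    X  = rand(1, nIter) < Pe;       % Bernoulli error indicators X_i
    Mn = cumsum(X) ./ (1:nIter);    % running sample mean M_n = Ne/n
    semilogx(1:nIter, Mn); hold on;
end
plot([1 nIter], [Pe Pe], 'k--');    % the almost-sure limit mu = Pe
xlabel('n (iterations)'); ylabel('M_n'); hold off;
```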
Figures 19 and 20 show multiple outputs of the simulation runs illustrating the convergence of the relative frequency of bit errors Mn to Pe as a function of n, the number of iterations in a simulation run. This provides a visual interpretation of the SLLN. The simulation and associated plots in Figures 19 and 20 can also be used to clear up the confusion surrounding the distinction between the weak law of large numbers (WLLN) and the SLLN. The WLLN is a statement of convergence in probability, as opposed to the SLLN, which is a statement of almost sure convergence. The WLLN is stated as follows: if X1, X2,…, Xn,… is a sequence of IID random variables with finite mean μ, and Mn is the sample mean Mn = (X1 + X2 + ⋯ + Xn)/n, then, for every ϵ > 0, P(|Mn − μ| > ϵ) → 0 as n → ∞.

Illustration of the strong law of large numbers showing the convergence of simulated relative frequency to the analytical probability of bit error, p = 0.5, SNR = 0 dB,

Illustration of the strong law of large numbers showing the convergence of simulated relative frequency to analytical probability of bit error, 1000 runs, p = 0.5, SNR = 0 dB,
Comparing the definitions of the SLLN and the WLLN, it is hard to get an intuitive understanding of the difference between them. However, the simulation plots in Figures 19 and 20 can clarify this difference. Under the WLLN, it is quite possible, with a non-zero probability that is very small and decreasing with large n, that at least one of the simulation runs will go outside the ϵ distance from μ. The SLLN, by contrast, says that if we wait long enough, then all infinite sequences that occur with non-zero probability will eventually get inside the ϵ distance from μ and will stay within this distance, for any arbitrary positive value of ϵ. Figure 20 shows that all thousand sequences eventually get within ϵ distance of μ and then stay within that band. Using this example, we emphasize again the role of simulations in clarifying abstract probabilistic concepts and correcting erroneous intuition.
Efficacy studies
In order to measure the efficacy of the proposed intuitive, application-based, simulation-driven approach to teaching probability, henceforth referred to as the intervention, a randomized experimental design was utilized. Since this particular course is taught at Purdue University Northwest, a smaller regional campus of the Purdue University system, where the average class size for this course is around 10 students, a perfectly randomized experimental design was not possible. Nevertheless, the experimental design was constructed to be as close as possible to a completely randomized design within the constraints of the course offering at a campus with small class sizes. Considering the shortcomings of quantitative statistical tests with small sample sizes, we also provide qualitative data from a student survey, utilizing a 5-point Likert-type scale and essay-based student responses, to support the evidence for the efficacy of the proposed pedagogical approach. In addition, the quantitative statistical tests that we employ are ones recommended for small sample sizes.
As shown below, the data collected from this study provides sufficient evidence, even if the findings are tempered due to the small sample size, for the efficacy of the intervention.
Experimental design
The experimental design consisted of two “randomly” assigned groups. Group 1, the treatment group, consisted of 13 junior-year ECE students who took Probabilistic Methods in Electrical and Computer Engineering (ECE 302) in the spring 2016 semester. Group 2, the control group, consisted of 10 junior-year ECE students who took ECE 302 in the spring 2015 semester. Due to small class sizes and the lack of multiple sections, it was not possible to randomly assign subjects to two different groups during the same semester. Nevertheless, the effect of extraneous factors and bias was minimized by keeping the design of the two courses very similar except for the intervention. For instance, the course was taught by the same instructor to both groups using the same textbook, lectures, grading weights, course policy, and difficulty level of homework, quizzes, and exams. The baseline competency of the two groups was comparable based on their performance in previous courses and the university selection process. The instructor also ensured that homework, quizzes, and exams were of comparable difficulty throughout the semester for the two groups.
Results
The independent variable was the proposed intuitive, application-based, simulation-driven approach to teaching probability, whereas the dependent variable was the total course score (out of 100%) of each student. The research question being investigated was whether the proposed approach has a significant positive impact on students’ scores. In order to choose an appropriate statistical test to investigate this question, the data from each group were first checked for normality. Due to the small sample size, the Shapiro–Wilk 25 and Anderson–Darling 26 tests for normality were used, with the null hypothesis that the sample is normal with unspecified mean and variance. The Shapiro–Wilk test failed to reject the null hypothesis for the treatment group at the 5% significance level with a p-value p = 0.1883 and test statistic (non-normalized) W = 0.9159. The Anderson–Darling test also failed to reject the null hypothesis for the treatment group at the 5% significance level with a p-value p = 0.1530, test statistic adstat = 0.5229, and critical value cv = 0.7024.
For the control group, the Shapiro–Wilk test failed to reject the null hypothesis at 5% significance level with a p-value p = 0.3439 and test statistic (non-normalized) W = 0.9247. Likewise, the Anderson–Darling test failed to reject the null hypothesis for the control group at 5% significance level with a p-value p = 0.4412, test statistic adstat = 0.3391, and critical value cv = 0.6857.
Based on the results of the normality tests, a two-sample t-test 27 was employed to test the null hypothesis that the student scores in the treatment and control groups came from independent random samples from normal distributions with equal means and unknown and unequal variances, and the alternative hypothesis that the data in treatment and control groups came from populations with unequal means.
The test rejected the null hypothesis at 5% significance level with p-value p = 0.0018, 95% confidence interval for the difference in population mean of the treatment group and the control group
Even though the normality tests failed to reject the normality null hypothesis, due to the small sample size and the fact that the median may also reveal valuable information about the populations, the Wilcoxon rank-sum test (equivalent to Mann–Whitney U test) 28 was also employed with the null hypothesis that student scores in treatment and control groups are independent samples from continuous distributions with equal medians. The Wilcoxon rank-sum test rejected the null hypothesis at a significance level of 5% with p-value p = 0.0026, rank-sum test statistic ranksum = 205, and z-statistic zval = 3.0078. To test the hypothesis of an increase in the population median of the treatment group, we used a left-sided Wilcoxon rank-sum test with the null hypothesis that student scores in treatment and control groups are independent samples from continuous distributions with the median of treatment group population less than the median of the control group population. The left-sided Wilcoxon rank-sum test rejected the null hypothesis at a significance level of 5% with p-value = 0.0013, rank-sum test statistic ranksum = 205, and z-statistic zval = 3.0078. The results of these non-parametric tests provide further evidence of the efficacy of the proposed intervention.
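For reference, tests of this kind can be reproduced in MATLAB® with the Statistics and Machine Learning Toolbox; the score vectors below are hypothetical placeholders, not the actual student data, and the Shapiro–Wilk test is not built into MATLAB (implementations are available on the File Exchange):

```matlab
% Sketch of the hypothesis tests reported above. The score vectors are
% hypothetical placeholders, not the actual student data.
treatment = [82 88 91 75 84 90 79 86 93 81 87 85 89];   % 13 scores (hypothetical)
control   = [70 76 68 80 72 74 66 78 71 73];            % 10 scores (hypothetical)
[hAD, pAD] = adtest(treatment);                          % Anderson-Darling normality test
[hT, pT, ciT] = ttest2(treatment, control, 'Vartype', 'unequal');  % Welch two-sample t-test
[pW, hW, statsW] = ranksum(treatment, control);          % Wilcoxon rank-sum test
[pW1, hW1] = ranksum(treatment, control, 'tail', 'right');  % one-sided variant
```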
Student survey data
In addition to the quantitative data presented in the “Results” section in support of the efficacy of the proposed pedagogical technique, qualitative data from a survey utilizing a 5-point Likert-type scale and essay-based student responses were also collected. The respondents for this survey were the same students from the treatment group mentioned in the “Experimental design” section. The survey was conducted online at the end of the semester, with the students’ responses kept anonymous. Nine students provided their feedback through this survey. Table 3 lists the questions in the survey, and Figure 21 shows the results of the survey in terms of the frequency distribution of responses to each question. The survey explicitly mentioned that the term “MATLAB® simulations” in the context of the survey was used to describe the intuitive, application-based, simulation-driven pedagogical technique employed in teaching probability during the course of that semester. The results of this survey provide further qualitative evidence for the efficacy of the proposed pedagogical technique. In particular, 88.89% of the students either agreed or strongly agreed that the proposed pedagogical technique improved their understanding and comprehension of abstract probabilistic concepts and helped them in developing probabilistic intuition. About 77.78% of the students either agreed or strongly agreed that the pedagogical technique helped them in applying abstract probabilistic knowledge in practical contexts. The same percentage of students either agreed or strongly agreed that without the pedagogical technique, they would not have been able to grasp an in-depth understanding of the probabilistic concepts in ECE. On the other hand, the survey reveals that the proposed pedagogical technique had relatively less impact on motivating students to study probability. One of the remarkable results from this survey is that 88.89% of students either agreed or strongly agreed that they would like to have either a MATLAB®-based lab for this course or a significant portion of the course dedicated to simulation projects, given that a lab is traditionally not offered along with a probability course.

Frequency distribution of survey questions. Total number of survey respondents = 9.
Survey questions.
Some of the students’ comments from this survey are also presented in Table 4. They again highlight the importance of the proposed pedagogical technique in helping students develop probabilistic intuition and a deeper understanding of this esoteric subject.
Students’ comments.
Conclusion and future work
Probability and random processes, an important prerequisite for areas of electrical engineering such as communications, controls, signal processing, and networking, is generally considered a conceptually difficult and unintuitive subject by ECE undergraduates. 1 There are numerous reasons for the difficulty students encounter while learning this subject. First, due to our physical scale, how we have evolved, and how we observe the physical world with the naked eye, our brains are not wired to intuitively understand probabilistic phenomena.3,4 For a long time, this subject has been taught in a very abstract manner, with more emphasis on theory than on applications. Modern simulation and visualization tools such as MATLAB® are not often utilized to provide an alternative way of grasping this abstract subject. This paper presents an intuitive, application-based, simulation-driven pedagogical approach to teaching probability and random processes to ECE undergraduates. Highlights of the proposed pedagogical approach include introducing abstract probabilistic concepts using real-world applications of probability in ECE; extensive, tightly integrated, interactive MATLAB® simulations to build probabilistic intuition; use of the relative frequency approach in simulations to complement the axiomatic approach; and modeling and analysis of optimal engineering design problems under uncertainty, highlighting the design tradeoffs and their impact on system performance. The proposed pedagogical approach was implemented during the course of a semester, and data were collected to ascertain its efficacy by comparing the performance of the treatment group with a control group. The statistical tests employed provide sufficient evidence of the effectiveness of this approach. In addition, the student survey data provide further validation of the effectiveness of the proposed pedagogical technique. In the future, we intend to extend the proposed pedagogical technique to other courses in the ECE curriculum and to measure its efficacy with a larger sample size.
Footnotes
Author's Note
Waseem Sheikh is now affiliated with the CSET Department, Oregon Institute of Technology, 3201 Campus Dr, Klamath Falls, OR 97601, USA. Email: waseem.sheikh@oit.edu
Declaration of Conflicting Interests
The author declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Funding
The author received no financial support for the research, authorship, and/or publication of this article.
