Mastering Normally Distributed Random Numbers With NumPy [UPDATED 2024]

Hello and welcome to this tutorial on mastering normally distributed random numbers using Python and NumPy. Here we will explore how to use NumPy to generate random numbers that follow this probability distribution.

Probability distributions are mathematical functions that provide the likelihood of various potential results in a particular experiment.

These distributions form the backbone of statistics and data science, allowing us to draw meaningful insights from data. Among them, the normal distribution, also known as the Gaussian distribution, stands out for its ubiquity in the natural and social sciences.

The normal distribution is a continuous probability distribution characterized by a symmetrical, “bell-shaped” curve.

Mean, Median & Mode

The mean, median, and mode (of a normally distributed, random variable) are equal and located at the peak of the distribution’s curve.

For those unfamiliar with these three terms, here are definitions of the mean, median and mode:

Definition of the Mean

The mean (also known as the average) is calculated by adding all the numbers in the dataset and then dividing by the count of numbers. For a normally distributed random variable, it is the expected value or average outcome.

The plot above shows a normal distribution of a dataset (represented by the histogram and the black curve), along with its mean (represented by the red dashed line).

The mean (μ) is a measure of the central tendency of a dataset. In a normal distribution, the mean sits at the peak of the curve, so values near the mean are the most likely to occur.

Remember, for a normally distributed random variable, the mean is the expected value or average outcome.
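As a minimal sketch of the definition above, the mean can be computed by hand and compared with NumPy's built-in `np.mean`. The dataset, its parameters (μ = 50, σ = 5) and the seed are hypothetical values chosen only for illustration.

```python
import numpy as np

# Seeded generator so the illustration is reproducible (the seed is arbitrary).
rng = np.random.default_rng(seed=42)

# Hypothetical dataset: 10,000 draws from a normal distribution (mu=50, sigma=5).
data = rng.normal(loc=50.0, scale=5.0, size=10_000)

# Mean by definition: sum of all values divided by how many values there are.
manual_mean = data.sum() / data.size

# NumPy's built-in equivalent.
numpy_mean = np.mean(data)

print(f"manual mean: {manual_mean:.3f}")
print(f"np.mean:     {numpy_mean:.3f}")
```

Both numbers should agree and land close to the chosen μ of 50, since the sample mean estimates the expected value.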

Definition of the Median

The median is the middle number found in a sorted list of numbers.

If the list has an odd number of observations, the median is the middle number.
If the list has an even number of observations, the median is calculated as the average of the two middle numbers.

In a normal distribution, the median is the point at which half the observations are above and half are below.

Normal Distribution - mean and median examples

The plot above now includes the median of the dataset (represented by the blue dashed line), in addition to the mean (represented by the red dashed line).

The median is another measure of central tendency, which is the value separating the higher half from the lower half of a data sample. For a normally distributed dataset, the mean and the median are equal and both represent the peak of the distribution.

As you can see, for this dataset, the mean and median lines overlap because the distribution is symmetric, which is a characteristic of a normal distribution.
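The odd and even cases of the median, and the overlap of mean and median in a symmetric sample, can be checked with a short sketch. The two small lists and the normal sample (μ = 10, σ = 2, arbitrary seed) are made-up examples, not data from the plots.

```python
import numpy as np

odd_list = [7, 1, 5, 3, 9]        # sorted: 1, 3, 5, 7, 9 -> middle value is 5
even_list = [7, 1, 5, 3, 9, 11]   # sorted: 1, 3, 5, 7, 9, 11 -> average of 5 and 7

odd_median = np.median(odd_list)
even_median = np.median(even_list)
print(odd_median, even_median)    # 5.0 6.0

# For a large, normally distributed sample the mean and median nearly coincide.
rng = np.random.default_rng(seed=0)  # arbitrary seed
sample = rng.normal(loc=10.0, scale=2.0, size=100_000)
gap = abs(np.mean(sample) - np.median(sample))
print(f"|mean - median| = {gap:.4f}")
```

`np.median` sorts the data internally, so the input lists do not need to be sorted first.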

Definition of the Mode

The mode is the number that occurs most frequently in a dataset. In a normally distributed random variable, it is the most likely outcome or the peak of the distribution’s curve.

In a perfectly normal distribution, the mode is the same as the mean and median.

With continuous data, however, it is very unlikely that exactly the same value appears more than once, so the mode of the raw sample is not really a meaningful statistic.

In other words, every value could be unique and therefore occur only once, so they would all be the mode, which isn’t helpful. That being said, in the context of probability distributions like the normal distribution, the mode is the value at which the distribution reaches its peak. For a normal distribution, this is the same as the mean and median.

Normal Distribution - mean, median and mode examples

The plot above shows a normal distribution of a dataset (represented by the histogram and the black curve), along with its mean (represented by the red dashed line), median (represented by the blue dashed line), and mode (represented by the orange dashed line).

In this case, the mode is taken to be the value at the peak of the histogram, i.e., the bin with the highest count. Note that this is not necessarily the same as the peak of the smooth curve representing the normal distribution. This discrepancy is due to the fact that the histogram is a discrete approximation of the underlying distribution, and its peak can vary depending on the chosen bin size.

As you can see, for this specific realization of the dataset, the mean, median, and mode are close to each other, but not exactly the same. This is a common situation in real-world datasets, even those that are approximately normally distributed.
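The sketch below illustrates both notions of the mode discussed above: the most frequent value in discrete data, and a histogram-based estimate for continuous data. The helper `histogram_mode` is a hypothetical function written for this example (it is not part of NumPy), and the data and seed are arbitrary.

```python
import numpy as np

# Discrete data: the mode is simply the most frequent value.
rolls = np.array([2, 3, 3, 5, 3, 6, 2])
values, counts = np.unique(rolls, return_counts=True)
discrete_mode = values[np.argmax(counts)]
print("mode of rolls:", discrete_mode)  # 3

# Continuous data: estimate the mode as the center of the fullest histogram bin.
rng = np.random.default_rng(seed=1)  # arbitrary seed
sample = rng.normal(loc=0.0, scale=1.0, size=10_000)

def histogram_mode(data, bins):
    """Hypothetical helper: center of the histogram bin with the highest count."""
    bin_counts, edges = np.histogram(data, bins=bins)
    peak = np.argmax(bin_counts)
    return 0.5 * (edges[peak] + edges[peak + 1])

# The estimate shifts with the number of bins, as described above.
for bins in (10, 30, 100):
    print(f"{bins:>3} bins -> estimated mode {histogram_mode(sample, bins):+.3f}")
```

The estimates should all sit near 0 (the true peak), but they will generally differ from one bin count to the next.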

Exponential Distribution

To see the difference between the mean, median and mode on a graph, we need to use a skewed distribution where the mean, median, and mode are not the same. A common example of such a distribution is the exponential distribution.

In this case, the mode will be the peak of the distribution, the median will still divide the data into two equal halves, and the mean will be the average of all values. In a skewed distribution, the mean is typically pulled in the direction of the skew, while the median tends to resist the effects of skew and outliers.

Exponential Distribution - mean, median and mode

The plot above shows an exponential distribution of a dataset (represented by the histogram), along with its mean (represented by the red dashed line), median (represented by the blue dashed line), and mode (represented by the orange dashed line).

In this skewed distribution:

The mean (red line) is greater than the median and the mode, which is expected in a distribution with a long tail on the right.
The median (blue line) is the middle value that separates the higher half from the lower half of the data. In a right-skewed distribution such as this one, it lies between the mode and the mean.
The mode (orange line) is the value that appears most frequently in the dataset. For an exponential distribution, the mode is at the peak of the distribution, which is at the far left of the curve (near the minimum value of the dataset).

This demonstrates that the mean, median, and mode are not necessarily equal for non-symmetric (e.g., skewed) distributions, and each measures a different aspect of the central tendency of the data.
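The ordering of the three measures in a right-skewed sample can be reproduced with a small sketch. The scale parameter (2.0), sample size, bin count and seed are assumptions chosen for illustration; the mode is estimated from a histogram, so it is only approximate.

```python
import numpy as np

rng = np.random.default_rng(seed=7)  # arbitrary seed
scale = 2.0  # hypothetical scale; for an exponential distribution this is the mean
sample = rng.exponential(scale=scale, size=100_000)

sample_mean = np.mean(sample)
sample_median = np.median(sample)

# Histogram-based mode estimate: center of the bin with the highest count.
counts, edges = np.histogram(sample, bins=50)
peak = np.argmax(counts)
sample_mode = 0.5 * (edges[peak] + edges[peak + 1])

print(f"mode   ~ {sample_mode:.3f}")
print(f"median ~ {sample_median:.3f} (theory: {scale * np.log(2):.3f})")
print(f"mean   ~ {sample_mean:.3f} (theory: {scale:.3f})")
```

For an exponential distribution the theoretical median is the scale times ln 2, which is why the median falls below the mean while the mode stays at the left edge.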

Introduction to Normal Distribution

Exponential distribution aside, in a normal distribution these three measures coincide at the peak of the distribution’s curve, which reflects the fact that the curve is symmetric about the mean.

Why Normal Distribution is important

The normal distribution, often referred to as the Gaussian distribution, is a type of continuous probability distribution that is fundamental to statistics and data science. It is one of the most important and widely used distributions due to its descriptive ability for a large number of natural, biological, and social behaviors.

This distribution is particularly valuable because it models a multitude of natural phenomena and statistical processes with a high degree of accuracy.

For instance, it describes variables such as height, IQ scores, measurement errors, and light intensity.

Understanding the normal distribution and its properties is key to many areas of study, including but not limited to, physics, engineering, computer science, economics, biology, psychology, and any field that uses statistical techniques.

With the right Python code, you can conduct powerful statistical analyses.


Characteristics of Normal Distribution

The normal distribution is characterized by a bell-shaped curve, also known as the Gaussian function. It is symmetric about the mean, showing that data near the mean are more frequent in occurrence than data far from the mean.

The shape of the normal distribution is determined by two parameters: the mean (μ) and the standard deviation (σ).

The mean (μ) determines the center of the distribution,
while the standard deviation (σ) determines the height and width of the distribution.

Normal Distribution with Mean and Standard Deviations

The plot above is a visual representation of a normal distribution.

The blue curve represents the normal distribution (Gaussian function).
The red dashed line shows the mean (μ) of the distribution, which is the center of the curve.
The green dashed lines represent one standard deviation (σ) away from the mean on either side.

The mean, or average, is a measure of central tendency, meaning it represents a typical value for the data set. In a normal distribution, the mean, median, and mode are all equal and located at the center of the distribution.
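To make the roles of μ and σ concrete, here is a minimal sketch of the bell curve written directly from the standard formula for the normal density, f(x) = exp(-(x - μ)² / (2σ²)) / (σ√(2π)). The function name `normal_pdf` and the example values μ = 3, σ = 2 are assumptions for illustration only.

```python
import numpy as np

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Density of a normal distribution with mean mu and standard deviation sigma."""
    coefficient = 1.0 / (sigma * np.sqrt(2.0 * np.pi))
    return coefficient * np.exp(-0.5 * ((x - mu) / sigma) ** 2)

mu, sigma = 3.0, 2.0  # hypothetical parameters
x = np.linspace(mu - 4 * sigma, mu + 4 * sigma, 801)
density = normal_pdf(x, mu, sigma)

# The curve peaks at the mean ...
peak_location = x[np.argmax(density)]
print(f"peak at x = {peak_location:.2f}")

# ... and is symmetric about it: equal distances left and right give equal density.
is_symmetric = np.isclose(normal_pdf(mu - 1.5, mu, sigma),
                          normal_pdf(mu + 1.5, mu, sigma))
print("symmetric about the mean:", bool(is_symmetric))
```

Passing `x` and `density` to a plotting call such as Matplotlib's `plt.plot(x, density)` would draw the familiar bell shape.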

Standard Normal Distribution

A normal distribution with a mean of 0 and a standard deviation of 1 is called a standard normal distribution.

If samples are normally distributed, it is likely that a random sample will have a value near the mean.

In fact, approximately 68% of all data samples fall within one standard deviation of the mean (i.e., between -1 and +1 for the standard normal distribution). Following this, about 95% fall within two standard deviations, and about 99.7% fall within three standard deviations. This rule is known as the empirical rule or the 68-95-99.7 rule.

Standard Deviation empirical rule or the 68-95-99.7 rule

The plot above shows a standard normal distribution (mean = 0, standard deviation = 1) represented by the histogram and the black curve. The mean of the distribution is marked by the red dashed line, and the standard deviations are indicated by the blue, purple, and orange dashed lines.

The blue lines represent one standard deviation away from the mean (-1 and +1). About 68% of the data falls within this range, according to the empirical rule.
The purple lines represent two standard deviations away from the mean (-2 and +2). About 95% of the data falls within this range.
The orange lines represent three standard deviations away from the mean (-3 and +3). About 99.7% of the data falls within this range.

This visualization illustrates the empirical rule (also known as the 68-95-99.7 rule) for a standard normal distribution.
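The empirical rule can be checked numerically with a quick simulation. This is only a sketch: the sample size and seed are arbitrary, and the fractions are estimates that vary slightly from run to run when the seed changes.

```python
import numpy as np

rng = np.random.default_rng(seed=123)  # arbitrary seed
sample = rng.standard_normal(size=1_000_000)  # mean 0, standard deviation 1

fractions = {}
for k in (1, 2, 3):
    # Fraction of draws that lie within k standard deviations of the mean.
    fractions[k] = np.mean(np.abs(sample) <= k)
    print(f"within {k} standard deviation(s): {fractions[k]:.2%}")
```

The printed percentages should come out close to 68%, 95% and 99.7% respectively.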

Parameters of Normal Distribution

A key feature of the normal distribution is that it is entirely defined by its mean and standard deviation, meaning that if you know these two parameters, you can fully describe it.

Two normal distributions, both centered around the same mean (μ = 0), but with different standard deviations (σ)

The plot above shows two normal distributions, both centered around the same mean (μ = 0), indicated by the red dashed line.

They have, however, different standard deviations (σ).

For example:

The blue curve represents a normal distribution with a standard deviation of σ=0.5 (Normal Distribution 1). The points that are one standard deviation away from the mean are marked by blue dashed lines.
The purple curve represents a normal distribution with a standard deviation of σ=1 (Normal Distribution 2). The points that are one standard deviation away from the mean are marked by purple dashed lines.

As you can see, the standard deviation (σ) influences the spread of the distribution.

Remember that in a normal distribution, the points where the dashed lines (representing one standard deviation away from the mean) intersect with the solid line of the distribution, mark the bounds within which approximately 68% of the data lies.

This is split evenly between the left and right sides of the mean. This results in about 34% of the data existing on each side, up to these intersection points.

Therefore:

A smaller standard deviation results in a distribution that is narrower and taller (blue curve),
while a larger standard deviation leads to a distribution that is wider and shorter (purple curve).

Note that changing the mean (μ), marked by the red dashed line, would shift the entire distribution left or right along the x-axis. However, as both distributions have the same mean (μ = 0), they are centered at the same point in these examples.

This visualization demonstrates that a full understanding of a normal distribution’s shape and location can be obtained with knowledge of just two parameters: the mean and the standard deviation.
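The comparison between σ = 0.5 and σ = 1 can be sketched numerically as well. The peak height uses the standard result that a normal density reaches 1 / (σ√(2π)) at its mean; the sample size and seed are arbitrary choices for illustration.

```python
import numpy as np

rng = np.random.default_rng(seed=5)  # arbitrary seed
mu = 0.0

results = {}
for sigma in (0.5, 1.0):
    sample = rng.normal(loc=mu, scale=sigma, size=100_000)
    results[sigma] = {
        # Height of the density at its peak: smaller sigma -> taller curve.
        "peak_height": 1.0 / (sigma * np.sqrt(2.0 * np.pi)),
        # Spread actually observed in the simulated sample.
        "sample_std": sample.std(),
        # Share of the sample within one standard deviation of the mean.
        "within_one_sigma": np.mean(np.abs(sample - mu) <= sigma),
    }

for sigma, stats in results.items():
    print(f"sigma={sigma}: peak height {stats['peak_height']:.3f}, "
          f"sample std {stats['sample_std']:.3f}, "
          f"within 1 sigma {stats['within_one_sigma']:.1%}")
```

Halving σ doubles the peak height while the share of data within one standard deviation stays at roughly 68% for both curves.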

The Central Limit Theorem and Normal Distribution

Additionally, due to the Central Limit Theorem, the normal distribution also holds a special place in statistics. The theorem states that, under certain conditions (for example, independent variables with finite variance), the sum or average of a large number of random variables, each of which may follow an arbitrary distribution, will itself be approximately normally distributed, regardless of the shape of the original distributions.

This plot demonstrates the Central Limit Theorem. The histogram represents the distribution of sample means from a population with a uniform distribution. Despite the original distribution not being normal, the distribution of the sample means approximates a normal distribution, as predicted by the Central Limit Theorem.

The black curve represents the expected normal distribution. As you can see, the distribution of the sample means (shown by the histogram) closely follows this curve, confirming the theorem’s prediction.

This demonstrates that even when we draw samples from a non-normal distribution, the distribution of the sample means tends toward a normal distribution as the size of each sample increases, which is a key point of the Central Limit Theorem.
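The experiment described above can be approximated with the following sketch, which draws many samples from a uniform distribution on [0, 1] and looks at the distribution of their means. The number of samples (10,000), the sample size (30) and the seed are assumptions, not necessarily the values behind the plot.

```python
import numpy as np

rng = np.random.default_rng(seed=2024)  # arbitrary seed
n_samples = 10_000   # how many sample means to collect (assumed)
sample_size = 30     # observations averaged in each sample (assumed)

# Each row is one sample drawn from a uniform (non-normal) population.
draws = rng.uniform(low=0.0, high=1.0, size=(n_samples, sample_size))
sample_means = draws.mean(axis=1)

# The theorem predicts a normal shape with these parameters:
expected_mean = 0.5                                      # mean of uniform(0, 1)
expected_std = np.sqrt(1.0 / 12.0) / np.sqrt(sample_size)  # population std / sqrt(n)

# If the means are roughly normal, about 68% lie within one predicted std.
within_one = np.mean(np.abs(sample_means - expected_mean) <= expected_std)

print(f"mean of sample means: {sample_means.mean():.4f} (expected {expected_mean})")
print(f"std of sample means:  {sample_means.std():.4f} (expected {expected_std:.4f})")
print(f"within one std:       {within_one:.1%}")
```

A histogram of `sample_means` would show the bell shape described above, even though each individual draw comes from a flat distribution.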

Getting Started with NumPy and Matplotlib

Numerical Python, better known as NumPy, is a powerful library for numerical computing in Python. Among its many features, NumPy provides several functions that allow us to work with probabilities through its random number generator (RNG).

Notably, it offers a simple and efficient way to generate random numbers that follow specific probability distributions, including the normal distribution.
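As a first taste, here is a minimal sketch using NumPy's `Generator` interface, created with `np.random.default_rng`. The seed and the example parameters (a mean of 170 and a standard deviation of 10) are hypothetical values picked for illustration.

```python
import numpy as np

# Create a random number generator; the seed is optional and only
# makes the output reproducible (the value 42 is arbitrary).
rng = np.random.default_rng(seed=42)

# Five draws from the standard normal distribution (mean 0, std 1).
standard = rng.standard_normal(size=5)

# Five draws from a normal distribution with a chosen mean (loc)
# and standard deviation (scale).
heights = rng.normal(loc=170.0, scale=10.0, size=5)

print(standard)
print(heights)
```

In `rng.normal`, `loc` is the mean, `scale` is the standard deviation and `size` controls how many values are returned.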

In the rest of this tutorial, we will explore how to utilize NumPy’s functionality to generate normally distributed random numbers. We will delve into the concept of normal distribution, understand the significance of its parameters—mean and standard deviation—and how they affect the shape and location of the distribution. Subsequently, we’ll demonstrate how to use NumPy to generate random numbers under this distribution, and how these numbers can be applied in data analysis and statistical modeling.

This exploration will not only provide you with a solid understanding of the normal distribution and its application in Python using NumPy, but will also furnish you with a robust toolset for performing sophisticated statistical analysis and data modeling tasks.

In order to delve into the fascinating world of normal distribution and Python’s powerful NumPy library, you need to have NumPy and Matplotlib installed.

To do this, you need to have Python installed, along with an IDE such as PyCharm if you have not set one up already. Alternatively, you can use Google Colab online via your Google account.

Once done you can install NumPy and Matplotlib using the following command:
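Assuming pip is the package manager in use, a typical command is `pip install numpy matplotlib`, run from a terminal (in Google Colab both packages are normally available already). The sketch below is one way to confirm from Python that both packages can be found; the `status` dictionary is just an illustrative name.

```python
import importlib.util

# Look up each package without importing it; find_spec returns None
# when the package is not installed in the current environment.
status = {}
for package in ("numpy", "matplotlib"):
    status[package] = importlib.util.find_spec(package) is not None
    print(f"{package}: {'installed' if status[package] else 'missing'}")
```

If either package is reported as missing, rerunning the install command in the same environment that runs the script usually resolves it.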
