In this article we will talk about a mathematical law that, although discovered more than a hundred years ago, has only recently found practical application. Benford’s law, or the law of the first digit, states that in many real-world data sets the first digits of the numbers are not evenly distributed: a number is more likely to begin with a small digit than with a large one. In other words, there is a model that describes this probability.
Benford’s law was not actually discovered by Benford, but by the American astronomer Simon Newcomb. Around 1881, he noticed that in books of logarithmic tables the pages on which numbers began with 1 were much more worn than the pages on which numbers began with 2, and so on up to 9, whose pages were as clean as if they had never been opened.
Newcomb reasoned that the worn pages were the ones scientists used most often in their research. He concluded that the numbers those scientists had worked with followed the same uneven distribution of first digits. But the law was named after Frank Benford, who noticed the same pattern later, in 1938. Although the law was discovered twice, neither Newcomb nor Benford proved it. A proof came almost 60 years after Benford’s discovery; its author is Ted Hill, a mathematician at the Georgia Institute of Technology.
Let’s look at the essence of the law and describe it with a formula. Benford’s law gives the probability with which a particular digit will be the first digit of a number in a data set. The law has a logarithmic form, with a base equal to the base of the numeral system, that is, the number of possible digits. For example, in the decimal system, the probability that the first digit of a number will be d is described as follows:
P(d) = log10(1 + 1/d), where d = 1, 2, ..., 9
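To get a feel for the numbers, the probabilities can be tabulated in a few lines of Python. This is a minimal sketch that assumes the standard statement of the law, P(d) = log_b(1 + 1/d) for a numeral system with base b; the function name benford_probability is illustrative and does not come from any library.

```python
import math


def benford_probability(d: int, base: int = 10) -> float:
    """Probability that the leading digit equals d in the given base.

    Assumes the standard form of the law: P(d) = log_base(1 + 1/d).
    """
    if not 1 <= d < base:
        raise ValueError("d must be between 1 and base - 1")
    return math.log(1 + 1 / d, base)


# Print the expected share of each leading digit in the decimal system.
for d in range(1, 10):
    print(f"{d}: {benford_probability(d):.3f}")
```

The nine probabilities sum to 1, and the value for d = 1 is about 0.301, which matches the observation that roughly every third number starts with a one.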
Now let’s find out which distributions obey the law. Here are some of them:
As for mathematically significant objects that satisfy the law:
These mathematical objects suggest that an array that obeys Benford’s law must grow exponentially. In such an exponential distribution, small values are always more numerous than large ones. Take the areas of bodies of water as an example: there are more lakes than rivers, more rivers than seas, and more seas than oceans.
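The link between exponential growth and the law can be checked numerically. The sketch below uses the powers of two as an assumed example of an exponentially growing sequence (it is chosen here for illustration) and compares the observed shares of leading digits with the logarithmic formula. The helper names leading_digit and leading_digit_shares are hypothetical.

```python
import math
from collections import Counter


def leading_digit(n: int) -> int:
    """Return the first decimal digit of a positive integer."""
    return int(str(n)[0])


def leading_digit_shares(values):
    """Return the share of values starting with each digit 1..9."""
    counts = Counter(leading_digit(v) for v in values)
    total = sum(counts.values())
    return {d: counts.get(d, 0) / total for d in range(1, 10)}


# Assumed example: the first 1000 powers of two.
powers_of_two = [2 ** k for k in range(1, 1001)]
shares = leading_digit_shares(powers_of_two)

for d in range(1, 10):
    expected = math.log10(1 + 1 / d)
    print(f"{d}: observed {shares[d]:.3f}, Benford {expected:.3f}")
```

A sequence that grows linearly, such as the integers from 100 to 999, would instead give every leading digit the same share.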
American mathematician Mark Nigrini examined more than 200 thousand tax returns and saw that almost every third number in them starts with a one. He then developed a program that checks arrays of numbers for compliance with Benford’s law. It was tested in 1995 by the New York Tax Police, and the test helped expose several taxpayers who had concealed income.
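A compliance check of this kind can be sketched in Python. The example below is not Nigrini’s program; it is a minimal illustration that assumes a chi-square goodness-of-fit test on first digits, which is one common way to compare observed frequencies with the expected ones. The function names and the synthetic data sets are hypothetical.

```python
import math
from collections import Counter

# Chi-square critical value for 8 degrees of freedom at the 5% level.
CRITICAL_5_PERCENT = 15.507


def first_digit(x: float) -> int:
    """Return the first significant digit of a non-zero number."""
    # Scientific notation puts the first significant digit first,
    # e.g. 0.00042 -> '4.200000000000000e-04'.
    return int(f"{abs(x):.15e}"[0])


def benford_chi_square(values) -> float:
    """Chi-square statistic of observed first digits against Benford's law."""
    digits = [first_digit(v) for v in values if v != 0]
    n = len(digits)
    if n == 0:
        raise ValueError("need at least one non-zero value")
    counts = Counter(digits)
    chi2 = 0.0
    for d in range(1, 10):
        expected = n * math.log10(1 + 1 / d)
        chi2 += (counts.get(d, 0) - expected) ** 2 / expected
    return chi2


# Synthetic data: a geometric sequence versus evenly spread values.
growing = [1.07 ** k for k in range(1, 501)]
uniform = list(range(100, 1000))

for name, data in (("growing", growing), ("uniform", uniform)):
    chi2 = benford_chi_square(data)
    verdict = "consistent" if chi2 < CRITICAL_5_PERCENT else "suspicious"
    print(f"{name}: chi2 = {chi2:.2f} -> {verdict}")
```

A statistic above the critical value only signals that the data deserve a closer look; it is not proof of fraud on its own.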
Some scholars propose using Benford’s law to detect falsification in presidential elections. Fraud can be identified using regression analysis. You can also build a neural network, that is, a trained model that flags various kinds of anomalies. For any number of such fraud scenarios, you can use different versions of the regression model, as shown below:
In this case, we can obtain a higher coefficient of determination for the model without distorting the data.
The test and the analysis based on this law, which help detect fraud, can be carried out in the CaseWare IDEA data analysis software. All you need to do is click the Benford Test button and select the data to analyze. For in-depth analysis and for studying individual pieces of data (building a model, calculating significant coefficients, and more), use Python, a data analysis language that is built into IDEA.