# Benford's Law Computer assignment!

• October 20th, 2012, 04:23 PM
smith999
Benford's Law Computer assignment!
Hi guys.
So this is a pretty big question.. well for me anyways.. and I've started it but am kind of stuck on Question 4. Any advice or help would be greatly appreciated :)

The assignment is at the bottom of this post, and it's rather long. but it's in regards to for loops and arrays... and Benford's law if you have heard of it.

so far I have:
Code java:

```java
public class BenfordsLaw {
    public static void main(String[] args) {
        calculateLeadingDigit(2094928);
        generateBenfordNumbers(100, 0.1, 4);
    }

    /*public static int calculateLeadingDigit(int number) {
        while (number>9) {
            number = (number/10);
        }
        return number;
    }*/

    public static double[] generateBenfordNumbers(double initialAmount, double growthRate, int numberPeriods) {
        double[] benfordArray = new double[numberPeriods];
        int x; //counter variable
        for(x=0;x<numberPeriods;x++) {
            benfordArray[x] = initialAmount;
            initialAmount *= (1 + growthRate);
        }
        /*for(x=0; x<numberPeriods; x++) {
            System.out.println(benfordArray[x]);
        }*/
        return benfordArray;
    }

    /*public static double[] calculateLeadingDigitProportions(double[]) {
    }

    public static double calculateDistance(double[] array1, double[] array2) {
    }*/
}
```
**Particularly for question 4 it says use two methods.... i am very confused about where to start...
THANKS!!
Benfords Law
The rest of these questions involve something called Benford’s Law and should be put into a file BenfordsLaw.java and a class BenfordsLaw. You will be asked to write a series of methods, which due to their independence, can mostly be written one at a time.
Background Information
Benford’s law is an unusual observation that numbers produced in systems governed by exponential growth start with the digit 1 more often than with any other digit. That is to say, when a system increases by a certain roughly fixed percentage over time, there tend to be ones at the beginning of the numbers.
Consider the following example. Suppose you start with \$100 and you put your money in a bank account which pays you an interest rate that will cause the money to double every 20 years. (This assumption of a fixed amount of time to double is the exponential growth part). After 20 years, your \$100 will have turned into \$200 and after another 20 years, your \$200 will have turned into \$400. The amount of time that you had an amount of money in the “one hundreds” was twenty years, since for the whole first twenty years you had more than \$100 and less than \$200. The amount of time you had your money in the \$200s is less, however, as the second twenty year interval is partly in the \$200s and partly in the \$300s. Interestingly, this phenomenon can be observed throughout all digits and it turns out that in numbers generated in this sort of way, the approximate percentage of leading digits is given in the graph in Figure 1.
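For reference, the percentages this paragraph points to are usually written as P(d) = log10(1 + 1/d) for d = 1, ..., 9. The following is a minimal sketch assuming that standard formula. The class name BenfordReference and the method name benfordProbabilities are made up for illustration and are not part of the assignment; the provided getBenfordProbabilities() may well be implemented differently.

```java
class BenfordReference {
    // Returns a 10-element array: index d holds log10(1 + 1/d) for d = 1..9.
    // Index 0 is left at 0.0, since 0 is never a first non-zero digit.
    static double[] benfordProbabilities() {
        double[] p = new double[10];
        for (int d = 1; d <= 9; d++) {
            p[d] = Math.log10(1.0 + 1.0 / d);
        }
        return p;
    }

    public static void main(String[] args) {
        double[] p = benfordProbabilities();
        for (int d = 1; d <= 9; d++) {
            System.out.printf("digit %d: %.3f%n", d, p[d]);
        }
    }
}
```

The nine values sum to 1, because the logarithms telescope to log10(10).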
As most prices and incomes tend to grow roughly exponentially over time (due to inflation), our prices and incomes follow the same phenomenon. This knowledge has been used to detect tax fraud in cases where people generated numbers randomly without following this pattern. The idea is that if you choose numbers uniformly at random, your numbers will start with each of the 9 digits 1/9 of the time. This technique was used by forensic accountants to clear Bill Clinton of tax fraud on one occasion when he was accused of it, as his returns DID obey Benford’s law.
In this question, you will write several methods that will culminate in contrasting the distance from the Benford’s law distribution of two sets: one generated via an exponential growth model and the other generated via a uniformly random distribution.
Figure 1: The proportion of occurrences of each digit as a first non-zero digit of a number
Question 2: Leading digits (10 points)
Write a method called calculateLeadingDigit which takes as input an int and returns an int representing the first non-zero digit of the number. Your method header should look like
public static int calculateLeadingDigit(int number)
If we call the method calculateLeadingDigit() from a different method, you will provide it with an int and get in return an int:
• calculateLeadingDigit(-41) would return 4
• etc.
(If you use Scanner inside your method you will NOT be able to do this correctly for different numbers. As such you should have NO use of Scanner in this method.)
There are several ways to do this. One way is by recalling that dividing a number by 10 shifts the decimal point over by one place. So you could start with your initial number and continue dividing by 10 until the number is a one digit number (i.e. greater than 0 and less than 10). You will need to keep in mind the rules of integer division and also consider what happens if the number is negative in the first place. There are also ways to do this with String as well. (You should know how to do both and study the solutions when they are posted.)
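The two approaches described above could be sketched roughly as follows. This is only one possible reading of the question, not the official solution. The class name LeadingDigitSketch and the method leadingDigitViaString are hypothetical names chosen for this example; only the calculateLeadingDigit header comes from the assignment.

```java
class LeadingDigitSketch {
    // Division approach: drop the sign, then divide by 10 until one digit is left.
    static int calculateLeadingDigit(int number) {
        // Widen to long first so that Integer.MIN_VALUE can be made positive safely.
        long n = Math.abs((long) number);
        while (n > 9) {
            n /= 10; // integer division discards the last digit
        }
        return (int) n;
    }

    // String approach (hypothetical helper): take the first character of the
    // absolute value written out in decimal.
    static int leadingDigitViaString(int number) {
        String digits = Long.toString(Math.abs((long) number));
        return digits.charAt(0) - '0';
    }

    public static void main(String[] args) {
        System.out.println(calculateLeadingDigit(-41));
        System.out.println(leadingDigitViaString(2094928));
    }
}
```

Both versions return 0 for an input of 0, which fits the later note that index 0 only matters when a value truncates to 0.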
Question 3: Generating a Benford Sequence (20 points)
Write a method generateBenfordNumbers that takes as input two doubles and an int and returns a double[]. The first double initialAmount represents the initial amount of money you start with. The second double growthRate represents the amount of growth per period. The int numberPeriods represents the amount of periods to look at. Your method should return an array of values storing the amount of money after each period of time if initialAmount grows by growthRate proportion in each step.
For example if the amount of money you start with is \$100 and the interest is .1 (i.e. 10 %) and taken over 4 periods, then you should return an array with the contents:
{ 100.0 , 110.0, 121.0, 133.1 }
Note that the first number of the array is the amount you started with. To go from one number to the next, you can multiply the current number by 1 + growthRate
public static double[] generateBenfordNumbers(double initialAmount, double growthRate, int numberPeriods)
Question 4: Calculating the Percentages (20 points)
Write a method calculateLeadingDigitProportions which takes as input a double[] numbers and returns a double[]. Your method should analyze the array numbers and return an array representing for each digit, the proportion of times it occurred as a leading digit. In the produced array, the value at the 0th index should represent the proportion of times that a 0 was a leading digit (only occurs when the number is 0 after truncation to an int), the value at the 1st index would contain the proportion of times that a 1 was a leading digit, and so on, up until the 9th index.
For example, if numbers contains:
{ 100, 200.1, 9.3, 10}
then 1 occurs as a leading digit 50 % of the time, 2 occurs 25% of the time, and 9 occurs 25 % of the time, so your produced array should contain:
{0, .5, .25, 0, 0, 0, 0, 0, 0, .25}
You may use the method calculateLeadingDigit to do this, but note that that method expects as input an int, and so you will need to cast or perform a conversion in some other way. Note that if the number is less than 1 but greater than 0 it is fine to call the leading digit 0 after a truncation.
Hint: This method is fairly involved. You may find it useful to write a helper method called, for example, countLeadingDigits which returns an array of counts for each digit and only then calculate the percentages
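One way the two-method structure from the hint might look is sketched below. It assumes the helper takes the same double[] and returns an int[] of ten counters, which the assignment suggests but does not spell out. The class name ProportionsSketch is made up, and calculateLeadingDigit is repeated here only so that the example compiles on its own.

```java
import java.util.Arrays;

class ProportionsSketch {
    // Same idea as Question 2, repeated so this sketch is self-contained.
    static int calculateLeadingDigit(int number) {
        long n = Math.abs((long) number);
        while (n > 9) {
            n /= 10;
        }
        return (int) n;
    }

    // Helper: counts[d] is how many values have d as their leading digit.
    static int[] countLeadingDigits(double[] numbers) {
        int[] counts = new int[10];
        for (double value : numbers) {
            // The cast truncates toward zero, so 0.7 becomes 0. Values beyond
            // the int range are clamped to Integer.MAX_VALUE by the cast.
            int digit = calculateLeadingDigit((int) value);
            counts[digit]++;
        }
        return counts;
    }

    // Proportion for digit d = counts[d] / total number of values.
    static double[] calculateLeadingDigitProportions(double[] numbers) {
        int[] counts = countLeadingDigits(numbers);
        double[] proportions = new double[10];
        if (numbers.length == 0) {
            return proportions; // assumption: an empty input gives all zeros
        }
        for (int d = 0; d < 10; d++) {
            // Cast before dividing to avoid integer division.
            proportions[d] = (double) counts[d] / numbers.length;
        }
        return proportions;
    }

    public static void main(String[] args) {
        double[] numbers = {100, 200.1, 9.3, 10};
        System.out.println(Arrays.toString(calculateLeadingDigitProportions(numbers)));
    }
}
```

With the example input from the question, the counts are 2 for digit 1, 1 for digit 2, and 1 for digit 9, which divided by 4 gives the .5, .25, and .25 shown above.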
Question 5: Calculate Distance (15 points)
The final method you should write will involve a comparison of two arrays to calculate how similar they are to each other. Write a method called calculateDistance which takes as input two double[] and returns a double representing the Euclidean distance between the two arrays.
The Euclidean distance can be computed by first calculating the sum of the squared differences between the two arrays and then taking the square root of the entire thing (essentially the Pythagorean theorem). For example, if your arrays are called array1 and array2 and had size of 3 you could calculate the Euclidean distance by the following:
$$\sqrt{(\text{array1}[0] - \text{array2}[0])^2 + (\text{array1}[1] - \text{array2}[1])^2 + (\text{array1}[2] - \text{array2}[2])^2}$$
Since you don’t know ahead of time how large the arrays will be, however, you need to use a loop. Your method should return this Euclidean distance.
Your method may assume for simplicity that the size of the arrays will always be the same. That is, you do not need to add extra logic to handle that case.
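The loop version of that formula might look like the sketch below. It relies on the stated assumption that both arrays have the same length and does no checking. The class name DistanceSketch is made up for this example.

```java
class DistanceSketch {
    // Euclidean distance: square root of the sum of squared element-wise differences.
    static double calculateDistance(double[] array1, double[] array2) {
        double sumOfSquares = 0.0;
        for (int i = 0; i < array1.length; i++) {
            double diff = array1[i] - array2[i];
            sumOfSquares += diff * diff;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        // A 3-4-5 right triangle, as a quick sanity check.
        System.out.println(calculateDistance(new double[] {0, 0}, new double[] {3, 4}));
    }
}
```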
Question 6: Putting it all together (10 points)
The final thing you will do is put everything together using a library function provided for you and putting it into a main method. On the course webpage, you will find a file called BenfordSupportCode.java. This has two methods provided. One method is called generateRandomNumbers and takes as input an int count and returns a double[] filled with count “random” integers. (As you’ll see they are not truly random as the same sequence will occur again and again. This can be changed but was done so that you can reproduce your results more easily). The second method is called getBenfordProbabilities() and returns a double[] representing the Benford’s law distribution of leading digits.
To be able to use these library methods, you must make sure to save the file BenfordSupportCode.java inside the same folder as BenfordsLaw.java. In Eclipse you will need to add it to the project as well. In Dr. Java it should be sufficient to simply open the file as well. In Notepad, you should compile it by writing javac BenfordsLaw.java BenfordSupportCode.java instead of the usual javac BenfordsLaw.java
Write a main method that does the following:
1. Generate an array of 1000 numbers using the method you wrote previously, generateBenfordSequence. You should print the parameters you are using (the initial amount of money, the number of steps and the rate of growth at each step). (Note: If you increase the size of the array to be much larger than 1000 you may run into issues with reaching the maximum integer in Java)
2. Generate an array of 1000 numbers using the provided method in the BenfordSupportCode class. Remember that you will need to call these methods by writing the name of the class first (as you do for methods in the Math library).
3. Call your method for getting the distribution of leading numbers on each set.
4. For each set, print the proportion of digits in each.
5. Print the distance between each of the 2 sets of numbers and the “ideal” Benford distribution. If you have coded everything correctly, the “random” numbers will have a greater distance than the Benford sequence. (So if this were part of a larger forensic accounting program and you saw the distance was too high, you would realize they had not been generated via a real process but rather an artificial one)
A sample run of the program looks as follows:
dans computer\$ java BenfordsLaw
Generating Benford sequence with initial amount of 100 dollars, .01 growth per period,
and 1000 periods
For Benford, the digit is 0 and the proportion is 0.0
For Benford, the digit is 1 and the proportion is 0.349
For Benford, the digit is 2 and the proportion is 0.167
For Benford, the digit is 3 and the proportion is 0.116
For Benford, the digit is 4 and the proportion is 0.089
For Benford, the digit is 5 and the proportion is 0.074
For Benford, the digit is 6 and the proportion is 0.061
For Benford, the digit is 7 and the proportion is 0.054
For Benford, the digit is 8 and the proportion is 0.048
For Benford, the digit is 9 and the proportion is 0.042
For random generation, the digit is 0 and the proportion is 0.0
For random generation, the digit is 1 and the proportion is 0.11
For random generation, the digit is 2 and the proportion is 0.096
For random generation, the digit is 3 and the proportion is 0.1
For random generation, the digit is 4 and the proportion is 0.108
For random generation, the digit is 5 and the proportion is 0.138
For random generation, the digit is 6 and the proportion is 0.121
For random generation, the digit is 7 and the proportion is 0.104
For random generation, the digit is 8 and the proportion is 0.11
For random generation, the digit is 9 and the proportion is 0.113
The distance of the exponential growth group is 0.05130302135352262
The distance of the random growth group is 0.24517340801971163
Run your code against the test programs to verify it. To do this, you should make sure that SimpsonsParadox.java, BenfordsLaw.java, BenfordSupportCode.java, and AssignmentTwoTests.java are all in the same folder.
Then, compile all of them together by typing
javac *.java
OR
In Eclipse, Dr. Java, or the command prompt, you should make sure that the four files are inside the same folder.
If the program does not compile, it means you are either missing a public method or one of the required public methods does not take the correct arguments. Your code must compile with these test cases. If you are not able to do so, you should ask a TA or instructor for help. After this, you may run the test program by typing
java AssignmentTwoTests
As you are writing your code, you may want to test things immediately rather than waiting until you’ve finished the assignment. In this case, a good thing to do is write the method headers and “skeleton methods” with simple return expressions for the required methods. The test cases for the methods you have skeletons of will still fail in most cases, but you’ll be able to run the test cases on the working ones.
• October 21st, 2012, 08:38 AM
Zaphod_b
Re: Benford's Law Computer assignment!
Quote:

Originally Posted by smith999
...stuck on Question 4
.
.
.
Question 4: Calculating the Percentages (20 points)
Write a method calculateLeadingDigitProportions which takes as input a double[] numbers and returns a double[]. Your method should analyze the array numbers and return an array representing for each digit, the proportion of times it occurred as a leading digit. In the produced array, the value at the 0th index should represent the proportion of times that a 0 was a leading digit (only occurs when the number is 0 after truncation to an int), the value at the 1st index would contain the proportion of times that a 1 was a leading digit, and so on, up until the 9th index.
For example, if numbers contains:
{ 100, 200.1, 9.3, 10}
then 1 occurs as a leading digit 50 % of the time, 2 occurs 25% of the time, and 9 occurs 25 % of the time, so your produced array should contain:
{0, .5, .25, 0, 0, 0, 0, 0, 0, .25}
You may use the method calculateLeadingDigit to do this, but note that that method expects as input an int, and so you will need to cast or perform a conversion in some other way. Note that if the number is less than 1 but greater than 0 it is fine to call the leading digit 0 after a truncation.
Hint: This method is fairly involved. You may find it useful to write a helper method called, for example, countLeadingDigits which returns an array of counts for each digit and only then calculate the percentages

Have you tested the code that you have so far?

Can you tell us exactly what part of Question 4 you are stuck on?

The helper method that was suggested could go something like this:

Think of an int array of "counters." Each element of the array tells how many times a particular digit appeared as the leading digit in the set of numbers you are analyzing. (Leading digits for this part can be 0, 1, 2, ..., 9)

Maybe like this:

First element shows the number of times the leading digit was 0
Next element shows the number of times the leading digit was 1
.
.
.
Last element shows the number of times the leading digit was 9

Then, your method to calculate proportions would do what with this array?

Cheers!

Z
• October 21st, 2012, 08:44 AM
curmudgeon
Re: Benford's Law Computer assignment!
I played around with this idea with some small programs and amazingly, it's true and it shows a nice log distribution. Cool!
• October 21st, 2012, 10:52 AM
Zaphod_b
Re: Benford's Law Computer assignment!
Quote:

Originally Posted by curmudgeon
...amazingly...Cool!

I agree that this is a terrific assignment!

It gives the opportunity to learn about applying Java in the investigation of a really interesting real-world concept (above and beyond Java syntax).

For people who weren't particularly interested in the specific assignment the OP was trying to complete, there is a neat explanation of why Benford's Law often (but not always) applies to real-world statistical distributions: Benford's law - Wikipedia

Cheers!

Z
• October 21st, 2012, 04:03 PM
smith999
Re: Benford's Law Computer assignment!
Quote:

Originally Posted by Zaphod_b
Have you tested the code that you have so far?

Can you tell us exactly what part of Question 4 you are stuck on?

The helper method that was suggested could go something like this:

Think of an int array of "counters." Each element of the array tells how many times a particular digit appeared as the leading digit in the set of numbers you are analyzing. (Leading digits for this part can be 0, 1, 2, ..., 9)

Maybe like this:

First element shows the number of times the leading digit was 0
Next element shows the number of times the leading digit was 1
.
.
.
Last element shows the number of times the leading digit was 9

Then, your method to calculate proportions would do what with this array?

Cheers!

Z

Hi!
Thanks so much for taking some time to help me out!

should the input of it be the benfordArray that I already have from the previous method? like this?
public static double[] generateBenfordNumbers(double initialAmount, double growthRate, int numberPeriods)
{
    double[] benfordArray = new double[numberPeriods]; //benfordArray is an array of size numberPeriods
    int x; //counter variable
    for(x=0;x<numberPeriods;x++)
    {
        benfordArray[x] = initialAmount;
        initialAmount *= (1 + growthRate); //changes initialAmount to initialAmount*(1+growthRate)
    }
    for(x=0; x<numberPeriods; x++) //prints all values of benfordArray up until numberPeriods
    {
        System.out.println(benfordArray[x]);
    }
    return benfordArray;
}
}
public static int[] countLeadingDigits(double[] benfordArray) //I haven't written the method yet, because I don't know how... but would that be the correct header or no?
public static double[] calculateLeadingDigitProportions(double[] numbers) {

}

thank youuuu:)
• October 21st, 2012, 04:09 PM
smith999
Re: Benford's Law Computer assignment!
And as for the question you asked at the end?
would it take those numbers and figure out the amount of times each number was the leading digit divided by the total number of times any number was the leading digit?
• October 21st, 2012, 05:05 PM
Zaphod_b
Re: Benford's Law Computer assignment!
Quote:

Originally Posted by smith999
Hi!
...
should the input of it be the benfordArray that I already have from the previous method...

Code java:

```java
double[] whatever; // That's all you need; the generateBenfordNumbers() method will fill it in

// Use the static method from the BenfordsLaw class
whatever = BenfordsLaw.generateBenfordNumbers(/* Put in your arguments here */);

// Now analyze the contents of "whatever"
int[] frequencies = BenfordsLaw.countLeadingDigits(whatever);
```

Of course, if you wanted to you could declare and initialize it in one line (and I hope that you would use a better name than "whatever")

Code java:

` double [] whatever=BenfordsLaw.generateBenfordNumbers(/* Put in your parameters here */);`

Cheers!

Z
• October 21st, 2012, 05:09 PM
Zaphod_b
Re: Benford's Law Computer assignment!
Quote:

Originally Posted by smith999
...amount of times each number was the leading digit divided by the total number of times any number was the leading digit?

That's the way that I see it.

Well, I might word it a little differently:

Proportion of the total for a particular digit = (Number of items for which that digit is the leading digit)/(Total number of items)

Cheers!

Z
• October 21st, 2012, 07:39 PM
Junky
Re: Benford's Law Computer assignment!
• October 22nd, 2012, 05:57 PM
smith999
Re: Benford's Law Computer assignment!
Quote:

Originally Posted by Zaphod_b
That's the way that I see it.

Well, I might word it a little differently:

Proportion of the total for a particular digit = (Number of items for which that digit is the leading digit)/(Total number of items)

Cheers!

Z

Thanks so much!
I finally got it :)
• October 22nd, 2012, 06:42 PM
jps
Re: Benford's Law Computer assignment!