Hi guys.

So this is a pretty big question (well, for me anyway). I've started it but am kind of stuck on Question 4. Any advice or help would be greatly appreciated.

The assignment is at the bottom of this post, and it's rather long, but it's about for loops and arrays... and Benford's law, if you have heard of it.

So far I have:

```java
public class BenfordsLaw {

    public static void main(String[] args) {
        calculateLeadingDigit(2094928);
        generateBenfordNumbers(100, 0.1, 4);
    }

    // Question 2: keep dividing by 10 until only one digit is left.
    // The "number < -9" check and Math.abs handle negative input, e.g. -41 -> 4.
    public static int calculateLeadingDigit(int number) {
        while (number > 9 || number < -9) {
            number = number / 10;
        }
        return Math.abs(number);
    }

    // Question 3: the amount of money at the start of each period.
    public static double[] generateBenfordNumbers(double initialAmount, double growthRate, int numberPeriods) {
        double[] benfordArray = new double[numberPeriods];
        for (int x = 0; x < numberPeriods; x++) { // x is the period counter
            benfordArray[x] = initialAmount;
            initialAmount *= (1 + growthRate);
        }
        return benfordArray;
    }

    /* Still to do (Questions 4 and 5):
    public static double[] calculateLeadingDigitProportions(double[] numbers) {
    }

    public static double calculateDistance(double[] array1, double[] array2) {
    }
    */
}
```

Particularly for Question 4, it says to use two methods... I am very confused about where to start.

THANKS!!

Benford’s Law

The rest of these questions involve something called Benford’s Law and should be put into a file BenfordsLaw.java and a class BenfordsLaw. You will be asked to write a series of methods which, due to their independence, can mostly be written one at a time.

Background Information

Benford’s law is the unusual observation that numbers produced by systems governed by exponential growth start with the digit 1 more often than with any other digit. That is to say, when a quantity increases by a roughly fixed percentage over time, its values tend to have 1 as their leading digit.

Consider the following example. Suppose you start with $100 and put your money in a bank account which pays an interest rate that causes the money to double every 20 years. (This assumption of a fixed amount of time to double is the exponential growth part.) After 20 years, your $100 will have turned into $200, and after another 20 years, your $200 will have turned into $400. The amount of time that your money was in the “one hundreds” was twenty years, since for the whole first twenty years you had at least $100 and less than $200. The amount of time your money was in the $200s is shorter, however, as the second twenty-year interval is spent partly in the $200s and partly in the $300s. Interestingly, this phenomenon can be observed across all digits, and it turns out that for numbers generated this way, the approximate proportion of each leading digit is as shown in the graph in Figure 1.
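The graph in Figure 1 is not reproduced in this post. For reference, the standard Benford proportions are P(d) = log10(1 + 1/d) for d = 1 to 9. The sketch below computes them; the class name BenfordExpected and method benfordProportions are hypothetical. The provided getBenfordProbabilities() presumably returns similar values, but that is an assumption, since its source is not shown here.

```java
// Hypothetical helper: the ideal Benford proportions, P(d) = log10(1 + 1/d).
public class BenfordExpected {

    // Index 0 is left at 0.0 so that index d holds the proportion for digit d,
    // matching the layout the assignment uses for its proportion arrays.
    public static double[] benfordProportions() {
        double[] expected = new double[10];
        for (int d = 1; d <= 9; d++) {
            expected[d] = Math.log10(1.0 + 1.0 / d);
        }
        return expected;
    }

    public static void main(String[] args) {
        double[] expected = benfordProportions();
        for (int d = 1; d <= 9; d++) {
            // Prints about 0.301 for digit 1, down to about 0.046 for digit 9.
            System.out.printf("digit %d: %.3f%n", d, expected[d]);
        }
    }
}
```

The proportions sum to 1 because log10((d + 1) / d) telescopes: adding them for d = 1 to 9 gives log10(10) = 1.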

As most prices and incomes tend to grow roughly exponentially over time (due to inflation), they follow the same pattern. This knowledge has been used to detect tax fraud in cases where people made up numbers at random without following this pattern. The idea is that if you choose numbers uniformly at random, they will start with each of the 9 digits about 1/9 of the time. Forensic accountants once used this technique to clear Bill Clinton when he was accused of tax fraud, as his returns DID obey Benford’s law.
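To see where the 1/9 comes from, take the integers 1 to 9999 as an illustrative range (the assignment does not specify one). Each leading digit d covers one value among 1 to 9, ten among 10 to 99, a hundred among 100 to 999, and a thousand among 1000 to 9999:

$$\#\{\, n \in [1, 9999] : \text{leading digit of } n = d \,\} = 1 + 10 + 100 + 1000 = 1111, \qquad \frac{1111}{9999} = \frac{1}{9}.$$

The split is only this even for ranges that cover whole blocks of digits; over 1 to 5000, for example, digits 1 to 4 would each get 1111 values while digit 5 would get only 112.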

In this question, you will write several methods that will culminate in contrasting the distance from the Benford’s law distribution of two sets: one generated via an exponential growth model and the other generated via a uniformly random distribution.


Figure 1: The proportion of occurrences of each digit as the first non-zero digit of a number

Question 2: Leading digits (10 points)

Write a method called calculateLeadingDigit which takes as input an int and returns an int representing the first non-zero digit of the number. Your method header should look like:

public static int calculateLeadingDigit(int number)

In other words, when calculateLeadingDigit() is called from a different method, it is given an int and returns an int. For example:

• calculateLeadingDigit(103) would return 1

• calculateLeadingDigit(0) would return 0

• calculateLeadingDigit(-41) would return 4

• etc.

(If you use Scanner inside your method you will NOT be able to do this correctly for different numbers. As such you should have NO use of Scanner in this method.)

There are several ways to do this. One way is to recall that dividing a number by 10 shifts the decimal point over by one place. So you could start with your initial number and keep dividing by 10 until only one digit is left. You will need to keep in mind the rules of integer division and also consider what happens if the number is negative in the first place. There are also ways to do this with Strings. (You should know how to do both and study the solutions when they are posted.)
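The String route mentioned above could look roughly like this minimal sketch (the class name LeadingDigitString is hypothetical). It converts the absolute value to text and reads the first character; widening to long first avoids the overflow that Math.abs(Integer.MIN_VALUE) would cause on an int.

```java
// A String-based sketch of calculateLeadingDigit (the divide-by-10 loop is the other option).
public class LeadingDigitString {

    public static int calculateLeadingDigit(int number) {
        // Widen to long before Math.abs so Integer.MIN_VALUE does not overflow.
        String digits = Long.toString(Math.abs((long) number));
        // charAt(0) is the first digit character; subtracting '0' gives its int value.
        return digits.charAt(0) - '0';
    }

    public static void main(String[] args) {
        System.out.println(calculateLeadingDigit(103)); // 1
        System.out.println(calculateLeadingDigit(0));   // 0
        System.out.println(calculateLeadingDigit(-41)); // 4
    }
}
```

Because an int's decimal form never starts with 0 unless the value is 0 itself, the first character is always the first non-zero digit (or 0 for input 0).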

Question 3: Generating a Benford Sequence (20 points)

Write a method generateBenfordNumbers that takes as input two doubles and an int and returns a double[]. The first double, initialAmount, represents the amount of money you start with. The second double, growthRate, represents the amount of growth per period. The int numberPeriods represents the number of periods to look at. Your method should return an array storing the amount of money after each period of time if initialAmount grows by a proportion of growthRate in each step.

For example if the amount of money you start with is $100 and the interest is .1 (i.e. 10 %) and taken over 4 periods, then you should return an array with the contents:

{ 100.0 , 110.0, 121.0, 133.1 }

Note that the first number of the array is the amount you started with. To go from one number to the next, you can multiply the current number by 1 + growthRate.


Hint: Your method header should be

public static double[] generateBenfordNumbers(double initialAmount, double growthRate, int numberPeriods)
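Because multiplying by 1 + growthRate is floating-point arithmetic, the results can differ from the example in the last few decimal places (printing may show something like 110.00000000000001 instead of 110.0). A minimal sketch for checking the method against the {100.0, 110.0, 121.0, 133.1} example compares with a small tolerance instead of ==; the class name GrowthSequenceCheck, the helper approximatelyEqual, and the 1e-9 tolerance are all assumptions for illustration.

```java
import java.util.Arrays;

public class GrowthSequenceCheck {

    // Start at initialAmount and multiply by (1 + growthRate) to get each next period.
    public static double[] generateBenfordNumbers(double initialAmount, double growthRate, int numberPeriods) {
        double[] amounts = new double[numberPeriods];
        double current = initialAmount;
        for (int i = 0; i < numberPeriods; i++) {
            amounts[i] = current;
            current *= (1 + growthRate);
        }
        return amounts;
    }

    // Hypothetical helper: true when lengths match and every entry differs by at most tolerance.
    public static boolean approximatelyEqual(double[] a, double[] b, double tolerance) {
        if (a.length != b.length) {
            return false;
        }
        for (int i = 0; i < a.length; i++) {
            if (Math.abs(a[i] - b[i]) > tolerance) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        double[] result = generateBenfordNumbers(100, 0.1, 4);
        System.out.println(Arrays.toString(result));
        double[] expected = {100.0, 110.0, 121.0, 133.1};
        System.out.println(approximatelyEqual(result, expected, 1e-9)); // true
    }
}
```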

Question 4: Calculating the Percentages (20 points)

Write a method calculateLeadingDigitProportions which takes as input a double[] numbers and returns a double[]. Your method should analyze the array numbers and return an array giving, for each digit, the proportion of times it occurred as a leading digit. In the produced array, the value at index 0 should be the proportion of times that 0 was a leading digit (this only happens when the number is 0 after truncation to an int), the value at index 1 should be the proportion of times that 1 was a leading digit, and so on, up to index 9.

For example, if numbers contains:

{ 100, 200.1, 9.3, 10}

then 1 occurs as a leading digit 50% of the time, 2 occurs 25% of the time, and 9 occurs 25% of the time, so your produced array should contain:

{0, .5, .25, 0, 0, 0, 0, 0, 0, .25}

You may use the method calculateLeadingDigit to do this, but note that that method expects as input an int, and so you will need to cast or perform a conversion in some other way. Note that if the number is less than 1 but greater than 0 it is fine to call the leading digit 0 after a truncation.

Hint: This method is fairly involved. You may find it useful to write a helper method called, for example, countLeadingDigits, which returns an array of counts for each digit, and only then calculate the proportions.
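The two-step plan in the hint (count first, then divide) might look like the sketch below. It is one possible approach, not the official solution; the class name LeadingDigitProportions is hypothetical, countLeadingDigits follows the hint's suggested name, and the empty-array check is an extra safeguard the assignment does not ask for.

```java
// Sketch of Question 4: count leading digits with a helper, then turn counts into proportions.
public class LeadingDigitProportions {

    // Divide-by-10 version of calculateLeadingDigit from Question 2.
    public static int calculateLeadingDigit(int number) {
        while (number > 9 || number < -9) {
            number = number / 10;
        }
        return Math.abs(number);
    }

    // counts[d] = how many entries of numbers have leading digit d.
    public static int[] countLeadingDigits(double[] numbers) {
        int[] counts = new int[10];
        for (int i = 0; i < numbers.length; i++) {
            // The (int) cast truncates toward zero: 200.1 -> 200, 0.7 -> 0.
            int digit = calculateLeadingDigit((int) numbers[i]);
            counts[digit]++;
        }
        return counts;
    }

    public static double[] calculateLeadingDigitProportions(double[] numbers) {
        int[] counts = countLeadingDigits(numbers);
        double[] proportions = new double[10];
        if (numbers.length == 0) {
            return proportions; // extra safeguard: avoid dividing by zero
        }
        for (int d = 0; d < 10; d++) {
            // Cast first; counts[d] / numbers.length alone would be integer division.
            proportions[d] = (double) counts[d] / numbers.length;
        }
        return proportions;
    }

    public static void main(String[] args) {
        double[] numbers = {100, 200.1, 9.3, 10};
        // Prints [0.0, 0.5, 0.25, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.25]
        System.out.println(java.util.Arrays.toString(calculateLeadingDigitProportions(numbers)));
    }
}
```

One thing to watch: casting a double larger than Integer.MAX_VALUE to int gives Integer.MAX_VALUE (2147483647), whose leading digit is 2, so very large amounts are silently miscounted. That is one way the "maximum integer" warning in Question 6 can show up.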

Question 5: Calculate Distance (15 points)

The final method you should write will involve a comparison of two arrays to calculate how similar they are to each other. Write a method called calculateDistance which takes as input two double[] and returns a double representing the Euclidean distance between the two arrays.

The Euclidean distance can be computed by first calculating the sum of the squared differences between corresponding elements of the two arrays and then taking the square root of the entire sum (essentially the Pythagorean theorem). For example, if your arrays were called array1 and array2 and had size 3, you could calculate the Euclidean distance as follows:

$$\sqrt{(\text{array1}[0] - \text{array2}[0])^2 + (\text{array1}[1] - \text{array2}[1])^2 + (\text{array1}[2] - \text{array2}[2])^2}$$

Since you don’t know ahead of time how large the arrays will be, however, you need to use a loop. Your method should return this Euclidean distance.

Your method may assume for simplicity that the size of the arrays will always be the same. That is, you do not need to add extra logic to handle that case.
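The loop version of the square-root-of-sum-of-squares formula can be sketched like this (the class name DistanceSketch is hypothetical; the method header is inferred from the description above, since the assignment does not spell it out):

```java
public class DistanceSketch {

    // Euclidean distance: square root of the sum of squared differences.
    // Assumes array1 and array2 have the same length, as the assignment allows.
    public static double calculateDistance(double[] array1, double[] array2) {
        double sumOfSquares = 0.0;
        for (int i = 0; i < array1.length; i++) {
            double difference = array1[i] - array2[i];
            sumOfSquares += difference * difference;
        }
        return Math.sqrt(sumOfSquares);
    }

    public static void main(String[] args) {
        double[] a = {0.0, 0.0, 0.0};
        double[] b = {3.0, 4.0, 0.0};
        System.out.println(calculateDistance(a, b)); // 5.0 (a 3-4-5 right triangle)
    }
}
```

Multiplying the difference by itself avoids calling Math.pow for a simple square.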

Question 6: Putting it all together (10 points)

The final thing you will do is put everything together using a library function provided for you and putting it into a main method. On the course webpage, you will find a file called BenfordSupportCode.java. This has two methods provided. One method is called generateRandomNumbers and takes as input an int count and returns a double[] filled with count “random” integers. (As you’ll see they are not truly random as the same sequence will occur again and again. This can be changed but was done so that you can reproduce your results more easily). The second method is called getBenfordProbabilities() and returns a double[] representing the Benford’s law distribution of leading digits.

To be able to use these library methods, you must save the file BenfordSupportCode.java in the same folder as BenfordsLaw.java. In Eclipse you will also need to add it to the project. In Dr Java it should be sufficient to simply open the file as well. If you edit in Notepad and compile from the command line, compile by writing javac BenfordsLaw.java BenfordSupportCode.java instead of the usual javac BenfordsLaw.java.


Write a main method that does the following:

1. Generate an array of 1000 numbers using the method you wrote previously, generateBenfordNumbers. You should print the parameters you are using (the initial amount of money, the number of steps, and the rate of growth at each step). (Note: If you make the array much larger than 1000, you may run into issues with reaching the maximum integer value in Java.)

2. Generate an array of 1000 numbers using the provided method in the BenfordSupportCode class. Remember that you will need to call these methods by writing the name of the class first (as you do for methods in the Math library).

3. Call your method for getting the distribution of leading digits on each set.

4. For each set, print the proportion of each digit.

5. Print the distance between each set's distribution and the “ideal” Benford distribution. If you have coded everything correctly, the “random” numbers will be farther from the ideal distribution than the exponential-growth numbers. (So if this were part of a larger forensic accounting program and you saw that the distance was too high, you would suspect the numbers had been generated by an artificial process rather than a real one.)
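The five steps above could be wired together roughly as follows. This is a sketch, not the expected submission: BenfordSupportCode is not available here, so randomNumbersStandIn (pseudo-random integers from 1 to 9999 with a fixed seed of 42) and benfordProbabilitiesStandIn (the log10(1 + 1/d) formula) are hypothetical stand-ins that let the file compile on its own. Their range and seed are assumptions, so the printed random proportions will not match the sample run.

```java
import java.util.Random;

// Sketch of the Question 6 main method. In the real assignment, replace the two
// hypothetical stand-ins with BenfordSupportCode.generateRandomNumbers(1000)
// and BenfordSupportCode.getBenfordProbabilities().
public class BenfordMainSketch {

    public static int calculateLeadingDigit(int number) {
        while (number > 9 || number < -9) {
            number = number / 10;
        }
        return Math.abs(number);
    }

    public static double[] generateBenfordNumbers(double initialAmount, double growthRate, int numberPeriods) {
        double[] amounts = new double[numberPeriods];
        for (int i = 0; i < numberPeriods; i++) {
            amounts[i] = initialAmount;
            initialAmount *= (1 + growthRate);
        }
        return amounts;
    }

    public static double[] calculateLeadingDigitProportions(double[] numbers) {
        int[] counts = new int[10];
        for (double n : numbers) {
            counts[calculateLeadingDigit((int) n)]++; // (int) truncates the double
        }
        double[] proportions = new double[10];
        for (int d = 0; d < 10; d++) {
            proportions[d] = (double) counts[d] / numbers.length;
        }
        return proportions;
    }

    public static double calculateDistance(double[] array1, double[] array2) {
        double sum = 0.0;
        for (int i = 0; i < array1.length; i++) {
            double diff = array1[i] - array2[i];
            sum += diff * diff;
        }
        return Math.sqrt(sum);
    }

    // Hypothetical stand-in: count pseudo-random integers from 1 to 9999 (fixed seed).
    public static double[] randomNumbersStandIn(int count) {
        Random random = new Random(42);
        double[] numbers = new double[count];
        for (int i = 0; i < count; i++) {
            numbers[i] = 1 + random.nextInt(9999);
        }
        return numbers;
    }

    // Hypothetical stand-in: ideal Benford proportions, index 0 left at 0.0.
    public static double[] benfordProbabilitiesStandIn() {
        double[] expected = new double[10];
        for (int d = 1; d <= 9; d++) {
            expected[d] = Math.log10(1.0 + 1.0 / d);
        }
        return expected;
    }

    public static void printProportions(String label, double[] proportions) {
        for (int d = 0; d < proportions.length; d++) {
            System.out.println("For " + label + ", the digit is " + d + " and the proportion is " + proportions[d]);
        }
    }

    public static void main(String[] args) {
        double initialAmount = 100;
        double growthRate = 0.01;
        int numberPeriods = 1000;
        System.out.println("Generating Benford sequence with initial amount of " + initialAmount
                + " dollars, " + growthRate + " growth per period, and " + numberPeriods + " periods");

        double[] benford = calculateLeadingDigitProportions(
                generateBenfordNumbers(initialAmount, growthRate, numberPeriods));
        double[] random = calculateLeadingDigitProportions(randomNumbersStandIn(1000));
        printProportions("Benford", benford);
        printProportions("random generation", random);

        double[] ideal = benfordProbabilitiesStandIn();
        System.out.println("The distance of the exponential growth group is " + calculateDistance(benford, ideal));
        System.out.println("The distance of the random growth group is " + calculateDistance(random, ideal));
    }
}
```

With 1% growth over 1000 periods, the largest amount is roughly 100 × 1.01^999, about two million, which still fits in an int; much longer runs would hit the Integer.MAX_VALUE limit mentioned in step 1.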

A sample run of the program looks as follows:

dans computer$ java BenfordsLaw

Generating Benford sequence with initial amount of 100 dollars, .01 growth per period, and 1000 periods

For Benford, the digit is 0 and the proportion is 0.0

For Benford, the digit is 1 and the proportion is 0.349

For Benford, the digit is 2 and the proportion is 0.167

For Benford, the digit is 3 and the proportion is 0.116

For Benford, the digit is 4 and the proportion is 0.089

For Benford, the digit is 5 and the proportion is 0.074

For Benford, the digit is 6 and the proportion is 0.061

For Benford, the digit is 7 and the proportion is 0.054

For Benford, the digit is 8 and the proportion is 0.048

For Benford, the digit is 9 and the proportion is 0.042

For random generation, the digit is 0 and the proportion is 0.0

For random generation, the digit is 1 and the proportion is 0.11

For random generation, the digit is 2 and the proportion is 0.096

For random generation, the digit is 3 and the proportion is 0.1

For random generation, the digit is 4 and the proportion is 0.108

For random generation, the digit is 5 and the proportion is 0.138

For random generation, the digit is 6 and the proportion is 0.121

For random generation, the digit is 7 and the proportion is 0.104

For random generation, the digit is 8 and the proportion is 0.11

For random generation, the digit is 9 and the proportion is 0.113

The distance of the exponential growth group is 0.05130302135352262

The distance of the random growth group is 0.24517340801971163

Verifying your code

Run your code against the test programs to verify it. To do this, you should make sure that SimpsonsParadox.java, BenfordsLaw.java, BenfordSupportCode.java, and AssignmentTwoTests.java are all in the same folder.

Then, compile all of them together by typing

javac *.java


OR

javac AssignmentTwoTests.java SimpsonsParadox.java BenfordsLaw.java BenfordSupportCode.java

In Eclipse, Dr. Java, or the command prompt, you should make sure that the four files are inside the same folder.

If the program does not compile, it means you are either missing a public method or one of the required public methods does not take the correct arguments. Your code must compile with these test cases. If you are not able to do so, you should ask a TA or instructor for help. After this, you may run the test program by typing

java AssignmentTwoTests

As you are writing your code, you may want to test things immediately rather than waiting until you’ve finished the assignment. In this case, a good thing to do is write the method headers and “skeleton methods” with simple return expressions for the required methods. The test cases for the methods you have skeletons of will still fail in most cases, but you’ll be able to run the test cases on the working ones.
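A skeleton along those lines might look like the sketch below. The Question 2 and 3 headers come from the assignment; the Question 4 and 5 headers are inferred from their descriptions, and the placeholder return values are arbitrary choices that only let the test program compile.

```java
// Hypothetical skeleton: every required public method exists with the right
// header, but unfinished ones return placeholder values so AssignmentTwoTests
// can compile. Replace each placeholder as you finish a question.
public class BenfordsLaw {

    public static int calculateLeadingDigit(int number) {
        return 0; // TODO Question 2
    }

    public static double[] generateBenfordNumbers(double initialAmount, double growthRate, int numberPeriods) {
        return new double[numberPeriods]; // TODO Question 3
    }

    public static double[] calculateLeadingDigitProportions(double[] numbers) {
        return new double[10]; // TODO Question 4
    }

    public static double calculateDistance(double[] array1, double[] array2) {
        return 0.0; // TODO Question 5
    }

    public static void main(String[] args) {
        // TODO Question 6
    }
}
```

Returning arrays of the right length (rather than null) keeps the other tests from crashing with a NullPointerException while you work.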