Welcome to the Java Programming Forums


Thread: Averaging Numbers in an Array Based on References

  1. #1
    Forum VIP
    Join Date
    Jul 2010
    Posts
    1,676
    Thanks
    25
    Thanked 329 Times in 305 Posts

    Averaging Numbers in an Array Based on References

    Ok, so this is sort of hard to explain, so stay with me. I have an array of Objects that contain a Name and a Number. The Names are dates (months and years). The Numbers need to be averaged, but based on conditions. I need to average two numbers together for each month (January through December). However, not every month of every year is there, not all months have data, the two most recent years must be averaged, and I can't average two months together if that will give a misleading average (unless those are my only options).

    My array objects will look something like this:
    Apr-06 2,476
    May-06 2,201
    Jun-06 1,783
    Jul-06 2,048
    Aug-06 1,557
    Sep-06 1,533
    Oct-06 2,614
    Nov-06 2,804
    Dec-06 2,951
    Jan-07 3,644
    Feb-07 3,250
    Mar-07 3,279
    Apr-07 3,007
    May-07 3,273
    Jun-07 2,340
    Jul-07 2,276
    Aug-07 1,819
    Sep-07 1,519
    Oct-07 1,921
    Nov-07 1,983
    Dec-07 2,200
    Jan-08 2,398
    Feb-08 2,604
    Mar-08 2,664
    Apr-08 1,930
    May-08 1,316
    Jun-08 1,105
    Jul-08 1,090
    Aug-08 593
    Sep-08
    Oct-08
    Nov-08
    Dec-08
    Jan-09
    Feb-09
    Mar-09
    Apr-09
    May-09
    Jun-09
    Jul-09
    Aug-09
    Sep-09
    Oct-09
    Nov-09
    Dec-09 827
    Jan-10 1,539
    Feb-10 1,607
    Mar-10 1,823


    So for this array, I would want to do the following averages:
    Jan: Jan-10 and Jan-08
    Feb: Feb-10 and Feb-08
    Mar: Mar-10 and Mar-08
    Apr: Apr-08 and Apr-07
    May: May-08 and May-07
    Jun: Jun-08 and Jun-07
    Jul: Jul-08 and Jul-07
    Aug: Aug-07 and Aug-06
    Sep: Sep-07 and Sep-06
    Oct: Oct-07 and Oct-06
    Nov: Nov-07 and Nov-06
    Dec: Dec-07 and Dec-06

    Notice how I don't want to average in Aug-08 and Dec-09 because they will make the averages misleading (they are well below the trend of the rest of the data). Also, I don't want to average in Sep-08 through Nov-09 because those months don't have data.


    Now, leaving out the months that aren't there and the months without data is not too much of a hassle. But I can't think of a way to determine whether a value is too small to include in the averaging. This is because each data series can have a median below 1000, above 7000, or possibly higher or lower. Also, just because a value is below a certain point for the series doesn't mean that it will make the data misleading. For instance, looking at the data, January's data is usually much higher than September's data. We don't want January's high data to cause September's data to be left out because it gets considered too low.

    Can anyone help me think of a process to determine if a value is inconsistent with the other data for its month?
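The part described as "not too much of a hassle" could be sketched as follows. This is a minimal illustration, not the poster's code: the `Sample` class, its fields, and the `average` method are hypothetical names, and a `null` value is assumed to stand for a month that is listed without a number. It groups the values by calendar month, skips absent and empty months, and averages the two most recent years that remain.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class NaiveMonthlyAverage {

    /** One array element: a month of a year and its number (null = no data). Hypothetical type. */
    static final class Sample {
        final int year;
        final int month;      // 1 = January ... 12 = December
        final Integer value;  // null when the month is listed without a number

        Sample(int year, int month, Integer value) {
            this.year = year;
            this.month = month;
            this.value = value;
        }
    }

    /** Averages, per calendar month, the two most recent years that have data. */
    static Map<Integer, Double> average(List<Sample> samples) {
        // calendar month -> (year -> value), years kept in sorted order
        Map<Integer, TreeMap<Integer, Integer>> byMonth = new TreeMap<>();
        for (Sample s : samples) {
            if (s.value == null) {
                continue; // month is present but empty: skip it
            }
            byMonth.computeIfAbsent(s.month, m -> new TreeMap<>()).put(s.year, s.value);
        }

        Map<Integer, Double> result = new TreeMap<>();
        for (Map.Entry<Integer, TreeMap<Integer, Integer>> e : byMonth.entrySet()) {
            // descendingMap() lists the newest year first
            List<Integer> newestFirst = new ArrayList<>(e.getValue().descendingMap().values());
            if (newestFirst.size() < 2) {
                continue; // fewer than two values: nothing to average
            }
            result.put(e.getKey(), (newestFirst.get(0) + newestFirst.get(1)) / 2.0);
        }
        return result;
    }
}
```

Fed the three August values from the list above, this naive version averages Aug-08 (593) with Aug-07 (1,819) and returns 1206.0, which is exactly the misleading result the question is about.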


  2. #2
    Super Moderator helloworld922
    Join Date
    Jun 2009
    Posts
    2,896
    Thanks
    23
    Thanked 619 Times in 561 Posts
    Blog Entries
    18

    Re: Averaging Numbers in an Array Based on References

    The easiest way to see if a number is "way off" is to simply discount the highest and lowest value for that month (assuming you've got at least 4-5 samples/month).
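A minimal sketch of the "discount the highest and lowest" idea, assuming the values for one calendar month have already been collected. The class and method names are hypothetical, and the fallback to a plain average for fewer than three values is an assumption rather than something stated in the thread.

```java
import java.util.Arrays;

class TrimmedMean {

    /**
     * Averages the values after discarding the single highest and the single
     * lowest. With fewer than three values nothing is discarded (assumption).
     */
    static double average(int... values) {
        if (values.length == 0) {
            throw new IllegalArgumentException("no values to average");
        }
        int[] sorted = values.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);

        int from = 0;
        int to = sorted.length; // exclusive upper bound
        if (sorted.length >= 3) {
            from = 1;               // drop the lowest value
            to = sorted.length - 1; // drop the highest value
        }

        long sum = 0;
        for (int i = from; i < to; i++) {
            sum += sorted[i];
        }
        return (double) sum / (to - from);
    }
}
```

With only three values for a month this leaves a single value (the median), which is one reason the approach needs the 4-5 samples mentioned above.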

    If you want to get fancier, you can compute the standard deviation for each month, and if it's not in some range (the smaller the range, the more consistent your data is), remove the values that deviate from that month's mean by more than one standard deviation. Note that this method will likely require more data points than the first method in order to be effective.

    There are other methods you can use to determine if your data is good if you feel these two methods aren't good enough. Look inside a statistics book (note that many of these methods will require significantly more data points per month than the two methods described above).
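The standard-deviation filter might look like the sketch below. It assumes the population standard deviation and a tunable multiplier `k` (with `k = 1` matching the "more than the standard deviation" rule); both choices, and all names, are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;

class StdDevFilter {

    /** Keeps only the values within k standard deviations of the mean. */
    static List<Double> filter(List<Double> values, double k) {
        List<Double> kept = new ArrayList<>();
        if (values.isEmpty()) {
            return kept;
        }

        double mean = 0;
        for (double v : values) {
            mean += v;
        }
        mean /= values.size();

        double sumOfSquares = 0;
        for (double v : values) {
            sumOfSquares += (v - mean) * (v - mean);
        }
        // population standard deviation (divides by n, not n - 1)
        double stdDev = Math.sqrt(sumOfSquares / values.size());

        for (double v : values) {
            if (Math.abs(v - mean) <= k * stdDev) {
                kept.add(v);
            }
        }
        return kept;
    }
}
```

For example, `StdDevFilter.filter(Arrays.asList(10.0, 10.0, 10.0, 10.0, 100.0), 1.0)` keeps the four 10.0 values and drops 100.0 (mean 28, standard deviation 36). With only two or three values per month the standard deviation is dominated by the outlier itself, so the filter is much less reliable.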

  3. #3
    Forum VIP

    Well, that's the problem I'm facing. I'm usually not going to get 4-5 samples each time. For the program to run at all, I need at least 2, but that might be all I get (in which case I would average the two regardless). The problem of finding out whether a data point is off-trend arises once I get at least 3 data points (which is more than likely going to happen as long as the data is good). However, as in the example above, I may get only 3 data points.

    Quote Originally Posted by helloworld922
    The easiest way to see if a number is "way off" is to simply discount the highest and lowest value for that month (assuming you've got at least 4-5 samples/month).

    The problem with this is the condition that the most recent data must be used in the average if it is good. The most recent data could be the highest or the lowest, but as long as it is not too low, it will be used.


    A problem will never occur from a point being too high to be included in the average. The only reason data may be too low is that, either just before or just after that point, there was a period of no data or an outside force that manipulated that month's data. If it weren't for the latter, finding the low point wouldn't be an issue, but my problem isn't that clean and simple.

  4. #4
    Super Moderator helloworld922

    Perhaps if you gave us the problem you are trying to solve on a high level?

    In general these methods work well for a noisy data set, but there may be some special reason why they can't be adapted to fit your application, or there may be a much better way to determine what is bad data.

  5. #5
    Forum VIP

    Quote Originally Posted by helloworld922
    Perhaps if you gave us the problem you are trying to solve on a high level?

    In general these methods work well for a noisy data set, but there may be some special reason why they can't be adapted to fit your application, or there may be a much better way to determine what is bad data.

    What do you mean by "on a high level"?

  6. #6
    Super Moderator helloworld922

    What is the scope of the whole application? For example, does it count the number of people who visit a specific website each month and provide different tools for analyzing the data?

  7. #7
    Forum VIP

    The numbers indicate the number of people on board an airplane per month for a specific market. Months or years without data indicate periods of time when there were no flights for the current market. So the closest month, or several closest months, next to these empty periods show the numbers when leaving or reentering the market. In some instances, months will be missing from the data set entirely (not just no data, but not in the array at all). The average for each month is needed to build a seasonal trend for demand.
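Given that description, one possible rule is to treat any month directly next to an empty period as a market exit or reentry month and keep it out of the average unless nothing else is available. The sketch below is only one way to do that, under these assumptions: the series is a `TreeMap<YearMonth, Integer>` in which a `null` value marks a month listed without data, months absent from the map are not treated as empty periods, and the excluded band is exactly one month wide. All class and method names are hypothetical.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class SeasonalAverage {

    /** True when the month is listed in the series but carries no number. */
    static boolean isEmpty(Map<YearMonth, Integer> series, YearMonth ym) {
        return series.containsKey(ym) && series.get(ym) == null;
    }

    /** True when the month sits directly before or after an empty month. */
    static boolean bordersGap(Map<YearMonth, Integer> series, YearMonth ym) {
        return isEmpty(series, ym.minusMonths(1)) || isEmpty(series, ym.plusMonths(1));
    }

    /**
     * For each calendar month (1..12), averages the two most recent values
     * that do not border a gap. Gap-bordering values are used only as a
     * fallback when fewer than two other values exist. Calendar months with
     * fewer than two values in total are left out of the result.
     */
    static Map<Integer, Double> average(TreeMap<YearMonth, Integer> series) {
        Map<Integer, List<Integer>> clean = new TreeMap<>();
        Map<Integer, List<Integer>> suspect = new TreeMap<>();

        // descendingMap() walks newest first, so every list is newest first
        for (Map.Entry<YearMonth, Integer> e : series.descendingMap().entrySet()) {
            if (e.getValue() == null) {
                continue; // empty month: no number to average
            }
            Map<Integer, List<Integer>> target =
                    bordersGap(series, e.getKey()) ? suspect : clean;
            target.computeIfAbsent(e.getKey().getMonthValue(), m -> new ArrayList<>())
                  .add(e.getValue());
        }

        Map<Integer, Double> result = new TreeMap<>();
        for (int month = 1; month <= 12; month++) {
            // clean values first, gap-bordering values only after them
            List<Integer> picks =
                    new ArrayList<>(clean.getOrDefault(month, Collections.emptyList()));
            picks.addAll(suspect.getOrDefault(month, Collections.emptyList()));
            if (picks.size() >= 2) {
                result.put(month, (picks.get(0) + picks.get(1)) / 2.0);
            }
        }
        return result;
    }
}
```

On the sample series from post #1, Aug-08 and Dec-09 are the only data points next to an empty month, so this rule picks the same pairs listed there (for example Aug-07 and Aug-06, giving 1688.0, and Dec-07 and Dec-06, giving 2575.5). Whether one month is the right width for the excluded band depends on how quickly a market ramps up or down, which the thread does not settle.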
