java - jHiccup: histogram data seems inconsistent -


i'm running application azul's jhiccup. application started via jsvc -javaagent option. jhiccup settings default (except output filename).

the application runs ~107 minutes. however, if add values resulting hgrm file, total length of hiccups seems large.

i calculate sum of hiccup values this:

sum( (totalcountincludingthisvalue[n] - totalcountincludingthisvalue[n-1]) * value[n] ) 

the result 12.75 mln., or ~213 minutes (presuming values in milliseconds). twice duration of application run.

total count * mean (from bottom of histogram values) give 12 mln., formula correct.

for comparison, added "application time" , "stopped time" gc log. results close expect: ~105 min. app time , ~2 min. stopped time.


i've accidentally overwritten original files, these example files different run:
https://dl.dropboxusercontent.com/u/644287/jhiccup/hiccup.dat
https://dl.dropboxusercontent.com/u/644287/jhiccup/hiccup.dat.hgrm

the run took ~79 minutes (17:50-19:09), mean*totalcount ~23 mln. (that is, 380 minutes).

update: [keeping original mistaken post below]

ok. both had math wrong (both in question , in initial answer). sanity check says (total count * mean( should equal runtime bogus math.

this demonstrated following thought exercise:

the following simple scenario: measure application runs (idle, intervals showing 0 hiccup) 100 seconds. ^z system 100 seconds. let run in foreground again , ^c it. total run time 200 seconds.

here proper, "how human describe in numbers" statements such system:

  • the 99.99%'ile close 100,000 msec.
  • the 10%'ile, 20%'ile, ... 50%'ile close 0 msec.
  • during 1st 100 seconds (and first 100,000 sample periods), mean 0 msec.
  • during 2nd 100 seconds (and second 100,000 sample periods) mean 50,000 msec.
  • the overall mean 25,000 milliseconds (over total run time of 200 second period).

but there (properly) 200,000 results reported in totalcount histogram (one each 1msec of elapsed time).

so (mean * total count) = 200,000 * 25,000 == 5,000,000,000

that's 25,000x as wall clock time. , there nothing wrong that. there ~n^2 thing going on multiplication (actually n * (n-1) / 2), , that's how behave...

the key here each percentile, histogram [correctly] reports percentage of overall random samples (performed without coordination on time) encounter hiccup levels below magnitude. when single long freeze occurs (e.g. seen in 2,654 msec max in sample output), that's not 1 point in time affected. there many sample points in time affected there milliseconds in freeze. measured , represented, single freeze generate 1 count per msec, values varying linearly between max value , 0 (2654msec, 2653msec, 2652msec, ... 3msec, 2msec, 1msec). right way calculate spread of possible latencies that random, uncoordinated sample have encountered.

i'm embarrassed, because original answer repeated same mistaken assumption warn (with graphs in slides discuss above scenario) in talk titled "how not measure latency" (http://www.infoq.com/presentations/latency-pitfalls , , under "the coordinated omission problem" 32 minutes in). built both jhiccup , hdrhistogram right math avoid problem, why see right results in data.

in fact, had sanity check shown "good" numbers, max times in 1000s of msec, indication measurement system exhibiting coordinated omission...

-- gil.

-------- original answer, "wrong" -----------

can post example .hgrm data set shows this? files looked @ seem show close relationship between total instrumented run time , counts in .hgrm file. simplest sanity test total count * mean (both reported @ bottom of file) should close runtime. data come same histogram produces percentile lines...

note counts @ percentile lines not additive pretty simple reason. 2 percentile levels reported same exact value. quite common long runs have little noise in them. however, math above (only counting delta counts) "should" work right.

[small note, math off 1 line: each percentile line shows percentile , count of values have values @ or below value on line. value should multiplied delta between counts @ next line (n+1) , count @ "this" line (n), should not have made big difference in measurement].

below sample file forma control run took ~1685 seconds, jhiccup started instrumenting 30 seconds run (so ~1655 seconds of instrumented time). using excel on data set, results of:

sum( (totalcountincludingthisvalue[n+1] - totalcountincludingthisvalue[n]) * value[n] )

shows ~16237 seconds, , mean * total count shows ~16591 seconds.

jhiccup histogram report, mon apr 22 15:28:07 cdt 2013 :

value, percentile, totalcountincludingthisvalue

1.01 0.000000000000         47 1.06 0.100000000000     483774 1.06 0.200000000000     483774 1.06 0.300000000000     483774 1.10 0.400000000000     628589 1.11 0.500000000000     804685 1.12 0.550000000000    1024190 1.12 0.600000000000    1024190 1.12 0.650000000000    1024190 1.13 0.700000000000    1388741 1.13 0.750000000000    1388741 1.13 0.775000000000    1388741 1.13 0.800000000000    1388741 1.13 0.825000000000    1388741 1.13 0.850000000000    1388741 1.13 0.875000000000    1388741 1.13 0.887500000000    1388741 1.13 0.900000000000    1388741 1.13 0.912500000000    1388741 1.14 0.925000000000    1471124 1.14 0.937500000000    1471124 1.14 0.943750000000    1471124 1.14 0.950000000000    1471124 1.14 0.956250000000    1471124 1.14 0.962500000000    1471124 1.14 0.968750000000    1471124 1.14 0.971875000000    1471124 1.14 0.975000000000    1471124 1.14 0.978125000000    1492169 1.14 0.981250000000    1492169 1.14 0.984375000000    1492169 1.14 0.985937500000    1492169 1.14 0.987500000000    1492169 1.14 0.989062500000    1492169 1.15 0.990625000000    1500639 1.15 0.992187500000    1500639 1.15 0.992968750000    1500639 1.15 0.993750000000    1500639 1.15 0.994531250000    1500639 1.16 0.995312500000    1504895 1.16 0.996093750000    1504895 1.16 0.996484375000    1504895 1.16 0.996875000000    1504895 1.16 0.997265625000    1504895 1.16 0.997656250000    1504895 1.17 0.998046875000    1506535 1.17 0.998242187500    1506535 1.17 0.998437500000    1506535 1.17 0.998632812500    1506535 1.17 0.998828125000    1506535 1.18 0.999023437500    1507442 1.18 0.999121093750    1507442 1.18 0.999218750000    1507442 1.18 0.999316406250    1507442 1.18 0.999414062500    1507442 1.18 0.999511718750    1507836 1.18 0.999560546875    1507836 1.18 0.999609375000    1507836 1.18 0.999658203125    1507836 1.18 0.999707031250    1507836 1.19 0.999755859375    1508028 1.19 0.999780273438    1508028 1.19 0.999804687500    1508028 1.19 0.999829101563    1508028 1.20 0.999853515625    1508106 1.20 0.999877929688    1508106 1.21 0.999890136719    1508145 1.21 0.999902343750    1508145 1.22 0.999914550781    1508163 1.22 0.999926757813    1508174 1.25 0.999938964844    1508186 1.34 0.999945068359    1508195 1.82 0.999951171875    1508204 2.42 0.999957275391    1508213 3.54 0.999963378906    1508222 4.74 0.999969482422    1508231 5.09 0.999972534180    1508236 5.98 0.999975585938    1508241 6.24 0.999978637695    1508245 7.01 0.999981689453    1508251 7.97 0.999984741211    1508254 8.26 0.999986267090    1508257 8.96 0.999987792969    1508259 9.02 0.999989318848    1508261 9.98 0.999990844727    1508265 10.24 0.999992370605    1508266 11.01 0.999993133545    1508268 11.01 0.999993896484    1508268 11.97 0.999994659424    1508269 13.95 0.999995422363    1508271 14.98 0.999996185303    1508272 14.98 0.999996566772    1508272 16.00 0.999996948242    1508273 16.00 0.999997329712    1508273 17.02 0.999997711182    1508274 18.05 0.999998092651    1508275 18.05 0.999998283386    1508275 18.05 0.999998474121    1508275 18.05 0.999998664856    1508275 19.07 0.999998855591    1508276 19.07 0.999999046326    1508276 19.07 0.999999141693    1508276 19.07 0.999999237061    1508276 19.07 0.999999332428    1508276 19.97 0.999999427795    1508277 19.97 1.000000000000    1508277  #[mean    =         1.10, stddeviation   =         0.06] #[max     =        19.84, total count    =      1508277] #[buckets =           25, subbuckets     =          256]  

Comments

Popular posts from this blog

java - Jmockit String final length method mocking Issue -

asp.net - Razor Page Hosted on IIS 6 Fails Every Morning -

c++ - wxwidget compiling on windows command prompt -