python - Pandas DataFrame grouped box plot from aggregated results -


I want to draw a box plot, but I don't have the raw data, only aggregated results in a pandas DataFrame.

Is it still possible to draw a box plot from aggregated results?

If not, what is the closest plot I can get that shows min, max, mean, median, std-dev, etc.? I know I can plot them using a line chart, but I need the box plots to be grouped/clustered.

Here is the data; the plotting part is missing. Please help. Thanks.

    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    df = pd.DataFrame({
        'group': ['tick tick tick', 'tock tock tock', 'tock tock tock', 'tick tick tick']*3,  # , ['tock tock tock', 'tick tick tick']*6,
        'person': [x*5 for x in list('abc')]*4,
        'median': np.random.randn(12),
        'stddev': np.random.randn(12)
    })
    # Derive the remaining aggregates from the median (sample data only)
    df["average"] = df["median"]*1.1
    df["minimum"] = df["median"]*0.5
    df["maximum"] = df["median"]*1.6
    df["90%"] = df["maximum"]*0.9
    df["95%"] = df["maximum"]*0.95
    df["99%"] = df["maximum"]*0.99

    df

Update:

I'm one step closer to the result. I have found that the feature I need has been available since matplotlib 1.4; I'm using matplotlib 1.5, and I tested it and it works for me.

The problem is that I have no clue why it works, or how to adapt my code above to use this new feature. I'll re-post the working code below, in the hope that someone can understand it and put two and two together.

The data I have are median, average, minimum, 90%, 95%, 99%, maximum and stddev, and I hope to chart them all. I took a look at the data structure of logstats in the code further below, after the loop for stats, label in zip(logstats, list('abcd')), and found that the fields are:

    [{'cihi': 4.2781254505311281,
      'cilo': 1.6164348064249057,
      'fliers': array([ 19.69118642,  19.01171604]),
      'iqr': 5.1561885723613567,
      'label': 'a',
      'mean': 4.9486856766955922,
      'med': 2.9472801284780168,
      'q1': 1.7655440553898782,
      'q3': 6.9217326277512345,
      'whishi': 12.576334012545718,
      'whislo': 0.24252084924003742},
     {'cihi': 4.3186289184254107,
      'cilo': 1.9963715983778565,
      ...

So, from this

(figure: box plot)

and the bxp doc, I'm going to map my data as follows:

  • whislo: minimum
  • q1: median
  • med: average
  • mean: 90%
  • q3: 95%
  • whishi: 99%
  • fliers: maximum

To map them, I'll select minimum as whislo, [90%] as mean, [95%] as q3, [99%] as whishi... Here is the final result:

    raw_data = {
        'label': ['label_01 init', 'label_02', 'label_03', 'label_04', 'label_05', 'label_06', 'label_07', 'label_08', 'label_99'],
        'whislo': [0.17999999999999999, 2.0299999999999998, 4.0800000000000001, 2.0899999999999999, 2.3300000000000001, 2.3799999999999999, 1.97, 2.6499999999999999, 0.089999999999999997],
        'q3': [0.5, 4.9699999999999998, 11.77, 5.71, 12.460000000000001, 11.859999999999999, 13.84, 16.969999999999999, 0.29999999999999999],
        'mean': [0.40000000000000002, 4.1299999999999999, 10.619999999999999, 5.0999999999999996, 10.24, 9.0700000000000003, 11.960000000000001, 15.15, 0.26000000000000001],
        'whishi': [1.76, 7.6399999999999997, 20.039999999999999, 6.6699999999999999, 22.460000000000001, 21.66, 16.629999999999999, 19.690000000000001, 1.1799999999999999],
        'q1': [0.28000000000000003, 2.96, 7.6100000000000003, 3.46, 5.8099999999999996, 5.4400000000000004, 6.6299999999999999, 8.9900000000000002, 0.16],
        'fliers': [5.5, 17.129999999999999, 32.890000000000001, 7.9100000000000001, 32.829999999999998, 70.680000000000007, 24.699999999999999, 32.240000000000002, 3.3500000000000001]
    }
    df = pd.DataFrame(raw_data, columns=['label', 'whislo', 'q1', 'mean', 'q3', 'whishi', 'fliers'])
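The column mapping listed above can also be done in pandas rather than by hand. The following is only a minimal sketch: the `agg` frame and its numbers are made up for illustration, and `column_to_bxp_key` is a hypothetical name that simply restates the mapping chosen in this post.

```python
import pandas as pd

# Hypothetical aggregated results; the numbers are invented for illustration.
agg = pd.DataFrame({
    "label": ["a", "b"],
    "minimum": [0.2, 0.5],
    "median": [1.0, 1.5],
    "average": [1.2, 1.8],
    "90%": [2.0, 2.6],
    "95%": [2.4, 3.0],
    "99%": [3.0, 3.9],
    "maximum": [5.0, 6.0],
})

# Mapping used in the post: aggregate column -> key expected by bxp.
column_to_bxp_key = {
    "minimum": "whislo",
    "median": "q1",
    "average": "med",
    "90%": "mean",
    "95%": "q3",
    "99%": "whishi",
}

# One dict per row, keyed by the bxp field names.
bxp_stats = agg.rename(columns=column_to_bxp_key).to_dict("records")

# fliers must be a sequence, so wrap each maximum in a one-element list.
for record in bxp_stats:
    record["fliers"] = [record.pop("maximum")]
```

Each element of `bxp_stats` then has the same shape as one entry of the logstats list shown earlier, minus the optional cilo, cihi and iqr fields.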

Now the challenge is how to present the above DataFrame as a box plot with multiple levels of grouping. If multiple levels of grouping are difficult, let's get plotting from a pandas DataFrame working first, because the DataFrame has the same fields as those required in the np array. I tried,

    fig, ax = plt.subplots()
    ax.bxp(df.as_matrix(), showmeans=True, showfliers=True, vert=False)

but I got

    ...\anaconda3\lib\site-packages\matplotlib\axes\_axes.py in bxp(self, bxpstats, positions, widths, vert, patch_artist, shownotches, showmeans, showcaps, showbox, showfliers, boxprops, whiskerprops, flierprops, medianprops, capprops, meanprops, meanline, manage_xticks)
       3601         for pos, width, stats in zip(positions, widths, bxpstats):
       3602             # try to find a new label
    -> 3603             datalabels.append(stats.get('label', pos))
       3604             # fliers coords
       3605             flier_x = np.ones(len(stats['fliers'])) * pos

    AttributeError: 'numpy.ndarray' object has no attribute 'get'

If I use ax.bxp(df.to_records(), ..., I get AttributeError: 'record' object has no attribute 'get'.
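Both errors come from the same line of the traceback: bxp calls stats.get(...) on every item, so each item has to behave like a dict. A small sketch of the difference, using a made-up one-row frame and `df.values` in place of `as_matrix()` (an assumption made so the snippet also runs on recent pandas versions):

```python
import pandas as pd

# Made-up one-row frame, just to inspect the row types.
demo = pd.DataFrame({"label": ["a"], "med": [1.0]})

rows = {
    "values row": demo.values[0],                    # numpy.ndarray
    "to_records row": demo.to_records()[0],          # numpy record
    "to_dict row": demo.to_dict("records")[0],       # plain dict
}

# Only the dict row offers the .get method that bxp relies on.
has_get = {name: hasattr(row, "get") for name, row in rows.items()}
print(has_get)
```

Only the row produced by to_dict('records') has a get method, which is why that conversion is the one bxp accepts.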

OK, I got it working for plotting from a pandas DataFrame, though not with multiple levels of grouping, like this:

    # bxp reads stats['fliers'] (see the traceback above), so give it an empty value
    df['fliers'] = ''
    fig, ax = plt.subplots()
    ax.bxp(df.to_dict('records'), showmeans=True, meanline=True, showfliers=False, vert=False)  # shownotches=True,
    plt.show()

Note that the above data is missing the med field; you can add the correct values, or use df['med']=df['q1']*1.2 to make it work.
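The grouping part is left open above. One possible approach, which is not from the original post, is to use the positions argument that appears in the bxp signature in the traceback and leave a gap between clusters. This is a sketch under assumptions: the `groups` data is invented, and `gap` and `box_width` are arbitrary illustrative values.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt


def box(label, lo, q1, med, q3, hi):
    """Build one bxp stats dict (hypothetical helper for this sketch)."""
    return {"label": label, "whislo": lo, "q1": q1, "med": med,
            "q3": q3, "whishi": hi, "fliers": []}


# Invented stats: two groups, each holding the same two persons.
groups = {
    "tick": [box("aaaaa\ntick", 0.5, 1.0, 1.5, 2.0, 3.0),
             box("bbbbb\ntick", 0.7, 1.2, 1.8, 2.5, 3.4)],
    "tock": [box("aaaaa\ntock", 0.4, 0.9, 1.3, 1.9, 2.8),
             box("bbbbb\ntock", 0.6, 1.1, 1.6, 2.2, 3.1)],
}

box_width = 0.6
gap = 1.0  # extra horizontal space between clusters

positions, stats = [], []
pos = 0.0
for group_stats in groups.values():
    for s in group_stats:
        positions.append(pos)
        stats.append(s)
        pos += 1.0
    pos += gap  # start the next cluster further to the right

fig, ax = plt.subplots()
artists = ax.bxp(stats, positions=positions, widths=box_width, showfliers=False)
```

Boxes of the same group sit next to each other and the groups are separated by the gap; the group name is carried in each label here, but it could equally be drawn as a separate annotation under each cluster.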

    import matplotlib
    import matplotlib.pyplot as plt
    import numpy as np
    import pandas as pd

    def test_bxp_with_ylabels():
        np.random.seed(937)
        # Compute box plot statistics from raw (lognormal) samples
        logstats = matplotlib.cbook.boxplot_stats(
            np.random.lognormal(mean=1.25, sigma=1., size=(37, 4))
        )
        print(logstats)
        for stats, label in zip(logstats, list('abcd')):
            stats['label'] = label

        fig, ax = plt.subplots()
        ax.set_xscale('log')
        # Draw the boxes from the precomputed statistics
        ax.bxp(logstats, vert=False)

    test_bxp_with_ylabels()

bxp_with_ylabels

While waiting for clarification of your df, this is related to:

    dic = [{'cihi': 4.2781254505311281,
            'cilo': 1.6164348064249057,
            'fliers': np.array([19.69118642, 19.01171604]),
            'iqr': 5.1561885723613567,
            'mean': 4.9486856766955922,
            'med': 2.9472801284780168,
            'q1': 1.7655440553898782,
            'q3': 6.9217326277512345,
            'whishi': 12.576334012545718,
            'whislo': 0.24252084924003742}]

and how your data should map:

From the bxp doc:

  Required keys are:

  - ``med``: The median (scalar float).
  - ``q1``: The first quartile (25th percentile) (scalar float).
  - ``q3``: The first quartile (50th percentile) (scalar float). # I guess this should rather be: third quartile (75th percentile)
  - ``whislo``: Lower bound of the lower whisker (scalar float).
  - ``whishi``: Upper bound of the upper whisker (scalar float).

  Optional keys are:

  - ``mean``: The mean (scalar float). Needed if ``showmeans=True``.
  - ``fliers``: Data beyond the whiskers (sequence of floats). Needed if ``showfliers=True``.
  - ``cilo`` & ``cihi``: Lower and upper confidence intervals about the median. Needed if ``shownotches=True``.

Then, you just have to do:

    fig, ax = plt.subplots(1, 1)
    # dic is already a list of stats dicts, so pass it directly
    ax.bxp(dic, showmeans=True)

So you just need to find a way to build your dic. Note that it does not plot the std, and for the whiskers you need to choose whether they go to 90%, 95% or 99%; you can't have all the values. In that case you need to add them afterward with plt.hlines().
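A rough sketch of that last step, assuming the whiskers end at the 95% value and the 90% and 99% values are drawn as short dashed lines across each box. The numbers and the `extra` dict are invented for illustration, and ax.hlines is used as the Axes-level counterpart of plt.hlines().

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Invented stats for two boxes; whishi holds the 95% value here.
stats = [
    {"label": "a", "whislo": 0.2, "q1": 0.8, "med": 1.0, "q3": 1.6,
     "whishi": 2.4, "fliers": []},
    {"label": "b", "whislo": 0.5, "q1": 1.1, "med": 1.5, "q3": 2.1,
     "whishi": 3.0, "fliers": []},
]
# Percentiles that bxp has no slot for (one value per box).
extra = {"90%": [2.0, 2.6], "99%": [3.0, 3.9]}

positions = [1, 2]
width = 0.5

fig, ax = plt.subplots()
ax.bxp(stats, positions=positions, widths=width, showfliers=False)

# Add one short dashed horizontal line per extra value, centred on its box.
extra_lines = []
for name, values in extra.items():
    for pos, y in zip(positions, values):
        line = ax.hlines(y, pos - width / 2, pos + width / 2,
                         linestyles="dashed")
        extra_lines.append(line)
```

The same loop could draw the std as well, for instance as lines at mean ± std, if that value is available for each box.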

HTH

