This week I had to create a plot using two different scales in the same graph to show the evolution of two related, but not directly comparable, variables. This operation is described in this FAQ on the matplotlib website. Nonetheless, I’d like to give a small step-by-step example…
Consider my input data, where each row has the form: date release total broken outdated.
    20110110T034549Z unstable 29989 133 3
    20110210T034103Z wheezy 28900 8 0
    20110210T034103Z unstable 30125 209 11
    20110310T060132Z wheezy 29179 8 0
    20110310T060132Z unstable 30230 945 28
    20110410T040442Z wheezy 29487 8 0
    20110410T040442Z unstable 31142 991 12
    20110510T034745Z wheezy 30247 8 0
    20110510T034745Z unstable 31867 610 31
    20110610T041209Z wheezy 30328 9 0
    20110610T041209Z unstable 32395 328 15
    20110710T030855Z wheezy 31403 9 0
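The plotting function further down indexes its input as dists[release]['date'], dists[release]['total'] and so on. Here is a minimal sketch of how rows like the ones above might be loaded into that shape. The function name parse_rows is hypothetical and not taken from the original source, and I am assuming the fields are whitespace-separated and the timestamps follow the YYYYMMDDTHHMMSSZ pattern seen in the sample.

```python
from collections import defaultdict
from datetime import datetime


def parse_rows(lines):
    """Group rows by release into parallel lists (hypothetical helper)."""
    dists = defaultdict(lambda: defaultdict(list))
    for line in lines:
        fields = line.split()
        if len(fields) != 5:
            continue  # skip blank or malformed lines
        date, release, total, broken, outdated = fields
        series = dists[release]
        # assumed timestamp layout, e.g. 20110110T034549Z
        series['date'].append(datetime.strptime(date, "%Y%m%dT%H%M%SZ"))
        series['total'].append(int(total))
        series['broken'].append(int(broken))
        series['outdated'].append(int(outdated))
    return dists


# three rows copied from the sample data above
sample = [
    "20110110T034549Z unstable 29989 133 3",
    "20110210T034103Z wheezy 28900 8 0",
    "20110210T034103Z unstable 30125 209 11",
]
dists = parse_rows(sample)
```

Converting the dates to datetime objects is what lets matplotlib treat the X axis as a time axis rather than as plain strings.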
I want to create one figure containing three subplots, each showing data for unstable and wheezy. For the subplot showing the total number of packages, the two series are of similar magnitude, so the plot is readable and self-explanatory. The problem arises when we compare the non-installable packages in unstable and wheezy: the unstable values run into the hundreds while the wheezy values stay at 8 or 9, so on a shared Y axis the wheezy line is squashed flat, making it useless.
Below I’ve added the commented Python code and the resulting graph. You can get the full source of this example here.
    import matplotlib.pyplot as plt

    # plot two distributions with different scales
    def plotmultiscale(dists, dist1, dist2, output):
        fig = plt.figure()
        # add the main title for the figure
        fig.suptitle("Evolution during wheezy release cycle")
        # set date formatting. This is important to have dates pretty printed
        fig.autofmt_xdate()

        # we create the first sub graph, plot the two data sets and set the legend
        ax1 = fig.add_subplot(311, title='Total Packages vs Time')
        ax1.plot(dists[dist1]['date'], dists[dist1]['total'], 'o-', label=dist1.capitalize())
        ax1.plot(dists[dist2]['date'], dists[dist2]['total'], 's-', label=dist2.capitalize())
        ax1.legend(loc='upper left')
        # we need explicitly to remove the labels for the x axis
        ax1.xaxis.set_visible(False)

        # we add the second sub graph and plot the first data set
        ax2 = fig.add_subplot(312, title='Non-Installable Packages vs Time')
        ax2.plot(dists[dist1]['date'], dists[dist1]['broken'], 'o-', label=dist1.capitalize())
        ax2.xaxis.set_visible(False)
        # now the fun part. The function twinx() gives us access to a second plot that
        # overlays the graph ax2 and shares the same X axis, but not the Y axis
        ax22 = ax2.twinx()
        # we plot the second data set
        ax22.plot(dists[dist2]['date'], dists[dist2]['broken'], 'gs-', label=dist2.capitalize())
        # and we set a nice limit for our data to make it prettier
        ax22.set_ylim(0, 20)

        # we do the same for the third sub graph
        ax3 = fig.add_subplot(313, title='Outdated Packages vs Time')
        ax3.plot(dists[dist1]['date'], dists[dist1]['outdated'], 'o-', label=dist1.capitalize())
        ax33 = ax3.twinx()
        ax33.plot(dists[dist2]['date'], dists[dist2]['outdated'], 'gs-', label=dist2.capitalize())
        ax33.set_ylim(0, 10)

        # this last function is necessary to reset the date formatting with 30 deg rotation
        # that somehow we lost while using twinx() ...
        plt.setp(ax3.xaxis.get_majorticklabels(), rotation=30)

        # And we save the result
        plt.savefig(output)
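One thing the function above does not do is draw a legend on the twinned sub graphs, even though both lines carry a label. Since each axes object only knows about its own lines, one possible approach is to collect the handles from both and pass them to a single legend() call. This is a standalone sketch, not part of the original script: the variable names and the output file name are mine, the three data points are copied from the sample input, and the Agg backend is selected only so it runs without a display.

```python
from datetime import datetime

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, assumed here for headless runs
import matplotlib.pyplot as plt

# non-installable counts for three dates, copied from the sample data
dates = [datetime(2011, 4, 10), datetime(2011, 5, 10), datetime(2011, 6, 10)]
unstable_broken = [991, 610, 328]
wheezy_broken = [8, 8, 9]

fig = plt.figure()
ax_left = fig.add_subplot(111, title='Non-Installable Packages vs Time')
ax_left.plot(dates, unstable_broken, 'o-', label='Unstable')

# second Y axis on the right, sharing the X axis with ax_left
ax_right = ax_left.twinx()
ax_right.plot(dates, wheezy_broken, 'gs-', label='Wheezy')
ax_right.set_ylim(0, 20)

# gather the line handles and labels from both axes into one legend
handles_l, labels_l = ax_left.get_legend_handles_labels()
handles_r, labels_r = ax_right.get_legend_handles_labels()
combined_labels = labels_l + labels_r
ax_left.legend(handles_l + handles_r, combined_labels, loc='upper left')

# rotate the date labels on the axes that owns the visible X axis
plt.setp(ax_left.xaxis.get_majorticklabels(), rotation=30)

fig.savefig("twinx_legend.png")  # hypothetical output name
```

Giving the two lines different colours, as the original does with 'gs-', helps the reader match each line to its Y axis.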