In general, arguments to statistical graphics are data and options in equation format (option = value). Options are of two types:
There are three functions associated to each graph. In the case of histograms, we have:
To run these examples, you must load package descriptive and set some global options:
load("descriptive")$ set_draw_defaults( terminal = png, xlabel = "", ylabel = "")$
Specific options (defaults in parentheses):
We begin with an histogram of absolute frequencies. File pidigits.data is part of the Maxima distribution and contains the first one hundred digits of number \(\pi\).
/* read sample */ s1 : read_list (file_search ("pidigits.data")); histogram ( s1, /* specific histogram options */ nclasses = 8, /* local draw options affecting the histogram */ fill_density = 0.3, dimensions = [400,300], fill_color = red, line_width = 3) $
An histogram of percents and more draw options.
histogram ( s1, /* specific histogram options */ nclasses = 8, frequency = percent, /* local draw options affecting the histogram */ fill_density = 0.3, fill_color = green, line_width = 3, /* global draw options */ grid = true, title = "A green histogram", xlabel = "Pi digit", ylabel = "Percentages (%)", dimensions = [400, 400], background_color = cyan) $
Fixing the limits of the plot from -2 to 12. The number of classes is 3.
histogram ( s1, nclasses = [-2,12,3], fill_density = 0.2, dimensions = [400,300])$
Fixing the limits of the plot from -2 to 12. The number of classes is 10, the default.
histogram ( s1, nclasses = [-2,12], fill_density = 0.2, dimensions = [400,300])$
We can also play with tic marks on the x-axis via the htics option. (This is a contibution by Len Brin.)
fpprintprec: 5$ histogram ( s1, nclasses = [-2,12,5], htics = 'intervals, fill_density = 0.2, dimensions = [460,300])$
Instead of writing the intervals on the x-axis, we can write the end points of the classes.
histogram ( s1, nclasses = 6, htics = endpoints, fill_density = 0.6, dimensions = [400,300])$
Write user defined labels for the classes.
histogram ( s1, nclasses = [-2,12,6], htics = ["Cl-1","Cl-2","Cl-3","Cl-4","Cl-5","Cl-6"], fill_density = 0.2, dimensions = [400,300])$
See what happens when the number of labels is less than the number of classes.
histogram ( s1, nclasses = 6, htics = ["Cl 1","Cl 2","Cl 3"], fill_density = 0.2, dimensions = [400,300])$
Or when the list of labels is empty. See also the relative frequency scale.
histogram ( s1, nclasses = 6, frequency = relative, htics = [], fill_color = navy, fill_density = 0.2, dimensions = [400,300])$
Now we want a scene with two objects, the histogram and the Gaussian density. Since the histogram must be consider as any other draw object, we need to call histogram_description to build the complete scene.
/* Simulate a Gaussian sample of size 1000 */ ( load("distrib"), m: 14, /* mean */ s: 2, /* standard deviation */ s2: random_normal(m, s, 1000) ) $ /* See that the order objects are plotted is important */ draw2d( grid = true, dimensions = [400,300], histogram_description( s2, nclasses = 9, frequency = density, fill_density = 0.5), explicit(pdf_normal(x,m,s), x, m - 3*s, m + 3* s) ) $
Note that histogram_description only returns draw directives to be used later by draw:
histogram_description( s2, nclasses = 9, frequency = relative, fill_density = 0.5);
[[fill_density = 0.5], xrange = [6.4658, 20.402], yrange = [- 0.013, 0.273], [], bars([7.8594, 0.006, 1.3936], [9.253, 0.021, 1.3936], [10.647, 0.075, 1.3936], [12.04, 0.174, 1.3936], [13.434, 0.26, 1.3936], [14.827, 0.239, 1.3936], [16.221, 0.148, 1.3936], [17.614, 0.066, 1.3936], [19.008, 0.011, 1.3936])]
The histogram function automatically sets the x and y ranges, ignoring options xrange and yrange:
histogram ( s1, nclasses = 8, xrange = [-5,15], yrange = [-15,40], fill_density = 0.5, dimensions = [400,300])$
If you want specific values for the axes ranges, make use of histogram_description:
draw2d( histogram_description ( s1, nclasses = 8, fill_density = 0.2), grid = true, xrange = [-5,15], yrange = [-15,40], dimensions = [400,300]) $
An univariate scatter plot.
load (distrib)$ scatterplot( random_normal(0,1,200), xaxis = true, point_size = 2, dimensions = [600,150])$
A bivariate scatter plot. File wind.data is part of the Maxima distribution and contains wind speeds from five Irish weather stations.
s2 : read_matrix (file_search ("wind.data"))$ scatterplot( submatrix(s2, 1,2,3), title = "Data from stations #4 and #5", point_type = diamant, point_size = 2, color = blue, dimensions = [400,300])$
A trivariate scatter plot.
scatterplot( submatrix (s2, 1,2), dimensions = [700,500]);
A multivariate scatter plot.
scatterplot( s2, nclasses = 5, fill_color = blue, fill_density = 0.3, color = red, point_size = 1/2, dimensions = [900,800], xtics = 5) $
Specific options (defaults in parentheses):
Plotting absolute frequencies. File biomed.data is part of the Maxima distribution and contains biomedical data, it is a multivariate sampleof six dimensions. The first variable corresponds to the group (A or B) and the second is the age of the patient. Both variables are discrete.
m : read_matrix (file_search ("biomed.data"))$ /* barsplot for ages */ barsplot( col(m,2), box_width = 1, title = "Ages", xlabel = "years", dimensions = [400,300] )$
Barsplot of two discrete variables. Note that sample spaces are different.
barsplot( col(m,1), col(m,2), box_width = 1, bars_colors = [salmon, green], title = "Patient ages and groups", xlabel = "Age - Group", fill_density = 3/4, dimensions = [400,300] ) $
Barsplots in a multiplot context. When plotting complex or multiplot scenes, we need to call barsplot_description, this function creates a graphic object to be later plotted by any of the draw' functions.
/* Remember that 'gr2d' is a scene object used by package 'draw'. We create two 2d scenes and then we build the multiplot */ bp1 : gr2d( ylabel = "Relative frequencies", color = red, barsplot_description( col(m,1), box_width = 1, bars_colors = [blue], frequency = relative))$ bp2: gr2d( ylabel = "Percentages (%)", color = blue, barsplot_description( col(m,2), box_width = 1, fill_density = 0.5, label_orientation = 'vertical, bars_colors = [red], frequency = percent))$ /* vertical multiplot */ draw( dimensions = [500, 500], bp1, bp2 ) $
Example of an horizontal multiplot.
draw( dimensions = [850, 300], columns = 2, bp1, bp2 ) $
If we want to get an idea on how are ages and groups related, we can plot a stacked barsplot:
agesA: col(subsample (m, lambda([v], v[1] = A)), 2) $ agesB: col(subsample (m, lambda([v], v[1] = B)), 2) $ barsplot( agesA, agesB, groups_gap = 3, bars_colors = [red, blue], grouping = stacked, sample_keys = ["Group A", "Group B"], title = "Groups A and B within ages", ylabel = "# of individuals", dimensions = [600, 400] )$
Comparing two samples:
l1: makelist(random(10),k,1,50)$ l2: makelist(random(10),k,1,100)$ barsplot( l1, l2, box_width = 1, bars_colors = [violet, gray], frequency = percent, sample_keys = ["1st sample", "2nd sample"], dimensions = [400,300] ) $
We now have four samples of a categorical (Yes, No, Maybe) variable sampled in four different cities.
r1: makelist([Yes, No, Maybe][random(3)+1],k,1,150) $ r2: makelist([Yes, No, Maybe][random(3)+1],k,1,50) $ r3: makelist([Yes, No, Maybe][random(3)+1],k,1,75) $ r4: makelist([Yes, No, Maybe][random(3)+1],k,1,200) $ barsplot( r1, r2, r3, r4, title = "Asking for something in four populations", ylabel = "# of individuals", fill_density = 0.5, dimensions = [600, 400], groups_gap = 3, frequency = relative, sample_keys = ["Betanzos", "Carballo", "Ferrol", "A Coruna"], yrange = [0, 0.6], /* expanding the y-range upwards to get more place for the legend */ ordering = ordergreatp )$
As above, but with stacked bars.
barsplot( r1, r2, r3, r4, groups_gap = 1, fill_density = 0.5, dimensions = [600, 400], sample_keys = ["Betanzos", "Carballo", "Ferrol", "A Coruna"], grouping = stacked, ordering = ordergreatp, bars_colors = [green, blue, yellow, brown], dimensions = [400,300], xrange = [-1,4] /* Now, we expand the x-range rightwards to get more space for the legend */)$
With option start_at, we can draw more than one bars diagrams on the x axis. Labels are vertically oriented. In the next plot, we want three independent barsplots in one scene. Again, we need the barsplot_description mechanism:
/* Sample simulation */ sample1: makelist([Yes, No][random(2)+1],k,1,20) $ sample2: makelist([Yes, No, Maybe][random(3)+1],k,1,100) $ sample31: makelist([A,B,C,D][random(4)+1],k,1,100) $ sample32: makelist([A,B,C,D][random(4)+1],k,1,100) $ draw2d( label_orientation = 'vertical, dimensions = [400,300], barsplot_description( sample1, bars_colors = [red], sample_keys = ["Are you smoker?"], start_at = 10 ), barsplot_description( sample2, sample_keys = ["Do you like pizza?"], start_at = 1 ), fill_density = 0.3, barsplot_description( sample31, sample32, bars_colors = [orange, blue], sample_keys = ["Political party (men)", "Political party (women)"], start_at = 25 ), xrange = [0, 40], yrange = [0, 50] )$
Specific options (defaults in parentheses):
A simple pie chart
s1 : read_list (file_search ("pidigits.data"))$ piechart( s1, proportional_axes = xy, xrange = [-2, 4], yrange = [-1.1, 1.1], dimensions = [550,300], title = "Digit frequencies in pi" )$
User defined colors for sectors
s: makelist(random(6)+1, k, 100)$ piechart( s, sector_colors = [red,blue,yellow,green,black,orange], proportional_axes = xy, dimensions = [450,300], xrange = [-1.5, 1.5] ) $
Two pie charts with different radii. For more than one piechart, make use of piechart_description.
s: makelist(random(6)+1, k, 100)$ a: makelist(['yes, 'no, "do not ask me"][1+random(3)], k, 1, 75)$ draw2d( proportional_axes = xy, xrange = [-2, 3.5], dimensions = [400,300], piechart_description( s, sector_colors = [red,blue,orange,navy,gray,black]), piechart_description( a, sector_colors = [green,yellow,cyan], pie_center = [-1,1.5], pie_radius = 1/2)) $
Specific options (defaults in parentheses):
Comparing the variablility of three samples.
A : [[6, 4, 6, 2, 4, 8, 6, 4, 6, 4, 3, 2], [8, 10, 7, 9, 12, 8, 10], [16, 13, 17, 12, 11, 18, 13, 18, 14, 12]]$ boxplot( A, dimensions = [400,300]) $
As above, but horizontally oriented, together with the grid.
boxplot( A, box_orientation = horizontal, grid = true, dimensions = [400,300]) $
Boxplot generates its own tics on the axes. If we want to write our own marks, we have to use the boxplot_description mechanism.
/* Setting tics on the vertical axis */ draw2d( boxplot_description(A, box_orientation=horizontal,grid=true), dimensions = [400,300], ytics = {["Sample 1", 1], ["Sample 2", 2], ["Sample 3", 3]} )$
Fine tunning our plot with draw options.
s2 : read_matrix(file_search("wind.data"))$ boxplot( s2, box_width = 0.2, background_color = light_gray, font = "Times-Roman", font_size = 17, title = "Windspeeds in knots", xlabel = "Number of Station", color = red, line_width = 2, dimensions = [400,300] )$
With option range it is possible to handle outliers, which are those observations outside the interval \([q_1-r \cdot IQR, q_3+r \cdot IQR]\), where \(q_i\) is the i-th quartile and \(IQR=q_3-q_1\) is the interquartile range. When outliers are present, they are plotted as isolated points, and whiskers are allocated at the extremes of the rest of observations.
B: [[7, 15, 5, 8, 6, 5, 7, 3, 1], [10, 8, 12, 8, 11, 9, 20], [23, 17, 19, 7, 22, 19]] $ boxplot (B, range=1)$
Outliers in horizontal orientation,
boxplot (B, range=1.5, box_orientation = horizontal)$
Adding a polygonal line. In this case, we make use of function boxplot_description, which describes the boxes, and add it to draw2d together with other graphic objects.
draw2d( /* set the terminal */ terminal = wxt, /* boxplot */ boxplot_description( B, range = 1.5, line_width = 3, outliers_size = 2, color = red, background_color = light_gray), /* polygonal line joining the medians */ points_joined = true, /* color and line_width have being modified during the boxplot definition */ color = blue, line_width = 2, points(map(median, B)), xtics = {["Low",1],["Medium",2],["High",3]}) $
This stem-and-leaf function is a kind contribution by Len Brin. It admits option leaf_unit, which must be a power of ten.
/* Simulating a Gaussian sample */ load(distrib)$ sample: random_normal(15, 6, 100)$ stemplot( sample, leaf_unit = 0.1);
1|69 4|47 5|226 6|8 7|247 8|1344669 9|0678 10|56 11|256678 12|0035899 13|23444789 14|04445 15|224556 16|3458 17|112467899 18|00234 19|45 20|1234 21|46 22|127799 23|19 24|1334 25|12 26|58 28|37 key: 6|3 = 6.3
Changing the value of leaf_unit.
stemplot( sample, leaf_unit = 1);
0|2245556777888889999 1|000011222222223333333334444444555566666777777888888888899 2|000012223333344444557789 key: 6|3 = 63
Specific options (defaults in parentheses):
Simulating two discrete samples with the same sample space.
l1: makelist(random(10),k,1,50)$ l2: makelist(random(10),k,1,200)$ /* Plot based on absolute frequencies. Location and radius defined by the user. */ starplot( l1, l2, stars_colors = [blue,red], sample_keys = ["1st sample", "2nd sample"], star_center = [1,2], star_radius = 4, proportional_axes = xy, line_width = 2, dimensions = [400,300] ) $
As above, but plotting relative frequencies. Default location and radius.
starplot( l1, l2, stars_colors = [blue, light_blue], frequency = relative, sample_keys = ["1st sample", "2nd sample"], proportional_axes = xy, line_width = 2, dimensions = [400,300] ) $
Using starplot_description in a multiplot context.
pl1: gr2d( starplot_description( l1, l2, stars_colors = [blue, light_blue], sample_keys = ["1st sample", "2nd sample"], star_center = [1,2], star_radius = 4, proportional_axes = xy, line_width = 2, title = "Absolute frequencies") ) $ pl2: gr2d( starplot_description( l1, l2, stars_colors = [blue, light_blue], frequency = relative, sample_keys = ["1st sample", "2nd sample"], proportional_axes = xy, line_width = 2, title = "Relative frequencies") ) $ draw( dimensions = [600, 400], columns = 2, pl1, pl2)$
Statistical charts described above cannot be displayed together with other graphics elements unless you make use of the following mechanism. You can describe an statistical chart without plotting it, and then use this description as any other graphics element when invoking the draw function:
s1 : read_list (file_search ("pidigits.data"))$ /* variable scene stores the histogram description*/ scene: histogram_description ( s1, title = "pi digits", xlabel = "digits", ylabel = "Absolute frequency", fill_color = grey, fill_density = 0.6);
\[ \left[ \left[ {\it title}=\mbox{ pi digits } , {\it xlabel}= \mbox{ digits } , {\it ylabel}=\mbox{ Absolute frequency } , {\it fill\_color}={\it grey} , \\ {\it fill\_density}=0.6 \right] , {\it xrange}=\left[ -0.45 , 9.45 \right] , {\it yrange}=\left[ - 0.65 , 13.65 \right] , \left[ \right] , \\ {\it bars}\left(\left[ 0.45 , 8.0 , 0.9 \right] , \left[ 1.35 , 8.0 , 0.9 \right] , \left[ 2.25 , 12.0 , 0.9 \right] , \\ \left[ 3.15 , 12.0 , 0.9 \right] , \left[ 4.05 , 10.0 , 0.9 \right] , \left[ 4.95 , 8.0 , 0.9 \right] , \\ \left[ 5.85 , 9.0 , 0.9 \right] , \left[ 6.75 , 8.0 , 0.9 \right] , \left[ 7.65 , 12.0 , 0.9 \right] , \left[ 8.55 , 13.0 , 0.9 \right] \right) \right] \]
/* line and histogram, in this order */ draw2d( color = blue, explicit(2*x,x,-5,20), scene) $
If we don't want the blue line behind the bars, change the order of the graphic objects. We also add the grid lines:
draw2d( scene, color = blue, grid = true, xrange = auto, xtics = auto, explicit(2*x,x,-5,20) ) $
Finally, this is the method for building multiplots with statistical charts (output in postscript format):
s3 : read_matrix (file_search ("biomed.data"))$ bp1: barsplot_description( col(s3,1), title = "Groups of patients", xlabel = "Group", ylabel = "# of individuals", fill_density = 0.5)$ bp2: barsplot_description( col(s3,2), title = "Ages", xlabel = "years", fill_density = 0.3)$ draw(gr2d(bp1), gr2d(bp2), dimensions = 100*[10, 10], terminal = eps_color)$
© 2011-2016, TecnoStats.