- Generalities
- Histograms
- Scatterplots
- Barsplots
- Piecharts
- Boxplots
- Stemplots
- Starplots
- Statistical charts in complex scenes

In general, arguments to statistical graphics are data and options in equation format (**option = value**). Options are of two types:

- specific to each stats plot
- general draw options

There are three functions associated with each graph. In the case of histograms, we have:

- **histogram**: for plotting in any terminal; works in all interfaces.
- **wxhistogram**: for embedded graphics in the wxMaxima and iMaxima interfaces.
- **histogram_description**: a draw object to be included in complex, multi-object scenes.

To run these examples, you must load package **descriptive** and set some global options:

load("descriptive")$ set_draw_defaults( terminal = png, xlabel = "", ylabel = "")$

Specific options (defaults in parentheses):

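- **nclasses** (10): number of classes, a positive integer; the examples below also pass a list of the form [min, max] or [min, max, number of classes]
- **frequency** (**absolute**): **absolute**, **relative**, **density** or **percent**
- **htics** (**auto**): **auto**, **endpoints**, **intervals**, or a list of labels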
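As a rough guide to the four **frequency** scales, and assuming the usual textbook definitions, a class that contains \(n_i\) of the \(n\) observations and has width \(h_i\) would get the bar heights below. The symbols \(n_i\), \(n\) and \(h_i\) are introduced here for illustration and are not part of the package.

```latex
\[
\text{absolute}: \; n_i, \qquad
\text{relative}: \; \frac{n_i}{n}, \qquad
\text{percent}: \; 100\,\frac{n_i}{n}, \qquad
\text{density}: \; \frac{n_i}{n\,h_i}
\]
```

Under these definitions the relative heights sum to 1 and the density bars enclose a total area of 1, which makes the density scale the natural choice when a probability density function is drawn over the histogram.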

We begin with a histogram of absolute frequencies. The file **pidigits.data** is part of the Maxima distribution and contains the first one hundred digits of \(\pi\).

/* read sample */ s1 : read_list (file_search ("pidigits.data")); histogram ( s1, /* specific histogram options */ nclasses = 8, /* local draw options affecting the histogram */ fill_density = 0.3, dimensions = [400,300], fill_color = red, line_width = 3) $

A histogram of percentages, with more draw options.

histogram ( s1, /* specific histogram options */ nclasses = 8, frequency = percent, /* local draw options affecting the histogram */ fill_density = 0.3, fill_color = green, line_width = 3, /* global draw options */ grid = true, title = "A green histogram", xlabel = "Pi digit", ylabel = "Percentages (%)", dimensions = [400, 400], background_color = cyan) $

Fixing the limits of the plot from -2 to 12. The number of classes is 3.

histogram ( s1, nclasses = [-2,12,3], fill_density = 0.2, dimensions = [400,300])$

Fixing the limits of the plot from -2 to 12. The number of classes is 10, the default.

histogram ( s1, nclasses = [-2,12], fill_density = 0.2, dimensions = [400,300])$
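To check which classes a given specification produces without drawing anything, one option is to call **continuous_freq** from the same package. This is only a sketch: it assumes that **histogram** interprets **nclasses** the same way recent versions of **continuous_freq** interpret their second argument, and the variable names are purely illustrative.

```maxima
load("descriptive")$
s1 : read_list (file_search ("pidigits.data"))$

/* three classes between -2 and 12 */
freq3 : continuous_freq (s1, [-2, 12, 3]);

/* default number of classes between -2 and 12 */
freq10 : continuous_freq (s1, [-2, 12]);

/* the first element holds the class limits, the second the counts */
limits : first (freq3);
counts : second (freq3);
```

With three classes, `limits` should contain four values and the entries of `counts` should add up to the sample size.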

We can also play with tic marks on the x-axis via the **htics** option. (This is a contribution by Len Brin.)

fpprintprec: 5$ histogram ( s1, nclasses = [-2,12,5], htics = 'intervals, fill_density = 0.2, dimensions = [460,300])$

Instead of writing the intervals on the x-axis, we can write the end points of the classes.

histogram ( s1, nclasses = 6, htics = endpoints, fill_density = 0.6, dimensions = [400,300])$

Write user-defined labels for the classes.

histogram ( s1, nclasses = [-2,12,6], htics = ["Cl-1","Cl-2","Cl-3","Cl-4","Cl-5","Cl-6"], fill_density = 0.2, dimensions = [400,300])$

See what happens when the number of labels is less than the number of classes.

histogram ( s1, nclasses = 6, htics = ["Cl 1","Cl 2","Cl 3"], fill_density = 0.2, dimensions = [400,300])$

Or when the list of labels is empty. See also the relative frequency scale.

histogram ( s1, nclasses = 6, frequency = relative, htics = [], fill_color = navy, fill_density = 0.2, dimensions = [400,300])$

Now we want a scene with two objects, the histogram and the Gaussian density. Since the histogram must be treated like any other draw object, we need to call **histogram_description** to build the complete scene.

/* Simulate a Gaussian sample of size 1000 */ ( load("distrib"), m: 14, /* mean */ s: 2, /* standard deviation */ s2: random_normal(m, s, 1000) ) $ /* Note that the order in which objects are plotted matters */ draw2d( grid = true, dimensions = [400,300], histogram_description( s2, nclasses = 9, frequency = density, fill_density = 0.5), explicit(pdf_normal(x,m,s), x, m - 3*s, m + 3* s) ) $

Note that **histogram_description** only returns draw directives to be used later by **draw**:

histogram_description( s2, nclasses = 9, frequency = relative, fill_density = 0.5);

[[fill_density = 0.5], xrange = [6.4658, 20.402], yrange = [- 0.013, 0.273], [], bars([7.8594, 0.006, 1.3936], [9.253, 0.021, 1.3936], [10.647, 0.075, 1.3936], [12.04, 0.174, 1.3936], [13.434, 0.26, 1.3936], [14.827, 0.239, 1.3936], [16.221, 0.148, 1.3936], [17.614, 0.066, 1.3936], [19.008, 0.011, 1.3936])]
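Judging from the output above, each triple inside **bars** holds the center, the height and the width of one bar. A minimal sketch with invented numbers, passing such triples to **draw2d** directly:

```maxima
load("draw")$
draw2d(
  fill_density = 0.5,
  xrange = [0, 4],
  yrange = [0, 4],
  /* each triple is [center, height, width]; the values are made up */
  bars([1, 2, 1], [2, 3, 1], [3, 1, 1]) )$
```

This is the kind of object that **histogram_description** assembles from the sample, together with suitable axis ranges.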

The histogram function automatically sets the x and y ranges, ignoring options **xrange** and **yrange**:

histogram ( s1, nclasses = 8, xrange = [-5,15], yrange = [-15,40], fill_density = 0.5, dimensions = [400,300])$

If you want specific values for the axes ranges, make use of **histogram_description**:

draw2d( histogram_description ( s1, nclasses = 8, fill_density = 0.2), grid = true, xrange = [-5,15], yrange = [-15,40], dimensions = [400,300]) $

A univariate scatter plot.

load (distrib)$ scatterplot( random_normal(0,1,200), xaxis = true, point_size = 2, dimensions = [600,150])$

A bivariate scatter plot. File **wind.data** is part of the Maxima distribution and contains wind speeds from five Irish weather stations.

s2 : read_matrix (file_search ("wind.data"))$ scatterplot( submatrix(s2, 1,2,3), title = "Data from stations #4 and #5", point_type = diamant, point_size = 2, color = blue, dimensions = [400,300])$

A trivariate scatter plot.

scatterplot( submatrix (s2, 1,2), dimensions = [700,500]);

A multivariate scatter plot.

scatterplot( s2, nclasses = 5, fill_color = blue, fill_density = 0.3, color = red, point_size = 1/2, dimensions = [900,800], xtics = 5) $

Specific options (defaults in parentheses):

- **box_width** (3/4): a number between 0 and 1
- **grouping** (**clustered**): **clustered** or **stacked**
- **groups_gap** (1): distance between clusters; must be a positive integer
- **bars_colors** ([]): list of colors for bars. If the list is shorter than the number of samples, colors are generated randomly
- **frequency** (**absolute**): **absolute**, **relative** or **percent**
- **ordering** (**orderlessp**): **orderlessp** or **ordergreatp**
- **sample_keys** ([]): entries for legend
- **start_at** (0): starting point on the x axis

Plotting absolute frequencies. The file **biomed.data** is part of the Maxima distribution and contains biomedical data; it is a multivariate sample of six dimensions. The first variable corresponds to the group (A or B) and the second is the age of the patient. Both variables are discrete.

m : read_matrix (file_search ("biomed.data"))$ /* barsplot for ages */ barsplot( col(m,2), box_width = 1, title = "Ages", xlabel = "years", dimensions = [400,300] )$

Barsplot of two discrete variables. Note that sample spaces are different.

barsplot( col(m,1), col(m,2), box_width = 1, bars_colors = [salmon, green], title = "Patient ages and groups", xlabel = "Age - Group", fill_density = 3/4, dimensions = [400,300] ) $

Barsplots in a multiplot context. When plotting complex or multiplot scenes, we need to call **barsplot_description**; this function creates a graphic object to be plotted later by any of the **draw** functions.

/* Remember that 'gr2d' is a scene object used by package 'draw'. We create two 2d scenes and then we build the multiplot */ bp1 : gr2d( ylabel = "Relative frequencies", color = red, barsplot_description( col(m,1), box_width = 1, bars_colors = [blue], frequency = relative))$ bp2: gr2d( ylabel = "Percentages (%)", color = blue, barsplot_description( col(m,2), box_width = 1, fill_density = 0.5, label_orientation = 'vertical, bars_colors = [red], frequency = percent))$ /* vertical multiplot */ draw( dimensions = [500, 500], bp1, bp2 ) $

Example of a horizontal multiplot.

draw( dimensions = [850, 300], columns = 2, bp1, bp2 ) $

If we want to get an idea of how ages and groups are related, we can plot a stacked barsplot:

agesA: col(subsample (m, lambda([v], v[1] = A)), 2) $ agesB: col(subsample (m, lambda([v], v[1] = B)), 2) $ barsplot( agesA, agesB, groups_gap = 3, bars_colors = [red, blue], grouping = stacked, sample_keys = ["Group A", "Group B"], title = "Groups A and B within ages", ylabel = "# of individuals", dimensions = [600, 400] )$

Comparing two samples:

l1: makelist(random(10),k,1,50)$ l2: makelist(random(10),k,1,100)$ barsplot( l1, l2, box_width = 1, bars_colors = [violet, gray], frequency = percent, sample_keys = ["1st sample", "2nd sample"], dimensions = [400,300] ) $

We now have four samples of a categorical (**Yes**, **No**, **Maybe**) variable sampled in four different cities.

r1: makelist([Yes, No, Maybe][random(3)+1],k,1,150) $ r2: makelist([Yes, No, Maybe][random(3)+1],k,1,50) $ r3: makelist([Yes, No, Maybe][random(3)+1],k,1,75) $ r4: makelist([Yes, No, Maybe][random(3)+1],k,1,200) $ barsplot( r1, r2, r3, r4, title = "Asking for something in four populations", ylabel = "# of individuals", fill_density = 0.5, dimensions = [600, 400], groups_gap = 3, frequency = relative, sample_keys = ["Betanzos", "Carballo", "Ferrol", "A Coruna"], yrange = [0, 0.6], /* expanding the y-range upwards to get more place for the legend */ ordering = ordergreatp )$

As above, but with stacked bars.

barsplot( r1, r2, r3, r4, groups_gap = 1, fill_density = 0.5, dimensions = [600, 400], sample_keys = ["Betanzos", "Carballo", "Ferrol", "A Coruna"], grouping = stacked, ordering = ordergreatp, bars_colors = [green, blue, yellow, brown], dimensions = [400,300], xrange = [-1,4] /* Now, we expand the x-range rightwards to get more space for the legend */)$

With option **start_at**, we can draw more than one bar diagram on the x axis. Labels are vertically oriented. In the next plot, we want three independent barsplots in one scene. Again, we need the **barsplot_description** mechanism:

/* Sample simulation */ sample1: makelist([Yes, No][random(2)+1],k,1,20) $ sample2: makelist([Yes, No, Maybe][random(3)+1],k,1,100) $ sample31: makelist([A,B,C,D][random(4)+1],k,1,100) $ sample32: makelist([A,B,C,D][random(4)+1],k,1,100) $ draw2d( label_orientation = 'vertical, dimensions = [400,300], barsplot_description( sample1, bars_colors = [red], sample_keys = ["Are you smoker?"], start_at = 10 ), barsplot_description( sample2, sample_keys = ["Do you like pizza?"], start_at = 1 ), fill_density = 0.3, barsplot_description( sample31, sample32, bars_colors = [orange, blue], sample_keys = ["Political party (men)", "Political party (women)"], start_at = 25 ), xrange = [0, 40], yrange = [0, 50] )$

Specific options (defaults in parentheses):

- **sector_colors** ([]): list of colors for sectors. If the list is shorter than the number of sectors, colors are generated randomly.
- **pie_center** ([0,0]): a pair of numbers.
- **pie_radius** (1): a positive number.

A simple pie chart

s1 : read_list (file_search ("pidigits.data"))$ piechart( s1, proportional_axes = xy, xrange = [-2, 4], yrange = [-1.1, 1.1], dimensions = [550,300], title = "Digit frequencies in pi" )$
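The sectors reflect how often each digit occurs in the sample. As a hedged aside, the underlying counts can be inspected with **discrete_freq** from the same package, which returns the distinct values followed by their absolute frequencies. The variable names are illustrative, and the sketch assumes that sector sizes are proportional to these counts.

```maxima
load("descriptive")$
s1 : read_list (file_search ("pidigits.data"))$

/* first list: distinct values; second list: absolute frequencies */
freq : discrete_freq (s1);

/* share of each value in the sample
   (assumed to match the relative size of each sector) */
shares : second (freq) / length (s1);
```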

User defined colors for sectors

s: makelist(random(6)+1, k, 100)$ piechart( s, sector_colors = [red,blue,yellow,green,black,orange], proportional_axes = xy, dimensions = [450,300], xrange = [-1.5, 1.5] ) $

Two pie charts with different radii. For more than one piechart, make use of **piechart_description**.

s: makelist(random(6)+1, k, 100)$ a: makelist(['yes, 'no, "do not ask me"][1+random(3)], k, 1, 75)$ draw2d( proportional_axes = xy, xrange = [-2, 3.5], dimensions = [400,300], piechart_description( s, sector_colors = [red,blue,orange,navy,gray,black]), piechart_description( a, sector_colors = [green,yellow,cyan], pie_center = [-1,1.5], pie_radius = 1/2)) $

Specific options (defaults in parentheses):

- **box_width** (3/4): widths for boxes.
- **box_orientation** (**vertical**): **vertical** or **horizontal**.
- **range** (**inf**): sets the interval for outliers. See examples below.
- **outliers_size** (1): point size for isolated outliers.

Comparing the variability of three samples.

A : [[6, 4, 6, 2, 4, 8, 6, 4, 6, 4, 3, 2], [8, 10, 7, 9, 12, 8, 10], [16, 13, 17, 12, 11, 18, 13, 18, 14, 12]]$ boxplot( A, dimensions = [400,300]) $

As above, but horizontally oriented, together with the grid.

boxplot( A, box_orientation = horizontal, grid = true, dimensions = [400,300]) $

**boxplot** generates its own tics on the axes. If we want to write our own marks, we have to use the **boxplot_description** mechanism.

/* Setting tics on the vertical axis */ draw2d( boxplot_description(A, box_orientation=horizontal,grid=true), dimensions = [400,300], ytics = {["Sample 1", 1], ["Sample 2", 2], ["Sample 3", 3]} )$

Fine-tuning our plot with draw options.

s2 : read_matrix(file_search("wind.data"))$ boxplot( s2, box_width = 0.2, background_color = light_gray, font = "Times-Roman", font_size = 17, title = "Windspeeds in knots", xlabel = "Number of Station", color = red, line_width = 2, dimensions = [400,300] )$

With option **range** it is possible to handle outliers, which are those observations outside the interval \([q_1-r \cdot IQR, q_3+r \cdot IQR]\), where \(r\) is the value of **range**, \(q_i\) is the *i*-th quartile and \(IQR=q_3-q_1\) is the interquartile range. When outliers are present, they are plotted as isolated points, and the whiskers extend to the extremes of the remaining observations.

B: [[7, 15, 5, 8, 6, 5, 7, 3, 1], [10, 8, 12, 8, 11, 9, 20], [23, 17, 19, 7, 22, 19]] $ boxplot (B, range=1)$
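To make the rule concrete, the following sketch recomputes the outlier interval by hand for the second sample of B with r = 1. It assumes that **boxplot** obtains its quartiles in the same way as the **quantile** function of package **descriptive**; the variable names are only illustrative.

```maxima
load("descriptive")$
sample : [10, 8, 12, 8, 11, 9, 20]$   /* second sample of B */
r : 1$                                 /* plays the role of option range */

q1 : quantile (sample, 1/4)$
q3 : quantile (sample, 3/4)$
iqr : q3 - q1$

lower : q1 - r*iqr$
upper : q3 + r*iqr$

/* observations falling outside [lower, upper] */
outliers : sublist (sample, lambda ([v], v < lower or v > upper));
/* returns [20] */
```

Only the value 20 lies outside the interval, so it would be drawn as an isolated point while the upper whisker stops at 12, the largest remaining observation.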

Outliers in horizontal orientation.

boxplot (B, range=1.5, box_orientation = horizontal)$

Adding a polygonal line. In this case, we make use of function **boxplot_description**, which describes the boxes, and add it to **draw2d** together with other graphic objects.

draw2d( /* set the terminal */ terminal = wxt, /* boxplot */ boxplot_description( B, range = 1.5, line_width = 3, outliers_size = 2, color = red, background_color = light_gray), /* polygonal line joining the medians */ points_joined = true, /* color and line_width have been modified during the boxplot definition */ color = blue, line_width = 2, points(map(median, B)), xtics = {["Low",1],["Medium",2],["High",3]}) $

This stem-and-leaf function is a kind contribution by Len Brin. It admits option **leaf_unit**, which must be a power of ten.

/* Simulating a Gaussian sample */ load(distrib)$ sample: random_normal(15, 6, 100)$ stemplot( sample, leaf_unit = 0.1);

1|69 4|47 5|226 6|8 7|247 8|1344669 9|0678 10|56 11|256678 12|0035899 13|23444789 14|04445 15|224556 16|3458 17|112467899 18|00234 19|45 20|1234 21|46 22|127799 23|19 24|1334 25|12 26|58 28|37 key: 6|3 = 6.3

Changing the value of **leaf_unit**.

stemplot( sample, leaf_unit = 1);

0|2245556777888889999 1|000011222222223333333334444444555566666777777888888888899 2|000012223333344444557789 key: 6|3 = 63
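Judging from the keys printed in the two outputs above, a stem \(s\) followed by a leaf digit \(l\) appears to encode a value as shown below. The symbols \(s\), \(l\) and \(x\) are introduced here for illustration only.

```latex
\[
x \approx (10\,s + l)\cdot \mathit{leaf\_unit},
\qquad\text{so } 6|3 \text{ reads as } 63 \cdot 0.1 = 6.3
\text{ with leaf\_unit} = 0.1
\text{ and as } 63 \cdot 1 = 63
\text{ with leaf\_unit} = 1 .
\]
```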

Specific options (defaults in parentheses):

- **stars_colors** ([]): list of colors for stars. If the list is shorter than the number of samples, colors are generated randomly.
- **frequency** (**absolute**): **absolute** or **relative**.
- **ordering** (**orderlessp**): **orderlessp** or **ordergreatp**.
- **sample_keys** ([]): entries for legend.
- **star_center** ([0,0]): a pair of numbers.
- **star_radius** (1): a positive number.

Simulating two discrete samples with the same sample space.

l1: makelist(random(10),k,1,50)$ l2: makelist(random(10),k,1,200)$ /* Plot based on absolute frequencies. Location and radius defined by the user. */ starplot( l1, l2, stars_colors = [blue,red], sample_keys = ["1st sample", "2nd sample"], star_center = [1,2], star_radius = 4, proportional_axes = xy, line_width = 2, dimensions = [400,300] ) $

As above, but plotting relative frequencies. Default location and radius.

starplot( l1, l2, stars_colors = [blue, light_blue], frequency = relative, sample_keys = ["1st sample", "2nd sample"], proportional_axes = xy, line_width = 2, dimensions = [400,300] ) $

Using **starplot_description** in a multiplot context.

pl1: gr2d( starplot_description( l1, l2, stars_colors = [blue, light_blue], sample_keys = ["1st sample", "2nd sample"], star_center = [1,2], star_radius = 4, proportional_axes = xy, line_width = 2, title = "Absolute frequencies") ) $ pl2: gr2d( starplot_description( l1, l2, stars_colors = [blue, light_blue], frequency = relative, sample_keys = ["1st sample", "2nd sample"], proportional_axes = xy, line_width = 2, title = "Relative frequencies") ) $ draw( dimensions = [600, 400], columns = 2, pl1, pl2)$

The statistical charts described above cannot be displayed together with other graphics elements unless you make use of the following mechanism. You can describe a statistical chart without plotting it, and then use this description like any other graphics element when invoking the **draw** function:

s1 : read_list (file_search ("pidigits.data"))$ /* variable scene stores the histogram description*/ scene: histogram_description ( s1, title = "pi digits", xlabel = "digits", ylabel = "Absolute frequency", fill_color = grey, fill_density = 0.6);

\[ \left[ \left[ {\it title}=\mbox{ pi digits } , {\it xlabel}= \mbox{ digits } , {\it ylabel}=\mbox{ Absolute frequency } , {\it fill\_color}={\it grey} , \\ {\it fill\_density}=0.6 \right] , {\it xrange}=\left[ -0.45 , 9.45 \right] , {\it yrange}=\left[ - 0.65 , 13.65 \right] , \left[ \right] , \\ {\it bars}\left(\left[ 0.45 , 8.0 , 0.9 \right] , \left[ 1.35 , 8.0 , 0.9 \right] , \left[ 2.25 , 12.0 , 0.9 \right] , \\ \left[ 3.15 , 12.0 , 0.9 \right] , \left[ 4.05 , 10.0 , 0.9 \right] , \left[ 4.95 , 8.0 , 0.9 \right] , \\ \left[ 5.85 , 9.0 , 0.9 \right] , \left[ 6.75 , 8.0 , 0.9 \right] , \left[ 7.65 , 12.0 , 0.9 \right] , \left[ 8.55 , 13.0 , 0.9 \right] \right) \right] \]

/* line and histogram, in this order */ draw2d( color = blue, explicit(2*x,x,-5,20), scene) $

If we don't want the blue line behind the bars, change the order of the graphic objects. We also add the grid lines:

draw2d( scene, color = blue, grid = true, xrange = auto, xtics = auto, explicit(2*x,x,-5,20) ) $

Finally, this is the method for building multiplots with statistical charts (output in PostScript format):

s3 : read_matrix (file_search ("biomed.data"))$ bp1: barsplot_description( col(s3,1), title = "Groups of patients", xlabel = "Group", ylabel = "# of individuals", fill_density = 0.5)$ bp2: barsplot_description( col(s3,2), title = "Ages", xlabel = "years", fill_density = 0.3)$ draw(gr2d(bp1), gr2d(bp2), dimensions = 100*[10, 10], terminal = eps_color)$

© 2011-2016, TecnoStats.