Statistical graphics


Generalities

In general, arguments to statistical graphics are data and options in equation format (option = value). Options are of two types:

  1. specific to each stats plot
  2. general draw options

There are three functions associated to each graph. In the case of histograms, we have:

To run these examples, you must load package descriptive and set some global options:

load("descriptive")$

set_draw_defaults(
  terminal   = png,
  xlabel     = "",
  ylabel     = "")$

Histograms

Specific options (defaults in parentheses):

We begin with an histogram of absolute frequencies. File pidigits.data is part of the Maxima distribution and contains the first one hundred digits of number \(\pi\).

/* read sample */
s1 : read_list (file_search ("pidigits.data")); 

histogram ( 
    s1,

    /* specific histogram options */
    nclasses = 8,

    /* local draw options affecting the histogram */
    fill_density = 0.3,
    dimensions   = [400,300],
    fill_color   = red,
    line_width   = 3) $ 
Gr11

An histogram of percents and more draw options.

histogram (
    s1,

    /* specific histogram options */
    nclasses = 8,
    frequency = percent,

    /* local draw options affecting the histogram */
    fill_density = 0.3,
    fill_color   = green,
    line_width   = 3,

    /* global draw options */
    grid             = true,
    title            = "A green histogram",
    xlabel           = "Pi digit",
    ylabel           = "Percentages (%)",
    dimensions       = [400, 400],
    background_color = cyan) $
Gr12

Fixing the limits of the plot from -2 to 12. The number of classes is 3.

histogram (
    s1,
    nclasses     = [-2,12,3],
    fill_density = 0.2, 
    dimensions   = [400,300])$ 
Gr13

Fixing the limits of the plot from -2 to 12. The number of classes is 10, the default.

histogram (
  s1,
  nclasses     = [-2,12],
  fill_density = 0.2, 
  dimensions   = [400,300])$
Gr14

We can also play with tic marks on the x-axis via the htics option. (This is a contibution by Len Brin.)

fpprintprec: 5$

histogram (
  s1,
  nclasses     = [-2,12,5],
  htics        = 'intervals,
  fill_density = 0.2, 
  dimensions   = [460,300])$ 
Gr15

Instead of writing the intervals on the x-axis, we can write the end points of the classes.

histogram (
  s1,
  nclasses     = 6,
  htics        = endpoints,
  fill_density = 0.6, 
  dimensions   = [400,300])$
Gr16

Write user defined labels for the classes.

histogram (
  s1,
  nclasses     = [-2,12,6],
  htics        = ["Cl-1","Cl-2","Cl-3","Cl-4","Cl-5","Cl-6"],
  fill_density = 0.2, 
  dimensions   = [400,300])$
Gr17

See what happens when the number of labels is less than the number of classes.

histogram (
  s1,
  nclasses     = 6,
  htics        = ["Cl 1","Cl 2","Cl 3"],
  fill_density = 0.2, 
  dimensions   = [400,300])$ 
Gr18

Or when the list of labels is empty. See also the relative frequency scale.

histogram (
  s1,
  nclasses     = 6,
  frequency    = relative,
  htics        = [],
  fill_color   = navy,
  fill_density = 0.2, 
  dimensions   = [400,300])$
Gr19

Now we want a scene with two objects, the histogram and the Gaussian density. Since the histogram must be consider as any other draw object, we need to call histogram_description to build the complete scene.

/* Simulate a Gaussian sample of size 1000 */
( load("distrib"),
  m: 14, /* mean */
  s: 2,  /* standard deviation */
  s2: random_normal(m, s, 1000) ) $ 

/* See that the order objects are plotted is important */
draw2d(
    grid       = true, 
    dimensions = [400,300],
    histogram_description(
        s2,
        nclasses     = 9,
        frequency    = density,
        fill_density = 0.5),
    explicit(pdf_normal(x,m,s), x, m - 3*s, m + 3* s) ) $ 
Gr110

Note that histogram_description only returns draw directives to be used later by draw:

histogram_description(
  s2,
  nclasses     = 9,
  frequency    = relative,
  fill_density = 0.5); 
[[fill_density = 0.5], xrange = [6.4658, 20.402], 
yrange = [- 0.013, 0.273], [], bars([7.8594, 0.006, 1.3936], 
[9.253, 0.021, 1.3936], [10.647, 0.075, 1.3936], [12.04, 0.174, 1.3936], 
[13.434, 0.26, 1.3936], [14.827, 0.239, 1.3936], [16.221, 0.148, 1.3936], 
[17.614, 0.066, 1.3936], [19.008, 0.011, 1.3936])]

The histogram function automatically sets the x and y ranges, ignoring options xrange and yrange:

histogram (
  s1,
  nclasses     = 8,
  xrange       = [-5,15],
  yrange       = [-15,40],
  fill_density = 0.5, 
  dimensions   = [400,300])$ 
Gr111

If you want specific values for the axes ranges, make use of histogram_description:

draw2d(
  histogram_description (
     s1,
     nclasses     = 8,
     fill_density = 0.2),
  grid       = true,
  xrange     = [-5,15],
  yrange     = [-15,40], 
  dimensions = [400,300]) $
Gr112

Scatterplots

An univariate scatter plot.

load (distrib)$

scatterplot(
  random_normal(0,1,200),
  xaxis      = true,
  point_size = 2,
  dimensions = [600,150])$ 
Gr21

A bivariate scatter plot. File wind.data is part of the Maxima distribution and contains wind speeds from five Irish weather stations.

s2 : read_matrix (file_search ("wind.data"))$

scatterplot(
  submatrix(s2, 1,2,3),
  title      = "Data from stations #4 and #5",
  point_type = diamant,
  point_size = 2,
  color      = blue, 
  dimensions = [400,300])$
Gr22

A trivariate scatter plot.

scatterplot(
    submatrix (s2, 1,2),
    dimensions = [700,500]); 
Gr23

A multivariate scatter plot.

scatterplot(
    s2,
    nclasses     = 5,
    fill_color   = blue,
    fill_density = 0.3,
    color        = red,
    point_size   = 1/2,
    dimensions   = [900,800],
    xtics        = 5) $
Gr24

Barsplots

Specific options (defaults in parentheses):

Plotting absolute frequencies. File biomed.data is part of the Maxima distribution and contains biomedical data, it is a multivariate sampleof six dimensions. The first variable corresponds to the group (A or B) and the second is the age of the patient. Both variables are discrete.

m : read_matrix (file_search ("biomed.data"))$

/* barsplot for ages */
barsplot(
  col(m,2),

  box_width  = 1,
  title      = "Ages",
  xlabel     = "years", 
  dimensions = [400,300] )$
Gr31

Barsplot of two discrete variables. Note that sample spaces are different.

barsplot(
  col(m,1), col(m,2),

  box_width    = 1,
  bars_colors  = [salmon, green],
  title        = "Patient ages and groups",
  xlabel       = "Age - Group",
  fill_density = 3/4, 
  dimensions   = [400,300] ) $ 
Gr32

Barsplots in a multiplot context. When plotting complex or multiplot scenes, we need to call barsplot_description, this function creates a graphic object to be later plotted by any of the draw' functions.

 /* Remember that 'gr2d' is a scene object used by package 'draw'.
   We create two 2d scenes and then we build the multiplot */

bp1 :
  gr2d(
    ylabel = "Relative frequencies",
    color = red,
    barsplot_description(
        col(m,1),
        box_width = 1,
        bars_colors = [blue],
        frequency = relative))$
bp2:
  gr2d(
    ylabel = "Percentages (%)",
    color = blue,
    barsplot_description(
        col(m,2),
        box_width = 1,
        fill_density = 0.5,
        label_orientation = 'vertical,
        bars_colors = [red],
        frequency = percent))$

/* vertical multiplot */
draw(
  dimensions = [500, 500],
  bp1,
  bp2 ) $ 
Gr33

Example of an horizontal multiplot.

draw(
  dimensions = [850, 300],
  columns    = 2,
  bp1,
  bp2 ) $
Gr34

If we want to get an idea on how are ages and groups related, we can plot a stacked barsplot:

agesA: col(subsample (m, lambda([v], v[1] = A)), 2) $
agesB: col(subsample (m, lambda([v], v[1] = B)), 2) $

barsplot(
  agesA, agesB,

  groups_gap  = 3,
  bars_colors = [red, blue],
  grouping    = stacked,
  sample_keys = ["Group A", "Group B"],
  title       = "Groups A and B within ages",
  ylabel      = "# of individuals",
  dimensions  = [600, 400] )$
Gr35

Comparing two samples:

l1: makelist(random(10),k,1,50)$
l2: makelist(random(10),k,1,100)$

barsplot(
  l1, l2,

  box_width   = 1,
  bars_colors = [violet, gray],
  frequency   = percent,
  sample_keys = ["1st sample", "2nd sample"], 
  dimensions  = [400,300] ) $
Gr36

We now have four samples of a categorical (Yes, No, Maybe) variable sampled in four different cities.

r1: makelist([Yes, No, Maybe][random(3)+1],k,1,150) $
r2: makelist([Yes, No, Maybe][random(3)+1],k,1,50) $
r3: makelist([Yes, No, Maybe][random(3)+1],k,1,75) $
r4: makelist([Yes, No, Maybe][random(3)+1],k,1,200) $

barsplot(
  r1, r2, r3, r4,

  title        = "Asking for something in four populations",
  ylabel       = "# of individuals",
  fill_density = 0.5,
  dimensions   = [600, 400],
  groups_gap  = 3,
  frequency   = relative,
  sample_keys = ["Betanzos", "Carballo", "Ferrol", "A Coruna"],
  yrange      = [0, 0.6], /* expanding the y-range upwards to
                             get more place for the legend */
  ordering    = ordergreatp )$
Gr37

As above, but with stacked bars.

barsplot(
  r1, r2, r3, r4,

  groups_gap  = 1,
  fill_density = 0.5,
  dimensions   = [600, 400],
  sample_keys = ["Betanzos", "Carballo", "Ferrol", "A Coruna"],
  grouping    = stacked,
  ordering    = ordergreatp,
  bars_colors = [green, blue, yellow, brown],
  dimensions  = [400,300],
  xrange      = [-1,4] /* Now, we expand the x-range rightwards
                          to get more space for the legend */)$
Gr38

With option start_at, we can draw more than one bars diagrams on the x axis. Labels are vertically oriented. In the next plot, we want three independent barsplots in one scene. Again, we need the barsplot_description mechanism:

/* Sample simulation */
sample1: makelist([Yes, No][random(2)+1],k,1,20) $
sample2: makelist([Yes, No, Maybe][random(3)+1],k,1,100) $
sample31: makelist([A,B,C,D][random(4)+1],k,1,100) $
sample32: makelist([A,B,C,D][random(4)+1],k,1,100) $ 

draw2d(
  label_orientation = 'vertical,
  dimensions   = [400,300],

  barsplot_description(
    sample1,
    bars_colors = [red],
    sample_keys = ["Are you smoker?"],
    start_at    = 10 ),

  barsplot_description(
    sample2,
    sample_keys = ["Do you like pizza?"],
    start_at    = 1 ),

  fill_density = 0.3,
  barsplot_description(
    sample31, sample32,
    bars_colors = [orange, blue],
    sample_keys = ["Political party (men)", "Political party (women)"],
    start_at    = 25 ),

    xrange = [0, 40],
    yrange = [0, 50] )$
Gr39

Piecharts

Specific options (defaults in parentheses):

A simple pie chart

s1 : read_list (file_search ("pidigits.data"))$

piechart(
    s1,
    proportional_axes = xy,
    xrange            = [-2, 4],
    yrange            = [-1.1, 1.1],
    dimensions        = [550,300],
    title             = "Digit frequencies in pi" )$ 
Gr41

User defined colors for sectors

s: makelist(random(6)+1, k, 100)$

piechart(
    s,
    sector_colors     = [red,blue,yellow,green,black,orange],
    proportional_axes = xy,
    dimensions        = [450,300],
    xrange            = [-1.5, 1.5] ) $ 
Gr42

Two pie charts with different radii. For more than one piechart, make use of piechart_description.

s: makelist(random(6)+1, k, 100)$
a: makelist(['yes, 'no, "do not ask me"][1+random(3)], k, 1, 75)$

draw2d(
    proportional_axes = xy,
    xrange            = [-2, 3.5],
    dimensions        = [400,300],
    piechart_description(
        s,
        sector_colors = [red,blue,orange,navy,gray,black]),
    piechart_description(
        a,
        sector_colors = [green,yellow,cyan],
        pie_center    = [-1,1.5],
        pie_radius    = 1/2)) $
Gr43

Boxplots

Specific options (defaults in parentheses):

Comparing the variablility of three samples.

A : [[6, 4, 6, 2, 4, 8, 6, 4, 6, 4, 3, 2],
     [8, 10, 7, 9, 12, 8, 10],
     [16, 13, 17, 12, 11, 18, 13, 18, 14, 12]]$

boxplot(
    A, 
    dimensions   = [400,300]) $ 
Gr51

As above, but horizontally oriented, together with the grid.

boxplot(
  A,
  box_orientation = horizontal,
  grid            = true, 
  dimensions      = [400,300]) $ 
Gr52

Boxplot generates its own tics on the axes. If we want to write our own marks, we have to use the boxplot_description mechanism.

/* Setting tics on the vertical axis */

draw2d(
  boxplot_description(A, box_orientation=horizontal,grid=true),
  dimensions = [400,300],
  ytics = {["Sample 1", 1], ["Sample 2", 2], ["Sample 3", 3]} )$
Gr53

Fine tunning our plot with draw options.

s2 : read_matrix(file_search("wind.data"))$

boxplot(
  s2,
  box_width = 0.2,

  background_color = light_gray,
  font             = "Times-Roman",
  font_size        = 17,
  title            = "Windspeeds in knots",
  xlabel           = "Number of Station",
  color            = red,
  line_width       = 2,
  dimensions       = [400,300] )$ 
Gr54

With option range it is possible to handle outliers, which are those observations outside the interval \([q_1-r \cdot IQR, q_3+r \cdot IQR]\), where \(q_i\) is the i-th quartile and \(IQR=q_3-q_1\) is the interquartile range. When outliers are present, they are plotted as isolated points, and whiskers are allocated at the extremes of the rest of observations.

B: [[7, 15, 5, 8, 6, 5, 7, 3, 1],
    [10, 8, 12, 8, 11, 9, 20],
    [23, 17, 19, 7, 22, 19]] $

boxplot (B, range=1)$
bp1

Outliers in horizontal orientation,

boxplot (B, range=1.5, box_orientation = horizontal)$
bp2

Adding a polygonal line. In this case, we make use of function boxplot_description, which describes the boxes, and add it to draw2d together with other graphic objects.

draw2d(
        /* set the terminal */
        terminal = wxt,

        /* boxplot */
        boxplot_description(
          B,
          range            = 1.5,
          line_width       = 3,
          outliers_size    = 2,
          color            = red,
          background_color = light_gray),

        /* polygonal line joining the medians */
        points_joined = true,
        /* color and line_width have being modified
           during the boxplot definition */
        color         = blue,
        line_width    = 2,
        points(map(median, B)),
        xtics = {["Low",1],["Medium",2],["High",3]}) $
bp3

Stemplots

This stem-and-leaf function is a kind contribution by Len Brin. It admits option leaf_unit, which must be a power of ten.

/* Simulating a Gaussian sample */

load(distrib)$
sample: random_normal(15, 6, 100)$

stemplot(
  sample,
  leaf_unit = 0.1); 
 1|69
 4|47
 5|226
 6|8
 7|247
 8|1344669
 9|0678
10|56
11|256678
12|0035899
13|23444789
14|04445
15|224556
16|3458
17|112467899
18|00234
19|45
20|1234
21|46
22|127799
23|19
24|1334
25|12
26|58
28|37
key: 6|3 =  6.3 

Changing the value of leaf_unit.

stemplot(
  sample,
  leaf_unit = 1); 
0|2245556777888889999
1|000011222222223333333334444444555566666777777888888888899
2|000012223333344444557789
key: 6|3 =  63

Starplots

Specific options (defaults in parentheses):

Simulating two discrete samples with the same sample space.

l1: makelist(random(10),k,1,50)$
l2: makelist(random(10),k,1,200)$

/* Plot based on absolute frequencies.
   Location and radius defined by the user. */
starplot(
  l1, l2,
  stars_colors = [blue,red],
  sample_keys  = ["1st sample", "2nd sample"],
  star_center  = [1,2],
  star_radius  = 4,

  proportional_axes = xy,
  line_width        = 2, 
  dimensions        = [400,300] ) $ 
Gr81

As above, but plotting relative frequencies. Default location and radius.

starplot(
  l1, l2,
  stars_colors = [blue, light_blue],
  frequency    = relative,
  sample_keys  = ["1st sample", "2nd sample"],

  proportional_axes = xy,
  line_width        = 2, 
  dimensions        = [400,300] ) $
Gr82

Using starplot_description in a multiplot context.

pl1:
  gr2d(
    starplot_description(
      l1, l2,
      stars_colors      = [blue, light_blue],
      sample_keys       = ["1st sample", "2nd sample"],
      star_center       = [1,2],
      star_radius       = 4,
      proportional_axes = xy,
      line_width        = 2,
      title             = "Absolute frequencies") ) $

pl2:
  gr2d(
    starplot_description(
      l1, l2,
      stars_colors      = [blue, light_blue],
      frequency         = relative,
      sample_keys       = ["1st sample", "2nd sample"],
      proportional_axes = xy,
      line_width        = 2,
      title             = "Relative frequencies") ) $

draw(
  dimensions = [600, 400],
  columns    = 2,
  pl1, pl2)$
Gr83

Statistical charts in complex scenes

Statistical charts described above cannot be displayed together with other graphics elements unless you make use of the following mechanism. You can describe an statistical chart without plotting it, and then use this description as any other graphics element when invoking the draw function:

s1 : read_list (file_search ("pidigits.data"))$

/* variable scene stores the histogram description*/
scene: histogram_description (
         s1,
         title        = "pi digits",
         xlabel       = "digits",
         ylabel       = "Absolute frequency",
         fill_color   = grey,
         fill_density = 0.6);

\[ \left[ \left[ {\it title}=\mbox{ pi digits } , {\it xlabel}= \mbox{ digits } , {\it ylabel}=\mbox{ Absolute frequency } , {\it fill\_color}={\it grey} , \\ {\it fill\_density}=0.6 \right] , {\it xrange}=\left[ -0.45 , 9.45 \right] , {\it yrange}=\left[ - 0.65 , 13.65 \right] , \left[ \right] , \\ {\it bars}\left(\left[ 0.45 , 8.0 , 0.9 \right] , \left[ 1.35 , 8.0 , 0.9 \right] , \left[ 2.25 , 12.0 , 0.9 \right] , \\ \left[ 3.15 , 12.0 , 0.9 \right] , \left[ 4.05 , 10.0 , 0.9 \right] , \left[ 4.95 , 8.0 , 0.9 \right] , \\ \left[ 5.85 , 9.0 , 0.9 \right] , \left[ 6.75 , 8.0 , 0.9 \right] , \left[ 7.65 , 12.0 , 0.9 \right] , \left[ 8.55 , 13.0 , 0.9 \right] \right) \right] \]

/* line and histogram, in this order */
draw2d(
  color = blue,
  explicit(2*x,x,-5,20),
  scene) $
gr44

If we don't want the blue line behind the bars, change the order of the graphic objects. We also add the grid lines:

draw2d(
  scene,
  color  = blue,
  grid   = true,
  xrange = auto,
  xtics  = auto,
  explicit(2*x,x,-5,20) ) $
gr45

Finally, this is the method for building multiplots with statistical charts (output in postscript format):

s3 : read_matrix (file_search ("biomed.data"))$
bp1: barsplot_description(
       col(s3,1),
       title        = "Groups of patients",
       xlabel       = "Group",
       ylabel       = "# of individuals",
       fill_density = 0.5)$
bp2: barsplot_description(
       col(s3,2),
       title        = "Ages",
       xlabel       = "years",
       fill_density = 0.3)$
draw(gr2d(bp1),
     gr2d(bp2),
     dimensions = 100*[10, 10],
     terminal   = eps_color)$
gr46

© 2011-2016, TecnoStats.