Data Visualization

Purpose

Exploration

Testing hypotheses is a critical part of the scientific process.

But…you need to understand your data first.

What relationships exist? How are the variables distributed?

Visualizing your data goes a long way toward helping you explore it.

Preattentive Processing

We pick up on visual cues (i.e., what catches our eye) much more quickly than we can read.

That is why entire disciplines are devoted to visual processing.

Some things hit us pretty quickly:

color - hue & intensity

shape

space/density

line width & orientation

size

Preattentive Processing In Action

Correlation Matrix

	mpg	cyl	disp	hp	drat	wt	qsec	vs
mpg	1.00	-0.85	-0.85	-0.78	0.68	-0.87	0.42	0.66
cyl	-0.85	1.00	0.90	0.83	-0.70	0.78	-0.59	-0.81
disp	-0.85	0.90	1.00	0.79	-0.71	0.89	-0.43	-0.71
hp	-0.78	0.83	0.79	1.00	-0.45	0.66	-0.71	-0.72
drat	0.68	-0.70	-0.71	-0.45	1.00	-0.71	0.09	0.44
wt	-0.87	0.78	0.89	0.66	-0.71	1.00	-0.17	-0.55
qsec	0.42	-0.59	-0.43	-0.71	0.09	-0.17	1.00	0.74
vs	0.66	-0.81	-0.71	-0.72	0.44	-0.55	0.74	1.00
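For anyone curious where a table like this comes from, here is a minimal sketch of a pairwise Pearson correlation matrix in plain Python. The function names (`pearson`, `correlation_matrix`) and the toy columns are made up for illustration; they are not the data or the tooling behind the table above.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length (at least 2)")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    # Sum of cross-products and sums of squares around the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Return {name: {name: r}} for a dict of named numeric columns."""
    return {a: {b: pearson(columns[a], columns[b]) for b in columns}
            for a in columns}


# Hypothetical toy columns, not the values from the slide.
toy = {
    "weight": [2.0, 2.5, 3.0, 3.5, 4.0],
    "mileage": [30.0, 27.0, 24.0, 20.0, 17.0],
}
matrix = correlation_matrix(toy)

for name, row in matrix.items():
    print(name, {other: round(r, 2) for other, r in row.items()})
```

Even with two columns, reading the numbers takes more effort than glancing at a colored grid, which is the point of the next two slides.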

Corrplot

Heat Map

Inference

Currently, most journals "need" to see test statistics to know if there is a relationship.

The golden p-value is the driver of science!

Not everything needs a test statistic to tell you that a real relationship exists (or not).

Interocular Traumatic Impact

Maybe A Better Fit

Principles And Properties

What Was And What Should Never Be

Color

Color is generally used for the following:

Groups
Intensity

Only use color when it is actually doing something to aid interpretation.
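As a rough illustration of color used for intensity, the sketch below maps numeric values onto a light-to-dark ramp by linear interpolation in RGB. The helper names and the white-to-dark-blue endpoints are assumptions picked for the example; real palettes (such as the ones on the following slides) are designed far more carefully.

```python
def lerp_color(low, high, t):
    """Blend two (r, g, b) tuples; t=0 gives low, t=1 gives high."""
    t = min(1.0, max(0.0, t))  # clamp so out-of-range values stay on the ramp
    return tuple(round(a + (b - a) * t) for a, b in zip(low, high))


def to_hex(rgb):
    """Format an (r, g, b) tuple as a #rrggbb string."""
    return "#{:02x}{:02x}{:02x}".format(*rgb)


def intensity_colors(values, low=(255, 255, 255), high=(0, 0, 139)):
    """Map each value to a hex color between low (minimum) and high (maximum)."""
    lo, hi = min(values), max(values)
    span = (hi - lo) or 1.0  # avoid dividing by zero when all values are equal
    return [to_hex(lerp_color(low, high, (v - lo) / span)) for v in values]


# Hypothetical values: the smallest gets the lightest color, the largest the darkest.
colors = intensity_colors([0.0, 0.5, 1.0])
print(colors)
```

A single-hue ramp like this suits values that run from low to high; grouping calls for clearly distinct hues instead.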

Color - Groups

Color - Intensity

Color - Brewer Palettes

Color - Tol Palettes

Color - Few Palettes

Size

Much like color, size can convey very distinct ideas:

Value size
Frequency

You need to be very explicit about which one you mean!

Size

Shape

Shape is a bit more limited than color and size.

It really only helps to denote group membership.
And it only works well when the points are sparse.

Shape - Linetypes

Where We Have Trouble…

Shape - Points

Alpha

No, we are not talking about Cronbach.

We probably all know about the RGB color model.

But…did you know that it can have a fourth channel?

Alpha!

Alpha is essentially transparency.

- 0 is completely transparent
- 1 is completely opaque

How Does Alpha Help

We are frequently fortunate enough to have a considerable amount of data.

Depending on what type of plotting we want to do, we could run into problems very quickly.
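To see why transparency helps once points start piling up, here is a small sketch of standard "source over" alpha blending on an opaque background. The function names `blend` and `stack` are invented for this example, and the rounding to whole channel values is a simplification.

```python
def blend(foreground, background, alpha):
    """Composite a foreground (r, g, b) over an opaque background.

    alpha=0 leaves the background untouched; alpha=1 fully covers it.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return tuple(round(alpha * f + (1 - alpha) * b)
                 for f, b in zip(foreground, background))


def stack(color, background, alpha, layers):
    """Draw the same semi-transparent color on one spot `layers` times."""
    result = background
    for _ in range(layers):
        result = blend(color, result, alpha)
    return result


black, white = (0, 0, 0), (255, 255, 255)

# One faint point versus five faint points plotted on the same spot.
single = stack(black, white, alpha=0.2, layers=1)
crowded = stack(black, white, alpha=0.2, layers=5)
print(single, crowded)
```

With fully opaque points, one observation and five overlapping observations look identical. With a low alpha, the crowded spot comes out darker, so density becomes visible.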

Alpha

Jitter

If you have ever held a jumping bean or drunk too much coffee, you might already know about jitter.

Jitter is useful if we have some minor overplotting issues.

It would not be a good fix for what we just saw.
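A minimal sketch of the idea, assuming jitter means adding a small amount of uniform random noise to each value before plotting. The `jitter` function, its `amount` default, and the ratings are hypothetical; plotting libraries usually provide their own version.

```python
import random


def jitter(values, amount=0.2, seed=None):
    """Return a copy of values, each shifted by uniform noise in [-amount, amount]."""
    rng = random.Random(seed)  # a seed keeps the plot reproducible
    return [v + rng.uniform(-amount, amount) for v in values]


# Hypothetical survey ratings: many identical values would plot on top of each other.
ratings = [1, 1, 1, 2, 2, 3]
jittered = jitter(ratings, amount=0.2, seed=42)

for original, moved in zip(ratings, jittered):
    print(original, round(moved, 3))
```

Keep the amount small relative to the gap between real values, so that a jittered 1 is never mistaken for a 2.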

Looks Pretty Clear…

After Jittering

An Alternative

Other Ideas

Small Multiples/Faceting

We often want to compare a plot over several different levels.

We have already seen ways to use color and other items to make groups distinct.

If you want to be daring, though, there are other methods.
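Under the hood, small multiples boil down to splitting the data by the levels of a variable and drawing the same plot once per level. The sketch below shows only the splitting step; the `facet` function and the records are hypothetical.

```python
from collections import defaultdict


def facet(records, by):
    """Group records (dicts) into panels keyed by the value of field `by`."""
    panels = defaultdict(list)
    for record in records:
        panels[record[by]].append(record)
    return dict(panels)


# Hypothetical records: each one would become a point in its panel.
cars = [
    {"cyl": 4, "wt": 2.3, "mpg": 30.1},
    {"cyl": 6, "wt": 3.2, "mpg": 21.0},
    {"cyl": 4, "wt": 1.9, "mpg": 33.4},
    {"cyl": 8, "wt": 3.8, "mpg": 15.5},
]
panels = facet(cars, by="cyl")

for level, rows in sorted(panels.items()):
    print(f"panel cyl={level}: {len(rows)} point(s)")
```

Each panel should share the same axes, so that differences between panels reflect the data rather than the scales.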

Small Multiples/Faceting

Multiple Relationships – Very Powerful

Combining Features

We just saw a glimpse of how we can use multiple principles to convey information.

We had a two-variable facet, size, and density estimates.

Combining Features

Other Combinations

Tips (And Maybe Tricks)

Grid Lines

The default is always some version of "No".

When To Use Grid Lines

Use for:

Large plots
Making differences clear
Comparing values on categorical scales
Narrowing focus

When you do use them, they should be very faint.

Large Plots

Make Differences Clearer

Comparing Categorical Scale

Narrowing Focus

Interactivity

Given the confines of publishing, we cannot always default to interactive visualization.

There are journals and outlets making progress on this front!
Journal of Computational and Graphical Statistics, anyone?

However, it is a great place to start when exploring data.

It is also great for disseminating information on the web.

Making Old Trouble Disappear

Grid lines…who needs them?

We All Remember This Monster

The Clutter Is Strong With This One

Still A Better Version

Important People

Mike Bostock – D3

Cynthia Brewer – ColorBrewer

William Cleveland – Pioneer (Citation Champ)

Stephen Few – All Things Viz

Michael Friendly – Psychologist and Viz Expert

Paul Tol – Physicist and Viz Expert

Edward Tufte – Information Display

Hadley Wickham – Reigning Data Science Heavyweight Champion

Leland Wilkinson – The Grammar of Graphics

Shiny Demo

Thanks!