Statistics

Simple statistics information for radiation research

This section offers simple information about types of data, the application of statistical tests and the reporting of data in radiation research. It is not a comprehensive guide, but it provides a useful introductory resource to help you understand your data and what to do with it. We always recommend seeking the advice of a qualified statistician when conducting research, as they can help refine your research question, understand the data you will collect, identify the assumptions made by the statistical tests you use and ensure that the reporting of results and conclusions is sound.

Identifying types of data

Nominal Scale

These are categorical data which have no order, for example, blood type. They can be dichotomous (having only two categories), such as gender.

Ordinal Scale

These are categorical data which have a rank. For example, responses to a survey on a Likert scale from ‘Completely Disagree’ to ‘Completely Agree’ can be given a ranking of 1 to 5. Ordinal data can also be dichotomous, such as sick vs. healthy.

Interval Scale

These data have values separated by uniform, meaningful intervals, i.e. the difference between any two adjacent values is the same. Temperature in Celsius is a common example: the interval between each degree is uniform, but the zero point is arbitrary.

Ratio Scale

These data have values which are uniform, meaningful and have a meaningful zero point. Kelvin is a temperature scale which has a meaningful zero point. Other measures on the ratio scale include length, mass and duration, where meaningful ratios can be defined, e.g. 10 cm is twice as long as 5 cm.
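
The difference between the interval and ratio scales can be illustrated with a short sketch. The temperatures below are made-up values chosen for illustration, and the helper name celsius_to_kelvin is an assumption of this example.

```python
# Illustrative values only: two temperatures on the Celsius (interval) scale.
cold_c, warm_c = 10.0, 20.0


def celsius_to_kelvin(temp_c):
    """Convert Celsius to Kelvin, a scale with a meaningful zero point."""
    return temp_c + 273.15


# Differences are meaningful on both scales and have the same size.
diff_c = warm_c - cold_c
diff_k = celsius_to_kelvin(warm_c) - celsius_to_kelvin(cold_c)

# Ratios are only meaningful on the ratio (Kelvin) scale.
naive_ratio_c = warm_c / cold_c  # 2.0, but 20 degrees C is not "twice as hot"
ratio_k = celsius_to_kelvin(warm_c) / celsius_to_kelvin(cold_c)

print(f"Difference: {diff_c:.2f} degrees C, {diff_k:.2f} K")
print(f"Celsius ratio: {naive_ratio_c:.3f}, Kelvin ratio: {ratio_k:.3f}")
```

The Celsius ratio suggests a doubling, while the Kelvin ratio shows an increase of only a few percent, which is why ratios should only be quoted for ratio-scale data.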

A useful resource for types of data and reporting central tendency/statistical dispersion can be found here.

It is important to be aware that the first two scales contain discrete (categorical) data, where values fall into specific categories. For example, when you flip a coin you get either a head or a tail, not 0.5 of either. The same applies to toxicity grades, which are on the ordinal scale: the toxicity must be assigned a score on the grading system, e.g. grade 2.

The second two scales contain continuous data, where a value can fall anywhere on the scale. A temperature can be measured as any value down to -273.15 °C (the lower bound of the Celsius scale, or absolute zero on the Kelvin scale). Data can be rounded and still be considered continuous, e.g. rounding to the nearest kg. However, coarse rounding or grouping can produce data that are more appropriately treated as ordinal, e.g. rounding patient weights to the nearest 10 kg or grouping patients by age (40-49, 50-59, etc).
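
As a rough sketch of how grouping turns continuous measurements into ordered categories, the example below bands ages into decades and rounds weights to the nearest 10 kg. The patient values and the function names age_band and round_to_nearest are hypothetical and only for illustration.

```python
def age_band(age_years, width=10):
    """Return a label such as '40-49' for the band containing age_years."""
    lower = (int(age_years) // width) * width
    return f"{lower}-{lower + width - 1}"


def round_to_nearest(value, step):
    """Round value to the nearest multiple of step (e.g. the nearest 10 kg)."""
    return step * round(value / step)


# Hypothetical patient data, for illustration only.
ages = [41.2, 47.9, 52.0, 58.6, 63.3]
weights_kg = [61.4, 68.9, 74.2, 85.5, 92.1]

print([age_band(a) for a in ages])
print([round_to_nearest(w, 10) for w in weights_kg])
```

After grouping, several distinct measurements share the same label, so the result behaves like a small set of ranked categories and not like a continuous measurement.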

Other useful approaches to classifying data can be found here:

Measurement Scales and Data Types (www.statsdirect.com)

Australian Bureau of Statistics.

Determining the distribution of your data

If you have discrete or continuous data, it is useful to plot a histogram to assess their distribution. If your data satisfy the assumptions of a parametric distribution, a range of parametric tests is available for analysing them. These tests offer more statistical power than non-parametric tests, but they can only be used if their assumptions are met; otherwise the results may be misleading.
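
A histogram is simply a count of how many values fall into each bin. The sketch below uses only the Python standard library to build those counts and print a text version; in practice a plotting package would normally be used. The weights and the function name histogram_counts are hypothetical.

```python
from collections import Counter


def histogram_counts(values, bin_width):
    """Count the values in each bin [start, start + bin_width)."""
    counts = Counter(int(v // bin_width) * bin_width for v in values)
    return dict(sorted(counts.items()))


# Hypothetical patient weights in kg, for illustration only.
weights_kg = [52, 55, 58, 61, 63, 64, 66, 67, 71, 79]

for start, count in histogram_counts(weights_kg, 10).items():
    # One '#' per observation gives a quick text histogram.
    print(f"{start}-{start + 9} kg: {'#' * count}")
```

The choice of bin width is a judgment call: bins that are too wide hide the shape of the distribution, while bins that are too narrow leave most bins with only one or two values.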

Non-parametric distributions are data distributions which cannot be described by a standard mathematical equation with a small number of parameters. If your data have a non-parametric distribution, a non-parametric test comparable to the relevant parametric test is usually available, and it makes fewer assumptions about the data.

The most common parametric distribution used in statistics is the normal distribution, which is probably the one you’re most familiar with. If your data satisfy the assumptions of a normal distribution, much of the nature of your data is easily determined and a greater range of tests is available. Common ways to assess whether your data come from a normal distribution are Q-Q plots and the Shapiro-Wilk test.
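
To show what a Q-Q plot compares, the sketch below pairs each sorted observation with the quantile expected from a normal distribution fitted to the sample, then summarises how straight the resulting line is with a correlation coefficient. This is not the Shapiro-Wilk test itself (statistical packages provide that, for example scipy.stats.shapiro in SciPy); it is a simplified standard-library illustration. The samples, the function names and the (i - 0.5) / n plotting positions are assumptions of this example.

```python
from statistics import NormalDist, mean, stdev


def qq_points(sample):
    """Pair each theoretical normal quantile with the sorted observation."""
    n = len(sample)
    fitted = NormalDist(mean(sample), stdev(sample))
    # Plotting positions (i - 0.5) / n are one common convention.
    theoretical = [fitted.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, sorted(sample)))


def qq_correlation(sample):
    """Pearson correlation of the Q-Q points; near 1 means a straight line."""
    xs, ys = zip(*qq_points(sample))
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


# Hypothetical samples, for illustration only.
roughly_symmetric = [4.1, 4.6, 4.8, 5.0, 5.1, 5.3, 5.6, 6.0]
strongly_skewed = [1, 1, 1, 2, 2, 3, 5, 40]

print(f"symmetric sample: r = {qq_correlation(roughly_symmetric):.3f}")
print(f"skewed sample:    r = {qq_correlation(strongly_skewed):.3f}")
```

Points that lie close to a straight line (a correlation near 1) are consistent with a normal distribution, whereas a clearly curved pattern suggests a departure from normality. With small samples neither a plot nor a test is conclusive, so treat the result as a guide, not a verdict.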

A good summary of distributions and some discussion about bounded data can be found here.

Flowchart for Deciding Which Statistical Test to Use (Gerwien, A Painless Guide to Statistics, Bates College, USA)

Other useful tests for radiation research

A common question in radiation research is how to quantify the inter-observer variation in recording or reporting measurements made with a piece of equipment. This requires an inter-rater reliability test, which can use Pearson’s r or Spearman’s test as listed in the flowchart above, or more specific methods such as Kappa tests, the intra-class correlation coefficient or Bland-Altman methods. (Department of Health Sciences, University of York)
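
Two of the methods named above can be sketched in a few lines: unweighted Cohen's kappa for categorical scores from two observers, and Bland-Altman 95% limits of agreement for paired continuous measurements. The observer data and the function names are hypothetical, and the sketch assumes exactly two observers scoring the same items, approximately normally distributed differences, and observers who do not agree perfectly by chance alone.

```python
from collections import Counter
from statistics import mean, stdev


def cohens_kappa(rater_a, rater_b):
    """Unweighted Cohen's kappa for two raters scoring the same items."""
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    # Agreement expected by chance, from each rater's category frequencies.
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / n ** 2
    return (observed - expected) / (1 - expected)


def bland_altman_limits(method_1, method_2):
    """Return the mean difference (bias) and 95% limits of agreement."""
    diffs = [x - y for x, y in zip(method_1, method_2)]
    bias, spread = mean(diffs), stdev(diffs)
    return bias, bias - 1.96 * spread, bias + 1.96 * spread


# Hypothetical toxicity grades from two observers, for illustration only.
grades_a = [1, 2, 2, 3, 1, 2, 3, 3, 1, 2]
grades_b = [1, 2, 3, 3, 1, 2, 3, 2, 1, 2]
print(f"kappa = {cohens_kappa(grades_a, grades_b):.3f}")

# Hypothetical paired measurements from two observers, for illustration only.
obs_1 = [10.2, 11.5, 9.8, 12.1, 10.9]
obs_2 = [10.0, 11.9, 9.5, 12.4, 10.6]
bias, lower, upper = bland_altman_limits(obs_1, obs_2)
print(f"bias = {bias:.2f}, limits of agreement = {lower:.2f} to {upper:.2f}")
```

Kappa corrects the raw percentage agreement for the agreement expected by chance, while the Bland-Altman limits describe how far apart two observers' measurements are likely to be for an individual case. For ordinal grades a weighted kappa may be more appropriate, which is a point worth raising with a statistician.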

Descriptive statistics

Descriptive statistics are basic measures that summarise or represent a large group of data. Tutorials that outline the mean, median, mode, range, standard deviation and variance, and how to present your data in a box-and-whisker plot, can be viewed at the Khan Academy.
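
The measures listed above can be computed with Python's built-in statistics module, as in the sketch below. The measurements are hypothetical, and the variance and standard deviation shown are the sample versions (dividing by n - 1).

```python
import statistics

# Hypothetical measurements, for illustration only.
values = [2, 3, 3, 4, 5, 5, 5, 6, 8, 9]

summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    "range": max(values) - min(values),
    "variance": statistics.variance(values),  # sample variance (n - 1)
    "std_dev": statistics.stdev(values),      # sample standard deviation
}

# Quartiles for a box-and-whisker plot. Quartile conventions differ between
# software packages, so these may not match other tools exactly.
q1, q2, q3 = statistics.quantiles(values, n=4)
five_number_summary = (min(values), q1, q2, q3, max(values))

for name, value in summary.items():
    print(f"{name}: {value:.3f}")
print("five-number summary:", five_number_summary)
```

The five-number summary (minimum, lower quartile, median, upper quartile, maximum) is the information a basic box-and-whisker plot displays.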

Other useful resources

The Wikibook of Statistics

Discovering Statistics using IBM SPSS Statistics – A Field, Sage Publications

R Commander: An Introduction – N Karp, The Comprehensive R Archive Network

R with Rcmdr: Basic Instructions – M Logan, Monash University