
Successful schools and teachers make use of multiple sources of data. They understand that using this information effectively can improve both instruction and student learning and wellbeing outcomes.

This glossary is intended to assist in understanding commonly used terms and concepts related to using data in schools.

**Aggregation**

**Average**

**Bar graphs**

**Baseline data**

**Bias**

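Bias is a systematic error that causes an estimate to differ consistently from the true value. For example, if we measured the **average** student performance in maths in one class (let's say the top maths class) and then used this as the average for all students at the school (the **population**), it would be a biased estimate of average performance in maths.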

**Box and whisker plots**

A box and whisker plot displays the distribution of a **dataset** and its **variability**. The lower whisker represents the bottom 25% of scores, the box the middle 50% of scores and the upper whisker the top 25% of scores. The line across the box shows the **median** score for the dataset. These are also called box plots.
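The five values a box and whisker plot is drawn from (minimum, lower quartile, median, upper quartile, maximum) can be calculated as in the minimal Python sketch below. It assumes the default quartile method of Python's `statistics.quantiles`; spreadsheets and other tools may use a different quartile convention and give slightly different values. The `five_number_summary` helper and the scores are hypothetical.

```python
import statistics

def five_number_summary(scores):
    """Return (minimum, Q1, median, Q3, maximum) for a list of scores."""
    ordered = sorted(scores)
    # quantiles(n=4) returns the three cut points: Q1, Q2 (the median) and Q3.
    q1, median, q3 = statistics.quantiles(ordered, n=4)
    return ordered[0], q1, median, q3, ordered[-1]

# Hypothetical test scores for one class.
scores = [25, 12, 40, 17, 30, 22, 35, 15, 42, 20, 28]
print(five_number_summary(scores))
```

Here the whiskers would run from the minimum to Q1 and from Q3 to the maximum, as the entry describes; some tools instead shorten the whiskers and plot **outliers** as separate points.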

**Causation**

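Causation means that one event or variable directly brings about a change in another. It differs from **correlation**, a weaker relationship in which two events or variables simply change together. Causation is more difficult to prove than correlation.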

**Census**

A census is a survey conducted on every individual in a given **population**.

**Central tendency**

**Cluster**

**Code**

A label assigned to segments of data (particularly in **qualitative datasets**) so that they can be grouped into related categories for analysis.

**Coding**

The process of assigning **codes** to segments of **qualitative data** (text, images, etc.). Data that display similar characteristics are labelled with the same code.

**Cohort**

A group of individuals sharing a common characteristic, for example, age, gender or year level.

**Column graphs**

**Content analysis**

A method used to identify the presence of particular words, themes or concepts in text (**qualitative data**) and to compare their various characteristics. The process involves measuring the **frequency** and prominence of specific words and/or phrases.
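Counting word frequencies, the first step of a simple content analysis, can be sketched in Python as below. This is an illustrative sketch only: the `word_frequencies` helper, the stop-word list and the comments are hypothetical, and real content analysis usually also involves judging the prominence and context of each term.

```python
import re
from collections import Counter

def word_frequencies(responses, stop_words=frozenset()):
    """Count how often each word appears across a list of text responses."""
    counts = Counter()
    for response in responses:
        # Lower-case the text and keep runs of letters and apostrophes as words.
        words = re.findall(r"[a-z']+", response.lower())
        counts.update(w for w in words if w not in stop_words)
    return counts

# Hypothetical student survey comments.
comments = [
    "I enjoy science when we do experiments",
    "More experiments please",
    "Science is hard but experiments help",
]
freq = word_frequencies(comments, stop_words={"i", "we", "is", "but", "when", "do"})
print(freq.most_common(2))
```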

**Convenience sample**

A sample of individuals chosen because they are easy to access rather than selected by chance. It is not a **randomly selected sample**, so the data they provide cannot be assumed to be representative of any larger **population**. A convenience sample, for example, might be all the students in a particular classroom, who may differ from students in other classrooms or schools.

**Correlation**

**Cumulative**

Data values that are collected over time and added to a running total.
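A running total can be sketched in Python with the standard library's `itertools.accumulate`; the weekly absence counts below are hypothetical.

```python
from itertools import accumulate

# Hypothetical number of student absences recorded each week of a term.
weekly_absences = [4, 2, 5, 3, 6]

# accumulate() yields the running total after each week.
cumulative_absences = list(accumulate(weekly_absences))
print(cumulative_absences)  # [4, 6, 11, 14, 20]
```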

**Data**

Facts, figures or other information collected for reference or analysis (both **qualitative** and **quantitative data**).

**Data informed practice**

**Data literacy**

**Data visualisation**

The presentation of data in a graphical format to reveal patterns and **trends** which may not be visible in the raw numerical data. For example, charts, graphs, infographics and maps can visually transform large amounts of data into comprehensible information.

**Dataset**

All of the data collected for a particular purpose or analysis.

**Dependent variable**

A **variable** that is measured to see whether it changes in response to the **independent variable**.

**Disaggregation**

Disaggregation is the separation of combined (aggregated) data into smaller groups or individual values, for example, breaking down whole-school results by year level.
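The contrast between an aggregated figure and disaggregated ones can be sketched in Python as below. The `(year_level, score)` records are hypothetical; grouping by year level is just one possible way to disaggregate.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical (year_level, score) records for a whole school.
records = [(7, 62), (7, 70), (8, 55), (8, 65), (9, 80), (9, 74)]

# Aggregated: a single figure for the whole school.
school_mean = mean(score for _, score in records)

# Disaggregated: separate the records by year level before summarising.
by_year = defaultdict(list)
for year, score in records:
    by_year[year].append(score)
year_means = {year: mean(scores) for year, scores in sorted(by_year.items())}

print(school_mean)
print(year_means)
```

The single school mean hides the fact that Year 8 results are noticeably lower than Year 9 results, which only becomes visible once the data are disaggregated.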

**Dichotomous**

Dichotomous data are **nominal** and have only two categories, for example, yes/no.

**Effect size**

A measure of the strength of the relationship between two **variables**, or the size of the impact of one variable on another. For example, a school introduces extracurricular programs to increase student engagement in science. The effect size describes the amount of change in student engagement that is attributable to the extracurricular programs.
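The glossary does not name a particular effect size measure. One widely used option for comparing two group means is Cohen's d, sketched below in Python under that assumption. The engagement scores are hypothetical, and attributing the difference to the program also assumes the two groups are otherwise comparable.

```python
import math
from statistics import mean, variance

def cohens_d(group_a, group_b):
    """Cohen's d: difference in means divided by the pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    # Pool the two sample variances, weighting each by its degrees of freedom.
    pooled_var = ((n_a - 1) * variance(group_a) + (n_b - 1) * variance(group_b)) / (n_a + n_b - 2)
    return (mean(group_b) - mean(group_a)) / math.sqrt(pooled_var)

# Hypothetical engagement scores without and with the extracurricular program.
without_program = [60, 65, 70, 75, 80]
with_program = [70, 75, 80, 85, 90]
print(round(cohens_d(without_program, with_program), 2))  # 1.26
```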

**Frequency**

The number of times an event occurs in a test or analysis of data.

**Frequency distribution**

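A table or graph showing how often each value or category occurs in a dataset, for example, the number of students giving each response to a **survey**.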
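A frequency distribution for a single survey question can be built with Python's `collections.Counter`, as in this sketch; the responses are hypothetical.

```python
from collections import Counter

# Hypothetical responses to one survey question.
responses = ["Agree", "Strongly agree", "Agree", "Neutral", "Disagree", "Agree", "Neutral"]

frequency = Counter(responses)
# Print each category with its count, most frequent first.
for category, count in frequency.most_common():
    print(f"{category:<15}{count}")
```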

**Generalisability**

The extent to which findings from a **sample** can be applied to the wider **population**.

**Histogram**

A graph that shows the distribution of a dataset, including its shape, **skewness** and **outliers**. Each bar represents a range of the data. It differs from a **bar graph** because it focuses on a single continuous **variable**, whereas a bar graph may represent groups of discrete values.
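The grouping step behind a histogram, counting how many values fall into each range, can be sketched in Python as below. The `bin_counts` helper, the bin width of 10 and the marks are hypothetical; the text labels assume whole-number marks.

```python
from collections import Counter

def bin_counts(values, width):
    """Count how many values fall into each range [start, start + width)."""
    counts = Counter((v // width) * width for v in values)
    return dict(sorted(counts.items()))

# Hypothetical test marks out of 100.
marks = [45, 52, 58, 61, 63, 67, 69, 72, 75, 88]
for start, count in bin_counts(marks, 10).items():
    # One text "bar" per range; the bar length is the count.
    print(f"{start}-{start + 9}: {'#' * count}")
```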

**Independent variable**

A **variable** that is changed or controlled to observe its effect on the **dependent variable**. For example, a school introduces extracurricular programs to increase student engagement in science. The extracurricular program is the independent variable and student engagement in science is the dependent variable.

**Interval data**

Data measured on an interval **scale**. These are numerical and ordered, so we know the exact difference between the values at each point. For example, temperature measured with a thermometer.

**Interquartile range**

A measure of the spread of data around the **median**. The IQR is the difference between the upper (Q3) and lower (Q1) **quartiles**, and describes the middle 50% of values when ordered from lowest to highest. It can be represented on a distribution curve or a **box and whisker plot**.
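The IQR calculation (Q3 minus Q1) can be sketched in Python as below. As with any quartile calculation, this assumes one particular convention, here the default method of `statistics.quantiles`; other tools may give slightly different quartiles. The helper name and scores are hypothetical.

```python
import statistics

def interquartile_range(values):
    """IQR = Q3 - Q1, the spread of the middle 50% of the ordered values."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return q3 - q1

# Hypothetical reading scores.
scores = [3, 5, 7, 8, 12, 13, 14]
print(interquartile_range(scores))
```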

**Linear correlation**

The strength of a linear **correlation** is shown by how well data points fit a straight line when plotted together. When all the points fall on the line it is called a perfect correlation. When the points are scattered all over the graph with no trend or pattern, there is no correlation.
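The glossary does not specify how the strength of a linear correlation is measured. A common choice is Pearson's correlation coefficient (r), which is +1 or -1 when every point lies on a straight line and near 0 when there is no linear pattern. A minimal Python sketch, with hypothetical data:

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    sum_products = sum(a * b for a, b in zip(dx, dy))
    return sum_products / math.sqrt(sum(a * a for a in dx) * sum(b * b for b in dy))

hours_studied = [1, 2, 3, 4, 5]
print(pearson_r(hours_studied, [3, 5, 7, 9, 11]))  # 1.0 (perfect positive)
print(pearson_r(hours_studied, [11, 9, 7, 5, 3]))  # -1.0 (perfect negative)
```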

**Likert scale**

**Likert-type scale**

**Line graphs**

Graphs that show **trends** over equal intervals of time, such as weeks, terms or school years. They can also be used to compare changes in more than one group over the same period of time.

**Longitudinal study**

**Mean**

The arithmetic **average**, and a **measure of central tendency**. It is calculated by adding all the numbers in a dataset and dividing the sum by how many numbers there are. Means can be calculated for numerical data but not for categorical data (for example, we can calculate the mean of a series of numbers, but not of a group of preferences).

**Measures of central tendency**

**Median**

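The median is a **measure of central tendency**. It is the middle score of a dataset when its scores are ordered from lowest to highest. When there is an even number of scores, the median is the mean of the two middle scores.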

**Mode**

The mode is a **measure of central tendency**. It is the value that occurs most often in a dataset. A dataset can have more than one mode.
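The three measures of central tendency above can be compared on one set of hypothetical scores using Python's standard `statistics` module (`multimode` returns every mode, which covers datasets with more than one).

```python
import statistics

# Hypothetical quiz scores for seven students.
scores = [3, 5, 5, 6, 9, 9, 19]

print(statistics.mean(scores))       # 8: the sum (56) divided by the count (7)
print(statistics.median(scores))     # 6: the middle (4th) of 7 ordered scores
print(statistics.multimode(scores))  # [5, 9]: both occur twice
```

The single high score of 19 pulls the mean (8) above the median (6), which is why the median is often preferred when a dataset contains **outliers**.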

**Negative correlation**

**Nominal data**

Nominal data is categorical. The data represent counts of objects in each category, and the categories have no natural order.

**Observation**

An occurrence of a specific data item that is recorded about a variable.

**Operationalisation**

**Ordinal data**

Ordinal data is categorical. These data can be ranked.

**Outlier**

An extreme or atypical data value that is substantially different from the rest of the data.

**Percentile**

**Pie charts**

Circular charts which are divided into slices to visualise numerical proportions.

**Population**

**Probability**

The measure of the chance that an event will occur.

**Probability/random sampling**

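A sampling method in which individuals are selected at random from the **population**, for example, simple random or **cluster** sampling. Randomisation increases the likelihood that the results will be generalisable to a wider population.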
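Simple random sampling, one form of probability sampling, can be sketched in Python with `random.Random.sample`, which picks without repeats and gives every member of the population the same chance of selection. The student IDs are hypothetical, and the fixed seed is only there to make the example repeatable.

```python
import random

# Hypothetical roll of 120 student IDs for the whole school (the population).
population = [f"S{n:03d}" for n in range(1, 121)]

rng = random.Random(42)  # fixed seed so the example gives the same sample each run
sample = rng.sample(population, k=20)  # simple random sample of 20, no repeats
print(sorted(sample))
```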

**Purposive (or non-random) sampling**

**Qualitative data**

Data that are non-numerical and often in the form of text, images, and objects.

**Quantitative data**

Data that are numerical. Numbers are used to represent values or counts.

**Quartiles**

**Quota sampling**

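A non-random sampling method in which individuals are selected until a set number (quota) from each subgroup is reached. It differs from **random sampling** because it requires that individuals are chosen from specific subgroups (such as age, gender and ethnicity).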

**Random sampling**

See **probability sampling**.

**Range**

The difference between the smallest value and the largest value in a dataset.

**Rate**

The occurrence of events over an interval of time, or the frequency of a phenomenon of interest.

**Ratio**

A comparison of the frequency of one variable with the frequency of another.

**Ratio data**

**Reliability**

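The extent to which a measure produces consistent results when it is repeated, for example, with similar **populations** at different points in time.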

**Sample**

**Sampling**

**Saturation**

The moment during data collection when new data no longer reveals new information.

**Scale (measurement)**

The type of scale used to measure a variable. Common types include **nominal**, **ordinal**, **interval**, **ratio** and **Likert type**.

**Scale (graphs and charts)**

The subdivision of each axis. The scale may be numerical or categorical.

**Skew**

**Snowball sampling**

**Standard deviation**

A measure of the amount of **variation** in the data. Roughly, it describes the typical distance of data values from the mean; it is calculated as the square root of the **variance**.

**Statistical literacy**

**Survey**

**Thematic analysis**

A method for analysing **qualitative data** by identifying patterns of meaning. It involves labelling chunks of data using **codes**, which can then be grouped together into themes.

**Timeliness**

**Trend**

The general direction of change in a **variable** over time, for example, the change in student assessment results from year to year. Trends can be displayed graphically.
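One common way to put a number on a trend, assumed here rather than prescribed by the glossary, is the slope of a least-squares straight line: the average change in the variable per unit of time. A minimal Python sketch with hypothetical yearly results:

```python
def trend_slope(times, values):
    """Least-squares slope: average change in the value per unit of time."""
    n = len(times)
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    numerator = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    denominator = sum((t - mean_t) ** 2 for t in times)
    return numerator / denominator

# Hypothetical school mean assessment scores by year.
years = [2019, 2020, 2021, 2022, 2023]
mean_scores = [61, 63, 64, 66, 68]
print(round(trend_slope(years, mean_scores), 2))  # 1.7 points per year
```

A positive slope indicates an upward trend; it summarises the overall direction but does not show year-to-year fluctuations.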

**Triangulation**

**T-test**

**Validity**

**Variable**

A variable is any characteristic, number, or quantity that can be measured or counted.

**Variance**

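A measure of how far the values in a dataset are spread out from the **mean** (**average**); it is the average of the squared differences from the mean. Variance can be used to measure **variability** (**volatility**) and spread. There are four commonly used measures of variability: **range**, **interquartile range**, variance and **standard deviation**.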
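Three of these measures of variability can be calculated with Python's standard `statistics` module, as sketched below on hypothetical scores. `pvariance` and `pstdev` treat the data as a whole population; for a sample, `statistics.variance` and `statistics.stdev` divide by n - 1 instead and give slightly larger values.

```python
import statistics

# Hypothetical test scores for eight students (mean = 5).
scores = [2, 4, 4, 4, 5, 5, 7, 9]

data_range = max(scores) - min(scores)          # largest minus smallest: 9 - 2
pop_variance = statistics.pvariance(scores)     # mean of squared differences from the mean
pop_std_dev = statistics.pstdev(scores)         # square root of the variance
print(data_range, pop_variance, pop_std_dev)
```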

**Volatility**

The measure of how much data fluctuates over time. For example, stock prices going up and down.

**Variability**

**X-axis**

The horizontal number line on a graph.

**Y-axis**

The vertical number line on a graph.

**Z-scores**

A z-score indicates how many **standard deviations** a value is from the mean. For example, in a class of students with an **average** test score of 30 and a standard deviation of 5, if a particular student's test mark is 40, the z-score for that student would be 2 ((40-30)/5).
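The z-score formula can be written as a one-line Python function; the `z_score` name is hypothetical, and the first call reproduces the worked example above.

```python
def z_score(value, mean, std_dev):
    """Number of standard deviations a value lies above (+) or below (-) the mean."""
    return (value - mean) / std_dev

print(z_score(40, 30, 5))  # 2.0, matching the worked example
print(z_score(25, 30, 5))  # -1.0: one standard deviation below the mean
```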