Data Types
Data comes in many different types, and these types directly correspond to how we can and should visualize the information.
Qualitative: Non-numeric, descriptive. (e.g. survey responses, observable features). Within qualitative, there are:
- Nominal: Distinct, separate categories, in which there is no inherent order or rank. (e.g. hair color, marital status)
- Ordinal: Distinct, separate categories that have a meaningful order, but the distance between them is not uniform or quantifiable. (e.g. grades, satisfaction level)
Quantitative: Numeric, measurable. (e.g. test score, distance). Within quantitative, there are:
- Discrete: This type of data can only take specific, fixed values and is usually obtained by counting. (e.g. number of children)
- Continuous: This data can take on any value within a given range and is obtained by measuring. The values can be divided and measured to more and more precise levels, including fractions and decimals (e.g. height, weight)
Scales
Plot has many different scales; we categorize them by their input (domain) and output (range).
The domain is the abstract values that the scale expects as input. For quantitative or temporal data, it is typically expressed as an extent such as [start, end], [cold, hot], or [min, max]. For ordinal or nominal data, it is an array of values such as names or categories. The type of input values corresponds to the type scale option (e.g., linear or ordinal).
The range is the visual values that the scale generates as output. For position scales, it is typically an extent such as [left, right] or [bottom, top]; for color scales, it might be a continuous extent [blue, red] or an array of discrete colors. The type of values that a scale outputs corresponds to the name of the scale (e.g., x or color).
Source: Observable Plot reference on scales.
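To make the domain and range concrete, here is a minimal sketch of a linear scale written as a plain function. `linearScale` is a hypothetical helper for illustration only, not part of Plot's API, and the 640-pixel range is an assumed width.

```javascript
// Hypothetical helper (not a Plot function): map a value from the
// domain [d0, d1] onto the range [r0, r1] by linear interpolation.
function linearScale([d0, d1], [r0, r1]) {
  return (value) => r0 + ((value - d0) / (d1 - d0)) * (r1 - r0);
}

// Domain: abstract data values 0..100. Range: pixel positions 0..640 (assumed).
const x = linearScale([0, 100], [0, 640]);

x(0);   // 0
x(50);  // 320
x(100); // 640
```

Swapping the two domain values reverses the mapping without touching the range, which is why the domain and range are specified separately.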
Continuous Scales
Simple x scale, with a domain of 0 to 100:
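A minimal sketch of such a scale definition, assuming it is passed to `Plot.plot` in a notebook where Plot is loaded. The name `simpleScale` is hypothetical, and `grid: true` is added only so the otherwise empty plot shows the scale.

```javascript
// Hypothetical name. In a notebook with Plot loaded: Plot.plot(simpleScale)
const simpleScale = {
  x: {
    domain: [0, 100], // the inputs the x scale expects
    grid: true        // draw grid lines so the scale is visible without marks
  }
};
```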
Reverse the domain, from 100 to 0:
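Reversing is a matter of swapping the two domain values. This is a sketch under the same assumptions as before: `reversedScale` is a hypothetical name, and the object is meant for `Plot.plot`.

```javascript
// Hypothetical name. In a notebook with Plot loaded: Plot.plot(reversedScale)
const reversedScale = {
  x: {
    domain: [100, 0], // start at 100 and end at 0, so values descend left to right
    grid: true
  }
};
```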
Stretch the range to the full width of the page:
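One way to do this, sketched here as an assumption, is to set the plot's `width` option, since the default x range spans the plot's width minus its margins. `fullWidthScale` and `pageWidth` are hypothetical names; in an Observable notebook, the built-in reactive `width` value could be passed as `pageWidth`.

```javascript
// Hypothetical helper: build plot options whose x range spans the given width.
// In a notebook with Plot loaded: Plot.plot(fullWidthScale(width))
function fullWidthScale(pageWidth) {
  return {
    width: pageWidth, // outer width of the plot in pixels
    x: {
      domain: [0, 100],
      grid: true
    }
  };
}
```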
Plot infers the scale type from the values in the domain, so a domain of dates produces a time scale:
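A sketch with a domain made of `Date` objects. The name `dateScale` and the specific dates are assumptions chosen for illustration.

```javascript
// Hypothetical name. In a notebook with Plot loaded: Plot.plot(dateScale)
const dateScale = {
  x: {
    // Because the domain values are Dates, no explicit scale type is given here.
    domain: [new Date("2021-01-01"), new Date("2022-01-01")],
    grid: true
  }
};
```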
Discrete Scales
For discrete data, a point or band scale is required.
Point Scale: A point scale divides the space into uniformly-spaced discrete values. This is the default scale type for ordinal data on the x and y scale.
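A minimal sketch of a point scale over a handful of letters. `pointScale` is a hypothetical name and the letters are placeholder categories; the scale type is spelled out explicitly here for clarity.

```javascript
// Hypothetical name. In a notebook with Plot loaded: Plot.plot(pointScale)
const pointScale = {
  x: {
    type: "point",                       // each value gets one evenly spaced position
    domain: ["A", "B", "C", "D", "E"],   // placeholder ordinal values
    grid: true
  }
};
```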
Band Scale: A band scale, however, divides the space into uniformly-spaced and -sized discrete intervals. This is how we can set the sizing for bar charts. In order to visualize these bands, we can use the cell mark.
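The following sketch draws one cell per band so the intervals become visible. `bandScaleChart` is a hypothetical helper that takes the Plot module as a parameter (in an Observable notebook, the global `Plot`), and the letters are placeholder categories.

```javascript
// Hypothetical helper. `Plot` is assumed to be the Observable Plot module.
function bandScaleChart(Plot) {
  const letters = ["A", "B", "C", "D", "E"]; // placeholder ordinal values
  return Plot.plot({
    x: {type: "band", domain: letters, grid: true},
    marks: [
      // One outlined cell per letter; each cell fills its band on x.
      Plot.cell(letters, {x: (d) => d, stroke: "currentColor"})
    ]
  });
}
```

In a notebook cell this would be called as `bandScaleChart(Plot)`.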
Data Types <> Scales <> Marks
Data and scales are related, and so are scales and marks. Some marks can only be rendered with certain scales, which means they can only be rendered with certain data types. Let's work through some examples with the diamonds dataset.
We have a combination of continuous (measured) and ordinal (distinct) data. The continuous data includes carat, depth, price, etc. The ordinal data are cut and color.
Let's first make something with a combination of two continuous dimensions. Using x (width) and y (height), we can plot it with a dot mark.
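A sketch of a scatterplot of two continuous columns. `caratPriceDots` is a hypothetical helper, and the choice of `carat` and `price` is an assumption; any two continuous columns would work the same way. `Plot` is the Observable Plot module and `diamonds` is assumed to be an array of row objects.

```javascript
// Hypothetical helper: two continuous columns on x and y, drawn as dots.
function caratPriceDots(Plot, diamonds) {
  return Plot.dot(diamonds, {x: "carat", y: "price"}).plot();
}
```

In a notebook cell this would be called as `caratPriceDots(Plot, diamonds)`.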
We can also combine one continuous and one ordinal scale, as in a bar chart. Let's use a transform to count how many diamonds of each cut exist in the dataset.
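A sketch using the group transform: `Plot.groupX` groups rows by the x channel and the `count` reducer turns each group into a bar height. `cutCountBars` is a hypothetical helper; `Plot` and `diamonds` are assumed to be supplied by the notebook.

```javascript
// Hypothetical helper: count diamonds per cut and draw one vertical bar per cut.
function cutCountBars(Plot, diamonds) {
  return Plot.barY(
    diamonds,
    Plot.groupX({y: "count"}, {x: "cut"}) // group by cut, reduce each group to its size
  ).plot();
}
```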
We can expand this even further by adding another output to our transform. The transform groups the rows by the x channel and reduces each group to a value for y; we can pass a second reducer so that the same groups also determine the fill.
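One reading of this, sketched below as an assumption, adds a `fill` reducer next to the `y` reducer so each bar is also colored by its count. `cutCountFilledBars` is a hypothetical helper, and the color legend is an optional extra.

```javascript
// Hypothetical helper: the same groups by cut feed two reducers, one for bar
// height (y) and one for bar color (fill).
function cutCountFilledBars(Plot, diamonds) {
  return Plot.barY(
    diamonds,
    Plot.groupX({y: "count", fill: "count"}, {x: "cut"})
  ).plot({color: {legend: true}}); // show a legend for the fill colors
}
```

A different design, also an assumption, would pass `fill: "color"` as an input channel instead, which subdivides each bar by diamond color rather than coloring it by count.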
Now, back to exploring mark types per scale: let's try plotting two ordinal dimensions against each other, which is usually done with the Plot.cell() mark. Because both dimensions are ordinal, we need a reducer to produce a value to plot for each combination. We can start with just the count of diamonds that fall into each bucket:
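A sketch using `Plot.group`, which groups on both x and y, with the count mapped to the cell fill. `cutColorCounts` is a hypothetical helper; `Plot` and `diamonds` are assumed to come from the notebook.

```javascript
// Hypothetical helper: one cell per (cut, color) pair, filled by how many
// diamonds fall into that pair.
function cutColorCounts(Plot, diamonds) {
  return Plot.cell(
    diamonds,
    Plot.group({fill: "count"}, {x: "cut", y: "color"})
  ).plot();
}
```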
Let's make this more interesting and actually fill the cells with a helpful color to see the distribution of prices per cut and color:
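A sketch that reduces the `price` column within each cell. The `median` reducer is an assumption chosen for illustration; another summary such as `mean` could be swapped in. `cutColorMedianPrice` is a hypothetical helper.

```javascript
// Hypothetical helper: one cell per (cut, color) pair, filled by the median
// price of the diamonds in that pair.
function cutColorMedianPrice(Plot, diamonds) {
  return Plot.cell(
    diamonds,
    Plot.group(
      {fill: "median"},                       // reduce each group's fill values to their median
      {x: "cut", y: "color", fill: "price"}   // fill reads the price column
    )
  ).plot({color: {legend: true}});            // legend maps colors back to prices
}
```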