Meaning and Intelligence
In a previous post, we explored the concept of the ‘meaning’ of a measurement. What exactly is it we are measuring? How does this relate to the quantity we actually want to know?
This led to a discussion in the last post of ‘direct’ and ‘indirect’ measurements, and the conclusion that almost all measurements today are of the indirect type. This matters when we look more closely at errors, as indirect measurement of a quantity involves more actual measurements, each of which has its associated errors. These errors propagate through the reduction process in interesting ways, affecting the derived values in ways that are not always obvious. But that discussion must come after we have covered some other topics.
In the post where we considered the ‘meaning’ of a measurement, we considered the apparently simple situation of measuring the length of a table. Things got progressively more complex as we thought through exactly what we were measuring and how it related to what we wanted for the ‘length’ of the table. However, what use is the length of the table by itself?
We rarely make measurements in isolation, because we rarely need to know a single quantity that can be directly measured. We either need to know something that necessarily involves more than one measurement (e.g., the area of the tabletop), or we are using indirect measurements and need multiple measurements to get the desired value (e.g., using a measured slope distance, together with temperature, pressure, and the instrument calibration, to determine the horizontal distance between two points using EDM).
So how do the various measurements connect with each other, and so allow us to determine an estimate of the desired quantity? In some cases, we develop processes to connect sets of measurements together, such as the EDM example, or a flight of levels. We also develop mathematical models to represent the measurement process, allowing the measured values to be combined and the desired value to be derived. Surveying reduction formulae and the various processes for leveling, traversing and the like are examples of this.
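To make the EDM example concrete, here is a minimal sketch of such a reduction in Python. The zenith angle, the calibration parameters, and the coefficients in the atmospheric correction are all assumptions for illustration (real instruments publish their own first-velocity coefficients for their carrier wavelength), and the function names are hypothetical.

```python
import math

def atmospheric_ppm(temp_c: float, pressure_hpa: float) -> float:
    """Illustrative first-velocity correction, in parts per million.

    The coefficients follow a commonly quoted form but vary by
    instrument and carrier wavelength; treat them as placeholders.
    """
    return 281.8 - (0.29065 * pressure_hpa) / (1.0 + 0.00366 * temp_c)

def horizontal_distance(slope_dist_m: float, zenith_deg: float,
                        temp_c: float, pressure_hpa: float,
                        additive_const_m: float = 0.0,
                        scale_ppm: float = 0.0) -> float:
    """Reduce a raw EDM slope distance to a horizontal distance.

    Every input here is itself a measurement (or a calibration
    estimate) with its own error, all of which propagate into the
    derived horizontal distance.
    """
    d = slope_dist_m + additive_const_m            # additive (zero) constant
    d *= 1.0 + scale_ppm * 1e-6                    # instrument scale error
    d *= 1.0 + atmospheric_ppm(temp_c, pressure_hpa) * 1e-6
    return d * math.sin(math.radians(zenith_deg))  # slope-to-horizontal

# e.g., a 1500 m slope distance at zenith angle 87.5°, 25 °C, 1005 hPa
print(round(horizontal_distance(1500.0, 87.5, 25.0, 1005.0), 4))
```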
Reduction formulae are designed for individual measurements or estimates (e.g., the EDM example), but we can also consider the situation of a topographic survey. In most cases, whatever measurement system we use (total station, stadia, GPS, airborne LiDAR, terrestrial laser scanner, photogrammetry), our first level of reduction is a series of 3-D points in space: a point cloud.
In a topographic survey, we have two issues coming from the ‘meaning’ of a measurement. The first is that the measurements are indirect and need to be reduced to get the 3-D point, and those issues have been discussed above and in previous posts. The second is answering the question ‘What does this 3-D point represent?’
As we discover when confronted with a large point cloud, determining what each point represents becomes critically important after we leave the field. In traditional surveying, we had relatively few points, and so could individually label them in the field book, or assign feature codes in a data recorder as each point was measured by a total station. As we move to scanning millions of 3-D points per second, individual point labeling is no longer possible.
This is attribute data, and it is critical to making any use of a point cloud, regardless of its size. With large point clouds, much of this meaning is added as part of the data reduction process, which is why it is hard to automate the data processing side of terrestrial laser scanning if we want anything much beyond forming or fitting a surface to parts of the point cloud.
Attribute data is derived from a measurement, i.e., something that reduces our uncertainty about the value of the attribute. Sometimes the attribute is essentially binary, e.g., it’s land or it’s sea, with only a small amount of uncertainty (at the coastline). On other occasions, such as soil type or land use, it can be highly variable, with differing levels of uncertainty. Even the question of who owns (or has certain rights over) a piece of land is not certain, and we always have gross errors to consider.
More advanced questions might include ‘If the attribute measurements we make contain errors, what are the distributions associated with those errors?’ and ‘How do those errors combine with the spatial measurement errors to affect the outcome of all these measurements?’ Those questions must be dealt with later, after we have covered a lot of statistical ground, but they should be kept in the back of one’s mind.
While we have discussed the ‘meaning’ of measurements, the other critical part of the business of combining measurements is what we can call ‘intelligence.’ This is knowing how the measurements combine to allow more complex things to be derived.
Combining measurements to get a single derived value, as in the EDM example, is perhaps more a matter of the ‘meaning’ of individual measurements; ‘intelligence’ is more about how sets of measured values, e.g., 3-D points, combine to allow the determination of more complex objects. For example, we may collect a series of 3-D points with a total station and use them to define a road, by knowing not only which points are the edges and centerline of the road, but also how each point connects to the next in the representation. We often tailor our data collection processes and point codes to suit automation of the later derivation and plotting of the data for topographic surveys.
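A minimal sketch of that idea, assuming hypothetical feature codes (RD-EDGE for the road edges, RD-CL for the centerline) and a made-up convention that points sharing a code and string number are connected in measurement order; real field-coding schemes are considerably richer than this:

```python
from collections import defaultdict

# Hypothetical coded observations: (point_id, x, y, z, code, string_no).
# The codes and the 'same code + string number = connected' rule are
# assumed conventions for this sketch, not any particular package's scheme.
points = [
    (1, 100.0, 200.0, 50.1, "RD-EDGE", 1),
    (2, 110.0, 201.0, 50.2, "RD-EDGE", 1),
    (3, 105.0, 195.0, 50.0, "RD-CL",   1),
    (4, 115.0, 196.0, 50.1, "RD-CL",   1),
    (5, 100.0, 190.0, 49.9, "RD-EDGE", 2),
    (6, 110.0, 191.0, 50.0, "RD-EDGE", 2),
]

# Group the points into line strings: this grouping is the 'intelligence'
# that turns bare 3-D points into the edges and centerline of a road.
strings = defaultdict(list)
for pid, x, y, z, code, string_no in points:
    strings[(code, string_no)].append((x, y, z))

for (code, string_no), vertices in sorted(strings.items()):
    print(f"{code} string {string_no}: polyline through {vertices}")
```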
As the representation (e.g., a map) that we are trying to develop becomes more complex, we get multiply-connected points, where a single point may be a critical point in more than one class of attributes or objects. We can handle those things in our heads, as that is the way we have to represent the real world in our minds, but persuading software to do the same thing is non-trivial. Similarly, attempting to deal with the propagation of errors through this representation is closer to a nightmare, as we have to deal with errors in many more objects and attributes than simple 3-D points.
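One way to sketch such multiply-connected points is a shared node table, with each object holding references to node ids rather than its own copies of the coordinates; the layout and names here are illustrative assumptions, not any particular data model:

```python
# Hypothetical shared-node representation (all names are illustrative).
# One node table holds the measured 3-D points; objects reference
# nodes by id, so a single point can be a vertex of many objects.
nodes = {
    101: (100.0, 200.0, 50.1),
    102: (110.0, 201.0, 50.2),
    103: (110.0, 191.0, 50.0),
}

objects = {
    "road_edge":       [101, 102],   # edge of pavement
    "parcel_boundary": [102, 103],   # node 102 is shared...
    "drain_invert":    [102],        # ...by three objects
}

# Any coordinate error in node 102 propagates into three different
# objects, which is why error propagation through such a structure
# is so much harder than for independent 3-D points.
shared = [nid for nid in nodes
          if sum(nid in ids for ids in objects.values()) > 1]
print("multiply-connected nodes:", shared)
```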
As we combine points, the complexity increases exponentially. How fast this increases is not always readily understood, but a simple example would be having a single data point in the database, and then doubling the number of points at each iteration. After 30 steps, the number of data points is over a billion (2^30 ≈ 1.07 × 10^9). The number of combinations of connections between the points in the database has gone from zero (with just one point) to a number that is far too large to readily calculate: with a mere 256 points, reached after only 8 iterations, the possible orderings of the points alone exceed 10^500.
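The arithmetic is easy to check; a short sketch, reading ‘combinations of connections’ as orderings of the points (one of several reasonable readings) and using exact integer arithmetic so nothing overflows:

```python
import math

points = 1
for _ in range(30):
    points *= 2            # double the number of points each iteration
print(f"points after 30 doublings: {points:,}")   # 1,073,741,824

m = 2 ** 8                     # 256 points after 8 iterations
pairs = m * (m - 1) // 2       # possible pairwise connections
orderings = math.factorial(m)  # ways to chain all the points in order
print(f"possible pairwise connections among {m} points: {pairs:,}")
print(f"orderings of {m} points: about 10^{len(str(orderings)) - 1}")
```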
Spatial data is usually multiply connected, and traditional topographic maps are among the most information-dense of all graphics. ‘Intelligence’ in spatial data is therefore something critical to be aware of, especially as the size of our data sets continues to grow. We do not yet have answers for a lot of these questions, but we must still ask them and work toward some. These blog posts are part of the attempt to develop answers to the more complex cases.