Lecture 1.2: Data

Data Sources

Simulations
ex: CFD, environmental modeling, virtual crash tests

Sensors/Scanners
ex: medical diagnosis, satellites, emissions monitors

Surveys/Records
ex: census, consumer tracking, polls, observational studies

Equations
ex: math, health effects models

Data Characteristics

Continuity

Continuous: nature is continuous (for most purposes), but continuous data can only be represented implicitly (e.g., by an equation)

Discrete: anything sampled or stored on digital media
representation error
possible aliasing
artifacts of sampling
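
A minimal sketch of the aliasing mentioned above. The 9 Hz signal, 1 Hz signal, and 10 Hz sampling rate are illustrative assumptions, not values from the lecture. Sampled at 10 Hz, a 9 Hz sine yields exactly the same samples as an inverted 1 Hz sine, so the two cannot be told apart after sampling.

```python
import math

def sample(freq_hz, rate_hz, n_samples):
    """Sample sin(2*pi*freq*t) at t = k / rate for k = 0 .. n_samples-1."""
    return [math.sin(2 * math.pi * freq_hz * k / rate_hz)
            for k in range(n_samples)]

RATE_HZ = 10                      # sampling rate (illustrative assumption)
high = sample(9, RATE_HZ, 20)     # 9 Hz: above the 5 Hz Nyquist limit
low = sample(1, RATE_HZ, 20)      # 1 Hz: well below the limit

# If the 9 Hz samples equal the negated 1 Hz samples, high + low is ~0.
max_diff = max(abs(h + l) for h, l in zip(high, low))
print(f"largest |high + low| = {max_diff:.2e}")
```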

Structure

Definitions

Topology: connectivity (triangle)

Geometry: realization of topology (coordinates)

Elements

Points: located where data value known (geom)

Cells: set up interpolation parameters (topology)
common types: point, line, triangle, quad, tetra, voxel
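
The split between geometry (points) and topology (cells) can be sketched as two separate lists. The names `points`, `triangles`, and `total_area` are hypothetical, chosen only for this illustration. Stretching the coordinates changes the geometry while the connectivity stays the same.

```python
# Geometry: coordinates of the points where data values are known.
points = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]

# Topology: each cell lists the indices of the points it connects.
triangles = [(0, 1, 2), (0, 2, 3)]

def triangle_area(p, q, r):
    """Area of a 2D triangle from the cross product of two edge vectors."""
    return abs((q[0] - p[0]) * (r[1] - p[1])
               - (r[0] - p[0]) * (q[1] - p[1])) / 2.0

def total_area(pts, cells):
    """Sum the areas of all triangular cells."""
    return sum(triangle_area(*(pts[i] for i in cell)) for cell in cells)

# Stretch the geometry in x; the topology (triangles) is reused unchanged.
stretched = [(2.0 * x, y) for x, y in points]

area_before = total_area(points, triangles)
area_after = total_area(stretched, triangles)
print(area_before, area_after)
```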

Structured: inherent spatial relationship among points
relatively efficient storage: topology is implicit

regular
can be represented implicitly (3x3: dimension, origin, aspect)
ex: medical data

rectilinear
can be represented semi-implicitly (nx + ny + nz)
ex: CFD (refinement around objects)

curvilinear
geometry represented explicitly (3*nx*ny*nz)
ex: CFD (flow along a river)
ease of computation
wide array of visualization algorithms
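
A rough sketch of the geometry storage counts quoted above for the three structured grid types. The function names and the 100 x 100 x 100 grid size are assumptions made for illustration. In all three cases the topology is implicit and costs nothing extra.

```python
def regular_storage():
    """Dimensions (3) + origin (3) + aspect/spacing (3) values."""
    return 3 + 3 + 3

def rectilinear_storage(nx, ny, nz):
    """One coordinate array per axis."""
    return nx + ny + nz

def curvilinear_storage(nx, ny, nz):
    """An explicit (x, y, z) triple for every grid point."""
    return 3 * nx * ny * nz

nx = ny = nz = 100   # illustrative grid size
counts = {
    "regular": regular_storage(),
    "rectilinear": rectilinear_storage(nx, ny, nz),
    "curvilinear": curvilinear_storage(nx, ny, nz),
}
for name, n in counts.items():
    print(f"{name:12s} {n:>9d} values")
```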

Unstructured: no (or unknown) spatial relationship among points
ex: FEM, structural analysis, census, monitor devices
flexibility
often reality
more limited array of visualization algorithms

Dimension: # of independent variables (2D, 3D, etc)
usually means number of spatial/temporal dimensions

Multiple

scalar: single value per position
multivariate: multiple values per position
multiple scalars
vector
tensor
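
As a small illustration of how many values each kind of data carries per position: in 3D, a scalar is a rank-0 quantity, a vector rank 1, and a (second-order) tensor rank 2. The `components` helper and the sample record below are hypothetical, and the numbers in the record are made up.

```python
def components(rank, dim=3):
    """Number of stored values per position for a tensor of the given rank."""
    return dim ** rank

# One sample position carrying several kinds of data (made-up values).
sample = {
    "position": (1.0, 2.0, 0.5),
    "temperature": 21.5,                    # scalar: rank 0, 1 value
    "velocity": (0.3, -0.1, 0.0),           # vector: rank 1, 3 values
    "stress": ((1.0, 0.2, 0.0),
               (0.2, 0.8, 0.1),
               (0.0, 0.1, 1.1)),            # tensor: rank 2, 9 values
}

for name, rank in [("scalar", 0), ("vector", 1), ("tensor", 2)]:
    print(name, components(rank))
```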

Type

Scale

Nominal: just names or categories or identifiers
can say "this one is different from that one"
ex: county, land use, ethnicity or race, tissue type

Ordinal: values are ordered
can say "this one is bigger than that one"
ex: preference, ranking

Interval: constant step size
can say "the difference between these two is the same as the difference between those two"
ex: test scores, degrees Fahrenheit

Ratio: meaningful zero
can say "this one is twice as big as that one"
ex: temperature in kelvins, income, percent below poverty line, wind speed
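
A short sketch of why "twice as big" needs a meaningful zero. The temperatures 40 and 80 are arbitrary example values. Dividing the Fahrenheit readings gives 2, but the same two temperatures on the kelvin scale differ by only about 8 percent.

```python
def fahrenheit_to_kelvin(f):
    """Convert degrees Fahrenheit to kelvins."""
    return (f - 32.0) * 5.0 / 9.0 + 273.15

f_cold, f_warm = 40.0, 80.0   # arbitrary example readings

# Interval scale: the ratio depends on where the scale puts its zero.
naive_ratio = f_warm / f_cold

# Ratio scale: zero kelvin is a true zero, so this ratio is meaningful.
kelvin_ratio = fahrenheit_to_kelvin(f_warm) / fahrenheit_to_kelvin(f_cold)

print(f"Fahrenheit ratio: {naive_ratio:.2f}")
print(f"Kelvin ratio:     {kelvin_ratio:.3f}")
```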

Data Representation

Compact: efficient memory use
structured schemes, unstructured schemes, sparse matrices, shared verts
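
One way to picture the "shared verts" idea is to convert a list of independent triangles into a vertex list plus index triples. The `index_vertices` helper is a hypothetical name and this is only a sketch: it merges vertices that compare exactly equal, which real data with floating-point noise may not.

```python
# Two triangles stored independently: the shared edge is stored twice.
soup = [
    ((0.0, 0.0), (1.0, 0.0), (1.0, 1.0)),
    ((0.0, 0.0), (1.0, 1.0), (0.0, 1.0)),
]

def index_vertices(triangle_soup):
    """Return (vertices, cells) with each distinct vertex stored once."""
    vertices = []
    lookup = {}          # vertex coordinates -> index into vertices
    cells = []
    for tri in triangle_soup:
        cell = []
        for v in tri:
            if v not in lookup:
                lookup[v] = len(vertices)
                vertices.append(v)
            cell.append(lookup[v])
        cells.append(tuple(cell))
    return vertices, cells

vertices, cells = index_vertices(soup)
print(len(soup) * 3, "vertices before,", len(vertices), "after")
print(cells)
```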

Efficient: computationally accessible; retrieve and store in constant time
structured schemes
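
Constant-time access in a structured scheme can be sketched with a flat array and index arithmetic. The x-fastest ordering and the name `flat_index` are assumptions for this example; other orderings work the same way.

```python
def flat_index(i, j, k, nx, ny):
    """Map grid indices (i, j, k) to a position in a flat, x-fastest array."""
    return i + nx * (j + ny * k)

nx, ny, nz = 4, 3, 2
values = list(range(nx * ny * nz))   # stand-in data, one value per point

# Retrieve a value and its +x neighbor without any searching.
idx = flat_index(1, 2, 1, nx, ny)
neighbor_idx = flat_index(2, 2, 1, nx, ny)
print(values[idx], values[neighbor_idx])
```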

Mappable: straightforward conversions
native -> rep: simple conversion, no lost info
rep -> graphics prim: especially for interactive display

Minimal coverage: manageable # of options
few variants which work for a wide range of data sets

Simple
easier to use
easier to optimize
errors less likely

Data Transformations

Interpolation

Aggregation

Smoothing

Simplification
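
A minimal 1D sketch of three of the transformations listed above. The specific methods (linear interpolation, block means, a 3-point moving average) are assumptions picked for simplicity, not necessarily the ones the lecture has in mind, and the sample values are made up.

```python
def lerp(x0, y0, x1, y1, x):
    """Interpolation: linear estimate at x between two known samples."""
    t = (x - x0) / (x1 - x0)
    return y0 + t * (y1 - y0)

def aggregate(values, block):
    """Aggregation: mean of each consecutive block of samples."""
    chunks = [values[i:i + block] for i in range(0, len(values), block)]
    return [sum(c) / len(c) for c in chunks]

def smooth(values):
    """Smoothing: 3-point moving average; endpoints are left unchanged."""
    out = list(values)
    for i in range(1, len(values) - 1):
        out[i] = (values[i - 1] + values[i] + values[i + 1]) / 3.0
    return out

samples = [0.0, 2.0, 4.0, 10.0, 4.0, 2.0]   # made-up data with one spike

midpoint = lerp(0.0, samples[0], 1.0, samples[1], 0.5)
coarse = aggregate(samples, 2)
smoothed = smooth(samples)
print(midpoint, coarse, smoothed)
```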

Data Quality

Missing data

Uncertain data

Representation error

Sampling artifacts