# Statistical modeling of sensor technology data

A sensor is an artificial implementation of what biology calls a sense. Sensor data are the output of a device or machine that detects and responds to some type of input from the physical environment and sends that information to a computer processor. Sensors translate measurements from the real world into data for the digital domain. Data arising from sensor technology are typically noisy and measured at a very high sampling rate. Representing such data as functions is often an appropriate solution: a suitable smoothing technique reduces the noise, and the functional representation supports data reduction and clustering.

Functional Data Analysis (FDA) is a field of statistics in which data are represented by functions or curves, which are observations of a random variable (or random variables) taken over a continuous interval. In its most general form, under an FDA framework each sample element is considered to be a function. FDA is an appealing option for overcoming the noise and high dimensionality of sensor data. Correlations among neighboring measurements can be advantageous in FDA, which smooths such measurements into curves, effectively reducing the dimension of the data. Importantly, the dimension of smooth data representations can be controlled by selecting the type and number of basis functions employed, while roughness penalties (e.g., on the total curvature of a function) allow continuous control over smoothness. By representing the data as functions, FDA also reduces the impact of non-trivial noise and “fills in” missing values, improving statistical power. In addition to improving signal-to-noise ratios, and hence power, smoothing can unveil information and biological insights missed by multivariate techniques, as long as the assumption of smoothness is reasonable (Froslie et al., 2013).
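As a rough illustration of the basis-expansion and roughness-penalty ideas described above, the following minimal sketch smooths one noisy curve with a Fourier basis and a penalty on the integrated squared second derivative. It is not the method used in this work: the synthetic signal, the choice of a Fourier basis, the values of `n_harmonics` and `lam`, and the helper names `fourier_basis`, `roughness_penalty` and `smooth_curve` are all assumptions made for the example.

```python
import numpy as np


def fourier_basis(t, n_harmonics):
    """Evaluate a Fourier basis on [0, 1]: a constant, then sin/cos pairs."""
    cols = [np.ones_like(t)]
    for k in range(1, n_harmonics + 1):
        cols.append(np.sin(2 * np.pi * k * t))
        cols.append(np.cos(2 * np.pi * k * t))
    return np.column_stack(cols)


def roughness_penalty(n_harmonics):
    """Penalty matrix for the integrated squared second derivative on [0, 1].

    For this basis the matrix is diagonal: 0 for the constant term and
    (2*pi*k)**4 / 2 for each sine and cosine of frequency k.
    """
    diag = [0.0]
    for k in range(1, n_harmonics + 1):
        w = (2 * np.pi * k) ** 4 / 2.0
        diag.extend([w, w])
    return np.diag(diag)


def smooth_curve(t, y, n_harmonics=10, lam=1e-6):
    """Penalized least squares: minimize ||y - Phi c||^2 + lam * c' R c."""
    Phi = fourier_basis(t, n_harmonics)
    R = roughness_penalty(n_harmonics)
    coef = np.linalg.solve(Phi.T @ Phi + lam * R, Phi.T @ y)
    return coef, Phi @ coef


# Synthetic "sensor" curve (an assumption for illustration, not real data).
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 500, endpoint=False)
signal = np.sin(2 * np.pi * t) + 0.5 * np.cos(6 * np.pi * t)
y = signal + rng.normal(scale=0.3, size=t.size)

# 500 noisy samples are reduced to 21 basis coefficients.
coef, fitted = smooth_curve(t, y, n_harmonics=10, lam=1e-6)

# A larger penalty weight gives a smoother (less curved) fit.
coef_smooth, fitted_smooth = smooth_curve(t, y, n_harmonics=10, lam=1e-2)
```

Here the number of harmonics sets the dimension of the representation in discrete steps, while `lam` tunes smoothness continuously. In practice both would typically be chosen by a criterion such as cross-validation rather than fixed by hand.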

This work demonstrates the power of functional data analysis (FDA) for analysing and modeling sensor data. Numerous FDA methods are applied to two datasets: (1) a dataset containing electroencephalogram (EEG) recordings of brain activity in two groups of individuals, a control group and a group of alcoholic patients, and (2) a dataset collected from a production line manufacturing contact lenses at Johnson and Johnson (J&J) VisionCare. The EEG dataset is used to demonstrate key FDA techniques, which are then deployed on the J&J data to identify key business insights from the production data.

## Faculty

- Faculty of Science and Engineering

## Degree

- Master (Research)

## First supervisor

Norma Bargary

## Second supervisor

Andrew Simpkin

## Department or School

- Mathematics & Statistics