Conventions Used in This Toolbox¶
The idea is to make the regular use cases easy and the hard ones possible.
Common Vocabulary¶
A common vocabulary is important when different parties talk about complicated concepts to make sure everyone fully understands what the other is talking about.
Dimensions and Axes¶
Talking about dimensions in context of numpy arrays can be a bit confusing
especially when coming from a mathematical background. We use the following
convention: A point in the 3D Space (x, y, z)
is an array with one dimension
of length 3. An array of n
such points would be an array with two
dimensions, the first axis (dimension) with the length of n
, and the exond
axis with the length of 3
.
>>> import numpy as np
>>> a = np.arange(20)
>>> a
array([ 0, 1, 2, 3, ..., 17, 18, 19])
>>> a.ndim
1 # one dimension (or axis)
>>> a.shape
(20,) # of lenght 20
>>> a = a.reshape(4, 5)
>>> a
array([[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]])
>>> a.ndim
2 # two dimensions
>>> a.shape
(4, 5) # of length 4 and 5
Data Structures¶
Wyrm uses one data structure wyrm.types.Data
to wrap the different data
during the processing. Data is very generic and thus flexible on purpose. It
can contain all kinds of data and tries to stay out of the way. Data is also
self-explaining in a sense that it does not only contain the raw data but also
meta-data about the axes, like names and units and the values of the axes (for a
complete overview on Data please refer to the documentation).
Most of Wyrm’s toolbox methods are expecting a Data
object as an argument.
Since Data
is very flexible and does not impose for example the order of the
axes, it is important to abide a certain convention:
- Continuous Data
- Continuous Data is usually (but not limited to) raw EEG data. It has two
axes:
[time, channel]
. Channel should always be the last axis, time the second last. - Epoched Data
Epoched Data is often Continuous Data split into several equally long chunks (epochs). Each epoch usually belongs to a class. The axes in this case are
[class, time, channel]
. Class should always be the first axis, time the second last and channel the last one. This is consistent with Continuos Data.Epoched Data can also contain different data than (e.g. data in the frequency domain), but the class axis should always be the first.
- Feature Vector
- In the later steps of the data processing, one often deals no more with
continuous data but with feature vectors. Feature Vectors are similar to
Epoched data, since each vector usually belongs to a class. Thus the
axes are:
[class, fv]
.
You are free to follow the convention or not. If you do, most methods will work out of the box – off course you still have to think if a certain method makes sense on the current object at hand.
If you create non-conventional Data
objects, the methods will still work (if
they make sense), but you have to provide the methods an extra parameter, with
the index of the axis (or axes).
Associating Samples to Timestamps¶
The time marks the time at the beginning of the sample.
Example:
Time [ms] 0 10 20 30 ...
| | | |
Sample [#] [ 0 ][ 1 ][ 2 ][ 3 ]
The interpretation is that sample 0 contains the data from [0, 10)
, sample 1
contains [10, 20)
, and so on.
Intervals¶
Whenever you encounter a time interval with a start and stop value, the
convention is [start, stop)
(i.e. start is included, stop is excluded).
Example:
Time [ms] 0 10 20 30 ...
| | | |
Sample [#] [ 0 ][ 1 ][ 2 ][ 3 ]
Interval (0, 30) returns the samples 0, 1, 2