Anomaly Detection from Fetal ECG — A Case Study of IOT Anomaly Detection using GAN | Hacker Noon

Sharmistha Chatterjee (@sharmi1206)

In this blog, we discuss the role of the Variational Auto-Encoder (VAE) in detecting anomalies from fetal ECG signals.

Variational Auto-Encoders offer a way to accurately detect anomalies in seasonal metrics that occur at regular intervals (daily, weekly, bi-weekly, monthly, or periodic events at finer granularities of minutes or seconds), so that the concerned team can act in time. Such timely action helps teams recover from serious issues, or prevent them through measures such as predictive maintenance, in web applications, retail, IoT, telecom, and healthcare.

The metrics/KPIs that play an important role in determining anomalies contain noise that is assumed to be independent, zero-mean Gaussian at every point. In effect, a seasonal KPI is composed of a seasonal pattern with local variations plus Gaussian noise with its own statistics.

This article is published at

Role of IoT/Wearables

Portable, low-power fetal ECG collectors such as wearables have been designed for research and analysis; they can collect maternal abdominal ECG signals in real time. The ECG data can be sent to a smartphone client via Bluetooth so that the signals captured from the fetal head and the maternal abdomen can be analysed individually. The extracted fetal ECG signals can then be used to detect anomalies in fetal behavior.

Variational Auto-Encoder

Deep Bayesian networks use neural networks to express the relationships between variables in the training dataset, learning those relationships in a black-box fashion. Variational Auto-Encoders are deep Bayesian networks, often used for training and prediction, that use neural networks to model the posteriors of the distributions.

Variational Auto-Encoders (VAEs) are trained by maximizing a lower bound on the log-likelihood, the Evidence Lower Bound (ELBO), which is made differentiable through the reparameterization trick. The ELBO has two terms. The first is the expected log-likelihood of the data given the latent variable; maximizing it ties the generated sample (image/data) more closely to the latent variable, which makes the model more deterministic. The second is the KL divergence between the approximate posterior and the prior, which is minimized.
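As a rough illustration of the two ELBO terms, the sketch below evaluates an ELBO estimate in NumPy for a diagonal Gaussian posterior and a standard normal prior. The function names and the toy numbers are assumptions made for this example; they are not part of Donut or TFSnippet.

```python
import numpy as np

def gaussian_log_likelihood(x, mean, std):
    """Log-density of x under independent Gaussians N(mean, std^2), summed over dimensions."""
    var = std ** 2
    return float(np.sum(-0.5 * np.log(2.0 * np.pi * var) - (x - mean) ** 2 / (2.0 * var)))

def kl_to_standard_normal(z_mean, z_std):
    """Closed-form KL( N(z_mean, z_std^2) || N(0, I) ) for a diagonal Gaussian."""
    return float(0.5 * np.sum(z_std ** 2 + z_mean ** 2 - 1.0 - 2.0 * np.log(z_std)))

def elbo(x, x_mean, x_std, z_mean, z_std):
    """ELBO estimate = reconstruction log-likelihood minus KL(posterior || prior)."""
    return gaussian_log_likelihood(x, x_mean, x_std) - kl_to_standard_normal(z_mean, z_std)

# Toy values (hypothetical): a 3-point window and a 2-dimensional latent variable.
x = np.array([0.1, -0.2, 0.05])
x_mean = np.array([0.0, -0.1, 0.0])
x_std = np.array([0.5, 0.5, 0.5])
z_mean = np.array([0.3, -0.1])
z_std = np.array([0.9, 1.1])

value = elbo(x, x_mean, x_std, z_mean, z_std)
print("ELBO estimate:", value)
```

In a real VAE the decoder outputs (x_mean, x_std) depend on a sample of z, and the estimate is averaged over samples; here they are fixed numbers so that the two terms can be inspected in isolation.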

Characteristics/Architecture of DoNut

Donut recognizes the normal pattern of a partially abnormal window x and finds a good posterior in order to estimate how well x follows the normal pattern. Its fundamental characteristic is the ability to find good posteriors by reconstructing the normal points within abnormal windows. This property is built into training through M-ELBO (Modified ELBO), which turns out to be superior to excluding all windows that contain anomalies or missing points from the training data.

To summarize, the VAE-based anomaly detection algorithm in the Donut architecture employs the following three techniques:

Modified ELBO (M-ELBO) – Excludes the contribution of anomalous and missing points from the reconstruction term of the ELBO and scales the prior term by the fraction of normal points in the window. This lets the model learn to reconstruct the normal points of a window even when some of its points are abnormal.

Missing Data Injection for training – A data augmentation procedure that randomly sets a fraction of the normal points to zero, as if they were missing. It amplifies the effect of M-ELBO: the missing points are injected before each training epoch starts and restored after the epoch is finished.

MCMC Imputation for better anomaly detection – Improves posterior estimation at detection time by iteratively replacing the missing points of a test window with values reconstructed by the model.
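The first two techniques can be sketched in a few lines of NumPy. In the Donut paper, M-ELBO gives each point of a window a weight of 1 if it is normal and observed and 0 otherwise, and scales the prior term by the fraction of points with weight 1. The helper names, the injection rate and the toy arrays below are assumptions for illustration, not the Donut library's API.

```python
import numpy as np

def melbo_weights(labels, missing):
    """Return (alpha, beta) for one window.

    alpha[w] is 1 for a normal, observed point and 0 for an anomalous or
    missing point; beta is the fraction of points with alpha == 1.
    """
    alpha = 1.0 - np.logical_or(labels, missing).astype(np.float64)
    beta = alpha.mean()
    return alpha, beta

def inject_missing(values, missing, rate, rng):
    """Zero out a random fraction `rate` of points and mark them as missing."""
    injected = rng.random(values.shape) < rate
    new_values = np.where(injected, 0.0, values)
    new_missing = np.logical_or(missing, injected).astype(np.int32)
    return new_values, new_missing

rng = np.random.default_rng(0)
values = np.array([0.2, -0.1, 0.4, 3.0, 0.1, 0.0, -0.3, 0.2])
labels = np.array([0, 0, 0, 1, 0, 0, 0, 0])   # one labelled anomaly
missing = np.array([0, 0, 0, 0, 0, 1, 0, 0])  # one missing point (filled with zero)

alpha, beta = melbo_weights(labels, missing)
print(alpha, beta)

aug_values, aug_missing = inject_missing(values, missing, rate=0.25, rng=rng)
print(aug_values, aug_missing)
```

During training the reconstruction log-likelihood of each point would be multiplied by its alpha and the log-prior by beta, so abnormal points stop pulling the model towards reproducing them.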

The network structure of Donut. Gray nodes are random variables, and white nodes are layers. Source: Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications.

The data preparation stage deals with standardization, missing value injection, and grouping the data into sliding windows (of length W, say) over the key metrics, where each point x_t is processed as the window x_{t−W+1}, ..., x_t. The training process encompasses Modified ELBO and Missing Data Injection. In the final prediction stage, MCMC Imputation (as shown in the figure below) is applied to yield a better posterior distribution.
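The windowing step can be pictured with a short NumPy sketch. The function names and the toy series are assumptions for illustration; Donut performs its own windowing internally.

```python
import numpy as np

def standardize(values):
    """Shift and scale a series to zero mean and unit standard deviation."""
    mean, std = values.mean(), values.std()
    return (values - mean) / std, mean, std

def sliding_windows(values, window):
    """Return an array of shape (len(values) - window + 1, window).

    Row i holds x_i, ..., x_{i+window-1}, so the last element of each row
    is the point that the window is used to score.
    """
    starts = np.arange(len(values) - window + 1)[:, None]
    offsets = np.arange(window)[None, :]
    return values[starts + offsets]

series = np.arange(10, dtype=np.float64)   # toy series x_0 .. x_9
scaled, mean, std = standardize(series)
windows = sliding_windows(series, window=4)

print(windows.shape)   # (7, 4)
print(windows[0])      # [0. 1. 2. 3.]
print(windows[-1])     # [6. 7. 8. 9.]
```

A series of N points therefore yields N − W + 1 windows, which is why the number of scores returned for a test set is smaller than the number of test points.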

MCMC Imputation and Anomaly Detection Source

To learn more about the ELBO in VAEs, refer to the references below.

File Imports

import csv

import matplotlib.pyplot as plt
import mne
import numpy as np
import pandas as pd
import seaborn as sns
from donut import complete_timestamp, standardize_kpi
from sklearn.metrics import accuracy_score

sns.set(rc={'figure.figsize': (11, 4)})

Loading and TimeStamping the data

Here we add timestamps to the fetal ECG data, under the assumption that each data point is recorded at an interval of 1 second (although the data-set source states that the signals are recorded at 1 kHz). We then resample the data at an interval of 1 minute by averaging every 60 samples.

data_path = '../abdominal-and-direct-fetal-ecg-database-1.0.0/'
file_name = 'r10.edf'

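# Reconstructed call (the original line was cut off): read the EDF recording with MNE.
edf = mne.io.read_raw_edf(data_path + file_name)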
header = ','.join(edf.ch_names)
np.savetxt('r10.csv', edf.get_data().T, delimiter=',', header=header)

df = pd.read_csv('r10.csv')
periods = df.shape[0]

dti = pd.date_range('2018-01-01', periods=periods, freq='s')
print(dti.shape, df.shape)
# Index the frame by the generated timestamps so that it can be resampled.
df.index = dti

# Average every 60 one-second samples into a single one-minute sample.
df1 = df.resample('1min').mean()
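The resampling above assumes one sample per second. If the 1 kHz rate stated by the data-set source were used instead, the timestamps could be spaced one millisecond apart. The sketch below shows the idea on a synthetic sine wave standing in for the EDF recording; the variable names and the choice of a 1-second resampling interval are assumptions for illustration.

```python
import numpy as np
import pandas as pd

fs_hz = 1000                 # sampling rate stated by the data-set source
n_samples = 5 * fs_hz        # 5 seconds of synthetic data
signal = np.sin(2 * np.pi * 1.0 * np.arange(n_samples) / fs_hz)  # 1 Hz sine

# One timestamp per millisecond instead of one per second.
index = pd.date_range('2018-01-01', periods=n_samples, freq='ms')
demo = pd.DataFrame({'signal': signal}, index=index)

# Average each block of 1000 samples into one value per second.
per_second = demo.resample('1s').mean()
print(per_second.shape)
```

With the true sampling rate, averaging over a full minute would smooth away individual heartbeats, so the resampling interval is a modelling choice that deserves some care.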

Once the data is indexed by timestamp, we plot the individual features and look for seasonality patterns, if any. We also add a label feature that marks potential anomalies in the input data, based on large fluctuations of the direct fetal signal, referred to here as the brain signal (>= .00025 or <= -.00025). We chose the brain signal because it closely resembles the curves and spikes of the 4 abdominal signals.

Data Labelling and Plotting the Features

As there are 5 signals in total (one from the fetal head and 4 from the abdomen), we label the data using the direct signal and then plot every feature separately.

df1.rename_axis('timestamp', inplace=True)

df1['label'] =  np.where((df1['# Direct_1'] >= .00025) | (df1['# Direct_1'] <= -.00025), 1, 0)

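cols = list(df1.columns)  # every signal plus the new label column
for i, col in enumerate(cols):
    if col != 'timestamp':
        plt.figure(figsize=(20, 10))
        plt.plot(df1[col], marker='^', color='red')
        plt.savefig('figs/f_' + str(i) + '.png')  # assumes the figs/ directory already exists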

Training the Model using Donut (Variational Auto-Encoder)

df2 = df1.reset_index()
df2 = df2.reset_index(drop=True)  # 'timestamp' is now a regular column; keep a plain integer index before locating the missing data points

# Read the raw data for 1st feature Direct_1
timestamp, values, labels = df2['timestamp'], df2['# Direct_1'], df2['label']
# Train without labels: overwrite them with all zeros
# (this discards the threshold-based labels read above).
labels = np.zeros_like(values, dtype=np.int32)

# Complete the timestamp, and obtain the missing point indicators.
timestamp, missing, (values, labels) = \
    complete_timestamp(timestamp, (values, labels))

# Split the training and testing data.
test_portion = 0.3
test_n = int(len(values) * test_portion)
train_values, test_values = values[:-test_n], values[-test_n:]
train_labels, test_labels = labels[:-test_n], labels[-test_n:]
train_missing, test_missing = missing[:-test_n], missing[-test_n:]

# Standardize the training and testing data.
train_values, mean, std = standardize_kpi(
    train_values, excludes=np.logical_or(train_labels, train_missing))
test_values, _, _ = standardize_kpi(test_values, mean=mean, std=std)

import tensorflow as tf
from donut import Donut
from tensorflow import keras as K
from tfsnippet.modules import Sequential
from donut import DonutTrainer, DonutPredictor

# We build the entire model within the scope of `model_vs`,
# it should hold exactly all the variables of `model`, including
# the variables created by Keras layers.
with tf.variable_scope('model') as model_vs:
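    model = Donut(
        # Hidden network for p(x|z): 2 dense layers of 50 ReLU units.
        h_for_p_x=Sequential([
            K.layers.Dense(50, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
            K.layers.Dense(50, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
        ]),
        # Hidden network for q(z|x): 2 dense layers of 50 ReLU units.
        h_for_q_z=Sequential([
            K.layers.Dense(50, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
            K.layers.Dense(50, kernel_regularizer=K.regularizers.l2(0.001),
                           activation=tf.nn.relu),
        ]),
        x_dims=120,  # window length, as shown in the model summary
        z_dims=5,    # latent dimensions, as shown in the model summary
    )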

trainer = DonutTrainer(model=model, model_vs=model_vs, max_epoch=512)
predictor = DonutPredictor(model)

with tf.Session().as_default():
    trainer.fit(train_values, train_labels, train_missing, mean, std)
    test_score = predictor.get_score(test_values, test_missing)

    pred_score = np.array(test_score).reshape(-1, 1)
    print(len(test_missing), len(train_missing), len(pred_score), len(test_values))
    y_pred = np.argmax(pred_score, axis=1)

The model is trained with default parameters as listed below:

grad_clip_norm=10.0 #Clip gradient by this norm.

The model summary, with its trainable parameters and hidden layers, is shown below:

Trainable Parameters (24,200 in total)
donut/p_x_given_z/x_mean/bias (120,) 120
donut/p_x_given_z/x_mean/kernel (50, 120) 6,000
donut/p_x_given_z/x_std/bias (120,) 120
donut/p_x_given_z/x_std/kernel (50, 120) 6,000
donut/q_z_given_x/z_mean/bias (5,) 5
donut/q_z_given_x/z_mean/kernel (50, 5) 250
donut/q_z_given_x/z_std/bias (5,) 5
donut/q_z_given_x/z_std/kernel (50, 5) 250
sequential/forward/_0/dense/bias (50,) 50
sequential/forward/_0/dense/kernel (5, 50) 250
sequential/forward/_1/dense_1/bias (50,) 50
sequential/forward/_1/dense_1/kernel (50, 50) 2,500
sequential_1/forward/_0/dense_2/bias (50,) 50
sequential_1/forward/_0/dense_2/kernel (120, 50) 6,000
sequential_1/forward/_1/dense_3/bias (50,) 50
sequential_1/forward/_1/dense_3/kernel (50, 50) 2,500

This model is obtained from the following code snippet:

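model = Donut(
    h_for_p_x=Sequential([
        K.layers.Dense(50, kernel_regularizer=K.regularizers.l2(0.001),
                       activation=tf.nn.relu),
        K.layers.Dense(50, kernel_regularizer=K.regularizers.l2(0.001),
                       activation=tf.nn.relu),
    ]),
    h_for_q_z=Sequential([
        K.layers.Dense(50, kernel_regularizer=K.regularizers.l2(0.001),
                       activation=tf.nn.relu),
        K.layers.Dense(50, kernel_regularizer=K.regularizers.l2(0.001),
                       activation=tf.nn.relu),
    ]),
    x_dims=120,
    z_dims=5,
)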

This Donut network uses the variational auto-encoder (“Auto-Encoding Variational Bayes”, Kingma, D.P. and Welling, M.), a deep Bayesian network with observed variable x and latent variable z. The VAE is built using TFSnippet (a library for writing and testing TensorFlow models). The generative process starts with the latent variable z, drawn from the prior distribution p(z), which is passed through a hidden network h(z); the observed variable x then follows the distribution p(x | h(z)). Because the posterior p(z | x) cannot be computed directly, variational inference techniques are adopted to train a separate distribution q(z | h(x)) that approximates it.

Here each Sequential function creates a multi-layer perceptron with 2 hidden layers of 50 units and ReLU activation. The 2 hidden networks, “h_for_p_x” and “h_for_q_z”, are created using the same Sequential function (as evident from the entries sequential and sequential_1 in the model summary), and they serve “p_x_given_z” and “q_z_given_x” respectively.
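The total of 24,200 trainable parameters in the model summary can be reproduced by hand from the layer shapes listed there (window length 120, latent size 5, hidden width 50). The helper below is a small arithmetic check written for this article, not part of Donut.

```python
def dense_params(n_in, n_out):
    """Parameters of a fully connected layer: weight matrix plus bias vector."""
    return n_in * n_out + n_out

x_dims, z_dims, hidden = 120, 5, 50

# Hidden network for p(x|z): z_dims -> 50 -> 50
h_for_p_x = dense_params(z_dims, hidden) + dense_params(hidden, hidden)
# Hidden network for q(z|x): x_dims -> 50 -> 50
h_for_q_z = dense_params(x_dims, hidden) + dense_params(hidden, hidden)
# Output heads: a mean layer and a std layer for x, and the same for z
p_x_heads = 2 * dense_params(hidden, x_dims)
q_z_heads = 2 * dense_params(hidden, z_dims)

total = h_for_p_x + h_for_q_z + p_x_heads + q_z_heads
print(total)  # 24200
```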

[Epoch 12/512, Step 100, ETA 41.77s] step time: 0.008595s (±0.04177s); valid time: 0.1461s; loss: 132.838 (±3.72495); valid loss: 321.406 (*)
[Epoch 20/512, Step 180, ETA 28.82s] Learning rate decreased to 0.0005625000000000001
[Epoch 23/512, Step 200, ETA 27.52s] step time: 0.002694s (±0.0006261s); valid time: 0.003591s; loss: 113.566 (±5.31845); valid loss: 453.668
[Epoch 30/512, Step 270, ETA 23.7s] Learning rate decreased to 0.00042187500000000005
[Epoch 34/512, Step 300, ETA 22.59s] step time: 0.002731s (±0.0006566s); valid time: 0.003669s; loss: 104.204 (±3.14462); valid loss: 385.86
[Epoch 40/512, Step 360, ETA 20.9s] Learning rate decreased to 0.00031640625000000006
[Epoch 45/512, Step 400, ETA 20s] step time: 0.002748s (±0.0006577s); valid time: 0.003666s; loss: 99.7013 (±2.49588); valid loss: 345.38
[Epoch 50/512, Step 450, ETA 19.14s] Learning rate decreased to 0.00023730468750000005
[Epoch 56/512, Step 500, ETA 18.36s] step time: 0.002783s (±0.0006466s); valid time: 0.003579s; loss: 97.9879 (±2.45387); valid loss: 337.236
[Epoch 60/512, Step 540, ETA 17.83s] Learning rate decreased to 0.00017797851562500002
[Epoch 67/512, Step 600, ETA 17.14s] step time: 0.0028s (±0.0006429s); valid time: 0.003572s; loss: 96.7373 (±2.23035); valid loss: 327.604
[Epoch 70/512, Step 630, ETA 16.84s] Learning rate decreased to 0.00013348388671875002
[Epoch 78/512, Step 700, ETA 16.18s] step time: 0.002809s (±0.000654s); valid time: 0.004116s; loss: 95.7726 (±2.18217); valid loss: 328.016
[Epoch 80/512, Step 720, ETA 16s] Learning rate decreased to 0.00010011291503906251
[Epoch 89/512, Step 800, ETA 15.35s] step time: 0.002728s (±0.0005963s); valid time: 0.0034s; loss: 95.1639 (±2.41546); valid loss: 330.26
[Epoch 90/512, Step 810, ETA 15.28s] Learning rate decreased to 7.508468627929689e-05
[Epoch 100/512, Step 900, ETA 14.67s] step time: 0.002801s (±0.000503s); valid time: 0.002966s; loss: 94.8704 (±2.12291); valid loss: 329.362
[Epoch 100/512, Step 900, ETA 14.67s] Learning rate decreased to 5.631351470947266e-05
[Epoch 110/512, Step 990, ETA 14.11s] Learning rate decreased to 4.22351360321045e-05
[Epoch 112/512, Step 1000, ETA 14.08s] step time: 0.002881s (±0.0005895s); valid time: 0.004087s; loss: 94.6089 (±2.419); valid loss: 329.868
[Epoch 120/512, Step 1080, ETA 13.59s] Learning rate decreased to 3.167635202407837e-05
[Epoch 123/512, Step 1100, ETA 13.5s] step time: 0.002734s (±0.000673s); valid time: 0.003972s; loss: 94.2929 (±2.28663); valid loss: 330.661
[Epoch 130/512, Step 1170, ETA 13.24s] Learning rate decreased to 2.3757264018058778e-05
[Epoch 134/512, Step 1200, ETA 13.13s] step time: 0.003223s (±0.0005896s); valid time: 0.00483s; loss: 94.2901 (±2.48745); valid loss: 333.082
[Epoch 140/512, Step 1260, ETA 12.96s] Learning rate decreased to 1.7817948013544083e-05
[Epoch 145/512, Step 1300, ETA 12.83s] step time: 0.003464s (±0.000643s); valid time: 0.003785s; loss: 94.0924 (±2.36603); valid loss: 331.732
[Epoch 150/512, Step 1350, ETA 12.59s] Learning rate decreased to 1.3363461010158061e-05
[Epoch 156/512, Step 1400, ETA 12.34s] step time: 0.002833s (±0.000543s); valid time: 0.003443s; loss: 94.1853 (±2.36213); valid loss: 330.787
[Epoch 160/512, Step 1440, ETA 12.14s] Learning rate decreased to 1.0022595757618546e-05
[Epoch 167/512, Step 1500, ETA 11.84s] step time: 0.002776s (±0.0006271s); valid time: 0.00391s; loss: 93.9636 (±2.63707); valid loss: 332.806
[Epoch 170/512, Step 1530, ETA 11.7s] Learning rate decreased to 7.51694681821391e-06
[Epoch 178/512, Step 1600, ETA 11.39s] step time: 0.002823s (±0.00107s); valid time: 0.008962s; loss: 93.8932 (±2.20515); valid loss: 333.171
[Epoch 180/512, Step 1620, ETA 11.32s] Learning rate decreased to 5.637710113660432e-06
[Epoch 189/512, Step 1700, ETA 10.96s] step time: 0.002832s (±0.000561s); valid time: 0.002961s; loss: 93.6804 (±1.92295); valid loss: 333.581
[Epoch 190/512, Step 1710, ETA 10.9s] Learning rate decreased to 4.228282585245324e-06
[Epoch 200/512, Step 1800, ETA 10.51s] step time: 0.002724s (±0.0006145s); valid time: 0.004207s; loss: 94.0701 (±2.31256); valid loss: 332.533
[Epoch 200/512, Step 1800, ETA 10.51s] Learning rate decreased to 3.171211938933993e-06
[Epoch 210/512, Step 1890, ETA 10.1s] Learning rate decreased to 2.3784089542004944e-06
[Epoch 212/512, Step 1900, ETA 10.07s] step time: 0.002673s (±0.0005291s); valid time: 0.003851s; loss: 93.9893 (±2.47393); valid loss: 332.373
[Epoch 220/512, Step 1980, ETA 9.709s] Learning rate decreased to 1.7838067156503708e-06
[Epoch 223/512, Step 2000, ETA 9.62s] step time: 0.002534s (±0.0005212s); valid time: 0.003205s; loss: 93.8731 (±2.53016); valid loss: 333.33
[Epoch 230/512, Step 2070, ETA 9.325s] Learning rate decreased to 1.337855036737778e-06
[Epoch 234/512, Step 2100, ETA 9.209s] step time: 0.002822s (±0.0007249s); valid time: 0.004339s; loss: 93.8581 (±2.22449); valid loss: 333.22
[Epoch 240/512, Step 2160, ETA 8.984s] Learning rate decreased to 1.0033912775533336e-06
[Epoch 245/512, Step 2200, ETA 8.812s] step time: 0.002827s (±0.0007307s); valid time: 0.003549s; loss: 93.9701 (±2.11498); valid loss: 333.636
[Epoch 250/512, Step 2250, ETA 8.612s] Learning rate decreased to 7.525434581650002e-07
[Epoch 256/512, Step 2300, ETA 8.439s] step time: 0.003023s (±0.0007664s); valid time: 0.00407s; loss: 93.8546 (±2.38006); valid loss: 334.049
[Epoch 260/512, Step 2340, ETA 8.323s] Learning rate decreased to 5.644075936237502e-07
[Epoch 267/512, Step 2400, ETA 8.093s] step time: 0.003219s (±0.001158s); valid time: 0.004055s; loss: 93.7858 (±2.05185); valid loss: 333.107
[Epoch 270/512, Step 2430, ETA 7.985s] Learning rate decreased to 4.233056952178126e-07
[Epoch 278/512, Step 2500, ETA 7.737s] step time: 0.003088s (±0.0006289s); valid time: 0.004418s; loss: 93.8444 (±2.4014); valid loss: 332.4
[Epoch 280/512, Step 2520, ETA 7.666s] Learning rate decreased to 3.1747927141335945e-07
[Epoch 289/512, Step 2600, ETA 7.384s] step time: 0.003188s (±0.0007085s); valid time: 0.004121s; loss: 93.8303 (±2.79036); valid loss: 332.357
[Epoch 290/512, Step 2610, ETA 7.348s] Learning rate decreased to 2.3810945356001957e-07
[Epoch 300/512, Step 2700, ETA 7.005s] step time: 0.002907s (±0.000593s); valid time: 0.003156s; loss: 93.8036 (±2.13392); valid loss: 334.06
[Epoch 300/512, Step 2700, ETA 7.005s] Learning rate decreased to 1.7858209017001467e-07
[Epoch 310/512, Step 2790, ETA 6.666s] Learning rate decreased to 1.33936567627511e-07
[Epoch 312/512, Step 2800, ETA 6.63s] step time: 0.002883s (±0.0005195s); valid time: 0.003597s; loss: 93.8148 (±2.01193); valid loss: 333.854
[Epoch 320/512, Step 2880, ETA 6.325s] Learning rate decreased to 1.0045242572063325e-07
[Epoch 323/512, Step 2900, ETA 6.246s] step time: 0.002773s (±0.0006961s); valid time: 0.00378s; loss: 93.9129 (±2.12316); valid loss: 333.108
[Epoch 330/512, Step 2970, ETA 5.969s] Learning rate decreased to 7.533931929047494e-08
[Epoch 334/512, Step 3000, ETA 5.854s] step time: 0.002614s (±0.0006314s); valid time: 0.003484s; loss: 93.7884 (±2.0612); valid loss: 333.269
[Epoch 340/512, Step 3060, ETA 5.622s] Learning rate decreased to 5.650448946785621e-08
[Epoch 345/512, Step 3100, ETA 5.47s] step time: 0.002697s (±0.0006147s); valid time: 0.003524s; loss: 93.7978 (±2.12356); valid loss: 334.431
[Epoch 350/512, Step 3150, ETA 5.279s] Learning rate decreased to 4.237836710089216e-08
[Epoch 356/512, Step 3200, ETA 5.094s] step time: 0.002814s (±0.0006715s); valid time: 0.003473s; loss: 93.8834 (±2.06846); valid loss: 333.328
[Epoch 360/512, Step 3240, ETA 4.943s] Learning rate decreased to 3.178377532566912e-08
[Epoch 367/512, Step 3300, ETA 4.727s] step time: 0.002899s (±0.000598s); valid time: 0.00413s; loss: 93.7865 (±2.3518); valid loss: 332.528
[Epoch 370/512, Step 3330, ETA 4.618s] Learning rate decreased to 2.3837831494251838e-08
[Epoch 378/512, Step 3400, ETA 4.359s] step time: 0.002847s (±0.0006048s); valid time: 0.003002s; loss: 93.9225 (±2.33301); valid loss: 333.769
[Epoch 380/512, Step 3420, ETA 4.285s] Learning rate decreased to 1.787837362068888e-08
[Epoch 389/512, Step 3500, ETA 3.997s] step time: 0.002976s (±0.0005665s); valid time: 0.004052s; loss: 93.8367 (±2.03933); valid loss: 333.584
[Epoch 390/512, Step 3510, ETA 3.959s] Learning rate decreased to 1.3408780215516658e-08
[Epoch 400/512, Step 3600, ETA 3.63s] step time: 0.002724s (±0.0007187s); valid time: 0.004206s; loss: 93.9743 (±2.20482); valid loss: 331.93
[Epoch 400/512, Step 3600, ETA 3.63s] Learning rate decreased to 1.0056585161637493e-08
[Epoch 410/512, Step 3690, ETA 3.3s] Learning rate decreased to 7.542438871228119e-09
[Epoch 412/512, Step 3700, ETA 3.265s] step time: 0.002767s (±0.0006224s); valid time: 0.00375s; loss: 93.9645 (±2.37689); valid loss: 333.563
[Epoch 420/512, Step 3780, ETA 2.972s] Learning rate decreased to 5.656829153421089e-09
[Epoch 423/512, Step 3800, ETA 2.901s] step time: 0.002742s (±0.0006882s); valid time: 0.003976s; loss: 93.9097 (±2.12219); valid loss: 333.391
[Epoch 430/512, Step 3870, ETA 2.649s] Learning rate decreased to 4.242621865065817e-09
[Epoch 434/512, Step 3900, ETA 2.547s] step time: 0.003172s (±0.0007754s); valid time: 0.004006s; loss: 93.8162 (±2.21242); valid loss: 331.59
[Epoch 440/512, Step 3960, ETA 2.329s] Learning rate decreased to 3.1819663987993622e-09
[Epoch 445/512, Step 4000, ETA 2.183s] step time: 0.002705s (±0.0005584s); valid time: 0.003788s; loss: 94.0521 (±2.10116); valid loss: 333.506
[Epoch 450/512, Step 4050, ETA 2.002s] Learning rate decreased to 2.386474799099522e-09
[Epoch 456/512, Step 4100, ETA 1.822s] step time: 0.002839s (±0.0007087s); valid time: 0.004977s; loss: 94.0687 (±2.09307); valid loss: 333.308
[Epoch 460/512, Step 4140, ETA 1.68s] Learning rate decreased to 1.7898560993246414e-09
[Epoch 467/512, Step 4200, ETA 1.465s] step time: 0.003049s (±0.0007399s); valid time: 0.004228s; loss: 93.8364 (±2.19231); valid loss: 333.25
[Epoch 470/512, Step 4230, ETA 1.359s] Learning rate decreased to 1.3423920744934811e-09
[Epoch 478/512, Step 4300, ETA 1.106s] step time: 0.00306s (±0.0007565s); valid time: 0.003902s; loss: 94.0478 (±2.44438); valid loss: 333.487
[Epoch 480/512, Step 4320, ETA 1.034s] Learning rate decreased to 1.0067940558701108e-09
[Epoch 489/512, Step 4400, ETA 0.746s] step time: 0.002718s (±0.00059s); valid time: 0.004008s; loss: 93.7664 (±2.16592); valid loss: 331.328
[Epoch 490/512, Step 4410, ETA 0.7099s] Learning rate decreased to 7.550955419025831e-10
[Epoch 500/512, Step 4500, ETA 0.3865s] step time: 0.002643s (±0.0005873s); valid time: 0.003841s; loss: 93.9062 (±2.45867); valid loss: 332.162
[Epoch 500/512, Step 4500, ETA 0.3865s] Learning rate decreased to 5.663216564269373e-10
[Epoch 510/512, Step 4590, ETA 0.06433s] Learning rate decreased to 4.24741242320203e-10
[Epoch 512/512, Step 4600, ETA 0.02862s] step time: 0.002872s (±0.000836s); valid time: 0.003878s; loss: 93.8652 (±2.60094); valid loss: 333.333
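The learning-rate lines in the log above are consistent with a rate that starts at 0.001 and is multiplied by 0.75 every 10 epochs. The starting value and the first decay at epoch 10 are inferred from the logged numbers (the excerpt begins at epoch 12), so treat them as assumptions. A short check:

```python
def learning_rate(epoch, initial_lr=0.001, anneal_epochs=10, anneal_factor=0.75):
    """Learning rate in effect once the decay logged at `epoch` has been applied."""
    return initial_lr * anneal_factor ** (epoch // anneal_epochs)

for epoch in (20, 30, 40, 100):
    print(epoch, learning_rate(epoch))
```

By epoch 100 the rate is already below 1e-4, which fits the log: the training loss barely moves after that point, and the validation loss stays around 333, above the 321.4 reported at step 100.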

Plotting the Anomalies/Non-Anomalies together or Individually

We plot the anomalies (in red) together with non-anomalies (green) and also try to superimpose both of them together in the same graph so as to analyse the combined impact.

In the Donut prediction, the higher the prediction score, the less anomalous the data point. We choose -3 as the threshold: points with a score at or below it are treated as anomalous.

We also compute the number of inliers and outliers and plot them against their timestamps along the x axis.

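    plt.figure(figsize=(20, 10))
    split_test = int(test_portion * df.shape[0])

    # Scores at or below -3 are flagged as anomalies (1); the rest are normal (0).
    anomaly = np.where(pred_score > -3, 0, 1)

    # Copy the slice so that adding a column does not write into a view of df2.
    df3 = df2.iloc[-anomaly.shape[0]:].copy()
    df3['outlier'] = anomaly.ravel()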

    print(df3.head(2), df3.shape)
    print("Split", split_test, df3.shape)
    di = df3[df3['outlier'] == 0]
    do = df3[df3['outlier'] == 1]

    di = di.set_index(['timestamp'])
    do = do.set_index(['timestamp'])

    print("Outlier and Inlier Numbers", do.shape, di.shape, di.columns, do.columns)

    outliers = pd.Series(do['# Direct_1'], do.index)
    inliers = pd.Series(di['# Direct_1'], di.index)

    plt.plot(do['# Direct_1'], marker='^', color='red', label="Anomalies")
    plt.plot(di['# Direct_1'],  marker='^', color='green', label="Non Anomalies")

    plt.legend(['Anomalies', 'Non Anomalies'])
    plt.title('Anomalies and Non Anomalies from Fetal Head Scan')

    di = di.reset_index()
    do = do.reset_index()
    plt.figure(figsize=(20, 10))

    do.plot.scatter(y ='# Direct_1', x = 'timestamp', marker='^', color='red', label="Anomalies")

    plt.xlim(df3['timestamp'].min(), df3['timestamp'].max())
    plt.ylim(-.0006, .0006)
    plt.title('Anomalies from Fetal Head Scan')
    plt.figure(figsize=(20, 10))
    di.plot.scatter(y='# Direct_1', x='timestamp', marker='^', color='green', label="Non Anomalies")
    plt.legend(['Non Anomalies'])
    plt.xlim(df3['timestamp'].min(), df3['timestamp'].max())
    plt.ylim(-.0006, .0006)
    plt.title('Non Anomalies from Fetal Head Scan')

Anomaly Plots for Direct electrocardiogram recorded from fetal head

The three consecutive plots display anomalous and non-anomalous points, plotted together or separately as labeled, for the signal obtained from the fetal head scan.

Anomaly Plots for Direct electrocardiogram recorded from maternal abdomen

The three consecutive plots display anomalous and non-anomalous points, plotted together or separately as labeled, for the signals obtained from the maternal abdomen.



Some of the key learnings from the Donut architecture are:

Anomaly detection techniques based on dimensionality reduction need a reconstruction mechanism to identify the variance and, consequently, the anomalies.

Anomaly detection with generative models needs to train with both normal and abnormal data.

Data imputation should not rely on any algorithm weaker than the VAE, as this may degrade performance.

To discover anomalies quickly, the reconstruction probability is computed for the last point in every window of x.
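Because a score is produced only for the last point of each window, a series of N points with window length W yields N − W + 1 scores. The sketch below, using hypothetical helper names and toy scores, lines the scores up with the original points and applies the same -3 threshold used earlier.

```python
import numpy as np

def align_scores(n_points, scores, window):
    """Pad scores with NaN so that entry i lines up with point i.

    Each score belongs to the last point of a window, so the first
    `window - 1` points have no score.
    """
    if len(scores) != n_points - window + 1:
        raise ValueError("expected n_points - window + 1 scores")
    padded = np.full(n_points, np.nan)
    padded[window - 1:] = scores
    return padded

def flag_anomalies(padded_scores, threshold):
    """Flag points whose score is at or below the threshold; unscored points stay unflagged."""
    flags = np.zeros(len(padded_scores), dtype=bool)
    valid = ~np.isnan(padded_scores)
    flags[valid] = padded_scores[valid] <= threshold
    return flags

# Toy example: 8 points, window of 3, hence 6 scores.
scores = np.array([-1.0, -0.5, -4.2, -0.8, -3.0, -2.9])
padded = align_scores(n_points=8, scores=scores, window=3)
flags = flag_anomalies(padded, threshold=-3.0)
print(np.flatnonzero(flags))  # [4 6]
```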

We should also explore other variants of Auto Encoders (RNN, LSTM, LSTM with Attention Networks, Stacked Convolutional Bidirectional LSTM) in discovering anomalies for IoT devices.

This article is also published at

The complete source code is available at


References

  1. Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications
  2. Don’t Blame the ELBO! A Linear VAE Perspective on Posterior Collapse
  3. — Installation and API Usage
  4. Understanding disentangling in β-VAE
  5. A Fetal ECG Monitoring System Based on the Android Smartphone