Sculpt Wrapper: Regression¶
In this tutorial we show how to wrap a simple regression model with Capsa Sculpt wrapper in Tensorflow.
We will demonstrate how the Sculpt-wrapped model is able to provide valuable insights into the model and data, beyond what a typical deep neural network model is capable of.
Suppose we are given a training dataset consisting of observation pairs \((x_i,y_i)\) for \(i = 1, 2,\dots\) where the outputs \(y_i\) are modelled as
The variable \(\epsilon\) represents noise (or volatility) in the output \(y\), and measures deviation from the base model \(y = x + \sin(2x)\). This noise is heteroscedastic since its variance \(\sigma(x)^2\) varies with \(x\); specifically, we consider
This type of model and noise can approximate some real world applications. For example, financial markets tend to follow relatively cyclic patterns, punctuated by periods of high volatility (risk).
Step 1: Initial Setup¶
To begin, we’ll generate some training data from the model described above.
import tensorflow as tf
import math
import matplotlib.pyplot as plt
import numpy as np
tf.random.set_seed(1)
np.random.seed(1)
np.set_printoptions(threshold=10)
def fn(x):
y = np.sin(2*x) + x
high_risk_x = (x > 0) & (x < np.pi / 2)
noise = np.random.normal(0, 1, x.shape) * high_risk_x + np.random.normal(0, 0.2, x.shape) * ~high_risk_x
y += noise
return y
x = np.linspace(-math.pi,math.pi,1000)[:, np.newaxis]
y = fn(x)
x = tf.convert_to_tensor(x, dtype=tf.float32)
y = tf.convert_to_tensor(y, dtype=tf.float32)
When plotting the data, we can see the dramatic change in the volatility (aleatoric risk) between \(x=0\) and \(x=\pi/2\).
plt.scatter(x, y, label='Ground truth data', s=5, c='blue',alpha=0.5)
plt.vlines([0,np.pi/2], -5, 5, linestyle='dashed', color='k')
plt.xlabel('x')
plt.ylabel('y')
plt.title('Training Data')
plt.show()
In this tutorial we will model the data using a simple sequential deep learning model consisting of 4 fully-connected layers.
initializer = tf.keras.initializers.GlorotNormal(seed=123)
W1 = tf.Variable(initializer(shape=(1, 64)))
W2 = tf.Variable(initializer(shape=(64, 64)))
W3 = tf.Variable(initializer(shape=(64, 64)))
W4 = tf.Variable(initializer(shape=(64, 1)))
initializer = tf.keras.initializers.Constant(0.05)
B1 = tf.Variable(initializer(shape=(64)))
B2 = tf.Variable(initializer(shape=(64)))
B3 = tf.Variable(initializer(shape=(64)))
B4 = tf.Variable(initializer(shape=(1)))
@tf.function
def model(x):
x = tf.matmul(x, W1) + B1
x = tf.nn.relu(x)
x = tf.matmul(x, W2) + B2
x = tf.nn.relu(x)
x = tf.matmul(x, W3) + B3
x = tf.nn.relu(x)
x = tf.matmul(x, W4) + B4
return x
Step 2: Wrapping the model¶
Wrapping the model is as simple as importing sculpt
from the capsa_tf
and wrapping the model with a sculpt.Wrapper
object. This object requires a single argument, the distribution type with which to model the aleatoric risk. In this case, we use the normal (Gaussian) distribution, sculpt.Normal
. The wrapper will produce better risk estimates if the selected distribution is close to the actual noise distribution, but the normal distribution is a good approximation in many cases.
from capsa_tf import sculpt
# Wrap model with sculpt wrapper
dist = sculpt.Normal
wrapped_model = sculpt.Wrapper(dist)(model)
When used normally, the wrapped model functions exactly the same way as the original model.
original_predictions = model(x)
wrapped_predictions = wrapped_model(x)
if tf.equal(tf.cast(tf.math.reduce_all(original_predictions == wrapped_predictions), tf.int32), 1):
print('Wrapped model outputs match original model')
else:
print('Wrapped model outputs DO NOT match original model')
Wrapped model outputs match original model
However, the wrapped model takes an optional keyword argument, return_risk
, which outputs aleatoric risk when set to True
, as seen below.
pred, risk = wrapped_model(x, return_risk=True)
if tf.equal(tf.cast(tf.math.reduce_all(original_predictions == pred), tf.int32), 1):
print('Wrapped model outputs still match original model')
else:
print('Wrapped model outputs DO NOT match original model')
print('Predictions:', tf.transpose(pred)) # transposed for prettier printing
print('Risk:', tf.transpose(risk))
Wrapped model outputs still match original model
Predictions: tf.Tensor([[0.15077989 0.15070572 0.15063186 ... 0.68953186 0.69050485 0.6914777 ]], shape=(1, 1000), dtype=float32)
Risk: tf.Tensor([[2.1531339 2.1536212 2.1541088 ... 2.410312 2.4097764 2.4092405]], shape=(1, 1000), dtype=float32)
Step 3: Training¶
Most wrappers require re-training or finetuning in order to produce accurate risk estimates. Sculpt is one of these wrappers: it will not provide useful outputs unless the model is re-trained or finetuned. Furthermore, it requires a one-line modification to the original training procedure, which we demonstrate below.
Note
Read the wrapper documentations in order to pick the wrapper most suitable for your use case.
def train_step(x, y):
with tf.GradientTape() as tape:
y_hat, risk = wrapped_model(x, return_risk=True)
### NEW ###
# Using the sculpt wrapper requires using a distribution-specific loss function, like this:
loss = dist.loss_function(y, y_hat, risk)
### The rest of the code in this block is typical Tensorflow training code ###
# The following line is optional:
# loss += _YOUR_ORIGINAL_LOSS_FUNCTION
trainable_vars = tape.watched_variables()
gradients = tape.gradient(loss, trainable_vars)
optimizer.apply_gradients(zip(gradients, trainable_vars))
return loss
# train
optimizer = tf.optimizers.Adam(learning_rate=1e-4)
total_loss = []
n_epochs = 2500
for ep in range(n_epochs):
loss = train_step(x, y)
print(f'epoch: {ep+1}/{n_epochs}, mean loss: {loss}', end='\r')
total_loss.append(loss)
epoch: 2500/2500, mean loss: -0.6947721838951111
Now that we’ve trained the (wrapped) model, we can plot its predictions to verify that it learned a good approximation of the base function. We’ll use some new test data generated from the same model.
test_x = np.linspace(-math.pi,math.pi,2000)[:, np.newaxis]
test_y = fn(test_x)
test_x = tf.convert_to_tensor(test_x, dtype=tf.float32)
test_y = tf.convert_to_tensor(test_y, dtype=tf.float32)
test_predictions = wrapped_model(test_x)
plt.scatter(test_x, test_y, label='Training Data', s=5, c='blue',alpha=0.5)
plt.plot(test_x, test_predictions, label='Model Prediction', c='red')
_ = plt.legend(loc='best')
Step 4: Risk Evaluation¶
Finally, let’s take a look at the wrapped model’s risk estimates. Since we used the Sculpt wrapper, we expect that the risk will represent the aleatoric uncertainty (or “volatility”) in the outputs. More specifically, it represents the standard deviation of the output.
Since we are using a Normal distribution approximation, we expect that 95% of the output data should lie within 2 standard deviations of the predicted output. We visualize and test this below.
y_pred, risk = wrapped_model(test_x, return_risk = True)
# Convert to numpy for easier plotting
x_np, y_np, y_pred_np, risk_np = test_x.numpy(), test_y.numpy(), y_pred.numpy(), risk.numpy()
plt.fill_between(x_np.ravel(), y_pred_np.ravel()+2*risk_np.ravel(),y_pred_np.ravel()-2*risk_np.ravel(),color='orange',alpha=1, label='95% Confidence Interval')
plt.scatter(x_np, y_np, label='Ground truth data', s=5, c='blue',alpha=0.5)
plt.plot(x_np, y_pred_np, label='Prediction', c='red')
plt.legend(loc='best')
within_two_std = (np.abs(y_np - y_pred) < 2 * risk_np).sum()
perc_within_two_std = within_two_std / y_np.size
print(f'{perc_within_two_std*100:.1f}% of the data points are within two standard deviations')
94.2% of the data points are within two standard deviations
The results above suggest that the Sculpt wrapper accurately captures the distribution of the outputs. In fact, since we know the true volatility (risk) function, we can evaluate the predicted risk directly, as we do below, and see that the risk estimates are nearly perfect.
plt.plot(x_np, risk_np, label='Estimated Risk', c='green')
plt.plot([-np.pi, 0, 0, np.pi/2, np.pi/2, np.pi], [0.2,0.2,1,1,0.2,0.2], 'k--', label='Actual Risk')
plt.ylim([0,None])
plt.legend(loc='best')
_ = plt.title('Sculpt Wrapper Risk (Volatility) Estimate')
Note that slight deviation between the estimated and actual volatility estimates are expected due to the stochasticity of training, the limited training data, and the inability of the model to perfectly capture the instantaneous change (step function) in volatility. With a larger model and more training data, the estimates would improve even further.