segmentation loss

From original article:

IoU loss

Focal loss

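Focal loss (FL) [2] down-weights the contribution of easy examples so that the CNN focuses more on hard examples.
FL can be defined as follows:
$$\mathrm{FL}(p, \hat{p}) = -\left(\alpha (1 - \hat{p})^{\gamma}\, p \log(\hat{p}) + (1 - \alpha)\, \hat{p}^{\gamma} (1 - p) \log(1 - \hat{p})\right)$$
When γ=0, the modulating factors $(1 - \hat{p})^{\gamma}$ and $\hat{p}^{\gamma}$ are equal to 1 and we obtain BCE.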
This time we cannot use weighted_cross_entropy_with_logits to implement FL in Keras. Instead, we derive our own focal_loss_with_logits function.
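The derivation can be sketched as follows, assuming the last layer is a sigmoid, so that p̂ = σ(x) for a logit x (the symbols x, w_a and w_b are our notation for this sketch). It uses log σ(x) = −log(1 + e^{−x}) and log(1 − σ(x)) = −x − log(1 + e^{−x}). The last line rewrites log(1 + e^{−x}) in a form that does not overflow for large |x|.

```latex
\begin{align*}
\mathrm{FL}(p, \hat{p})
  &= -\left(\alpha (1 - \hat{p})^{\gamma}\, p \log \hat{p}
     + (1 - \alpha)\, \hat{p}^{\gamma} (1 - p) \log(1 - \hat{p})\right) \\
  &= \underbrace{\alpha (1 - \hat{p})^{\gamma}\, p}_{w_a} \log\left(1 + e^{-x}\right)
     + \underbrace{(1 - \alpha)\, \hat{p}^{\gamma} (1 - p)}_{w_b}
       \left(x + \log\left(1 + e^{-x}\right)\right) \\
  &= (w_a + w_b) \log\left(1 + e^{-x}\right) + w_b\, x \\
\log\left(1 + e^{-x}\right)
  &= \log\left(1 + e^{-|x|}\right) + \max(-x, 0)
\end{align*}
```

The weights w_a and w_b correspond to weight_a and weight_b in the code.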
And the implementation is then:
import tensorflow as tf

def focal_loss(alpha=0.25, gamma=2):
  def focal_loss_with_logits(logits, targets, alpha, gamma, y_pred):
    # Per-pixel weights of the positive and the negative term.
    weight_a = alpha * (1 - y_pred) ** gamma * targets
    weight_b = (1 - alpha) * y_pred ** gamma * (1 - targets)
    # log1p(exp(-|x|)) + relu(-x) is a numerically stable log(1 + exp(-x)).
    return (tf.math.log1p(tf.exp(-tf.abs(logits))) + tf.nn.relu(-logits)) * (weight_a + weight_b) + logits * weight_b

  def loss(y_true, y_pred):
    # y_pred holds sigmoid probabilities; clip them so that the logits stay finite.
    y_true = tf.cast(y_true, y_pred.dtype)
    y_pred = tf.clip_by_value(y_pred, tf.keras.backend.epsilon(), 1 - tf.keras.backend.epsilon())
    logits = tf.math.log(y_pred / (1 - y_pred))

    loss = focal_loss_with_logits(logits=logits, targets=y_true, alpha=alpha, gamma=gamma, y_pred=y_pred)

    return tf.reduce_mean(loss)

  return loss
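
To make the down-weighting concrete, the following plain-Python sketch evaluates the focal loss for a single pixel, once from the probability and once from the logit. The helper names (sigmoid, focal_loss_prob, focal_loss_logit, balanced_ce) and the example probabilities are illustrative assumptions and not part of the Keras implementation above; balanced_ce is simply the same formula with γ=0.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def focal_loss_prob(p, p_hat, alpha=0.25, gamma=2.0):
    """Focal loss of one pixel, computed from a probability p_hat in (0, 1)."""
    pos = alpha * (1.0 - p_hat) ** gamma * p * math.log(p_hat)
    neg = (1.0 - alpha) * p_hat ** gamma * (1.0 - p) * math.log(1.0 - p_hat)
    return -(pos + neg)

def focal_loss_logit(p, x, alpha=0.25, gamma=2.0):
    """Same loss computed from the logit x with a stable log(1 + exp(-x))."""
    p_hat = sigmoid(x)
    weight_a = alpha * (1.0 - p_hat) ** gamma * p
    weight_b = (1.0 - alpha) * p_hat ** gamma * (1.0 - p)
    softplus_neg = math.log1p(math.exp(-abs(x))) + max(-x, 0.0)
    return softplus_neg * (weight_a + weight_b) + x * weight_b

def balanced_ce(p, p_hat, alpha=0.25):
    """The focal loss formula with gamma = 0."""
    return focal_loss_prob(p, p_hat, alpha=alpha, gamma=0.0)

# A foreground pixel (p = 1) that is predicted well (easy) or badly (hard).
easy_ratio = focal_loss_prob(1, 0.9) / balanced_ce(1, 0.9)  # (1 - 0.9) ** 2
hard_ratio = focal_loss_prob(1, 0.1) / balanced_ce(1, 0.1)  # (1 - 0.1) ** 2
print(easy_ratio, hard_ratio)
```

With γ=2 the easy pixel keeps only about 1% of its γ=0 loss, while the hard pixel keeps about 81%, which is the intended shift of focus towards hard examples.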

Overlap measures

Dice Loss / F1 score

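The Dice coefficient (DC) is similar to the Jaccard Index (Intersection over Union, IoU):
$$\mathrm{DC} = \frac{2\,TP}{2\,TP + FP + FN} \qquad \mathrm{IoU} = \frac{TP}{TP + FP + FN}$$
where TP are the true positives, FP the false positives and FN the false negatives. We can see that DC ≥ IoU.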
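The Dice coefficient can also be defined as a loss function:
$$\mathrm{DL}(p, \hat{p}) = 1 - \frac{2 \langle p, \hat{p} \rangle}{\lVert p \rVert_2^2 + \lVert \hat{p} \rVert_2^2}$$
where p ∈ {0,1}^n and 0 ≤ p̂ ≤ 1.
def dice_loss(y_true, y_pred):
  # 2 * soft intersection between the ground truth and the prediction
  numerator = 2 * tf.reduce_sum(y_true * y_pred)
  # some implementations don't square y_pred
  denominator = tf.reduce_sum(y_true + tf.square(y_pred))

  # 1 - Dice coefficient, so that a perfect overlap yields a loss of 0
  return 1 - numerator / (denominator + tf.keras.backend.epsilon())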
Since each p_i is either 0 or 1, the numerator only accumulates the predicted probabilities of the foreground pixels (p_i = 1). For a background pixel (p_i = 0), the corresponding term in the numerator is 0.
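
The relation between the Dice coefficient and IoU can be checked with a small worked example. The sketch below is plain Python on two hypothetical 8-pixel binary masks; the function names are ours. It also illustrates that DC = 2·IoU / (1 + IoU), which is why DC ≥ IoU whenever IoU ≤ 1.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, FP and FN for two flat binary masks of equal length."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, fn

def dice_coefficient(tp, fp, fn):
    # Assumes at least one foreground pixel in either mask.
    return 2 * tp / (2 * tp + fp + fn)

def iou(tp, fp, fn):
    # Assumes at least one foreground pixel in either mask.
    return tp / (tp + fp + fn)

# Hypothetical masks: 3 foreground pixels, 2 of them found, 1 false alarm.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]

tp, fp, fn = confusion_counts(y_true, y_pred)
dc = dice_coefficient(tp, fp, fn)  # 2*2 / (2*2 + 1 + 1) = 2/3
j = iou(tp, fp, fn)                # 2 / (2 + 1 + 1) = 1/2
print(tp, fp, fn, dc, j)
```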

Tversky loss

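Tversky loss (TL) is a generalization of Dice loss. TL weights the FP and FN terms with β and 1 − β, respectively:
$$\mathrm{TL}(p, \hat{p}) = 1 - \frac{\langle p, \hat{p} \rangle}{\langle p, \hat{p} \rangle + \beta \langle 1 - p, \hat{p} \rangle + (1 - \beta) \langle p, 1 - \hat{p} \rangle}$$
Let β = 1/2. Then
$$\mathrm{TL}(p, \hat{p}) = 1 - \frac{2 \langle p, \hat{p} \rangle}{2 \langle p, \hat{p} \rangle + \langle 1 - p, \hat{p} \rangle + \langle p, 1 - \hat{p} \rangle} = 1 - \frac{2 \langle p, \hat{p} \rangle}{\lVert p \rVert_1 + \lVert \hat{p} \rVert_1}$$
which is just Dice loss. In the paper [4], the authors square the predicted probability in the denominator, but e.g. the paper [5] keeps the term as it is.
def tversky_loss(beta):
  def loss(y_true, y_pred):
    numerator = tf.reduce_sum(y_true * y_pred)
    # soft TP + beta * soft FP + (1 - beta) * soft FN, per pixel
    denominator = y_true * y_pred + beta * (1 - y_true) * y_pred + (1 - beta) * y_true * (1 - y_pred)

    # 1 - Tversky index, so that a perfect overlap yields a loss of 0
    return 1 - numerator / (tf.reduce_sum(denominator) + tf.keras.backend.epsilon())

  return loss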
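
The effect of β can be illustrated with a few numbers. The following plain-Python sketch uses a hypothetical 4-pixel example and our own helper names; it computes the overlap term (one minus the loss) for several values of β and compares β = 1/2 with the non-squared Dice coefficient.

```python
def tversky_index(p, p_hat, beta):
    """Soft Tversky index; beta weights the FP term, (1 - beta) the FN term."""
    tp = sum(t * q for t, q in zip(p, p_hat))
    fp = sum((1 - t) * q for t, q in zip(p, p_hat))
    fn = sum(t * (1 - q) for t, q in zip(p, p_hat))
    return tp / (tp + beta * fp + (1 - beta) * fn)

def soft_dice_coefficient(p, p_hat):
    """Dice coefficient without squaring the predicted probabilities."""
    intersection = sum(t * q for t, q in zip(p, p_hat))
    return 2 * intersection / (sum(p) + sum(p_hat))

# Hypothetical ground truth and predicted probabilities.
p = [1, 1, 0, 0]
p_hat = [0.8, 0.4, 0.3, 0.1]

for beta in (0.1, 0.5, 0.9):
    print(beta, tversky_index(p, p_hat, beta))
print("dice", soft_dice_coefficient(p, p_hat))
```

In this example the soft FN mass (0.8) is larger than the soft FP mass (0.4), so a small β, which puts the weight 1 − β on FN, gives the lowest index and therefore the highest loss.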

Lovász-Softmax

DL and TL simply relax the hard constraint p̂ ∈ {0,1}^n in order to have a function on the domain [0,1]. The paper [6] instead derives a surrogate loss function.
An implementation of Lovász-Softmax can be found on GitHub. Note that this loss requires the identity activation in the last layer, i.e. the network outputs raw logits. A negative value means class A and a positive value means class B.
In Keras the loss function can be used as follows:
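def lovasz_softmax(y_true, y_pred):
  # lovasz_hinge is the binary loss from the implementation mentioned above;
  # y_pred must contain raw logits (no sigmoid).
  return lovasz_hinge(labels=y_true, logits=y_pred)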

model.compile(loss=lovasz_softmax, optimizer=optimizer, metrics=[pixel_iou])
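
To give an idea of what such a surrogate computes, here is a plain-Python sketch of the binary Lovász hinge on a flat list of pixels, written from our reading of [6]. It is a simplified illustration and not the referenced implementation, which may differ in details such as per-image averaging or ignored labels; the names lovasz_grad and lovasz_hinge_flat are used here only for this sketch.

```python
def lovasz_grad(gt_sorted):
    """Increments of the Jaccard loss along the sorted errors."""
    total = sum(gt_sorted)
    grad, prev_jaccard = [], 0.0
    cum_pos, cum_neg = 0, 0
    for g in gt_sorted:
        cum_pos += g
        cum_neg += 1 - g
        intersection = total - cum_pos
        union = total + cum_neg
        jaccard = 1.0 - intersection / union
        grad.append(jaccard - prev_jaccard)
        prev_jaccard = jaccard
    return grad

def lovasz_hinge_flat(logits, labels):
    """Binary Lovász hinge for raw logits and labels in {0, 1}."""
    signs = [2 * y - 1 for y in labels]
    # Hinge errors: positive when a pixel is wrong or inside the margin.
    errors = [1.0 - x * s for x, s in zip(logits, signs)]
    order = sorted(range(len(errors)), key=lambda i: errors[i], reverse=True)
    errors_sorted = [errors[i] for i in order]
    gt_sorted = [labels[i] for i in order]
    grad = lovasz_grad(gt_sorted)
    return sum(max(e, 0.0) * g for e, g in zip(errors_sorted, grad))

labels = [1, 0, 1, 0]
print(lovasz_hinge_flat([2.0, -2.0, 3.0, -1.5], labels))   # confident and correct
print(lovasz_hinge_flat([2.0, -2.0, -0.5, -1.5], labels))  # one foreground pixel missed
print(lovasz_hinge_flat([-1.0, 1.0, -1.0, 1.0], labels))   # every pixel wrong
```

The sign convention matches the note above: a negative logit votes for one class and a positive logit for the other, so the sketch expects raw logits and not sigmoid outputs.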


