After the feedback we got last week, we now have a solid understanding of the concept behind triplet loss, so we decided to go ahead and work on the implementation.

We ran into lots of questions about how the data should be set up. We looked at Anastasija's implementation of triplet loss as an example. We used a similar process, but with images as the data and ResNet as our model.

Our biggest concerns are whether we are passing the data correctly and what the labels for the images should be. We grouped the images into anchor, positive, and negative sets, but other than that they don't have labels. We are considering using the time the image was taken as the label.

We have a theory that the labels we pass to model.fit() don't matter (??). This is based on Anastasija's triplet loss function, which takes the parameters y_true and y_pred but only manipulates y_pred and doesn't touch y_true at all.

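import tensorflow as tf

def triplet_loss(y_true, y_pred):
    # y_true is never read; the loss only needs the embeddings in y_pred,
    # laid out side by side as [anchor | positive | negative].
    size = y_pred.shape[1] // 3  # integer division so size works as a slice index

    anchor = y_pred[:, 0:size]
    positive = y_pred[:, size:2 * size]
    negative = y_pred[:, 2 * size:3 * size]
    alpha = 0.2  # margin

    # Squared Euclidean distances, one per triplet in the batch
    pos_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, positive)), 1)
    neg_dist = tf.reduce_sum(tf.square(tf.subtract(anchor, negative)), 1)
    basic_loss = tf.add(tf.subtract(pos_dist, neg_dist), alpha)
    # Clamp at zero, then average over the batch
    loss = tf.reduce_mean(tf.maximum(basic_loss, 0.0), 0)
    return loss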

We think the loss function is the one place where the labels (aka y_true) would matter. Thus, if the labels aren't used there, they can be arbitrary.
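One way to sanity-check this theory without training anything is to restate the loss in NumPy and call it with two different sets of labels. This is only a sketch: triplet_loss_np is our own hypothetical re-implementation of the function above, and the batch size, embedding size, and random values are made up for illustration. It checks the loss function only, not whatever else model.fit() might do with the labels.

```python
import numpy as np

def triplet_loss_np(y_true, y_pred, alpha=0.2):
    """NumPy restatement of the triplet loss; y_true is accepted but never read."""
    size = y_pred.shape[1] // 3
    anchor = y_pred[:, 0:size]
    positive = y_pred[:, size:2 * size]
    negative = y_pred[:, 2 * size:3 * size]
    pos_dist = np.sum(np.square(anchor - positive), axis=1)
    neg_dist = np.sum(np.square(anchor - negative), axis=1)
    return np.mean(np.maximum(pos_dist - neg_dist + alpha, 0.0))

rng = np.random.default_rng(0)
y_pred = rng.random((4, 3 * 8))  # 4 triplets, hypothetical embedding size 8

labels_zeros = np.zeros((4, 1))     # placeholder labels
labels_random = rng.random((4, 1))  # completely different labels

loss_a = triplet_loss_np(labels_zeros, y_pred)
loss_b = triplet_loss_np(labels_random, y_pred)
print(loss_a == loss_b)  # identical, because y_true never enters the computation
```

If the theory holds for the real model too, an array of zeros with one row per triplet should be enough as a stand-in for the labels.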

In Anastasija's model she adds an embedding layer, but since we are using ResNet rather than our own model, we do not. We assume this will cause problems, but we aren't sure where we would add one. We are still a little confused about where the output of the embedding network is. Will the embedded vector simply be the output of the network, or do we have to grab the embedding from somewhere in the middle of the network? If the embedded vector is the net's output, why do we see an 'Embedding' layer here at the beginning of the network Anastasija uses:

    model = Sequential()
    model.add(Embedding(words_len + 1,
                     embedding_dim,
                     weights = [word_embedding_matrix],
                     input_length = max_seq_length,
                     trainable = False,
                     name = 'embedding'))
    model.add(LSTM(512, dropout=0.2))
    model.add(Dense(512, activation='relu'))
    model.add(Dense(out_dim, activation='sigmoid'))
    ...
    model.compile(optimizer=Adam(), loss = triplet_loss)
 

If the embedding vector that we want actually is in the middle of the network, then what is the net outputting?
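One clue comes from the loss function itself: it slices y_pred into three equal parts, which suggests that whatever is being trained outputs the anchor, positive, and negative embeddings side by side. The NumPy sketch below illustrates that layout only. The embed function, weights, in_dim, and out_dim are hypothetical stand-ins for a real network, and we are assuming the usual triplet setup in which one shared network embeds all three inputs. It does not tell us how the elided part of Anastasija's code actually wires this up.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the embedding network: one linear layer + sigmoid.
# In a real model this would be the network whose last layer has out_dim units.
in_dim, out_dim = 16, 4
weights = rng.normal(size=(in_dim, out_dim))

def embed(batch):
    """Map a (batch, in_dim) array to (batch, out_dim) embeddings."""
    return 1.0 / (1.0 + np.exp(-(batch @ weights)))

# Three batches of toy inputs, grouped as anchor / positive / negative
anchors = rng.normal(size=(5, in_dim))
positives = rng.normal(size=(5, in_dim))
negatives = rng.normal(size=(5, in_dim))

# The same weights embed all three inputs (assumed shared network)
emb_a, emb_p, emb_n = embed(anchors), embed(positives), embed(negatives)

# The three embeddings are concatenated to form what the loss sees as y_pred
y_pred = np.concatenate([emb_a, emb_p, emb_n], axis=1)
print(y_pred.shape)  # (5, 12), i.e. (batch, 3 * out_dim)

# Slicing the way triplet_loss does recovers each embedding
size = y_pred.shape[1] // 3
recovered_anchor = y_pred[:, 0:size]
recovered_positive = y_pred[:, size:2 * size]
recovered_negative = y_pred[:, 2 * size:3 * size]
```

Under this reading, the embedding vector would be the output of the shared network, and the concatenated array would exist only so the loss can compare the three embeddings.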

We tried to fit our model, but ran into an out-of-memory issue, which we believe we can solve.

This week we hope to clear up some of our misunderstandings and get our model to fit successfully.