TripAdvisor Reviews + Images — Triplet Loss Tests

Step-by-step description of the process:

  • Load the data in the following format

  • Create an ID for each hotel based on hotel name for training purposes
  • Remove hotels without names and keep hotels with at least 50 reviews
  • Load GloVe
  • Create sequence of embeddings and select maximum sentence length (100)

  • Select anchors, positives, and negatives for the triplet loss. The anchor and the positive have to be reviews from the same hotel, whereas the negative has to be a review from a different hotel
  • ||f(A) - f(P)||^2 <= ||f(A) - f(N)||^2   if A=anchor, P=positive, N=negative
    d(A,P) <= d(A,N)
  • LOSS = max(||f(A) - f(P)||^2 - ||f(A) - f(N)||^2 + α, 0)
  • COST = sum of the losses over the training set of triplets
  • Train a model with the triplet loss for 5 epochs (a sketch of the loss function follows this list)
  • Plot training and validation loss

  • Training for more epochs could potentially reduce the loss further
  • However, based on the test results we can see that the triplet loss doesn't produce any valuable results. Especially since we are looking at just one location, it is easier for reviews to cluster based on the things they mention rather than on the hotels they describe
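
For reference, here is a minimal sketch of the loss function above (with α = 0.2), assuming a Keras model whose output concatenates the anchor, positive, and negative embeddings along the last axis; the embedding size and variable names here are illustrative, not taken from the notebook.

from tensorflow.keras import backend as K

ALPHA = 0.2          # margin α from the formula above
EMBEDDING_DIM = 50   # assumed embedding size (illustrative)

def triplet_loss(y_true, y_pred):
    # Split the concatenated output back into the three embeddings.
    anchor   = y_pred[:, 0:EMBEDDING_DIM]
    positive = y_pred[:, EMBEDDING_DIM:2 * EMBEDDING_DIM]
    negative = y_pred[:, 2 * EMBEDDING_DIM:]

    # Squared Euclidean distances ||f(A) - f(P)||^2 and ||f(A) - f(N)||^2.
    pos_dist = K.sum(K.square(anchor - positive), axis=-1)
    neg_dist = K.sum(K.square(anchor - negative), axis=-1)

    # LOSS = max(pos_dist - neg_dist + α, 0), averaged over the batch.
    return K.mean(K.maximum(pos_dist - neg_dist + ALPHA, 0.0))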

4 thoughts on “TripAdvisor Reviews + Images — Triplet Loss Tests”

  1. astylianou

    The fact that the training loss bounces around makes me wonder if maybe your learning rate is too high. There may be something else going on, but it might be useful to try lower learning rates. This is a really amazing set of notes about deep learning in general, but the section under "Babysitting the Learning Process" specifically dives into what happens when your learning rate is too high or too low: http://cs231n.github.io/neural-networks-3/

    One other thing to play around with is to take a subset of your data -- like ~100 examples total from a few classes -- and train on just that, to make sure you can get your loss to go to zero. If your loss can't go to zero in this overfitting case, there's something wrong. I often will do this "overfitting" test to make sure that I don't have any big, obvious bugs. It is often the case that the learning rate that gets you to zero in this overfitting case is also the right (or close to right) learning rate for the "full" training.
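
    That subset test could look something like this, assuming the reviews sit in a pandas DataFrame df with a hotel_id column (the names here are illustrative, not taken from the actual notebook):

    import pandas as pd

    def tiny_subset(df, n_classes=3, per_class=30, seed=0):
        # Keep only the few most-reviewed hotels and a handful of reviews for each.
        keep = df["hotel_id"].value_counts().index[:n_classes]
        small = df[df["hotel_id"].isin(keep)]
        return (small.groupby("hotel_id", group_keys=False)
                     .apply(lambda g: g.sample(min(per_class, len(g)), random_state=seed)))

    # Build the triplets and train with the exact same code on tiny_subset(df);
    # you should be able to drive the training loss to (or very close to) zero.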

    One specific thing I noticed in your loss plot is that the loss converges to 0.2 for both train and validation. Was your α = 0.2 * batch_size? If you're pushing everything to exactly the same place, then (a-p)^2 - (a-n)^2 + α would result in a loss per triplet of 0 + α. I think I recall encountering this when the learning rate was too high, but I don't remember the specifics of why the high learning rate encourages everything getting pushed to the same place.

  2. pless

    I think that Abby is right to be suspicious of the error converging to exactly the margin.

    I'm confused about the following:

    "Create sequence of embeddings and select maximum sentence length (100)"

    Does max. sentence length of 100 mean you use at most 100 words? letters?
    What is the sequence of embeddings that this refers to?

    Also, why are there gaps in the sequence length distribution where there are no sentences of a bunch of lengths, in a seemingly regular pattern?

  3. anastasija

    Yes, my alpha is 0.2. All the code is in here: https://github.com/amensiko/tripadvisor_scraping/blob/master/NN/reviews_nn_triplet_loss.ipynb

    I'm not pushing everything to the same place, but rather distributing everything as anchors, positives, and negatives.
    I've never studied this before, and there is very limited information on triplet loss/siamese neural networks online, so I'm kind of just floating around trying to understand what's happening. I will play around with it more.

    Yes, the gaps in the sequence length distribution are my error, which I fixed. I'm having trouble again using some libraries, but once I figure it out I will push the updated version. That graph is pretty much only used to see the general length of reviews. Lengths of 100 and below occur most frequently, so everything of larger length is cut at that threshold. Whatever review has a shorter length, its embedding sequence gets padded with 0's at the front.
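
    Roughly, the cutoff and the front-padding look like this, assuming sequences is the list of tokenized reviews (word indices); whether over-long reviews are cut from the front or the back is an assumption here, not something taken from the notebook:

    from tensorflow.keras.preprocessing.sequence import pad_sequences

    MAX_LEN = 100
    sequences = [[12, 7, 3], [5] * 150]   # toy tokenized reviews (illustrative)

    padded = pad_sequences(
        sequences,
        maxlen=MAX_LEN,
        padding="pre",      # shorter reviews get 0's added at the front
        truncating="post",  # longer reviews are cut at 100 tokens (assumption)
    )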

    1. astylianou

      "I'm not pushing everything to the same place, but rather distributing everything as anchors, positives, and negatives." That's the input to your loss function, and (you know this, but just to re-iterate) the idea is for the network to learn an embedding where the anchor and positive examples are more similar than the anchor and negative. But the "best" thing the network can learn to do (especially if the learning rate is too high) may be to push all of the training examples to embed at the same location. Can you compute the average distance between the embedded locations for different training examples (so, pick 100 or so random pairs of inputs, and compute the distance between their feature vectors)?

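      A quick way to run that check, assuming embeddings is an (n_examples, embedding_dim) numpy array of the model's outputs (illustrative names, not from your notebook):

      import numpy as np

      def mean_pairwise_distance(embeddings, n_pairs=100, seed=0):
          # Sample random pairs of embedded reviews and measure their distances.
          rng = np.random.default_rng(seed)
          idx_a = rng.integers(0, len(embeddings), size=n_pairs)
          idx_b = rng.integers(0, len(embeddings), size=n_pairs)
          dists = np.linalg.norm(embeddings[idx_a] - embeddings[idx_b], axis=1)
          return dists.mean(), dists.std()

      # A mean distance near zero means the network has collapsed everything
      # to (nearly) the same point in the embedding space.
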
      I also noticed in your code that you're not defining the learning rate in this line:

      model.compile(optimizer=Adam(), loss = triplet_loss)

      It looks like the default learning rate is 0.001. I would recommend trying a few different options several orders of magnitude smaller than that:

      model.compile(optimizer=Adam(lr=0.00001), loss = triplet_loss)
      model.compile(optimizer=Adam(lr=0.000005), loss = triplet_loss)
      model.compile(optimizer=Adam(lr=0.0000001), loss = triplet_loss)

      I also want to re-up something I said when we first talked about this in person, which is that this is a *crazy* hard problem. When I look at the reviews in the spreadsheet, I cannot imagine that I could ever figure out how to map the reviews to the hotel identity.

      But nonetheless -- if you pick a small enough subset of your dataset to overfit on (so train it using the exact same code, but with just a few classes and a few examples per class), the network should be able to learn an embedding w/ 0 loss. Have you tried that test?

      Re: there not being much material about triplet loss online: you may well have already seen this, but in case not, I remember this being a useful post early on when I was trying to understand and implement triplet loss: https://omoindrot.github.io/triplet-loss

