I have been tempted to try word2vec-like techniques on e-commerce shopping carts as a way to find a particular type of recommendation. I suspect the data will be too sparse, though.
Has anyone approached similar techniques on non-text corpuses?
It works…for some definition of “works”. It’s been applied to all kinds of problems—including graphs (Node2Vec) and many other cases where the input isn’t “words”—to the point that I’d consider it a weak baseline for any embedding task. In my experience it is unreasonably effective for simple problems (make a binary classifier for tweets), but the effectiveness drops quickly as the problem gets more complicated.
In your proposed use case I would bet that you will “see” the kind of similarity you’re looking for based on vector similarity, but I also expect it to largely be an illusion due to confirmation bias. It will be much harder to make that similarity actionable to solve the actual business use case. (Like 30% of the time it’ll work like magic; 60% of the time it’ll be “meh”; 10% of the time it’ll be hilariously wrong.)
I've been looking at ways to use transformer-based models on tabular data. The hope is that these models have a much better contextual understanding of words, so embeddings from them should be of better quality than plain word2vec ones.
My idea is to turn a table row into a textual description, feed it into a transformer, and get, effectively, a sentence embedding. This acts as a query embedding. Then make a few value embeddings for the target you're trying to predict, use cosine similarity to pick the closest value embedding, and feed that to the ML model as part of the feature set. It works if the categorical values in your table are entities the model might have learned about during pretraining.
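A minimal sketch of that pipeline, with a hashing-based toy stand-in for the transformer sentence embedding (the column names, labels, and `toy_embed` helper are all hypothetical; in practice you'd swap `toy_embed` for a real sentence-embedding model):

```python
import hashlib
import math

def toy_embed(text, dim=64):
    # Hypothetical stand-in for a transformer sentence embedding:
    # hash each token into one of `dim` buckets and count occurrences.
    vec = [0.0] * dim
    for tok in text.lower().split():
        h = int(hashlib.md5(tok.encode()).hexdigest(), 16)
        vec[h % dim] += 1.0
    return vec

def cosine(a, b):
    # Plain cosine similarity between two equal-length vectors.
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

# Verbalize a table row into a textual description (assumed columns).
row = {"merchant": "Starbucks", "amount": 4.5, "city": "Seattle"}
query_text = " ".join(f"{k} {v}" for k, v in row.items())
query_vec = toy_embed(query_text)  # the "query embedding"

# One "value embedding" per candidate target label (assumed labels).
labels = {c: toy_embed(c) for c in ["coffee shop", "grocery store", "gas station"]}

# Score each label against the row and pick the closest one;
# `best` and the raw scores can be appended to the feature set.
scores = {c: cosine(query_vec, v) for c, v in labels.items()}
best = max(scores, key=scores.get)
```

With a real model the discrimination comes from pretrained knowledge (e.g. that "Starbucks" relates to "coffee shop"), which the toy hash embedding obviously cannot provide; the sketch only shows the shape of the query-vs-value-embedding comparison.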
I tried this approach and it did improve overall performance. The next step would be fine-tuning the transformer model; I want to see if I can do it without disturbing the existing weights too much. Here's the library I used to get the embeddings
I have applied it to e-commerce shopping carts and it works quite well :). The item IDs (words) viewed in sequence in a session can be thought of as a sentence.
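The data prep for that is just grouping clickstream events into per-session "sentences" of item IDs; a small sketch (the event tuples and Word2Vec parameters are illustrative assumptions):

```python
# Assumed raw data: (session_id, item_id) view events, already in time order.
events = [
    ("s1", "item_42"), ("s1", "item_7"), ("s1", "item_9"),
    ("s2", "item_7"), ("s2", "item_42"),
]

# Group events by session: each session becomes one "sentence" of item IDs.
sessions = {}
for session_id, item_id in events:
    sessions.setdefault(session_id, []).append(item_id)

sentences = list(sessions.values())

# With gensim installed, these sentences feed straight into word2vec
# (skip-gram tends to work better for this kind of sparse data):
# from gensim.models import Word2Vec
# model = Word2Vec(sentences, vector_size=64, window=5, min_count=1, sg=1)
# model.wv.most_similar("item_42")  # nearest items by embedding similarity
```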
I have applied it to the names in a population database. It learnt interesting, and expected, structure. Visualized with UMAP, it clustered by gender first, and then by something that could probably be described as the cultural origin of the name.