Complete Outfit Recommendations via TensorNet

« According to a Trunk Club report examining the wardrobes of 2,000 men and women, the average person experiences ‘wardrobe panic’, the feeling of having nothing to wear, 36 times a year. Another study, published in the International Journal of Clothing Science and Technology, shows that women can try on almost 20 pairs of jeans before putting together a complete outfit. Assembling a complete outfit has therefore become an important issue, and firms are looking for solutions that adapt to their clients’ needs. Fashion recommendation addresses this problem by modeling the visual and aesthetic compatibility among items. It faces three main challenges: multi-item compatibility, the exploitation of visual signals (local and global), and scalability (models that are faster and cheaper in time and memory).

Currently, the French luxury company Louis Vuitton uses several recommendation systems to advise its clients and assist its sales advisors. Recommendations can be based on business rules: the website and the sales advisors can recommend products according to their popularity or to the client’s most recent searches. Another approach relies on visual similarity. For example, if a client clicks on a bag, the website can suggest other, similar bags. To do this, Louis Vuitton uses a neural network to embed items, that is, to map each item to a vector so that visually similar items lie close to one another. Finally, Louis Vuitton has a very advanced recommendation system called wardrobing, which generates complete looks combining clothes and accessories. We will focus on this system, since the goal is to improve it with the TensorNet model. TensorNet extends this kind of recommendation by suggesting a global look (full silhouette) that takes style and compatibility into account. Compatibility is twofold: local and global. TensorNet could provide a new testing framework for recommendation and, to a certain extent, lead to the automation of wardrobing.
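Visual-similarity recommendation can be sketched as a nearest-neighbour search in an embedding space. The minimal example below assumes that item embeddings have already been produced by some image model; the catalog, the vectors, and the helper names `cosine_similarity` and `similar_items` are hypothetical placeholders, not Louis Vuitton data or code. It ranks the other items by cosine similarity to the clicked one.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two embedding vectors (0.0 if one is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def similar_items(clicked_id, embeddings, k=2):
    """Return the ids of the k items whose embeddings are closest to the clicked item's."""
    query = embeddings[clicked_id]
    scored = [
        (item_id, cosine_similarity(query, vector))
        for item_id, vector in embeddings.items()
        if item_id != clicked_id
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [item_id for item_id, _ in scored[:k]]


# Hypothetical 3-dimensional embeddings; real embeddings usually have many more dimensions.
catalog = {
    "bag_a": [0.9, 0.1, 0.0],
    "bag_b": [0.8, 0.2, 0.1],
    "bag_c": [0.7, 0.0, 0.3],
    "shoe_a": [0.0, 0.9, 0.4],
}
print(similar_items("bag_a", catalog))  # → ['bag_b', 'bag_c']
```

A brute-force scan like this is fine for a toy catalog; at the scale of millions of users and items, an approximate nearest-neighbour index would typically replace the linear loop.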

Given the advances in recommendation systems, especially fashion recommendation, how can we automate outfit recommendation, scale it, and advise millions of users?

We will introduce TensorNet, the model developed by Huiyuan Chen, Fei Wang, Yusan Lin and Hao Yang and described in their article “Tops, Bottoms, and Shoes: Building Capsule Wardrobes via Cross-Attention Tensor Network” (Chen et al., 2021). The model was presented at RecSys 2021, a conference on recommender systems. It addresses the following question: given an inventory of clothing items, how can one assemble them to be as fashionable as possible? This is a visual recommendation challenge. TensorNet is an innovative approach that recommends full-body outfits whose items match. It studies the visual compatibility of a triplet composed of a top, a bottom, and a pair of shoes. TensorNet goes further than other recommendation systems because it determines both the global compatibility of the entire outfit and the local compatibility (for example, between two areas of different clothing items). To determine local compatibility, the authors divide the problem into three parts: region-aware feature extraction, cross-attention message passing among the items (top, bottom, and shoes), and visual gated units that filter out irrelevant signals. They then capture the global compatibility of the outfit with a wide and deep tensor network. This new recommendation system is a real opportunity for fashion sales advisors, as it can help recommend complete outfits to clients both in store and online.
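The local-then-global pipeline described above can be illustrated with a toy forward pass. This is a minimal sketch under stated assumptions, not the authors’ implementation: random arrays stand in for CNN feature maps, the cross-attention step uses standard scaled dot-product attention, the visual gated unit is a hypothetical sigmoid gate, and the wide and deep tensor network is replaced by a simplified stand-in (a rank-R trilinear interaction for the “wide” part plus a one-hidden-layer MLP for the “deep” part). All helper names (`region_features`, `cross_attention`, `gated_update`, `tensor_score`, `deep_score`, `outfit_score`) are hypothetical, and the weights are random, so the score only shows the shape of the computation.

```python
import numpy as np

rng = np.random.default_rng(0)


def softmax(x, axis=-1):
    """Numerically stable softmax along one axis."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def region_features(feature_map):
    """Flatten an (H, W, C) feature map into H*W region vectors of size C."""
    h, w, c = feature_map.shape
    return feature_map.reshape(h * w, c)


def cross_attention(query_regions, key_regions):
    """Each region of one item attends over all regions of another item
    (scaled dot-product attention) and receives one message vector."""
    d = query_regions.shape[1]
    weights = softmax(query_regions @ key_regions.T / np.sqrt(d), axis=1)
    return weights @ key_regions


def gated_update(regions, messages, w_gate, b_gate):
    """Hypothetical visual gate: a sigmoid computed from each region and its
    message decides how much of the message is kept (near 0 = filtered out)."""
    gate = sigmoid(np.concatenate([regions, messages], axis=1) @ w_gate + b_gate)
    return regions + gate * messages


def tensor_score(top, bottom, shoes, factors):
    """'Wide' stand-in: rank-R trilinear (CP) interaction of the three items."""
    u_t, u_b, u_s = factors  # each of shape (C, R)
    return float(np.sum((top @ u_t) * (bottom @ u_b) * (shoes @ u_s)))


def deep_score(top, bottom, shoes, w1, b1, w2):
    """'Deep' stand-in: one-hidden-layer MLP on the concatenated item vectors."""
    hidden = np.maximum(0.0, np.concatenate([top, bottom, shoes]) @ w1 + b1)
    return float(hidden @ w2)


def outfit_score(maps, params):
    """Compatibility score in (0, 1) for a (top, bottom, shoes) triplet."""
    regions = [region_features(m) for m in maps]
    # Local compatibility: each item receives messages from the two other items.
    updated = []
    for i, own in enumerate(regions):
        messages = sum(cross_attention(own, regions[j]) for j in range(3) if j != i)
        updated.append(gated_update(own, messages, params["w_gate"], params["b_gate"]))
    # Global compatibility: pool regions into one vector per item, then score.
    top, bottom, shoes = (u.mean(axis=0) for u in updated)
    wide = tensor_score(top, bottom, shoes, params["factors"])
    deep = deep_score(top, bottom, shoes, params["w1"], params["b1"], params["w2"])
    return float(sigmoid(wide + deep))


C, R, HIDDEN = 8, 4, 16  # channels, tensor rank, MLP width (arbitrary toy sizes)
params = {
    "w_gate": rng.normal(scale=0.1, size=(2 * C, C)),
    "b_gate": np.zeros(C),
    "factors": tuple(rng.normal(scale=0.1, size=(C, R)) for _ in range(3)),
    "w1": rng.normal(scale=0.1, size=(3 * C, HIDDEN)),
    "b1": np.zeros(HIDDEN),
    "w2": rng.normal(scale=0.1, size=HIDDEN),
}
# Random 4x4xC "feature maps" standing in for CNN outputs of a top, a bottom and shoes.
maps = [rng.normal(size=(4, 4, C)) for _ in range(3)]
score = outfit_score(maps, params)
print(f"compatibility score: {score:.3f}")
```

In a trained model, these weights would be learned from labelled outfits so that compatible triplets score higher than incompatible ones; the training step is omitted here.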

First, we will present the context of the article and the key notions of personalization, compatibility, and tensors. Second, we will explain how local compatibility is determined through the region-wise feature map approach, cross-attention message passing, and visual gated units. Third, we will turn to global compatibility, captured by a Wide and Deep Neural Tensor. Finally, we will discuss possible improvements to the model. »