Credit: G. Savcisens et al.
A schematic individual-level data representation for the life2vec model. (A) We organize socioeconomic and health data from the Danish national registers, covering January 1st, 2008 to December 31st, 2015, into a single chronologically ordered life-sequence. Each database entry becomes an event in the sequence, and each event has associated positional and contextual data. The contextual data include variables associated with the entry (e.g., industry, city, income, job type). The positional data include the person’s age (expressed in full years) and the event's absolute position (number of days since January 1st, 2008). The raw life-sequence is then passed to the model described in panel (B). The model consists of multiple stacked encoders. The first encoder combines contextual and positional information to produce a contextual representation of each life event. The following encoders output deep contextual representations of each life event (taking the overall content of the life-sequence into account). The final encoder layer fuses the representations of the life events to produce a representation of the whole life-sequence. The decoder uses this sequence-level representation to make predictions.
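The data layout in panel (A) can be illustrated with a minimal Python sketch. It is not the authors' code: the names `LifeEvent`, `full_years`, and `build_life_sequence` are hypothetical, and the input is assumed to be a list of `(date, context tokens)` pairs for one person. It only shows how each register entry could be turned into an event carrying contextual tokens plus the two positional values named in the caption (age in full years and days since January 1st, 2008), then sorted chronologically.

```python
from dataclasses import dataclass
from datetime import date

# Observation window stated in the caption.
ORIGIN = date(2008, 1, 1)
END = date(2015, 12, 31)


@dataclass(frozen=True)
class LifeEvent:
    """Hypothetical container for one register entry (illustrative only)."""
    when: date
    context: tuple   # contextual tokens, e.g. industry, city, income, job type
    age: int         # positional: age in full years at the time of the event
    abs_pos: int     # positional: number of days since ORIGIN


def full_years(birth: date, on: date) -> int:
    """Age in completed years on a given date."""
    years = on.year - birth.year
    if (on.month, on.day) < (birth.month, birth.day):
        years -= 1  # birthday not yet reached in that year
    return years


def build_life_sequence(birth: date, records):
    """Turn (date, context tokens) pairs into a chronologically ordered sequence.

    Entries outside the observation window are dropped (an assumption of
    this sketch, not a detail given in the caption).
    """
    events = []
    for when, context in records:
        if not (ORIGIN <= when <= END):
            continue
        events.append(
            LifeEvent(
                when=when,
                context=tuple(context),
                age=full_years(birth, when),
                abs_pos=(when - ORIGIN).days,
            )
        )
    # Stable sort keeps the input order for events on the same day.
    return sorted(events, key=lambda e: e.abs_pos)
```

Under these assumptions, an entry dated January 1st, 2008 receives absolute position 0, and a person's age value increases by one on each birthday. How the actual model tokenizes and embeds these fields is not specified in the caption and is left out here.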