When this article was originally published in Psychological Medicine, it contained the same error in two places: the methods section of the manuscript and Supplementary Table 1.
In the following paragraph in the methods section under the “Predictor engineering and lookbehind window” subheading:
“This model is bound by a maximum input sequence length of 512 tokens. For each patient, the first 512 tokens from each clinical note within the 180 days lookbehind prior to a prediction time were extracted and input to the model, yielding a contextualized embedding of the text with 384 dimensions.”
“512 tokens” has been replaced with “128 tokens”, such that the paragraph will read as follows:
“This model is bound by a maximum input sequence length of 128 tokens. For each patient, the first 128 tokens from each clinical note within the 180 days lookbehind prior to a prediction time were extracted and input to the model, yielding a contextualized embedding of the text with 384 dimensions.”
In Supplementary Table 1, under “Text features/embeddings”, the following text:
“Embeddings, each with 384 dimensions, were generated from the first 512 tokens of each note within the 180 days window These were then averaged to create an aggregate embedding with 384 dimensions.”
has been replaced with:
“Embeddings, each with 384 dimensions, were generated from the first 128 tokens of each note within the 180 days window. These were then averaged to create an aggregate embedding with 384 dimensions.”
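For readers who want to see what the corrected description implies in practice, the following is a minimal sketch only, not the authors' implementation. It truncates each note to its first 128 tokens, embeds it into 384 dimensions, and averages the per-note embeddings into one aggregate vector. The names `truncate`, `toy_embed`, and `aggregate_embedding` are hypothetical, notes are assumed to be already tokenized, and `toy_embed` is a stand-in for the actual sentence-embedding model, which is not reproduced here.

```python
import zlib
from typing import Callable, List, Sequence

MAX_TOKENS = 128  # corrected maximum input sequence length
EMBED_DIM = 384   # embedding dimensionality stated in the text


def truncate(tokens: Sequence[str], max_tokens: int = MAX_TOKENS) -> List[str]:
    """Keep only the first `max_tokens` tokens of a note."""
    return list(tokens[:max_tokens])


def toy_embed(tokens: Sequence[str]) -> List[float]:
    """Hypothetical stand-in for the real model: a hashed bag-of-tokens vector."""
    vec = [0.0] * EMBED_DIM
    for tok in tokens:
        # crc32 gives a deterministic bucket for each token
        vec[zlib.crc32(tok.encode("utf-8")) % EMBED_DIM] += 1.0
    return vec


def aggregate_embedding(
    notes: Sequence[Sequence[str]],
    embed: Callable[[Sequence[str]], List[float]],
) -> List[float]:
    """Embed each truncated note, then average the embeddings element-wise."""
    if not notes:
        raise ValueError("at least one note is required")
    embeddings = [embed(truncate(note)) for note in notes]
    return [sum(column) / len(embeddings) for column in zip(*embeddings)]


# Two already-tokenized notes assumed to fall within the 180-day window.
notes = [["fever"] * 200, ["cough"] * 10]
aggregate = aggregate_embedding(notes, toy_embed)
```

In this sketch the first note contributes only its first 128 tokens, and the aggregate keeps 384 dimensions regardless of how many notes are averaged.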
The authors apologise for this error.