GEO data unlocks vs paper publication
Occasionally, when a paper is published with a new interesting dataset, the GEO accession is locked as ‘private’ until a certain release date. When you submit a dataset, you are asked to specify a release date at some point in the next four years. Once a paper is published, the submitter needs to sign in to GEO and move up the release date to make the data public.
I don’t encounter this very often, but it does happen from time to time, and I wondered how common delayed release dates are. By cross-referencing GEO submission and availability data with PubMed data about paper publication times, we can compare how data availability dates relate to publication dates.
More than half of datasets are available even before the paper is published by a journal, with a peak of datasets being made available between journal acceptance of a paper and the publication of the paper. Within a week of paper publication, 71% of datasets are available.
There is a tail of 18% of datasets that aren’t released for more than a month past the publication of the paper.
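To make the comparison concrete, here is a minimal Python sketch of how the delay between data release and paper publication can be computed and summarized. The records, accession labels, and function names are hypothetical, made up for illustration; they are not the data or code behind the numbers above.

```python
from datetime import date

# Hypothetical records: (accession label, GEO public release date, paper publication date).
# These values are invented for illustration only.
records = [
    ("GSE_A", date(2019, 1, 10), date(2019, 3, 1)),   # released before publication
    ("GSE_B", date(2019, 5, 3), date(2019, 5, 1)),    # released two days after
    ("GSE_C", date(2019, 9, 15), date(2019, 6, 1)),   # released months after
    ("GSE_D", date(2019, 7, 1), date(2019, 7, 20)),   # released before publication
]


def release_delay_days(release, published):
    """Days from paper publication to data release (negative means released earlier)."""
    return (release - published).days


delays = {acc: release_delay_days(rel, pub) for acc, rel, pub in records}


def fraction_within(delays, max_days):
    """Fraction of datasets released no later than max_days after publication."""
    values = list(delays.values())
    return sum(d <= max_days for d in values) / len(values)


def fraction_after(delays, min_days):
    """Fraction of datasets released more than min_days after publication."""
    values = list(delays.values())
    return sum(d > min_days for d in values) / len(values)


print("available before publication:", fraction_within(delays, -1))
print("available within a week:", fraction_within(delays, 7))
print("delayed more than a month:", fraction_after(delays, 30))
```

Negative delays correspond to datasets that were public before the paper came out, so the same number captures both the early releases and the long tail.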
Breaking the results down by the journals these papers and data accession links were sampled from reveals a gradient in how many datasets have delayed availability.
Clinical journals appear to be enriched for delayed data releases, with about 30% of datasets not being available within a month of publication.
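The per-journal breakdown can be sketched the same way: group the delays by journal and compute the share released more than a month after publication. The journal names, delay values, and the 30-day threshold in this sketch are assumptions for illustration, not the actual data.

```python
from collections import defaultdict

# Hypothetical (journal, delay in days) pairs; a negative delay means the data
# was public before the paper was published.
rows = [
    ("Journal X", -12),
    ("Journal X", 45),
    ("Journal Y", 3),
    ("Journal Y", 0),
    ("Journal Y", 90),
    ("Journal Z", -30),
]


def delayed_share_by_journal(rows, threshold_days=30):
    """Share of datasets per journal released more than threshold_days after publication."""
    counts = defaultdict(lambda: [0, 0])  # journal -> [late count, total count]
    for journal, delay in rows:
        counts[journal][1] += 1
        if delay > threshold_days:
            counts[journal][0] += 1
    shares = {journal: late / total for journal, (late, total) in counts.items()}
    # Sort from the most delayed journal to the least, giving the gradient.
    return dict(sorted(shares.items(), key=lambda kv: kv[1], reverse=True))


shares = delayed_share_by_journal(rows)
for journal, share in shares.items():
    print(f"{journal}: {share:.0%} delayed by more than a month")
```

With only a handful of datasets per journal these shares are noisy, so in practice it would make sense to include only journals with a reasonable number of linked datasets.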
Strong data sharing policies seem to be enforced by the corresponding journal editors. Journals that explicitly require data to be available have only ~5% of datasets with releases delayed by over a month. I suspect these are cases where journal editors remind the authors to sign in to GEO to unlock the data. Amid all the chaos involved in finishing a publication, it’s easy to forget about the data submission you did over half a year earlier.





