Abstract: Diverse inbred mouse strains are among the foremost models for biomedical research, yet genome characterization of many strains remains far less complete than in human genomics research. In particular, the discovery and cataloging of structural variants is incomplete, limiting the discovery of potentially causative alleles for phenotypic variation across individuals. Here, we utilized long-read sequencing to resolve genome-wide structural variants (SVs, variants ≥ 50 bp) in 20 genetically distinct inbred mice. We report 413,758 site-specific SVs that affect 13% (356 Mbp) of the current mouse reference assembly, including 510 previously unannotated variants that alter coding sequences. We find that 39% of SVs are attributable to transposable element (TE) variation, which accounts for 75% of bases altered by SVs. We then utilized this callset to investigate the impact of TE heterogeneity on mouse embryonic stem cells (mESCs), and found multiple TE classes that influence chromatin accessibility across loci. We also identify strain-specific transcription start sites originating in polymorphic TEs that modify gene expression. Our work provides the first long-read-based analysis of mouse SVs and illustrates that previously unresolved TEs underlie epigenetic and transcriptome differences in mESCs.

Journal Link: 10.1101/2022.09.26.509577