Discovery of an expanded set of avian leukosis subgroup E proviruses in chickens using Vermillion, a novel sequence capture and analysis pipeline.

Transposable elements (TEs), such as endogenous retroviruses (ERVs), are common in the genomes of vertebrates. ERVs result from retroviral infections of germ-line cells, and once integrated into host DNA they become part of the host’s heritable genetic material. ERVs have been ascribed positive effects on host physiology, such as the generation of novel, adaptive genetic variation and resistance to infection, as well as negative effects as agents of tumorigenesis and disease. The avian leukosis virus subgroup E (ALVE) family of endogenous viruses of chickens has been used as a model system for studying the effects of ERVs on host physiology, and approximately 30 distinct ALVE proviruses have been described in the Gallus gallus genome. In this report we describe the development of a software tool, which we call Vermillion, and its use in combination with targeted next-generation sequencing (NGS) to increase the number of known proviruses belonging to the ALVE family of ERVs in the chicken genome 4-fold, including expanding the number of known ALVE elements on chromosome 1 (Gga1) from the current 9 to a total of 40. Although we focused on the discovery of ALVE elements in chickens, with appropriate selection of target sequences Vermillion can be used to develop profiles of other families of ERVs and TEs, both in chickens and in other species.