Background: For plant-pathogenic phytoplasmas, as for several other fastidious prokaryotes, axenic cultivation is extremely difficult or not yet possible. Even with second-generation sequencing methods, obtaining their genome sequences therefore remains challenging because of host sequence contamination.
Objective: With the Phytoassembly pipeline presented here, we aim to provide a method for obtaining high-quality draft genomes of phytoplasmas and other uncultivable plant pathogens. The pipeline exploits the difference in coverage between pathogen and host sequences in ILLUMINA reads, and uses the sequencing of a healthy, isogenic plant as a filter.
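As an illustration of the coverage-differential idea only, and not the Phytoassembly implementation itself, the following minimal sketch estimates how much deeper pathogen contigs are expected to be covered than host contigs. It then keeps contigs that are deeply covered in the infected sample and essentially absent from the healthy isogenic sample. The names `Contig`, `coverage_ratio` and `select_pathogen_contigs`, the thresholds, and the genome sizes in the usage lines are all hypothetical assumptions chosen for the example.

```python
from dataclasses import dataclass


@dataclass
class Contig:
    name: str
    infected_cov: float  # mean read depth in the infected sample
    healthy_cov: float   # mean read depth from healthy-plant reads on the same contig


def coverage_ratio(pathogen_fraction: float,
                   pathogen_genome_bp: int,
                   host_genome_bp: int) -> float:
    """Expected pathogen/host depth ratio for a given fraction of pathogen DNA.

    Depth scales with (share of sequenced DNA) / (genome size), so a small
    genome can be deeply covered even when it is a minor part of the sample.
    """
    pathogen_depth = pathogen_fraction / pathogen_genome_bp
    host_depth = (1.0 - pathogen_fraction) / host_genome_bp
    return pathogen_depth / host_depth


def select_pathogen_contigs(contigs, min_infected_cov, max_healthy_cov=1.0):
    """Keep contigs with high depth in the infected sample and (almost) no
    support from the healthy isogenic sample. Thresholds are illustrative."""
    return [
        c for c in contigs
        if c.infected_cov >= min_infected_cov and c.healthy_cov <= max_healthy_cov
    ]


# Illustrative numbers only: 3% pathogen DNA, an assumed 0.75 Mb pathogen
# genome and an assumed 500 Mb host genome.
ratio = coverage_ratio(0.03, 750_000, 500_000_000)

contigs = [
    Contig("c1", infected_cov=210.0, healthy_cov=0.0),   # candidate pathogen
    Contig("c2", infected_cov=12.0, healthy_cov=11.0),   # host nuclear DNA
    Contig("c3", infected_cov=900.0, healthy_cov=850.0), # e.g. organelle DNA
]
kept = select_pathogen_contigs(contigs, min_infected_cov=100.0)
```

In this sketch the healthy-sample filter is what removes high-copy host sequences such as `c3`, which a depth threshold alone would retain.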
Validation: The pipeline was benchmarked on simulated and real ILLUMINA runs from phytoplasmas with known genomes, and was then used to obtain high-quality drafts of three new phytoplasma genomes.
Conclusion: For phytoplasma-infected samples containing more than 2-4% pathogen DNA, paired with a healthy isogenic reference sample, the resulting assemblies can be nearly complete. The Phytoassembly source code is available on GitHub at https://github.com/cpolano/phytoassembly.
Keywords: ILLUMINA, Candidatus Phytoplasma, NGS, Second-generation sequencing, Endophytes, Genome draft.