Many genes important during early vertebrate development are regulated by sequences located at large distances from the protein-coding region. Clues to the location of these long-range gene-regulatory elements can be obtained by comparing the genomic sequences of evolutionarily distant species such as the mouse and human. However, identifying them functionally remains a major challenge. Analysis of distal regulatory sequences is important not only for a complete understanding of how a gene is regulated in specific tissues, but also for exploring mechanisms of, and possible therapeutic strategies for, diseases linked to variations in those sequences. Polymorphisms in distant regulatory sequences that are statistically linked to a disease can be associated with a gene only when those sequences are functionally implicated in regulating the expression of that gene. The mechanistic pathway that connects the disease to the malfunction of the gene can then be identified, and possible therapeutic interventions explored. A strategy is illustrated that uses transgenic mice generated with a GFP reporter-tagged BAC clone to functionally identify such long-range regulatory sequences of the cardiac-specific Nkx2-5 gene. A combinatorial approach that used as transgenes the full-length Nkx2-5 GFP-BAC and several of its truncations, chosen on the basis of cross-species genomic sequence alignment of highly conserved non-coding DNA, helped delimit the boundaries of transcriptional regulation to sequences 27 kb upstream of the Nkx2-5 gene. The identification of sequences that regulate genes distant from them is discussed in view of its potential for exploring mechanistic pathways for disease and possible therapeutic interventions.
Keywords: Polymorphisms, BAC clone, P1-derived artificial chromosomes, post genome-sequencing, loxP transposons, Nkx2-5 gene