The human endogenous retrovirus family HTDV/HERV-K codes for the viral particles observed in teratocarcinoma cell lines. Two types of proviral genomes exist; these differ in the presence or absence of a stretch of 292 nucleotides. This sequence comprises the amino-terminal part of the env gene, the putative signal peptide, which overlaps in part with the carboxy terminus of the pol gene. Type 2 genomes, which contain this sequence, presumably reflect the structure of the infectious, replication-competent retrovirus ancestors of the HERV-K family more closely than do type 1 genomes, which lack it. In human teratocarcinoma cell lines, both variants are expressed. Type 1 genomes, in which the pol and env genes are fused, are deficient in splicing. Type 2 transcripts are spliced to subgenomic env mRNA and smaller messages. A doubly spliced transcript encodes a short open reading frame, preliminarily designated cORF (R. Löwer, K. Boller, B. Hasenmeier, C. Korbmacher, N. Mueller-Lantzsch, J. Löwer, and R. Kurth, Proc. Natl. Acad. Sci. USA 90:4480-4484). The genomic organization of cORF resembles that of nonprimate lentivirus rev genes: the first exon comprises nearly the entire signal peptide of env, and the second exon is derived from a different reading frame in the 3' part of the genome. The first exon contains a nucleolar localization signal, which is also a putative RNA binding domain, as well as a sequence with similarities to the Rev effector domain consensus sequence. Secondary structure analysis reveals similarities to basic helix-loop-helix proteins. cORF is a small protein with an apparent molecular mass of 14 kDa that accumulates in the nucleolus, as has been described for Rev proteins.