Screening Of Hiv1 Variant With Undefined Genotype Biology Essay

Published: November 2, 2015 Words: 4411


From 2003 to 2008, the first available EDTA whole-blood or archived plasma sample was collected from each of 1,045 HIV-1-seropositive patients at the Integrated Treatment Centre, Department of Health, Hong Kong. None of the patients included in this study had received antiretroviral treatment, so all samples were defined as treatment-naïve. Routine genotypic resistance testing was requested for each sample.

For EDTA whole-blood samples, two 3 mL EDTA tubes were collected from each patient. Plasma was separated by centrifugation at 3,000 x g for 5 minutes at room temperature within 8 hours of blood collection. The separated plasma, together with the archived plasma samples, was stored at -80 °C until further testing.

Screening of HIV-1 Variants with Undefined Genotype

Overview

To study the prevalence and genomic structure of HIV-1 URFs in Hong Kong, samples with an undefined genotype were identified from each patient's HIV-1 pol sequence, generated either by the ViroSeq HIV-1 Genotyping System version 2.0 (Celera Diagnostics, CA) or by an in-house genotyping method described previously (Chen et al., 2007). All pol sequences covered the entire protease region (99 codons) and part of the reverse transcriptase region (276 codons), with a total length of 1126 base pairs. HIV-1 genotypes were then determined by phylogenetic analysis.

pol Phylogenetic Analyses

Reference Sequence Set used

To resolve the HIV-1 genotype from the pol sequence, the HIV-1 group M reference sequence set (2009) was obtained from the NCBI Viral Genotyping Tool (http://www.ncbi.nlm.nih.gov/projects/genotyping); it represents subtypes A-D, F-H, J-K and CRF01-25, 28-37, 39-40 and 42-43. A group N reference sequence (GenBank accession number AY532635) was also included as an outgroup in the phylogenetic analyses.

Multiple Alignments

The sample pol sequences and the subtype and CRF reference sequences were aligned using the EMBL-EBI CLUSTAL W Server (http://www.ebi.ac.uk/clustalw), and the alignment was edited manually in BioEdit version 7 (www.mbio.ncsu.edu/BioEdit/bioedit.html). Only the pol region described in Section 2.2.1 was retained for both sample and reference sequences, so that differences in sequence length would not affect the analyses. Gapped positions were uninformative for the analyses and were therefore removed. The edited alignment was exported in NEXUS format for tree construction.
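The gap-removal and export step can be sketched in a few lines. This is a minimal illustration, not the procedure used in the study (which was done by hand in BioEdit): it assumes the aligned sequences are held in a Python dict of equal-length strings and reads "removing gaps" as deleting every column in which any sequence carries a gap. The function names and sequence labels are hypothetical.

```python
from typing import Dict

def strip_gap_columns(alignment: Dict[str, str], gap: str = "-") -> Dict[str, str]:
    """Remove every alignment column in which any sequence has a gap."""
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("all aligned sequences must have the same length")
    ncol = lengths.pop()
    # Keep only columns where no sequence carries the gap character
    keep = [i for i in range(ncol)
            if all(seq[i] != gap for seq in alignment.values())]
    return {name: "".join(seq[i] for i in keep) for name, seq in alignment.items()}

def to_nexus(alignment: Dict[str, str]) -> str:
    """Serialise an ungapped DNA alignment as a minimal NEXUS DATA block."""
    ntax = len(alignment)
    nchar = len(next(iter(alignment.values())))
    lines = ["#NEXUS", "", "BEGIN DATA;",
             f"  DIMENSIONS NTAX={ntax} NCHAR={nchar};",
             "  FORMAT DATATYPE=DNA MISSING=? GAP=-;",
             "  MATRIX"]
    for name, seq in alignment.items():
        lines.append(f"    {name} {seq}")   # labels must not contain spaces
    lines += ["  ;", "END;"]
    return "\n".join(lines) + "\n"

aln = {"sample_01": "ATG-CCA", "ref_B": "ACGTCTA", "ref_C": "ATGTC-G"}
trimmed = strip_gap_columns(aln)
print(trimmed)   # {'sample_01': 'ATGCA', 'ref_B': 'ACGCA', 'ref_C': 'ATGCG'}
print(to_nexus(trimmed))
```

Deleting whole columns keeps every sequence the same length, which the tree-building step requires.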

Phylogenetic Tree Plotting

The phylogenetic tree was constructed from the alignment by the neighbor-joining (NJ) method in PAUP* version 4.0b10 (Hasegawa et al., 1985). Clusters in the NJ tree were evaluated with 1,000 bootstrap replicates, and only clusters with bootstrap values higher than 70 were considered reliable. The NJ tree was then exported to FigTree version 1.1.2 (http://tree.bio.ed.ac.uk/software/figtree) and Adobe Illustrator CS4 (Adobe Systems Inc., CA, USA) for graphic editing.
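For readers unfamiliar with the method, the following is a compact pure-Python sketch of the standard neighbor-joining algorithm applied to a precomputed distance matrix. It is illustrative only: PAUP* computes model-based distances and handles ties, negative branch lengths and bootstrapping, none of which this sketch attempts. The five-taxon distance matrix is a made-up toy example.

```python
from typing import Dict, List

def neighbor_joining(names: List[str], dist: List[List[float]]) -> str:
    """Unrooted neighbor-joining tree returned as a Newick string."""
    if len(names) < 3:
        raise ValueError("need at least three taxa")
    # Distances kept as dict-of-dicts so that joined nodes can be added by label
    d: Dict[str, Dict[str, float]] = {
        a: {b: dist[i][j] for j, b in enumerate(names)} for i, a in enumerate(names)
    }
    nodes = list(names)
    while len(nodes) > 3:
        n = len(nodes)
        r = {a: sum(d[a][b] for b in nodes) for a in nodes}  # row sums
        # Choose the pair minimising Q(a, b) = (n - 2) d(a, b) - r(a) - r(b)
        best = None
        for x in range(n):
            for y in range(x + 1, n):
                a, b = nodes[x], nodes[y]
                q = (n - 2) * d[a][b] - r[a] - r[b]
                if best is None or q < best[0]:
                    best = (q, a, b)
        _, a, b = best
        # Branch lengths from the new internal node to a and b
        la = 0.5 * d[a][b] + (r[a] - r[b]) / (2 * (n - 2))
        lb = d[a][b] - la
        u = f"({a}:{la:.4f},{b}:{lb:.4f})"
        d[u] = {u: 0.0}
        for c in nodes:
            if c not in (a, b):
                d[u][c] = d[c][u] = 0.5 * (d[a][c] + d[b][c] - d[a][b])
        nodes = [c for c in nodes if c not in (a, b)] + [u]
    # Join the last three nodes at a single central node
    a, b, c = nodes
    la = 0.5 * (d[a][b] + d[a][c] - d[b][c])
    lb = 0.5 * (d[a][b] + d[b][c] - d[a][c])
    lc = 0.5 * (d[a][c] + d[b][c] - d[a][b])
    return f"({a}:{la:.4f},{b}:{lb:.4f},{c}:{lc:.4f});"

names = ["a", "b", "c", "d", "e"]
D = [[0, 5, 9, 9, 8],
     [5, 0, 10, 10, 9],
     [9, 10, 0, 8, 7],
     [9, 10, 8, 0, 3],
     [8, 9, 7, 3, 0]]
print(neighbor_joining(names, D))
# (d:2.0000,e:1.0000,(c:4.0000,(a:2.0000,b:3.0000):3.0000):2.0000);
```

The Q criterion corrects raw distances for each node's average divergence, which is why NJ does not simply join the closest pair.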

Identification of Samples with Undefined HIV-1 genotype

In the NJ tree, sample sequences that clustered with a known subtype or CRF reference sequence with a bootstrap value higher than 70 were considered to have a defined HIV-1 genotype. Sample sequences that formed an outgroup to the reference sequences, or that grouped with reference sequences with a bootstrap value lower than 70, were defined as having an undefined HIV-1 genotype.
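The decision rule above can be written as a small function. This sketch assumes each sample's nearest reference cluster and the bootstrap support for that cluster have already been read off the NJ tree. The `TreePlacement` record, the sample IDs and the handling of a value of exactly 70 are hypothetical choices for illustration, since the text does not specify them.

```python
from dataclasses import dataclass
from typing import Optional

BOOTSTRAP_CUTOFF = 70.0  # support (out of 100) above which a cluster is considered reliable

@dataclass
class TreePlacement:
    sample_id: str
    nearest_reference: Optional[str]  # subtype/CRF of the closest reference cluster; None = outgroup
    bootstrap: float                  # bootstrap support for that sample-reference cluster

def assign_genotype(p: TreePlacement, cutoff: float = BOOTSTRAP_CUTOFF) -> str:
    """Reference genotype if supported above `cutoff`, otherwise 'undefined'."""
    # Assumption: a value of exactly 70 is treated as insufficient support
    if p.nearest_reference is None or p.bootstrap <= cutoff:
        return "undefined"
    return p.nearest_reference

placements = [
    TreePlacement("HK-001", "CRF01_AE", 99.0),
    TreePlacement("HK-002", "B", 62.0),
    TreePlacement("HK-003", None, 0.0),
]
for p in placements:
    print(p.sample_id, assign_genotype(p))
```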

env Genotyping of HIV-1 Variants with undefined genotype

Overview

For samples whose HIV-1 genotype could not be resolved from the pol sequence (Section 2.2.3), classical HIV-1 genotyping targeting the env gene was performed. In the phylogenetic analysis of env sequences, samples clustering as outgroups to the reference sequences were selected for further full-genome analysis. The env and pol genotyping results were also compared to identify any samples assigned different genotypes by the two systems; such samples could also be selected for genomic analysis.
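The selection rule in this paragraph amounts to a simple comparison. The sketch below assumes that each pol-undefined sample still has a tentative nearest reference in the pol tree (the low-support cluster it fell closest to), which the text implies but does not state. The sample IDs, the `OUTGROUP` label and the function name are hypothetical.

```python
from typing import Dict, List

OUTGROUP = "outgroup"  # hypothetical label for an env sequence outside all reference clusters

def select_for_genomic_analysis(pol_nearest: Dict[str, str],
                                env_call: Dict[str, str]) -> List[str]:
    """Return sample IDs that are env outgroups or whose env and pol calls disagree."""
    selected = []
    for sample_id in sorted(pol_nearest.keys() & env_call.keys()):
        env = env_call[sample_id]
        if env == OUTGROUP or env != pol_nearest[sample_id]:
            selected.append(sample_id)
    return selected

pol_nearest = {"HK-01": "CRF01_AE", "HK-02": "B", "HK-03": "C"}
env_call = {"HK-01": "CRF01_AE", "HK-02": OUTGROUP, "HK-03": "CRF07_BC"}
print(select_for_genomic_analysis(pol_nearest, env_call))  # ['HK-02', 'HK-03']
```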

Protocol for env Genotyping

Viral RNA Extraction

Before viral RNA extraction, viral particles in each plasma sample were concentrated by high-speed centrifugation: 500 μL of patient plasma was centrifuged at 21,500 x g for 90 minutes, and the supernatant was removed to leave 140 μL of concentrated plasma. Viral RNA was then extracted from this concentrate using the QIAamp Viral RNA Mini Kit (Qiagen, Hilden, Germany). The 140 μL concentrate was lysed with 560 μL of Buffer AVL containing carrier RNA, vortex-mixed for 15 seconds and incubated at room temperature for 10 minutes. Absolute ethanol (560 μL) was then added to the lysate with vortexing. The mixture was loaded onto the spin column and centrifuged at 6,000 x g for 1 minute; this step was repeated until all of the lysate had passed through the column. The column was then washed with Buffer AW1 and then Buffer AW2 to remove contaminants from the membrane, followed by an additional centrifugation at 20,000 x g for 1 minute to remove residual ethanol. Viral RNA was eluted by adding 60 μL of Buffer AVE, incubating for 1 minute and centrifuging at 6,000 x g for 1 minute. The purified viral RNA was then used as the template for amplification.

Primers for env Genotyping

The primers used to amplify the C2V3V4 region of the env gene (encoding part of gp120) have been described previously (Leung et al., 2008). All PCR and sequencing primers were synthesized by Sigma-Proligo (Proligo Singapore Pty Ltd., Singapore). Primer sequences are listed in Table 2.1.

Table 2.1. Primers for env C2V3V4 region amplification and sequencing

Name#           Sequence*                        Product Size (bp)
Envout-F (+)    5'-ACAGTRCARTGYACACATGG-3'       714 (one-step RT-PCR)
Envout-R (-)    5'-CACTTCTCCAATTGTCCITC-3'
Envin-F (+)     5'-CTGTTIAATTGGCAGICTAGC-3'      539 (nested PCR)
Envin-R (-)     5'-RATGGGAGGRGYATACAT-3'

* IUB codes: A - Adenine; G - Guanine; C - Cytosine; T - Thymine; I - deoxyinosine; R - A or G; Y - C or T

# (+) Sense primer; (-) Anti-sense primer
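Because of the IUB codes in Table 2.1, each degenerate primer is synthesized as a mixture of oligos. The sketch below counts and expands that mixture and checks primer-template compatibility. It assumes deoxyinosine (I) is a single universal base that pairs with any nucleotide, so it counts once towards degeneracy rather than as four variants; the helper names are hypothetical.

```python
from itertools import product
from math import prod

# Bases each symbol can pair with; I (deoxyinosine) is treated as pairing with any base
IUB = {"A": "A", "C": "C", "G": "G", "T": "T",
       "R": "AG", "Y": "CT", "W": "AT", "I": "ACGT"}
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A",
              "R": "Y", "Y": "R", "W": "W", "I": "I"}
MIXED = "RYW"  # positions synthesized as a mixture of bases

def degeneracy(primer: str) -> int:
    """Number of distinct oligos in the mix; inosine is a single base, so it counts once."""
    return prod(len(IUB[b]) for b in primer if b in MIXED)

def expand(primer: str) -> list:
    """Every concrete oligo in a degenerate primer mix (inosine left as 'I')."""
    choices = [IUB[b] if b in MIXED else b for b in primer]
    return ["".join(combo) for combo in product(*choices)]

def reverse_complement(primer: str) -> str:
    return "".join(COMPLEMENT[b] for b in reversed(primer))

def binds(primer: str, target: str) -> bool:
    """True if each primer position is compatible with the target base (same 5'->3' orientation)."""
    return len(primer) == len(target) and all(t in IUB[p] for p, t in zip(primer, target))

TABLE_2_1 = {
    "Envout-F": "ACAGTRCARTGYACACATGG",
    "Envout-R": "CACTTCTCCAATTGTCCITC",
    "Envin-F": "CTGTTIAATTGGCAGICTAGC",
    "Envin-R": "RATGGGAGGRGYATACAT",
}
for name, seq in TABLE_2_1.items():
    print(name, degeneracy(seq))   # 8, 1, 1 and 8 respectively
```

For an anti-sense primer, compatibility with the sense-strand sequence would be checked on `reverse_complement(primer)`.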

One-Step Reverse Transcription-Polymerase Chain Reaction (RT-PCR)

One-step RT-PCR was performed using the C. therm. Polymerase One-Step RT-PCR System (Roche Applied Science, Mannheim, Germany). Each 50 μL reaction contained 5 μL of viral RNA extract, 10 μL of 5x RT-PCR reaction buffer (with 12.5 mM MgCl2 and 10% DMSO), 2.5 μL of 100% DMSO, 2.5 μL of 100 mM DTT, 0.8 μL of 10 mM dNTP mix (Fermentas, Ontario, Canada), 0.5 μL of Protector RNase Inhibitor (Roche Applied Science, Mannheim, Germany), 2 μL of 10 mM sense primer (Envout-F), 2 μL of 10 mM anti-sense primer (Envout-R) and 2 μL of C. therm. Polymerase mix. Reactions were run on a GeneAmp 9700 PCR System (Applied Biosystems, CA, USA).
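The final concentration of each stock in the 50 μL reaction follows C_final = C_stock x V_added / V_total, summed over every component that contributes the same solute (DMSO comes from both the buffer and the neat addition). The sketch applies this to the stocks whose concentrations are stated above. It assumes nuclease-free water makes up the balance, since no water volume is given in the text; primers, enzyme and RNase inhibitor are counted only in the volume balance.

```python
from typing import Dict, Tuple

TOTAL_UL = 50.0

# name -> (volume added in uL, {solute: (stock concentration, unit)}), as stated in the text
MIX: Dict[str, Tuple[float, Dict[str, Tuple[float, str]]]] = {
    "viral RNA extract": (5.0, {}),
    "5x RT-PCR buffer": (10.0, {"MgCl2": (12.5, "mM"), "DMSO": (10.0, "%")}),
    "100% DMSO": (2.5, {"DMSO": (100.0, "%")}),
    "100 mM DTT": (2.5, {"DTT": (100.0, "mM")}),
    "10 mM dNTP mix": (0.8, {"dNTP": (10.0, "mM")}),
    "RNase inhibitor": (0.5, {}),
    "sense primer": (2.0, {}),
    "anti-sense primer": (2.0, {}),
    "C. therm. enzyme mix": (2.0, {}),
}

def final_concentrations(mix, total_ul: float) -> Dict[str, Tuple[float, str]]:
    """Sum C_stock * V_added / V_total for every solute across all components."""
    out: Dict[str, Tuple[float, str]] = {}
    for volume, solutes in mix.values():
        for solute, (conc, unit) in solutes.items():
            previous = out.get(solute, (0.0, unit))[0]
            out[solute] = (previous + conc * volume / total_ul, unit)
    return out

def water_to_add(mix, total_ul: float) -> float:
    """Balance volume, assuming water (not stated in the text) makes up the remainder."""
    return total_ul - sum(volume for volume, _ in mix.values())

for solute, (conc, unit) in final_concentrations(MIX, TOTAL_UL).items():
    print(f"{solute}: {conc:g} {unit}")    # MgCl2 2.5 mM, DMSO 7 %, DTT 5 mM, dNTP 0.16 mM
print(f"water: {water_to_add(MIX, TOTAL_UL):.1f} uL")   # 22.7 uL
```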

Reverse transcription was carried out at 60 °C for 30 minutes. After cDNA synthesis, the reverse transcriptase activity was inactivated at 94 °C for 2 minutes; this high-temperature step also activated the DNA polymerase. PCR amplification then proceeded for 35 cycles of 94 °C for 30 seconds, 55 °C for 30 seconds and 72 °C for 1 minute, followed by a final extension at 72 °C for 7 minutes. Reactions were then held at 4 °C.
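Encoding a cycling program as data makes it easy to check against the written protocol and to estimate run time. This is a minimal sketch; the `Step` and `Stage` classes are hypothetical and not part of any instrument software, and actual runs on the GeneAmp 9700 take longer because ramping between temperatures is not counted. The second program encodes the nested PCR conditions given in the next section.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Step:
    temp_c: float
    seconds: int

@dataclass
class Stage:
    steps: List[Step]
    cycles: int = 1

def hold_minutes(program: List[Stage]) -> float:
    """Sum of programmed hold times; ramping and the final 4 °C hold are not counted."""
    total_s = sum(stage.cycles * sum(step.seconds for step in stage.steps)
                  for stage in program)
    return total_s / 60

# env one-step RT-PCR, as described above
ONE_STEP_RT_PCR = [
    Stage([Step(60, 30 * 60)]),                                   # reverse transcription
    Stage([Step(94, 2 * 60)]),                                    # RT inactivation, polymerase activation
    Stage([Step(94, 30), Step(55, 30), Step(72, 60)], cycles=35),
    Stage([Step(72, 7 * 60)]),                                    # final extension
]

# env nested PCR
NESTED_PCR = [
    Stage([Step(95, 10 * 60)]),                                   # enzyme activation
    Stage([Step(95, 30), Step(55, 30), Step(72, 60)], cycles=30),
    Stage([Step(72, 10 * 60)]),                                   # final extension
]

print(hold_minutes(ONE_STEP_RT_PCR))  # 109.0
print(hold_minutes(NESTED_PCR))       # 80.0
```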

The Nested PCR

One microlitre of the one-step RT-PCR product was used as the template for nested PCR, which was performed using the FastStart High Fidelity PCR System (Roche Applied Science, Mannheim, Germany). Each 50 μL reaction contained 5 μL of 10x FastStart High Fidelity Reaction Buffer with 18 mM MgCl2, 3 μL of 100% DMSO, 1 μL of 10 mM dNTP mix (Fermentas, Ontario, Canada), 2 μL of 10 mM sense primer (Envin-F), 2 μL of 10 mM anti-sense primer (Envin-R) and 0.5 μL of FastStart High Fidelity Enzyme. Reactions were run on a GeneAmp 9700 PCR System (Applied Biosystems, CA, USA).

The nested PCR began with an enzyme activation step at 95 °C for 10 minutes, followed by 30 cycles of 95 °C for 30 seconds, 55 °C for 30 seconds and 72 °C for 1 minute, and a final extension at 72 °C for 10 minutes. Reactions were then held at 4 °C.

Post-PCR Analyses and Purification

Five microlitres of each nested PCR product were analyzed by electrophoresis on a 1.5% Tris-borate-EDTA agarose gel, with the 100 bp DNA Ladder Plus Marker (Fermentas, Ontario, Canada) as a size reference, to confirm that the correct target had been amplified.

PCR products showing a visible band of the correct size (539 bp) were purified with the QIAquick PCR Purification Kit (Qiagen, Hilden, Germany) according to the manufacturer's instructions. Briefly, 45 μL of nested PCR product was mixed thoroughly with 225 μL of Buffer PB, loaded onto the QIAquick spin column and centrifuged at 17,900 x g for 1 minute. The column was washed with 750 μL of Buffer PE and centrifuged at 17,900 x g for 1 minute, followed by an additional centrifugation at 17,900 x g for 1 minute to remove residual ethanol. Finally, 50 μL of Buffer EB was added to the centre of the membrane, incubated for 1 minute and recovered by centrifugation at 17,900 x g for 1 minute. DNA concentrations of the eluates were estimated with an ND-1000 Spectrophotometer (NanoDrop Technologies, DE, USA). The purified PCR products were stored at -20 °C for further analysis.

Cycle Sequencing Reaction

For env genotyping, the purified nested PCR product was sequenced in both the sense and anti-sense directions. In-house cycle sequencing was performed using the BigDye Terminator v1.1 Cycle Sequencing Kit (Applied Biosystems, CA, USA). To reduce reagent costs, the 1/4 dilution protocol recommended by the manufacturer was used. Each 20 μL reaction contained 2 μL of BigDye Terminator v1.1 Ready Reaction Mix, 3 μL of 5X BigDye Terminator v1.1/v3.1 Sequencing Buffer, 1 μL of 3.2 μM sense or anti-sense primer and 100 ng of purified PCR product.
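Because the template is dosed by mass (100 ng) rather than by volume, the volume of PCR product to add depends on the NanoDrop reading. A sketch of that calculation, assuming water makes up the rest of the 20 μL (water is not mentioned in the text); the 25 ng/μL reading in the example is hypothetical.

```python
REACTION_UL = 20.0
BIGDYE_UL = 2.0         # BigDye Terminator v1.1 Ready Reaction Mix (1/4 dilution protocol)
BUFFER_UL = 3.0         # 5X Sequencing Buffer
PRIMER_UL = 1.0         # volume of primer stock
PRIMER_STOCK_UM = 3.2   # primer stock concentration
TEMPLATE_NG = 100.0

def sequencing_setup(template_ng_per_ul: float) -> dict:
    """Template and water volumes for one 20 uL reaction, from a NanoDrop reading."""
    template_ul = TEMPLATE_NG / template_ng_per_ul
    # Assumption: water makes up the remaining volume
    water_ul = REACTION_UL - BIGDYE_UL - BUFFER_UL - PRIMER_UL - template_ul
    if water_ul < 0:
        raise ValueError("PCR product too dilute to add 100 ng to a 20 uL reaction")
    return {
        "template_ul": round(template_ul, 2),
        "water_ul": round(water_ul, 2),
        "primer_final_uM": round(PRIMER_UL * PRIMER_STOCK_UM / REACTION_UL, 3),
    }

print(sequencing_setup(25.0))
# {'template_ul': 4.0, 'water_ul': 10.0, 'primer_final_uM': 0.16}
```

With 6 μL taken up by reagents and primer, any product below roughly 7.1 ng/μL cannot supply 100 ng in the remaining 14 μL, which the function reports as an error.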

Cycle sequencing was performed on a GeneAmp 9700 PCR System (Applied Biosystems, CA, USA), starting with an initial step at 96 °C for 1 minute, followed by 25 cycles of 96 °C for 10 seconds, 50 °C for 5 seconds and 60 °C for 4 minutes. The sequencing products were then held at 4 °C.

Post-Sequencing Purification and DNA Sequencer Loading

Unincorporated dye terminators were removed from the cycle sequencing reactions using the DyeEx 96 Kit (Qiagen, Hilden, Germany). The procedure involved two centrifugation steps at 1,000 x g for 3 minutes each. The first removed the storage buffer from the columns of the DyeEx 96 plate; after the sequencing products had been loaded, the second removed the unincorporated dye terminators, leaving purified sequencing products ready for loading onto the DNA sequencer.

Before loading, 10 μL of purified sequencing product was mixed thoroughly with 12 μL of Hi-Di Formamide (Applied Biosystems, CA, USA), denatured at 96 °C for 3.5 minutes and immediately chilled on ice. The denatured products were then loaded onto an ABI Prism 3130xl DNA Analyzer (Applied Biosystems, CA, USA), and sequencing signals were captured using the mobility file for BigDye Terminator v1.1.

Sequencing Analysis

The sense and anti-sense sequencing electropherograms for each sample were analyzed using the Staden Package, version 2003 b1 (Staden et al., 2000). The electropherograms were assembled and edited manually in the Pregap4 and Gap4 modules, and the consensus nucleotide sequence was exported in FASTA format for further analysis.
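The core idea of merging the two reads into a consensus can be sketched as follows. This is a simplified stand-in for what Gap4 does with trace quality information, not a description of Staden's algorithm: it assumes both base-called reads are already trimmed to the same region, reverse-complements the anti-sense read, and reports any disagreement as an IUPAC mixed-base code. Inputs are restricted to A, C, G, T and N.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")
MIXED_BASE = {frozenset("AG"): "R", frozenset("CT"): "Y", frozenset("AC"): "M",
              frozenset("GT"): "K", frozenset("CG"): "S", frozenset("AT"): "W"}

def revcomp(seq: str) -> str:
    """Reverse complement of a read containing only A, C, G, T and N."""
    return seq.translate(COMPLEMENT)[::-1]

def consensus(sense: str, antisense: str) -> str:
    """Merge a sense read with an anti-sense read (as sequenced) of the same trimmed region."""
    other = revcomp(antisense)
    if len(sense) != len(other):
        raise ValueError("reads must be trimmed to the same aligned region")
    out = []
    for a, b in zip(sense, other):
        if a == b or b == "N":
            out.append(a)              # agreement, or only the sense read has a call
        elif a == "N":
            out.append(b)              # only the anti-sense read has a call
        else:
            out.append(MIXED_BASE[frozenset((a, b))])  # disagreement -> IUPAC mixed base
    return "".join(out)

print(consensus("ACGTNA", "TGATGT"))   # ACRTCA
```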

Phylogenetic Analyses

Reference Sequence Set used

For phylogenetic analysis of env sequences, a reference sequence set covering the major subtypes and CRFs of HIV-1 group M was obtained from the Los Alamos HIV Sequence Database (http://www.hiv.lanl.gov). A group N reference sequence (GenBank accession number AY532635) was also included in the set as an outgroup.

Multiple Alignments

Sample env sequences were aligned with the reference sequences and edited manually in BioEdit version 7 (www.mbio.ncsu.edu/BioEdit/bioedit.html). Sequence outside the env C2V3V4 region was trimmed from the alignment so that only the env genotyping target region was included in the phylogenetic analyses. The edited alignment was exported in NEXUS format for tree construction.

Phylogenetic Tree Plotting

The phylogenetic tree was constructed from the alignment by the neighbor-joining (NJ) method in PAUP* version 4.0b10 (Hasegawa et al., 1985), with 1,000 bootstrap replicates. Cluster reliability was assessed, and the tree diagram edited, as described in Section 2.2.2.3.
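The bootstrap values used throughout come from resampling alignment columns with replacement, rebuilding a tree from each pseudo-alignment and counting how often a cluster reappears. The sketch shows the resampling and the counting; tree building for each replicate is not shown, and the per-replicate clade lists passed to `clade_support` are assumed to come from such a tree builder. The toy alignment and seed are made up.

```python
import random
from typing import Dict, FrozenSet, List

def bootstrap_replicate(alignment: Dict[str, str], rng: random.Random) -> Dict[str, str]:
    """One pseudo-alignment: columns resampled with replacement, same length as the original."""
    ncol = len(next(iter(alignment.values())))
    cols = [rng.randrange(ncol) for _ in range(ncol)]
    return {name: "".join(seq[i] for i in cols) for name, seq in alignment.items()}

def clade_support(replicate_clades: List[List[FrozenSet[str]]],
                  clade: FrozenSet[str]) -> float:
    """Percentage of replicate trees (given as lists of clades) that contain `clade`."""
    hits = sum(1 for clades in replicate_clades if clade in clades)
    return 100.0 * hits / len(replicate_clades)

rng = random.Random(2010)
aln = {"sample": "ACGTACGTAC", "ref_B": "ACGTACGAAC", "ref_C": "TCGAACGTTC"}
replicates = [bootstrap_replicate(aln, rng) for _ in range(1000)]
# Each replicate would next go through distance calculation and NJ tree building;
# a clade found in 700 of the 1,000 replicate trees would have a bootstrap value of 70.
```

Resampling whole columns, rather than individual bases, keeps the sites of all sequences aligned within each replicate.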

Identification of Interesting Clusters and Samples for HIV-1 Full-Length genome analysis

In the env NJ tree (Section 2.3.3), sample sequences clustering as outgroups with high bootstrap support (>70) were selected as target URF samples for full-genome analysis of their molecular characteristics and recombination. Samples with an epidemiological background suggesting that they might be URFs could also be selected for further analysis.

Epidemiological Analysis and Genomic Characterization of HIV-1 URFs in Hong Kong

Overview

To investigate the epidemiological relationships and genomic characteristics of the HIV-1 URF samples selected in Section 2.3.4, a reliable and sensitive in-house system for amplifying and sequencing the nearly full-length HIV-1 genome directly from viral RNA extracts was required. In this study, a nearly full-length genome sequencing protocol was validated for the HIV-1 strains circulating in the region, especially URFs, which may have mosaic genomic structures.

The resulting genomic sequences were then subjected to phylogenetic analysis and recombination detection to study the prevalence and genomic characteristics of HIV-1 URFs in Hong Kong.

Optimization of HIV-1 Full-length Genome Sequencing System

Optimization of Reverse Transcription

Because viral RNA was the starting material for full-length genome sequencing, high-quality complementary DNA (cDNA) is essential for success. cDNA is synthesized from viral RNA by RT and serves as the template for amplification of the viral genome. In this section, three types of oligonucleotide were evaluated for producing cDNA of the nearly full-length viral RNA genome:

1) random hexamers;

2) oligo(dT); and

3) an HIV-1 gene-specific primer.

Clinical samples of known HIV-1 genotypes commonly found in the region were used to evaluate how well each type of oligonucleotide primed different HIV-1 genotypes. The amount of reverse transcriptase in the reaction (200 U or 400 U) was also evaluated to improve RT-PCR efficiency. A serial dilution (10,000, 5,000, 2,000, 1,000 and 500 copies/mL) of a clinical HIV-1 isolate was prepared to determine the sensitivity of RT-PCR with each amount of reverse transcriptase.
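Preparing the dilution panel and reading off sensitivity reduce to C1V1 = C2V2 and a minimum over positive results. In this sketch each level is prepared directly from the stock for simplicity (a true serial scheme would dilute each level from the previous one), and the stock viral load, the 1 mL final volume and the detection results are all hypothetical, since none of them is given in the text.

```python
from typing import Dict, List, Optional, Tuple

LEVELS = [10000, 5000, 2000, 1000, 500]  # copies/mL, from the text

def dilution_plan(stock_copies_per_ml: float, final_ml: float,
                  levels: List[int] = LEVELS) -> List[Tuple[int, float, float]]:
    """(target, stock uL, diluent uL) for each level, using C1V1 = C2V2."""
    plan = []
    for target in levels:
        if target > stock_copies_per_ml:
            raise ValueError(f"stock is below the {target} copies/mL target")
        stock_ul = target * final_ml * 1000 / stock_copies_per_ml
        plan.append((target, round(stock_ul, 1), round(final_ml * 1000 - stock_ul, 1)))
    return plan

def lowest_detected(results: Dict[int, bool]) -> Optional[int]:
    """Sensitivity estimate: the lowest input level that gave a visible product."""
    detected = [level for level, positive in results.items() if positive]
    return min(detected) if detected else None

for row in dilution_plan(stock_copies_per_ml=100000, final_ml=1.0):
    print(row)   # (10000, 100.0, 900.0) ... (500, 5.0, 995.0)

made_up_results = {10000: True, 5000: True, 2000: True, 1000: False, 500: False}
print(lowest_detected(made_up_results))   # 2000
```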

Protocol for HIV-1 Full-length genome sequencing

Viral RNA Extraction

The viral RNA extracts prepared in Section 2.3.2 were also used for full-length genome sequencing. Extracted viral RNA was stored at -80 °C before full-length genome analysis to minimize RNA degradation.

Reverse Transcription

Reverse transcription of viral RNA to cDNA was performed using the SuperScript™ III First-Strand Synthesis System for RT-PCR (Invitrogen, CA, USA). To optimize this step, different types of primer and different amounts of reverse transcriptase were compared. Reactions were set up according to the manufacturer's instructions. Briefly, a mixture of 3 μL of viral RNA extract, 1 μL of primer (50 ng/μL random hexamers, 50 μM oligo(dT) or 2 μM gene-specific primer), 1 μL of 10 mM dNTP mix and 5 μL of DEPC-treated water was incubated at 65 °C for 5 minutes and then chilled on ice for 1 minute. A cDNA Synthesis Mix containing 2 μL of 10X RT buffer, 4 μL of 25 mM MgCl2, 2 μL of 0.1 M DTT, 1 μL of RNaseOUT™ and 1 μL or 2 μL (200 U or 400 U) of SuperScript™ III RT was then added to the RNA/primer mixture with gentle mixing. For reactions primed with random hexamers, an initial incubation at 25 °C for 10 minutes was included before reverse transcription. Reverse transcription was performed at 50 °C for 120 minutes, followed by 85 °C for 5 minutes, and the product was chilled on ice immediately. The RNA template was then removed from the cDNA:RNA hybrid by digestion with 1 μL of RNase H at 37 °C for 20 minutes, after which the cDNA was ready for PCR amplification of the viral genome.
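The primer and enzyme variants being compared differ only in a few parameters, so they can be tabulated programmatically. This sketch is a hypothetical bookkeeping aid, not part of the manufacturer's protocol; it relies on SuperScript III RT being supplied at 200 U/μL, which is how 1 μL or 2 μL corresponds to 200 U or 400 U. Volumes exclude the 1 μL of RNase H added afterwards, and only timed steps are listed.

```python
from typing import List, Tuple

SUPERSCRIPT_III_U_PER_UL = 200   # SuperScript III RT is supplied at 200 U/uL
PRIMER_TYPES = ("random_hexamer", "oligo_dT", "gene_specific")

def rt_setup(primer: str, enzyme_ul: float) -> Tuple[float, float, List[Tuple[str, int, int]]]:
    """(RT volume in uL, enzyme units, [(step, temp C, minutes)]) for one reaction."""
    if primer not in PRIMER_TYPES:
        raise ValueError(f"unknown primer type: {primer}")
    rna_primer_ul = 3 + 1 + 1 + 5                 # RNA, primer, dNTP, DEPC water
    synthesis_ul = 2 + 4 + 2 + 1 + enzyme_ul      # RT buffer, MgCl2, DTT, RNaseOUT, RT
    program = [("denature RNA/primer", 65, 5), ("chill on ice", 0, 1)]
    if primer == "random_hexamer":
        program.append(("hexamer annealing", 25, 10))
    program += [("reverse transcription", 50, 120),
                ("RT inactivation", 85, 5),
                ("RNase H digestion", 37, 20)]
    return rna_primer_ul + synthesis_ul, enzyme_ul * SUPERSCRIPT_III_U_PER_UL, program

for primer in PRIMER_TYPES:
    for enzyme_ul in (1, 2):
        volume, units, program = rt_setup(primer, enzyme_ul)
        print(primer, volume, units, sum(minutes for _, _, minutes in program))
```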

Strategy for PCR amplifications and sequencing reactions

The PCR amplification strategy was adopted from a previous publication (Nadai et al., 2008) and is shown in Figure 2.1. Briefly, cDNA of the full-length viral RNA genome was used as the PCR template, and three nested PCRs (F1-F3) targeting different regions were performed to cover the nearly full-length viral genome. The F1 amplicon (2.6 kb) comprised gag and part of pol, corresponding to nucleotide positions 769-3338 of the HXB2 reference sequence (GenBank accession number K03455). The F2 amplicon (3.7 kb) covered pol, vif, vpr, part of tat, part of rev and vpu, corresponding to HXB2 positions 2483-6231. The F3 amplicon (3.3 kb) spanned env to nef, corresponding to HXB2 positions 5861-9181. Primer sequences for the three nested PCRs are listed in Table 2.2.
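The amplicon sizes quoted above can be checked directly against their HXB2 coordinates, as can the overlaps between neighbouring amplicons, which are what allow the three fragments to be joined into one contiguous sequence. A short sketch using only the coordinates given in the text; the helper names are hypothetical.

```python
from typing import Dict, List, Tuple

# HXB2 (GenBank K03455) coordinates of the three nested amplicons, from the text
AMPLICONS: Dict[str, Tuple[int, int]] = {
    "F1": (769, 3338),
    "F2": (2483, 6231),
    "F3": (5861, 9181),
}

def length(span: Tuple[int, int]) -> int:
    """Length of an inclusive 1-based coordinate span."""
    return span[1] - span[0] + 1

def overlap(a: Tuple[int, int], b: Tuple[int, int]) -> int:
    """Number of shared positions between two inclusive spans (0 if disjoint)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)

def covered(spans: List[Tuple[int, int]]) -> int:
    """Total positions covered by the union of spans."""
    total, cur_start, cur_end = 0, None, None
    for start, end in sorted(spans):
        if cur_end is None or start > cur_end + 1:
            if cur_end is not None:
                total += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        total += cur_end - cur_start + 1
    return total

for name, span in AMPLICONS.items():
    print(name, length(span))                          # F1 2570, F2 3749, F3 3321
print(overlap(AMPLICONS["F1"], AMPLICONS["F2"]))       # 856
print(overlap(AMPLICONS["F2"], AMPLICONS["F3"]))       # 371
print(covered(list(AMPLICONS.values())))               # 8413
```

The computed lengths agree with the rounded sizes of 2.6, 3.7 and 3.3 kb.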

To sequence the viral genome, 80 cycle sequencing reactions were performed on the amplicons of the three nested PCRs. The sequencing primers were site-specific for different regions of the viral genome and targeted either the sense or the anti-sense strand of the amplicons (Nadai et al., 2008). Sequencing primers are listed in Table 2.3.

Manufacturing of Primers

All primers for the PCR and DNA sequencing reactions were synthesized by Sigma-Proligo (Proligo Singapore Pty Ltd., Singapore).

Figure 2.1. Schematic diagram of the PCR amplifications covering the nearly full-length HIV-1 genome. A single RT reaction was performed to synthesize cDNA of the nearly full-length viral RNA genome, and three nested PCR amplifications were then performed to cover the nearly full-length viral genome.

Table 2.2. Primers for the amplifications of nearly full-length viral genome

Amplicon   Name#              Sequence*                                Product Size
F1         msf12b (1+)        5'-AAATCTCTAGCAGTGGCGCCCGAACAG-3'        2.6 kb
           RT3474R (1-)       5'-GAATCTCTCTGTTTTCTGCCAGTTC-3'
           f2nst (2+)         5'-GCGGAGGCTAGAAGGAGAGAGATGG-3'
           proRT (2-)         5'-TTTCCCCACTAACTTCTGTATGTCATTGACA-3'
F2         POLoutF1 (1+)      5'-CCTCAAATCACTCTTTGGCARCGAC-3'          3.7 kb
           VIF-VPUoutR (1-)   5'-GGTACCCCATAATAGACTGTRACCCACAA-3'
           POLinF1 (2+)       5'-AGGACCTACRCCTGTCAACATAATTGG-3'
           VIF-VPUinR1 (2-)   5'-CTCTCATTGCCACTGTCTTCTGCTC-3'
F3         ENVoutF1 (1+)      5'-AGARGAYAGATGGAACAAGCCCCAG-3'          3.3 kb
           UNINEF 7' (1-)     5'-GCACTCAAGGCAAGCTTTATTGAGGCTT-3'
           ENVinF1 (2+)       5'-TGGAAGCATCCRGGAAGTCAGCCT-3'
           nefyn05 (2-)       5'-GTGTGTAGTTCTGCCAATCAGGGAA-3'

* IUB codes: A - Adenine; G - Guanine; C - Cytosine; T - Thymine; R - A or G; Y - C or T

# (1+) Sense primer of 1st PCR; (1-) Anti-sense primer of 1st PCR; (2+) Sense primer of 2nd PCR; (2-) Anti-sense primer of 2nd PCR

Table 2.3. Primers for sequencing of nearly full-length viral genome

Amplicon   Name          Sequence (5'->3')*
F1         F2NST         GCGGAGGCTAGAAGGAGAGAGATGG
           DD            GTATGGGCAAGCAGGGAGCTAGAA
           JL19          CTTCTATTACTTTTACCCATGC
           JL17          CATTCTGCAGCTTCCTCATTGAT
           HH            ATGAGGAAGCTGCAGAATGGG
           II            ATAATCCACCTATCCCAGTAGGAGAAAT
           POLCLO1-      GAGAGACAGGCTAATTTTTTAGGGAA
           SP2AS         GGTGGGGCTGTTGGCTCTG
           BJPOL1        ACAGGAGCAGATGATACAGTA
           SP3AS         CCTCCAATTCCCCCTATCATTTTTGG
           SP4AS         AGTATTGTATGGATTTTCAGGCCC
           AZT9          TGGATGTGGGTGATGCATA
           SP5S          GGATTAGATATCAGTACAATGTGC
           AZT5          TCAGATCCTACATACAAATCATCCATGTATTG
           AZT4          TATAGGCTGTACTGTCCATTT
           proRT         TTTCCCCACTAACTTCTGTATGTCATTGACA
F2         POLinF1       AGGACCTACRCCTGTCAACATAATTGG
           AZT3          CCAGGAATGGATGGACCAA
           SP4S          GGGCCTGAAAATCCATACAATACT
           SP4AS         AGTATTGTATGGATTTTCAGGCCC
           AZT9          TGGATGTGGGTGATGCATA
           POLC-         CTAGGTATGGTAAATGCAGTATA
           AZT10         CCTACATACAAATCATCCATGTATTG
           AZT6          CAATACATGGATGATTTGTATGTAGG
           POLP          GGATGGGATATGAACTCCATCC
           POLEE-        TGTATGTCATTGACAGTCCAGCTG
           DGPOLF7       GGAATATATTATGACCCATCAAAAGAC
           POLSEQ3       GATATGWCCACTGGTCTTGCCC
           DGPOL3R       GTATTGACAAACTCCCAGTCAGGAAT
           POLU          ACTTTCTATGTAGATGGGGCAGC
           POLI          GAGCAGTTAATAAAAAAGGAA
           POLI-         TTTGTGTGCTGGTACCCATGCCAG
           POLJ          GAAGCCATGCATGGACAAGTAGA
           POLT-         GCAGTCTACTTGTCCATGCATGGC
           POLK          ACGGTTAAGGCCGCCTGTTGGTGG
           SP1AS         GGATGAATACTGCCATTTGTACTGC
           POLSEQ2       CGGGTTTATTACAGGGACAGC
           DGPOL2R       CACTATTGTCTTGTATTACTAC
           ACC1          TTCAGAAGTATACATCCCACTAGG
           ACC2          AGGGTCTACTTGTGTGYTATAT
           VIFB          ATATAGCACACAAGTAGACCCT
           VIFC          GAYAAAGCCACCTTTGCCTAGTGTT
           ACC6          GCTTGTTCCATCTRTCYTCTGTYAG
           ACC5          TGAAACTTAYGGGGATACTTGG
           ACC4          CCAAGTATCCCCRTAAGTTTCA
           ED3           TTAGGCATCTCCTATGGCAGGAAGAAGCGG
           ACC8R         TCTCCGCTTCTTCCTGCCATAG
           VIF-VPUinR1   CTCTCATTGCCACTGTCTTCTGCTC
F3         ENVinF1       TGGAAGCATCCRGGAAGTCAGCCT
           ED3           TTAGGCATCTCCTATGGCAGGAAGAAGCGG
           GP1205-       AGAGCAGAAGACAGTGGCAATGA
           ES33          CATTGCCACTGTCTTCTGCTC
           Z1F           TGGGTCACAGTCTATTATGGGGTACCT
           JL99          TTTAGCATCTGATGCACAAAATAG
           ENVSEQ22      GTGTACCCACAGACCCCAGCCCACAAG
           ZFF           GGGATCAAAGCCTAAAGCCATGTGTAA
           793SEQ1       AACACCTCAGTCATTACACAGGCC
           AENVSEQ4      CAAGCTTGTGTAATGGCTGAGG
           E16           CCAATTCCCATACATTATTGTG
           TUE3          TCCTTCTGCTAGACTGCCATTTA
           E15           GTAGAAATTAATTGTACAAGACCC
           OFM54         TTTAATTGTGGAGGGGAATTTTTCT
           JL98          AGAAAAATTCCCCTCCACAATTAA
           E13           ACAAATTATAAACATGTGGCAGG
           JL102         GATGGGAGGGGCATACAT
           EDS8          CACTTCTCCAATTGTCCCTCA
           JL109         GTGAATTATATAAATATAAAGTAG
           TUG           GTCTGGTATAGTGCAACAGCA
           TUH           GCCCCAGACTGTGAGTTGCAACAGATG
           FM116         CAGAGATTTATTACTCCAACTA
           ZLF           GGGATAACATGACCTGGATGCAGTGGG
           JL104         GGAGGCTTGATAGGTTTAAGAATA
           ENVSEQ6       CCTGCCTAACTCTATTCAC
           JL106         TTCAGCTACCACCGCTTGAGAGACT
           E8            CTCTCTCTCCACCTTCTTCTTC
           NEF7          TAAGATGGGTGGCAAGTGGTCCAAAA
           JL71          TTTTGACCACTTGCCACCCAT
           NEF6          AGCAGCAGATGGGGTGGGAGCAG
           JL89          TCCAGTCCCCCCTTTTCTTTTAAAAA
           NEFYN05       GTGTGTAGTTCTGCCAATCAGGGAA

* IUB codes: A - Adenine; G - Guanine; C - Cytosine; T - Thymine; R - A or G; Y - C or T; W - A or T

PCRs for nearly full-length viral genome amplification

All PCRs for amplification of the nearly full-length viral genome used the Expand Long Template PCR System (Roche Applied Science, Mannheim, Germany). Each first-round 50 μL reaction contained 5 μL of cDNA as template, 5 μL of 10X Expand Long Template Buffer 1 with 17.5 mM MgCl2, 1.75 μL of 10 mM dNTP mix (Fermentas, Ontario, Canada), 2 μL of 10 mM sense primer, 2 μL of 10 mM anti-sense primer and 1 μL of Expand Long Template Enzyme mix. Second-round reactions were identical except that 2 μL of first-round PCR product was used as the template.

For both rounds of the F1 PCR, cycling began with an enzyme activation step at 94 °C for 2 minutes, followed by 10 cycles of 94 °C for 10 seconds, 60 °C for 30 seconds and 68 °C for 3 minutes, then 20 cycles of 94 °C for 10 seconds, 55 °C for 30 seconds and 68 °C for 3 minutes, and a final extension at 68 °C for 10 minutes. Products were then held at 4 °C. The F2 and F3 PCRs used the same conditions except that the 68 °C step in each cycle lasted 4 minutes instead of 3 minutes.

Post-PCR Analysis and Purification

Five microlitres of each PCR product were analyzed by electrophoresis on a 1.0% Tris-borate-EDTA agarose gel, with the 1 Kb Plus DNA Ladder (Invitrogen, CA, USA) as a size reference, to confirm the expected amplicon size. Products of the correct size were purified using the QIAquick PCR Purification Kit (Qiagen, Hilden, Germany), and their concentrations were estimated as described in Section 2.3.2.

Cycle Sequencing Reaction, Post-Sequencing Purification and DNA Sequencer Loading

Eighty sequencing reactions were performed to cover the amplified viral genome, using the BigDye Terminator v1.1 Cycle Sequencing Kit (Applied Biosystems, CA, USA). Cycle sequencing, post-sequencing purification and loading onto the DNA sequencer followed the procedures described in Section 2.3.2.

Sequencing Analysis

The electropherograms generated with the 80 sequencing primers for each sample were assembled and edited manually using the Staden Package, version 2003 beta1 (Staden et al., 2000). Because of the large amount of sequence data analyzed for each sample, the HXB2 sequence was included in the assembly as a reference to facilitate assembly and editing. The assembled and edited nucleotide sequence was exported in FASTA format for further analysis.

Phylogenetic Analyses

Comparison of two Reference Sequence Sets for Phylogenetic Analyses of HIV-1 genome

In this study, two reference sequence sets were compared for phylogenetic analysis of the HIV-1 genome: the reference set from the Los Alamos HIV Sequence Database (Los Alamos; http://www.hiv.lanl.gov), which is commonly used in HIV-1 genomic studies, and the newly introduced HIV-1 group M 2009 reference set in the NCBI Viral Genotyping Tool (NCBI; http://www.ncbi.nlm.nih.gov/projects/genotyping).

The sample sequences were aligned separately with each reference set using the EMBL-EBI CLUSTAL W Server (http://www.ebi.ac.uk/clustalw), and the alignments were edited manually in BioEdit version 7 (www.mbio.ncsu.edu/BioEdit/bioedit.html). Regions of the reference sequences extending beyond the sample sequences at either end were trimmed, and gapped positions, which were uninformative for the analyses, were removed. Phylogenetic trees were then constructed from the two alignments by the neighbor-joining (NJ) method in PAUP* version 4.0b10 (Hasegawa et al., 1985), with 1,000 bootstrap replicates. The NJ tree diagrams were edited as described in Section 2.2.2.3.

For each sample analyzed at the full-genome level, the NJ trees from the two alignments were compared to determine which reference set gave higher resolution for HIV-1 genotyping from genomic sequence. In addition, any sample that could not be clustered with known subtype or CRF reference sequences, or that formed an outgroup, in both NJ trees was defined as an HIV-1 URF.

Epidemiological Analysis of HIV-1 URFs

For all samples analyzed phylogenetically using the nearly full-length viral genome, epidemiological data were retrieved from the Integrated Treatment Centre, Department of Health, Hong Kong. These data included gender, age, ethnicity, route of transmission, date of first positive HIV serology, plasma sampling date, viral load and CD4 cell count. By combining these data with the results of the phylogenetic analyses, the possibility of HIV-1 URF transmission in Hong Kong was investigated.

Genomic Characterization of HIV-1 URFs

For samples identified as HIV-1 URFs in Section 2.4.4, bootscanning analyses were performed using SimPlot (Salminen et al., 1995) to identify recombination breakpoints and determine the molecular structure of the recombinants. The analyses used the alignment from the phylogenetic analyses in Section 2.4.4, to which phylogenetic methods from the PHYLIP package were applied over a 200 bp window moved along the alignment in 50 bp increments. Bootstrap values for the sample sequence were plotted at the midpoint of each window, and recombination breakpoints were determined directly from the plot of bootstrap values against the different reference sequences.
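SimPlot performs the per-window tree building itself; the sketch below only shows how 200 bp windows with 50 bp steps are laid out and how a breakpoint might be read from the resulting bootstrap values, assuming those per-window values are already available. The 70 cutoff for a "resolved" window, the midpoint-between-windows rule and all support values in the example are hypothetical choices for illustration.

```python
from typing import Dict, List, Tuple

def windows(aln_len: int, size: int = 200, step: int = 50) -> List[Tuple[int, int, int]]:
    """(start, end, midpoint) of each full window; 1-based inclusive coordinates."""
    out = []
    start = 1
    while start + size - 1 <= aln_len:
        end = start + size - 1
        out.append((start, end, (start + end) // 2))
        start += step
    return out

def call_breakpoints(mids: List[int], support: List[Dict[str, float]],
                     cutoff: float = 70.0) -> List[Tuple[int, str, str]]:
    """Approximate breakpoints where the best-supported reference changes.

    Windows whose best bootstrap value is below `cutoff` are skipped as unresolved;
    a breakpoint is placed halfway between the flanking resolved window midpoints.
    """
    calls = []
    prev = None  # (midpoint, reference) of the last resolved window
    for mid, sup in zip(mids, support):
        ref, value = max(sup.items(), key=lambda kv: kv[1])
        if value < cutoff:
            continue
        if prev is not None and ref != prev[1]:
            calls.append(((prev[0] + mid) // 2, prev[1], ref))
        prev = (mid, ref)
    return calls

wins = windows(500)
mids = [m for _, _, m in wins]      # [100, 150, 200, 250, 300, 350, 400]
support = ([{"CRF01_AE": 95.0, "B": 5.0}] * 3
           + [{"CRF01_AE": 50.0, "B": 50.0}]
           + [{"CRF01_AE": 4.0, "B": 92.0}] * 3)
print(call_breakpoints(mids, support))   # [(250, 'CRF01_AE', 'B')]
```

Because windows overlap by 150 bp, a breakpoint estimated this way is only resolved to within roughly one step size.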