PROBLEM SOLVING ASSIGNMENT 2023
Student Name:
Student Number:
Due Date: 25th August 2023, 11:59 pm
Note: Late submissions will receive a penalty as per the assessment policy.
Course weighting: 50%
Mark = /100
Your report must be your own work. The use of artificial intelligence and paraphrasing generators is not authorised in this course. All sources of information must be acknowledged using the APA7 referencing system. Reports will be submitted via TURNITIN.
QUESTION 1 [ 8 marks total] /8
You have sent three cloned fragments for Sanger sequencing. You receive the readout and chromatogram for your unknown fragment as shown below.
Explain the quality of each Sanger sequencing chromatogram. /6
Which of these sequences will you use for downstream analysis? /2
Sequence 1:
Sequence 2:
Sequence 3:
QUESTION 2 [ 6 marks total] /6
You receive a plant sample for diagnostic purposes. You isolate the pathogen from the symptomatic plant tissue, perform DNA extraction and PCR, and then send the sample for Sanger sequencing. You receive the following DNA sequence.
> Sequence of unknown isolate
CCAGATCGTATGATGGCTACCTTCTCCGTCGTTCCATCGCCAAAGGTTTCCGATACCGTTGTCGAGCCAT
ATAACGCAACTCTCTCTGTTCATCAATTGGTTGAAAACTCTGACGAGACCTTCTGTATCGATAACGAGGC
TCTTTACGATATTTGCATGAGAACCTTGAAGCTCAGCAACCCATCTTACGGAGATCTTAACCACTTGGTT
TCCGCCGTCATGTCCGGTGTCACCACCTGTCTTCGTTTCCCTGGTCAACTTAACTCAGATCTCCGAAAGT
TGGCTGTTAACATGGTTCCATTCCCCCGTCTCCATTTCTTCATGGTTGGATTTGCTCCTTTGACCAGTCG
TGGCGCACACTCTTTCCGTGCTGTCACTGTTCCAGAGTTGACTCAACAAATGTACGACCCTAAGAACATG
ATGGCCGCTTCCGATTTCCGTAACGGTCGTTACTTGACCTGCTCTGCCATTTTGTAAGTCTGCCCTATAA
TGAATCTGCCAAAATTTTGTAGATACTAACTTTATATAAGCCGTGGTAAGGTTTCCATGAAGGAGGTTGA
GGACCAAATGCGCAACGTCCAAAACAAGAACTCATCCTACTTCGTTGAGTGGATCCCCAACAACGTCCAA
ACCGCCCTTTGCTCCATTCCTCCCCGTGGTCTCAAGATGTCCTCCACCTTCGTTGGTAACTCGACATCCA
TCCAAGAACTTTTCAAGCGTGTCGGTGATCAATTCACTGCTATGT
Which gene have you sequenced? /2
What species does the sequenced gene belong to? /2
Based on your search, are you confident that the sequence is unique to the pathogen? Why? /2
QUESTION 3 [ 5 marks total] /5
You are a molecular biologist studying a particular eukaryotic genomic DNA sequence shown below:
5' CGCATGTATTCGCACGTGTTATCACATATAAACGTTGATAGTGGTCTTCGTCGCCGAGGGGAAGACTACCATATTACTTTTTAATTG 3'
You are told that this DNA sequence is known to be associated with the synthesis of a relatively short polypeptide and that it is the coding strand (which you confirm by locating a TATA box upstream of the initiation site). There are no introns in this gene.
Underline the putative TATA box in the sequence above. /1
Highlight the most likely initiation codon in yellow. /1
Bold the most likely stop codon. /1
Determine the amino acid composition of the peptide produced and write this below using the single-letter amino acid codes (see Appendix). /2
QUESTION 4 [ 4 marks total] /4
Here is the nucleotide sequence of the synthetic construct clone IMAGE:7601942.
>Image: 7601942
GTACAAAAAAGTTGGCACCATGGCTCTTTGGATGCAGTGTCTGCCCCTGGTACTTGTGCTCCTTTTCTCT
ACACCCAACACCGAAGCTCTAGCTAACCAACACCTGTGTGGGTCTCACCTGGTAGAAGCCCTGTATCTAG
TATGTGGGGATCGAGGCTTCTTCTACTACCCCAAGATCAAACGGGACATCGAACAAGCAATGGTCAATGG
ACCCCAGGACAACGAGTTGGATGGAATGCAGCTCCAGCCTCAGGAGTACCAGAAAATGAAGAGGGGAATT
GTGGAGCAATGCTGCCACAGCACATGTTCTCTCTTCCAGCTGGAGAGCTACTGCAACTTGCCAACTTTCT
TGTAC
What is the function of the protein encoded by this sequence? /1
Summarise how Humulin was produced. Refer to this article (https://www.ncbi.nlm.nih.gov/pmc/articles/PMC8152450/). /3
QUESTION 5 [ 8 marks total] /8
You have just sequenced a short segment of DNA. You wish to analyse this DNA sequence to determine whether it could encode a protein.
5' TCAATGTAACGCGCTACCCGGAGCTCTGGGCCCAAATTTCATCCACT 3'
Find the longest open reading frame (ORF). Remember, there are six possible reading frames. /2
Label which strand of the DNA will be the sense strand and which will be the antisense strand when this DNA is transcribed. /2
Transcribe this ORF into mRNA, indicating the 5' and 3' ends. /2
Translate this mRNA into amino acids (See Appendix). /2
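As a rough illustration of the six-frame search, transcription, and translation steps in this question, the following is a minimal sketch only. It assumes an ORF is defined as an ATG followed in frame by the first stop codon, uses the standard genetic code, and runs on a hypothetical toy sequence rather than the assignment sequence. All function names are illustrative.

```python
# Standard genetic code, laid out in the conventional TCAG order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def longest_orf(seq):
    """Scan all six frames; return (strand, frame, orf_dna) for the longest
    ATG...stop ORF. Assumption: an ORF must start with ATG and end at a stop."""
    seq = seq.upper()
    best = ("", 0, "")
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i
                elif start is not None and codon in STOP_CODONS:
                    orf = s[start:i + 3]
                    if len(orf) > len(best[2]):
                        best = (strand, frame + 1, orf)
                    start = None
    return best


def transcribe(coding_dna):
    """mRNA has the same sequence as the coding strand, with U in place of T."""
    return coding_dna.replace("T", "U")


def translate(coding_dna):
    """Translate a coding-strand ORF codon by codon ('*' marks a stop)."""
    return "".join(
        CODON_TABLE[coding_dna[i:i + 3]] for i in range(0, len(coding_dna) - 2, 3)
    )


toy = "CCATGGCTTGACC"  # hypothetical toy sequence, not the assignment sequence
strand, frame, orf = longest_orf(toy)
print(strand, frame, orf, transcribe(orf), translate(orf))
```

A "-" strand result means the ORF reads 5' to 3' on the reverse complement of the sequence as written.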
QUESTION 6 [ 4 marks total] /4
Use ClustalW to perform a multiple alignment of these four GenBank sequences: MG642291.1, MG642294.1, MG642295.1 and MG642293.1. /1
What species and gene do these sequences represent? /1
Identify the single nucleotide polymorphisms among the four strains. Report the positions and the nucleotide changes observed. /2
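Once sequences have been aligned, polymorphic positions can be listed by comparing the alignment column by column. The sketch below is one possible way to do that, assuming the aligned sequences have been copied out of the alignment tool as equal-length strings with "-" for gaps. The strain names and sequences are hypothetical toy data, and the reported positions are alignment columns, not GenBank coordinates.

```python
def find_snps(aligned):
    """aligned: dict mapping a sequence name to its aligned sequence.
    Returns a list of (1-based alignment column, {name: base}) for every
    column that contains more than one distinct base and no gap."""
    names = list(aligned)
    lengths = {len(seq) for seq in aligned.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    snps = []
    for pos in range(lengths.pop()):
        column = {name: aligned[name][pos].upper() for name in names}
        bases = set(column.values())
        if "-" not in bases and len(bases) > 1:
            snps.append((pos + 1, column))
    return snps


# Hypothetical toy alignment, not the assignment sequences.
toy_alignment = {
    "strain_1": "ATGCCGTA",
    "strain_2": "ATGCTGTA",
    "strain_3": "ATGCCGTA",
    "strain_4": "ATACCGTA",
}

for column, bases in find_snps(toy_alignment):
    print(column, bases)
```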
QUESTION 7 [ 6 marks total] /6
In GenBank, locate the following two sequences: KR711648.1 and JQ929758.1. Enter these sequences into the BLAST two-sequence alignment boxes. Answer the following questions.
What are these sequences and to what species do they belong? /2
What is the % identity between the two sequences? /1
What is the query coverage for the alignment? /1
How many gaps had to be created to best align the two sequences (in the region of overlap)? /1
Explain the E-value for this comparison. /1
QUESTION 8 [ 5 marks total] /5
Perform a BLAST search with the following amino acid sequence:
>CAA47375
MWSWKCLLFWAVLVTATLCTARPSPTLPEQAQPWGAPVEVESFLVHPGDLLQLRCRLRDDVQSINWLRDG
VQLAESNRTRITGEEVEVQDSVPADSGLYACVTSSPSGSDTTYFSVNVSDALPSSEDDDDDDDSSSEEKE
TDNTKPNRMPVAPYWTSPEKMEKKLHAVPAAKTVKFKCPSSGTPNPTLRWLKNGKEFKPDHRIGGYKVRY
ATWSIIMDSVVPSDKGNYTCIVENEYGSINHTYQLDVVERSPHRPILQAGLPANKTVALGSNVEFMCKVY
SDPQPHIQWLKHIEVNGSKIGPDNLPYVQILKTAGVNTTDKEMEVLHLRNVSFEDAGEYTCLAGNSIGLS
HHSAWLTVLEALEERPAVMTSPLYLEIIIYCTGAFLISCMVGSVIVYKMKSGTKKSDFHSQMAVHKLAKS
IPLRRQVTVSADSSASMNSGVLLVRPSRLSSSGTPMLAGVSEYELPEDPRWELPRDRLVLGKPLGEGCFG
QVVLAEAIGLDKDKPNRVTKVAVKMLKSDATEKDLSDLISEMEMMKMIGKHKNIINLLGACTQDGPLYVI
VEYASKGNLREYLQARRPPGLEYCYNPSHNPEEQLSSKDLVSCAYQVARGMEYLASKKCIHRDLAARNVL
VTEDNVMKIADFGLARDIHHIDYYKKTTNGRLPVKWMAPEALFDRIYTHQSDVWSFGVLLWEIFTLGGSP
YPGVPVEELFKLLKEGHRMDKPSNCTNELYMMMRDCWHAVPSQRPTFKQLVEDLDRIVALTSNQEYLDLS
MPLDQYSPSFPDTRSSTCSSGEDSVFSHEPLPEEPCLPRHPAQLANGGLKRR
What protein has this amino acid sequence? /1
What is the function of this protein in humans? /1
Are the homologous sequences identified mostly paralogs or orthologs? /1
Define the two terms paralogs and orthologs. /2
QUESTION 9 [ 4 marks total] /4
You submit your sample for sequencing and receive your file in the format below.
Which file format is this? /1
What do the four lines in the file indicate? /2
Explain the quality of the sequenced sample. /1
QUESTION 10 [ 5 marks total] /5
The quality of your Illumina read was assessed using FastQC and you received the following results.
What do the x-axis and y-axis of the plot refer to? /1
What does the colour code in the background refer to? /1
Explain the quality of the read at the two ends of the plot. /2
What step would you take to improve the quality of the read? /1
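Per-base quality plots are built from Phred quality scores, where Q = -10 log10(P) and P is the estimated probability that a base call is wrong. The sketch below decodes a quality string into scores, assuming the Phred+33 ASCII encoding used by current Illumina output. The quality string is a hypothetical example, not taken from the assignment data.

```python
def phred_scores(quality_line, offset=33):
    """Convert a quality string to a list of Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality_line]


def error_probability(q):
    """Probability that a base call with Phred score q is incorrect."""
    return 10 ** (-q / 10)


def mean_quality(quality_line, offset=33):
    """Average Phred score across a read."""
    scores = phred_scores(quality_line, offset)
    return sum(scores) / len(scores)


toy_quality = "IIII5555##"  # hypothetical quality string
print(phred_scores(toy_quality))
print(mean_quality(toy_quality))
print(error_probability(20))
```

With this encoding, "I" corresponds to Q40, "5" to Q20 and "#" to Q2, so the toy read shows quality dropping towards its 3' end.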
QUESTION 11 [ 11 marks total] /11
A cost-effective qPCR-based technique was developed to detect SARS-CoV-2.
Detection of PCR products in real time can be accomplished by using fluorescent dyes or probes. What is the difference between the two? /2
How would you design standard curves for qPCR? /2
Explain the two qPCR plots shown above for the RdRp gene used for SARS-CoV-2 detection. /4
What is the Cq value? /1
What are the differences between PCR and qPCR techniques? /2
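A qPCR standard curve is usually a straight-line fit of Cq against log10 of the starting template quantity for a dilution series, and the slope gives the amplification efficiency as E = 10^(-1/slope) - 1. The sketch below shows that calculation with an ordinary least-squares fit. The dilution series and Cq values are hypothetical numbers chosen for illustration, not data from the assignment.

```python
import math


def fit_standard_curve(quantities, cq_values):
    """Least-squares fit of Cq = slope * log10(quantity) + intercept.
    Returns (slope, intercept, efficiency), with efficiency as a fraction."""
    xs = [math.log10(q) for q in quantities]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cq_values) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cq_values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    efficiency = 10 ** (-1 / slope) - 1
    return slope, intercept, efficiency


def quantity_from_cq(cq, slope, intercept):
    """Estimate the starting quantity of an unknown sample from its Cq."""
    return 10 ** ((cq - intercept) / slope)


# Hypothetical ten-fold dilution series (copies per reaction) and Cq values.
quantities = [1e5, 1e4, 1e3, 1e2]
cq_values = [15.0, 18.32, 21.64, 24.96]

slope, intercept, efficiency = fit_standard_curve(quantities, cq_values)
print(slope, intercept, efficiency)
print(quantity_from_cq(21.64, slope, intercept))
```

A slope close to -3.32 corresponds to an efficiency close to 100%, meaning the product roughly doubles each cycle.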
QUESTION 12 [ 4 marks total] /4
You are working in a pathology lab and carry out sequencing of microbial DNA from a patient's faecal sample. Based on the patient's symptoms, you recommended three different diagnostic tests and received the sequences below.
>Sequence A
CAAGCCTGATGCAGCCATCCCGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGG
AAGGTGTTGTGGTTAATAACCGCAGCAATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCA
GCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTC
TGTCAAGTCGGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATTCGAAACTGGCAGGCTTGAGTCTT
GTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAG
GCGGCCCCCTGGACAAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGG
TAGTCCACGCCGTAAACGATGTCTACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGT
TAAGTAGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGC
GGTGGAGCATGTGGTTTAATTCGATGCAACGCGAAGAACCTTACCTGGTCTTGACATCCACAGAAGAATC
CAGAGATGGATTTGTGCCTTCGGGAACTG
>Sequence B
TTATTGGAGAGTTTGATCCTGGCTCAGGATGAACGCTGGCGGCGTGCCTAATACATGCAAGTCGAGCGAA
TGGATTAAGAGCTTGCTCTTATGAAGTTAGCGGCGGACGGGTGAGTAACACGTGGGTAACCTGCCCATAA
GACTGGGATAACTCCGGGAAACCGGGGCTAATACCGGATAACATTTTGAACCGCATGGTTCGAAATTGAA
AGGCGGCTTCGGCTGTCACTTATGGATGGACCCGCGTCGCATTAGCTAGTTGGTGAGGTAACGGCTCACC
AAGGCAACGATGCGTAGCCGACCTGAGAGGGTGATCGGCCACACTGGGACTGAGACACGGCCCAGACTCC
TACGGGAGGCAGCAGTAGGGAATCTTCCGCAATGGACGAAAGTCTGACGGAGCAACGCCGCGTGAGTGAT
GAAGGCTTTCGGGTCGTAAAACTCTGTTGTTAGGGAAGAACAAGTGCTAGTTGAATAAGCTGGCACCTTG
ACGGTACCTAACCAGAAAGCCACGGCTAACTACGTGCCAGCAGCCGCGGTAATACGTAGGTGGCAAGCGT
TATCCGGAATTATTGGGCGTAAAGCGCGCGCAGGTGGTTTCTTAAGTCTGATGTGAAAGCCCACGGCTCA
ACCGTGGAGGGTCATTGGAAACTGGGAGACTTGAGTGCAGAAGAGGAAAGTGGAATTCCATGTGTAGCGG
TGAAATGCGTAGAGATATGGAGGAACACCAGTGGCGAAGGCGACTTTCTGGTCTGTAACTGACACTGAGG
CGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGAGTGCTAAGT
GTTAGAGGGTTTCCGCCCTTTAGTGCTGAAGTTAACGCATTAAGCACTCCGCCTGGGGAGTACGGCCGCA
AGGCTGAAACTCAAAGGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATTCGAAGCAAC
GCGAAGAACCTTACCAGGTCTTGACATCCTCTGAAAACCCTAGAGATAGGGCTTCTCCTTCGGGAGCAGA
GTGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCGTGAGATGTTGGGTTAAGTCCCGCAACGAGCGCA
ACCCTTGATCTTAGTTGCCATCATTAAGTTGGGCACTCTAAGGTGACTGCCGGTGACAAACCGGAGGAAG
GTGGGGATGACGTCAAATCATCATGCCCCTTATGACCTGGGCTACACACGTGCTACAATGGACGGTACAA
AGAGCTGCAAGACCGCGAGGTGGAGCTAATCTCATAAAACCGTTCTCAGTTCGGATTGTAGGCTGCAACT
CGCCTACATGAAGCTGGAATCGCTAGTAATCGCGGATCAGCATGCCGCGGTGAATACGTTCCCGGGCCTT
GTACACACCGCCCGTCACACCACGAGAGTTTGTAACACCCGAAGTCGGTGGGGTAACCTTTTTGGAGCCA
GCCGCCTAAGGTGGGACAGATGATTGGGGTGAAGTCGTAACA
>Sequence C
TAAATTGAGAGTTTGATCCTGGCTCAGGATGAACGCTGGCGGCGTGCTTAACACATGCAAGTCGAGCGAT
GAAGTTTCCTTCGGGAAACGGATTAGCGGCGGACGGGTGAGTAACACGTGGGTAACCTGCCTCATAGAGT
GGAATAGCCTTCCGAAAGGAAGATTAATACCGCATAACGTTGAAAGATGGCATCATCATTCAACCAAAGG
AGCAATCCGCTATGAGATGGACCCGCGGCGCATTAGCTAGTTGGTGGGGTAACGGCCTACCAAGGCGACG
ATGCGTAGCCGACCTGAGAGGGTGATCGGCCACATTGGGACTGAGACACGGCCCAGACTCCTACGGGAGG
CAGCAGTGGGGAATATTGCACAATGGGGGAAACCCTGATGCAGCAACGCCGCGTGAGTGATGAAGGTTTT
CGGATCGTAAAGCTCTGTCTTTGGGGAAGATAATGACGGTACCCAAGGAGGAAGCCACGGCTAACTACGT
GCCAGCAGCCGCGGTAATACGTAGGTGGCGAGCGTTATCCGGATTTACTGGGCGTAAAGGGAGCGTAGGC
GGATGATTAAGTGGGATGTGAAATACCCGGGCTCAACTTGGGTGCTGCATTCCAAACTGGTTATCTAGAG
TGCAGGAGAGGAGAGTGGAATTCCTAGTGTAGCGGTGAAATGCGTAGAGATTAGGAAGAACACCAGTGGC
GAAGGCGACTCTCTGGACTGTAACTGACGCTGAGGCTCGAAAGCGTGGGGAGCAAACAGGATTAGATACC
CTGGTAGTCCACGCCGTAAACGATGAATACTAGGTGTGGGGGTTTCAACACCTCCGTGCCGCCGCTAACG
CATTAAGTATTCCGCCTGGGGAGTACGGTCGCAAGATTAAAACTCAAAGGAATTGACGGGGACCCGCACA
AGTAGCGGAGCATGTGGTTTAATTCGAAGCAACGCGAAGAACCTTACCTACACTTGACATCCCTTGCATT
ACTCTTAATCGAGGAAATCCCTTCGGGGACAAGGTGACAGGTGGTGCATGGTTGTCGTCAGCTCGTGTCG
TGAGATGTTGGGTTAAGTCCCGCAACGAGCGCAACCCTTGTCGTTAGTTACTACCATTAAGTTGAGGACT
CTAGCGAGACTGCCTGGGTTAACCAGGAGGAAGGTGGGGATGACGTCAAATCATCATGCCCCTTATGTGT
AGGGCTACACACGTGCTACAATGGCTGGTACAGAGAGATGCAATACCGCGAGGTGGAGCCAAACTTAAAA
ACCAGTCTCAGTTCGGATTGTAGGCTGAAACTCGCCTACATGAAGCTGGAGTTACTAGTAATCGCGAATC
AGAATGTCGCGGTGAATACGTTCCCGGGTCTTGTACACACCGCCCGTCACACCATGAGAGTTGGCAATAC
CCGAAGTCCGTGAGCTAACCGCAAGGAGGCAGCGGCCGAAGGTAGGGTCAGCGATTGGGGTGAAGTCGTA
ACAAGGTAGCCGTAGGAGAACCTGCGGCTGGATCACCTCCTTT
Identify the species that each of the DNA sequences (A-C) is from. /3
Sequence A =
Sequence B =
Sequence C =
What do you recommend to the patient based on these results? /1
QUESTION 13 [ 11 marks total] /11
Here is a FASTA file of a fungal mating-type gene.
> mating-type gene MAT-1
ATGTCGGCCGAGGGTCTTAATGCGACCATGATACGAGACCCTACCAACGCAGAGATCGCCGAGTTCCTTG
CCAATCGCAGCGGTGCTCAAATGCTGCAGCTGATGCGCTACTACATCGACATTCCCGCCTTCAAGTCATG
GCCTATGAAGAAGCTATCCAATTTGCTTGGAATCTTGTGGGAGTCAGACCCAAACAAGCCACTCTGGTCG
TTGCTGGCAAAGGCTTGGTCCATGCTCCGTGATCAGCTCACCAAAGAGGTGGCTCCGCTGGATGAATTCT
TCGGTATCGTGTGCACTGAGCTCGGCGTGCCCCATCCCGACTGGTACCTCGAGGCCAATGGATGGATACT
GTCTCATGACCAAGATGGTAACCCAACGCTCTCGCGCGAGGCTGGTATCGAACCAACGGTTGCCACTGCT
GCAGACAAGGCGCTCTCGGTTGAAGACATCATCCGCGTTGCTCAAGGCAAACTATTCGCCATGACCTACA
AGCACTGCTTAAACGGTTCATCGACCTTCCTTGCTGAGCGCATGGCTCCACACAACACATCTCAGGATGA
CAGAACTCCTGCTGAGCTTCTTGCTTTCGAGGCAGCGCTTCAGGCTGAGATTCACAACATTCACGAGTAC
ATGATCACACATCAGAATGGGCCTGTTCATGAGACGTCTAATCACTCTGCAACTCTTCCGAATGGCGAGC
ACAATCCGCTTTACAATCAAATCATGGTTTCCCTCGCTGAAGCAGCTTCCGATGCCGATGCCGATGCCGG
TGCTCCAGCTGCCAATGTCGGTGCTCCCGCCGCTGCAGTGATTCCTGCTTTCAATGCCGGTGCTCCCGTT
GCGATCGCTGCAGGAATACCTGATTTTGATAACAGTGTTCCTGCTGCTGTTGCTGAAGGAATGTCCGATT
TCAATGACGGTGCTCCCGCGTTTGGTATGGAAACTGGATTCACCGCTCCCTCTACCCCCTTCTCCATGGG
CGAGGCCTTCCACCAGTCCACTTTTGGCAATGCCGTCCCTTATGAAAACGACGCTTTCCGTGTTGGTGCC
GACGAGGATGCCACTTTGCCTACTTTCGATGGCGCTAACAACGCCTAA
Using NCBI Primer-BLAST, design a primer pair that will amplify the region of DNA shown above. /2
How will you label this pair of primers for ordering? Write below. /1
PCR machines cycle between three temperatures. What is the purpose of each stage in the PCR cycle? /3
Design a basic PCR thermocycle protocol suitable for use with this pair of primers, indicating the temperature at each of the three PCR steps. /3
STEP 1. _____
STEP 2. _____
STEP 3. _____
After 10 rounds of amplification, approximately how many molecules of the amplified region should you theoretically have? /2
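Two quick calculations often accompany primer design and cycling. The sketch below uses the Wallace rule, Tm = 2(A+T) + 4(G+C), which is only a rough estimate for short oligonucleotides, and the idealised assumption that the target doubles every cycle. The primer sequence is hypothetical, and Primer-BLAST reports its own Tm values, which should take precedence.

```python
def wallace_tm(primer):
    """Rough melting temperature (degrees C) by the Wallace rule.
    Assumption: a short primer of roughly 14-20 nt containing only A/C/G/T."""
    p = primer.upper()
    at = p.count("A") + p.count("T")
    gc = p.count("G") + p.count("C")
    return 2 * at + 4 * gc


def theoretical_copies(templates, cycles):
    """Idealised yield: every template is doubled in every cycle."""
    return templates * 2 ** cycles


toy_primer = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer, not a designed primer
print(wallace_tm(toy_primer))
print(theoretical_copies(1, 3))
```

A common rule of thumb is to start with an annealing temperature a few degrees below the lower of the two primer Tm values and optimise from there.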
QUESTION 14 [ 10 marks total] /10
You sent the bacterial DNA for whole genome sequencing using Illumina sequencing. After adaptor removal and filtering, you assembled the genome. These are the statistics of your genome assembly from BUSCO and QUAST.
What are single-end and paired-end reads? /2
What are contigs and scaffolds? /2
How many contigs do you have in this sequence? /1
What is the approximate size of your genome? /1
What does N50 stand for? /1
What is BUSCO used for? /1
Is this a good assembly? Why? /2
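For reference, N50 is commonly computed by sorting contigs from longest to shortest and finding the contig length at which the running total first reaches half of the total assembly length. The sketch below follows that definition. The contig lengths are hypothetical and are not taken from the QUAST report.

```python
def n50(lengths):
    """Return the N50 of a list of contig lengths."""
    if not lengths:
        raise ValueError("at least one contig length is required")
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        # Stop once the longest contigs cover at least half of the assembly.
        if running * 2 >= total:
            return length


toy_contigs = [500, 300, 200, 100, 50]  # hypothetical contig lengths in bp
print(sum(toy_contigs), n50(toy_contigs))
```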
QUESTION 15 [ 9 marks total] /9
OrthoVenn is a web platform for comparison and annotation of orthologous gene clusters among multiple species. Analyse the protein sequences of three Acetobacter bacterial species (A. cerevisiae, A. malorum, A. pasteurianus).
What is the difference between core and accessory proteins in pan-genomes? /1
How many core proteins are shared among the three species? /1
Which two species share the greatest number of proteins? /2
How many proteins are unique to A. cerevisiae? /1
What does GO annotation mean? /1
How many GO-annotated biological processes and molecular functions were observed in A. pasteurianus? /2
Pick a GO annotation number of interest to you and list three genes associated with that process. /1
APPENDIX
****************END OF ASSIGNMENT*************************