MCDB231L RNAseq data analysis Assignment

Subject Code: MCDB231L

To integrate all the data analysis steps you have learned, your final assignment is to analyze actual RNAseq data, all the way from the raw reads to differential gene expression, and to write a brief report on your findings. You should have most of the scripts from the assignments we did over the course of the semester. The final analysis and the report are individual work.

The data files you will get for this are:

control read files
experimental read files (tissue dissections)
some information on the intended dissection
a planarian transcriptome
a transcriptome annotation table

If you are taking the course for graduate credit please make sure that you have contacted me about the datasets you are planning to use instead. In this case you will also have to find your own transcriptome (or in a few cases genome) to map to.

As a reminder, the steps to take for the analysis are the following:

quality control to identify any potential issues (fastQC)
trimming and filtering to remove poor quality data (cutadapt)
generate a transcriptome index (bowtie2)
mapping the reads to the transcriptome (bowtie2)
summarize the mapping to obtain a counts table (samtools)
calling differential gene expression (DESeq2)
merging in the annotations (R)
making some plots or other summaries of the data (R)
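As a rough illustration of how the command-line steps above chain together for one single-end sample, here is a minimal Python sketch that only builds the command lines and does not run them. The file names (`ctl_rep1.fastq.gz`, `transcriptome.fasta`), the `results` output folder, the quality and length cutoffs, and the adapter sequence are all assumptions for the example; use the values from your own scripts and check them against your fastQC report.

```python
import shlex
from pathlib import Path


def build_commands(reads, transcriptome, adapter, outdir="results", threads=4):
    """Return argument lists for QC, trimming, indexing, mapping and counting.

    Single-end reads only. All paths and cutoffs are illustrative assumptions.
    """
    out = Path(outdir)
    name = Path(reads).name
    # Strip the double extension to get a sample label.
    sample = name[: -len(".fastq.gz")] if name.endswith(".fastq.gz") else Path(name).stem
    trimmed = (out / f"{sample}.trimmed.fastq.gz").as_posix()
    index = (out / "transcriptome_index").as_posix()
    sam = (out / f"{sample}.sam").as_posix()
    bam = (out / f"{sample}.sorted.bam").as_posix()
    return [
        # Quality control (the output folder must already exist).
        ["fastqc", "-o", out.as_posix(), reads],
        # Trim a 3' adapter, trim low-quality ends, drop very short reads.
        ["cutadapt", "-a", adapter, "-q", "20", "-m", "20", "-o", trimmed, reads],
        # Build the index once per transcriptome, not once per sample.
        ["bowtie2-build", transcriptome, index],
        # Map single-end reads (-U) to the indexed transcriptome.
        ["bowtie2", "-p", str(threads), "-x", index, "-U", trimmed, "-S", sam],
        # Sort and index the alignments, then report reads per transcript.
        ["samtools", "sort", "-o", bam, sam],
        ["samtools", "index", bam],
        # idxstats writes to standard output; redirect it to a file when running.
        ["samtools", "idxstats", bam],
    ]


# Hypothetical inputs; the adapter shown is an assumption, not taken from the assignment.
commands = build_commands("ctl_rep1.fastq.gz", "transcriptome.fasta", "AGATCGGAAGAGC")
for cmd in commands:
    print(shlex.join(cmd))
```

Printing the commands first makes it easy to check them by eye before pasting them into a job script. For paired-end samples, bowtie2 takes `-1` and `-2` instead of `-U`, and cutadapt needs a second input and output file.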

The most typical analysis will be the comparison between the gene expression in one experimental condition and the control, but if you wish you can include multiple experimental conditions, or include other aspects.
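A comparison like this starts from a counts table with one row per transcript and one column per replicate. The sketch below shows one possible way to assemble such a table from `samtools idxstats` output, which is tab-separated with the columns reference name, sequence length, mapped count, and unmapped count. The transcript IDs (`tx1`, `tx2`), the sample labels, and the numbers are made up for the example.

```python
def parse_idxstats(text):
    """Return {transcript: mapped read count} from samtools idxstats output."""
    counts = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _length, mapped, _unmapped = line.split("\t")
        if name == "*":
            # The final "*" line holds reads that did not map anywhere.
            continue
        counts[name] = int(mapped)
    return counts


def build_counts_table(per_sample):
    """Combine per-sample counts into rows: a header, then one row per transcript."""
    samples = list(per_sample)
    transcripts = sorted({t for counts in per_sample.values() for t in counts})
    rows = [["transcript"] + samples]
    for t in transcripts:
        # A transcript missing from a sample is recorded as zero reads.
        rows.append([t] + [per_sample[s].get(t, 0) for s in samples])
    return rows


# Made-up idxstats text standing in for one control and one experimental replicate.
ctl_text = "tx1\t1000\t10\t0\ntx2\t500\t0\t0\n*\t0\t0\t7\n"
exp_text = "tx1\t1000\t3\t0\ntx2\t500\t25\t0\n*\t0\t0\t2\n"

table = build_counts_table({
    "ctl_rep1": parse_idxstats(ctl_text),
    "sample1_rep1": parse_idxstats(exp_text),
})
for row in table:
    print("\t".join(str(value) for value in row))
```

Written out as a tab-separated file, a table of this shape can be read into R as the raw integer count matrix that DESeq2 expects. Note that for paired-end mappings idxstats counts read segments, so both mates of a pair contribute.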

In the report, make sure to include the following sections:

Abstract

Brief summary of what you did and what you found. Typically no more than 300 words.

Introduction and Aim

Briefly describe the background to the project, what type of analysis you are planning to do, and what you hope to learn from this analysis; ~1.5 page.

Sample description

What samples you have used for your analysis (what organism, what treatments, what might you expect); ~0.5 page.

Data processing

What steps you have taken in the cleanup and processing (and include plots or statistics where informative); ~0.5 page.

Differential expression analysis

What comparison are you making for the differential expression? What stands out? Show informative plots or summaries of the resulting data; typically ~4 pages.

Conclusions

Your interpretation of the outcome, ideally tying it back to the introduction and aim; ~1.5 page.

(Page indications are for single-spaced 12pt font, 0.75" margins.)

Please also submit your annotated R code for the analysis. This can include more graphing and analysis than what made it into your report.

The report and script are due at the end of finals week, on Dec 21 at 5pm, by email to josien.van.wolfswinkel@yale.edu and yan.cheng@yale.edu. Please include your name in the names of the documents. We will start evaluating the reports after the Xmas weekend, and only look at the last submitted version.

read files

The files provided are bulk RNAseq data from the planarian Schmidtea mediterranea. All libraries are generated from polyA-selected RNA. There is whole animal RNAseq (control), and several tissue subsets (“treatments”).

The sequencer output (fastq.gz) files are located in

/gpfs/ycga/project/mcdb231l/mcdb231l_jv434/dataFiles

There are 25 files in this folder.

control dataset:

These are 3 files that all start with “ctl_”, one for each of the 3 replicates. All are single end.

experimental datasets:

Naming for the experimental files is as follows:

sample1_rep1_1.fastq.gz

sample1: sample name
_rep1: replicate
_1: read (of pair)

For each of the conditions there are replicates, but the number varies: most have 3 replicates, but samples 1 and 5 have only 2 replicates. Some are single end, some are paired end. You can choose whether to do a paired end or single end analysis.
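Because the number of replicates and the read layout differ between conditions, it can help to tabulate what you have before choosing a single-end or paired-end analysis. The following sketch groups file names by sample and replicate. The regular expression is an assumption based on the naming scheme above: it treats the trailing `_1` or `_2` as optional, and the control name `ctl_rep1.fastq.gz` is hypothetical, so adjust the pattern to the actual names in the folder.

```python
import re
from collections import defaultdict

# Assumed pattern: <sample>_rep<N>[_<read>].fastq.gz
PATTERN = re.compile(
    r"^(?P<sample>[^_]+)_rep(?P<rep>\d+)(?:_(?P<read>[12]))?\.fastq\.gz$"
)


def group_reads(filenames):
    """Map (sample, replicate) to {read number: file name}; skip other files."""
    groups = defaultdict(dict)
    for name in filenames:
        match = PATTERN.match(name)
        if match is None:
            continue
        # Files without a read suffix are treated as read 1.
        read = int(match.group("read") or 1)
        groups[(match.group("sample"), int(match.group("rep")))][read] = name
    return dict(groups)


def layout(reads):
    """Return 'paired' when both reads of a pair are present, else 'single'."""
    return "paired" if set(reads) == {1, 2} else "single"


# Hypothetical listing; in practice this could come from os.listdir() on the data folder.
files = [
    "sample1_rep1_1.fastq.gz",
    "sample1_rep1_2.fastq.gz",
    "sample1_rep2_1.fastq.gz",
    "ctl_rep1.fastq.gz",
    "notes.txt",
]
groups = group_reads(files)
for (sample, rep), reads in sorted(groups.items()):
    print(sample, rep, layout(reads), sorted(reads.values()))
```

If you decide on a single-end analysis for a paired-end sample, one simple option is to use only the read 1 file for every replicate so that all samples are processed the same way.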

Each of the conditions is a tissue isolation of unknown purity. Potential intended tissue isolations are intestine, epidermis, stem cells, pharynx, and brain. Stem cells have been isolated by Fluorescence-Activated Cell Sorting (FACS); the other tissues have been isolated by microsurgery. Once you have completed your analysis, you can suggest which isolation you think each sample was, and comment on its predicted purity based on the RNAseq data.
