https://www.youtube.com/watch?v=y42KBKvanRs&list=PLaE61CK5r6_l2fxVp3r3OP0fgTSTdQUoQ&index=2&t=727s
================================================================================
NGS sequencers
================================================================================
Applications of NGS data
================================================================================
A high-level overview of NGS data processing
- Base calling
- Get FASTQ file
- Perform alignment or assembly
- Get SAM/BAM file
- Perform variant calling
- Get VCF file
================================================================================
Scope and schema of the Best Practices
- If you analyze 100 cancer patients, you have 100 samples
- The 100 samples appear on the left of the schema
- Non-GATK: steps that cannot be performed with GATK
================================================================================
FASTQ: raw unaligned reads

1 record takes 4 lines
1st line: identifier, title, sequence name (read's name, group, etc)
2nd line: sequence data (short in the case of NGS)
3rd line: + (optionally followed by the sequence name again)
4th line: associated quality scores
================================================================================
- How the quality line is encoded differs between sequencers
================================================================================
from Bio import SeqIO

# "fastq" is an alias for "fastq-sanger"
for record in SeqIO.parse("test1.fq", "fastq"):
    print(record.format("qual"))
    print(record.letter_annotations)

    # Write the same record in each FASTQ variant
    print(record.format("fastq-sanger"))    # same format as "fastq"
    print(record.format("fastq-solexa"))
    print(record.format("fastq-illumina"))
================================================================================
from Bio import SeqIO

g = SeqIO.parse("sample_1.fq", "fastq")
record = next(g)  # first record; g.next() only works in Python 2

print(record.format("fasta"))
print(record.format("fastq"))
print(record.letter_annotations)
print(record.format("qual"))
================================================================================
The BAM format stores aligned reads and is technology-independent
================================================================================
BAM headers: an essential part of a BAM file
================================================================================
To work with BAM files, use SAMtools
================================================================================
PySAM
================================================================================
================================================================================
VCF files store variant information
================================================================================
VCFtools
================================================================================
PyVCF
================================================================================
Generic Feature Format (GFF) Version 3
================================================================================
Parsing a GFF file
================================================================================
bedtools
================================================================================
pybedtools
================================================================================
================================================================================
Primer design tool
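================================================================================
Sketch: reading a FASTQ record by hand

Going back to the FASTQ notes (4 lines per record, quality encoding differs between sequencers), here is a minimal sketch without Biopython. The helper names `read_fastq` and `phred_scores` and the example read are made up for illustration. It assumes plain Phred scores stored as ASCII characters with an offset of 33 (Sanger, recent Illumina) or 64 (older Illumina pipelines). The Solexa scale is different and is not handled here.

```python
import io


def read_fastq(handle):
    """Yield (name, sequence, quality_line) from a 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:  # end of input
            return
        sequence = handle.readline().rstrip("\n")
        handle.readline()  # 3rd line: '+' (optionally repeats the name), ignored
        quality = handle.readline().rstrip("\n")
        if not header.startswith("@"):
            raise ValueError("FASTQ header must start with '@': %r" % header)
        if len(sequence) != len(quality):
            raise ValueError("sequence and quality lengths differ for %r" % header)
        yield header[1:], sequence, quality


def phred_scores(quality_line, offset=33):
    """Convert a quality line to integer Phred scores (assumed offset: 33 or 64)."""
    return [ord(ch) - offset for ch in quality_line]


# Hypothetical one-record example, not taken from a real run
example = "@read1 group=A\nACGT\n+\nII5!\n"
records = list(read_fastq(io.StringIO(example)))
name, sequence, quality = records[0]
scores = phred_scores(quality)  # offset 33

# Error probability of each base call: P = 10 ** (-Q / 10)
error_probs = [10 ** (-q / 10) for q in scores]
print(name, sequence, scores)
```

The same scores written with offset 64 would give a different quality line ("hhT@" instead of "II5!"), which is why the encoding has to be known before the line is interpreted.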