https://www.youtube.com/watch?v=II63RgH0ukE
/mnt/1T-5e7/mycodehtml/Biology/SangSooKim/Bioinformatics_basic/004/001_Variant_annotation_overview/main.html
================================================================================
Steps:
- identification: find variants by comparing the sample DNA sequence with the reference sequence
- validation: check whether each found variant was identified correctly
- annotation: attach descriptive notes (annotations) to each validated variant
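The three steps above can be sketched in Python on toy data. This is a minimal sketch under stated assumptions, not the lecture's pipeline: the sample and reference are already aligned and of equal length, per-position depth and quality values are given as inputs, the thresholds are illustrative, and `known_positions` is a hypothetical stand-in for a database such as dbSNP.

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    position: int   # 0-based position in the aligned reference
    ref: str        # reference base
    alt: str        # sample (variant) base
    depth: int      # read depth at this position (assumed input)
    quality: float  # quality score at this position (assumed input)
    notes: dict = field(default_factory=dict)  # annotations added in step 3


def identify(sample, reference, depth, quality):
    """Step 1: report every position where the aligned sample base differs from the reference."""
    if len(sample) != len(reference):
        raise ValueError("this sketch assumes equal-length, pre-aligned sequences")
    return [
        Variant(i, ref_base, alt_base, depth[i], quality[i])
        for i, (ref_base, alt_base) in enumerate(zip(reference, sample))
        if ref_base != alt_base
    ]


def validate(variants, min_depth=10, min_quality=20.0):
    """Step 2: keep variants that pass illustrative depth and quality thresholds."""
    return [v for v in variants if v.depth >= min_depth and v.quality >= min_quality]


def annotate(variants, known_positions):
    """Step 3: attach notes; here only a novelty flag against a set of known positions."""
    for v in variants:
        v.notes["novelty"] = "known" if v.position in known_positions else "novel"
    return variants


reference = "ACGTACGTAC"
sample = "ACGAACGTTC"
depth = [30] * 10
depth[8] = 4  # low coverage at position 8, so that call should fail validation
quality = [35.0] * 10

found = identify(sample, reference, depth, quality)
kept = validate(found)
annotated = annotate(kept, known_positions={3})
```

Here `found` holds the differences at positions 3 and 8, validation drops position 8 because of its low depth, and the surviving T→A change at position 3 is flagged as known.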
================================================================================
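reference_sequence=human_genome_project_DB()  # pseudocode: human reference genome from the Human Genome Project
variant_file=Roche_454_gsMapper_software(sample_sequence,reference_sequence)  # pseudocode: Roche 454 gsMapper maps sample reads to the reference and reports differences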
variant_file:
- 454HCDiffs file (HC = high-confidence differences)
- lists all variants that exist only in the sample, i.e., positions where the sample differs from the (human genome) reference sequence
================================================================================
15 annotations:
- column1: novelty: is the variant already in the dbSNP database, or is it new?
- column2: depth of coverage
- column3: quality score
- column4: amino acid physicochemical properties
- column5: class of change
- column6: phylogenetic conservation based on UCSC PhastCons score
- column7: genic or genomic location
- column8: zygosity
- column9: effects on splice sites
- column10: PolyPhen score, prediction, and effect
- column11: PDB structures for this protein or a related protein
- column12: Online Mendelian Inheritance in Man (OMIM) disease associations for the gene containing the variant
- column13: protein annotation including protein ID, protein function, and description
- column14: gene annotation including chromosomal location, gene name, unique identifiers, and gene function
- column15: links to expression profiles derived from the GEO compendium based on protein ID to expression profile mapping provided by NCBI
Final_form_of_variants_data=(15000_variants,15_annotations)
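The final data can be pictured as a table with one row per variant and one column per annotation. A minimal sketch that fills only 4 of the 15 columns: the variant records are toy values, `dbsnp_sites` is a hypothetical in-memory stand-in for a dbSNP lookup, and the allele-fraction cut-offs used to call zygosity are assumptions rather than the lecture's method.

```python
def call_zygosity(alt_reads, depth, het_min=0.2, hom_min=0.8):
    """Column 8 (zygosity) from the fraction of reads carrying the variant allele.

    The 0.2 / 0.8 cut-offs are illustrative assumptions.
    """
    if depth == 0:
        return "unknown"
    fraction = alt_reads / depth
    if fraction >= hom_min:
        return "homozygous"
    if fraction >= het_min:
        return "heterozygous"
    return "uncertain"


def build_table(variants, dbsnp_sites):
    """Build one row (dict) per variant with a few of the 15 annotation columns."""
    rows = []
    for v in variants:
        rows.append({
            "novelty": "known" if (v["chrom"], v["pos"]) in dbsnp_sites else "novel",  # column 1
            "depth": v["depth"],                                                      # column 2
            "quality": v["quality"],                                                  # column 3
            "zygosity": call_zygosity(v["alt_reads"], v["depth"]),                    # column 8
        })
    return rows


# Toy input records (hypothetical values)
variants = [
    {"chrom": "chr1", "pos": 1000, "depth": 40, "quality": 50.0, "alt_reads": 38},
    {"chrom": "chr2", "pos": 2000, "depth": 25, "quality": 30.0, "alt_reads": 12},
]
dbsnp_sites = {("chr1", 1000)}  # hypothetical stand-in for a dbSNP lookup

table = build_table(variants, dbsnp_sites)
shape = (len(table), len(table[0]))  # (variants, annotation columns)
```

With the full pipeline the same layout would give a shape of (15000, 15); here it is (2, 4) because only two toy variants and four columns are filled.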
================================================================================
(non-synonymous) variants: change the encoded amino acid
(synonymous) variants: do not change the encoded amino acid
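Whether a variant is synonymous can be checked by translating the affected codon before and after the change. A minimal sketch, assuming a single-base substitution inside an uppercase, in-frame coding sequence that begins at the start codon, and using the standard genetic code; stop-gain changes are counted as non-synonymous here. The function name `classify_snv` is hypothetical.

```python
BASES = "TCAG"
# Standard genetic code (NCBI translation table 1): codons ordered with the first base
# varying slowest and the third fastest, each in T, C, A, G order; "*" marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = [a + b + c for a in BASES for b in BASES for c in BASES]
CODON_TABLE = dict(zip(CODONS, AMINO_ACIDS))


def classify_snv(cds, pos, alt):
    """Classify a single-base substitution in a coding sequence.

    cds: coding sequence starting at the first base of the start codon
    pos: 0-based position of the substituted base within cds
    alt: the new base
    """
    start = pos - pos % 3  # first base of the affected codon
    offset = pos - start   # position of the change inside the codon (0, 1 or 2)
    ref_codon = cds[start:start + 3]
    alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
    if CODON_TABLE[ref_codon] == CODON_TABLE[alt_codon]:
        return "synonymous"
    return "non-synonymous"


cds = "ATGGCTTAA"  # Met-Ala-stop
syn = classify_snv(cds, 5, "C")     # GCT -> GCC: Ala -> Ala
nonsyn = classify_snv(cds, 4, "A")  # GCT -> GAT: Ala -> Asp
```

A change in the third codon position is often synonymous because of the redundancy of the genetic code, while a change in the second position, as in the second call, usually alters the amino acid.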