https://www.youtube.com/watch?v=II63RgH0ukE
/mnt/1T-5e7/mycodehtml/Biology/SangSooKim/Bioinformatics_basic/004/001_Variant_annotation_overview/main.html

================================================================================
Steps:
- identification: find variants by comparing the "sample DNA sequence" to the "reference sequence"
- validation: was each "found variant" called correctly?
- annotation: attach descriptive notes to each found variant

================================================================================
reference_sequence=human_genome_project_DB()
variant_file=Roche_454_gsMapper_software(sample_sequence,reference_sequence)

variant_file:
- 454HCDiff file
- lists all variants that exist only in the sample, found by comparing it with the "(human genome) reference sequence"

================================================================================
15 annotations:
- column1: novelty: is the variant already in the dbSNP database, or is it entirely new?
- column2: depth of coverage
- column3: quality score
- column4: amino acid physicochemical properties
- column5: class of change
- column6: phylogenetic conservation based on the UCSC PhastCons score
- column7: genic or genomic location
- column8: zygosity
- column9: effects on splice sites
- column10: PolyPhen score, prediction, and effect
- column11: PDB structures for this protein or a related protein
- column12: Online Mendelian Inheritance in Man (OMIM) disease associations for the gene containing the variant
- column13: protein annotation, including protein ID, protein function, and description
- column14: gene annotation, including chromosomal location, gene name, unique identifiers, and gene function
- column15: links to expression profiles derived from the GEO compendium, based on the protein-ID-to-expression-profile mapping provided by NCBI

Final_form_of_variants_data=(15000_variants,15_annotations)

================================================================================
(non-synonymous) variants: variants that change the encoded amino acid
(synonymous) variants: variants that don't change the encoded amino acid
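
================================================================================
Sketch: telling synonymous from non-synonymous changes

A minimal sketch of the distinction above, assuming the variant is a single-codon change in a coding region and that the standard genetic code applies. The names CODON_TABLE and classify_codon_change are hypothetical helpers made up for these notes. They are not part of gsMapper or of any annotation tool mentioned in the video.

```python
from itertools import product

# Standard genetic code, written in the conventional T, C, A, G base order.
# "*" marks a stop codon.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Map each of the 64 codons to its one-letter amino acid code.
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_codon_change(ref_codon: str, alt_codon: str) -> str:
    """Return "synonymous" if both codons encode the same amino acid,
    otherwise "non-synonymous" (hypothetical helper for illustration)."""
    ref_aa = CODON_TABLE[ref_codon.upper()]
    alt_aa = CODON_TABLE[alt_codon.upper()]
    return "synonymous" if ref_aa == alt_aa else "non-synonymous"


if __name__ == "__main__":
    # GAA and GAG both encode glutamate (E).
    print(classify_codon_change("GAA", "GAG"))
    # GAG encodes glutamate (E), GTG encodes valine (V).
    print(classify_codon_change("GAG", "GTG"))
```

Under the definition in these notes, a change to a stop codon also counts as non-synonymous here, because the encoded residue changes. Real annotation tools usually report such changes as a separate class.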