https://www.yceffort.kr/2018/05/31/Levenshtein-distance/
================================================================================
Levenshtein Distance:
- Differences as "insertion", "deletion", "substution" are calculated as "distance"
- Strings which are compared do not have to be the same length
Hamming Distance:
- How many words-replacements are needed to make 2 words same
- 2 words should have same length
- So, not very useful
Smith-Waterman:
- Split word into all possible-length
- And compare them
Sørensen-Dice Coefficient:
- Try to make "groups" from 2 words, to compare 2 words
================================================================================
Idea of Levenshtein distance
- To compare "2 words" to see how they are similar
- you can calculate "distance" which means the steps for making "2 words" same
- If 2 words are already very similar, small number of steps will be required
- Steps are composed of "inserting new character", "deleting some character", "substituting some character with other one"
================================================================================
Let's make "ghost" equal to "toast"
- Step1, substituting g with t: ghost -> thost
- Step2, substituting h with o: thost -> toost
- Step3, substituting o with a: toost -> toast
out_dist=Levenshtein_distance("ghost","toast")
# out_dist: 3
================================================================================
def Final_Levenshtein_distance():
dist_val=get_levenshtein_distance(longer_word,shorter_word)
final_levenshtein_dist=(longer_word_len-dist_val)/longer_word_len
return final_levenshtein_dist
================================================================================
Usecase of Levenshtein distance
- Comparison "strings"
- Comparison "words"
- Comparison "gene sequence"
- etc