- Joined: Apr 26, 2004
- Location: The Netherlands
Heyo!
I know I could probably ask this question on a maths-related forum, or browse through a long list of papers and books to condense what I need, but it's a really simple problem and I just like these forums a lot.
I'm doing research on proteins atm and have written a script that outputs a vector for every sequence filled with values depending on certain aspects of that sequence:
Sequence1 = [0.2,0.4,0,0,1,1,0.2..etc]
Each vector will contain 344 values which range between 0 and 1 to about 3 decimals (unsure atm, but better to be flexible). What I need to do is compare each number in one vector with the corresponding number in the other vector in turn and calculate the 'distance' between them, where equal numbers give the smallest distance and one being 1 and the other 0 gives the largest distance.
The goal would be to input two sequences and receive a single number between 0 and 1 that indicates how closely they match, 0 being identical and 1 being very different. The language to be used is Python.
I was thinking of looping through the sequences first, removing any positions where the value is 0 in both (these will have no biological value). Then, award points based on the difference between the remaining values. But how can I translate this into a single 'score' in the end?
Tia
- Sjaak
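One possible way to collapse the idea above into a single score, as a rough sketch only: drop the positions that are 0 in both vectors, then take the mean absolute difference over what is left. Because every value lies between 0 and 1, each per-position difference also lies between 0 and 1, and so does their mean. The function name `vector_distance`, the choice of the mean as the aggregate, and the handling of two all-zero vectors are assumptions made for illustration, not something settled in the post.

```python
def vector_distance(a, b):
    """Mean absolute difference over positions that are not 0 in both vectors.

    Returns 0.0 for identical vectors and 1.0 when every kept position
    has 0 in one vector and 1 in the other.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

    # Keep only positions where at least one of the two values is non-zero.
    pairs = [(x, y) for x, y in zip(a, b) if x != 0 or y != 0]

    if not pairs:
        # Assumption: two all-zero vectors are treated as identical.
        return 0.0

    return sum(abs(x - y) for x, y in pairs) / len(pairs)


# Made-up example vectors, shortened from the real length of 344.
seq1 = [0.2, 0.4, 0, 0, 1, 1, 0.2]
seq2 = [0.2, 0.0, 0, 1, 1, 0, 0.2]
print(vector_distance(seq1, seq2))
```

Dividing by the number of kept positions rather than by 344 means shared zeros neither raise nor lower the score; dividing by the full length instead would make two sparse vectors look more alike than they are.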