Hey guys, I need some help. I've been racking my brain over this one for two days now.
I have a list of lists, where each sub-list contains three items: a protein name, a profile name and a score. An example, sorted by the first item (the protein name), looks like this:
hspData[][] (list of lists)
['1DQUA_1.pir', '1DQUA_1.hsp', '0.12']
['1DQUA_1.pir', '1F61A_1.hsp', '0.09']
['1DQUA_1.pir', '1IGWA_1.hsp', '0.06']
['1DQUA_1.pir', '1MUMA_1.hsp', '0.06']
['1DQUA_1.pir', '1PYMA_1.hsp', '0.04']
['1DQUA_1.pir', '1ZLPA_1.hsp', '0.04']
['1DQUA_1.pir', '2HJPA_1.hsp', '0.03']
['1DQUA_1.pir', '2QIWA_1.hsp', '0.03']
['Carp.pir', '1DQUA_1.hsp', '0.30']
['Carp.pir', '1F61A_1.hsp', '0.21']
['Carp.pir', '1IGWA_1.hsp', '0.18']
['Carp.pir', '1MUMA_1.hsp', '0.19']
['Carp.pir', '1PYMA_1.hsp', '0.17']
['Carp.pir', '1ZLPA_1.hsp', '0.13']
['Carp.pir', '2HJPA_1.hsp', '0.11']
['Carp.pir', '2QIWA_1.hsp', '0.09']
['Chicken.pir', '1DQUA_1.hsp', '0.12']
['Chicken.pir', '1F61A_1.hsp', '0.08']
['Chicken.pir', '1IGWA_1.hsp', '0.06']
['Chicken.pir', '1MUMA_1.hsp', '0.06']
['Chicken.pir', '1PYMA_1.hsp', '0.06']
['Chicken.pir', '1ZLPA_1.hsp', '0.06']
['Chicken.pir', '2HJPA_1.hsp', '0.06']
['Chicken.pir', '2QIWA_1.hsp', '0.04']
['Human.pir', '1DQUA_1.hsp', '0.21']
['Human.pir', '1F61A_1.hsp', '0.16']
['Human.pir', '1IGWA_1.hsp', '0.14']
['Human.pir', '1MUMA_1.hsp', '0.15']
['Human.pir', '1PYMA_1.hsp', '0.08']
['Human.pir', '1ZLPA_1.hsp', '0.06']
['Human.pir', '2HJPA_1.hsp', '0.05']
['Human.pir', '2QIWA_1.hsp', '0.04']
['Loach.pir', '1DQUA_1.hsp', '0.06']
['Loach.pir', '1F61A_1.hsp', '0.06']
['Loach.pir', '1IGWA_1.hsp', '0.05']
['Loach.pir', '1MUMA_1.hsp', '0.08']
['Loach.pir', '1PYMA_1.hsp', '0.06']
['Loach.pir', '1ZLPA_1.hsp', '0.06']
['Loach.pir', '2HJPA_1.hsp', '0.07']
['Loach.pir', '2QIWA_1.hsp', '0.07']
(One of the proteins, 1DQUA, happens to be a profile as well; this is pure coincidence.)
What I need to do is as follows:
- Extract all entries for a given protein and determine which one has the highest score
- Return a list of lists that contains each protein *once*, followed by its best-scoring profile (only the name is needed, not the score itself)
The function so far looks like this:
# Searches the hspData generated by parseHSP and returns the best profile for each Sequence
def locateBestScore(hspData):
    hspData.sort() # sorts hspData by first entry, the protein
    for entry in hspData: # arbitrary, just to check contents of hspData
        print entry
    protList = []
    resultList = []
    for entry in hspData: # makes a list of all different proteins in hspData
        if entry[0] not in protList:
            protList.append(entry[0])
    tempString = ""
    tempList1 = []
    tempList2 = []
    i = 0
    for entry in hspData:
        if entry[0] in tempList1:
        else:
------------
That's where I'm stuck. How do I compile a separate list for each protein first, then sort that list by the last value and return the best profile, all within the for loop that goes over the entire list of lists?
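One possible way to do the "separate list per protein" step is itertools.groupby, which hands back one group of rows per protein once the data is sorted by name. This is only a rough sketch: the function name best_profile_per_protein and the variable names are made up for illustration, and it assumes every row has exactly three items with a score string that float() can parse. It also uses max() instead of sorting each group, since only the top entry is needed.

```python
from itertools import groupby
from operator import itemgetter


def best_profile_per_protein(hsp_data):
    """Return [[protein, best_profile], ...] with one entry per protein.

    Hypothetical helper; assumes rows of [protein, profile, score_string].
    """
    # groupby only merges adjacent rows, so sort by protein name first.
    rows = sorted(hsp_data, key=itemgetter(0))
    result = []
    for protein, group in groupby(rows, key=itemgetter(0)):
        # Scores are stored as strings, so compare them as floats.
        # On a tie, max() keeps the first row it encountered.
        best = max(group, key=lambda row: float(row[2]))
        result.append([protein, best[1]])
    return result


sample = [
    ['Loach.pir', '1DQUA_1.hsp', '0.06'],
    ['Loach.pir', '1MUMA_1.hsp', '0.08'],
    ['Carp.pir', '1DQUA_1.hsp', '0.30'],
    ['Carp.pir', '1F61A_1.hsp', '0.21'],
]
print(best_profile_per_protein(sample))
# [['Carp.pir', '1DQUA_1.hsp'], ['Loach.pir', '1MUMA_1.hsp']]
```

Note that the input list is not modified here, because sorted() returns a new list, unlike hspData.sort().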
Keep in mind that it has to be as generic as possible: the protein names, the number of proteins and the number of entries per protein can all vary, and it should still work.
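For the generic case, a dictionary keyed by protein name might be enough, with no per-protein lists and no sorting of the input at all. The sketch below is an assumption about what would fit, not a tested drop-in: best_profile_single_pass is a hypothetical name, and it again assumes three-item rows with scores that float() accepts. It makes no assumptions about which proteins appear, how many there are, or how many entries each one has.

```python
def best_profile_single_pass(hsp_data):
    """Return [[protein, best_profile], ...] using one pass over the rows.

    Hypothetical helper; rows may arrive in any order.
    """
    best = {}  # protein name -> (best score so far, profile name)
    for protein, profile, score in hsp_data:
        value = float(score)  # scores are strings in the input
        # Strict '>' means the first profile seen wins on a tie.
        if protein not in best or value > best[protein][0]:
            best[protein] = (value, profile)
    # Sort the protein names only so the output order is predictable.
    return [[protein, best[protein][1]] for protein in sorted(best)]


rows = [
    ['Human.pir', '1PYMA_1.hsp', '0.08'],
    ['Chicken.pir', '1IGWA_1.hsp', '0.06'],
    ['Human.pir', '1DQUA_1.hsp', '0.21'],
    ['Chicken.pir', '1MUMA_1.hsp', '0.06'],
    ['Human.pir', '1F61A_1.hsp', '0.16'],
]
print(best_profile_single_pass(rows))
# [['Chicken.pir', '1IGWA_1.hsp'], ['Human.pir', '1DQUA_1.hsp']]
```

The trade-off is that ties are resolved by input order, so if a different tie-breaking rule matters, the comparison would need to change.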
Thanks in advance,
Tim