Python: Need help with some List operations

Sjaak

Member
Joined
Apr 26, 2004
Location
The Netherlands
Hey guys, I need some help. I've been breaking my head over this one for two days now.

I have a list of lists, where each sub-list contains three items: a protein name, a profile name and a score. An example looks like this, sorted by the first item, the protein name:

hspData[][] (list of lists)

['1DQUA_1.pir', '1DQUA_1.hsp', '0.12']
['1DQUA_1.pir', '1F61A_1.hsp', '0.09']
['1DQUA_1.pir', '1IGWA_1.hsp', '0.06']
['1DQUA_1.pir', '1MUMA_1.hsp', '0.06']
['1DQUA_1.pir', '1PYMA_1.hsp', '0.04']
['1DQUA_1.pir', '1ZLPA_1.hsp', '0.04']
['1DQUA_1.pir', '2HJPA_1.hsp', '0.03']
['1DQUA_1.pir', '2QIWA_1.hsp', '0.03']
['Carp.pir', '1DQUA_1.hsp', '0.30']
['Carp.pir', '1F61A_1.hsp', '0.21']
['Carp.pir', '1IGWA_1.hsp', '0.18']
['Carp.pir', '1MUMA_1.hsp', '0.19']
['Carp.pir', '1PYMA_1.hsp', '0.17']
['Carp.pir', '1ZLPA_1.hsp', '0.13']
['Carp.pir', '2HJPA_1.hsp', '0.11']
['Carp.pir', '2QIWA_1.hsp', '0.09']
['Chicken.pir', '1DQUA_1.hsp', '0.12']
['Chicken.pir', '1F61A_1.hsp', '0.08']
['Chicken.pir', '1IGWA_1.hsp', '0.06']
['Chicken.pir', '1MUMA_1.hsp', '0.06']
['Chicken.pir', '1PYMA_1.hsp', '0.06']
['Chicken.pir', '1ZLPA_1.hsp', '0.06']
['Chicken.pir', '2HJPA_1.hsp', '0.06']
['Chicken.pir', '2QIWA_1.hsp', '0.04']
['Human.pir', '1DQUA_1.hsp', '0.21']
['Human.pir', '1F61A_1.hsp', '0.16']
['Human.pir', '1IGWA_1.hsp', '0.14']
['Human.pir', '1MUMA_1.hsp', '0.15']
['Human.pir', '1PYMA_1.hsp', '0.08']
['Human.pir', '1ZLPA_1.hsp', '0.06']
['Human.pir', '2HJPA_1.hsp', '0.05']
['Human.pir', '2QIWA_1.hsp', '0.04']
['Loach.pir', '1DQUA_1.hsp', '0.06']
['Loach.pir', '1F61A_1.hsp', '0.06']
['Loach.pir', '1IGWA_1.hsp', '0.05']
['Loach.pir', '1MUMA_1.hsp', '0.08']
['Loach.pir', '1PYMA_1.hsp', '0.06']
['Loach.pir', '1ZLPA_1.hsp', '0.06']
['Loach.pir', '2HJPA_1.hsp', '0.07']
['Loach.pir', '2QIWA_1.hsp', '0.07']

(One of the proteins, 1DQUA, happens to be a profile as well; this is pure coincidence.)

What I need to do is as follows:

- Extract all entries from protein X and determine which one has the highest score
- Return a List of Lists that contains each protein *once* followed by the best scoring profile (only the name is needed, not the value itself)

The function so far looks like this:

# Searches the hspData generated by parseHSP and returns the best profile for each Sequence
def locateBestScore(hspData):
    hspData.sort() # sorts hspData by first entry, the protein

    for entry in hspData: # arbitrary, just to check contents of hspData
        print entry

    protList = []
    resultList = []

    for entry in hspData: # makes a list of all different proteins in hspData
        if entry[0] not in protList:
            protList.append(entry[0])

    tempString = ""
    tempList1 = []
    tempList2 = []


    i = 0
    for entry in hspData:
        if entry[0] in tempList1:

        else:

------------

That's where I'm stuck. How do I compile a 'separate' list for each protein first, then sort that list by the last value and return the best profile, all within the for loop that goes over the entire list of lists?

Keep in mind it has to be as generic as possible: the protein names, the number of proteins and the number of entries per protein can all vary, and it should still work.
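
The approach described above (split the entries per protein, then pick the top score in each group) can be sketched with itertools.groupby from the standard library. This is only a sketch written for Python 3; the name locate_best_score and the small sample list are made up for illustration, and it returns [protein, profile] pairs as asked for in the requirements.

```python
from itertools import groupby
from operator import itemgetter


def locate_best_score(hsp_data):
    """Return [protein, best_profile] for each protein in hsp_data."""
    # groupby only merges adjacent rows, so sort by protein name first.
    rows = sorted(hsp_data, key=itemgetter(0))
    result = []
    for protein, group in groupby(rows, key=itemgetter(0)):
        # Scores are stored as strings, so compare them as floats.
        best = max(group, key=lambda row: float(row[2]))
        result.append([protein, best[1]])
    return result


# Hypothetical sample, a subset of the data shown above.
sample = [
    ['Carp.pir', '1DQUA_1.hsp', '0.30'],
    ['Carp.pir', '1F61A_1.hsp', '0.21'],
    ['Loach.pir', '1DQUA_1.hsp', '0.06'],
    ['Loach.pir', '1MUMA_1.hsp', '0.08'],
]
print(locate_best_score(sample))
# [['Carp.pir', '1DQUA_1.hsp'], ['Loach.pir', '1MUMA_1.hsp']]
```

Nothing here depends on the protein names or on how many entries each protein has. When two profiles tie, max() keeps the first one it sees.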

Thanks in advance,

Tim
 
Your pseudocode is a little confusing (I think I understand it now :)), but what you're looking for isn't too hard to do. The code could be a lot cleaner with linked lists (since we could find/add/remove stuff without much trouble), but arrays will work:

Java-esque pseudocode
Code:
locateBestScore(hspData) {
   bestScores[][]
   foreach (entry in hspData) {
      print entry

      if (entry[0] not in bestScores) {
         //If this protein isn't in our list, add it
         bestScores.append(entry)
      }
      else {
         //The protein is in our list. Is this a better scoring profile?
         bestScoreIndex = index the protein was found at
         if (entry[2] > bestScores[bestScoreIndex][2]) {
            //We have a better profile!
            bestScores[bestScoreIndex] = entry
         }
      }
   }
   return bestScores
}

Note that as-is we're returning all the list data in hspData (not just the protein and profile) but it's not too hard to fix that ;)
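
For reference, here is roughly how the pseudocode above might look in Python 3, including the fix of returning only the protein and profile. It is a sketch under a few assumptions: the name best_profiles is made up, a dict stands in for the bestScores array so the lookup by protein name is direct, and the scores are converted to float because they are stored as strings in hspData.

```python
def best_profiles(hsp_data):
    """Return [protein, best_profile] pairs, one per protein."""
    best = {}  # protein name -> best [protein, profile, score] entry so far
    for entry in hsp_data:
        protein, profile, score = entry
        current = best.get(protein)
        # First time we see this protein, or a strictly better score: keep it.
        if current is None or float(score) > float(current[2]):
            best[protein] = entry
    # Drop the score; dicts keep insertion order in Python 3.7+.
    return [[protein, entry[1]] for protein, entry in best.items()]


# Hypothetical sample, a subset of the data from the first post.
sample = [
    ['Loach.pir', '1DQUA_1.hsp', '0.06'],
    ['Loach.pir', '1IGWA_1.hsp', '0.05'],
    ['Loach.pir', '1MUMA_1.hsp', '0.08'],
    ['Human.pir', '1DQUA_1.hsp', '0.21'],
    ['Human.pir', '1F61A_1.hsp', '0.16'],
]
print(best_profiles(sample))
# [['Loach.pir', '1MUMA_1.hsp'], ['Human.pir', '1DQUA_1.hsp']]
```

This version makes a single pass over the data and does not need the input to be sorted.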

JigPu
 
It's Python, not pseudocode ;)

I made it into this, and it works :D

Code:
def locateBestScore(hspData):
	hspData.sort()
	
	#for entry in hspData:
	#	print entry

	tempList = []

	for entry in hspData:
		i = 0
		x = 0
		while i< len(tempList):
			if entry[0] == tempList[i][0]:
				x = 1
			i = i + 1
		if x == 0:
			tempList.append(entry)
		
		i = 0		
		while i< len(tempList):	
			if entry[0] == tempList[i][0]:
				if float(entry[2]) > float(tempList[i][2]):
					tempList[i] = entry
				
			i = i + 1
			
	return tempList

The sort() and print thingies are unnecessary but were needed for debugging the other functions (a def in Python is kind of like a method in Java). For each entry, it first makes sure the protein is in the list *once*, together with a profile and score. The x variable records whether the protein is already present: since we're working with lists inside a list, 'in' and 'not in' compare whole sub-lists rather than just the protein name, so they can't be used directly. Then it checks whether the entry's score is better than the stored one. The scores are stored as strings, so they need to be converted to float before they can be compared.
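
The two points in the explanation above (membership tests on nested lists, and converting scores to float) can be checked with a few lines. This is only an illustration in Python 3; the one-row temp_list and the values '9.5' and '10.0' are made up to make the effect visible.

```python
temp_list = [['Carp.pir', '1DQUA_1.hsp', '0.30']]

# `in` compares whole elements, so a bare protein name never equals a sub-list.
name_found = 'Carp.pir' in temp_list
# Checking the first field of each sub-list gives the intended membership test.
first_field_found = any(row[0] == 'Carp.pir' for row in temp_list)

# Strings compare character by character, which can disagree with numeric order.
as_strings = '9.5' > '10.0'
as_floats = float('9.5') > float('10.0')

print(name_found, first_field_found)  # False True
print(as_strings, as_floats)          # True False
```

With the scores in the sample data (all of the form 0.xx) string comparison happens to give the same order, but the float conversion keeps the function correct for other values.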

Thanks :)