Python for-loop better way to do this (element of a list in another list)

Stratus_ss

Overclockix Snake Charming Senior, Alt OS Content
Joined
Jan 24, 2006
Location
South Dakota
Here is the ugly code I am trying to fix. It works, but it's hackish and expensive on big lists.

Code:
for x in report_list:
    x = x.split(",")[0].lower()
    for y in host_list:
        if x in y:
            print(x)

report_list[0] has something like this
Code:
VN-PR-QA-MONGO2,PR,MongoDB,VM,,,,/:  Total: 48.4 Free: 37.9 Used: 8.1,/DATA:  Total: 1029.3 Free: 729.9 Used: 246.2,CentOS 5.8,2.6.18-308.4.1.el5,,,,,3.0.33-3.39.el5_8,,,2.4.2,,,,,fe6e8f364314060963daa2d21327e289a28d9d20

and the host_list[0] is something like

Code:
vn-pr-qa-mongo2.zyx.iiii.com

Essentially, because each host reports only its short host name and not the FQDN, I need to make sure that a machine with a given FQDN has reported its values.

Hopefully it's something simple that I am just overlooking.

Sets don't really work (from my understanding), and intersections also don't work, because I have to manipulate the variables before I check whether the host is in the report_list.

Thanks in advance.
 
The reassignment of x is a bit confusing to me, but anyway...

I've done some testing on things like this, and for the most part doing it the more pythonic way (as you've done) tends to be faster because you're making good use of the underlying C libraries.

If performance sucks, you might try optimizing your call to split(), since you only ever use the first token. See below...

Code:
for x in report_list:
    first_term = x.split(',', 1)[0].lower()
    for y in host_list:
        if first_term in y:
            print(first_term)

For example (from my python terminal):
Code:
>>> s = '1,2,3'
>>> s.split(',')
['1', '2', '3']
>>> s.split(',', 1)
['1', '2,3']
 
I am surprised that I actually did it the "pythonic" way.

I assumed that it was a bit hackish.

As for the reassignment of x, it was laziness in terms of 'will this work?.... yep'

Appreciate the feedback!
 
I once ran a performance comparison between three ways of searching for a list of strings in a given input string, which is pretty similar. The first way, I pre-compiled a big regex to search for my list (since the strings I was searching for were a fixed set). The second way, I wrote my own algorithm in native Python, very similar to this (http://en.wikipedia.org/wiki/Knuth–Morris–Pratt_algorithm). The third way was like you did, with very little code and a 'for x in list, if x in y' approach. The KMP algorithm should have been the least computationally expensive, but the three approaches took 70, 60, and 9 seconds respectively on my sample set. I concluded that the less code you write, the less the interpreter has to work. It's pretty non-intuitive coming from other languages.
 
You're not doing anything with y in host_list, so why not have that as a set?
 
You're not doing anything with y in host_list, so why not have that as a set?

Because x will only be found in y, not in host_list.

I have tried that numerous times. Basically no matter what I do, if I try

Code:
x in host_list
False

Whereas if I do

Code:
x in y
True

Therefore I need to compare x to the individual elements, as represented by y.
 
Yeah, the "in" semantics with strings and lists are a bit weird.

'hello' in ['hello', 'hi', 'foo'] returns True because the string is an element of the list
'hello' in 'hello there' returns True because 'hello there' contains 'hello' as a substring
'hello' in ['hello there', 'hi', 'foo'] returns False because no element of the list equals 'hello'

Sometimes I wish the operator wasn't overloaded in this way, and instead something like 'hello there'.contains('hello') could be used. That might be bias from a non-functional programming background, though.
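
To make the difference concrete, here is a small sketch using the host name from the first post. The extra list entries are made up for illustration. It shows that list membership compares whole elements, while any() can be used to run the substring test against each element:

```python
# Hypothetical host list; only the first entry comes from the thread.
hosts = ['vn-pr-qa-mongo2.zyx.iiii.com', 'hi', 'foo']
name = 'vn-pr-qa-mongo2'

# List membership: True only if some element EQUALS name.
exact_match = name in hosts

# String containment: True if name is a substring of the string.
substring_match = name in hosts[0]

# Substring test against every element, stopping at the first hit.
any_substring = any(name in h for h in hosts)

print(exact_match, substring_match, any_substring)
```

This is still the same nested scan as the original loop, just written more compactly.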
 
Ah, I didn't look close enough at the example
vn-pr-qa-mongo2 vs vn-pr-qa-mongo2.zyx.iiii.com

Well, if the search string is always at the start of each host_list entry (and delimited), you could pre-process host_list into a dictionary. Otherwise, yeah, that's probably the best you can do.
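
A rough sketch of that dictionary idea, assuming the short host name is always the text before the first dot of the FQDN. The helper name build_short_name_index and the second host are made up for illustration:

```python
def build_short_name_index(host_list):
    """Map each short host name (text before the first dot) to its FQDN."""
    return {fqdn.split('.', 1)[0].lower(): fqdn for fqdn in host_list}

# Sample data: the second host is hypothetical, and the report line is truncated.
host_list = ['vn-pr-qa-mongo2.zyx.iiii.com', 'vn-pr-qa-web1.zyx.iiii.com']
report_list = ['VN-PR-QA-MONGO2,PR,MongoDB,VM']

index = build_short_name_index(host_list)

matched = []
for line in report_list:
    short = line.split(',', 1)[0].lower()
    if short in index:              # one hash lookup instead of scanning host_list
        matched.append(index[short])

print(matched)
```

Note this is an exact match on the short name rather than a substring test, so it behaves differently from 'x in y' if a report name is only a fragment of a host name.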
 
So no hashmaps in python?

If the list is ordered, then you could use a binary search to reduce the number of entries you need to check against.

In reality I would imagine the lists would have to be quite long for it to make a meaningful difference in running time.
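
For what it's worth, a binary search could look something like the sketch below, using the standard bisect module. It assumes host_list is lowercased and sorted, and that the short name is followed by a dot in the FQDN. The function name has_host and the extra hosts are made up for illustration:

```python
from bisect import bisect_left

def has_host(sorted_hosts, short_name):
    """Return True if some FQDN in sorted_hosts starts with short_name + '.'."""
    prefix = short_name.lower() + '.'
    # All entries sharing this prefix sit together, starting at this index.
    i = bisect_left(sorted_hosts, prefix)
    return i < len(sorted_hosts) and sorted_hosts[i].startswith(prefix)

# Hypothetical hosts; only the mongo2 entry comes from the thread.
sorted_hosts = sorted([
    'vn-pr-qa-web1.zyx.iiii.com',
    'vn-pr-qa-mongo2.zyx.iiii.com',
    'vn-pr-qa-db1.zyx.iiii.com',
])

found = has_host(sorted_hosts, 'VN-PR-QA-MONGO2')
not_found = has_host(sorted_hosts, 'vn-pr-qa-mongo')
print(found, not_found)
```

Each lookup is O(log n) after a one-time sort, but it only works because the short name is a prefix of the FQDN.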
 
Python does have hashmaps - called dictionaries (for key/value pairs) and sets (just for keys).

The issue here (which I didn't catch at first) is that he is searching for each string in one list as a substring of the strings in another list, e.g.
Find vn-pr-qa-mongo2 inside vn-pr-qa-mongo2.zyx.iiii.com
 
One solution could be to concentrate on the reading end... What are you reading the host list from? Could you already trim it while reading it from the file? Or process it in other ways (or even compare to report)?
 
Basically what happens is that there is a main script (run_reports.py) which executes commands on remote machines (or rather asks them to execute commands and return results).

Each host returns a CSV. The 'for x in y' bit is to ensure that all of the hosts in a list of hosts (fed into run_reports.py) returned a result. It's basic error checking. Because I am requesting remote resources to do something, the exit code is always 0 because the request always works; whether or not the host replies is not reflected in the result of the command. Sometimes the network times out, or the remote host generates an error locally, or some other such thing.

So I am implementing a simple check to make sure that all hosts have reported in. If they haven't, I want to know easily which ones did not reply in the allotted time period.

So, short answer:

host_list is predefined and passed into the program as an argument
report_list is the list of hosts which have successfully reported in (as extracted by scanning the CSV file)
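
Given that description, one possible way to list the hosts that did not report is sketched below. It assumes the first CSV field is the short host name and that the short name is the part of the FQDN before the first dot. The function name missing_hosts and the second host are made up for illustration:

```python
def missing_hosts(host_list, report_list):
    """Return the FQDNs from host_list that have no matching line in report_list."""
    # Short names that reported in, taken from the first CSV field of each line.
    reported = {line.split(',', 1)[0].lower() for line in report_list}
    return [fqdn for fqdn in host_list
            if fqdn.split('.', 1)[0].lower() not in reported]

# Sample data: the second host is hypothetical, and the report line is truncated.
host_list = ['vn-pr-qa-mongo2.zyx.iiii.com', 'vn-pr-qa-web1.zyx.iiii.com']
report_list = ['VN-PR-QA-MONGO2,PR,MongoDB,VM']

no_reply = missing_hosts(host_list, report_list)
print(no_reply)
```

Normalizing both sides to the short name first is what makes a set usable here, since the comparison becomes an exact match instead of a substring test.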
 