Writing fast Python code

I made the mistake of writing time-critical software in Python.

The script does the following:

listen on a named pipe
get the relevant information with json[keyname]
extract specific parts with a regex
check the result against a whitelist,
e.g. if stringTest in listOfStringsTest:
fire off an HTTP request
parse JSON again
fire off a second HTTP request
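Roughly this shape; the pipe path, key name, regex, whitelist and URL below are placeholders, not the real values:

import json
import re
import requests

WHITELIST = {"alpha", "beta", "gamma"}   # ~150 short strings in reality
PATTERN = re.compile(r"id=(\w+)")        # placeholder regex
API_URL = "http://example.com/api"       # placeholder endpoint

with open("/tmp/mypipe") as pipe:        # the named pipe
    for line in pipe:
        data = json.loads(line)                  # json[keyname]
        match = PATTERN.search(data["keyname"])
        if match and match.group(1) in WHITELIST:
            reply = requests.get(API_URL, params={"id": match.group(1)})
            info = reply.json()                  # JSON again
            requests.post(API_URL, json=info)    # the second, 'goal' request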


A question for the experienced Python programmers:
which parts of the script waste the most time?

Other urls found in this thread:

marinamele.com/7-tips-to-time-python-scripts-and-control-memory-and-cpu-usage
nim-lang.org/

the two HTTP requests will absolutely dwarf the rest of the code

The last HTTP request is the 'goal' of the script.
I was able to drop the first one while writing this thread.
How long does an 'in' check take?
150 entries in the list, each ~8 bytes.

Would I profit from rewriting it in C?

>Would I profit from rewriting it in C?
only if you're running this program on a Pentium Pro from 1997

Cython

use pypy

Program is I/O bound. Choice of language won't help you much.

Thanks, at least I won't need to write it again.

Will still try this. I know it won't help, but I'll try it anyway.

>get the relevant information with json[keyname]
>extract specific parts with a regex
>check the result against a whitelist
>e.g. if stringTest in listOfStringsTest:

These could be really naive and dumb. Regex can be very slow if you're abusing it for something you should be doing with a JSON parsing library.

If your program is comparing a string to a list of strings one by one, that is stupid and slow.
Use what Python calls a set (or a dictionary), and it's an order of magnitude faster. That's assuming you only care about unique strings in your list and don't want duplicates for some reason. Even then, maintaining a dictionary that maps each string to a count of occurrences would be faster than a naive list.
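A rough sketch of the difference (the names are made up):

from collections import Counter

# List membership scans entry by entry: O(n) per check.
whitelist_list = ["user%d" % i for i in range(150)]
print("user149" in whitelist_list)   # walks up to 150 entries

# Set membership is a single hash probe: O(1) on average.
whitelist_set = set(whitelist_list)
print("user149" in whitelist_set)    # one lookup, regardless of size

# If duplicates matter, map each string to an occurrence count;
# membership stays O(1) and the multiplicity is kept.
whitelist_counts = Counter(whitelist_list)
print(whitelist_counts["user149"])   # 1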

Check them all first, then open one TCP connection and fire all the needed HTTP requests over that single connection. It also depends on the speed and rate limiting of the server, if any.
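With the requests library that means one Session; the endpoint and data below are made up:

import requests

session = requests.Session()   # keep-alive: one TCP connection is reused

whitelist = {"alpha", "beta", "gamma"}     # placeholder whitelist
candidates = ["alpha", "delta", "gamma"]   # placeholder extracted values

# check everything first...
matches = [s for s in candidates if s in whitelist]

# ...then fire every request over the same connection
for m in matches:
    session.post("http://example.com/api", json={"id": m})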

Use requests/lxml. Didn't read the full post till now, but I make a living off web scraping. There's no real way to make it 'fast'; it's entirely dependent on your download speed. If you're looking for something you want to run forever, I'd recommend Perl.

Better yet, put it on AWS or the cloud.

Use XPath and NoSQL to query/store your data.

It is already; I have a 0.7 ms ping to the destination server.
Does that mean it's in the same data center?

>which parts of the script waste the most time?

Profile your program so you actually know which parts are slow. Don't guess based on what anonymous strangers on the internet say.
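A minimal sketch with cProfile from the standard library; 'main' stands in for whatever your entry point is:

# whole-script profile from the shell, no code changes:
#   python -m cProfile -s cumulative script.py

# or from inside the code:
import cProfile
import pstats

cProfile.run("main()", "profile.out")   # run the real workload, not a toy input
pstats.Stats("profile.out").sort_stats("cumulative").print_stats(10)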

yeah, post a paste or something so we can actually help you.

I don't have permission to post it.
I'm doing it for a customer and the code isn't generic.
I think that could help!
The regex is still needed (I don't use it to parse the JSON), but I will definitely try that hashmap/dictionary thing.
Do you mean the requests library?
I should do this, but I'm too lazy.
Will probably do this anyway, because I've never done it and maybe I'll learn something. Thanks!

Still interested in whether my AWS instance is in the same data center as the destination server.
How good is 0.7 ms?

I should say I tried multiple AWS data centers, so it wasn't luck.

Definitely the HTTP requests

>Will still try this. I know it won't help, but I'll try it anyway.

Then why try it? You are not going to notice much difference; maybe a millisecond shaved off, or a fraction of one. It's the HTTP requests that are the killer, unless you're manually parsing the JSON instead of using the standard library.

Here are some tips for checking how much time each line of code takes:

marinamele.com/7-tips-to-time-python-scripts-and-control-memory-and-cpu-usage

Just remember to remove it from the production code. You can get Python to be very high performance, because a lot of the standard library and other libraries are backed by C code written by very smart people. Python is often used in high-performance environments like supercomputers.
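For a single line, timeit from the standard library already works well; e.g. to time the JSON parse (the payload is made up):

import timeit

SETUP = "import json; payload = '{\"keyname\": \"id=abc123\"}'"

# timeit repeats the snippet many times, so even sub-microsecond
# operations become measurable
print(timeit.timeit("json.loads(payload)", setup=SETUP, number=100000))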

Rewrite it in Nim
Python's standard library is now ported to Nim

nim-lang.org/

If you'll be making frequent checks for whether an item exists in a collection, don't use a list: your program has to traverse the entire list looking for the item. Use a set instead, since checking whether an item exists in a set takes constant time on average.
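Reusing the names from the first post (the body of the if is left as a stub):

whitelist = set(listOfStringsTest)   # build the set once, at startup

if stringTest in whitelist:          # average O(1), vs O(n) for the list scan
    ...                              # fire the request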

Like others mentioned, though, your HTTP requests are probably the bottleneck here.