I made the mistake of writing time-critical software in Python.
The script does the following:
listen on a named pipe
get relevant information with json[keyname]
get specific parts of the information with regex
check the result against a whitelist, e.g. if stringTest is in listOfStringsTest:
fire up an http request
again json
fire up again an http request
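For reference, those steps sketched as a minimal stdlib-only loop — the pipe path, JSON key, regex, whitelist entries, and URL are all placeholder assumptions, not the real script:

```python
import json
import re
import urllib.request

PIPE_PATH = "/tmp/events.pipe"             # assumed pipe location
API_URL = "https://example.com/api?id="    # placeholder endpoint
WHITELIST = {"alpha", "bravo", "charlie"}  # illustrative whitelist
PATTERN = re.compile(r"id=(\w+)")          # illustrative regex

def extract_id(line):
    """json[keyname] -> regex -> whitelist check; returns the match or None."""
    data = json.loads(line)
    match = PATTERN.search(data["message"])  # "message" stands in for the real key
    if match and match.group(1) in WHITELIST:
        return match.group(1)
    return None

def main():
    with open(PIPE_PATH) as pipe:          # blocks until a writer opens the pipe
        for line in pipe:
            found = extract_id(line)
            if found:
                # first request; the second would follow using this response
                urllib.request.urlopen(API_URL + found)
```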
A question for the experienced python programmers: what parts of the script waste the most time?
the two http requests will absolutely dwarf the rest of the code
Brody Watson
the last http request is the 'goal' of the script. i was able to drop the first one while writing this thread. how long does an "is in" take? 150 entries in the list, each ~8 bytes
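As a rough answer, timeit can measure that directly; with 150 short strings a worst-case list scan is on the order of microseconds (exact numbers are hardware-dependent, so none are claimed here):

```python
import random
import string
import timeit

# 150 random ~8-character strings, mimicking the whitelist described above
whitelist = ["".join(random.choices(string.ascii_lowercase, k=8)) for _ in range(150)]
needle = whitelist[-1]  # worst case for a list: the last element

per_call = timeit.timeit(lambda: needle in whitelist, number=100_000) / 100_000
print(f"list 'in' (150 entries, worst case): {per_call * 1e6:.2f} microseconds per check")
```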
would i profit from rewriting it in c ?
Brandon Harris
>would i profit from rewriting it in c ? only if you're running this program on a Pentium Pro from 1997
Jacob King
Cython
Easton Jackson
use pypy
Isaiah Long
Program is I/O bound. Choice of language won't help you much.
Robert Diaz
thanks, at least i won't need to write it again
will still try this, i know it won't help but i will try it
Juan Bennett
>get relevant information with json[keyname]
>get specific parts of the information with regex
>check the result against a whitelist
>e.g. if stringTest is in listOfStringsTest:
These could be really naive and dumb. Regex can be very slow if you're abusing it for something that you should be doing with a json parsing library.
If your program is comparing a string to a list of strings one by one, that is stupid and slow. Use whatever Python calls a hashset or dictionary or similar, and it's an order of magnitude faster. That's assuming you only care about unique strings in your list of strings and don't want duplicates for some reason. Even then, maintaining a dictionary mapping each string to a count of occurrences would be faster than a naive list.
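In Python that structure is the built-in set (or collections.Counter when you also want occurrence counts); a sketch with made-up entries:

```python
from collections import Counter

# list version: O(n) scan per membership check
whitelist_list = ["alpha", "bravo", "charlie"]

# set version: average O(1) hash lookup per check
whitelist_set = set(whitelist_list)

candidate = "bravo"
# both give the same answer; only the lookup cost differs
assert (candidate in whitelist_list) == (candidate in whitelist_set)

# counting occurrences instead, as suggested above:
counts = Counter(["alpha", "alpha", "bravo"])
print(counts["alpha"])  # 2
```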
Alexander Martin
check them all first, then open one tcp connection and fire all needed http requests over that single connection. it also depends on the speed and rate limiting of the server, if any is applied
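With the requests library (suggested elsewhere in the thread), a Session object keeps the TCP connection alive between calls to the same host, so repeated requests skip the handshake; the URL below is a placeholder:

```python
import requests  # third-party: pip install requests

# one Session = one connection pool, reused across calls to the same host
session = requests.Session()

def fire(ids):
    # both requests ride the same kept-alive connection instead of
    # paying a fresh TCP (and TLS) handshake each time
    for item in ids:
        resp = session.get("https://example.com/api",
                           params={"id": item}, timeout=5)
        resp.raise_for_status()
```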
Jordan Johnson
Use requests/lxml. Didn't read the full post till now, but I make a living off web scraping. There's no real way to make it 'fast'; it's entirely dependent on your dl speed. if you are looking for something you want to run forever I'd recommend perl
Colton Myers
better yet put it on AWS or the cloud.
Aiden Reyes
use xpaths and NoSQL to query/store your data.
Benjamin James
It is already, i have a 0.7ms ping to the destination server. Does that mean it's in the same center?
Ayden Cruz
>what parts of the script waste the most time?
Profile your program so you actually know which parts are slow. Don't guess based on what anonymous strangers on the internet say.
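The stdlib ships cProfile for exactly this; here's a self-contained sketch profiling a stand-in function (not the actual script) and printing the top entries:

```python
import cProfile
import io
import pstats
import re

def hot_path():
    # stand-in for the real pipeline: some regex + membership work
    pattern = re.compile(r"\d+")
    whitelist = {"42", "99"}
    for _ in range(10_000):
        m = pattern.search("value=42")
        _ = m.group(0) in whitelist

profiler = cProfile.Profile()
profiler.enable()
hot_path()
profiler.disable()

# dump the five most expensive calls by cumulative time
out = io.StringIO()
pstats.Stats(profiler, stream=out).sort_stats("cumulative").print_stats(5)
print(out.getvalue())
```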
Angel Thompson
yeah, post a paste or something so we can actually help you.
Gabriel Wright
I don't have permission to post it. I do it for a customer and the code is not generic. I think that could help! The regex is still needed because i don't use it for the json part, but i will definitely try that hashmap or dictionary thing. Do you mean the library requests? I should do this but i'm too lazy. Will probably do it anyway because i have never done it and maybe i'll learn something. Thx!
Julian Butler
Still interested whether my aws instance is in the same data center as the destination server. How good is 0.7ms?
Kevin Bennett
I should say i tried multiple datacenters in AWS, so it wasn't luck
Tyler Price
Definitely the HTTP requests
Benjamin Davis
>will still try this, i know it won't help but i will try it
Then why try it? You are not going to notice much difference. There might be a millisecond shaved off, or at least part of a millisecond. But it's the http requests that are the killer. Unless you manually parse the JSON instead of using the standard library.
Here are some tips to check how much each line of code takes:
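One crude stdlib-only approach (third-party tools such as line_profiler do this per line properly) is to bracket each suspect step with time.perf_counter(); every name here is a stand-in:

```python
import time

timings = {}

def timed(label, func, *args):
    """Crude per-step timer; delete before shipping."""
    start = time.perf_counter()
    result = func(*args)
    timings[label] = time.perf_counter() - start
    return result

# example usage against stand-in steps
data = timed("parse", str.split, "a b c")
joined = timed("join", " ".join, data)

# print the slowest step first
for label, seconds in sorted(timings.items(), key=lambda kv: -kv[1]):
    print(f"{label}: {seconds * 1e6:.1f} microseconds")
```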
Just remember to remove it from the production code. You can get Python to be very high performance because a lot of the standard library and other libraries are based on C code written by very smart people. Python is often used in high-performance environments like supercomputers and so on.
Jacob Jackson
Rewrite it in Nim. Python's standard library is now ported to Nim
If you'll be making frequent checks whether an item exists in a collection, don't use a list. Your program has to traverse the entire list looking for the item. Use a set instead, since checking if an item exists in a set takes constant time on average.
Like others mentioned, though, your http requests are probably the bottleneck here.