Simple Python Threading Example

At my work we have a Python library that interfaces with all of our API microservices (which are written in Java/Scala). It is a very useful tool for debugging and working with our platform, so I spend a lot of my time in a Python REPL.

I often find myself needing to hit multiple APIs in parallel. Since Python runs these calls one after another by default, I was looking for an easy way to parallelize my requests without having to write a lot of code (since I am working in a REPL).

Turns out Python has a multiprocessing.dummy module, described in the docs as follows:

multiprocessing.dummy replicates the API of multiprocessing but is no more than a wrapper around the threading module.
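
To see what "a wrapper around the threading module" means in practice, here is a minimal sketch. The helper name `where_am_i` is made up for illustration. Every task run by a multiprocessing.dummy pool should report the same process ID as the caller, because the workers are threads rather than child processes.

```python
import os
import threading
from multiprocessing.dummy import Pool

def where_am_i(_):
    # Hypothetical helper: report which process and thread ran this task.
    return (os.getpid(), threading.current_thread().name)

pool = Pool(4)  # four worker threads
reports = pool.map(where_am_i, range(8))
pool.close()
pool.join()

pids = set(pid for pid, _ in reports)
thread_names = sorted(set(name for _, name in reports))
print("caller pid: {}, worker pids: {}".format(os.getpid(), sorted(pids)))
print("worker threads used: {}".format(thread_names))
```

Because everything stays in one process, the worker function and its arguments do not need to be picklable, which is part of what makes this convenient in a REPL.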

With this module, calls are parallelized with just a few lines as follows:

from multiprocessing.dummy import Pool
pool = Pool(10)  # Number of concurrent threads
# map() blocks until every call has returned, then gives back a list of results
asyncresponse = pool.map(somesyncfunction, somelistofarguments)
pool.close()
pool.join()
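
To try the pattern without any network access, here is a small self-contained sketch. `slow_double` is a hypothetical stand-in for a blocking API call (it only sleeps and returns a value), and the pool size of 10 simply mirrors the snippet above. The exact timings will vary from machine to machine.

```python
import time
from multiprocessing.dummy import Pool

def slow_double(n):
    # Hypothetical stand-in for a blocking API call: wait, then return a value.
    time.sleep(0.1)
    return n * 2

args = list(range(10))

# Sequential run: each call waits for the previous one to finish.
start = time.time()
sequential = [slow_double(n) for n in args]
sequential_secs = time.time() - start

# Threaded run: up to 10 calls wait at the same time.
start = time.time()
pool = Pool(10)
threaded = pool.map(slow_double, args)  # blocks until every call has returned
pool.close()
pool.join()
threaded_secs = time.time() - start

print("sequential: {:.2f}s, threaded: {:.2f}s".format(sequential_secs, threaded_secs))
```

The speedup comes from overlapping the waiting time, so this approach suits I/O-bound work such as HTTP requests rather than CPU-heavy computation.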
 

I created the following real-world example to show how it works.

Let's say I have a list of zip codes from 94400 to 94419 and I want to check which of them are valid US zip codes. I can use Google's Geocoding API, which was free to query without a key at the time of writing, to look up location data for each zip code. You can try it in your browser:

https://maps.googleapis.com/maps/api/geocode/json?address=94403

For my range of zip codes, I ran the lookups twice: once sequentially and once using 10 threads. I timed both approaches.

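import requests
from multiprocessing.dummy import Pool
import time
from datadiff import diff

def getzip(code):
    try:
        code = str(code)
        url = "https://maps.googleapis.com/maps/api/geocode/json?address={}".format(code)
        res = requests.get(url).json()['results']
        if len(res) < 1:  # empty reply (e.g. rate limited), so retry
            return getzip(code)
        # Assumption: the original validity test was lost from this listing;
        # this check is a stand-in, not the author's exact code.
        iszip = 'postal_code' in res[0]['types'] and 'USA' in res[0]['formatted_address']
    except Exception as e:
        print(e)
        iszip = False
    return (code, iszip)

ziprange = range(94400, 94420)
print("Range is: " + str(len(ziprange)))

print("Using one thread")
start = time.time()
syncres = [getzip(c) for c in ziprange]
print("took " + str(time.time() - start))

print("Using multiple threads")
start = time.time()
pool = Pool(10)
asyncres = pool.map(getzip, ziprange)
pool.close()
pool.join()
print("took " + str(time.time() - start))

# Make sure both runs returned the same results
if syncres != asyncres:
    print("diff is")
    print(diff(syncres, asyncres))

for r in asyncres:
    print("Zip code {} is {} US code".format(r[0], "valid" if r[1] else "invalid"))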
 

My sample run resulted in the following output:

$ python getzip.py
Range is: 20
Using one thread
took 7.47538208961
 
Using multiple threads
took 3.59181404114
 
Zip code 94400 is invalid US code
Zip code 94401 is valid US code
Zip code 94402 is valid US code
Zip code 94403 is valid US code
Zip code 94404 is valid US code
Zip code 94405 is invalid US code
Zip code 94406 is invalid US code
Zip code 94407 is invalid US code
Zip code 94408 is invalid US code
Zip code 94409 is invalid US code
Zip code 94410 is invalid US code
Zip code 94411 is invalid US code
Zip code 94412 is invalid US code
Zip code 94413 is invalid US code
Zip code 94414 is invalid US code
Zip code 94415 is invalid US code
Zip code 94416 is invalid US code
Zip code 94417 is invalid US code
Zip code 94418 is invalid US code
Zip code 94419 is invalid US code
 

A few things to note

  1. Google rate-limits its API, so my code ended up doing a lot of retries. In real-world use cases I see a much bigger speedup.
  2. The individual calls run and finish in no particular order, but pool.map still returns the results in the same order as the input list, just as a sync flow would. If you switch to pool.imap_unordered to handle results as they arrive, they will need to be sorted after the fact when order matters.
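
On the ordering point, here is a quick sketch of how the two mapping calls behave. `delayed_echo` is a made-up function in which earlier items sleep longer, so they tend to finish last. `pool.map` returns results in input order regardless of which call finishes first, while `pool.imap_unordered` yields them as they complete.

```python
import time
from multiprocessing.dummy import Pool

def delayed_echo(n):
    # Hypothetical task: earlier items sleep longer, so they tend to finish last.
    time.sleep((5 - n) * 0.05)
    return n

items = list(range(5))
pool = Pool(5)
in_input_order = pool.map(delayed_echo, items)  # always matches the input order
as_completed = list(pool.imap_unordered(delayed_echo, items))  # completion order
pool.close()
pool.join()

print(in_input_order)
print(as_completed)
```

If each result carries its own key, as the (code, iszip) tuples in the zip code example do, unordered results can be put back in order with a plain `sorted()` call.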

In conclusion, I find this approach to be the most user-friendly way to parallelize Python API calls with just a few lines of code.


Source: Alex Kras
