Python

6474 readers

3 users here now

Welcome to the Python community on the programming.dev Lemmy instance!

📅 Events

Past

November 2023

PyCon Ireland 2023, 11-12th
PyData Tel Aviv 2023 14th

October 2023

PyConES Canarias 2023, 6-8th
DjangoCon US 2023, 16-20th (!django 💬)

July 2023

PyDelhi Meetup, 2nd
PyCon Israel, 4-5th
DFW Pythoneers, 6th
Django Girls Abraka, 6-7th
SciPy 2023 10-16th, Austin
IndyPy, 11th
Leipzig Python User Group, 11th
Austin Python, 12th
EuroPython 2023, 17-23rd
Austin Python: Evening of Coding, 18th
PyHEP.dev 2023 - "Python in HEP" Developer's Workshop, 25th

August 2023

PyLadies Dublin, 15th
EuroSciPy 2023, 14-18th

September 2023

PyData Amsterdam, 14-16th
PyCon UK, 22nd - 25th

🐍 Python project:

💓 Python Community:

#python IRC for general questions
#python-dev IRC for CPython developers
PySlackers Slack channel
Python Discord server
Python Weekly newsletters
Mailing lists
Forum

✨ Python Ecosystem:

🌌 Fediverse

Communities

#python on Mastodon
c/django on programming.dev
c/pythorhead on lemmy.dbzer0.com

Projects

Pythörhead: a Python library for interacting with Lemmy
Plemmy: a Python package for accessing the Lemmy API
pylemmy pylemmy enables simple access to Lemmy's API with Python
mastodon.py, a Python wrapper for the Mastodon API

Feeds

founded 2 years ago

MODERATORS

[email protected]

Multiple process or threads? And, why? (programming.dev)

submitted 2 months ago by [email protected] to c/[email protected]

16 comments fedilink hide all child comments

(For context, I'm basically referring to Python 3.12 "multiprocessing.Pool Vs. concurrent.futures.ThreadPoolExecutor"...)

Today I read that multiple cores (parallelism) help in CPU bound operations. Meanwhile, multiple threads (concurrency) is due when the tasks are I/O bound.

Is this correct? Anyone cares to elaborate for me?

At least from a theorethical standpoint. Of course, many real work has a mix of both, and I'd better start with profiling where the bottlenecks really are.

If serves of anything having a concrete "algorithm". Let's say, I have a function that applies a map-reduce strategy reading data chunks from a file on disk, and I'm computing some averages from these data, and saving to a new file.

you are viewing a single comment's thread
view the rest of the comments

[–] [email protected] 1 points 2 months ago

Yes it is correct. TLDR; threads run one code at the time, but can access same data. processes is like running python many times, and can run code simultaneously, but sharing data is cumbersome.

If you use multiple threads, they all run on the same python instance, and they can share memory (i.e. objects/variables can be shared). Because of GIL (explained by other comment), the threads cannot run at the same time. This is OK if you are IO bound, but not CPU bound

If you use multiprocessing, it is like running python (from terminal) multiple times. There is no shared memory, and you have a large overhead since you have to start up python many times. But if you have large calculations you can do in parallell that takes long time, it will be much faster than threads as it can use all cpu cores.

If these processes need to share data, it is more complicated. You need to use special functions to share data, like queues and pipes. If you need to share many MB of data, this takes a lot of time in my experience (10s of milliseconds).

If you need to do large calculations, using numpy functions or numba may be faster than multiple processes, due to good optimizations. But if you need to crunch a lot of data, multiprocessing is usually the way to go