this post was submitted on 14 Oct 2024

18 points (100.0% liked)

Multiple processes or threads? And why?

submitted 2 months ago by [email protected] to c/[email protected]

16 comments

(For context, I'm basically referring to Python 3.12: "multiprocessing.Pool vs. concurrent.futures.ThreadPoolExecutor".)

Today I read that multiple cores (parallelism) help with CPU-bound operations, while multiple threads (concurrency) are the right fit when the tasks are I/O-bound.

Is this correct? Does anyone care to elaborate?

At least from a theoretical standpoint. Of course, a lot of real-world work is a mix of both, and I'd better start by profiling where the bottlenecks really are.

In case a concrete "algorithm" helps: let's say I have a function that applies a map-reduce strategy, reading data chunks from a file on disk, computing some averages from that data, and saving them to a new file.
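A minimal sketch of that map-reduce shape, assuming (hypothetically) an input file with one number per line and a single overall mean as the output. The map step parses each chunk into a (sum, count) pair and the reduce step merges the pairs; averaging per-chunk means directly would be wrong whenever chunks differ in size. The names read_chunks, partial_stats, combine and average_file are made up for illustration.

```python
import multiprocessing
import os
import tempfile
from collections.abc import Callable, Iterable, Iterator
from itertools import islice


def read_chunks(path: str, chunk_size: int = 10_000) -> Iterator[list[str]]:
    """Yield lists of up to chunk_size raw lines from the file."""
    with open(path, encoding="utf-8") as f:
        while chunk := list(islice(f, chunk_size)):
            yield chunk


def partial_stats(lines: list[str]) -> tuple[float, int]:
    """Map step: parse one chunk and return its (sum, count)."""
    values = [float(line) for line in lines if line.strip()]
    return sum(values), len(values)


def combine(parts: Iterable[tuple[float, int]]) -> float:
    """Reduce step: merge per-chunk (sum, count) pairs into one mean."""
    total, count = 0.0, 0
    for chunk_sum, chunk_count in parts:
        total += chunk_sum
        count += chunk_count
    return total / count if count else float("nan")


def average_file(in_path: str, out_path: str, mapper: Callable = map) -> float:
    """Compute the mean of in_path and write it to out_path.

    mapper is any map-like callable: the builtin map (serial) or
    Pool.imap (each chunk is parsed in a worker process).
    """
    mean = combine(mapper(partial_stats, read_chunks(in_path)))
    with open(out_path, "w", encoding="utf-8") as f:
        f.write(f"{mean}\n")
    return mean


if __name__ == "__main__":
    # Tiny demo: write 1..100 to a temp file and average it with a process pool.
    with tempfile.TemporaryDirectory() as tmp:
        src, dst = os.path.join(tmp, "in.txt"), os.path.join(tmp, "out.txt")
        with open(src, "w", encoding="utf-8") as f:
            f.writelines(f"{i}\n" for i in range(1, 101))
        with multiprocessing.Pool() as pool:
            print(average_file(src, dst, mapper=pool.imap))  # 50.5
```

Passing the builtin map runs everything serially, which is handy while profiling; passing pool.imap moves the parsing into worker processes, while the file itself is still read by the main process. Whether the processes pay off depends on how much CPU work each chunk needs compared with the cost of pickling it over to a worker.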

[email protected] 17 points 2 months ago (last edited 2 months ago)

Python has a Global Interpreter Lock (GIL), which has been both a bane and a boon. A boon because only one thread executes Python bytecode at a time, so many simple operations on basic types (appending to a list, setting a dict key) are effectively atomic. A bane because, despite having multiple threads, only one of them runs Python code at any given moment, which means you get concurrency but not parallelism. Python 3.13 ships an experimental free-threaded build that can run without the GIL, but I cannot say much about it since I haven't tested it myself. Most likely it means more of the responsibility for thread safety falls on the developer.
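A small sketch of both points, assuming CPython. The Py_GIL_DISABLED config variable (via sysconfig) and sys._is_gil_enabled() (a private helper added in 3.13, hence the getattr fallback) report whether the running interpreter is a free-threaded build. The hypothetical count_with_lock shows that a compound update such as counter += 1 is not guaranteed to be atomic even with the GIL, so shared state still wants a threading.Lock.

```python
import sys
import sysconfig
import threading


def gil_status() -> str:
    """Report whether this interpreter build supports running without the GIL."""
    # Py_GIL_DISABLED is 1 on free-threaded builds (e.g. "python3.13t"), else 0 or None.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists from 3.13 on; older versions always have the GIL.
    gil_enabled = getattr(sys, "_is_gil_enabled", lambda: True)()
    return f"free-threaded build: {free_threaded_build}, GIL enabled: {gil_enabled}"


def count_with_lock(n_threads: int = 8, increments: int = 10_000) -> int:
    """Increment a shared counter from several threads.

    counter += 1 is a read-modify-write spread over several bytecodes, so a
    thread switch may land in the middle of it; the Lock makes the update
    atomic whether or not the GIL is enabled.
    """
    counter = 0
    lock = threading.Lock()

    def worker() -> None:
        nonlocal counter
        for _ in range(increments):
            with lock:
                counter += 1

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return counter


print(gil_status())
print(count_with_lock())  # 80000
```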

So, in Python, using multiple threads is not a surefire way to get a performance boost. Threading is OK for small tasks that don't need many operations, but for CPU-heavy work many cycles may be lost to contention on the GIL. Using threads for I/O-bound work is good, though: blocking I/O calls release the GIL, so the main Python thread won't be stuck waiting for them to complete (reading or writing files, network access, screen access, ...). Larger tasks with more operations that are I/O-bound or require parallelism (encoding a video file, processing multiple large files at once, reading large amounts of data from the network, ...) are better off as separate processes.
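To illustrate the split, a sketch using concurrent.futures, with time.sleep standing in for blocking I/O (an assumption; real file or network calls behave similarly because CPython releases the GIL while they block). fake_io and cpu_heavy are hypothetical helpers: the threads overlap their waits, while the CPU-bound calls go to a process pool so each one runs in its own interpreter with its own GIL.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def fake_io(task_id: int, delay: float = 0.2) -> int:
    """Stand-in for blocking I/O (disk/network); time.sleep releases the GIL."""
    time.sleep(delay)
    return task_id


def cpu_heavy(n: int) -> int:
    """Pure-Python CPU work: holds the GIL for as long as it runs."""
    return sum(i * i for i in range(n))


def run_io_in_threads(n_tasks: int = 8) -> tuple[list[int], float]:
    """Run n_tasks fake I/O calls concurrently and time them."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=n_tasks) as pool:
        results = list(pool.map(fake_io, range(n_tasks)))
    return results, time.perf_counter() - start


def run_cpu_in_processes(sizes: list[int]) -> list[int]:
    """Each call runs in a separate worker process, so they can use separate cores."""
    with ProcessPoolExecutor() as pool:
        return list(pool.map(cpu_heavy, sizes))


if __name__ == "__main__":
    results, elapsed = run_io_in_threads()
    # The sleeps overlap, so this takes roughly 0.2 s rather than 8 x 0.2 s.
    print(results, f"{elapsed:.2f}s")
    print(run_cpu_in_processes([1_000_000] * 4))
```

The if __name__ == "__main__" guard matters for process pools: with the spawn start method (the default on Windows and macOS), every worker re-imports the main module.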

As an example: if you have one large file to read and then split into multiple small files, threads are a good option. The splitting happens sequentially, but writing to disk is a (comparatively) slow task that one shouldn't wait on, so it can be handed off to a thread. Doing these operations on multiple large files is worth parallelizing with multiple processes: each process reads a file, splits it, and writes the parts in threads, while one main process orchestrates the worker processes.
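A rough sketch of that layout, assuming plain text files split by line count. split_file, write_part and split_many are hypothetical names: reading and splitting stay sequential in one thread, each part's write is submitted to a small ThreadPoolExecutor, and split_many gives each large file its own worker process, with the calling process acting as the orchestrator.

```python
import os
import tempfile
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor
from itertools import islice


def write_part(path: str, lines: list[str]) -> str:
    """I/O step, run in a writer thread: write one part file."""
    with open(path, "w", encoding="utf-8") as f:
        f.writelines(lines)
    return path


def split_file(src: str, out_dir: str, lines_per_part: int = 1000) -> list[str]:
    """Split src into parts: reading is sequential, writes go to a thread pool."""
    stem = os.path.basename(src)
    futures = []
    with open(src, encoding="utf-8") as f, ThreadPoolExecutor(max_workers=4) as writers:
        part = 0
        while lines := list(islice(f, lines_per_part)):
            dst = os.path.join(out_dir, f"{stem}.part{part:04d}")
            futures.append(writers.submit(write_part, dst, lines))
            part += 1
    # Leaving the with block waited for every pending write; result() re-raises errors.
    return [fut.result() for fut in futures]


def split_many(sources: list[str], out_dir: str) -> list[list[str]]:
    """Orchestrator: one worker process per large file, each with its own writer threads."""
    with ProcessPoolExecutor() as pool:
        return list(pool.map(split_file, sources, [out_dir] * len(sources)))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        sources = []
        for name in ("a.txt", "b.txt"):
            path = os.path.join(tmp, name)
            with open(path, "w", encoding="utf-8") as f:
                f.writelines(f"{name} line {i}\n" for i in range(2500))
            sources.append(path)
        for parts in split_many(sources, tmp):
            print(len(parts), "parts")  # 3 parts per 2500-line file
```

max_workers=4 and lines_per_part=1000 are arbitrary; measuring on the actual disk is the only reliable way to pick them.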

Of course, your mileage may vary. I've run into the issue of requiring parallelism on small tasks, and the only thing that worked was moving that logic out to Cython, outside the GIL (a terrible experience). For small, highly parallel operations, probably Python isn't the right language and something like Rust should be explored.

Anti Commercial-AI license

[email protected] 2 points 2 months ago

For small, highly parallel operations, probably Python isn't the right language and something like Rust should be explored.

You could also try Julia, which, if I'm not mistaken, handles concurrency and parallelism well while also being interactive and as easy to write as Python.

[email protected] 2 points 2 months ago

Does it also support compiling or exporting to Python modules?

[email protected] 2 points 2 months ago

I don't think so; there was some discussion about why writing Julia as a Python transpiler wouldn't work as well. But it supposedly has very good interoperability in both directions: calling Julia functions from Python and vice versa.

[email protected] 3 points 2 months ago (last edited 2 months ago)

Wow, coming from C++/Rust I was about to answer that both are parallelism. I didn't know about Python's GIL. So I suppose this is the preferred way to do concurrency: there is no async/await, and you wouldn't use Qt "just" for a bit of concurrency. Right?

We learn a little bit every day. Thanks!

[email protected] 3 points 2 months ago (last edited 2 months ago)

IINM, whether it's "true" parallelism depends on the number of hardware cores (which shouldn't be a problem nowadays). A single physical core means concurrency (even with "hyper-threading"), and multiple cores could mean parallelism. I can't remember if threads are core-bound or not. Processes can be bound to cores on Linux (most likely on other OSes too).
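On the number of cores: a sketch, assuming CPython, of how a script might check how many CPUs it can actually use before sizing a pool. os.cpu_count() counts logical CPUs (hyper-threads included), while the affinity mask, exposed through os.sched_getaffinity on Linux and some other Unixes or os.process_cpu_count from 3.13, can be smaller. usable_cpus is a hypothetical helper name.

```python
import os


def usable_cpus() -> int:
    """How many CPUs this process may actually run on.

    os.cpu_count() reports every logical CPU in the machine; the affinity
    mask can be smaller, e.g. under taskset or in a container cpuset.
    """
    if hasattr(os, "process_cpu_count"):  # Python 3.13+
        return os.process_cpu_count() or 1
    if hasattr(os, "sched_getaffinity"):  # Linux and some other Unixes
        return len(os.sched_getaffinity(0))
    return os.cpu_count() or 1


print("logical CPUs:", os.cpu_count())
print("usable by this process:", usable_cpus())
```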

So I suppose this is the preferred way to do concurrency, there is no async/await

Python does have async/await, which is syntax for coroutines. They run cooperatively on a single-threaded event loop, and blocking work can be handed off to threads or processes using an executor (doc). The standard library's asyncio documentation describes valuable use cases for async/await in Python.
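A minimal asyncio sketch of that split, with hypothetical fetch and blocking_read functions: the coroutines interleave on a single-threaded event loop, and asyncio.to_thread (Python 3.9+) hands the blocking function to the loop's default thread pool executor so it does not stall the loop.

```python
import asyncio
import time


def blocking_read(n: int) -> int:
    """Hypothetical blocking call (stand-in for a sync library or a file read)."""
    time.sleep(0.1)
    return n * 10


async def fetch(n: int) -> int:
    """Native coroutine: awaiting asyncio.sleep yields control to the event loop."""
    await asyncio.sleep(0.1)
    return n


async def main() -> tuple[list[int], list[int]]:
    # These coroutines interleave on one thread, inside one event loop.
    coros = await asyncio.gather(*(fetch(i) for i in range(5)))
    # Blocking functions go to the default thread pool executor instead.
    blocking = await asyncio.gather(
        *(asyncio.to_thread(blocking_read, i) for i in range(5))
    )
    return coros, blocking


print(asyncio.run(main()))  # ([0, 1, 2, 3, 4], [0, 10, 20, 30, 40])
```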

and you won’t use At “just” for a bit of concurrency. Right ?

Is "At" a typo?

We learn a little bit every day. Thanks!

You're welcome :) I discovered the GIL the hard way, unfortunately. Making another person aware of its existence to potentially save them some pain is worth it.

[email protected] 2 points 2 months ago

I can’t remember if threads are core bound or not.

On Linux, by default they're not. getcpu(2) says:

   The getcpu() system call identifies the processor and node on which the
   calling thread or process is currently running and writes them into the
   integers pointed to by the cpu and node arguments.  ...

   The  information  placed in cpu is guaranteed to be current only at the
   time of the  call:  unless  the  CPU  affinity  has  been  fixed  using
   sched_setaffinity(2),  the  kernel  might  change  the CPU at any time.
   (Normally this does not happen because the scheduler tries to  minimize
   movements  between  CPUs  to keep caches hot, but it is possible.)  The
   caller must allow for the possibility that the information returned  in
   cpu and node is no longer current by the time the call returns.
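Following the man page's hint, a Linux-only sketch of fixing the CPU affinity from Python with os.sched_getaffinity and os.sched_setaffinity, which wrap the system calls of the same name. The helper name pin_to_one_cpu is hypothetical; it pins the caller to one CPU it is already allowed to use, then restores the original mask.

```python
import os


def pin_to_one_cpu() -> tuple[set[int], set[int]]:
    """Pin the caller to a single CPU, then restore its original mask.

    Linux-only. A pid of 0 means the caller; per sched_setaffinity(2) that is
    the calling thread, which for the main thread is effectively the process.
    """
    original = os.sched_getaffinity(0)
    target = {min(original)}  # any CPU we are already allowed to run on
    os.sched_setaffinity(0, target)
    try:
        pinned = os.sched_getaffinity(0)
    finally:
        os.sched_setaffinity(0, original)  # undo the pinning
    return original, pinned


if hasattr(os, "sched_setaffinity"):
    before, during = pin_to_one_cpu()
    print("allowed CPUs:", sorted(before), "while pinned:", sorted(during))
```

While the mask holds a single CPU, the scheduler has nowhere else to move that thread; without it, the kernel is free to migrate it, as quoted above.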

[email protected] 1 point 2 months ago

Thank you. That's good to know. In my OS architecture lectures, we were introduced to an OS with core-bound threads. I can't remember if it was a teaching OS or something that really existed, hence my doubts.

[email protected] 2 points 2 months ago

and you won’t use At “just” for a bit of concurrency. Right ?

Is "At" a typo?

Yes, I meant the Qt framework. But with that many ways to do concurrency in the language's core, I suspect you would use this framework for more than just its signals/slots feature, e.g. if you want its data structures, or its network or GUI stack.

I don't use Python, but I love learning the quirks of each language.

[email protected] 3 points 2 months ago (last edited 2 months ago)

This is a very good explanation!