this post was submitted on 14 Oct 2024
18 points (100.0% liked)

Python

6474 readers
3 users here now

Welcome to the Python community on the programming.dev Lemmy instance!

๐Ÿ“… Events

PastNovember 2023

October 2023

July 2023

August 2023

September 2023

๐Ÿ Python project:
๐Ÿ’“ Python Community:
โœจ Python Ecosystem:
๐ŸŒŒ Fediverse
Communities
Projects
Feeds

founded 2 years ago
MODERATORS
 

(For context, I'm basically referring to Python 3.12 "multiprocessing.Pool Vs. concurrent.futures.ThreadPoolExecutor"...)

Today I read that multiple cores (parallelism) help in CPU bound operations. Meanwhile, multiple threads (concurrency) is due when the tasks are I/O bound.

Is this correct? Anyone cares to elaborate for me?

At least from a theorethical standpoint. Of course, many real work has a mix of both, and I'd better start with profiling where the bottlenecks really are.

If serves of anything having a concrete "algorithm". Let's say, I have a function that applies a map-reduce strategy reading data chunks from a file on disk, and I'm computing some averages from these data, and saving to a new file.

you are viewing a single comment's thread
view the rest of the comments
[โ€“] [email protected] 3 points 2 months ago

Threads all run on the same core, processes can run on different cores.

Because threads run on the same core, the only time they can improve performance is if there are non-cpu tasks in your code - usually I/O operations. Otherwise the only thing multi threading can provide is the appearance of parallelism (as the cpu jumps back and forth between threads progressing each in small steps).

On the other hand, multiprocessing allows you to run code on different cores, meaning you can take full advantage of all your processing power. However, if youre program has a lot of I/O tasks, you might end up bottlenecked by the I/O and never see any improvements.

For the example you mentioned, it's likely threading would be the best as it's got a little less overhead, easier to program, and you're task is mostly I/O bound. However, if the calculations are relatively quick, it's possible you wouldn't see any improvement as the cpu would still end up waiting for the I/O.