I've been experimenting with creative writing tools alongside a bunch of writer friends, and the setup described in this paper is notoriously bad. They go up to ChatGPT on 3.5 (or Bard lmao) and expect it to write comedy? Jeez, talk about setting yourself up for failure. That's like walking up to a junior screenwriter and yelling "GIVE ME A JOKE" at them. I don't understand why people keep repeating that mistake: they design experiments where the model is expected to be the source of creativity, and that's just a dead end.
If you want output that is not entirely mediocre, you need something like a Dramatron architecture, where you decouple the various tasks (fleshing out characters, outlining at the episode level, outlining at the scene level, writing dialogue, etc.) and maintain internal memory of what is being worked on. It is non-trivial to set up, but it gets there sometimes; even the authors of this paper acknowledge that this would probably have produced better results. You also need a user able to provide good ideas that the model can work with. You can't expect the good creative stuff to come from the robot.
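To make the "decoupled tasks plus shared memory" idea concrete, here is a minimal sketch of that kind of hierarchy. The `llm` function is a hypothetical stand-in for a real model call (it just echoes its prompt here); the point is the structure: each stage is conditioned on the outputs of the stages above it, so the dialogue writer always "remembers" the characters and the beats.

```python
def llm(prompt: str) -> str:
    # Placeholder: a real implementation would call a model API here.
    return f"<generated from: {prompt[:40]}...>"

def write_script(logline: str) -> dict:
    """Dramatron-style cascade: each stage reads the accumulated context."""
    context = {"logline": logline}
    # Stage 1: flesh out characters from the logline alone.
    context["characters"] = llm(f"Characters for: {context['logline']}")
    # Stage 2: episode-level outline, conditioned on logline + characters.
    context["outline"] = llm(
        f"Outline. Logline: {context['logline']} "
        f"Characters: {context['characters']}"
    )
    # Stage 3: scene-level beats, conditioned on everything so far.
    context["scenes"] = llm(
        f"Scene beats. Outline: {context['outline']} "
        f"Characters: {context['characters']}"
    )
    # Stage 4: dialogue, again fed the full upstream context ("memory").
    context["dialogue"] = llm(
        f"Dialogue. Scenes: {context['scenes']} "
        f"Characters: {context['characters']}"
    )
    return context

script = write_script("Two rival baristas are forced to co-host a radio show")
```

In a real setup each stage would also be a separate, purpose-built prompt, and the user would edit the intermediate outputs before they flow downstream; that editing step is where the actual creativity comes from.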
Instinctively I'd say you have to treat the model like your own junior writer, and how do you make a junior writer useful? By teaching them to "yes, and" in a writing room with better writers (in this case, the user). In that context, with a good, experienced user at the helm, it can definitely bring value. Nothing groundbreaking, but I can see how a refined version of this could help, notably with consistency, story beats, pacing: the boring stuff. GPTs are better critics than they are writers anyway.
That being said, I never really pursued "pure comedy" with LLMs, as it sounds like a lost battle. In my mind it's kind of like tickling: if a machine pokes your ribs you don't get the tickles; that only works when a human does it. I doubt they can fix that in the short or medium term.