this post was submitted on 16 Mar 2024

I want to compile a docx file into a Typst file. I believe that deep down docx is XML, and Typst is close to Markdown with some interesting extra functionality. Is that feasible? Note that Typst does have syntax to define functions and call them, and I want to create special functions during the code generation step. Is ANTLR the right tool for the job? Are there better tools? I want to have as few bugs as possible.

top 8 comments
[–] [email protected] 1 points 7 months ago

Since the source is XML, XSLT may work to transform it.

[–] [email protected] 5 points 7 months ago

ANTLR is for writing parsers. You don't need a new custom parser; just use an existing XML parser.

[–] [email protected] 3 points 8 months ago* (last edited 8 months ago)

I don't know anything about Typst, but I do know that .docx files are really just a zip file containing a folder structure with a bunch of xml (and a few other) files. I've written a few find/replace docx scripts in bash utilizing this information.
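To illustrate the zip-plus-XML point, here is a minimal Python sketch using only the standard library. It assumes the main body lives at `word/document.xml`, which is where Word normally puts it, and that paragraphs and text runs use the usual WordprocessingML `w:p` and `w:t` elements. The names `docx_paragraphs` and `make_minimal_docx` are made up for this example, and it ignores styles, tables, and everything else a real document contains.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# WordprocessingML namespace used by the w: prefix in word/document.xml.
W_NS = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"


def docx_paragraphs(docx_file):
    """Return the plain text of each paragraph (accepts a path or file-like object)."""
    with zipfile.ZipFile(docx_file) as archive:
        # Assumption: the main document part is stored at this conventional path.
        root = ET.fromstring(archive.read("word/document.xml"))
    paragraphs = []
    for para in root.iter(f"{{{W_NS}}}p"):
        # A paragraph's text is split across runs; join every w:t inside it.
        text = "".join(t.text or "" for t in para.iter(f"{{{W_NS}}}t"))
        paragraphs.append(text)
    return paragraphs


def make_minimal_docx(lines):
    """Build a tiny in-memory archive for testing (not a complete, valid docx)."""
    body = "".join(f"<w:p><w:r><w:t>{line}</w:t></w:r></w:p>" for line in lines)
    xml = f'<w:document xmlns:w="{W_NS}"><w:body>{body}</w:body></w:document>'
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("word/document.xml", xml)
    buffer.seek(0)
    return buffer


if __name__ == "__main__":
    print(docx_paragraphs(make_minimal_docx(["Hello", "World"])))
```

With a real file you would call `docx_paragraphs("some-file.docx")` instead of the in-memory helper.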

[–] [email protected] 7 points 8 months ago (1 children)

Antlr sounds excessive for either of those. Use an ordinary xml library for docx (if there's not already one for docx) and something simple for typst.
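As a rough idea of what "something simple" for the Typst side could look like, the Python sketch below just builds strings. Everything in it is an assumption for illustration: the `(style, text)` input pairs, the `callout` function, and the set of escaped characters, which is deliberately small and not a complete list of Typst's special syntax (line-start markers such as `=` or `-` are not handled).

```python
# Characters escaped with a backslash in Typst markup (illustrative, not exhaustive).
TYPST_SPECIAL = set("\\#*_`$<>@[]")


def escape_typst(text):
    """Backslash-escape characters that would otherwise be read as Typst markup."""
    return "".join("\\" + ch if ch in TYPST_SPECIAL else ch for ch in text)


def emit_call(func_name, text):
    """Emit a Typst function call with a content-block argument, e.g. #callout[...]."""
    return f"#{func_name}[{escape_typst(text)}]"


def emit_document(paragraphs):
    """Turn (style, text) pairs into Typst source; 'callout' is a hypothetical style."""
    lines = [
        # A custom function defined once at the top of the generated file.
        "#let callout(body) = block(inset: 8pt, stroke: 0.5pt, body)",
        "",
    ]
    for style, text in paragraphs:
        if style == "callout":
            lines.append(emit_call("callout", text))
        else:
            lines.append(escape_typst(text))
        lines.append("")  # blank line separates paragraphs in Typst markup
    return "\n".join(lines)


if __name__ == "__main__":
    print(emit_document([("normal", "Plain text"), ("callout", "Watch out for #tags")]))
```

Keeping the generation step as plain functions over already-parsed data means no grammar or parser generator is involved on the output side.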

[–] [email protected] 2 points 8 months ago (1 children)

I want to compile the docx INTO a Typst file, not write a separate parser for each.

[–] [email protected] 3 points 8 months ago (1 children)

Oh, ok, antlr would be inappropriate then. I'd check whether pandoc already does that conversion.

[–] [email protected] 2 points 8 months ago* (last edited 8 months ago)

I just checked: it does convert to Typst, but I do want to write custom stuff alongside what pandoc will output. That seems like the right tool and saves me a lot of effort, thanks.
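One way this could be wired together, sketched in Python: let pandoc do the docx-to-Typst conversion and put hand-written Typst definitions in front of its output. This assumes a pandoc version with the Typst writer is installed and on the `PATH`; the function names and the `note` definition are hypothetical, and the example does not show how to make pandoc's output actually call your custom functions.

```python
import subprocess


def pandoc_command(docx_path):
    """Build the pandoc invocation; with no output file, pandoc writes to stdout."""
    return ["pandoc", str(docx_path), "--from", "docx", "--to", "typst"]


def with_preamble(typst_body, preamble):
    """Put custom Typst definitions in front of the converted body."""
    if not preamble:
        return typst_body
    return preamble.rstrip("\n") + "\n\n" + typst_body


def docx_to_typst(docx_path, preamble=""):
    """Run pandoc and return Typst source with the custom preamble prepended."""
    result = subprocess.run(
        pandoc_command(docx_path),
        capture_output=True,
        encoding="utf-8",
        check=True,  # raise CalledProcessError if pandoc fails
    )
    return with_preamble(result.stdout, preamble)


if __name__ == "__main__":
    # Hypothetical custom function made available to the generated document.
    custom = "#let note(body) = emph(body)\n"
    print(with_preamble("Converted text goes here.\n", custom))
```

Calling `docx_to_typst("input.docx", custom)` would run the real conversion; the `__main__` block only demonstrates the preamble step so it works without pandoc.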

[–] [email protected] 5 points 8 months ago

I'm not sure what the best approach would be, but for reading docx you might be better off using something like Apache POI. Docx may be XML, but IMO it's an absolute abuse of XML. POI shields you a little bit from all the nonsense happening in docx. I could see ANTLR working for Typst, since there's probably not another interface for it.

I don't think it'll support it, but you could also check if this can be done with pandoc.