Recursive macros in C, demystified (once the ugly crying stops)

by eatonphil on 11/6/2025, 1:09 AM with 82 comments

by chirsz on 11/6/2025, 3:27 AM

The behavior of C macros is actually described by a piece of pseudocode from Dave Prosser and it is not in the standard:

* https://www.spinellis.gr/blog/20060626/

* https://www.spinellis.gr/pubs/jrnl/2006-DDJ-Finessing/html/S...

* https://gcc.gnu.org/legacy-ml/gcc-prs/2001-q1/msg00495.html

by kragen on 11/6/2025, 11:22 AM

I think the C preprocessor was designed after the GPM clone m6, its successor m4, and Ratfor, so I suspect the difficulty in doing things like this is intentional. I guess I should ask McIlroy, who is responsible for pushing m4 to its absolute limits and was present when the C preprocessor was being designed: https://www.cs.dartmouth.edu/~doug/barem4.m4

    _        Pure macros as a programming language
    _
    _ m4 is Turing complete even when stripped to the bare minimum
    _ of one builtin: `define'. This is not news; Christopher
    _ Strachey demonstrated it in his ancestral GPM, described in
    _ "A general- purpose macrogenerator", The Computer Journal 8
    _ (1965) 225-241.
    _
    _ This m4 program more fully illustrates universality by
    _ building familiar programming capabilities: unlimited
    _ precision integer arithmetic, boolean algebra, conditional
    _ execution, case-switching, and some higher-level operators
    _ from functional programming. In support of these normal
    _ facilities, however, the program exploits some unusual
    _ programming idioms:
    _ 
    _ 1. Case-switching via macro names constructed on the fly.
    _ 2. Equality testing by redefining macros.
    _ 3. Representing data structures by nested parenthesized lists.
    _ 4. Using macros as associative memory.
    _ 5. Inserting nested parameter symbols on the fly.
    _
    _ Idioms 2 and 5 are "reflective": the program writes code
    _ for itself.
It's very easy to get into enormous amounts of trouble in m4, m6, or GPM. The C preprocessor is not without its problems, but it is rare that I have difficulty in understanding why a given gcc -E invocation produces the output it does.

by dandersch on 11/6/2025, 3:08 AM

Related: The Preprocessor Iceberg https://jadlevesque.github.io/PPMP-Iceberg/

There you can find a recursive macro expansion implementation (as a gcc hack) that fits on a slide:

  #2""3
  
  #define PRAGMA(...) _Pragma(#__VA_ARGS__)
  #define REVIVE(m) PRAGMA(push_macro(#m))PRAGMA(pop_macro(#m))
  #define DEC(n,...) (__VA_ARGS__)
  #define FX(f,x) REVIVE(FX) f x
  #define HOW_MANY_ARGS(...) REVIVE(HOW_MANY_ARGS) \
      __VA_OPT__(+1 FX(HOW_MANY_ARGS, DEC(__VA_ARGS__)))
  
  int main () {
      printf("%i", HOW_MANY_ARGS(1,2,3,4,5)); // 5
  }
It sounds like the one in the article works for more compilers, but there doesn't seem to be a copy-pasteable example anywhere to check for myself. Also, the "Our GitHub Org" link on the site just links to github.com.

by camel-cdr on 11/6/2025, 6:55 AM

I did the first week of AOC22 in the C preprocessor: https://github.com/camel-cdr/boline/tree/main/aoc22

by SAI_Peregrinus on 11/6/2025, 3:01 PM

One can also (ab)use the build system to run arbitrary preprocessing steps with any language over the "C" input. You can have recursive macros by using M4 or Perl or Python or some other language to expand them, converting your "foo.c.in" into a "foo.c" to hand off to the C preprocessor & compiler. It still feels dirty, but it's often much easier to understand & debug.

by fuhsnn on 11/6/2025, 4:02 AM

I wonder if the author is aware of the __VA_TAIL__ proposal[1]. It covered similar ground and was, IMO, very well thought out, but unfortunately it was not accepted into C2Y (judging from committee meeting minutes).

[1] https://www.open-std.org/jtc1/sc22/wg14/www/docs/n3307.htm

by 0x69420 on 11/6/2025, 12:36 PM

genuinely remarkable, the altogether perhaps even productive mischief you can get up to, especially with `__VA_OPT__` becoming a proper standard in both C and C++ so you don't have to feel dirty about using it.

i recently made use of plenty of ugly tricks in this vein to take a single authoritative table of macro invocations that defined a bunch of pixel formats, and make them graduate from defining bitfield structs to classes with accessors that performed good old fashioned shifts and masks, all without ever specifying the individual bit offsets of channels, just their individual widths, and macro magic did the rest. no templates, no actual c++, could just as feasibly produce pure c bindings down the line by just changing a few names.

getting really into this stuff makes you stop thinking of c function-like macros as functions of their arguments as such, but rather unary functions of argument lists, where arity roughly becomes the one notion vaguely akin to typing in the whole enterprise, or at least the one place where the compiler exhibits behaviour resembling that of a type checker. this was especially true considering the entries in the table i wound up with were variadic, terminating in variably many (name, width) parenthesised tuples. and i just... had the means to "uncons" them so to speak. fun stuff.
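
to make the "uncons" idea concrete, here is a minimal sketch. it is not the code described above: HEAD, TAIL, APPLY, TUPLE_WIDTH and the RGB565_CHANNELS list are all hypothetical names chosen for illustration, and it assumes a conforming preprocessor such as the one in gcc or clang.

```c
#include <assert.h>

/* Hypothetical argument list: variably many (name, width) tuples. */
#define RGB565_CHANNELS (r, 5), (g, 6), (b, 5)

/* Split a list into its first element and the rest. The extra layer
   (HEAD -> HEAD_) lets an argument such as RGB565_CHANNELS or TAIL(...)
   expand into several arguments before it is split. */
#define HEAD_(first, ...) first
#define TAIL_(first, ...) __VA_ARGS__
#define HEAD(...) HEAD_(__VA_ARGS__)
#define TAIL(...) TAIL_(__VA_ARGS__)

/* Take one tuple apart. APPLY(M, (a, b)) rescans as M (a, b). */
#define APPLY(macro, tuple) macro tuple
#define TUPLE_WIDTH(name, width) width

enum {
    RED_WIDTH   = APPLY(TUPLE_WIDTH, HEAD(RGB565_CHANNELS)),
    GREEN_WIDTH = APPLY(TUPLE_WIDTH, HEAD(TAIL(RGB565_CHANNELS))),
    GREEN_SHIFT = RED_WIDTH  /* offsets fall out of the widths */
};

/* Returns the bit offset of the green channel, derived from widths only. */
static int channel_demo(void) {
    assert(RED_WIDTH == 5 && GREEN_WIDTH == 6);
    return GREEN_SHIFT;
}
```

the indirection through HEAD_ and TAIL_ is the part that tends to trip people up: without it, HEAD(TAIL(...)) would see a single unexpanded argument and hand the whole thing back.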

this is worth it, imo, in precisely one context, which is: you want a single source of truth that defines fiddly but formulaic implementations spread across multiple files that must remain coordinated, and this is something you do infrequently enough that you don't consider it worthwhile introducing "real" "big boy" code gen into your build process. mind, you usually do end up having to commit to a little utility header that defines convenient macros (_Ex and such in the article), but hey. c'est la vie. basically x macros (https://en.wikipedia.org/wiki/X_macro) on heart attack quantities of steroids.
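
for anyone who has not met x macros, a minimal sketch of the single-source-of-truth pattern follows. the PIXEL_FORMATS table and every name in it are made up for illustration, and it is far simpler than the bitfield/accessor setup described above.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical single source of truth: X(name, bits_per_pixel). */
#define PIXEL_FORMATS(X) \
    X(RGB565,   16)      \
    X(RGB888,   24)      \
    X(RGBA8888, 32)

/* Expansion 1: an enum with one constant per row. */
#define AS_ENUM(name, bits) FMT_##name,
enum pixel_format { PIXEL_FORMATS(AS_ENUM) FMT_COUNT };

/* Expansion 2: a parallel table of printable names. */
#define AS_NAME(name, bits) #name,
static const char *const format_names[] = { PIXEL_FORMATS(AS_NAME) };

/* Expansion 3: a parallel table of widths. */
#define AS_BITS(name, bits) bits,
static const int format_bits[] = { PIXEL_FORMATS(AS_BITS) };

/* Look up a format's width by name; returns -1 if the name is unknown. */
static int bits_for(const char *name) {
    for (int i = 0; i < FMT_COUNT; i++)
        if (strcmp(format_names[i], name) == 0)
            return format_bits[i];
    return -1;
}
```

adding a row to PIXEL_FORMATS updates the enum and both tables together, which is the whole point.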

by procaryote on 11/6/2025, 7:29 AM

In many ways being limited ends up being a feature. Even limited as it is, you get some crimes against humanity like the Bourne shell source, but at least most people agree it is a bad idea.

If it allowed more unlimited metaprogramming, building big complex things as macros might well have become popular.

by pjsg on 11/6/2025, 3:46 AM

I wept when the author mentioned implementing SHA256 in macros.

by jhallenworld on 11/6/2025, 7:36 PM

The lack of (easy) recursion in CPP is so frustrating because it was always available in assembly languages with even very old and very simple macro assemblers, with the caveat that the recursion depth was often very limited and there was no tail call elimination. For example, if you need to fill memory:

    ; Fill memory with backward sequence
    macro fill n
        word n
        if n != 0
            fill n - 1
        endif
    endm

    So "fill 3" expands to:
        word 3
        word 2
        word 1
        word 0
There is no way this was not known about when C was created. They must have been burned by recursive macro abuse and banned it (perhaps from m4 experience as others have said).
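
For contrast, here is a small sketch of what the C preprocessor does with the analogous self-reference. The names depth and probe are made up for illustration. While a macro is being expanded, its own name is left alone on the rescan, so the inner token survives as an ordinary function call instead of recursing:

```c
#include <assert.h>

/* An ordinary function that happens to share the macro's name. */
static int depth(int n) { return n; }

/* While depth(...) is being expanded, the inner `depth` token is not
   expanded again; it is left as a call to the function above. */
#define depth(n) (1 + depth(n))

/* Expands exactly once, to (1 + depth(0)), which calls the function. */
static int probe(void) {
    int result = depth(0);
    assert(result == 1);
    return result;
}
```

So the expansion stops after one step, where the assembler version would keep going until n reaches 0.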

The other assembly language feature that I missed is the ability to switch sections. This is useful for building tables in a distributed fashion. Luckily you can do it with gcc.
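
A rough sketch of the gcc version, under loud assumptions: it relies on the GNU `section` and `used` attributes, on an ELF target, and on a linker (GNU ld or lld) that synthesizes `__start_<name>` and `__stop_<name>` symbols for sections whose names are valid C identifiers. REGISTER_NAME, name_table and the entries are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* Each registration drops one pointer into a custom section. In a real
   project these lines could be scattered across many source files. */
#define REGISTER_NAME(ident, text)                     \
    static const char *const ident                     \
        __attribute__((used, section("name_table"))) = text

REGISTER_NAME(first_entry,  "alpha");
REGISTER_NAME(second_entry, "beta");
REGISTER_NAME(third_entry,  "gamma");

/* Assumption: provided by the linker, not defined anywhere in C. */
extern const char *const __start_name_table[];
extern const char *const __stop_name_table[];

/* Number of entries the linker gathered into the section. */
static size_t name_table_count(void) {
    return (size_t)(__stop_name_table - __start_name_table);
}
```

Entries are kept pointer-sized on purpose, so the compiler has no reason to pad between them; the order of entries across files is up to the linker and should not be relied on.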

by bluGill on 11/6/2025, 6:41 PM

I've read the article 4 times already today and I'm still crying. This looks like the solution to a problem I'm having (C++, but I'm doing things that templates and constexpr can't do), but trying to get it all to work is painful. Kudos to the author for making an attempt to explain it.

by Joker_vD on 11/6/2025, 1:50 PM

    #define _H4X0R_CONVERT_ONE(arg)                  \
        ((union { unsigned long long u; void *v; }){ \
            .u = (unsigned long long)arg,           \
    }).v
Couldn't this be just

    #define _H4X0R_CONVERT_ONE(arg) (void*)(uintptr_t)(arg)
?

Also, thanks, now I can finally use

    void my_printf(const char *fmt, void* args[], size_t argc);
    
ergonomically:

    #define my_printf(fmt, ...) (my_printf)((fmt), \
        (void*[]){ H4X0R_VA_VOID_STAR_CONVERT(__VA_ARGS__) }, \
        H4X0R_VA_COUNT(__VA_ARGS__))

    int main(int argc, char **argv) {
        my_printf("int: %d, ptr: %p, str: %s, missing: %d\n", 42, argv, "Hello world!");
    }

    $ gcc test.c && ./a.out
    int: 42, ptr: 0x7FFF46AA3E78, str: Hello world!, missing: %!d(MISSING)
Funnily enough, the difference between passing ... and locally-allocated void*[] is basically who has to spill the data to the stack, the caller or the called function.

by imglorp on 11/7/2025, 12:32 AM

> C has many advantages that have led to its longevity (60 years as perhaps the most important language).

53 years by my count. Did something relevant happen in 1960? Maybe author is alluding to B?

by WalterBright on 11/6/2025, 4:39 AM

Imagine trying to implement the C preprocessor. I had to write it from scratch 3 times before it worked 100%.

by tester756 on 11/6/2025, 4:05 PM

Macros are among the ugliest features available in langs like C/CPP

by winocm on 11/6/2025, 6:57 AM

Mildly related, sort of, one can prevent expansion of variadic macros as follows:

   #define printf(...)

   int (printf)(const char *, ...);
I keep on seeing many random code bases just resort to #undef instead...
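
A small sketch of why that works, with made-up names (twice, call_macro, call_function): a function-like macro, variadic or not, is only expanded when its name is directly followed by an opening parenthesis, so wrapping the name in parentheses reaches the real function.

```c
#include <assert.h>

/* A function and a function-like macro sharing one name. */
static int twice(int x) { return 2 * x; }
#define twice(x) (-1)   /* deliberately wrong, so the two are easy to tell apart */

/* `twice(` is a macro invocation: the macro expands. */
static int call_macro(void) { return twice(21); }

/* `(twice)(` is not: the next token after the name is `)`, so the
   macro is skipped and the function is called. */
static int call_function(void) {
    int result = (twice)(21);
    assert(result == 42);
    return result;
}
```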

by parados on 11/6/2025, 10:53 AM

A C preprocessor implemented in Python: https://github.com/paulross/cpip

by stevefan1999 on 11/6/2025, 3:20 AM

I was writing a preprocessor until I noticed this kind of thing... I stopped writing it after that.

by hyperhello on 11/6/2025, 3:07 AM

Can I use this technique to expand MACRO(a,b,c,…) into something like F(a,b,c…); G(a,b,c…)?

by russfink on 11/6/2025, 3:42 AM

Is this a DoS risk - code that sends your build chain into an infinite loop?

by MangoToupe on 11/6/2025, 7:21 AM

The c pre processor. C doesn't have macros. It's fucking miserable. Anyone who uses it is a masochist