This is quite informative for an introduction to one particular type of IR. It's better than some others because it actually does mention that there are alternatives in some places. And since what people use an IR for varies a lot from program to program, there is no alternative that is always best (there are trends, but even then you should be aware of the trade-offs).
Since this post talks a lot about SSA, it might be worth mentioning the thing that probably confuses people the most: how do you know if a variable is valid in a BB. It can be informative (but memory-inefficient, and probably optimization-inefficient) to require all incoming variables to go through phi nodes. Also think about how destructors and outward `goto/break` affect dominance in strange ways.
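For concreteness, here's a rough Python sketch (toy CFG, made-up block names, not any real compiler's API) of the usual rule: a non-phi use of an SSA value is only valid in a block that the value's defining block dominates, and phi/block-argument inputs are instead checked against the corresponding predecessor edge.

    # Toy CFG: block name -> predecessor block names.
    preds = {
        "entry": [],
        "then":  ["entry"],
        "else":  ["entry"],
        "join":  ["then", "else"],
    }

    def dominators(preds, entry="entry"):
        """Classic iterative computation: dom(b) = {b} union (intersection of dom(p) over preds p)."""
        blocks = list(preds)
        dom = {b: set(blocks) for b in blocks}
        dom[entry] = {entry}
        changed = True
        while changed:
            changed = False
            for b in blocks:
                if b == entry:
                    continue
                new = set(blocks)
                for p in preds[b]:
                    new &= dom[p]
                new |= {b}
                if new != dom[b]:
                    dom[b], changed = new, True
        return dom

    def can_use(def_block, use_block, dom):
        """A non-phi use is valid only if the defining block dominates the using block."""
        return def_block in dom[use_block]

    dom = dominators(preds)
    print(can_use("entry", "join", dom))  # True
    print(can_use("then", "join", dom))   # False: a value defined in 'then' must reach 'join'
                                          # through a phi / block argument instead

Requiring every incoming value to be funneled through a phi (or block argument) makes the validity question trivial, at the cost of many extra nodes - which is exactly the trade-off above.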
The second understated thing about SSA is: exactly how much additional data do you store for exactly what period of time. "Don't store the extra data" means many passes will have to recompute it, but "do store the extra data" makes it costlier when passes have to invalidate it. Also make sure you don't run out of RAM!
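That trade-off is basically cache invalidation. A toy sketch (Python, names made up) of the usual compromise - keep expensive analyses around between passes, and drop them when a pass says it changed the IR:

    class AnalysisCache:
        """Cache expensive analyses (dominators, loop info, ...) between passes,
        and throw them away when a pass reports that it changed the IR."""
        def __init__(self, func):
            self.func = func
            self._results = {}                  # analysis name -> cached result

        def get(self, name, compute):
            if name not in self._results:
                self._results[name] = compute(self.func)   # lazy recompute on a miss
            return self._results[name]

        def invalidate_all(self):
            self._results.clear()

    def run_pipeline(func, passes):
        cache = AnalysisCache(func)
        for p in passes:
            changed = p(func, cache)            # a pass may consult cached analyses via `cache`
            if changed:
                cache.invalidate_all()          # conservative: recompute rather than risk stale data
        return func

Real pass managers do finer-grained preservation than "drop everything", but the memory-vs-recompute tension is the same.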
My own post (more focused on interpreters, but not exclusively so) probably talks more about alternative implementation choices (though I do say "don't do that" for strictly-worse ones, and there are some I still haven't gone into detail about) in many compiler stages than any other source I've seen:
https://gist.github.com/o11c/6b08643335388bbab0228db763f9921...
The main thing about the linked post that annoys me is that it falls into the trap of declaring a dichotomy between "stack-based" and "register-based", when in fact there are so many options that the difference within each of those named categories can be bigger than the difference between them.
pizlonator 4 hours ago
What I talk about when I talk about IRs is the principle of isolated ugliness.
A good IR means that the only contract between compiler passes is the IR itself. This means that individual passes might get as ugly as they need to get without affecting other passes. This then means that lots of different folks can write lots of different passes, and then you can glue them together somehow in a pass pipeline and it all works.
LLVM is a great example of this. You can be productive at writing LLVM passes without understanding any of the other passes in LLVM. Maybe you’ll come to understand some of them just because it’s informative and there’s cool stuff to learn. But even then you don’t have to understand the other passes that deeply. Writing a cool new pass that obeys the IR’s laws is not going to break the other passes. You’ll only break the other passes by breaking IR law.
So what makes a good IR is, more than any other single thing, whether that IR has easy-to-understand laws and whether those laws are really followed. This allows you to isolate ugliness. Which allows you to have a large team writing lots of passes. And then your compiler can drink the tears of your defeated competitors.
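Concretely, the contract looks something like this (a toy Python sketch with a made-up module format, not LLVM's actual pass manager): a pass only ever sees the IR, and a verifier enforces the laws between passes, so a law-breaking pass is caught at the scene of the crime rather than blamed on a later pass.

    def verify(module):
        """The IR's laws, enforced: every operand must be defined before it is used."""
        for func in module["functions"]:
            defined = set(func["params"])
            for inst in func["body"]:
                for operand in inst["operands"]:
                    if operand not in defined:
                        raise ValueError(f"IR law broken: use of {operand!r} before definition")
                defined.add(inst["result"])

    def run(module, passes):
        for p in passes:
            module = p(module)   # the IR is the only thing a pass receives or returns
            verify(module)       # breakage surfaces here, immediately after the guilty pass
        return module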
almostgotcaught 4 hours ago
> A good IR means that the only contract between compiler passes is the IR itself.
this is like impossible (or at least np-hard). phase ordering exists and will always exist - SLP vectorizer will dramatically affect passes after it and sometimes even passes before it!
> Writing a cool new pass that obeys the IR’s laws is not going to break the other passes.
I don't really get how you can fail to obey the IR's laws other than just emitting unparseable IR? alternatively if you like have uses without defs or flip-flopping types, the verifier needs to catch that after the guilty pass and just bail (not put the onus on the subsequent passes to fix).
Anyway IMHO (even though no one asked me), IR needs to have enough structure/info that you can quickly reproduce whatever metadata you need for XYZ arbitrary analysis. Meaning it should represent all of the useful/necessary things (don't make me infer non-negativity structurally if I can just annotate) and it should propagate those things across rewrites.
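To make the annotation point concrete, a toy sketch (Python, made-up names, not MLIR's attribute API): the non-negativity fact lives on the value itself, and the rewrite carries it over instead of making some later analysis re-derive it structurally.

    class Value:
        def __init__(self, name, nonneg=False):
            self.name = name
            self.nonneg = nonneg                # an explicit, annotated fact - not inferred from shape

    def rewrite_mul2_to_shl(inst):
        """x * 2  ==>  x << 1, carrying the non-negativity annotation over to the new result."""
        if inst["op"] == "mul" and inst["rhs"] == 2:
            old = inst["result"]
            new = Value(old.name, nonneg=old.nonneg)    # the rewrite propagates the fact
            return {"op": "shl", "lhs": inst["lhs"], "rhs": 1, "result": new}
        return inst

    x = Value("x", nonneg=True)                 # annotated at creation, e.g. from a zero-extended load
    mul = {"op": "mul", "lhs": x, "rhs": 2, "result": Value("t0", nonneg=True)}
    shl = rewrite_mul2_to_shl(mul)
    print(shl["op"], shl["result"].nonneg)      # shl True - no analysis had to re-derive it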
pizlonator 4 hours ago
> this is like impossible (or at least np-hard). phase ordering exists and will always exist - SLP vectorizer will dramatically affect passes after it and sometimes even passes before it!
It's not impossible.
SLP vectorizer will affect the performance upside of passes before/after it. It won't affect their correctness.
> I don't really get how you can fail to obey the IR's laws other than just emitting unparseable IR? alternatively if you like have uses without defs or flip-flopping types, the verifier needs to catch that after the guilty pass and just bail (not put the onus on the subsequent passes to fix).
Lots of compilers - not LLVM IR - have a totally different kind of phase ordering problem than what you are thinking of: if you reorder the phases then the compiler crashes and burns. Some of these compilers are in production, sadly. I've heard some crazy horror stories. And it's all too easy to find yourself in that situation if you're not careful.
LLVM IR is an example of being careful.
almostgotcaught 4 hours ago
> SLP vectorizer will affect the performance upside of passes before/after it. It won't affect their correctness.
maybe. it's late so i can't think of a counter-example off the top of my head where someone downstream of SLP somehow took something for granted and then it blew up in their faces. even if it is somehow magically not true for SLP, it's definitely true for other sequences of passes, where subsequent passes depend on prior passes.
> if you reorder the phases then the compiler crashes and burns. Some of these compilers are in production, sadly. I've heard some crazy horror stories. And it's all too easy to find yourself in that situation if you're not careful.
i mean i have witnessed this in a prod MLIR compiler i have personally worked on - it's exactly what i had in mind above. did it crash because it tried to deref some pointer to some attribute/node/whatever? no, that's true - it crashed via llvm::report_fatal_error (or assert) - but it crashed nonetheless.
jcranmer 3 hours ago
Many passes in LLVM tend to assume that the IR is in some sort of poorly-defined canonical form. For example, a loop transformation might only bother to handle loops where the latch and exiting block are one and the same, or assume the existence of a dedicated loop preheader. In principle, this dependence on canonical form should only mean that the output is less optimal when the input isn't canonical, but compilers are big, complex things, so it should be no surprise that sometimes noncanonical IR ends up causing crashes or miscompiles.
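The benign version of that dependence looks something like this (rough Python sketch, made-up loop representation, hypothetical hoist_invariants helper): require the canonical shape, and skip the loop rather than transform it when the shape isn't there.

    def find_preheader(loop, cfg_preds):
        """A dedicated preheader is the unique out-of-loop predecessor of the header."""
        outside = [p for p in cfg_preds[loop["header"]] if p not in loop["blocks"]]
        return outside[0] if len(outside) == 1 else None

    def hoist_invariants(loop, preheader):      # placeholder for the real transformation
        pass

    def licm_like_pass(loops, cfg_preds):
        for loop in loops:
            preheader = find_preheader(loop, cfg_preds)
            canonical = preheader is not None and loop["latch"] == loop["exiting"]
            if not canonical:
                continue                        # non-canonical input: merely less optimal, never unsound
            hoist_invariants(loop, preheader)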
pizlonator 3 hours ago
Do you remember a case where it did cause a crash or a miscompile, and it wasn't also a case where that same crash/miscompile could have happened under default/optimal phase order?
I can easily imagine that reordering phases increases overall likelihood of surfacing bugs that are there anyway.
I have a harder time imagining a dependence on canonical form that leads to a crash under phase reordering but is totally sound otherwise.
jcranmer 3 hours ago
Offhand, I don't have any cases.
The cases where they're most likely to occur are in the codegen pipeline, but codegen is already acknowledged to be a situation where you have non-optional passes that have to occur in a particular order, so I suspect you're already discounting those scenarios.
pizlonator 3 hours ago
> it's definitely true for other sequences of passes, where subsequent passes depend on prior passes.
I don’t think it’s true for optimization passes in llvm and that’s what makes llvm kinda great.
I would love to hear of a counter example.
But say that you found a counterexample. That would still not be as bad as what happens in a lot of compilers out there where being able to reorder passes (or even run them a second time) is the exception rather than the norm, and our debate would be about whether there exists even a single pair of passes that could be safely reordered.
almostgotcaught 3 hours ago
Sure - I agree that LLVM is mostly reasonable - I'm just saying I don't think it has to do with the design of the IR itself; it's primarily because of convention/code quality/stewardship. Like, to ask my question more precisely: can you give me an example of an IR design that didn't "isolate ugliness"?
Edit: I'm being rate limited so here's what I wanted to post in response:
You kept talking about LLVM and I just assumed LLVM IR, but this thing (different rules before and after regalloc) is exactly the case for LLVM's MIR (even going so far as no longer being SSA after a certain part of the pipeline). And in fact I have a PR hanging out right now adding a legalization for GISel (which also operates on MIR) that crashes in exactly the way you say it shouldn't - I run my legalizer and it's fine and dandy, and then somewhere down the line (another legalizer) there's a null dereference. So I guess LLVM isn't reasonable everywhere! And I should also think some more about the differences between LLVM IR and MIR.
pizlonator 3 hours ago
In my experience with compilers where phase ordering causes fires, it's exactly to do with the IR.
Because I don't want to pick on others (as much fun as that would be), I'll just pick on compilers I wrote that have this property and how much I regret everything I did there.
- The first compiler IR I designed was called Fiji C1, and it had both an SSA form and a not-SSA form, and it had multiple stages of lowering within the same IR. You absolutely could not breathe on the phase order without destroying everything.
- I'm maybe like mostly responsible for JavaScriptCore's DFG IR, which has a LoadStore form, a ThreadedCPS form, and an SSA form, plus maybe other forms too. And there are many phases in there that serve the purpose of carefully lowering some aspect of type speculation. Foolish is the man who tries to mess with the order of those phases.
- The Assembly IR in JavaScriptCore's top tier JIT has different laws before and after regalloc, and before and after stack alloc. That's maybe inevitable and like not too bad since it also has a small number of phases, but still.
Note that in each of these cases, it absolutely is an IR issue. It's not like there were some rules that I failed to follow. It's that I designed the compiler specifically to have an IR that follows different laws in different parts of the pipeline.
o11c 3 hours ago
The "different laws" thing really should be handled by changing the types of all the nodes around the critical passes. Unfortunately, every language I know sucks at typing if you want to avoid all the copying that the naive approach gives you. (Remember especially that some types only exist during certain phases. The most basic example: which token types are valid in a parse tree and which aren't? E.g. binary operator tokens usually remain valid, but parentheses disappear, whereas many new types appear. For later phases this only gets more complicated.)
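A sketch of what I mean in Python's type syntax (toy node types): Paren only appears in the lexer's output union, so a later pass that takes an Expr can't even be handed one - but you pay for the separate per-phase types in copying or in gnarly generics, as noted.

    from dataclasses import dataclass
    from typing import Union

    # Token types: what the lexer produces.
    @dataclass
    class Num:
        value: int

    @dataclass
    class BinOp:
        op: str

    @dataclass
    class Paren:
        open: bool

    Token = Union[Num, BinOp, Paren]        # parentheses exist only at this phase

    # Tree types: what the parser produces. No Paren - grouping is now structural.
    @dataclass
    class NumExpr:
        value: int

    @dataclass
    class BinExpr:
        op: str
        lhs: "Expr"
        rhs: "Expr"

    Expr = Union[NumExpr, BinExpr]

    def fold_constants(e: Expr) -> Expr:
        """A later pass: it cannot even be handed a Paren, the type rules it out."""
        if isinstance(e, BinExpr) and e.op == "+" \
                and isinstance(e.lhs, NumExpr) and isinstance(e.rhs, NumExpr):
            return NumExpr(e.lhs.value + e.rhs.value)
        return e

    print(fold_constants(BinExpr("+", NumExpr(1), NumExpr(2))))   # NumExpr(value=3)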
dzaima 3 hours ago
As a fun note, here's a talk about automatically reordering LLVM passes to optimize for code size, which seemingly worked well enough to make a talk/paper about: https://www.youtube.com/watch?v=_SqWd74zG2Y
mhh__ 5 hours ago
I'm curious what the story is with (if I have it right) V8 dropping sea of nodes.
Is it an argument against sea of nodes, or just that it isn't pulling its weight for them at the moment?
pizlonator 4 hours ago
My take: sea of nodes has no objective benefits over CFG/SSA. Like, none whatsoever. It’s all feels. And it has lots of objective downsides, which are summarized quite nicely in the V8 folks’ post about dropping it.
Let me dig into the "no objective benefits" bit: there does not exist a compiler transformation or analysis that would be easier to write in the sea of nodes representation. Not even one. Contrast this with SSA versus the alternatives, where SSA saves you hella lines of code that you would otherwise be writing to track use-def chains.
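To spell out the use-def point with a toy Python sketch (not any real IR's API): in SSA every value has exactly one def, so a use list hanging off the value is the whole use-def structure and replace-all-uses-with is a few lines; without SSA you'd be maintaining reaching-definition sets just to answer the same question.

    class Value:
        """An SSA value: exactly one definition, plus the list of instructions that use it."""
        def __init__(self, name):
            self.name = name
            self.uses = []

    class Inst:
        def __init__(self, op, operands, result_name):
            self.op = op
            self.operands = list(operands)
            self.result = Value(result_name)
            for v in operands:
                v.uses.append(self)             # def-use edges maintained as the IR is built

    def replace_all_uses_with(old, new):
        """The workhorse transformation that SSA makes trivial."""
        for inst in old.uses:
            inst.operands = [new if o is old else o for o in inst.operands]
            new.uses.append(inst)
        old.uses = []

    a, b, c = Value("a"), Value("b"), Value("c")
    add = Inst("add", [a, b], "t0")
    replace_all_uses_with(a, c)
    print([o.name for o in add.operands])       # ['c', 'b']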
almostgotcaught 4 hours ago
> If your static analysis is over SSA, then generally the static analysis is easier and (potentially) storing facts is easier. This is due to this property called sparseness. Where a static analysis over non-SSA programs has to store facts about all variables at all program points, an analysis over SSA need only store facts about all variables, independent of context.
Not sure what this has to do with SSA or not? Maybe you're saying if you don't have SSA then value numbering requires extra book-keeping to keep track of the names? that's true but the book-keeping only kicks when a name is updated, not at every program point.
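To be concrete about the book-keeping I mean, here's local value numbering over straight-line non-SSA code (toy Python): the name-to-number map only gets patched when a name is (re)assigned.

    def local_value_numbering(block):
        expr_to_num = {}    # (op, vn(a), vn(b)) -> value number
        name_to_num = {}    # variable name -> current value number: the extra non-SSA book-keeping
        counter = [0]

        def vn_of(name):
            if name not in name_to_num:
                name_to_num[name] = counter[0]; counter[0] += 1
            return name_to_num[name]

        for dst, op, a, b in block:                          # e.g. ("x", "+", "a", "b")
            key = (op, vn_of(a), vn_of(b))
            if key in expr_to_num:
                print(f"{dst} = {a} {op} {b} is redundant (value #{expr_to_num[key]})")
            else:
                expr_to_num[key] = counter[0]; counter[0] += 1
            name_to_num[dst] = expr_to_num[key]              # patched only when dst is (re)assigned

    local_value_numbering([("x", "+", "a", "b"),
                           ("y", "+", "a", "b"),   # redundant with x
                           ("a", "*", "a", "b"),   # 'a' reassigned: its value number changes
                           ("z", "+", "a", "b")])  # not redundant: it uses the new 'a'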
> If we want to keep our information sparse, though, we have to add a new definition to the IR.
I mean this isn't like a law of nature, it's just a design point - a question of whether the facts have to be serializable for some reason. If they don't (and in general I can't think of a reason they would have to be), you can just keep things in memory and do either sparse or dense analysis on an as-needed basis. To wit MLIR, an SSA IR, handily supports both sparse and dense dataflow analysis:
https://github.com/llvm/llvm-project/blob/main/mlir/include/...
https://github.com/llvm/llvm-project/blob/main/mlir/include/...
Note that both analyses propagate lattices that support meet and join.
> Anyway, SSA only stores type information about instructions and does not encode information that we might later learn in the IR. With basic SSA, there’s not a good way to encode refinements…
This is true of course but I struggle to think of a reasonable semantic that would enable you to refine/change/alter the type of a value that wasn't an operation. The conditional example is specious, but it could be rectified if it were modeled correctly: the conditional is an operation with operands and two regions, each containing a basic block with basic-block args, and it is the types of those bb args that show up as "refined". MLIR's scf.if isn't "isolated from above" and its bbs don't have args, but it could have both of those things, and such an operation would model what you want precisely, in the IR/types themselves.
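What I mean, as a toy Python data model (this is not actual MLIR, and as said scf.if doesn't work this way today): the conditional is an op with an operand and two regions, and each region's entry block takes the scrutinee as a block argument whose declared type is the refined one, so the refinement is just the type of a new SSA def.

    from dataclasses import dataclass, field
    from typing import List

    @dataclass
    class BlockArg:
        name: str
        type: str                     # the refinement lives here, in the IR itself

    @dataclass
    class Block:
        args: List[BlockArg]
        ops: List[str] = field(default_factory=list)

    @dataclass
    class CondOp:
        """A conditional modeled as an operation with an operand and two regions."""
        scrutinee: str
        then_region: Block
        else_region: Block

    # 'if %x is not none': each branch re-binds the scrutinee as a block argument
    # with a narrowed type, so the "refinement" is just a new SSA def's type.
    cond = CondOp(
        scrutinee="%x",                                        # %x : i64-or-none
        then_region=Block(args=[BlockArg("%x_then", "i64")]),  # refined: definitely an i64
        else_region=Block(args=[BlockArg("%x_else", "none")]), # refined: definitely none
    )
    print(cond.then_region.args[0].type)                       # i64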
pizlonator 4 hours ago
> Not sure what this has to do with SSA or not? Maybe you're saying if you don't have SSA then value numbering requires extra book-keeping to keep track of the names? that's true but the book-keeping only kicks when a name is updated, not at every program point.
It is the case that if you do a flow-insensitive analysis of variables over SSA, it will be equivalent to a flow-sensitive analysis of variables, because of the static single assignment property. I think that's what the author is trying to say, and I agree.
> To wit MLIR, an SSA IR, handily supports both sparse and dense dataflow analysis:
I don't think dense vs sparse analysis is the same thing as the use of the word "sparse" in the post.
> This is true of course but I struggle to think of a reasonable semantic that would enable you to refine/change/alter the type of a value that wasn't an operation.
If you're compiling a dynamically typed language, for example.
almostgotcaught 4 hours ago
> It is the case that if you do a flow-insensitive analysis of variables over SSA, it will be equivalent to a flow-sensitive analysis of variables, because of the static single assignment property.
you're saying the same thing i'm saying - the question/issue arises from value numbering and naming. see "engineering a compiler" page ~400 where they discuss the benefits of SSA for value numbering: https://imgur.com/a/bPabq7D
> I don't think dense vs sparse analysis is the same thing as the use of the word "sparse" in the post.
it is - the allusion/claim in the post is that SSA is sparse because you don't need to keep track of facts/metadata for all variables at all program points. that's because of the same thing: SSA uniquely locates/identifies values and therefore you can uniquely map between SSA ids and metadata. so you don't need to have a table of stuff that's updated at every program point (like you do in dense analysis), just along control-flow/operand/result edges.
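concretely (toy Python, ignoring overflow): a sparse sign analysis over SSA is one dict keyed by SSA name, valid at every program point where the value is in scope, whereas a dense formulation would carry a row of facts for every variable at every program point.

    def sign_analysis(ssa_insts):
        """Sparse: one fact per SSA name, no table indexed by program point."""
        fact = {}                                   # ssa name -> "nonneg" | "unknown"
        for dst, op, a, b in ssa_insts:
            if op == "const":
                fact[dst] = "nonneg" if a >= 0 else "unknown"
            elif op in ("+", "*") and fact.get(a) == "nonneg" and fact.get(b) == "nonneg":
                fact[dst] = "nonneg"
            else:
                fact[dst] = "unknown"
        return fact

    # %c0 = const 4 ; %c1 = const 2 ; %s = %c0 + %c1 ; %p = %s * %c1
    print(sign_analysis([("c0", "const", 4, None),
                         ("c1", "const", 2, None),
                         ("s",  "+", "c0", "c1"),
                         ("p",  "*", "s",  "c1")]))
    # {'c0': 'nonneg', 'c1': 'nonneg', 's': 'nonneg', 'p': 'nonneg'}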
pizlonator 3 hours ago
> you're saying the same thing i'm saying - the question/issue arises from value numbering and naming. see "engineering a compiler" page ~400 where they discuss the benefits of SSA for value numbering: https://imgur.com/a/bPabq7D
We're definitely not saying the same thing, because to me, the post's claims about the benefits of SSA for flow analysis are obvious and unobjectionable, whereas you raised an objection.
> it is - the allusion/claim in the post is that SSA is sparse because you don't need to keep track of facts/metadata for all variables at all program points. that's because of the same thing: SSA uniquely locates/identifies values and therefore you can uniquely map between SSA ids and metadata. so you don't need to have a table of stuff that's updated at every program point (like you do in dense analysis), just along control-flow/operand/result edges.
So then what were you talking about when you said, "Not sure what this has to do with SSA or not?". Seems like you're now trying to explain to me exactly what this has to do with SSA or not.