Are JIT and GC ever faster than static compilation and manual memory management?

no

yes

Not necessarily, but they can be under some circumstances.

The key thing about JIT compilation is that it gives the compiler global knowledge of the running program. For instance, the Java JIT can know that a certain method is monomorphic and therefore inline its implementation at every call site, whereas a C++ compiler can't know that, since other overriding classes may be linked in later.
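
For example (a minimal sketch with made-up names; Shape, Circle and Demo are just illustrations, not any real API):

[code]
// Illustration only: all names here are invented.
interface Shape { double area(); }

class Circle implements Shape {
    private final double r;
    Circle(double r) { this.r = r; }
    @Override public double area() { return Math.PI * r * r; }
}

class Demo {
    // As long as Circle is the only Shape implementation the VM has loaded,
    // HotSpot can treat s.area() as monomorphic and inline Circle.area() here.
    // An AOT C++ compiler generally can't assume the same for a virtual call,
    // because an overriding class could be linked in (or loaded) later.
    static double sumAreas(Shape[] shapes) {
        double sum = 0;
        for (Shape s : shapes) sum += s.area();
        return sum;
    }

    public static void main(String[] args) {
        Shape[] shapes = new Shape[1000];
        for (int i = 0; i < shapes.length; i++) shapes[i] = new Circle(i);
        System.out.println(sumAreas(shapes));
    }
}
[/code]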

As for the performance of GCs, that's more controversial. There's a classical Lisp paper that mathematically proves that a GC offers higher theoretical throughput than a malloc/free system, but whether that result still holds since caches rose to prominence is contested. I haven't seen a rigorous treatment of that assertion, however.

No, JIT is inherently slower and inherently worse at cache utilisation.

maybe

JIT vs. AOT compilation has nothing to do with cache management.

Profile-guided optimisation offers pretty much the same information to an AOT compiler. The advantage is minimal, while the cache is much more important and is used worse by JITs.

Came for this.
Never change, Sup Forumsee.

>Profile-guided optimisation offers pretty much the same information to an AOT compiler.
Profile-guided optimization still can't prove that a method is monomorphic; it can only observe that it happened to be during the profiling run.

>the cache is much more important and is used worse by JITs.
Again, JIT vs. AOT compilation has nothing to do with cache management.

>For instance, the Java JIT can know that a certain method is monomorphic and therefore inline its implementation at every call site, whereas a C++ compiler can't know that, since other overriding classes may be linked in later.
You can declare a member function non-virtual or final. Isn't that the same?

Not quite, in the sense that with a JIT you don't have to.

The method can still be polymorphic in theory, but if no class that overrides it is currently loaded, the JIT can treat it as monomorphic. A class that overrides such a method can still be loaded at a later time, in which case the JIT will deoptimize and recompile the code that calls it. That way, a program that only actually uses one particular class out of a set of classes in a library gets the performance benefit, while programs that use the full set of classes still work properly.

Of course, that's not the only thing a JIT can do either. Looking at Hotspot, one thing it commonly does is simply assume that a certain function is, in practice, only called with arguments of one or two specific types, and optimize heavily for that case (by inlining, for example); if it turns out that those assumptions were wrong, it backs out and compiles a slightly slower but fully compatible version of that function.

The ability to try things that aren't necessarily provable is one of the greatest strengths of a JIT compiler.
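
Roughly what that scenario looks like, as a hedged sketch (all names are invented; the deoptimize-and-recompile step happens inside the VM and is invisible to the program):

[code]
// Invented names for illustration. Imagine another Handler implementation is
// compiled separately and only referenced by name, so it isn't loaded until
// Class.forName actually runs.
interface Handler { int handle(int x); }

class FastHandler implements Handler {
    public int handle(int x) { return x + 1; }
}

public class Speculate {
    static int dispatch(Handler h, int x) {
        // While FastHandler is the only Handler loaded, HotSpot can compile this
        // call as monomorphic and inline it; it also profiles argument types and
        // can specialize for the types actually seen.
        return h.handle(x);
    }

    public static void main(String[] args) throws Exception {
        Handler h = new FastHandler();
        long sum = 0;
        for (int i = 0; i < 5_000_000; i++) sum += dispatch(h, i); // warm-up, gets compiled

        // Loading another implementation later invalidates the monomorphic
        // assumption: the VM deoptimizes the compiled dispatch() and recompiles
        // it with a real virtual call. Observable behaviour doesn't change.
        if (args.length > 0) {
            Handler other = (Handler) Class.forName(args[0])
                    .getDeclaredConstructor().newInstance();
            sum += dispatch(other, 42);
        }
        System.out.println(sum);
    }
}
[/code]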

Can't a C++ compiler tell that none of the classes being compiled into the binary override some particular virtual method, and therefore inline all of those calls?
Or is that not possible due to the compile/link model?

>Can't a C++ compiler tell that none of the classes being compiled into the binary override some particular virtual method, and therefore inline all of those calls?
For a monolithic binary, an LTO compiler could perhaps do that, but no one uses LTO in practice.

More importantly, it can't work unless the compiler can also prove that you won't be loading any dynamic libraries at runtime.

Theoretically it can do a lot of nifty things. They are barely feasible to implement, as evidenced by the decades spent on improving the JVM/CLR/Python/whatever. Also, in most cases shorter and more predictable delays matter more than theoretical performance and reliability, so a GC shouldn't be used in real-time applications (which in practice means pretty much everything with a GUI).

I see, so the possibility exists that a dynamically loaded library overrides a virtual method that you assumed could be inlined.

I'm beginning to see the advantages of JIT for heavily polymorphic programs.

>They are barely feasible to implement, as evidenced by the decades spent on improving the JVM
What are you talking about? The Hotspot JVM is probably the one implementation that actually does all these things and does them well. It's actually unbelievable that the CLR doesn't do them, since it has Microsoft behind it. Python and the rest of the dynlang pack can't do much of anything, mostly because the languages themselves make it almost impossible to prove useful invariants.

>Also, in most cases shorter and more predictable delays matter more than theoretical performance and reliability
This, however, is true, and is by far the strongest argument against JITting.

>I see, so the possibility exists that a dynamically loaded library overrides a virtual method that you assumed could be inlined.
You may also want to consider, e.g., the possibility that an externally installed dynamic library is upgraded independently of the programs that use it and thus changes its internal class structure.

Can you go into more detail about what the Hotspot JVM does better than the CLR?

Most importantly, the CLR doesn't do deoptimization at all, so it can't do any of the speculative optimizations that are arguably the signature of Hotspot.

Thank you user!

Manual memory management can be much faster, but the problem is that malloc is really expensive to use.
The JVM implements memory management by calling malloc once to allocate a big chunk of memory.
The JVM then uses mmap or something similar to move objects into the heap.
mmap is really cheap compared to malloc.

>mmap is really cheap compared to malloc.

I'm not completely sure what you mean, but as I understand it you want the compiler to only link what you want into the binary?
This is possible with modules in C++, which aren't standardized yet but which Clang already supports.

>I'm not completely sure what you mean
I think he's being pretty explicit. He wants the compiler to be able to inline virtual functions that happen to be monomorphic under specific usages.

>as I understand it you want the compiler to only link what you want into the binary
This is what static linking does and has always done.

Err, I meant memcpy. I think the JVM uses memcpy to copy initialized objects into memory.
Of course this is implementation-specific.

it depends

Even so, malloc isn't "really expensive". It's more expensive than stack allocation or GC'd heap allocation, but it is by no means "really expensive".

Also:
>Manual memory management can be much faster
See again:
>There's a classical Lisp paper that mathematically proves that a GC offers higher theoretical throughput than a malloc/free system

GC is slightly faster on allocation and way slower on deallocation. It also tends to have chunks of memory spread all over the heap, not good for CPU caching.

>way slower on deallocation
That's not exactly true. It isn't necessarily slower; it's just that it does a large batch of deallocations at one time, leading to a single noticeable delay. The individual deallocations involved aren't necessarily slower, however.

>It also tends to have chunks of memory spread all over the heap, not good for CPU caching.
But the whole point of caching is that it doesn't matter whether the data is spread out across the heap, as long as the accesses have locality.

>spread out locality

you sound like someone with a plan

As I understand it, Java's heap is merely a stack that you can't pop from.
When the heap pointer gets too high, the GC is invoked. Dead objects are deleted and the rest are relocated/defragmented to lower that pointer.
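
The cheap-allocation half of that, as a toy sketch (this is not how the JVM is actually implemented internally; real VMs bump per-thread allocation buffers and move live objects during collection, but the sketch shows why "allocation is a pointer bump" is so cheap):

[code]
// Toy bump-pointer arena: reserve one big region up front, then every
// "allocation" is just a bounds check plus an addition.
final class BumpArena {
    private final byte[] region;   // the big chunk, reserved once
    private int top = 0;           // the "heap pointer"

    BumpArena(int capacityBytes) { region = new byte[capacityBytes]; }

    // Returns the offset of a fresh block, or -1 when the arena is full
    // (which is where a real VM would trigger a collection).
    int allocate(int size) {
        if (top + size > region.length) return -1;
        int offset = top;
        top += size;               // this addition is the whole allocation cost
        return offset;
    }

    // After a compacting collection, live data has been slid down,
    // so the pointer can simply be lowered again.
    void resetTo(int newTop) { top = newTop; }

    public static void main(String[] args) {
        BumpArena arena = new BumpArena(1 << 20);
        System.out.println(arena.allocate(64));  // 0
        System.out.println(arena.allocate(64));  // 64
    }
}
[/code]
Compare that with what a general-purpose malloc has to do per call: find a suitable free block, possibly split it, and update its bookkeeping.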

>temporal locality

You're mostly correct, but it's a bit more complex, because several different such spaces with different roles are involved. Cf. ephemeral/generational GC.

>But the whole point of caching is that it doesn't matter whether the data is spread out across the heap, as long as the accesses have locality.
Even if you were right about being spread out being a bad thing, you'd still be wrong. malloc/free systems suffer far more from heap fragmentation than a compacting GC.

>Even if you were right about being spread out being a bad thing, you'd still be wrong. malloc/free systems suffer far more from heap fragmentation than a compacting GC.
Not him, but there are allocators that avoid heap fragmentation about as well as a compacting GC does. Jemalloc, for example. Get with the times, gramps.

>but there are allocators that avoid heap fragmentation about as well as a compacting GC does
I find this difficult to believe, as a GC can only compact the heap by moving objects around, which relies on all memory and all types being managed. If you tried to do the same in C/C++, there'd be dangling pointers all over the place.

>If you tried to do the same in C/C++, there'd be dangling pointers all over the place.
In practice, apparently a cleverer block allocation strategy (size-classed allocation, in jemalloc's case) suffices. It's not like compacting is free of cost anyway.

That sounds like quite a different thing, though, as a compacting GC will have exactly zero fragmentation after collection.

Mind you, I'm not saying this makes any interesting difference in practice; I was just saying that it was wrong to claim that the structures of a compacting GC are more spread out.

>Mind you, I'm not saying this makes any interesting difference in practice; I was just saying that it was wrong to claim that the structures of a compacting GC are more spread out.
I honestly have no idea what you meant by that.