I was reading the GV100 whitepaper yesterday

I was reading the GV100 whitepaper yesterday when suddenly:

Quality of Service is how quickly the GPU execution resources will be available to process work
for a client upon submission of the work. Volta MPS provides control for MPS clients to specify
what fraction of the GPU is necessary for execution. This control to restrict each client to only a
fraction of the GPU execution resources reduces or eliminates head-of-line blocking where work
from one MPS client may overwhelm GPU execution resources, preventing other clients from
making progress until prior work from another MPS client completes.
images.nvidia.com/content/volta-architecture/pdf/Volta-Architecture-Whitepaper-v1.0.pdf
(page 32)
They literally emulated async and they're still almost 60% behind AMD at it.
Can't wait to see their "async" at work while clogging poor 4-core CPUs.
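For reference, the knob the whitepaper is describing is exposed through MPS resource provisioning. Below is a minimal sketch (mine, not from the whitepaper) of an MPS client asking for a fraction of the GPU's execution resources via CUDA_MPS_ACTIVE_THREAD_PERCENTAGE; normally you export that variable in the shell before launching the client, and an MPS control daemon has to be running. The kernel and sizes are placeholders.

// Minimal sketch (not from the whitepaper): how an MPS client on Volta can ask
// to be provisioned a fraction of the GPU's execution resources. The env var
// CUDA_MPS_ACTIVE_THREAD_PERCENTAGE is read when the client creates its CUDA
// context, so it is normally set in the shell before launching the process.
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

__global__ void busy_kernel(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        out[i] = sqrtf((float)i) * 2.0f;   // trivial work to occupy SMs
}

int main()
{
    // Ask the MPS server for (at most) ~25% of the GPU's SMs for this client.
    // Must happen before the first CUDA call, i.e. before the context exists.
    setenv("CUDA_MPS_ACTIVE_THREAD_PERCENTAGE", "25", 1);

    const int n = 1 << 20;
    float *d_out = nullptr;
    cudaMalloc(&d_out, n * sizeof(float));

    busy_kernel<<<(n + 255) / 256, 256>>>(d_out, n);
    cudaDeviceSynchronize();

    cudaFree(d_out);
    printf("done\n");
    return 0;
}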

Other urls found in this thread:

amd.com/Documents/GCN_Architecture_whitepaper.pdf
forum.beyond3d.com/threads/amd-vega-10-vega-11-vega-12-and-vega-20-rumors-and-discussion.59649/page-199
images.anandtech.com/doci/9124/Async_Aces.png
devblogs.nvidia.com/parallelforall/inside-volta/
labs.eleks.com/wp-content/uploads/2012/11/1.1.HostKernelSchedule.png
www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-143.pdf
ict-energy.eu/sites/ict-energy.eu/files/140424 ST Wlotzka - Asynchronous_iteration_for_GPU-accelerated_HPC_systems.pdf

>They literally emulated async and they're still almost 60% behind AMD at it
It actually sounds like this would be used for VDI environments.

This is literally how async has worked since its inception: they created an abstraction layer because they obviously lack the hardware.
I doubt they'll bring this to consumers anyway; it would tax the system like a motherfucker.

>This is literally how async has worked since its inception
So? It doesn't matter, because absolutely no one uses AMD for VDI workloads. This is for the Tesla V100, which will be used extensively for VDI.

>VDI workloads
This is part of the new microcode for CUDA in general, not just for Tesla.

The very first page of your PDF says this is about the Tesla V100, you retard. This has nothing to do with vidya; this is for VDI workloads.

>first page
I literally posted page 32, where they talk about the new CUDA microcode.

How much of a fanboi are you, user? How illiterate are you, user? Pic related. AMD has no presence at all in the data center, and I don't know why you're bringing up vidya when the PDF makes it clear it has nothing to do with that.

>new cuda microcode
>new microcode for the cuda
Also, the word "microcode" doesn't appear anywhere in that document.

Could you link a whitepaper detailing AMD's supposed hardware async and how they achieve equal functionality? I'd love to see how they achieve 'hardware' async, and on which cards...

It's implied by the feature being discussed. Async is done in microcode; that's why it's async.

> mfw brainlet exposes himself

You don't understand what microcode is, do you? Anyway, stay butthurt that you don't understand what applications this card is for.

Not OP, but you must be an idiot:
amd.com/Documents/GCN_Architecture_whitepaper.pdf

Shit, I guess Sup Forums was right.
Guess Beyond3D was wrong once again:
forum.beyond3d.com/threads/amd-vega-10-vega-11-vega-12-and-vega-20-rumors-and-discussion.59649/page-199
Who would have guessed that a bunch of GPU engineers are amateurs compared to Sup Forums.

I'm not. I know what AMD's async units are. Here's a quicker and better breakdown than your whitepaper:
images.anandtech.com/doci/9124/Async_Aces.png
What I want is more detail, the kind that Nvidia provides using standard language, so one can compare them to industry-standard feature sets.

Speaking of which, Nvidia has hardware async. They provide instruction-level and thread-level preemption via warp schedulers and streaming-multiprocessor control that presides over the compute cores, just like AMD's pipeline. The difference is that they have better drivers and software, so it actually performs far better than AMD's implementation. Let me know if you want sauce, friendo...
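If you want to poke at the preemption claim from the CUDA side, about the only forward-facing thing is a couple of device attributes (they've existed since CUDA 8). Quick sketch, nothing more; it only reports what the driver claims to support, not how well any of it performs:

// Minimal sketch: the closest thing to "hardware async" you can actually query
// from CUDA is a handful of device attributes. This measures nothing; it only
// reports what the driver exposes.
#include <cstdio>
#include <cuda_runtime.h>

int main()
{
    int dev = 0;
    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, dev);

    int preemption = 0, concurrent = 0, copyEngines = 0;
    cudaDeviceGetAttribute(&preemption, cudaDevAttrComputePreemptionSupported, dev);
    cudaDeviceGetAttribute(&concurrent, cudaDevAttrConcurrentKernels, dev);
    cudaDeviceGetAttribute(&copyEngines, cudaDevAttrAsyncEngineCount, dev);

    printf("%s\n", prop.name);
    printf("compute preemption supported: %d\n", preemption);
    printf("concurrent kernels supported: %d\n", concurrent);
    printf("async copy engines:           %d\n", copyEngines);
    return 0;
}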

Also, I thought AMD's async functionality was a fucking meme?

I do. It's what controls the idiosyncrasies of a hardware pipeline, like async control and other out-of-band features, you dumbass brainlet. That's why it's referred to as CUDA microcode: there are no forward-facing API or ISA-level interfaces to the async functionality, since it's handled by microcode features in the pipeline based on flows and ISA triggers. So what exactly do you know about microcode, fekkit?

First of all, warp scheduling occurs on the CPU, not on the GPU. They tried to make their SMs more like GCN in Maxwell 3.0, but their cores are nowhere near as flexible as AMD's.
The last time Nvidia had a hardware scheduler was back in the Kepler era; the 780 Ti was the last superscalar card from Nvidia.

First of all, warp schedulers are hardware that reside on the GPU, as are the SMs, which are triggered by instantiations that can occur from compute and/or GPU flows in CPU land.

Whatever their SM/warp scheduler combination is, it's proven to perform far better, with much less latency and finer granularity, than AMD's cores and their (world-changing) async HARDWARE scheduling.

People keep railing about software versus hardware as if there's no interface between the two where the magic happens. What matters is latency and throughput on real-world operations, and there you find that AMD's implementation lags incredibly far behind Nvidia's. So much for
> muh HW schedulers
As if a whole slew of microcode, the actual pipeline, firmware, drivers, the ISA, and APIs don't matter...

devblogs.nvidia.com/parallelforall/inside-volta/

OP is a dumb AYYMDPOORFAG who doesn't understand anything

>MUH ASYNC

Pascal destroys AYYMD in DX12 performance, and Volta will be even more embarrassing for AYYMD

Really? Post any proof that Nvidia's async implementation is actually superior.

People haven't realised how much the novidia blob hogs CPUs.
Offloading jobs to the CPU for the supposedly best cards in the world.
They live up to Jen-Hsun Huang's quote: "novidia is a software company".

>Whatever their SM/warp scheduler combination is, it's proven to perform far better, with much less latency and finer granularity, than AMD's cores and their (world-changing) async HARDWARE scheduling.

You mean like in mid-2016, when 1080 reviewers were getting insane numbers in AotS, and later on we learned that Nvidia "forgot" to render the heaviest shader in the game, the one going through the compute path?
This is how Nvidia deals with fp16 shaders: they just replace them with their own low-resolution ones.
Can't wait to see what happens with the new Wolfenstein, which uses fp16 shaders for almost everything.
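For reference, here's a minimal sketch of what packed fp16 math looks like on the CUDA side (my own toy, assuming an sm_60+ part with native fp16 and compiled with a matching -arch flag; the kernel and sizes are just illustration). The whole "double rate fp16" argument only holds if the hardware actually runs half2 ops at full rate instead of promoting or replacing them:

// Minimal sketch (assumes an sm_60+ part with native fp16): packed half2 math
// processes two fp16 values per instruction, which is where the "double rate"
// fp16 claims come from -- provided the hardware doesn't emulate it.
#include <cstdio>
#include <cuda_fp16.h>
#include <cuda_runtime.h>

__global__ void saxpy_h2(int n, float alpha, const __half2 *x, __half2 *y)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        __half2 a = __float2half2_rn(alpha);   // broadcast alpha into both fp16 lanes
        y[i] = __hfma2(a, x[i], y[i]);         // two fp16 multiply-adds per element
    }
}

int main()
{
    const int n = 1 << 20;                     // n half2 elements = 2n fp16 values
    __half2 *x, *y;
    cudaMalloc(&x, n * sizeof(__half2));
    cudaMalloc(&y, n * sizeof(__half2));
    cudaMemset(x, 0, n * sizeof(__half2));
    cudaMemset(y, 0, n * sizeof(__half2));

    saxpy_h2<<<(n + 255) / 256, 256>>>(n, 2.0f, x, y);
    cudaDeviceSynchronize();

    cudaFree(x);
    cudaFree(y);
    printf("done\n");
    return 0;
}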

Novidia renders what they want... Subpixel tessellation, anyone?

...

I searched for the website but couldn't find it. It had to do with HPC in academia, where they do deep analysis of compute pipelines and compare latency. To warp in a kernel and execute a task, AMD's GPU took somewhere near 30x more time. Memory transfers and sync operations were far worse, as were other key features. Async is a high-level feature that depends on very low-level hardware and software.

So the proof was in the architectural details, something that can be hidden in canned benchmarks or in developer code. You can't can low-level hardware functionality or drivers, and that's where async gets implemented.

CPU manufacturers don't go around yapping about their hardware-level preemption; they all have that capability. The question is how it performs. Like I said, I was reviewing HPC for a while and came across a site that showed what actually happens in hardware, down to the number of clocks. AMD's approach depends on huge flows to perform adequately, whereas Nvidia's is an all-around polished performer. Low compute flow rate = low latency. High flow rate = slightly higher latency.

AMD's approach incurs high latency at the low end, and that only gets hidden at the high end. This really isn't a secret, FYI. There are tradeoffs to different approaches. Brainlets hear "hardware" and suddenly think it means better performance...
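Since nobody has linked actual numbers, here's roughly how you'd measure it yourself on the green side with CUDA events. Just a sketch of the kind of microbenchmark those HPC analyses use; it says nothing about AMD, whose equivalent would go through HIP/OpenCL:

// Rough sketch: averaging the GPU-side cost of back-to-back empty kernel
// launches with CUDA events. This is the style of microbenchmark the latency
// argument rests on, nothing more.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void empty_kernel() {}

int main()
{
    const int iters = 1000;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    empty_kernel<<<1, 32>>>();           // warm-up: the first launch pays extra cost
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    for (int i = 0; i < iters; ++i)
        empty_kernel<<<1, 32>>>();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("avg GPU-side cost per empty launch: %.2f us\n", 1000.0f * ms / iters);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}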

That's gaymen performance in a specific game that maps to and lends itself to enhanced performance on a particular GPU. I'm talking about academic measurements of different points of both manufacturers' pipelines, down to the clock, not some stupid-ass gaymen benchmark. I'm talking about architectural details and the performance therein. Say you have a hardware pipeline that is 100ns long because you have hardware schedulers, versus one that is 40ns long because you don't: the 100ns pipeline likely only performs better when it's maxed out and incurring lots of async interruptions. If async interruptions are rare, the 40ns pipeline wins out.

You don't gain an understanding of computer architecture by looking at benchmarks on the weekend.

I'm referring to analysis like this:
labs.eleks.com/wp-content/uploads/2012/11/1.1.HostKernelSchedule.png

As for what you describe, yeah:
www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-143.pdf
Latency hiding is an art form that goes well beyond just saying you have async hardware schedulers. Bulldozer had lots of shit on paper but performed like ass. Sometimes you get shit wrong in hardware; software can be fixed or tuned.
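The eleks diagram and the Berkeley report boil down to the same idea: hide transfer and launch latency by overlapping it with compute. A minimal sketch of the standard CUDA pattern, pinned memory plus cudaMemcpyAsync plus two streams (the chunk count and sizes here are arbitrary illustration values):

// Minimal sketch of latency hiding on the CUDA side: split work into chunks and
// overlap host-to-device copies with kernel execution using two streams and
// pinned host memory.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void scale(float *d, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        d[i] *= 2.0f;
}

int main()
{
    const int chunks = 4, chunk = 1 << 20;
    float *h = nullptr, *d = nullptr;
    cudaMallocHost(&h, chunks * chunk * sizeof(float));   // pinned, required for async copies
    cudaMalloc(&d, chunks * chunk * sizeof(float));

    cudaStream_t s[2];
    cudaStreamCreate(&s[0]);
    cudaStreamCreate(&s[1]);

    for (int c = 0; c < chunks; ++c) {
        cudaStream_t st = s[c % 2];
        float *hp = h + (size_t)c * chunk;
        float *dp = d + (size_t)c * chunk;
        // The copy for chunk c can overlap with the kernel still running on the other stream.
        cudaMemcpyAsync(dp, hp, chunk * sizeof(float), cudaMemcpyHostToDevice, st);
        scale<<<(chunk + 255) / 256, 256, 0, st>>>(dp, chunk);
    }
    cudaDeviceSynchronize();

    cudaStreamDestroy(s[0]);
    cudaStreamDestroy(s[1]);
    cudaFreeHost(h);
    cudaFree(d);
    printf("done\n");
    return 0;
}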

What am I looking at in this image? Is left or right supposed to be better? Because right looks way clearer to me.

Let's lay out some truths, then.

Fact one, and the most serious:
Nvidia cores can't flip, pause, or switch mid-cycle UNLESS they flush their workload, which they have to wait for. That's why they can run synchronous work without any penalty, but once you introduce asynchronous workloads on those cores you have to be sure about what is going where, something AMD never really needed.

Talking about HPC perf while CUDA dominates is kind of moot; until recently there wasn't any async workload to really push the cards, only one simple test written by Beyond3D.
Second, Maxwell 3.0 has 1 graphics pipeline and 31 compute queues; GCN always had 64.
Third, the fp16 performance on Maxwell 3.0, just like 2.0 and 1.0, is SHIT, hence the shitfest we see in every game that uses the compute path for shaders.

You can't talk about academic implementations of async, since until recently async wasn't even a thing. They "try" to use CUDA as such, but given that the "wavefronts" aren't capable of doing anything more than serial workloads, calling it async is really a stretch.
ict-energy.eu/sites/ict-energy.eu/files/140424 ST Wlotzka - Asynchronous_iteration_for_GPU-accelerated_HPC_systems.pdf
Look at how they create async workloads (rough sketch below). I agree it's simpler than cutting down the "workload" and distributing it the AMD way, but you can't really call it async, not by a long shot.
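To be fair to the paper, "creating an async workload" in CUDA mostly means launching independent kernels into separate streams and letting the host keep working until it actually needs the results. A rough sketch of that pattern (not taken from the paper; the kernel and sizes are made up):

// Hedged sketch: "async" in the CUDA sense is stream-level concurrency. Two
// independent pieces of work go into two streams; the host carries on in the
// meantime and only synchronizes when it needs the results.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void iterate(float *v, int n, int sweeps)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    float x = v[i];
    for (int s = 0; s < sweeps; ++s)     // stand-in for a block of solver iterations
        x = 0.5f * (x + 1.0f / (x + 1.0f));
    v[i] = x;
}

int main()
{
    const int n = 1 << 18;
    float *a, *b;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));
    cudaMemset(a, 0, n * sizeof(float));
    cudaMemset(b, 0, n * sizeof(float));

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    // Two independent subdomains iterate concurrently; neither waits on the other.
    iterate<<<(n + 255) / 256, 256, 0, s0>>>(a, n, 200);
    iterate<<<(n + 255) / 256, 256, 0, s1>>>(b, n, 200);

    // The host is free to do other work here (I/O, MPI, bookkeeping) while both run.
    printf("host doing other work while the GPU iterates...\n");

    cudaStreamSynchronize(s0);
    cudaStreamSynchronize(s1);

    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(a);
    cudaFree(b);
    return 0;
}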

Both should have been equal. Back then the 1080 was "forgetting" to render the heavy shaders, glass and snow mainly, because they went through the compute path, and the compute path on Nvidia was busy calculating the volumetrics.

> Latency
> Throughput
Pick one to sacrifice for the other

> fact one and the most serious
A fact that needs more detail: within a warp and within an SM. Warp and SM sizes were decided based on flows. So yes, you flush a workload, but only a small portion of the overall workload. This is efficient for Nvidia's hardware scheduler. It would take a 500-page whitepaper to detail and accurately assess this design.
AMD chose another design. It has pros and cons, just like Nvidia's. ACEs aren't per core; they group cores, just like Nvidia does. Honestly it's the same shit, just integrated across the pipeline/software/drivers in a whole different manner.

> You have to be sure about what is going where, something AMD never really needed
Yeah, you're gonna have to be a non-lazy fuck and optimize your code.

> Talking about HPC perf while CUDA dominates
That's all that needs to be said

> Academic async implementation / HPC
This is the whole concept behind warps/SMs and convergent kernels versus what happens when they diverge (shit performance). You execute in parallel across a bunch of parallel threads, then sync after the execution. Why the fuck would you go out of your way to provide async within such a short-form parallel execution? RTG jumped the gun on this feature like they always do. GPUs, at the time, were meant for hugely parallel convergent compute. That's evolving, and Nvidia is evolving their arch with it. RTG has this habit of trying to define a standard no one is even on and declaring victory... Look at Volta: they've gotten around to async because now it potentially matters. Back then it didn't.
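To make the convergent/divergent point concrete, here's a toy sketch (names and sizes are mine): when lanes of one warp disagree on a branch, the warp executes both paths one after the other; keep the branch uniform per warp and nothing gets serialized. Volta's independent thread scheduling changes how the lanes are scheduled, not the fundamental cost.

// Toy illustration of warp divergence versus warp-uniform branching.
#include <cstdio>
#include <cuda_runtime.h>

__global__ void divergent(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // Lanes within the same warp disagree on the branch -> both sides execute,
    // each with half the lanes masked off.
    if (i % 2 == 0)
        out[i] = sinf((float)i);
    else
        out[i] = cosf((float)i);
}

__global__ void convergent(float *out, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    // The branch condition is uniform per warp (warps are 32 threads wide), so
    // each warp takes exactly one path and nothing is serialized.
    if ((i / 32) % 2 == 0)
        out[i] = sinf((float)i);
    else
        out[i] = cosf((float)i);
}

int main()
{
    const int n = 1 << 20;
    float *d;
    cudaMalloc(&d, n * sizeof(float));
    divergent<<<(n + 255) / 256, 256>>>(d, n);
    convergent<<<(n + 255) / 256, 256>>>(d, n);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}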

I'll be honest, man... I'm rooting for RTG and open standards. I'm rooting for a solid, futuristic compute architecture with amazing drivers/development tools/feature sets that gives me granular, high-performance control over my GPU. The issue is that neither RTG nor Nvidia delivers this without you paying out the ass for their high-end cards.

One tries to fuck you over less than the other, but those pro margins are just too tempting, so most of the time you end up with the same shit from both of them: a gimped-ass pro card, with the GPU pipe increasingly becoming a second-rate occupant of the die.

RTG thinks far out into the future and is pushing the hardest for open compute. However, they keep fucking up in the short/medium term on drivers and the software stack. How can I expect them to drive something as big as HSA if they can't even deliver a fucking driver stack for an in-release vidya title? I trust AMD's CPU division. I don't trust RTG.

Then you have big daddy Nvidia, with 500 die spins segmenting features into oblivion and insane pro card pricing. AMD has less budget, so you get more on the die (and the power draw that comes with it), but then, like faggots, they disable a shit ton of stuff in drivers/software, pulling the same shit as Nvidia except in drivers, and they have poor perf/watt.

If they had any sense, they'd have delivered a true, ungimped Vega FE card targeting a sweet spot... But no one knows WTF features the fucking cards will have, from RX on up.

Raj is busy tweeting bullshit instead of getting engineers on the mic and detailing their fucking product properly. It's not like there are NDAs and shit protecting secret development; the cards are already being sold, FFS.

So I dunno, man, neither of these assclowns deserves to sit on the future of GPU computing progress. If you ask me, it's time for a third party to enter the market and kick both of them in the teeth.

>2017
>trusting anything Nvidia says
>trusting anything any company says for that matter

ISHIGGYDIGGYDOODLADOOYOUFAGGOTS