i was reading the gv100 whitepaper yesterday when suddenly
"Quality of Service is how quickly the GPU execution resources will be available to process work for a client upon submission of the work. Volta MPS provides control for MPS clients to specify what fraction of the GPU is necessary for execution. This control to restrict each client to only a fraction of the GPU execution resources reduces or eliminates head-of-line blocking where work from one MPS client may overwhelm GPU execution resources, preventing other clients from making progress until prior work from another MPS client completes."
images.nvidia.com/content/volta-architecture/pdf/Volta-Architecture-Whitepaper-v1.0.pdf (page 32)
they literally emulated async and they're still almost 60% behind amd on it. can't wait to see their "async" at work while clogging poor 4-core CPUs
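for the curious, this is roughly what that "fraction of the GPU" knob looks like from the client side. the cap is set outside the program, iirc via an env var on the client process (CUDA_MPS_ACTIVE_THREAD_PERCENTAGE if memory serves, treat the exact name as my assumption), the client code itself is just plain cuda:

// hypothetical MPS client: the env var below caps what fraction of the
// GPU's SMs this process may occupy; the kernel code is unchanged.
// launch (env var name per my reading of the MPS docs, assumption):
//   CUDA_MPS_ACTIVE_THREAD_PERCENTAGE=25 ./client
#include <cuda_runtime.h>

__global__ void busywork(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = sqrtf((float)i);   // stand-in workload
}

int main() {
    const int n = 1 << 20;
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    // MPS, not this code, decides which execution resources the kernel gets
    busywork<<<(n + 255) / 256, 256>>>(d, n);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}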
>they literally emulated async and they're still almost 60% behind amd on it
It actually sounds like this would be used for VDI environments.
Elijah Barnes
this is literally how async has worked since its inception. they created an abstraction layer because they obviously lack the hardware. i doubt they'll bring this to consumers anyway, it will tax the system like a motherfucker
Josiah Watson
>this is literally how async has worked since its inception
So? And it doesn't matter, because absolutely no one uses AMD for VDI workloads. This is for the Tesla V100, which will be used extensively for VDI.
Kevin Rogers
>VDI workloads
this is part of the new microcode for the cuda in general, not just for tesla
Ayden Fisher
Literally the first page of your PDF says this PDF is for the Tesla V100, you retard. This has nothing to do with vidya, this is for VDI workloads.
Michael Reed
>first page
i literally posted page 32, where they talk about the new cuda microcode
Zachary Rogers
how much of a fanboi are you user? how illiterate are you user? pic related. amd has no presence at all in the data center, and I don't know why you're bringing up vidya when the pdf makes it clear it has nothing to do with that.
Matthew Cooper
>new cuda microcode
>new microcode for the cuda
Also, the word microcode doesn't exist anywhere in that document.
Wyatt Lewis
Can you link a whitepaper detailing AMD's supposed hardware async and how they achieve equal functionality? I'd love to see how they achieve 'hardware' async and on which cards....
William Miller
It's implied by what feature is being discussed. Async is done in microcode, which is why it's async.
> mfw brainlet exposes himself
Nathaniel Stewart
You don't understand what microcode is, do you? Anyways, stay butthurt that you don't understand what applications this card is for.
I'm not. I know what AMD's async units are. Here's a quicker and better breakdown than your white paper: images.anandtech.com/doci/9124/Async_Aces.png
What I want is more detail.. the kind that Nvidia provides, using standard language, so one can compare them to industry-standard featuresets.
Speaking of which, Nvidia has hardware async. They provide instruction-level and thread-level preemption via warp schedulers and streaming multiprocessor control that presides over the compute cores, just like AMD's pipeline. The difference being that they have better drivers and software, so it actually performs far better than AMD's implementation. Let me know if you want sauce, friendo...
Also, I thought AMD's async functionality was a fucking meme?
I do, it's what controls the idiosyncrasies of a hardware pipeline, like async control and other out-of-band features, you dumbass brainlet. That's why it's referred to as CUDA microcode: there are no forward-facing API or ISA-level interfaces to the async functionality, as it's handled by microcode features in the pipeline based on flows and ISA triggers. So what exactly do you know about microcode, fekkit?
Wyatt Moore
first of all, warp scheduling occurs on the cpu, not on the gpu. they tried to make their SMs more like gcn in maxwell 3.0, but their cores aren't anywhere near as flexible as amd's cores are. the last time nvidia had a hardware scheduler was back in the kepler era; the 780ti was the last superscalar card from nvidia
Lincoln Jones
First of all, warp schedulers are hardware that resides on the GPU, as are SMs, which are triggered by instantiations that can occur from compute and/or GPU flows in CPU land.
Whatever their SM/warp scheduler combination, it's proven to perform far better, with much lower latency and finer granularity, than AMD's cores and (world-changing) async HARDWARE scheduling.
People keep railing about software/hardware as if there's no interface between the two where the magic happens. What matters is latency and throughput on real-world operations. Therein you find that AMD's implementation lags incredibly far behind Nvidia's. So much for
>muh HW schedulers
As if a whole slew of microcode, the actual pipeline, firmware, drivers, the ISA, and APIs don't matter...
OP is a dumb AYYMDPOORFAG who doesn't understand anything
>MUH ASYNC
Pascal destroys AYYMD in DX12 performance and Volta will be even more embarrassing for AYYMD
Isaiah Anderson
really? post any proof that nvidia's async implementation is actually superior
Grayson Morales
People haven't realised how much the novidia blob hogs CPUs. Offloading jobs to the CPU for the supposedly best cards in the world. They live up to juang sung han tsung's quote: "novidia is a software company"
Jace Jackson
>Whatever their SM/warp scheduler combination, it's proven to perform far better, with much lower latency and finer granularity, than AMD's cores and (world-changing) async HARDWARE scheduling.
you mean like mid-2016, when 1080 reviewers were getting insane numbers on aots and later we learned that nvidia "forgot" to render the heaviest shader in the game, the one passing through the compute path? this is how nvidia deals with fp16 shaders, they just replace them with their own low-resolution ones. can't wait to see what happens with the new wolfenstein that uses fp16 shaders for almost everything
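for anyone wondering what "fp16 shaders" even buys you: packed half math, two 16-bit ops per instruction on parts with double-rate fp16. rough cuda sketch of my own below (kernel name is made up, the __half2 intrinsics come from cuda_fp16.h), nothing out of wolfenstein:

// minimal packed-fp16 sketch: one __hmul2 does two half multiplies at once.
// only pays off on hardware with full-rate/double-rate fp16.
#include <cuda_fp16.h>
#include <cuda_runtime.h>

__global__ void scale_half2(__half2 *data, __half2 factor, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = __hmul2(data[i], factor);  // 2 fp16 multiplies per op
}

int main() {
    const int n = 1 << 20;                  // n __half2 elements = 2n fp16 values
    __half2 *d = nullptr;
    cudaMalloc(&d, n * sizeof(__half2));
    scale_half2<<<(n + 255) / 256, 256>>>(d, __float2half2_rn(0.5f), n);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}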
Xavier Thomas
Novidia renders what they want... Subpixel tessellation, anyone?
David Richardson
...
Nathan Lee
Searched for the website but couldn't find it. However, it had to do with HPC in academia, where they do deep analysis on compute pipelines and compare latency. To spin up a kernel and execute a task, AMD's GPU took somewhere near 30x more time. Memory transfer and sync operations were far worse, as were other key features. Async is a high-level feature dependent on very low-level hardware and software.
So the proof was in architectural details, something that can be hidden in canned benchmarks or in developer code. You can't "can" low-level hardware functionality or drivers, and that's how async gets implemented.
CPU manufacturers don't go around yapping about their hardware-level pre-emption. They all have this capability. The question is how it performs. Like I said, I was reviewing HPC for a while and came across a site that showed what actually happens in hardware, down to the number of clocks. AMD's approach depends on huge flows to perform adequately, whereas Nvidia's is an all-around polished performer. Low compute flow rate = low latency. High flow rate = slightly higher latency.
AMD's approach incurs high latency at the low end, and it gets hidden at the high end. This really isn't a secret, fyi. There are tradeoffs to different approaches. Brainlets hear (hw) and suddenly think better performance..
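If you want to sanity-check the kind of number those writeups report, the crude version is just event-timing a pile of empty kernel launches. Rough sketch of my own below; it lumps launch, scheduling and event overhead together, so it's nowhere near the clean per-stage numbers the academic stuff breaks out:

// crude launch-latency probe: average the cost of many empty launches
#include <cstdio>
#include <cuda_runtime.h>

__global__ void empty() {}

int main() {
    const int kLaunches = 1000;
    cudaEvent_t start, stop;
    cudaEventCreate(&start);
    cudaEventCreate(&stop);

    empty<<<1, 1>>>();              // warm-up launch (module load, context setup)
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    for (int i = 0; i < kLaunches; ++i) empty<<<1, 1>>>();
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    printf("avg per launch: %.3f us\n", ms * 1000.0f / kLaunches);

    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    return 0;
}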
Daniel Hill
Gaymen performance of a specific game that maps to and lends itself to specific enhanced performance on a particular GPU. I'm speaking about academic measurements of different points of both manufacturers' pipelines, down to the clock, not some stupid-ass gaymen benchmark. I'm talking about architectural details and performance therein. If you have a hardware pipeline that is 100ns long because you have hardware schedulers vs one that is 40ns long because you don't, the 100ns one likely only performs better when it's being maxed out and incurring lots of async interruptions. If async interruptions are rare, the 40ns pipeline wins out.
You don't gain an understanding of computer architecture looking at benchmarks on the weekend.
As for what you describe, yeah: www2.eecs.berkeley.edu/Pubs/TechRpts/2016/EECS-2016-143.pdf
Latency hiding is an art form that's clearly beyond just saying you have async hardware schedulers. Bulldozer had lots of shit but performed like ass. Sometimes you get shit wrong. Software can be fixed or tuned.
Carson Robinson
What am I looking at in this image? Is left or right supposed to be better? Because right looks way clearer to me.
Gavin Clark
let's lay out some truths then
fact one, and the most serious: nvidia cores can't flip, pause, or switch mid-cycle UNLESS they flush their workload, which they have to wait on. that's why they can run synchronous workloads without any penalty, but when you introduce asynchronous workloads on those cores you have to be sure what is going where, something amd never really needed
talking about HPC perf while cuda dominates is kinda moot; until recently there wasn't any async workload to really push the cards, only one simple test written by beyond3d. second, maxwell 3.0 has 1 graphics pipeline and 31 compute queues, while gcn always had 64. third, the fp16 performance on maxwell 3.0, just like 2.0, just like 1.0, is SHIT, hence the shitfest we see in every game that uses the compute path for shaders
you can't talk about academic implementations of async since until recently async wasn't even a thing. they "try" to use cuda as such, but given that the "wavefronts" aren't capable of doing anything more than serial workloads, calling it async is really stretching it. ict-energy.eu/sites/ict-energy.eu/files/140424 ST Wlotzka - Asynchronous_iteration_for_GPU-accelerated_HPC_systems.pdf
look how they create async workloads. i agree it's simpler than cutting down the "workload" and distributing it the amd way, but you can't really call it async, not by a long shot
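and to be clear what "async" even means on the cuda side: independent streams the runtime is merely allowed to overlap. rough sketch of my own below (the kernels are stand-ins, not from that paper), whether the sm's actually interleave them is exactly what's being argued here:

// two independent streams submitted back to back; concurrency is a
// promise of the API, not a guarantee of the silicon
#include <cuda_runtime.h>

__global__ void graphics_like(float *a, int n) {   // stand-in "graphics" work
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) a[i] = sinf(a[i]);
}

__global__ void compute_like(float *b, int n) {    // stand-in "compute path" work
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) b[i] = b[i] * 0.5f + 1.0f;
}

int main() {
    const int n = 1 << 22;
    float *a = nullptr, *b = nullptr;
    cudaMalloc(&a, n * sizeof(float));
    cudaMalloc(&b, n * sizeof(float));

    cudaStream_t s0, s1;
    cudaStreamCreate(&s0);
    cudaStreamCreate(&s1);

    graphics_like<<<(n + 255) / 256, 256, 0, s0>>>(a, n);
    compute_like <<<(n + 255) / 256, 256, 0, s1>>>(b, n);

    cudaDeviceSynchronize();
    cudaStreamDestroy(s0);
    cudaStreamDestroy(s1);
    cudaFree(a);
    cudaFree(b);
    return 0;
}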
Jordan Ward
both should have been equal. the 1080 back then was "forgetting" to render the heavy shaders, glass and snow mainly, because they were passing through the compute path and the compute path on nvidia was busy calculating the volumetrics
Hunter Davis
> Latency
> Throughput
Pick one to sacrifice for the other
>fact one, and the most serious
A fact that needs more detail: the flushing happens within a warp and within an SM, and warp and SM size was decided based on flows. So, yes, you flush a workload, a small portion of the overall workloads. This is efficient for Nvidia's hardware scheduler. It would take a 500-page whitepaper to detail and accurately assess this design. AMD chose another design. It has pros and cons just like Nvidia's. ACEs aren't per core, they group cores just like Nvidia. Honestly it's the same shit, just integrated across the pipeline/software/drivers in a whole different manner.
>you have to be sure what is going where, something amd never really needed
Yeah, you're gonna have to be a non-lazy fuck and optimize your code
>talking about HPC perf while cuda dominates
That's all that needs to be said
>academic async implementation / HPC
This is the whole concept behind warps/SMs and convergent kernels vs what happens when they diverge (shit performance). You do an execution in parallel across a bunch of parallel threads, then sync after the execution. WTF would you go out of your way to provide async within such a short-form parallel execution? RTG jumped the gun on this feature like they always do. GPUs, at the time, were meant for hugely parallel convergent compute. It's evolving, and Nvidia is evolving their arch with it. RTG has this issue of trying to define a standard no one is even on and declaring victory.. Look at Volta, they've gotten around to async because now it potentially matters. Back then it didn't.
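Since we're on convergent kernels vs divergence, the textbook illustration is a branch that splits a warp so both paths run serially. Quick sketch of my own, nothing vendor-specific:

// warp divergence 101: when lanes in one warp disagree on a branch,
// the warp executes both paths one after the other
#include <cuda_runtime.h>

__global__ void divergent(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (i % 2 == 0)                        // even/odd lanes split every warp
        out[i] = sqrtf((float)i);          // half the warp idles here...
    else
        out[i] = logf((float)i + 1.0f);    // ...then the other half runs
}

__global__ void convergent(float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if ((i / 32) % 2 == 0)                 // whole warps agree on the branch
        out[i] = sqrtf((float)i);
    else
        out[i] = logf((float)i + 1.0f);
}

int main() {
    const int n = 1 << 20;
    float *d = nullptr;
    cudaMalloc(&d, n * sizeof(float));
    divergent<<<(n + 255) / 256, 256>>>(d, n);
    convergent<<<(n + 255) / 256, 256>>>(d, n);
    cudaDeviceSynchronize();
    cudaFree(d);
    return 0;
}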
I'll be honest man.. I'm rooting for RTG and open standards. I'm rooting for a solid, futuristic compute architecture with amazing drivers/development tools/feature-sets that gives me granular, high-performance control over my GPU. The issue is.. neither RTG nor Nvidia delivers this without you paying out the ass for their high-end cards.
Levi Taylor
One tries to fuck you less than the other, but those pro margins are just too tempting. So you end up with the same shit most times from both of them: a gimped-ass pro card, with the GPU pipe increasingly becoming a 2nd-rate occupant of the die.
RTG thinks far out into the future and is pushing the hardest for open compute. However, they keep fucking up in the short/medium term on their drivers/software stack. How can I expect them to drive something as big as HSA if they can't even deliver a fuckn driver stack for an in-release vidya? I trust AMD's CPU division. I don't trust RTG.
Then you have big daddy Nvidia, who has 500 die spins segmenting features into oblivion with insane pro card pricing. AMD has less budget, so you get more on the die (aka power utilization), but then like faggots they disable a shit ton of stuff in drivers/software, pulling the same shit as Nvidia except in drivers, and they have poor perf/watt.
If they had any sense, they'd have delivered a true, ungimped Vega FE card targeting a sweet spot... But no one knows wtf features the fucking cards will have from RX on up.
Raj is busy tweeting bullshit instead of getting engineers on the mic and detailing their fucking product correctly. It's not like there's NDAs and shit to protect their secret development. The cards are already being sold ffs.
So I dunno man, neither of these assclowns deserves to sit on the future of GPU computing progress. If you ask me, it's time for a third party to enter the market and kick both of them in the teeth.
Brody Clark
>2017
>trusting anything nvidia says
>trusting anything any company says for that matter