RYZEN CRASHES IN HEAVY WORKLOADS


Aiden Scott

phoronix.com/scan.php?page=news_item&px=Ryzen-Test-Stress-Run

>With running a number of new Ryzen Linux tests lately, a number of readers requested I take a fresh look at the reported Ryzen segmentation fault issues / bugs affecting many Linux users. I did, and I am still able to reproduce the problem.

>For those that missed our earlier article on the matter from early June, heavy workloads can cause problems on Ryzen, in particular segmentation faults while there have also been reports of some stability problems.

>This Google Doc remains among the resources trying to track this issue on Linux while on the Gentoo Forums, AMD Forums, and elsewhere are more reports of various problems encountered under extreme workloads -- like a ton of code compiling for hours on end, but can also happen in other scenarios.

>AMD hasn't publicly commented on the problem and as of Linux 4.13 the issue is still happening. If carrying out the same tests on Intel CPUs, the segmentation faults do not occur. There is even ryzen-test to easily try reproducing the issue. The ryzen-test script will build GCC in parallel loops from a compressed ramdisk, in order to easily stress the CPU. In my day-to-day benchmarking of Ryzen CPUs, however, I haven't hit this problem or even on my main production desktop with using Ryzen 5. The problem really comes to light just under very heavy and continuous workloads it seems.
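The "parallel loops" idea in the quote can be sketched roughly as follows. This is not the ryzen-test script itself: the function names and the stand-in workload are made up for illustration, and the real script builds GCC from a ramdisk. It assumes a POSIX system, where `subprocess` reports a child killed by a signal as a negative return code. It only catches a segfault in the direct child, so a crash inside a compiler spawned by `make` would show up as an ordinary non-zero exit instead.

```python
import signal
import subprocess
import sys
from concurrent.futures import ThreadPoolExecutor


def is_segfault(returncode):
    """On POSIX, subprocess reports death-by-signal as a negative signal number."""
    return returncode == -signal.SIGSEGV


def run_loop(cmd, iterations):
    """Run cmd repeatedly; return how many runs died with SIGSEGV."""
    crashes = 0
    for _ in range(iterations):
        result = subprocess.run(
            cmd, stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL
        )
        if is_segfault(result.returncode):
            crashes += 1
    return crashes


def stress(cmd, workers, iterations):
    """Run `workers` independent loops of cmd in parallel and total the crashes."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda _: run_loop(cmd, iterations), range(workers)))


if __name__ == "__main__":
    # Hypothetical stand-in workload; swap in a real build command to stress the CPU.
    workload = [sys.executable, "-c", "pass"]
    print("segfaults:", stress(workload, workers=4, iterations=5))
```

With one worker per logical CPU and a long-running build as the workload, a non-zero total would be the kind of result the article describes.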


August 4, 2017 - 19:54


Henry Roberts

only Sup Forumstards fall for amd cpus so who cares

August 4, 2017 - 19:55

Camden Perry

>Linux making Ryzen crash
Try again, Brian.

August 4, 2017 - 19:55

David Evans

>first shill thread fails > because of terrible formatting and bad image
>try again with spam and a crying wojak
lmao

August 4, 2017 - 19:56

Christian Gonzalez

>it's OK when AMD does it

August 4, 2017 - 19:58

Jackson Rogers

Works on my machine.

August 4, 2017 - 20:00

Asher Rogers

>Linux
Found your problem.

August 4, 2017 - 20:02

Jordan Jenkins

Didn't BSD fix this already?

August 4, 2017 - 20:05

Christian Murphy

>Using linux
hahahahahahahhaah

August 4, 2017 - 20:09

Andrew Smith

>Linux 'developers' haven't fixed an issue for a new product yet
>this is somehow AMD's fault

TRY AGAIN, BRIAN.

August 4, 2017 - 20:10

Cameron Hill

Sounds like something that will get patched.

August 4, 2017 - 20:12

Nolan Nguyen

It's a GCC thing. Literally never happened on Clang or Microsoft's compiler.

August 4, 2017 - 20:13

Owen Young

Hi, Matt Dillon here. Yes, I did find what I believe to be a
hardware issue with Ryzen related to concurrent operations. In a
nutshell, for any given hyperthread pair, if one hyperthread is
in a cpu-bound loop of any kind (can be in user mode), and the
other hyperthread is returning from an interrupt via IRETQ, the
hyperthread issuing the IRETQ can stall indefinitely until the
other hyperthread with the cpu-bound loop pauses (aka HLT until
next interrupt). After this situation occurs, the system appears
to destabilize. The situation does not occur if the cpu-bound
loop is on a different core than the core doing the IRETQ. The
%rip the IRETQ returns to (e.g. userland %rip address) matters a
*LOT*. The problem occurs more often with high %rip addresses
such as near the top of the user stack, which is where DragonFly's
signal trampoline traditionally resides. So a user program taking
a signal on one thread while another thread is cpu-bound can cause
this behavior. Changing the location of the signal trampoline
makes it more difficult to reproduce the problem. I have not
been able to completely mitigate it. When a cpu-thread
stalls in this manner it appears to stall INSIDE the microcode
for IRETQ. It doesn't make it to the return pc, and the cpu thread
cannot take any IPIs or other hardware interrupts while in this
state.
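The condition described above hinges on whether two logical CPUs are siblings on the same physical core. As a rough sketch, assuming a Linux system (the post itself concerns DragonFly), the kernel lists each CPU's siblings in sysfs as a cpulist string such as `0,8` or `0-1`. The helper names below are hypothetical.

```python
from pathlib import Path


def parse_cpu_list(text):
    """Expand a Linux cpulist string such as '0-3,8' into [0, 1, 2, 3, 8]."""
    cpus = []
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(part))
    return cpus


def thread_siblings(cpu):
    """Read the logical CPUs sharing a core with `cpu` (Linux sysfs only)."""
    path = Path(f"/sys/devices/system/cpu/cpu{cpu}/topology/thread_siblings_list")
    return parse_cpu_list(path.read_text())


def same_core(cpu_a, cpu_b):
    """True if both logical CPUs are SMT siblings on one physical core."""
    return cpu_b in thread_siblings(cpu_a)
```

Pinning the cpu-bound loop and the signal-taking thread to a pair for which `same_core` is true would match the failing case in the post. Pinning them to different cores would match the case reported not to fail.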

svnweb.freebsd.org/base?view=revision&revision=321899

>Yes, I did find what I believe to be a hardware issue with Ryzen related to concurrent operations.

JUST WAIT(TM) FOR MICROCODE PATCHES

August 4, 2017 - 20:13

Gabriel Murphy

>The bug is in Clang but worse.... I can get twice the number of seg faults when using Clang.... 121 per hour... More details tomorrow. been running tons of tests all day.

phoronix.com/forums/forum/phoronix/latest-phoronix-articles/967080-ryzen-test-stress-run-make-it-easy-to-cause-segmentation-faults-on-zen-cpus/page6

Nice try.

August 4, 2017 - 20:15

Blake Baker

>amdpajeets suddenly love microsoft
like pottery

August 4, 2017 - 20:18

Adam Harris

>More details tomorrow.
It is tomorrow, where are the details, Michael?

August 4, 2017 - 20:20

Logan Lopez

That's what you get for buying Rypoo garbage from Poos in Loos

August 4, 2017 - 20:20

David Fisher

Why do you feel the need to put a quote into a really obnoxious "code" box, faggot?

August 4, 2017 - 20:21

Christian Sanders

>linux
That's your problem.

August 4, 2017 - 20:24

Carson Edwards

>muh ryzen moar coars because I can compile gorillion gentoo VMs
>ryzen has critical bug related to heavy tasks like compiling
>w-who gives a shit about loonix, poojeetsoft designated street 10 4lyfe
wew

August 4, 2017 - 20:24

Isaac Flores

>no one at AMD considered testing their processors by compiling a bunch of stuff

lel

August 4, 2017 - 20:28

Kevin Cook

Epyc is dead in the water if they don't fix this

August 4, 2017 - 20:28

Camden Gray

Sounds like a Linux problem

August 4, 2017 - 20:30

Luis Ortiz

>Linux
Same bug has been reproduced by running crypto mining software on Windows

August 4, 2017 - 20:32

Christopher Miller

[citation needed]

August 4, 2017 - 20:38

Daniel Hernandez

reddit.com/r/Amd/comments/6rkrne/ryzentest_stressrun_make_it_easy_to_cause/dl5xmuw/

August 4, 2017 - 20:40

Xavier Ross

>With running a number of new Ryzen Linux tests lately,

Stopped right there. I have nothing to worry about.

August 4, 2017 - 20:41

Robert Rivera

update your gentoo kernel compiler niggy

August 4, 2017 - 20:41

Asher Wilson

>lincucks

try again

August 4, 2017 - 20:43

Caleb Rivera

Daily reminder to report shitposters

August 4, 2017 - 20:46

Hunter Rivera

>I can't take criticism: the post

August 4, 2017 - 20:47

Nolan Gutierrez

>linux tests
Stopped reading there. who gives a fuck?

August 4, 2017 - 20:48

Ryan Morales

Looks to me, like Linux is shit, per usual FOSS can't into code.

August 4, 2017 - 20:48

Levi Clark

pajeet damage controol

August 4, 2017 - 20:50

William Richardson

>IMG_0500
kys amdrone phoneposting scum

August 4, 2017 - 20:50

Christian Sanchez

>Citation
>Anecdote from someone with OCed hardware

This is not a citation

August 4, 2017 - 20:51

Bentley Barnes

Everyone knows it's just a compiler bug producing invalid code
As if AMD wouldn't make such tests before shipping a new processor

August 4, 2017 - 20:52

Jordan Howard

>Everyone knows it's just a compiler bug producing invalid code
nigga it's a hardware bug that needs a microcode patch

August 4, 2017 - 20:53

Jack Hughes

Intelfag desperation is so delicious.

August 4, 2017 - 20:54

Evan Watson

>overclocking CPU could cause system instability
Nothing new.

August 4, 2017 - 20:54

Camden Campbell

>It's bad when Intel does it
arstechnica.com/gadgets/2016/01/intel-skylake-bug-causes-pcs-to-freeze-during-complex-workloads/

>but it's TOTALLY FINE when AMD does it

August 4, 2017 - 20:56

Kayden Cruz

No, it's a software problem that needs a patch

August 4, 2017 - 20:56

Austin Anderson

Stopped reading after linus tech tips, his OS is trash, and worthless.

August 4, 2017 - 20:57

Jayden Sanders

>I am unaware as to how often this occurs with intel processors: The Thread

>Happening status: Not happening, never was happening.

August 4, 2017 - 21:05

Noah Jackson

>It's bad when Intel does it
arstechnica.com/gadgets/2016/01/intel-skylake-bug-causes-pcs-to-freeze-during-complex-workloads/

>but it's TOTALLY FINE when AMD does it

August 4, 2017 - 21:06

Dominic Cooper

A processor doesn't crash retarded OP. It's the OS the one that crashes. Try using a non-hobbyist OS next time.

August 4, 2017 - 21:10

Gavin Lee

>It's bad when Intel does it
arstechnica.com/gadgets/2016/01/intel-skylake-bug-causes-pcs-to-freeze-during-complex-workloads/

>but it's TOTALLY FINE when AMD does it

August 4, 2017 - 21:11

Jacob Ward

>linux
Found the problem

August 4, 2017 - 21:15

Austin Sanders

bait

August 4, 2017 - 21:22

Wyatt Anderson

retard

August 4, 2017 - 21:39

Joshua Johnson

Go away POO IN THE JOO

August 4, 2017 - 21:42

Robert Baker

Sup Forumstards only uses gaymer intel cpu. They don't use gaymer amd cpu.

August 4, 2017 - 21:46

Matthew Martin

>install gentoo on my brand new ryzen computer
>SEGFAULT.ogg starts playing

August 4, 2017 - 22:56

Robert Howard

>buying a meme CPU and using a meme OS

August 4, 2017 - 22:59

Parker Gutierrez

>being a wincuck
Hello, Sup Forums.

August 4, 2017 - 23:13

Dominic Foster

Really surprised they haven't fixed this by now.

August 5, 2017 - 00:13

Justin Powell

This is really to be expected. Ryzen is a completely new design. Remember: If you buy the gen 1 of anything, you're beta testing for the manufacturer.

August 5, 2017 - 01:28

Carson Bailey

...

August 5, 2017 - 01:34

Jackson Gray

Remember that FMA and VME bug a few weeks from launch? Guess what, microcode update.

Worst case, if this can't be fixed by microcode, it can be fixed by a new stepping.

August 5, 2017 - 01:37

Evan Morgan

Good thing I only play games.

August 5, 2017 - 01:38

Ryan James

>Ryzen
>hyperthread
Is this stupid nigger serious? No wonder why they can't get the fucker to work properly, they're trying to use Intel drivers on it.

August 5, 2017 - 01:42

Connor Diaz

AMD sent out 5000 EPYC samples for partner testing between the start of 2017 and Computex. I find it hard to believe companies and AMD weren't aware of this months before Ryzen launched.

August 5, 2017 - 01:43

Robert Sanchez

Does the ayymd damage control team now see the server market as sour grapes?

August 5, 2017 - 03:01

Christopher Anderson

>Running loonix or wangblows
I think I found the problem

August 5, 2017 - 03:05

Ethan Williams

It's like no one remembers the Phenom TLB bug where the only fix was to cripple memory performance. But it's probably because no one bought that pile of shit.

August 5, 2017 - 03:14

Eli Powell

This. Works perfectly on templeOS.

August 5, 2017 - 03:18

Dominic Robinson

>w-who needs to compile shit on server cpu

August 5, 2017 - 03:21

Jacob Watson

Works perfectly with minuet.

August 5, 2017 - 03:24

Ryder Harris

I know one die hard amdfag who bought it and defended it.

August 5, 2017 - 03:27

Julian Morris

>ocaml
>fixed
where's amds update?

August 5, 2017 - 03:28

Evan Thomas

This shit is still present on the new stepping as epycs are affected too.

August 5, 2017 - 03:34

Jace Jenkins

EPYC fail

August 5, 2017 - 03:34

Nathaniel Mitchell

>Ryzen can't into gaming
>Epyc can't into compiling on the OS run by the majority of servers
Quick guess in what way AMD will fuck up threadripper. My bet is on the lack of proper cooling since no copper plate actually covers the IHS

August 5, 2017 - 03:39

Ethan Kelly

>FOSS
>Being smart

chooch one

August 5, 2017 - 03:42

Landon Davis

Noctua is already making heatsinks that cover the entire IHS. AMD is distributing Epyc heatsinks with Threadripper. I think the shitty watercooler is Ayylienware only.

Also, this is going to be fixed like every other issue. Intel had a similar Hyperthreading crash bug recently, too.

August 5, 2017 - 03:43

Tyler Sanchez

Should say "fixed with microcode".

August 5, 2017 - 03:44

James Murphy

that affects ocaml fags only, not the whole Linux stack

August 5, 2017 - 03:47

Aaron Morgan

OCaml discovered it first. Doesn't mean it wouldn't affect other types of software.

August 5, 2017 - 03:48

Hunter Wilson

no, the increased demands of ocaml on the compiler were to blame, honey.

August 5, 2017 - 03:53

Nathan Young

It might just be that they don't give enough voltage to the cpu when all the cores are running at 100%.

It would explain why some people seem to be much more affected than others and why it only happens at high workloads.

August 5, 2017 - 03:53

Owen James

doesn't happen on my r7.

August 5, 2017 - 03:56

Nicholas Price

install Gentoo

August 5, 2017 - 03:58

Jace Moore

...

August 5, 2017 - 04:01

Adam Morales

How quickly we forget. lists.debian.org/debian-devel/2017/06/msg00308.html

August 5, 2017 - 04:01

Brody Ramirez

that's what i'm using.

August 5, 2017 - 04:06

Isaiah Cook

Hyperthreading is dumb.

August 5, 2017 - 04:07

Aaron James

>spaghetti code lincucks have problems with cutting edge hardware
More news at 11

August 5, 2017 - 04:08

Jack Reed

> verified that the microcode fix indeed solved the OCaml issue
>One important point is that the code pattern that triggered the issue in OCaml was present on gcc-generated code. There were extra constraints being placed on gcc by OCaml

August 5, 2017 - 04:09

Ayden Davis

doubtful

August 5, 2017 - 04:10

Nathan Bennett

or sandy bridge fried sata controllers

August 5, 2017 - 04:11

Wyatt Morgan

reddit.com/r/Amd/comments/6rrbsp/epyc_confirmed_to_suffer_from_the_segfault_issue/

AYYMD IS FINISHED & BANKRUPT

AYYMDPOORFAGS CONFIRMED ON SUICIDE WATCH

August 5, 2017 - 04:13

Henry Brooks

SAY GOODBYE TO THE ENTERPRISE MARKET, LISA

August 5, 2017 - 04:14

Jayden Gomez

a single workload on a very specific build that literally has nothing to do with anything serious

clickbait title since clearly he doesn't even know what is going on

but yeah, you know /g/

August 5, 2017 - 04:17

Matthew Powell

>let's make a server cpu but don't test it in server specific workloads
>the dilapidated shanty city of AMD

August 5, 2017 - 04:18

William Gonzalez

t. pajeet damage control centre: Mumbai branch

August 5, 2017 - 04:20

Charles Foster

t. Brian JUSTnich

August 5, 2017 - 04:21

Hunter Ramirez

sucks to be a tool for intel

August 5, 2017 - 04:22

Jeremiah Bennett

Testing is for squares

August 5, 2017 - 04:25

Nathaniel Gray

AMD are incompetent. News at 11.

August 5, 2017 - 04:26

Andrew Carter

It might have something to do with the issues ryzen has with some ram sticks.

August 5, 2017 - 04:29
