Andy Levy wrote:
>
>
> Do you know which core process #3 was running on? Can you run just one
> process for a while and see if you can isolate the issue to a single
> core?
>
I wondered why you asked specifically about process 3 - then I realized
that's the one where I mentioned "hardware failure" - I apologize for not
clarifying... when Prime95 finds a problem, the message is always the same:
"FATAL ERROR: Rounding was [X], expected less than [X]
Hardware Failure detected, consult stress.txt file.
Torture Test ran [X] Minutes - [X] errors, [X] warnings"
This same message displayed for three of the four instances of Prime95
that were running.
As for isolating the cores - I'm not sure it's possible. When running
multiple instances of Prime95, you change the value of "n" in the command
when starting each one up [ -An ]... I used 1, 2, 3, 4....
The cores load in sequence regardless of the numbers I assign. If I
assign 3 to an instance first -- core 1 still fires first... and so forth.
I have now removed a stick of RAM from one of the other boxes - and
put it in the troublemaker. I started running Prime95 with the cores
at 2.0 GHz - and they ran fine past the point where Prime95
reported errors last night. I let it run about 20 mins at 2.0 GHz - I
know they say to run 'em 24 hours at least... but I wanted to see if it
would fail just as fast at 2.0 GHz as it did at 2.83 GHz.... Here's the
thing... I bumped the cores up to 2.83 GHz about 30 mins ago... and no
errors yet.
I'm going to let it run at least 24 hours with this other RAM in
there... see what happens.
--
-----------------------------------------------------------------------------
The white zone is for loading and unloading of passengers only...
-----------------------------------------------------------------------------
This archive was generated by hypermail 2b29 : Mon Dec 01 2008 - 11:09:51 EST