|
Post by turtlesoup on Jan 8, 2007 5:11:10 GMT 7
Just to clarify, I have upgraded to 1.00.05 but I don't have the jumpers installed. I'll post if I experience the degraded state again.
|
|
jk
New Member
Posts: 15
|
Post by jk on Jan 9, 2007 11:55:23 GMT 7
I just found this forum.... I thought I'd share my experience.
I have RAID5 running with 5 x WD5000YS-01MPB0 f/w 07.0 drives.
My RAID will randomly start up degraded, with one drive dropped from the array. I'd say this happens at a rate of about 1 in 5 to 1 in 10 shutdowns. The shutdowns can be from the UI or from a front panel switch press, even a self-imposed shutdown because of a low-battery UPS condition; it doesn't seem to matter.
I have tested both with and without the jumpers installed, but mostly INSTALLED (which is where they are now); they had no discernible effect on the failure rate.
I saw failures in their firmware ver 1.00.03.xx and in 1.00.04.09 (which is what I am running now). It seemed to me .03 failed more often than .04, but it could just be the random behavior of it all.
I have never seen 2 drives dropped from the array, but that is my fear, so I have been reluctant to try the .05 release since it requires a power cycle.
I contacted Thecus on this matter (and a couple of other issues) just prior to the .04 release and got a couple of responses, but they have stopped answering my e-mails.
I never had any luck with the tech support e-mail so I went through the Sales department using sales@ email address. That seemed to get some attention. Maybe if everyone mailed their sales department it would light a fire in their engineering group.
|
|
|
Post by frevel on Jan 9, 2007 15:08:10 GMT 7
Just for information for everyone:
I still have the problem with firmware 1.00.05. It makes no difference which NTP mode I am using. It also makes no difference which Spread Spectrum Mode I am using.
I wrote a mail to the Thecus support and at least got a response that they are working on this issue.
Frevel
|
|
nogami
Junior Member
Posts: 68
|
Post by nogami on Jan 10, 2007 17:19:47 GMT 7
For info, I am using 5x WD5000YS drives and have installed the latest firmware, i.e. 1.00.05. I haven't put any jumpers on the drives, only because I haven't received them yet, but as mentioned the problem appears to have disappeared since the time synchronization was switched to manual. If that was the solution, glad to be of help!
The thing that makes me think time was the problem is that whenever I was getting the "degraded" error before, the timestamp on the degraded drive was always different from the other 4 drives that mounted correctly. I'm not sure, but I suspect that may be how the system validates the drives as part of the same RAID array. It may be that for some reason the correct timestamp isn't being written to one of the drives, so the next time the system is rebooted, it sees one drive with a different time and doesn't bring it into the array.
As I put the jumpers on at the same time, I can't confirm whether it was the jumpers or the time/date that solved the problem, only that since doing both, it hasn't been degraded once, through many reboots/power cycles.
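To illustrate the timestamp theory above: Linux md RAID superblocks do record an "Update Time" field (visible via `mdadm --examine`), and a member whose timestamp disagrees with the rest can be treated as stale and left out when the array is assembled. This is only a sketch of how one could check for that mismatch from saved `--examine` dumps; the helper function and file-based workflow here are assumptions, not anything from the N5200 firmware.

```shell
#!/bin/sh
# Sketch (hypothetical workflow): compare the md superblock "Update Time"
# across dumps of `mdadm --examine` output, one file per member disk.
# A member whose timestamp differs from the rest may be dropped on the
# next assembly of the array.

check_update_times() {
    # $@ = files each holding `mdadm --examine` output for one member disk
    distinct=$(grep -h 'Update Time' "$@" | sort -u | wc -l)
    if [ "$distinct" -eq 1 ]; then
        echo "all members agree"
        return 0
    else
        echo "timestamp mismatch: a member may be dropped on next assembly"
        return 1
    fi
}
```

On a live box one would capture the dumps with something like `mdadm --examine /dev/sdX1 > diskX.txt` for each member, assuming shell access to the unit.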
|
|
|
Post by omega on Jan 10, 2007 18:12:29 GMT 7
But if the degraded disk issue is supposed to be caused only by the time synchronisation settings, why is this issue only affecting installations using these WD5000YS drives?
Andreas
|
|
ss42
New Member
Posts: 16
|
Post by ss42 on Jan 11, 2007 19:55:05 GMT 7
"But if the degraded disk issue is supposed to be caused only by the time synchronization settings, why is this issue only affecting installations using these WD5000YS drives?"
The WD5000YS is a popular enterprise drive that is supposed to work well in RAID. We are trying to capture information that Thecus (or this group) may use to find and fix this problem. It is disappointing that Frevel observed that (WD5000YS + 1.00.05 + no NTP + SS jumpers) still falls out of sync and degrades. We haven't seen a failure since we went no-NTP, but we have only had one cycle on three N5200s. It is entirely possible that the WD5000YS acts in a way that makes this problem surface. That's a clue! Has anyone seen this degraded phenomenon with a drive other than the WD5000YS? And what makes the WD5000YS special? Hopefully Thecus is all over this.
|
|
|
Post by omega on Jan 11, 2007 21:34:50 GMT 7
I've checked on my N5200 what Thecus is doing with the time synchronisation settings... and these are the results:
1) On bootup /app/cfg/rc.local starts /app/bin/ntp_cfg, which itself asks the config DB what NTP server to use and then calls /app/bin/ntpdate.sh. This script more or less simply calls ntpdate to fetch the date from the NTP servers and hwclock --systohc in order to set the hardware clock to the system time.
2) There is an entry in the cron config file /app/cfg/crond.conf which starts /app/bin/ntp_cfg every 4 hours.
3) There is another cron job which calls hwclock --hctosys --localtime hourly, causing the system time to be set from the hardware clock.
4) When shutting down the operating system, the hardware clock is again set to the system time.
These are the time-setting tasks I could find in the system. For me it is really not clear why these tasks should affect the RAID on the WD5000YS drives in any way. But maybe someone has a clue....
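Taken together, the four tasks above would amount to something like the following. This is a reconstruction for illustration only, not verified against the actual firmware; the paths are the ones named above, the NTP server name and cron schedules are assumptions.

```shell
# Hedged reconstruction of the N5200's time-handling tasks described above.

# 1) At boot, /app/cfg/rc.local -> /app/bin/ntp_cfg -> /app/bin/ntpdate.sh,
#    which roughly amounts to:
ntpdate pool.ntp.org        # fetch the date from the configured NTP server (server name assumed)
hwclock --systohc           # copy the (new) system time to the hardware clock

# 2) + 3) Cron entries in /app/cfg/crond.conf, approximately:
# 0 */4 * * *   /app/bin/ntp_cfg                # re-sync via NTP every 4 hours
# 0 *   * * *   hwclock --hctosys --localtime   # hourly: hardware clock -> system time

# 4) At shutdown, the inverse of step 3:
hwclock --systohc           # system time -> hardware clock before power-off
```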
I think it is more likely that the whole issue has something to do with how the system is shut down. Recently I watched the output from the shutdown procedure and it contained too many errors for my taste (e.g. it tries to remove the swapping area from /dev/md0 when it is mounted on /dev/md1 on my box). I'll try to post the output of the shutdown script shortly.
But still, this wouldn't explain why this issue seems to affect only WD drives.
Andreas
|
|
rmrch
New Member
Posts: 3
|
Post by rmrch on Jan 13, 2007 6:24:33 GMT 7
I installed the spread spectrum jumpers a week ago.
This did not solve the problem with the WD5000YS disks discussed here.
The sixth boot since then, after five regular starts, failed and my storage server came up in degraded mode.
That is the usual failure average; there is no improvement.
Any new ideas?
|
|
ss42
New Member
Posts: 16
|
Post by ss42 on Jan 14, 2007 22:21:19 GMT 7
|
|
|
Post by gideon007 on Jan 15, 2007 5:45:50 GMT 7
Well, even though that doesn't help any of you with those disks, it is very reasonable of Thecus to label those disks unstable. This thread is a sad testimony to it.
|
|
|
Post by kniteowl on Jan 15, 2007 12:50:35 GMT 7
As with everyone who is having the problem, I have the following:
RAID5 running 5 x WD5000YS-01MPB0 f/w 07.0 drives.
I have only ever had this problem once and have yet to be able to reproduce it since upgrading to firmware 1.00.05. On my unit, I have NTP time sync set up and no jumpers on the drives. I usually shut down at the end of every day. I am not sure if I have just been lucky or what may be different.
Today, while looking at the WDC site, I noticed that they have new firmware for these drives. The following is what it is supposed to fix:
"This firmware resolves an issue where a WD1600YS, WD2500YS, WD4000YS, or WD5000YS hard drive is dropped from a RAID set without reporting errors after a period of normal usage."
Just for the heck of it, I upgraded the firmware, and now it shows that I am on f/w 09.0 for the drives. Of course, since I have been unable to reproduce the problem, I don't know if this will help, but for those of you that this happens to regularly, if you can give this a try and see if the problems go away and report back, it would be helpful.
Teng
|
|
ss42
New Member
Posts: 16
|
Post by ss42 on Jan 15, 2007 20:28:42 GMT 7
Yuck. Well, I guess an unfortunate solution would be better than no solution at all. We will wait to hear more user or Thecus confirmation that this is indeed the issue. It's a real pain for us with three N5200s online. I guess we would have to pull the drives one by one, do the drive firmware update, and reboot the N5200. This sounds way too risky for the data on the arrays. It would be nice to have another N5200 so that we can do the drive upgrades on an empty array. Here is the WD firmware link: support.wdc.com/download/index.asp?cxml=n&pid=15&swid=57
|
|
|
Post by turtlesoup on Jan 17, 2007 7:16:16 GMT 7
Well, I got motivated this afternoon, so I decided to update the firmware on all my drives. I updated them one by one, starting at the bottom (disk #5). For each disk, I powered down the N5200, took out the drive, updated it, put it back in the N5200, booted back up, and checked the web page to see that the new firmware was reported and there were no errors. Everything was going great until I got to the last drive (disk #1). I upgraded it, placed it back in the N5200, booted up and BLAH..... it came up in degraded mode, not being able to find disk 1. So I'm currently rebuilding the RAID. Sad to say, but it looks like the WD 09.0 firmware doesn't solve this problem.
P.S. I didn't lose any data, but I did back up everything before starting. One of the drives (disk #3) took about twice as long to update as the others.
|
|
|
Post by turtlesoup on Jan 17, 2007 15:07:47 GMT 7
Possible success! I just upgraded to the 1.00.6.5 beta firmware and rebooted the N5200 10 times (alternating shutdown and reboot) with no degraded problems! I'll do some more reboots tomorrow and report back, but now it's time to watch some 24 before bed.
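The reboot test above could be automated if one had shell access to the box: after each boot, check whether any md array came up degraded by parsing /proc/mdstat, where a status like "[5/5] [UUUUU]" means all five members are up and an underscore (e.g. "[4/5] [_UUUU]") marks a dropped disk. This is only a minimal sketch; running it from a boot script on the N5200 itself is an assumption.

```shell
#!/bin/sh
# Sketch: detect a degraded md array by parsing a /proc/mdstat-style file.
# "[5/5] [UUUUU]" = all members up; an underscore in the bracket group,
# e.g. "[4/5] [_UUUU]", marks a dropped member.

mdstat_degraded() {
    # returns 0 (true) if any array in the given mdstat file is degraded
    grep -qE '\[[0-9]+/[0-9]+\] +\[[U_]*_[U_]*\]' "$1"
}
```

On a real system one would call `mdstat_degraded /proc/mdstat` right after boot and log (or email) the result before counting the cycle as a pass.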
|
|
|
Post by ondro727 on Jan 18, 2007 1:55:21 GMT 7
It really may be only those particular WD drives. My N5200 has been power cycled approximately 30 times now without a single drive problem. I'm using 5x 500GB Seagate 7200.10 drives in RAID5.
|
|