Advice on large TB array (MSA-60)

Post by kd1966 » Sun Dec 21, 2008 9:51 pm

Thought this would be the most appropriate place for this....

At work, we have an HP MSA60 array with twelve 750GB SATA drives; we run it as RAID 5, which gives us a single ~7TB backup volume. However, we have had issues with drives failing. Note that the individual drives are SATA, but the MSA60 itself connects to the backup server via SAS cabling.
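For reference, here is a rough sketch of the capacity arithmetic behind the ~7TB figure. The post does not say whether the hot spare is one of the twelve drives, so this hypothetical helper (`usable_bytes`, `describe`) shows both layouts, plus RAID 6 for comparison, assuming decimal vendor gigabytes:

```python
# Sketch: usable capacity of a RAID 5 / RAID 6 set built from identical drives.
# Assumption: drive sizes are vendor decimal units (1 GB = 10**9 bytes).

def usable_bytes(drives: int, drive_gb: int, parity_drives: int) -> int:
    """Usable bytes when `parity_drives` drives' worth of space holds parity."""
    if drives <= parity_drives:
        raise ValueError("need more drives than parity drives")
    return (drives - parity_drives) * drive_gb * 10**9

def describe(drives: int, drive_gb: int, parity_drives: int) -> str:
    """Format usable space in decimal TB and binary TiB."""
    b = usable_bytes(drives, drive_gb, parity_drives)
    return f"{b / 10**12:.2f} TB ({b / 2**40:.2f} TiB)"

# RAID 5 across all 12 drives vs. 11 members plus one hot spare, and RAID 6
print("RAID 5, 12 members:", describe(12, 750, 1))
print("RAID 5, 11 members + spare:", describe(11, 750, 1))
print("RAID 6, 11 members + spare:", describe(11, 750, 2))
```

Either RAID 5 layout lands near 7 binary terabytes (TiB); moving the same 11 drives to RAID 6 costs one more drive's worth of space.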

Aside from the failing drive positions (various locations), we know part of the issue is that the HP array is connected to a Dell server through a Dell SAS RAID card. One of our planned fixes for early next year is to bring it all onto an HP platform so we can get vendor support, which we cannot get now because of the mixed hardware.

I was just wondering if anyone has experience with large multi-TB SATA RAID arrays, and why, at certain times of peak operation (like weekends, when we do full backups of all servers), at least ONE drive will fail. That is not necessarily an issue, since the hot spare kicks in and the array rebuilds, but if another drive fails during the rebuild period, the array is hosed, which is exactly what happened this weekend (again..... doh).
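A rough way to reason about the "second failure during rebuild" risk is a constant-failure-rate model. This sketch is purely illustrative: the function name, the 24-hour rebuild time, and the 1,000,000-hour MTBF are hypothetical inputs, not figures from this array, and it assumes drive failures are independent:

```python
import math

def p_second_failure(surviving_drives: int, rebuild_hours: float, mtbf_hours: float) -> float:
    """P(at least one surviving drive fails during the rebuild window),
    assuming independent failures with a constant (exponential) failure rate."""
    rate = surviving_drives / mtbf_hours        # combined failure rate of the survivors
    return -math.expm1(-rate * rebuild_hours)   # 1 - exp(-rate * t), numerically stable

# Hypothetical inputs: 10 surviving drives, 24-hour rebuild, 1,000,000-hour MTBF
print(f"{p_second_failure(10, 24, 1_000_000):.4%}")
```

Under independence the result is a small fraction of a percent, so repeated double failures like the ones described here point toward something the model leaves out, such as correlated failures in drives of the same age and batch, the extra load of a rebuild during peak backups, or a controller or cabling problem.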

Prior to the failure, we noticed the array was severely fragmented, to the point where our largest backup (~1TB) was taking over 70 hours to complete, most of it in the Verify phase; the average fragmentation was over 4,400 fragments for each file!
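To put the 70-hour figure in perspective, here is the average throughput it implies, assuming ~1TB means 10^12 bytes and treating the backup and Verify passes as a single job (the helper `avg_throughput_mb_s` is hypothetical):

```python
def avg_throughput_mb_s(total_bytes: float, hours: float) -> float:
    """Average throughput in decimal MB/s for a job of the given size and duration."""
    return total_bytes / (hours * 3600) / 10**6

# ~1 TB backup (decimal bytes, assumed) taking 70 hours, as described in the post
print(f"{avg_throughput_mb_s(1e12, 70):.2f} MB/s")
```

That works out to roughly 4 MB/s averaged over the whole job, far below what a single SATA drive can stream sequentially, which is consistent with heavy fragmentation forcing mostly random I/O.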
PRO PLATINUM
Posts: 6831
Joined: Tue Aug 09, 2005 2:00 am
Location: USA - GSO - NC

Post by yeshuas » Sun Dec 21, 2008 10:01 pm

How often do you replace the drives, not because of failure, just as a routine?
Game Over!!!!!!!!
ASUS Maximus V Gene MB
Windows 8 X64; Windows 7 X64; Windows 7 X86
Intel I5-3570K
16GB Corsair Vengeance Ram
eVGA GeForce GTX 550 TI
Corsair GS700 PS
1TB Seagate SATA 6.0Gb HD
Thermaltake Case
Software Development
Posts: 5075
Joined: Wed Jan 17, 2007 3:29 pm
Location: Chicago, IL
Real Name: Daniel Schmidt

Post by yeshuas » Sun Dec 21, 2008 10:15 pm

I guess the point I am getting at is: if the drives are all about the same age and one fails, it is a good idea to replace another one as soon as the RAID has rebuilt, then another, and so on until all the drives are new.

Post by kd1966 » Sun Dec 21, 2008 10:38 pm

When failures occur, we run a diagnostic, but it doesn't turn up anything beyond reporting that the array, card, or drive failed. The diagnostic we use is Dell's, and we cannot use the HP diagnostic until we move everything to the HP platform; getting that support is the main reason for the move. Typically we pull the failed drive, reseat it, and clear the resulting foreign configuration on the drive in order to place it back in hot-spare mode. Also, once we reseat the drive(s) and retest them, they come up without errors... weird.

Post by yeshuas » Sun Dec 21, 2008 10:44 pm

Well, the larger the RAID, the more data is transferred, so the drives see more activity and more wear in a shorter period of time, which means a higher chance of failure.

What is the age of the drives when they fail?

Post by kd1966 » Sun Dec 21, 2008 10:47 pm

We have not replaced any of the drives (they are not your garden-variety <$100 drives), as the company wants to "prove" they are bad before replacing them. We do have 2 spares, which we have rotated in as drives failed. This MSA-60 sat on a shelf for about a year before it was actually installed in June of this year, so we haven't had it for very long.

Post by yeshuas » Sun Dec 21, 2008 11:08 pm

Found this...
When a disk fails in a RAID 5 array and the array has to rebuild, there is a significant chance of a non-recoverable read error during the rebuild (BER / UER). As there is no longer any redundancy, the RAID array cannot rebuild. This does not depend on whether you are running Windows or Linux, hardware or software RAID 5; it is simple mathematics. An honest RAID controller will log this and generally abort, allowing you to restore undamaged data from backup onto a fresh array.
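The "simple mathematics" in that quote can be sketched as follows. It assumes a rebuild has to read every surviving drive end to end and that bit errors are independent; the error rates used (1 in 10^14 bits, typical of desktop SATA datasheets, and 1 in 10^15, typical of enterprise drives) are assumptions, so check the spec sheet for the actual drives. The 10 surviving drives correspond to an 11-drive RAID 5 with one failed member, and the function name is hypothetical:

```python
import math

def p_ure_during_rebuild(surviving_drives: int, drive_bytes: float, bits_per_error: float) -> float:
    """Probability of at least one unrecoverable read error while reading every
    surviving drive end to end, assuming independent bit errors at 1/bits_per_error."""
    bits_read = surviving_drives * drive_bytes * 8
    per_bit = 1.0 / bits_per_error
    # 1 - (1 - p)**n, computed with log1p/expm1 to avoid floating-point rounding loss
    return -math.expm1(bits_read * math.log1p(-per_bit))

drive = 750e9  # 750 GB drives, decimal bytes
for spec in (1e14, 1e15):
    print(f"1 error per {spec:.0e} bits: {p_ure_during_rebuild(10, drive, spec):.1%}")
```

With these assumptions the 10^14 case comes out around 45% and the 10^15 case around 6%. Because the chance grows with the number of bits read, it rises as drives get larger while the error rate stays the same, which is the argument behind the "end of life" of RAID 5 discussion.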

Post by kd1966 » Mon Dec 22, 2008 12:11 am

Yeah, that's pretty much what we found as well. There has been some discussion about the approaching "end of life" of RAID 5 as a viable business solution: array sizes keep increasing, while the drives' unrecoverable read error rate has not improved as SATA drives got larger and larger, leading to more failures. I'm guessing the focus at work now will be to get it into an all-HP environment so we can prove that something is bad or broken and get it fixed. Thanks, Daniel!

Post by mnemonicj » Mon Dec 22, 2008 6:03 am

Would it be possible to use RAID 6 instead? RAID 6 stores two parity blocks per stripe, distributed across all of the disks, so it can lose 1 drive and still be redundant, or lose 2 drives and still have all of the data.

I'm not saying it will solve your problem, but it will save your data. Controllers that support RAID 6 are a little harder to come by, though.
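To illustrate how a second parity block lets RAID 6 survive two failures, here is a toy sketch working on one byte per disk. P is ordinary XOR parity (what RAID 5 uses); Q weights each disk by a different power of 2 in the finite field GF(2^8), so P and Q together give two independent equations for two missing disks. The field polynomial 0x11D and generator 2 are assumptions (they match the scheme Linux software RAID 6 uses); hardware controllers may implement parity differently, and all function names here are hypothetical:

```python
# Toy RAID 6 style P+Q parity over GF(2^8), polynomial 0x11D, generator 2 (assumed).

def gf_mul(a: int, b: int) -> int:
    """Multiply two bytes in GF(2^8) modulo x^8 + x^4 + x^3 + x^2 + 1 (0x11D)."""
    result = 0
    while b:
        if b & 1:
            result ^= a
        a <<= 1
        if a & 0x100:
            a ^= 0x11D  # reduce back into 8 bits
        b >>= 1
    return result

def gf_pow(a: int, n: int) -> int:
    """Raise a to the n-th power by repeated field multiplication."""
    result = 1
    for _ in range(n):
        result = gf_mul(result, a)
    return result

def gf_inv(a: int) -> int:
    """Every nonzero element satisfies a^255 = 1, so a^254 is its inverse."""
    return gf_pow(a, 254)

def make_parity(data: list[int]) -> tuple[int, int]:
    """P is the plain XOR of the data (as in RAID 5); Q weights disk i by 2^i."""
    p = q = 0
    for i, d in enumerate(data):
        p ^= d
        q ^= gf_mul(gf_pow(2, i), d)
    return p, q

def recover_two(data: list, p: int, q: int, x: int, y: int) -> tuple[int, int]:
    """Rebuild lost data disks x and y (x != y) from the survivors plus P and Q."""
    pxy, qxy = p, q
    for i, d in enumerate(data):
        if i not in (x, y):
            pxy ^= d                        # leaves pxy = Dx ^ Dy
            qxy ^= gf_mul(gf_pow(2, i), d)  # leaves qxy = 2^x*Dx ^ 2^y*Dy
    gx, gy = gf_pow(2, x), gf_pow(2, y)
    # 2^y*pxy ^ qxy = (2^x ^ 2^y)*Dx, so divide by (2^x ^ 2^y) to get Dx
    dx = gf_mul(gf_mul(gy, pxy) ^ qxy, gf_inv(gx ^ gy))
    dy = pxy ^ dx
    return dx, dy

stripe = [0x12, 0xAB, 0x07, 0xFF, 0x3C]  # one byte from each of 5 data disks
p, q = make_parity(stripe)
damaged = stripe.copy()
damaged[1] = damaged[3] = None            # two drives lost at once
print(recover_two(damaged, p, q, 1, 3) == (stripe[1], stripe[3]))
```

The final line checks that both lost bytes are rebuilt correctly. With only P (RAID 5), the same stripe could not be rebuilt after losing two disks, because a single XOR equation cannot determine two unknowns.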
PRO Level 15
Posts: 1066
Joined: Tue Aug 17, 2004 1:41 am
Location: Indianapolis, IN

Post by kd1966 » Tue Dec 23, 2008 4:42 pm

We will move to RAID 6 when we move it all to the HP server. I just finished updating the firmware on our Smart Array P800 SAS RAID card, which does support RAID 6, and we don't necessarily "need" 7+ TB of disk space with a TL4000 tape library anyhow...

EDIT: Forgot to say thanks to you guys for responding! I appreciate it.
