This is part of a series of blog entries from StringLiterals.com. In this series, we are sharing the entire process of building a twenty terabyte ZFS file server from scratch. This is part two: Hardware Selection.
Hardware technology moves very fast, necessitating in-depth research with each new generation of hardware. In this article, we will help you understand which decisions need to be made, the order in which to make them, and the important factors to weigh for each decision. Additionally, we will share a few specific hardware examples for each choice. In the end, we’ll share our choices, and see how we performed against our tight budget of $3,500 USD.
Considerations
When choosing hardware for our ZFS server, we must recognize that the considerations to be taken are specific to the task at hand. We are not building a gaming PC, a business workstation, or a virtualization host. Our goal is to assemble a file server utilizing ZFS technology. Therefore, we take the following into consideration at each step:
- stability
- interoperability
- cost
- I/O throughput
- expandability
- compatibility with OpenSolaris or FreeBSD
Through the course of trial and error, we discovered that there is a very specific order in which you should make hardware decisions when building a storage machine. The order is different from the one you would follow when building a desktop PC. With a gaming system, for example, you typically decide on a graphics card and desired CPU first, and then build the system around those choices.
With a storage server, we start with the disks and work our way up through the interfaces to the memory and CPU, and then out over the network card. This is the path along which data will flow. One poorly made decision can easily push the price tag up by thousands of dollars. Some of this is because of the scale: when buying twenty hard drives, a component price difference of $60 adds up to $1,200. Another factor that drastically influences cost is storage connectivity. Building a system around the wrong motherboard, for example, might force us into choosing among very expensive disk controller cards. It pays to be aware of all of your options.
Let’s start with the hard drives.
Hard Drives
We chose the Western Digital 1.0 TB “Black” edition drive. We’ve previously used the WD 1TB RE3 “RAID Edition” with great success, so part of this decision is about brand comfort. The reason we’ve changed from the RE3 to the Black is a curious one: since buying the RE3s (at a $70 premium each), we’ve learned that the only important difference between the “RE3” and the “Black” edition is a firmware setting that can be manually changed. This firmware setting, called Time Limited Error Recovery (TLER), controls how long a single drive will spend attempting to read a sector.
While it might be fine for a standalone drive to spend twenty seconds to two minutes attempting to recover the data, this leads to trouble in a RAID-like setting. If the disk controller waiting on the drive times out before the drive itself gives up on a sector, the entire drive will be marked as bad and dropped from the pool. We much prefer that the drives rapidly report a read failure, so that the ZFS system can quickly reassemble the missing data from parity on the fly. We wrote an earlier post about how to ready a WD Black drive for RAID use. Similar technology exists for other brands. It’s called Command Completion Time Limit (CCTL) for Samsung and Error Recovery Control (ERC) for Seagate. Knowledge of how this feature works is the most critical concern when considering the use of large capacity consumer-grade hard drives in any sort of RAID configuration.
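The race between the two timeouts can be sketched in a few lines of Python. This is only an illustration of the reasoning above: the 7-second recovery limit and the 8-second controller timeout are assumed values, not figures taken from any particular drive or controller, and the 120-second case is the upper end of the range mentioned above.

```python
def outcome(drive_recovery_limit_s, controller_timeout_s):
    """Describe what happens when a drive hits an unreadable sector.

    drive_recovery_limit_s: how long the drive keeps retrying before it
        reports a read error (the value that TLER/CCTL/ERC caps).
    controller_timeout_s: how long the controller waits for a reply
        before it considers the whole drive dead.
    """
    if drive_recovery_limit_s < controller_timeout_s:
        # The drive answers with an error in time; the pool rebuilds the
        # sector from parity and the drive stays in service.
        return "read error reported; sector rebuilt from parity"
    # The controller gives up first and fails the entire drive.
    return "drive dropped from the pool"


CONTROLLER_TIMEOUT_S = 8  # assumed value, for illustration only

print("recovery capped at 7 s:", outcome(7, CONTROLLER_TIMEOUT_S))
print("recovery up to 120 s:  ", outcome(120, CONTROLLER_TIMEOUT_S))
```

The point of the sketch is simply that the drive's limit must be shorter than the controller's, whatever the real numbers turn out to be.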
Disk Chassis: Internal vs External
Selecting a chassis for the disks all comes down to a trade-off between expandability and cost. On one end of the spectrum, we have internal hard drives mounted in the same case as the server. This is currently the most affordable way to go, but you can quickly run into a brick wall once your case is full of drives. Another option is to use an external storage chassis. External bays come in three basic varieties:
Disk Chassis: External Options
- SATA Port Multipliers
- SAS Multilane Enclosures
- SAS Expander Enclosures
SATA multipliers are by far the cheapest solution, but there’s a catch. We ruled out this approach fairly quickly, in light of the fact that this architecture only allows the controller to communicate with one drive at a time. This limitation is present because the drives must be able to act as though they have sole access to the controller. This means the controller must ask for a piece of information and wait for the drive to provide it before moving on to the next drive to request the next piece. This would be detrimental to performance.
SAS Multilane enclosures, sometimes marked as “SAS JBOD”, have two advantages. First, the controller can communicate with multiple drives concurrently. Second, these chassis can be connected via a single MiniSAS connector per set of four drives. The trade-off here is that SAS controllers are relatively expensive, and you tend to fully consume the capacity of a controller with only a few drives. The chassis themselves are affordable. Here is an example 8-bay SAS JBOD enclosure for $469. (This blog has no affiliation with PC-Pitstop.)
SAS Expander enclosures are the third and by far the preferred option when it comes to expandability. These can be daisy chained to support up to 128 drives on a single MiniSAS channel. You spend far less money on controllers, since a 4-port or 8-port SAS controller can easily drive 128 to 255 devices. For very high density setups, this is your only real choice, as simply adding controller cards is not an option when you rapidly run out of expansion slots on the motherboard.
A while ago, we built a hardware RAID array using an Adaptec 5805 SAS controller, with the hope of adding drives up to the sky-high limit of 256 devices via the magic of SAS expanders. So what was the catch? Stand-alone SAS expanders are simply not available on the market. The only place to find them is in the backplane of hot-swap cases, and they are very expensive compared to JBOD chassis. The best deal I’ve found with this technology is a 15-bay SAS expander enclosure for $1395 from PC-Pitstop. iStarUSA has a great selection of storage chassis, including the V-Storm series, but I’ve been unable to find these for retail sale.
Disk Chassis: Internal Options
We’re not the only ones vexed by the lack of options for storage-oriented server chassis. With enough searching, we were able to narrow the field down to three viable chassis, each at a no-brainer price point:
Norco RPC-4020 – 4U 20 bay SATA case – $279
- Pro: Extremely affordable!
- Con: Supports ATX, but not Extended ATX server motherboards
- Pro: Backplane takes twenty individual SATA connectors, meaning you can use cheaper SATA II controllers, including those included on most motherboards
- Con: Backplane takes twenty individual SATA connectors, making for needless cable spaghetti should we choose a disk controller with SAS multilane connectors
SuperMicro CSE-846TQ-R900B Rackmount 24 bay – $949
- Pro: Moderately affordable; includes a redundant power supply
- Con: Power supply does not have an eight pin motherboard adapter needed by 5500 series Xeon boards; adapter available
- Pro: Uses SAS multilane cables – great for cable management if we use a SAS controller card with multilane or minisas connectors
- Con: Cannot use cheaper SATA connectors, ruling out the use of drive controllers built-in to motherboards
YMI Rackmount Pro 9U – $3,689
- Pro: The only manufacturer I could find of extremely large storage cases.
- Pro: If you anticipate needing 50 hot swap bays in your server chassis, this is really your only option.
The Norco RPC-4020 case is really the gem at our price point. For $279 we get twenty hot swap bays and 3 internal bays. As much as we would like the added space of having 24 bays, we found that there are extremely few options. The price premium for the four additional bays on the SuperMicro is not as steep as it first looks once you factor in the high quality redundant power supply that comes with it; but we would still be sacrificing the flexibility to use cheap SATA controllers due to the SAS backplane. 20 bays may seem awkward if you’re used to building RAID arrays in sets of 8 drives, but upon further contemplation we found that this case gives quite a few nice options for raidz structure:
If cashflow is tight, we could build our array slowly by using three sets of six drives, each in RAID-Z. This leaves two hot swap bays available for the operating system, which will be a mirrored set. The trade-off of building in three “chunks” of six is that we are dedicating three drives to parity, but we’re not at the point where we can tolerate any two drives failing. Yes, we could tolerate 2 or even 3 failures, but only if we’re lucky enough for the failures to take place in separate chunks of the array. We’re more interested in limiting the worst-case scenario. A second drive failure within any given set of six and we would lose the entire array.
Another alternative is to build two sets of nine. With RAID-Z2 this yields 4 disks of parity for 18 disks, but gives us the benefit of being able to lose any two drives in the array, at the “cost” of only one additional drive of parity. There is a performance penalty in terms of IO operations per second (IOPS) when using larger clusters of drives in a single stripe, which we will discuss in more detail when we go to configure the ZFS zpool.
A third option is to build the entire array at once, in which case we could consider making a 17-disk RAID-Z2 array and dedicating one of the three remaining bays to a hot spare. The hot spare ensures that we quickly return to full redundancy after a single drive failure. This solution also has the same 15 drives of usable space as in scenario #1. There is a negative performance implication to one large set of drives, but we get higher effective storage (losing less capacity to parity) while maintaining great tolerance of drive faults.
More possibilities present themselves should we decide to forgo using the hot swap bays for the operating system root partition, and move those drives to the two internal HD brackets. By using all 20 bays for the array, we have more symmetrical options, such as 4 raidz virtual devices with 5 drives each, a configuration we anticipate will be ideal for high IOPS performance.
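To keep the trade-offs straight, the four layouts discussed above can be tallied with a short Python sketch. The `Layout` class and its field names are our own invention for illustration; the numbers are simply the disk counts from the text, and “data disks” means raw usable drives before any filesystem overhead.

```python
from dataclasses import dataclass


@dataclass
class Layout:
    name: str
    vdevs: int           # number of raidz virtual devices in the pool
    disks_per_vdev: int
    parity: int          # 1 for RAID-Z, 2 for RAID-Z2
    hot_spares: int = 0

    @property
    def data_disks(self):
        return self.vdevs * (self.disks_per_vdev - self.parity)

    @property
    def parity_disks(self):
        return self.vdevs * self.parity

    @property
    def bays_used(self):
        return self.vdevs * self.disks_per_vdev + self.hot_spares

    @property
    def guaranteed_failures(self):
        # Worst case: every failed drive lands in the same vdev.
        return self.parity


layouts = [
    Layout("3 x 6-disk RAID-Z", 3, 6, 1),
    Layout("2 x 9-disk RAID-Z2", 2, 9, 2),
    Layout("1 x 17-disk RAID-Z2 + spare", 1, 17, 2, hot_spares=1),
    Layout("4 x 5-disk RAID-Z", 4, 5, 1),
]

for layout in layouts:
    print(f"{layout.name:30} bays={layout.bays_used:2} "
          f"data={layout.data_disks:2} parity={layout.parity_disks} "
          f"survives any {layout.guaranteed_failures} failure(s)")
```

The sketch deliberately reports only the guaranteed worst case; it says nothing about IOPS, which depends on the number of vdevs and is a separate question.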
We will try many of these configurations and analyze each in a later post. If you would like to read ahead, we recommend the ZFS Best Practices Guide.
Controller Cards
Because we are building a ZFS array, we have many choices when it comes to disk controllers. We are no longer constrained to selecting fast hardware RAID controllers with a hunk of NVRAM and a BBU. There are three basic choices when it comes to controller cards:
- SATA controllers built into the motherboard
- SATA controllers on expansion cards
- SAS controllers on expansion cards
The SATA controllers included on motherboards have one big limitation: quantity. Most motherboards support only six SATA connectors. There are a handful of enthusiast and server boards that provide support for 8 to 10 drives. Notable among these is the Asus P5Q, which we likely would have selected had we gone with the Intel Core2 platform, largely due to its excellent reported compatibility with OpenSolaris.
Standalone SATA controllers are an affordable option. Densities typically range from 2 to 8 devices per expansion card, and can be had for less than $100 each. These also have the cost-savings benefit of working with cases that have older SATA backplanes. Both SATA controllers and SAS controllers can be connected to SATA backplanes, but SATA controllers cannot be connected to SAS multi-lane backplanes.
SAS controllers are our third option, and are definitely the way of the future. The benefits include easy cabling with MiniSAS SFF-8088 connectors, and near unlimited expandability both internally and externally. The downside is that SAS controllers are much more expensive than SATA. Tomshardware.com has an excellent overview of SAS technology.
When previously building our hardware RAID array (the server we are replacing with this ZFS machine), we went with the Adaptec 5805 SAS controller, which is an extremely fast controller at the street price of around $500. Such SAS controllers are the preferred solution in two situations. The first is where as much performance as possible must be squeezed out of 4 to 8 drives in a cost effective manner. The second is for storage systems that must scale well above 24 drives, when the price of SAS Expander technology is a non-issue. We also recommend the 24 port Areca ARC-1680IX-24-2 and the Dell PERC 5/i, which is a certified component on the OpenSolaris Hardware Compatibility List.
Device controllers can very easily be one of the most expensive components of a storage system, second only to the disk drives.
There is one more important consideration when it comes to drive controllers, and that is the issue of redundancy. It’s possible to make a ZFS pool that can survive the failure of any one disk controller. This is achieved by making sure that no two drives within a single raidz virtual device are hosted on the same controller. If this level of redundancy is required, we would recommend purchasing five controllers with each one controlling four of the twenty drives. This is far preferable to using one large controller. We will cover this topic again when we go to set up the RAIDZ structure in ZFS.
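As a rough sketch of that rule, the Python below spreads twenty drives across five four-port controllers so that each raidz virtual device takes exactly one drive from every controller, then checks that losing any single controller costs each vdev no more than its parity allows. The `(controller, port)` tuples and the `c0p0` style labels are hypothetical names for illustration, not real Solaris device names.

```python
CONTROLLERS = 5           # five controllers, as suggested above
PORTS_PER_CONTROLLER = 4  # four drives on each


def build_vdevs(controllers, ports):
    """Group drives so each vdev takes exactly one drive per controller.

    Drives are identified as (controller, port). Port p of every
    controller is assigned to vdev p.
    """
    return [[(c, p) for c in range(controllers)] for p in range(ports)]


def survives_controller_loss(vdevs, parity=1):
    """True if any single controller failure costs each vdev <= parity drives."""
    controllers = {c for vdev in vdevs for c, _ in vdev}
    return all(
        sum(1 for c, _ in vdev if c == failed) <= parity
        for failed in controllers
        for vdev in vdevs
    )


vdevs = build_vdevs(CONTROLLERS, PORTS_PER_CONTROLLER)
for i, vdev in enumerate(vdevs):
    print(f"raidz vdev {i}: " + ", ".join(f"c{c}p{p}" for c, p in vdev))
print("survives any one controller failing:", survives_controller_loss(vdevs))
```

This arrangement yields four single-parity vdevs of five drives each. With fewer controllers than drives per vdev, the same check would fail unless the parity level were raised.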
With our budget, we chose to use a few cheap SATA II controllers with reported OpenSolaris compatibility. We sacrificed performance for price by using the older 133 MHz, 64-bit PCI-X bus. To fit the bill we ordered two of the SuperMicro AOC-SAT2-MV8, available for $99 on NewEgg.com. Each of these can drive 8 SATA drives. We will initially use the motherboard to drive the remaining 4 of the 20 drives.
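To get a feel for what the PCI-X choice costs us, here is a back-of-the-envelope Python sketch of the bus ceiling. It computes only the theoretical peak of a 64-bit bus clocked at 133 MHz; real throughput will be lower, and whether both cards end up sharing one bus depends on the motherboard, which is an assumption we have not verified.

```python
def pcix_peak_mb_per_s(clock_mhz=133.33, bus_width_bits=64):
    """Theoretical peak of a parallel PCI-X bus: one transfer per clock."""
    return clock_mhz * bus_width_bits / 8


def per_drive_share(bus_mb_per_s, drives):
    """Even split of the bus when every drive streams at the same time."""
    return bus_mb_per_s / drives


peak = pcix_peak_mb_per_s()
print(f"bus peak:                      {peak:.0f} MB/s")
print(f"8 drives on one bus:           {per_drive_share(peak, 8):.0f} MB/s each")
print(f"16 drives sharing one bus:     {per_drive_share(peak, 16):.0f} MB/s each")
```

If sequential reads from a single drive exceed its share, the bus rather than the disks becomes the limit during full-array operations such as a scrub.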
CPU: Intel vs AMD
This is a decision often made via personal preference, so I will not attempt to persuade the reader in one direction. I will merely state that my preference is for Intel. My decision is based largely on two factors: performance per kilowatt, and the choice of motherboards. By these measures, Intel pulled ahead of AMD with the introduction of the Core 2 architecture, and has been ahead ever since.
Memory: ECC vs non-ECC
We do not want to fall victim to the handful of random data corruptions that happen on a typical module of memory each year. You can blame pesky cosmic rays for such random memory bit flips. If anything, our location in Denver, Colorado only makes this more important, as cosmic rays find their way to earth more frequently in the mile-high city.
The ECC feature uses banks of memory to store parity information. A process will continually scrub the memory, and is capable of correcting any one error per 64-bit word of memory.
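To show how a few extra check bits can locate and repair a single flipped bit, here is a toy Python sketch using the classic Hamming(7,4) code. This is an illustration of the principle only: ECC modules protect each 64-bit word with 8 additional check bits, and we make no claim that any memory controller uses this exact code.

```python
def hamming74_encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (positions 1..7).

    Parity bits sit at positions 1, 2 and 4; data bits at 3, 5, 6 and 7.
    """
    d3, d5, d6, d7 = data_bits
    p1 = d3 ^ d5 ^ d7   # covers positions 1, 3, 5, 7
    p2 = d3 ^ d6 ^ d7   # covers positions 2, 3, 6, 7
    p4 = d5 ^ d6 ^ d7   # covers positions 4, 5, 6, 7
    return [p1, p2, d3, p4, d5, d6, d7]


def hamming74_correct(codeword):
    """Return (data bits, position of the repaired bit, or 0 if none)."""
    c = list(codeword)
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s4 = c[3] ^ c[4] ^ c[5] ^ c[6]
    syndrome = s1 + 2 * s2 + 4 * s4   # 1-based position of the bad bit
    if syndrome:
        c[syndrome - 1] ^= 1          # flip it back
    return [c[2], c[4], c[5], c[6]], syndrome


word = [1, 0, 1, 1]
stored = hamming74_encode(word)
stored[5] ^= 1                        # simulate a bit flip at position 6
recovered, position = hamming74_correct(stored)
print("recovered:", recovered, "repaired position:", position)
```

A code like this repairs one flipped bit per codeword; two flips in the same codeword would be miscorrected, which is why practical schemes add a further check bit to at least detect double errors.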
Memory: Registered vs Unbuffered
This choice is thankfully a non-decision. The only reason to choose registered memory is if the motherboard requires it in order to reach the memory densities we require. One benefit of the modern architectures is a very high density of natively accessed memory. The on-chip memory controller has sufficient voltage to operate an entire bank of RAM in capacities of several dozen gigabytes. Very rarely these days do we see the need for a register to sit between the memory controller and the memory banks to relay instructions. This is a good thing, because a registered memory module will take an extra clock cycle to do the necessary relaying of instructions, slowing down system performance in the area where we can least afford it.
Processor: Core i7 vs Xeon
What’s fascinating about this particular decision is that it appears to come down to a pure question of performance vs reliability. The question of value can easily be brushed aside, because we have the novelty of a Xeon 5506 processor priced at the same point as the Core i7 920 processor. So with the dollars even on both sides of the comparison, let’s look at some specifications:
At first, this masquerades as a fairly easy decision. Both processors are based on the same architecture, the Nehalem CPU core. Although the Xeon line is marketed for servers and workstations, and the i7 towards the desktop market, we must look beyond the marketing and assess exactly what we get with each product. At this price point, the i7 actually has more muscle in nearly every regard: both a higher clock speed and more on-die cache. The higher cache of the i7 is a bit of a surprise, as this is usually a benefit of the Xeon lineup.
However, the decision becomes black-and-white once we take into consideration one very important piece of information: the Core i7 does not support ECC memory. In previous architectures, ECC support was a matter of motherboard choice, because the memory controller was located on the north-bridge chipset. With the i7/5500 series architectures, the CPU contains the memory controller, and thus we have no choice but to disqualify the i7 and adopt the Xeon. We are not going to go to great lengths to set up integrity safeguards on disk only to be lax about the integrity of the data once it sits in RAM.
Memory Type: DDR2 vs DDR3
Because memory bandwidth is the limiting factor in most server operations, it’s important to seek the highest performing memory architecture. Our decision of memory type is a straightforward one. Both the Core i7 and Xeon CPUs support triple channel DDR3 memory. This is the best bus arrangement currently available on the x86 platform. It’s also the chief reason we decided to go with the new i7/5500 architecture instead of something older. The lower bandwidth, dual channel memory architecture of the Core 2 platform is a more serious hindrance than the lesser number crunching power of older CPUs. The only thing to remember is that we must install this memory in matched sets of three to take advantage of the triple channel architecture.
Memory Speed: 800 vs 1066 vs 1333
With memory speed, faster is usually better. If we could drop 1333 MHz memory into this system, we would do it in a heartbeat. Unfortunately, our choice of processor had an unintentional side effect, at least at the lower price points. The dirty truth, buried on page 11 of the Xeon 5500 series specification, is that not all 55xx processors support the highest speed memory. The exact memory speeds supported by the Xeon differ as follows:
- 5502 through 5506 only support 800 MHz RAM
- 5520 through 5540 support 800 and 1066 MHz RAM
- 5550 through 5580 support 800, 1066, and 1333 MHz RAM
Because of our price constraints, we have selected the Xeon 5506 for our new server. This means we must be content with memory running at 800 MHz. Because fast memory is so affordable, we’ll go ahead and buy RAM capable of performing at 1333 MHz. This way, once the prices of the 5550 through 5580 processors become more reasonable, we can drop in an upgraded CPU and immediately get the faster performance from the memory bus as well.
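The upgrade reasoning can be put into numbers with a small Python sketch. The lookup table follows the model ranges listed above, with the top tier taken as 1333 to match the memory we are buying. The bandwidth figure is a theoretical peak (8 bytes per transfer on each 64-bit channel), and the sketch ignores any limits that depend on how many DIMMs are installed per channel. The function names are ours, for illustration.

```python
# Maximum supported memory speed by Xeon 5500 model range, per the list above.
XEON_5500_MAX_MEMORY_SPEED = [
    (5502, 5506, 800),
    (5520, 5540, 1066),
    (5550, 5580, 1333),
]


def effective_memory_speed(xeon_model, dimm_rating):
    """Memory runs at the slower of the CPU limit and the DIMM rating."""
    for low, high, cpu_max in XEON_5500_MAX_MEMORY_SPEED:
        if low <= xeon_model <= high:
            return min(cpu_max, dimm_rating)
    raise ValueError(f"model {xeon_model} is not in the table")


def peak_bandwidth_gb_per_s(speed, channels=3, bytes_per_transfer=8):
    """Theoretical peak across all channels, in GB/s."""
    return speed * bytes_per_transfer * channels / 1000


for model in (5506, 5550):
    speed = effective_memory_speed(model, dimm_rating=1333)
    print(f"Xeon {model}: memory at {speed}, "
          f"peak {peak_bandwidth_gb_per_s(speed):.1f} GB/s over three channels")
```

Run with a 1333-rated DIMM, the sketch shows the same modules moving from the 800 tier to the 1333 tier purely as a result of the CPU swap.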
Memory Model and Voltage
Once we know the type of memory bus and memory speed, we have one very important decision remaining. The choice of memory model and voltage is much more important with Nehalem core CPUs than it was in recent history. Core i7 systems are quickly becoming notorious for instability. It appears that the primary cause for such instability is poorly matched memory. We’re going to play it safe, and limit our memory to modules that are either on the motherboard manufacturer’s supported memory list, or those that have been reported as tested and working by the community at large. For this reason, we delayed the choice of the individual memory module until after we had selected the motherboard.
Motherboard
Once we’ve made all the component decisions above, it should be a fairly simple matter of finding a motherboard that adequately connects all the components. In this case, we need two PCI-X slots, support for the Intel Xeon 5500 series processor, support for at least four SATA drives on the onboard controller, and an ATX form factor. One weakness of the Norco case we selected is that it does not support the larger EATX form factor motherboards. We also have a preference for all Intel components on the motherboard, especially for the network controllers. These tend to be faster and more reliable than the off-brand network controllers. Intel network controllers are supported by OpenSolaris, but they’re harder to find.
Plugging these search criteria into NewEgg yielded our prize: the SuperMicro X8SAX motherboard. As a bonus, this board provides three of the newer PCI-e slots. This gives us a clear upgrade path should we wish to attach pricier SAS disk controllers in the future.
This concludes our component-by-component tour. Let’s look at the final list and bill.
Summary
Here is the list of components, along with a brief review of the deciding factors. If you skipped reading the wall of text, above, please know that there is certainly more than one valid choice for each of these components. We highly advise against simply ordering what we have ordered. (For one thing, we haven’t gotten far enough in our build to confirm that they indeed work together in OpenSolaris.) Please use this guide to help you make your own decisions to best fit your particular needs.
Disks: 1TB Western Digital Black Drives – $100 each x 20 = $2000
- High density; but still within SATA spec
- Trusted brand
- Can have their firmware changed to act like more expensive “Raid Edition” RE3 drives
Chassis: Norco RPC-4020 4U Rack Case – $279
- Hot swap cages for 20 drives
- Significantly cheaper than external drive enclosures
- Allows for direct SATA connectors (no immediate need for SAS multilane cards)
Controllers: SuperMicro AOC-SAT2-MV8 PCI-X 8 port SATA controller – $99 each x2 = $198
- Cheaper than PCI-e SAS controllers
- Still relatively fast
CPU: Intel Xeon 5506 – $269
- Supports ECC where Core i7 does not
Memory: 12 GB in two 6 GB DDR3 kits: Crucial 1.5 V 1333 MHz CAS 9 ECC – $108 each x2 = $216
- ECC is a must for data integrity
- 1.5 V is important for motherboard compatibility
- Will run at only 800 MHz with the Xeon 5506; but we can get 1333 MHz by dropping in an X5550 later
- On the “tested memory” list for our motherboard.
Motherboard: Supermicro MBD-X8SAX-0 – $260
- Rare combination of 2x PCI-X and 3x PCI-e
- Allows upgrade path to eSAS controllers in PCI-e later on
- All-Intel chipset, including Intel gigabit LAN
- Good reliability reports from NewEgg
- ATX form factor important
- 6 onboard SATA II ports, enough to drive remaining 4 hot swap bays + 2 internal HD’s
Power Supply: PC Power & Cooling 910 Watt – $170
- Single 12 volt rail to best handle drive spin-up
- High count of molex power adapters to sufficiently power our SATA backplane case
- 24 pin, 8 pin, and 4 pin motherboard connectors for use with SuperMicro X8SAX motherboard
This brings our total bill to $3,392 – safely within our $3,500 budget. Stay tuned as we discover how well these parts work together in OpenSolaris.
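For anyone adapting this list, the arithmetic is easy to re-run with a few lines of Python. The part names are abbreviated by us; the prices and quantities are the ones listed above.

```python
# (description, unit price in USD, quantity) from the summary above
parts = [
    ("Disks: WD Black 1TB", 100, 20),
    ("Chassis: Norco RPC-4020", 279, 1),
    ("Controllers: AOC-SAT2-MV8", 99, 2),
    ("CPU: Intel Xeon 5506", 269, 1),
    ("Memory: 6 GB DDR3 ECC kit", 108, 2),
    ("Motherboard: X8SAX", 260, 1),
    ("Power supply: 910 W", 170, 1),
]
BUDGET = 3500

total = sum(price * qty for _, price, qty in parts)
for name, price, qty in parts:
    print(f"{name:28} {qty:2} x ${price:>4} = ${price * qty:>5}")
print(f"{'Total':28} ${total}  (budget ${BUDGET}, ${BUDGET - total} to spare)")
```

Swapping in a different drive price makes the effect of scale obvious, since the disks account for well over half of the bill.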
Hopefully this post has helped you navigate your way through the maze of decisions required to build a medium sized white-box ZFS server. Our next post will cover the OpenSolaris installation process. We’ll then stop to take an in-depth look at the design decisions for setting up ZFS, and walk through each command required to assemble our twenty drives into a single pool of storage. We’ll then compare and contrast multiple configuration options, run benchmarks, and select an implementation to keep.
Please subscribe to this blog via RSS to be notified of the next part in this series.
#1 by Tan Chin Huat on July 2nd, 2009
Enjoyed the article. Very well written. Very detailed hardware requirements analysis. Sadly missing SSDs. It would be nice to see performance figures using SSDs. Also how you will configure RAM as primary cache (ARC) and SSDs as secondary cache (L2ARC).
Regards,
TCH
#2 by Tan Chin Huat on July 2nd, 2009
Ben, I was thinking, maybe you should tell DELL you are writing this article about building a 20TB file server using OpenSolaris and ZFS. Tell them you need to loan DELL hardware with the following configuration
1) 1 x R710 2U Server with
a) 1 x 2.93 GHz Intel 5570 CPU
b) 3 x 4GB 1333 MHz RAM
c) 2 x 73GB 15000 RPM Disk Drives
d) 2 x 25GB SSD (for ZFS Intent Log)
e) 4 x 50GB SSD (for ZFS L2ARC)
f) 1 x SAS HBA for External Disk Storage
2) 1 x DELL PowerVault MD1000 with
a) 6 x 500 GB 7200 RPM SATA II Drives for RAIDZ
b) 1 x 500 GB 7200 RPM SATA II Drive for Hot Spare
It will be a good advertising gimmick for them. You get to test out the latest and fastest hardware. If readers out there know it is already a working system, they would want to consider buying DELL systems. I for one am considering purchasing DELL but haven’t got the comfort level and expertise yet. It will be a win-win situation. Think about it.
Regards,
TCH
#3 by Benjamin Tiefert on July 2nd, 2009
TCH:
Thanks for the great suggestions. I suspect you will see at least a handful of lessons-learned out of this current exercise. I fully expect to have bandwidth limitations in our current setup. The first limiting factor will be the PCI-X bus. We’ll likely upgrade to PCI-e. Then the network interface will likely be our next bottleneck. Once we have these two taken care of, I, too, look forward to seeing what type of speedup we get from dedicated read and write caches. At this juncture, we’re extremely cost-restricted. As we build out this array, I hope it will be helpful to illustrate how each of these bottlenecks quantitatively affects performance.
As for begging a loaner system from Dell, that sounds like a great idea once I have a little more traffic to this new site. At this point, my selfish concern is more about fixing our file server shortage. If things go well we might focus on more hardware at a later time. My real passion is for software, though.
Thanks,
Ben Tiefert
#4 by Nick on August 12th, 2009
This is such a great article. LOADS of valuable info here. Kudos for putting the effort to share.
I plan on setting up a similar box, but at my home as a file / media server. I had a couple of questions…
1) Is it plausible to stream decent sized media files (.mp4, .mp3, .avi) over NFS or Samba on a ZFS system with RAIDZ? Will there be latency issues?
2) What are your thoughts on going with FreeBSD as opposed to OpenSolaris?
#5 by Benjamin Tiefert on August 12th, 2009
Nick,
Thanks for the comment. In response to your question about streaming media files, the short answer is yes, absolutely. The network is the limiting factor in the equation. I’m currently getting about 350 megabits per second reading files over my gigabit network using the built-in CIFS (Samba-like functionality, but in-kernel). Even high definition content tops out at about 30 megabits per second of signal, and DVD-quality content uses less than 10 megabits per second of bandwidth. Latency is not an issue for streaming. If you plan to use this as a media server, I highly recommend looking into a set-top box like a Popcorn Hour.
As for FreeBSD: I understand that it has a first-class implementation of ZFS, so it’s likely a fine candidate. I prefer System V to BSD style systems as a personal preference. The only feature I would look into specifically is Windows file sharing support. OpenSolaris has in-kernel CIFS, which barely stresses the CPU. I’m not a big fan of Samba as it can really peg the CPU when you’re pushing serious amounts of data through. If the machine is a dedicated NAS for less than ten people, this might not be an issue, but it’s something to be aware of and plan accordingly.
Hope this helps,
Ben
#6 by Grant on September 2nd, 2009
Ben,
What graphics card did you use?
#7 by kiscica on September 2nd, 2009
Awesome – is there any plan to continue this series? I’m in the middle of replacing my four 12-bay Norco PATA->Firewire JBODs hooked up to 2 Linux servers with a new setup: I just ordered 12 2TB drives and the 20-bay Norco case. Haven’t ordered the controllers/motherboard/CPU/memory/etc. yet. I wasn’t initially planning on using ZFS, which I’m not terribly familiar with, but this blog is pretty convincing. Now I’m considering taking the plunge with OpenSolaris but I’d love to hear how your project is working out first…
#8 by Tom on September 13th, 2009
Hi Ben,
Great article, I really appreciated the detailed walkthrough of the comparisons you made with the trade-offs involved. It helps people easily make the appropriate selections for their personal application. I have been looking into upgrading my RAID 0 system to RAID 6 or zfs for about a year now and this was so very helpful.
It has been a couple of months since you posted this article and I wanted to check in and see if you were planning to continue the series. If you are still planning to write them, the “Operating System Install” and “Command-by-command zfs setup” I expect would be extremely helpful to many people.
One of the major difficulties in using zfs seems to be end-to-end explanations on how to setup a system and configure zfs. Many people (such as myself) are windows and/or linux guys, and how to build a system on OpenSolaris from end-to-end, along with full confidence on how to replace broken drive and expand the drive pool in an OpenSolaris environment, isn’t really available. How you created the system for the performance numbers you posted would help a lot of people migrate from RAID based home systems into zfs.
Thanks again for the great article.
#9 by Matt on October 14th, 2009
Wow I had no idea the i7 didn’t support ECC memory. I guess Intel assumes everyone is running 200 VMs under Vmotion/HA so who cares about ECC(?). In light of that, I wonder why you wouldn’t consider AMD as your CPU. Until the i7, AMD owned the memory bandwidth speed benchmarks over Intel. And now that i7, the core where Intel finally got into the idea of putting the MMU on-die, doesn’t support ECC, that would be enough for me to go back to AMD. You earlier had mentioned compatibility between OpenSolaris and i7, but OpenSolaris/10/11 whatever it is has long since had AMD K8 optimizations built in. In a file server, I’m not sure why one would choose FPU ability over memory bandwidth. Aren’t you doing the majority of the disk control via controllers? And by ditching the i7 option (and I agree there) you’re back in Northbridge and Southbridge country. And now with Phenom II, AMD supports DDR3 up to DDR3-1333. So now you can get the best of all worlds: on-die MMU, ECC support, dump No/So bridge like a bad habit, and Solaris optimizations (at least the powernow for K10 in Nevada rev 80, not sure about core optimizations). But anyway, please keep updating this series, nice work! thx!
#10 by Leo on November 19th, 2009
Hi Ben,
Awesome article series. I look forward to further reading.
I am curious about something though. I get the impression that you will be running OpenSolaris on this server. Why not Solaris?
I am asking because I tried to find information on installing OpenSolaris “server-like”, i.e not with all the fancy GUI stuff and extra software, but that didn’t seem possible without a lot of fiddling. However, with Solaris this would be the natural setup, and it too carries ZFS as far as I know.
TYIA
#11 by Bill Hathaway on December 10th, 2009
Thanks for the very detailed writeup! I was wondering about the memory speed for your configuration though. I understand you will be currently limited to 800Mhz by your processor choice, but the Dell guide to Nehalem memory states “As soon as you put one more DIMM in any memory channel (two DIMMs in that memory channel) on any socket, the speed of the memory drops to 1066 MHz”, so it sounds like even if you popped a 5550 into the case you would still be limited to 1066 MHz by using a 3 channel * 2 DIMM config. I am not sure if there is any material savings in slower/cheaper memory, but thought I would point that out.
#12 by Izaak on February 3rd, 2010
Great article. I’ve had a very similar experience recently and arrived at many of the same conclusions you have.
For the sake of discussion, I would argue a little further for the affordability of SAS expanders.
For example, take a look at the Supermicro SC846E1: http://www.supermicro.com/products/chassis/4U/846/SC846E1-R1200.cfm
* 4U
* 24x 3.5″ SAS/SATA bays
* includes 2x redundant power supplies
* LSI SAS expander backplane w/ support for cascading and controller failover
* supports E-ATX
$1372 CAD (I’m Canadian) from frontierpc.com: http://www.frontierpc.com/product/Supermicro/CSE-846E1-R900B.html
I would say for any important storage system you really want redundant power supplies. They are probably the most common part to fail after hard drives. Also, no matter how much fault tolerance you have, losing power everywhere at once can cause weird things to happen. Unpredictable surges etc., potentially causing damage to more than one drive at once.
For a controller, try the LSI SAS3442E-R, which can be had for $259 CAD at newegg: http://www.newegg.ca/Product/Product.aspx?Item=N82E16816118101&cm_re=lsi_sas-_-16-118-101-_-Product
* pci-e
* internal and external SAS connectors (unfortunately the external one is an InfiniBand-style connector, requiring a slightly annoying but simple mini-SAS adapter)
* “Allows more than 122 total end-point devices”
I’m currently building this kind of system but smaller (12x bays), using a JBOD power card in place of a motherboard, so the LSI controller sits in a 1U Sun server we already own.
The issue I am incredibly confused about is drive support in current generation SAS expanders. If you actually ask, Supermicro will send you a compatibility list (not available on their web site) of drives the SAS expander supports. In some cases they even require certain firmwares, and all of the models are enterprise.
I would love to know if a WD Black with CCTL tweaked could be used with such a backplane.
In our case we’re buying expensive Seagate ES.2 drives.
We don’t have the time or money to potentially waste on experimenting, but I hope someone else does.
Hope you’ll continue the series.
#13 by Larry on April 25th, 2010
This is such a great article. LOADS of valuable info here. Kudos for putting in the effort to share.
I plan on setting up a similar box, but at my home as a file / media server. I had a couple of questions…
1) Is it plausible to stream decent sized media files (.mp4, .mp3, .avi) over NFS or Samba on a ZFS system with RAIDZ? Will there be latency issues?
2) What are your thoughts on going with FreeBSD as opposed to OpenSolaris?
#14 by Charles on April 26th, 2010
Wow, I had no idea the i7 didn’t support ECC memory. I guess Intel assumes everyone is running 200 VMs under VMotion/HA, so who cares about ECC(?). In light of that, I wonder why you wouldn’t consider AMD as your CPU. Until the i7, AMD owned the memory bandwidth benchmarks over Intel. And now that the i7, the core where Intel finally got into the idea of putting the memory controller on-die, doesn’t support ECC, that would be enough for me to go back to AMD. You mentioned earlier the compatibility between OpenSolaris and the i7, but OpenSolaris/Solaris 10/11, whatever it is, has long since had AMD K8 optimizations built in. In a file server, I’m not sure why one would choose FPU ability over memory bandwidth. Aren’t you doing the majority of the disk control via controllers? And by ditching the i7 option (and I agree there) you’re back in Northbridge and Southbridge country. And now with Phenom II, AMD supports DDR3 up to DDR3-1333. So now you can get the best of all worlds: an on-die memory controller, ECC support, dumping the north/south bridge like a bad habit, and Solaris optimizations (at least PowerNow for K10 in Nevada rev 80; not sure about core optimizations). But anyway, please keep updating this series, nice work! Thanks!