Storage Technology

Ars Technica has an interesting article on the background of SAN and NAS storage, how it works, and how Isilon's scale-out storage systems work.

Big data meets big storage: an in-depth look at Isilon's scale-out storage solution

We've gone far afield of the original subject here, so let's swing back to it. As I said a few paragraphs back, Isilon makes a scale-out NAS product that you'd use to hold files. What makes Isilon different from other NAS systems, though, is that it's particularly suited for providing very fast access to very large files, which makes it a shoo-in for the entertainment industry and for other areas where you need to manipulate big file-based data sets.

An Isilon system consists of a minimum of three Isilon nodes, each of which is an individual server running "OneFS," a heavily modified FreeBSD derivative, and connected to the others through a fast private IP network. Instead of Ethernet, this private network uses InfiniBand for its transport, which Isilon chose because of the extremely low latency it offers versus traditional Ethernet. Unlike a regular SAN or NAS, the nodes all work together in a grid to present a single volume, equal in size to the aggregate capacity of all the nodes' disks. As nodes are added to the system, their capacities are seamlessly added to the one big volume, without the need to futz around with RAID groups or LUNs or mapping or masking or storage pools or anything else, really. This one single huge volume is Isilon's core feature—even with all the features and tools that an enterprise NAS system gives you, there aren't many NAS systems that can present a single, unified multi-petabyte namespace.

Although ZFS exists in an operating system whose future is at risk, it is easily one of the most advanced, feature-rich file systems in existence. It incorporates variable block sizes, compression, encryption, de-duplication, snapshots, clones, and (as the name implies) support for massive capacities. Get to know the concepts behind ZFS and learn how you can use ZFS today on Linux using Filesystem in Userspace (FUSE).
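
To make those concepts concrete, here is a minimal sketch of the standard zpool/zfs commands that exercise a few of the features mentioned above; the pool and dataset names (tank, tank/home) and the disk devices are placeholders:

    # create a mirrored pool from two (placeholder) disks
    zpool create tank mirror /dev/sdb /dev/sdc

    # create a dataset and turn on compression and deduplication
    zfs create tank/home
    zfs set compression=on tank/home
    zfs set dedup=on tank/home

    # take a snapshot and clone it into a writable copy
    zfs snapshot tank/home@before-upgrade
    zfs clone tank/home@before-upgrade tank/home-test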

IBM gives some background information and tips on installing zfs-fuse on Linux: Run ZFS on Linux
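
As a rough sketch of what that article walks through, on an RPM-based distribution the FUSE port can typically be installed and started along these lines (package and service names vary by distribution, so treat them as assumptions and follow the article for the details):

    # install the FUSE port of ZFS and start its userspace daemon
    yum install zfs-fuse
    service zfs-fuse start

    # from here the usual zpool/zfs commands work, e.g.
    zpool status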

If you want ZFS to run natively, then this guide explains how to set up native ZFS on Ubuntu/Linux: Native ZFS On Ubuntu
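
For the native route, the Ubuntu setup has typically gone through the zfs-native PPA; roughly as follows (the PPA and package names are from memory, so verify them against the guide):

    # add the native ZFS PPA and install the kernel modules and tools
    add-apt-repository ppa:zfs-native/stable
    apt-get update
    apt-get install ubuntu-zfs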

These are the top five Ask the Expert answers of 2010, to help you learn about the most important data storage best practices for your organization. Read about storage management tools and technologies such as multiprotocol or unified storage, data migration, centralized data storage, and cloud storage.

  • How can multiprotocol storage arrays benefit SMBs?
  • What are some good data migration strategies for SMBs?
  • How does unified data storage apply to SMBs?
  • What type of centralized data storage should I use for 1.5 TB of data across multiple locations?
  • Is it possible to use a cloud storage service for primary storage?

Read the answers: Data storage best practices: Top five storage management answers

zxfer, available under the BSD license, is a handy and promising ZFS tool. It transfers ZFS filesystems, snapshots, properties, files, and directories with a single command, while offering end-to-end data-integrity assurance similar to that of the ZFS filesystem itself.

Its more detailed features include (a usage sketch follows the list):

  • recursive transfer of filesystems
  • minimal dependencies
  • transfer of filesystem properties, with the option to override specified properties (e.g. compression, copies, dedup)
  • backup of the original properties to a file so they can be restored later
  • transfer via rsync
  • deletion of snapshots on the destination that are not present on the source, and transfer from the latest common snapshot
  • a comprehensive man page
  • a beep when done
  • compatibility with FreeBSD, OpenSolaris, and Solaris 11 Express
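
As a rough illustration of what a single-command transfer looks like, a recursive replication might be invoked as below; the flag spellings are from memory and the pool names and backup host are placeholders, so check zxfer's man page before relying on them:

    # recursively replicate storage/home with its snapshots and
    # properties to a local backup pool
    zxfer -dFkPv -R storage/home backup

    # the same transfer pushed to a remote machine over ssh
    zxfer -dFkPv -T root@backuphost -R storage/home backup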

This tutorial shows how to combine four single storage servers (running Fedora 12) into one distributed, replicated storage system with GlusterFS. Nodes 1 and 2 (replication1) as well as nodes 3 and 4 (replication2) will mirror each other, and replication1 and replication2 will be combined into one larger storage server (distribution). Basically, this is RAID10 over the network. If you lose one server from replication1 and one from replication2, the distributed volume continues to work. The client system (Fedora 12 as well) will be able to access the storage as if it were a local filesystem. GlusterFS is a clustered file system capable of scaling to several petabytes. It aggregates various storage bricks over InfiniBand RDMA or TCP/IP interconnect into one large parallel network file system. Storage bricks can be made of any commodity hardware such as x86_64 servers with SATA-II RAID and an InfiniBand HBA.

Distributed Replicated Storage Across Four Storage Nodes With GlusterFS On Fedora 12
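
The tutorial itself builds the volume from hand-written volfiles, as the GlusterFS release of that era required; on newer GlusterFS releases the same two-by-two layout can be created with the gluster CLI, roughly like this (host names, brick paths, and the volume name are placeholders):

    # on one node, join the other three into the trusted pool
    gluster peer probe server2
    gluster peer probe server3
    gluster peer probe server4

    # four bricks with "replica 2" gives two mirrored pairs with data
    # distributed across them (the "RAID10 over the network" layout)
    gluster volume create testvol replica 2 transport tcp \
        server1:/data/brick server2:/data/brick \
        server3:/data/brick server4:/data/brick
    gluster volume start testvol

    # on the client, mount it like a local filesystem
    mount -t glusterfs server1:/testvol /mnt/glusterfs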