iSCSI details

Posted by Diwakar

I have collected some useful information on the iSCSI driver, as requested by some readers. I hope it helps with your iSCSI questions. Leave a comment if you find it useful. I will try to write an iSCSI overview in the next entry. Happy reading!

The iSCSI driver provides a transport for SCSI requests and responses to storage devices via an IP network instead of using a direct attached SCSI bus channel or an FC connection. The SN 5400 Series Storage Router, in turn, transports these SCSI requests and responses received via the IP network between it and the storage devices attached to it. Once the iSCSI driver is installed, the host will proceed with a discovery process for storage devices as follows:
1. The iSCSI driver requests available targets through the SendTargets discovery mechanism as configured in the /etc/iscsi.conf configuration file.
2. Each iSCSI target sends available iSCSI target names to the iSCSI driver.
3. The iSCSI driver discovery daemon process looks up each discovered target in the /etc/iscsi.bindings file. If an entry exists in the file for the target, the corresponding SCSI target ID is assigned to the target. If no entry exists for the target, the smallest available SCSI target ID is assigned and an entry is written to the /etc/iscsi.bindings file. The driver then sends a login request to the iSCSI target.
4. The iSCSI target accepts the login and sends target identifiers.
5. The iSCSI driver queries the targets for device information.
6. The targets respond with the device information.
7. The iSCSI driver creates a table of available target devices.
Once the table is completed, the iSCSI targets are available for use by the
host using all the same commands and utilities as a direct attached (e.g., via
a SCSI bus) storage device.
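The binding lookup in step 3 can be sketched as follows. This is only an illustration of the described behavior, not the driver's code: the function name `assign_target_id` is hypothetical, a plain dictionary stands in for the contents of /etc/iscsi.bindings, and bus numbers are ignored for simplicity.

```python
def assign_target_id(bindings, target_name):
    """Return the SCSI target ID for target_name, adding a binding if needed.

    `bindings` maps iSCSI target names to SCSI target IDs. It is a
    hypothetical in-memory stand-in for /etc/iscsi.bindings.
    """
    if target_name in bindings:
        # An entry already exists: reuse the recorded target ID.
        return bindings[target_name]

    # No entry: pick the smallest target ID that is not already taken.
    used = set(bindings.values())
    target_id = 0
    while target_id in used:
        target_id += 1

    # Record the new binding so later lookups return the same ID.
    bindings[target_name] = target_id
    return target_id
```

A known target keeps its ID, while a new target fills the lowest gap in the existing assignments.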


- All Linux kernels released on or before Feb 4, 2002 have a known bug in the buffer and page cache design. When any writes to a buffered block device fail, it is possible for the unwritten data to be discarded from the caches, even though the data was never written to disk. Any future reads will get the prior contents of the disk, and it is possible for applications to get no errors reported.
This occurs because block I/O write failures from the buffer cache simply mark the buffer invalid when the write fails. This leaves the buffer marked clean and invalid, and it may be
discarded from the cache at any time. Any future read either finds no existing buffer or finds the invalid buffer, so the read will fetch old data from disk and place it in the cache. If the fsync(2) function initiated the write, an error may be returned. If memory pressure on the cache initiated the write, the unwritten buffer may be discarded before fsync(2) is ever called, and in that case fsync will be unaware of the data loss, and will incorrectly report success. There is currently no reliable way for an application to ensure that data written to buffered block devices has actually been written to disk. Buffered data may be lost whenever a buffered
block I/O device fails a write. The iSCSI driver attempts to avoid this problem by retrying disk
commands for many types of failures. The MinDiskCommandTimeout defaults to "infinite", which disables the command timeout, allowing commands to be retried forever if the storage device is unreachable or unresponsive.
- All Linux kernels up to and including 2.4.20 have a bug in the SCSI device initialization code. If kernel memory is low, the initialization code can fail to allocate command blocks needed for proper operation, but will do nothing to prevent I/O from being queued to the non-functional device. If a process queues an I/O request to a SCSI device that has no command blocks allocated, that process will block forever in the kernel, never exiting and ignoring all signals sent to it while blocked. If the LUN probes initiated by the iSCSI driver are blocked forever by this problem, it will not be possible to stop or unload the iSCSI driver, since the driver code will still
be in use. In addition, any other LUN probes initiated by the iSCSI driver will also block, since any other probes will lock waiting for the probe currently in progress to finish. When the failure to allocate command blocks occurs, the kernel will log a message similar to the following:
***************************************************************
kernel: scsi_build_commandblocks: want=12, space for=0 blocks
***************************************************************
In some cases, the following message will also be logged:
***************************************************************
kernel: scan_scsis: DANGER, no command blocks
***************************************************************
- Linux kernels 2.2.16 through 2.2.20 and 2.4.0 through 2.4.18 are known to have a problem in the SCSI error recovery process. In some cases, a successful device reset may be ignored and the SCSI layer will continue on to the later stages of the error recovery process. The problem occurs when multiple SCSI commands for a particular device are queued in the low-level SCSI driver when a device reset occurs. Even if the low-level driver correctly reports that all the commands for the device have been completed by the reset, Linux will assume only one command has been completed and continue the error recovery process. (If only one command has timed out or failed, Linux will correctly terminate the error recovery process following
the device reset.) This action is undesirable because the later stages of error recovery may send other types of resets, which can affect other SCSI initiators using the same target or other targets on the same bus. It is also undesirable because there are more serious bugs in the later stages of the Linux SCSI error recovery process. The Linux iSCSI driver now attempts to avoid this problem by replacing the usual error recovery handler for SCSI commands that time out or fail.
- Linux kernels 2.2.16 through 2.2.20 and 2.4.0 through 2.4.2 may take SCSI devices offline after Linux issues a reset as part of the error recovery process. Taking a device offline causes all I/O to the device to fail until the HBA driver is reloaded. After the error recovery process does a reset, it sends a SCSI Test Unit Ready command to check if the SCSI target is operational
again. If this command returns SCSI sense data, instead of correctly retrying the command, Linux will treat it as a fatal error, and immediately take the SCSI device offline.

The Test Unit Ready will almost always be returned with sense data because most targets return a deferred error in the sense data of the first command received after a reset. This is a way of telling the initiator that a reset has occurred. Therefore, the affected Linux kernel versions almost always take a SCSI device offline after a reset occurs.
This bug is fixed in Linux kernels 2.4.3 and later. The Linux iSCSI driver now attempts to avoid this problem by replacing the usual error recovery handler for SCSI commands that time out or fail.
- Linux kernels 2.2.16 through 2.2.21 and 2.4.0 through 2.4.20 appear to have problems when SCSI commands to disk devices are completed with a check condition/unit attention containing deferred sense data. This can result in applications receiving I/O errors, short reads or short writes. The Linux SCSI code may deal with the error by giving up reading or writing the first buffer head of a command, and retrying the remainder of the I/O.
The Linux iSCSI driver attempts to avoid this problem by translating deferred sense data to current sense data for commands sent to disk devices.
- Linux kernels 2.2.16 through 2.2.21 and 2.4.0 through 2.4.20 may crash on a NULL pointer if a SCSI device is taken offline while one of the Linux kernel's I/O daemons (e.g. kpiod, kflushd, etc.) is trying to do I/O to the SCSI device. The exact cause of this problem is still being investigated.
Note that some of the other bugs in the Linux kernel's error recovery handling may result in a SCSI device being taken offline, thus triggering this bug and resulting in a Linux kernel crash.
- Linux kernels 2.2.16 through 2.2.21 running on uniprocessors may hang if a SCSI disk device node is opened while the Linux SCSI device structure for that node is still being initialized.
This occurs because the sd driver which controls SCSI disks will loop forever waiting for a device busy flag to be cleared at a certain point in the open routine for the disk device. Since this particular loop will never yield control of the processor, the process initializing the SCSI disk device is not allowed to run, and the initialization process can never clear the device busy flag which the sd driver is constantly checking.
A similar problem exists in the SCSI generic driver in some 2.4 kernel versions. The sg driver may crash on a bad pointer if a /dev/sg* device is opened while it is being
initialized.
- Linux kernels prior to 2.4.20-8 (Red Hat 9 distribution) have a rare data corruption problem. The corrupted data can be buffer cache data as well as raw I/O data. The problem occurs when the iSCSI driver sends the I/O request down to TCP. The Linux iSCSI driver handles this by temporarily copying the incoming I/O buffer into an internal buffer and then sending the copied data down to TCP, which keeps the original data intact. If the data sent is corrupted (which is detected by turning on CRC), the driver repeats the foregoing process.
The iSCSI Driver Version 3.2.1 for Linux is compatible with SN 5400 Series Storage Routers running software version 3.x or greater. It is not compatible with SN 5400 Series Storage Routers running software versions 1.x or 2.x.
===============================================================================
CONFIGURING AND USING THE DRIVER
===============================================================================
This section describes a number of topics related to configuring and using the iSCSI Driver for Linux. The topics covered include:
Starting and Stopping the iSCSI driver
Rebooting Linux
Device Names
Auto-Mounting Filesystems
Log Messages
Dynamic Driver Reconfiguration
Target Portal Failover
iSCSI HBA Status
Using Multipath I/O Software
Making Storage Configuration Changes
Target and LUN Discovery Limits
Dynamic Target And LUN Discovery
Persistent Target Binding
Target Authentication
Editing The iscsi.conf File
iSCSI Commands and Utilities
Driver File Listing
--------------------------------------
STARTING AND STOPPING THE iSCSI DRIVER
--------------------------------------
To manually start the iSCSI driver enter:
/etc/init.d/iscsi start
The iSCSI initialization will report information on each detected
device to the console or in dmesg(8) output. For example:

********************************************************************
Vendor: SEAGATE Model: ST39103FC Rev: 0002
Type: Direct-Access ANSI SCSI revision: 02
Detected scsi disk sda at scsi0, channel 0, id 0, lun 0
SCSI device sda: hdwr sector= 512 bytes.
Sectors= 17783240 [8683 MB] [8.7 GB]
sda: sda1
********************************************************************
The directory /proc/scsi/iscsi will contain a file (the controller
number) that contains information about the iSCSI devices.

To see the iSCSI devices currently available on this system, use the utility:
/usr/local/sbin/iscsi-ls -l
If there are problems loading the iSCSI kernel module, diagnostic information will be placed in /var/log/iscsi.log.
To manually stop the iSCSI driver enter:
/etc/init.d/iscsi stop
When the driver is stopped, the init.d script will attempt to kill all processes using iSCSI devices by first sending them "SIGTERM" and then by sending any remaining processes "SIGKILL". The init.d script will then unmount all iSCSI devices in /etc/fstab.iscsi and kill the iSCSI daemon, terminating all connections to iSCSI devices. It is important to note that the init.d script may not be able to successfully unmount filesystems if they are in use by processes that can't be killed. It is recommended that you manually stop all applications using filesystems on iSCSI devices before stopping the driver. Filesystems not listed in /etc/fstab.iscsi will not be unmounted by the script and should be manually unmounted prior to a system shutdown.
It is very important to unmount all filesystems on iSCSI devices before stopping the iSCSI driver. If the iSCSI driver is stopped while iSCSI devices are mounted, buffered writes may not be committed to disk and file system corruption may occur.
---------------
REBOOTING LINUX
---------------
The Linux "reboot" command should not be used to reboot the system while iSCSI devices are mounted or being used since the "reboot" command will not execute the iSCSI shutdown script in /etc/rc6.d/ and file system corruption may occur. To safely reboot a Linux system, enter the
following command:
/sbin/shutdown -r now
All iSCSI devices should be unmounted prior to a system shutdown or reboot.
------------
DEVICE NAMES
------------
Because Linux assigns SCSI device nodes dynamically whenever a SCSI logical unit is detected, the mapping from device nodes (e.g., /dev/sda or /dev/sdb) to iSCSI targets and logical units may vary.
Variations in process scheduling and network delay may result in iSCSI targets being mapped to different SCSI device nodes every time the driver is started. Because of this variability, configuring applications or operating system utilities to use the standard SCSI device nodes to access iSCSI devices may result in SCSI commands being sent to the wrong target or logical unit.
To provide a more reliable namespace, the iSCSI driver scans the system to determine the mapping from SCSI device nodes to iSCSI targets, and then creates a tree of directories and symbolic links under /dev/iscsi to make it easier to use a particular iSCSI target's logical units.
Under /dev/iscsi, there will be a directory tree containing subdirectories for each iSCSI bus number, each target id number on the bus, and each logical unit number for each target. For
example, the whole disk device for bus 0, target id 0, LUN 0 would be /dev/iscsi/bus0/target0/lun0/disk.
In each logical unit directory there will be a symbolic link for each SCSI device node that may be connected to that particular logical unit. These symbolic links are modeled after the Linux
devfs naming convention.
The symbolic link 'disk' will map to the whole-disk SCSI device node
(e.g., /dev/sda, /dev/sdb, etc.).
The symbolic links 'part1' through 'part15' will map to each
partition of that SCSI disk (e.g., /dev/sda1, /dev/sda15, etc.).
Note that these links will exist regardless of the number of disk partitions. Opening the partition devices will result in an error if the partition does not actually exist on the disk.
The symbolic link 'mt' will map to the auto-rewind SCSI tape device node for this LUN (e.g., /dev/st0), if any. Additional links for 'mtl', 'mtm', and 'mta' will map to the other auto-rewind devices (e.g., /dev/st0l, /dev/st0m, /dev/st0a), regardless of whether these device nodes actually exist or could be opened.
The symbolic link 'mtn' will map to the no-rewind SCSI tape device node for this LUN (e.g., /dev/nst0), if any. Additional links for 'mtln', 'mtmn', and 'mtan' will map to the other no-rewind devices (e.g., /dev/nst0l, /dev/nst0m, /dev/nst0a), regardless of whether those device nodes actually exist or could be opened.
The symbolic link 'cd' will map to the SCSI cdrom device node for this LUN (e.g., /dev/scd0), if any.
The symbolic link 'generic' will map to the SCSI generic device
node for this LUN (e.g., /dev/sg0), if any.
Because the symlink creation process must open all of the SCSI
device nodes in /dev in order to determine which nodes map to
iSCSI devices, you may see many modprobe messages logged to syslog
indicating that modprobe could not find a driver for a particular
combination of major and minor numbers. This is harmless, and can
be ignored. The messages occur when Linux is unable to find a
driver to associate with a SCSI device node that the iSCSI daemon
is opening as part of its symlink creation process. To prevent
these messages, the SCSI device nodes with no associated high-level
SCSI driver can be removed.
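The /dev/iscsi naming scheme described in this section can be illustrated with a small helper. It only builds path strings following the layout given above; the function names `partition_link` and `iscsi_link_path` are hypothetical and are not part of the driver.

```python
def partition_link(number):
    """Return the symlink name for a partition ('part1' through 'part15')."""
    if not 1 <= number <= 15:
        raise ValueError("partition links exist only for part1 through part15")
    return f"part{number}"


def iscsi_link_path(bus, target, lun, link="disk"):
    """Build the /dev/iscsi path for a logical unit and one of its symlinks.

    `link` is the symlink name inside the logical unit directory, for
    example 'disk', 'generic', 'mt', or a name from partition_link().
    """
    return f"/dev/iscsi/bus{bus}/target{target}/lun{lun}/{link}"
```

For example, `iscsi_link_path(0, 0, 0)` gives the whole-disk path from the text, and `iscsi_link_path(0, 1, 2, partition_link(3))` gives the third partition on bus 0, target 1, LUN 2.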
-------------------------
AUTO-MOUNTING FILESYSTEMS
-------------------------
Filesystems installed on iSCSI devices cannot be automatically mounted at
system reboot due to the fact that the IP network is not yet configured at
mount time. However, the driver provides a method to auto-mount these
filesystems as soon as the iSCSI devices become available (i.e., after the IP
network is configured).
To auto-mount a filesystem installed on an iSCSI device, follow these steps:
1. List the iSCSI partitions to be automatically mounted in
/etc/fstab.iscsi which has the same format as /etc/fstab. The
/etc/fstab.iscsi file will not be overwritten when the driver is
installed nor will removing the current version of the driver delete
/etc/fstab.iscsi. It is left untouched during an install.
2. For each filesystem on each iSCSI device, enter the logical volume on
which the filesystem resides. The mount points must exist for the
filesystems to be mounted. For example, the following /etc/fstab.iscsi
entries will mount the two iSCSI devices specified (sda and sdb):
*************************************************************************
#device     mount     FS     mount      backup      fsck
#to mount   point     type   options    frequency   pass
/dev/sda    /mnt/t0   ext2   defaults   0           0
/dev/sdb    /mnt/t1   ext2   defaults   0           0
*************************************************************************
3. Upon a system restart, the iSCSI startup script invokes the
iscsi-mountall script, which tries to mount the iSCSI devices listed in
the /etc/fstab.iscsi file. iscsi-mountall tries to mount the iSCSI devices
for "NUM_RETRIES" (default value 10) number of times, at an interval of
"SLEEP_INTERVAL" seconds (default value 1) between each attempt, giving
the driver the time to establish a connection with an iSCSI target.
The value of these parameters can be changed in the iscsi-mountall script
if the devices are not getting configured in the system within the
default time periods.
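The retry behavior described in step 3 can be sketched as below. The real iscsi-mountall is a script whose internals are not shown here, so this is only a model of the described loop: `mount_with_retries` and its `try_mount` callback are hypothetical names, and the defaults mirror the stated NUM_RETRIES (10) and SLEEP_INTERVAL (1 second).

```python
import time


def mount_with_retries(try_mount, num_retries=10, sleep_interval=1,
                       sleep=time.sleep):
    """Call try_mount() up to num_retries times, pausing between attempts.

    try_mount is a caller-supplied function returning True on success.
    Returns the 1-based number of the successful attempt, or 0 if every
    attempt failed. `sleep` is injectable so the loop can be tested
    without real delays.
    """
    for attempt in range(1, num_retries + 1):
        if try_mount():
            return attempt
        if attempt < num_retries:
            # Give the driver time to establish the target connection.
            sleep(sleep_interval)
    return 0
```

With the defaults, a device that never appears is abandoned after 10 attempts and 9 one-second pauses, which is why slow targets may need larger values.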
Due to variable network delays, targets may not always become available in the
same order from one boot to the next. Thus, the order in which iSCSI devices
are mounted may vary and may not match the order the devices are listed in
/etc/fstab.iscsi. You should not assume mounts of iSCSI devices will occur in
any particular order.
Because of the variability of the mapping between SCSI device nodes
and iSCSI targets, instead of directly mounting SCSI device nodes,
it is recommended to either mount the /dev/iscsi tree symlinks,
mount filesystem UUIDs or labels (see man pages for mke2fs, mount,
and fstab), or use logical volume management (see Linux LVM) to
avoid mounting the wrong device due to device name changes resulting
from iSCSI target configuration changes or network delays.
------------
LOG MESSAGES
------------
The iSCSI driver contains components in the kernel and user level.
The log messages from these components are sent to syslog. Based on the
syslogd configuration on the Linux host, the messages will be sent to the
appropriate destination. For example, if /etc/syslog.conf has the following
entry:

*.info /var/log/messages
then all log messages of level 'info' or higher will be sent to
/var/log/messages.

If /etc/syslog.conf has the following entry:
*.info;kern.none /var/log/messages
then all log messages (except kernel messages) of level info or higher
will be sent to /var/log/messages.
If /etc/syslog.conf has the following entry:
kern.* /dev/console
then all kernel messages will be sent to the console.
All messages from the iSCSI driver when loading the iSCSI kernel
module will be placed in /var/log/iscsi.log.
The user can also use dmesg(8) to view the log messages.
------------------------------
DYNAMIC DRIVER RECONFIGURATION
------------------------------
Configuration changes can be made to the iSCSI driver without having to stop
it or reboot the host system. To dynamically change the configuration of the
driver, follow the steps below:
1. Edit /etc/iscsi.conf with the desired configuration changes.
2. Enter the following command:
/etc/init.d/iscsi reload
This will cause the iSCSI daemon to re-read /etc/iscsi.conf file and to
create any new DiscoveryAddress connections it finds. Those discovery
sessions will then discover targets and create new target connections.
Note that any configuration changes will not affect existing target sessions.
For example, removal of a DiscoveryAddress entry from /etc/iscsi.conf
will not cause the removal of sessions to targets discovered through this
DiscoveryAddress, but it will cause the removal of the discovery session
corresponding to the deleted DiscoveryAddress.
----------------------
TARGET PORTAL FAILOVER
----------------------
Some SN 5400 Series Storage Routers have multiple Gigabit Ethernet ports.
Those systems may be configured to allow iSCSI target access via multiple
paths. When the iSCSI driver discovers targets through a multi-port SN 5400
Series system, it also discovers all the IP addresses that can be used to
reach each of those targets.
When an existing target connection fails, the iSCSI driver will attempt to
connect to that target using the next available IP address. You can also
choose a preferred portal to which the iSCSI driver should attempt to connect
when the iSCSI driver is started or whenever automatic portal failover
occurs. This is significant in a situation when you want the connection
to the targets to be made through a faster network portal (for example, when
the I/Os are going through a Gigabit Ethernet interface and you do not
prefer the connection to failover to a slower network interface).
The preference for portal failover can be specified through the
"PreferredPortal" or "PreferredSubnet" parameter in /etc/iscsi.conf.
If this preference is set, then on any subsequent failover the driver will
first try to failover to the preferred portal or preferred subnet whichever
is specified in the conf file. If both preferred portal and preferred subnet
entries are present in the conf file then the preferred portal takes
precedence. If the preferred portal or preferred subnet is unreachable,
then the driver will continuously rotate through the list of available
portals until it finds one that is active.
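The preference rules above can be modeled with a short sketch. It is not the driver's algorithm: `failover_order` is a hypothetical name, portals are represented as bare IPv4 address strings, and the preferred subnet is assumed to be given in a.b.c.d/n form.

```python
import ipaddress


def failover_order(portals, preferred_portal=None, preferred_subnet=None):
    """Return the portals in the order a failover attempt might try them.

    A preferred portal takes precedence over a preferred subnet. Portals
    that match the preference come first, followed by the remaining
    portals in their original order.
    """
    if preferred_portal is not None:
        preferred = [p for p in portals if p == preferred_portal]
    elif preferred_subnet is not None:
        network = ipaddress.ip_network(preferred_subnet, strict=False)
        preferred = [p for p in portals
                     if ipaddress.ip_address(p) in network]
    else:
        preferred = []
    rest = [p for p in portals if p not in preferred]
    return preferred + rest
```

If the first entry is unreachable, a caller would keep rotating through the returned list until a portal responds, as the text describes.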
The Portal Failover feature is turned on by default and the whole process of
failover occurs automatically. You can choose to turn off portal failover
by disabling the portal failover parameter in /etc/iscsi.conf.
If a target advertises more than one network portal, you can manually
switch portals by writing to the HBA's special file in /proc/scsi/iscsi/.
For example, if a target advertises two network portals:
10.77.13.248:3260 and 192.168.250.248:3260.
If the device is configured with targetId as 0, busId as 0, HBA's host
number is 3 and you want to switch the target from
10.77.13.248 to 192.168.250.248, use the following command:
echo "target 0 0 address 192.168.250.248" > /proc/scsi/iscsi/3
Where the syntax is:
echo "target address " >
/proc/scsi/iscsi/
The host system must have multiple network interfaces to effectively
utilize this failover feature.
----------------
iSCSI HBA STATUS
----------------
The directory /proc/scsi/iscsi will contain a special file that can be
used to get status from your iSCSI HBA. The name of the file will
be the iSCSI HBA's host number, which is assigned to the driver
by Linux.

When the file is read, it will show the driver's version number,
followed by a list of all iSCSI targets and LUNs the driver has found
and can use.
Each line will show the iSCSI bus number, target id number, and
logical unit number, as well as the IP address, TCP port, and
iSCSI TargetName. If an iSCSI session exists, but no LUNs have
yet been found for a target, the LUN number field will contain a
question mark. If a TCP connection is not currently established,
the IP address and port number will both appear as question marks.
----------------------------
USING MULTIPATH I/O SOFTWARE
----------------------------
If a third-party multipath I/O software application is being used in
conjunction with the iSCSI driver (e.g., HP Secure Path), it may be
necessary to modify the configuration of the driver to allow the
multi-pathing software to operate more efficiently. If you are using
a multipath I/O application, you may need to set the "ConnFailTimeout"
parameter of the iSCSI driver to a smaller value so that SCSI commands
will fail more quickly when an iSCSI network connection drops allowing
the multipath application to try a different path for access to the
storage device. Also, you may need to set the "MaxDiskCommandTimeout"
to a smaller value (e.g., 5 or 10 seconds), so that SCSI commands to
unreachable or unresponsive devices will fail more quickly and the
multipath software will know to try a different path to the storage device.
Multipath support in the iSCSI driver can be turned on by setting
Multipath=<"yes" or "portal" or "portalgroup"> in /etc/iscsi.conf.
If Multipath=<"yes" or "portal">, then the discovered targets that
are configured to allow access via multiple paths will have a separate
iSCSI session created for each path (i.e., iSCSI portal). The target
portal failover feature should not be used if Multipath=<"yes" or "portal">
since multiple sessions will be established with all available paths.
------------------------------------
MAKING STORAGE CONFIGURATION CHANGES
------------------------------------
Making changes to your storage configuration, including adding or
removing targets or LUNs, remapping targets, or modifying target
access, may change how the devices are presented to the host operating
system. This may require corresponding changes in the iSCSI driver
configuration and /etc/fstab.iscsi file.
It is important to understand the ramifications of SCSI routing
service configuration changes on the hosts accessing the associated
storage devices. For example, changing the instance configuration
may change the device presentation to the host's iSCSI driver,
effectively changing the name or number assigned to the device
by the host operating system. Certain configuration changes,
such as adding or deleting targets, adding or deleting LUNs
within a particular target, or adding or deleting entire instances
may change the order of the devices presented to the host.
Even if the host is only associated with one SCSI routing
service instance, the device order could make a difference.
Typically, the host operating system assigns drive identifications
in the order they are received based on certain criteria. Changing
the order of the storage device discovery may result in a changed
drive identification. Applications running on the host may require
modifications to appropriately access the current drives.
If an entire SCSI routing service instance is removed, or there
are no targets available for the host, the host's iSCSI driver
configuration file must be updated to remove the appropriate
reference before restarting the iSCSI driver. If a host's iSCSI
configuration file contains an IP address of a SCSI routing
service instance that does not exist, or has no targets available
for the host, the iSCSI driver will not complete a login and
will keep on trying to discover targets associated with this SCSI
routing service instance.
In general, the following steps are normally required when reconfiguring
iSCSI storage:
1. Unmount any filesystems and stop any applications using iSCSI
devices.
2. Stop the iSCSI driver by entering:
/etc/init.d/iscsi stop
3. Make the appropriate changes to the iSCSI driver
configuration file. Remove any references to iSCSI
DiscoveryAddresses that have been removed, or that
no longer have valid targets for this host.
4. Modify /etc/fstab.iscsi and application configurations as
appropriate.
5. Restart the iSCSI driver by entering:
/etc/init.d/iscsi start
Failure to appropriately update the iSCSI configuration using
the above procedure may result in a situation that prevents
the host from accessing iSCSI storage resources.
-------------------------------
TARGET AND LUN DISCOVERY LIMITS
-------------------------------
The bus ID and target ID are assigned by the iSCSI initiator driver
whereas the lun ID is assigned by the iSCSI target. The driver provides
access to a maximum of 256 bus IDs with each bus supporting 256 targets
and each target capable of supporting 256 LUNs. Any discovered iSCSI
device will be allocated the next available target ID on bus 0.
When all 256 target IDs on bus 0 are in use, the next available target ID
on bus 1 will be allocated. A device whose bus ID or LUN ID falls outside
the supported range will be ignored by the driver and will not be
configured in the system.
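The allocation order described above can be sketched as follows. This is an illustration only: `next_free_slot` is a hypothetical name, and zero-based IDs (0 through 255) are an assumption based on the bus0/target0 naming used elsewhere in this document.

```python
MAX_BUSES = 256        # bus IDs 0..255 (zero-based numbering is assumed)
TARGETS_PER_BUS = 256  # target IDs 0..255 on each bus


def next_free_slot(used):
    """Return the first free (bus_id, target_id) pair, scanning bus 0 first.

    `used` is a set of (bus_id, target_id) pairs already allocated.
    Returns None when every slot on every bus is taken.
    """
    for bus in range(MAX_BUSES):
        for target in range(TARGETS_PER_BUS):
            if (bus, target) not in used:
                return (bus, target)
    return None
```

Bus 1 is only reached once every target ID on bus 0 is in use.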
--------------------------------
DYNAMIC TARGET AND LUN DISCOVERY
--------------------------------
When using iSCSI targets that support long-lived iSCSI discovery sessions,
such as the Cisco 5400 Series, the driver will keep a discovery session
open waiting for change notifications from the target. When a notification
is received, the driver will rediscover targets, add any new targets, and
activate LUNs on all targets.
If a new LUN is dynamically added to an existing target on a SCSI routing
instance with which the driver has established a connection, then the driver
does not automatically activate the new LUN. The user can manually activate
the new LUN by executing the following command:
echo "scsi add-single-device <HBA#> <bus-id> <target-id> <LUN>" > /proc/scsi/scsi
where:
HBA#: is the controller number present under /proc/scsi/iscsi/
bus-id: is the bus number present on controller HBA#.
target-id: is the target ID present on HBA#, bus-id.
LUN: new LUN added dynamically to the target.
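As a small illustration, the string written to /proc/scsi/scsi can be assembled and validated before use. The helper below only builds the text; it does not touch /proc, and the name `add_single_device_command` is hypothetical. The field order (host, bus, target, LUN) follows the parameter list above.

```python
def add_single_device_command(host, bus, target, lun):
    """Build the 'scsi add-single-device' line for /proc/scsi/scsi.

    All four values must be non-negative integers: the HBA host number,
    the bus ID, the target ID, and the LUN to activate.
    """
    fields = (("host", host), ("bus", bus), ("target", target), ("lun", lun))
    for name, value in fields:
        if not isinstance(value, int) or value < 0:
            raise ValueError(
                f"{name} must be a non-negative integer, got {value!r}")
    return f"scsi add-single-device {host} {bus} {target} {lun}"
```

For HBA 3, bus 0, target 0, and a new LUN 1, the helper returns `scsi add-single-device 3 0 0 1`, which would then be echoed into /proc/scsi/scsi by a privileged user.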

-------------------------
PERSISTENT TARGET BINDING
-------------------------
This feature ensures that the same iSCSI bus and target id number are used
for every iSCSI session to a particular iSCSI TargetName, and a Linux SCSI
target always maps to the same physical storage device from one reboot to
the next.
This feature ensures that the SCSI numbers in the device symlinks described
above will always map to the same iSCSI target.
Note that because of the way Linux dynamically allocates SCSI device nodes
as SCSI devices are found, the driver does not and cannot ensure that any
particular SCSI device node (e.g., /dev/sda) will always map to the same
iSCSI TargetName. The symlinks described in the section on Device Names are
intended to provide a persistent device mapping for use by applications and
fstab files, and should be used instead of direct references to particular
SCSI device nodes.
The file /etc/iscsi.bindings is used by the iSCSI daemon to store bindings of
iSCSI target names to SCSI target IDs. If the file doesn't exist,
it will be created when the driver is started. If an entry exists for a
discovered target, the Linux target ID from the entry is assigned to the
target. If no entry exists for a discovered target, an entry is written to
the file. Each line of the file contains the following fields:
BusId TargetId TargetName
An example file would be:
*****************************************************************************
0 0 iqn.1987-05.com.cisco.00.7e9d6f942e45736be69cb65c4c22e54c.disk_one
0 1 iqn.1987-05.com.cisco.00.4d678bd82965df7765c788f3199ac15f.disk_two
0 2 iqn.1987-05.com.cisco.00.789ac4483ac9114bc6583b1c8a332d1e.disk_three
*****************************************************************************
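A file in this three-field format can be read with a few lines of code. The sketch below is an assumption-laden illustration, not the daemon's parser: `parse_bindings` is a hypothetical name, blank lines are skipped, and any line without exactly three whitespace-separated fields is treated as an error.

```python
def parse_bindings(text):
    """Parse bindings text into a list of (bus_id, target_id, target_name).

    Each non-blank line must hold exactly three whitespace-separated
    fields: BusId, TargetId, and TargetName.
    """
    entries = []
    for number, line in enumerate(text.splitlines(), start=1):
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if len(fields) != 3:
            raise ValueError(
                f"line {number}: expected 3 fields, got {len(fields)}")
        bus_id, target_id, target_name = fields
        entries.append((int(bus_id), int(target_id), target_name))
    return entries
```

Applied to the example file above, this yields three tuples whose target IDs are 0, 1, and 2 on bus 0.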
Note that the /etc/iscsi.bindings file will permanently contain entries
for all iSCSI targets ever logged into from this host. If a target is
no longer available to a host you can manually edit the file and remove
entries so the obsolete target no longer consumes a SCSI target ID.
If you know the iSCSI target name of a target in advance, and you want
it to be assigned a particular SCSI target ID, you can add an entry
manually. You should stop the iSCSI driver before editing the
/etc/iscsi.bindings file. Be careful to keep an entire entry on a single
line, with only whitespace characters between the three fields. Do not
use a target ID number that already exists in the file.
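As a rough illustration of the file format and of the "smallest available SCSI target ID" rule, the following Python sketch parses bindings text and picks the next free target ID on a bus. The helper names (parse_bindings, next_free_target_id) are hypothetical and are not part of the iSCSI driver. The sketch assumes every non-blank line holds exactly the three whitespace-separated fields described above.

```python
from typing import List, Tuple

Binding = Tuple[int, int, str]  # (BusId, TargetId, TargetName)


def parse_bindings(text: str) -> List[Binding]:
    """Parse 'BusId TargetId TargetName' lines; blank lines are skipped."""
    bindings = []
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if len(fields) != 3:
            raise ValueError(f"expected 3 fields, got: {line!r}")
        bus_id, target_id, target_name = fields
        bindings.append((int(bus_id), int(target_id), target_name))
    return bindings


def next_free_target_id(bindings: List[Binding], bus_id: int = 0) -> int:
    """Return the smallest target ID not yet used on the given bus."""
    used = {target for bus, target, _ in bindings if bus == bus_id}
    candidate = 0
    while candidate in used:
        candidate += 1
    return candidate


# The example file shown above.
SAMPLE = """\
0 0 iqn.1987-05.com.cisco.00.7e9d6f942e45736be69cb65c4c22e54c.disk_one
0 1 iqn.1987-05.com.cisco.00.4d678bd82965df7765c788f3199ac15f.disk_two
0 2 iqn.1987-05.com.cisco.00.789ac4483ac9114bc6583b1c8a332d1e.disk_three
"""

if __name__ == "__main__":
    print(next_free_target_id(parse_bindings(SAMPLE)))  # prints 3
```

If the entry for disk_two were removed from the file, the same call would return 1, which is why deleting obsolete entries frees their target IDs for reuse.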
*****************************************************************************
NOTE: iSCSI driver versions prior to 3.2 used the file /var/iscsi/bindings
instead of /etc/iscsi.bindings. The first time you start the new driver
version, it will change the location and the name of the bindings file
to /etc/iscsi.bindings
*****************************************************************************
---------------------
TARGET AUTHENTICATION
---------------------
The CHAP authentication mechanism provides for two way authentication between
the target and the initiator. The authentication feature on the SN 5400
system has to be enabled to make use of this feature. The username and
password for both initiator side and target side authentication need to be
listed in /etc/iscsi.conf. The username and password can be specified as
global values or can be made specific to the target address. Please refer to
the Editing The iscsi.conf File section of this document for a more detailed
description of these parameters.
---------------------------
EDITING THE ISCSI.CONF FILE
---------------------------
The /etc/iscsi.conf file is used to control the operation of the iSCSI driver
by allowing the user to configure the values for a number of programmable
parameters. These parameters can be set up to apply to specific configuration
types or they can be set up to apply globally. The configuration types that are
supported are:
- DiscoveryAddress = SCSI routing instance IP address with format a.b.c.d or
  a.b.c.d:n or hostname.
- TargetName = Target name in 'iqn' or 'eui' format
  eg: TargetName = iqn.1987-05.com.cisco:00.0d1d898e8d66.t0
- TargetIPAddress = Target IP address with format a.b.c.d/n
- Subnet = Network portal IP address with format a.b.c.d/n or a.b.c.d&hex
- Address = Network portal IP address with format a.b.c.d/32
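To make the address formats above concrete, here is a small Python sketch that splits a DiscoveryAddress value and tests whether a portal address falls inside a Subnet written as a.b.c.d/n. It is only an illustration: the function names and the sample addresses are hypothetical, the sketch assumes that the n in a.b.c.d:n is a TCP port number, and it does not handle the a.b.c.d&hex form.

```python
import ipaddress
from typing import Optional, Tuple


def split_discovery_address(value: str) -> Tuple[str, Optional[int]]:
    """Split 'a.b.c.d', 'a.b.c.d:n' or 'hostname' into (host, port or None)."""
    host, separator, port = value.partition(":")
    return host, (int(port) if separator else None)


def in_subnet(address: str, subnet: str) -> bool:
    """Return True if address lies inside a subnet written as a.b.c.d/n."""
    network = ipaddress.ip_network(subnet, strict=False)
    return ipaddress.ip_address(address) in network


if __name__ == "__main__":
    print(split_discovery_address("10.4.100.0:3260"))   # ('10.4.100.0', 3260)
    print(split_discovery_address("router.example"))    # ('router.example', None)
    print(in_subnet("10.4.100.7", "10.4.100.0/24"))     # True
    print(in_subnet("10.4.100.7", "10.4.100.8/32"))     # False
```

The /32 case shows why the Address type uses that suffix: a 32-bit prefix matches exactly one portal address.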
The complete list of parameters that can be applied either globally or to the
configuration types listed above is shown below. Not all parameters are
applicable to all configuration types.
- Username = CHAP username used for initiator authentication by the target.
- OutgoingUsername = <>
- Password = CHAP password used for initiator authentication by the target.
- OutgoingPassword = <>
- IncomingUsername = CHAP username for target authentication by the initiator.
- IncomingPassword = CHAP password for target authentication by the initiator.
- HeaderDigest = Type of header digest support the initiator is requesting
of the target.
- DataDigest = Type of data digest support the initiator is requesting of
the target.
- PortalFailover = Enabling/disabling of target portal failover feature.
- PreferredSubnet = IP address of the subnet that should be used for a
portal failover.
- PreferredPortal = IP address of the portal that should be used for a
portal failover.
- Multipath = Enabling/disabling of multipathing feature.
- LoginTimeout = Time interval to wait for a response to a login request to
be received from a target before failing a connection
attempt.
- AuthTimeout = Time interval to wait for a response to a login request
containing authentication information to be received from a
target before failing a connection attempt.
- IdleTimeout = Time interval to wait on a connection with no traffic before
sending out a ping.
- PingTimeout = Time interval to wait for a ping response after a ping is
sent before failing a connection.
- ConnFailTimeout = Time interval to wait before failing SCSI commands back
to an application for unsuccessful commands.
- AbortTimeout = Time interval to wait for an abort command to complete
before declaring the abort command failed.
- ResetTimeout = Time interval to wait for a reset command to complete
before declaring the reset command failed.
- InitialR2T = Enabling/disabling of R2T flow control with the target.
- MaxRecvDataSegmentLength = Maximum number of bytes that the initiator can
receive in an iSCSI PDU.
- FirstBurstLength = Maximum number of bytes of unsolicited data the
initiator is allowed to send.
- MaxBurstLength = Maximum number of bytes for the SCSI payload negotiated
by initiator.
- TCPWindowSize = Maximum number of bytes that can be sent over a TCP
connection by the initiator before receiving an
acknowledgement from the target.
- Continuous = Enabling/disabling the discovery session to be kept alive.
A detailed description for each of these parameters is included in both the
man page and the included sample iscsi.conf file. Please consult these sources
for examples and more detailed programming instructions.
----------------------------
iSCSI COMMANDS AND UTILITIES
----------------------------
This section gives a description of all the commands and utilities available
with the iSCSI driver.
- "iscsi-ls" lists information about the iSCSI devices available to the
driver. Please refer to the man page for more information.
-------------------

Mount EMC BCVs at the same host

Example:
I have created a volumegroup, a logical volume, a filesystem and a file on two EMC standard volumes. (For this test you need to have two hdisks hdisk and hdisk and two BCVs dev and available)
# mkvg -f -y MyName_vg -s 16 hdisk hdisk
# mklv -y MyName_lv -b n MyName_vg 20
# crfs -v jfs -d MyName_lv -m /MyName_mp -A yes -p rw
# mount /MyName_mp
# lptest > /MyName_mp/lptest.out

To use EMC's TimeFinder I have to create a device group. (AIX works with volumegroups; EMC's TimeFinder works with diskgroups.) The following command converts the AIX volumegroup MyName_vg to the diskgroup MyName_dg:

# symvg vg2dg MyName_vg MyName_dg -dgtype RDF1

To use TimeFinder I have to associate two BCVs to this device group

# symbcv -g MyName_dg associate dev
# symbcv -g MyName_dg associate dev

Now I have to set the BCVs to the defined-state

# rmbcv -a

Using the establish I mirror all data from the original hdisks to the BCVs (including the PVIDs!)

# symmir -g MyName_dg establish -full -exact

I have to wait until the establish is done

# symmir -g MyName_dg -i 10 query

When the establish is done, I have to unmount my filesystem and varyoff the volumegroup
# umount /MyName_mp
# varyoffvg MyName_vg

Now I am in the right state to split the BCV copies
# symmir -g MyName_dg split -noprompt

When the split is done, I can varyon my volumegroup and mount my filesystem

# symmir -g MyName_dg -i 10 query
# varyonvg MyName_vg
# mount /MyName_mp

I configure the BCVs

# mkbcv -a

Now I am able to create a new volumegroup from the BCVs
# recreatevg -y MyName_bcv_vg -Y test -L /bcv hdisk hdisk
# lsvg -l MyName_bcv_vg
MyName_bcv_vg:
LV NAME        TYPE    LPs  PPs  PVs  LV STATE      MOUNT POINT
testMyName_lv  jfs     20   20   1    closed/syncd  /bcv/MyName_mp
testloglv00    jfslog  1    1    1    closed/syncd  N/A

# lspv | grep -v None
hdisk0 000039386adb2317 rootvg
hdisk 00003938874658c8 MyName_vg
hdisk 0000393887468473 MyName_vg
hdisk 000039388794adb8 MyName_bcv_vg
hdisk 000039388794b7f5 MyName_bcv_vg

LUN Management

Posted by Diwakar

LUN Basics

Simply stated, a LUN is a logical entity that converts raw physical disk space into logical storage space that a host server's operating system can access and use. Any computer user recognizes the logical drive letter that has been carved out of their disk drive. For example, a computer may boot from the C: drive and access file data from a different D: drive. LUNs do the same basic job. "LUNs differentiate between different chunks of disk space. A LUN is part of the address of the storage that you're presenting to a [host] server."

LUNs are created as a fundamental part of the storage provisioning process using software tools that typically accompany the particular storage platform. However, there is not a 1-to-1 ratio between drives and LUNs. Numerous LUNs can easily be carved out of a single disk drive. For example, a 500 GB drive can be partitioned into one 200 GB LUN and one 300 GB LUN, which would appear as two unique drives to the host server. Conversely, storage administrators can employ Logical Volume Manager software to combine multiple LUNs into a larger volume. Veritas Volume Manager from Symantec Corp. is just one example of this software. In actual practice, disks are first gathered into a RAID group for larger capacity and redundancy (e.g., RAID-50), and then LUNs are carved from that RAID group.

LUNs are often referred to as logical "volumes," reflecting the traditional use of "drive volume letters," such as volume C: or volume F: on your computer. But some experts warn against mixing the two terms, noting that the term "volume" is often used to denote the large volume created when multiple LUNs are combined with volume manager software. In this context, a volume may actually involve numerous LUNs and can potentially confuse storage allocation. "The 'volume' is a piece of a volume group, and the volume group is composed of multiple LUNs."
Once created, LUNs can also be shared between multiple servers. For example, a LUN might be shared between an active and standby server. If the active server fails, the standby server can immediately take over. However, it can be catastrophic for multiple servers to access the same LUN simultaneously without a means of coordinating changed blocks to ensure data integrity. Clustering software, such as a clustered volume manager, a clustered file system, a clustered application or a network file system using NFS or CIFS, is needed to coordinate data changes.

SAN zoning and masking

LUNs are the basic vehicle for delivering storage, but provisioning SAN storage isn't just a matter of creating LUNs or volumes; the SAN fabric itself must be configured so that disks and their LUNs are matched to the appropriate servers. Proper configuration helps to manage storage traffic and maintain SAN security by preventing servers from accessing LUNs that are not intended for them.
Zoning controls which devices within a Fibre Channel network can see each other. By limiting the visibility of end devices, servers (hosts) can only see and access storage devices that are placed into the same zone. In more practical terms, zoning allows certain servers to see one or more ports on a disk array. Bandwidth, and thus minimum service levels, can be reserved by dedicating certain ports to a zone or by isolating incompatible ports from one another.
Consequently, zoning is an important element of SAN security and high-availability SAN design. Zoning can typically be broken down into hard and soft zoning. With hard zoning, the switch hardware enforces the zone on every frame, so a device cannot reach anything outside its zone. With soft zoning, the fabric name server only limits which devices a host can discover, so a host that already knows an address outside its zone is not blocked.
LUN masking adds granularity to this concept. Just because you zone a server and disk together doesn't mean that the server should be able to see all of the LUNs on that disk. Once the SAN is zoned, LUNs are masked so that each host server can only see specific LUNs. For example, suppose that a disk has two LUNs, LUN_A and LUN_B. If we zoned two servers to that disk, both servers would see both LUNs. However, we can use LUN masking to allow one server to see only LUN_A and mask the other server to see only LUN_B. Port-based LUN masking is granular to the storage array port, so any disks on a given port will be accessible to any servers on that port. Server-based LUN masking is a bit more granular where a server will see only the LUNs assigned to it, regardless of the other disks or servers connected.
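The LUN_A / LUN_B example can be sketched in a few lines of Python. This is only a toy model of the idea, not how any switch or array implements it: zones are modeled as sets of member names, masks as a dictionary keyed by (server, disk), and all names (server1, server2, disk1, visible_luns) are hypothetical.

```python
def visible_luns(server, zones, masks, disk_luns):
    """Return the LUNs a server can see: zoned to the disk AND not masked away."""
    result = set()
    for zone in zones:
        if server not in zone:
            continue  # the server cannot see anything in a zone it is not in
        for disk, luns in disk_luns.items():
            if disk in zone:
                # Without a mask entry, every LUN on the zoned disk is visible.
                allowed = masks.get((server, disk), luns)
                result |= luns & allowed
    return result


zones = [{"server1", "server2", "disk1"}]
disk_luns = {"disk1": {"LUN_A", "LUN_B"}}
masks = {
    ("server1", "disk1"): {"LUN_A"},
    ("server2", "disk1"): {"LUN_B"},
}

if __name__ == "__main__":
    print(sorted(visible_luns("server1", zones, {}, disk_luns)))     # ['LUN_A', 'LUN_B']
    print(sorted(visible_luns("server1", zones, masks, disk_luns)))  # ['LUN_A']
    print(sorted(visible_luns("server2", zones, masks, disk_luns)))  # ['LUN_B']
```

Zoning alone exposes both LUNs to both servers; adding the mask narrows each server to its own LUN.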

LUN scaling and performance
LUNs are based on disks, so LUN performance and reliability will vary for the same reasons. For example, a LUN carved from a Fibre Channel 15K rpm disk will perform far better than a LUN of the same size taken from a 7,200 rpm SATA disk. This is also true of LUNs based on RAID arrays where the mirroring of a RAID-1 group may offer significantly different performance than the parity protection of a RAID-5 or RAID-6/dual parity (DP) group. Proper RAID group configuration will have a profound impact on LUN performance.
An organization may utilize hundreds or even thousands of LUNs, so the choice of storage resources has important implications for the storage administrator. Not only is it necessary to supply an application with adequate capacity (in gigabytes), but the LUN must also be drawn from disk storage with suitable characteristics. "We go through a qualification process to understand the requirements of the application that will be using the LUNs for performance, availability and cost." For example, a LUN for a mission-critical database application might be taken from a RAID-10 group using Tier-1 storage, while a LUN slated for a virtual tape library (VTL) or archive application would probably work with a RAID-6 group using Tier-2 or Tier-3 storage.

LUN management tools
A large enterprise array may host more than 10,000 LUNs, so software tools are absolutely vital for efficient LUN creation, manipulation and reporting. Fortunately, management tools are readily available, and almost every storage vendor provides some type of management software to accompany products ranging from direct-attached storage (DAS) devices to large enterprise arrays.
Administrators can typically opt for vendor-specific or heterogeneous tools. A data center with one storage array or a single-vendor shop would probably do well with the indigenous LUN management tool that accompanied their storage system. Multivendor shops should at least consider heterogeneous tools that allow LUN management across all of the storage platforms. Mack uses EMC ControlCenter for LUN masking and mapping, which is just one of several different heterogeneous tools available in the marketplace. While good heterogeneous tools are available, he advises caution when selecting a multiplatform tool. "Sometimes, if the tool is written by a particular vendor, it will manage 'their' LUNs the best," he says. "LUNs from the other vendors can take the back seat -- the management may not be as well integrated."
In addition to vendor support, a LUN management tool should support the entire storage provisioning process. Features should include mapping to specific array ports and masking specific host bus adapters (HBA), along with comprehensive reporting. The LUN management tool should also be able to reclaim storage that is no longer needed. Although a few LUN management products support autonomous provisioning, experts see some reluctance toward automation. "It's hard to do capacity planning when you don't have any checks and balances over provisioning," Mack says, also noting that automation can circumvent strict change control processes in an IT organization.

LUNs at work

Significant storage growth means more LUNs, which must be created and managed efficiently while minimizing errors, reining in costs and maintaining security. For Thomas Weisel Partners LLC, an investment firm based in San Francisco, storage demands have simply exploded to 80 terabytes (TB) today -- up from about 8 TB just two years ago. Storage continues to flood the organization's data center at about 2 TB to 3 TB each month depending on projects and priorities.
This aggressive growth pushed the company out of a Hitachi Data Systems (HDS) storage array and into a 3PARdata Inc. S400 system. LUN deployment starts by analyzing realistic space and performance requirements for an application. "Is it something that needs a lot of fast access, like a database or something that just needs a file share?" asks Kevin Fiore, director of engineering services at Thomas Weisel. Once requirements are evaluated, a change ticket is generated and a storage administrator provisions the resources from a RAID-5 or RAID-1 group depending on the application. Fiore emphasizes the importance of provisioning efficiency, noting that the S400's internal management tools can provision storage in just a few clicks.
Fiore also notes the importance of versatility in LUN management tools and the ability to move data. "Dynamic optimization allows me to move LUNs between disk sets," he says. Virtualization has also played an important role in LUN management. VMware has allowed Fiore to consolidate about 50 servers enterprise-wide along with the corresponding reduction in space, power and cooling. This lets the organization manage more storage with less hardware.
LUNs getting large
As organizations deal with spiraling storage volumes, experts suggest that efficiency-enhancing features, such as automation, will become more important in future LUN management. Experts also note that virtualization and virtual environments will play a greater role in tomorrow's LUN management. For example, it's becoming more common to provision very large chunks of storage (500 GB to 1 TB or more) to virtual machines. "You might provision a few terabytes to a cluster of VMware servers, and then that storage will be provisioned out over time."

Very simply, RAID striping is a means of improving the performance of large storage systems. For most normal PCs or laptops, files are stored in their entirety on a single disk drive, so a file must be read from start to finish and passed to the host system. With large storage arrays, disks are often organized into RAID groups that can enhance performance and protect data against disk failures. Striping is actually RAID-0, a technique that breaks up a file and interleaves its contents across all of the disks in the RAID group. This allows multiple disks to access the contents of a file simultaneously. Instead of a single disk reading a file from start to finish, striping allows one disk to read the next stripe while the previous disk is passing its stripe data to the host system -- this enhances the overall disk system performance, which is very beneficial for busy storage arrays.
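The interleaving described above can be written down as a simple address calculation. The Python sketch below maps a logical block number to a disk and an offset on that disk, assuming plain round-robin striping with a fixed stripe size. The function name locate_block and the parameter stripe_blocks are illustrative and are not taken from any particular array.

```python
def locate_block(logical_block: int, num_disks: int, stripe_blocks: int = 1):
    """Map a logical block to (disk index, block offset on that disk) for RAID-0."""
    # Which stripe unit the block belongs to, and where it sits inside that unit.
    stripe_index, within = divmod(logical_block, stripe_blocks)
    disk = stripe_index % num_disks   # stripe units rotate across the disks
    row = stripe_index // num_disks   # how many full rows precede this unit
    return disk, row * stripe_blocks + within


if __name__ == "__main__":
    # Eight consecutive blocks over four disks land on disks 0,1,2,3,0,1,2,3.
    print([locate_block(b, num_disks=4)[0] for b in range(8)])
```

Because consecutive blocks land on different disks, a sequential read keeps all four disks busy at once.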

Parity can be added to protect the striped data. Parity data is calculated for the stripes and placed on another disk drive. If one of the disks in the RAID group fails, the parity data can be used to rebuild the failed disk. However, multiple simultaneous disk failures may result in data loss because conventional parity only accommodates a single disk failure.
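A minimal sketch of the rebuild idea, assuming the parity is a byte-wise XOR of the data blocks (the scheme commonly used for single-parity RAID). The block contents and the helper name xor_blocks are made up for illustration.

```python
from functools import reduce


def xor_blocks(blocks):
    """Byte-wise XOR of equally sized blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


data = [b"AAAA", b"BBBB", b"CCCC"]   # one stripe spread over three data disks
parity = xor_blocks(data)            # stored on a fourth disk

# Simulate losing the second disk: XOR the survivors with the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])

if __name__ == "__main__":
    print(rebuilt == data[1])  # True
```

If two data blocks were lost at once, the single parity block would not hold enough information to recover both, which is the limitation noted above.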

RAID striping
The performance impact of RAID striping at the array and operating system level.
RAID striping or concatenation: Which has better performance?
Designing storage for performance is a very esoteric effort by nature. There are quite a few variables that need to be taken into account.
RAID-50: RAID-5 with suspenders
RAID-50 combines striping with distributed parity for higher reliability and data transfer capabilities.
RAID-53: RAID by any other name
RAID-53 has a higher transaction rate than RAID-3, and offers all the protection of RAID-10, but there are disadvantages as well.
RAID-10 and RAID-01: Same or different?
The difference between RAID-10 and RAID-01 is explained.
RAID explained
RAID, or redundant array of independent disks, can make many smaller disks appear as one large disk to a server for better performance and higher availability.

You have two fabrics running off of two switches. You'd like to make them one fabric. How to do that? For the most part, it's simply connecting the two switches via E_Ports.

Before doing that, however, realize there are several factors that can prevent them from merging:

  1. Incompatible operating parameters such as RA_TOV and ED_TOV
  2. Duplicate domain IDs.
  3. Incompatible zoning configurations
  4. No principal switch (priority set to 255 on all switches)
  5. No response from the switch (hello sent every 30 seconds)

To avoid the issues above:

  1. Check IPs on all Service Processors and switches; deconflict as necessary.
  2. Ensure that all switches have unique domain ids.
  3. Ensure that operating parameters are the same.
  4. Ensure there aren't any zoning conflicts in the fabric (port zones, etc).
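The checks above can be expressed as a small pre-merge comparison. The Python sketch below is a toy model only: each fabric is a plain dictionary, the field names (RA_TOV, ED_TOV, domain_ids, priority) and the sample values are hypothetical, and zoning conflicts are not modeled. It does not talk to any real switch.

```python
def merge_blockers(fabric_a: dict, fabric_b: dict) -> list:
    """Return a list of reasons the two fabrics would fail to merge."""
    problems = []
    # Operating parameters must match on both sides.
    for param in ("RA_TOV", "ED_TOV"):
        if fabric_a[param] != fabric_b[param]:
            problems.append(f"{param} mismatch: {fabric_a[param]} vs {fabric_b[param]}")
    # Every switch in the merged fabric needs a unique domain ID.
    shared = set(fabric_a["domain_ids"]) & set(fabric_b["domain_ids"])
    if shared:
        problems.append(f"duplicate domain IDs: {sorted(shared)}")
    # A priority of 255 everywhere means no switch can become principal.
    if fabric_a["priority"] == 255 and fabric_b["priority"] == 255:
        problems.append("no principal switch: priority is 255 on all switches")
    return problems


fabric_a = {"RA_TOV": 10000, "ED_TOV": 2000, "domain_ids": [1], "priority": 254}
fabric_b = {"RA_TOV": 10000, "ED_TOV": 2000, "domain_ids": [1], "priority": 255}

if __name__ == "__main__":
    print(merge_blockers(fabric_a, fabric_b))  # ['duplicate domain IDs: [1]']
```

An empty list means none of the modeled conditions would block the merge.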

Once that's done:

  1. Physically link the switches
  2. View the active zone set to ensure the merge happens.
  3. Save the active zone set
  4. Activate the new zone set.

EMC recommends no more than four Connectrix switches per fabric based on the following formulae:

One switch

   32   total ports
 -  4   ports reserved for card failure
 = 28   ports remaining
 -  5   int(28/5): no more than 4:1 ratio, hosts : FA
 = 23   possible host connections
 /  2   to support multi-pathing
 = 11   host connections


Two switches

   64   total ports
 -  4   ports reserved for card failure
 -  4   ports reserved for E_ports
 = 56   ports remaining
 - 11   int(56/5): no more than 4:1 ratio, hosts : FA
 = 45   possible host connections
 /  2   to support multi-pathing
 = 22   host connections (gain of 11)

Three switches

   96   total ports
 -  4   ports reserved for card failure
 - 12   ports reserved for E_ports
 = 80   ports remaining
 - 16   int(80/5): no more than 4:1 ratio, hosts : FA
 = 64   possible host connections
 /  2   to support multi-pathing
 = 32   host connections (gain of 10)

Four switches

  128   total ports
 -  4   ports reserved for card failure
 - 24   ports reserved for E_ports
 = 100  ports remaining
 - 20   int(100/5): no more than 4:1 ratio, hosts : FA
 = 80   possible host connections
 /  2   to support multi-pathing
 = 40   host connections (gain of 8)


Putting in that fourth Connectrix means that you gain only 8 host connections from a 32-port Connectrix switch.
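The arithmetic in the four cases above can be reproduced with a few lines of Python. The E_port reservations (0, 4, 12 and 24) are copied from the figures above rather than derived from a formula, and the constant and function names are only illustrative.

```python
PORTS_PER_SWITCH = 32
CARD_FAILURE_RESERVE = 4
# E_ports reserved for inter-switch links, as listed for each fabric size above.
E_PORT_RESERVE = {1: 0, 2: 4, 3: 12, 4: 24}


def host_connections(switches: int) -> int:
    """Dual-pathed host connections available in a fabric of 1 to 4 switches."""
    remaining = (switches * PORTS_PER_SWITCH
                 - CARD_FAILURE_RESERVE
                 - E_PORT_RESERVE[switches])
    fa_ports = remaining // 5          # one FA port per four host ports (4:1)
    possible_hosts = remaining - fa_ports
    return possible_hosts // 2         # two ports per host for multi-pathing


if __name__ == "__main__":
    previous = 0
    for n in sorted(E_PORT_RESERVE):
        current = host_connections(n)
        print(f"{n} switch(es): {current} host connections (gain of {current - previous})")
        previous = current
```

Running it gives 11, 22, 32 and 40 host connections, so the gain per added switch shrinks from 11 to 10 to 8 as more ports are consumed by E_ports.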

Kashya (acquired by EMC last year) develops unique algorithmic technologies to enable an order of magnitude improvement in the reliability, cost, and performance of an enterprise’s data protection capabilities. Based on the Kashya Data Protection Appliance platform, Kashya’s powerful solutions deliver superior data protection at a fraction of the cost of existing solutions. Kashya’s Data Protection Appliance connects to the SAN and IP infrastructure and provides bi-directional replication across any distance for heterogeneous storage, SAN, and server environments.

The recent storage industry challenge is to minimize downtime and keep the business running 24 x 7 x 365. The data that drives today’s globally oriented businesses is stored on large networks of interconnected computers and data storage devices. This data must be 100% available and always accessible and up-to-date, even in the face of local or regional disasters. Moreover, these conditions must be met at a cost that is affordable, and without in any way hampering normal company operations.

To reduce the business risk of an unplanned event of this type, an enterprise must ensure that a copy of its business-critical data is stored at a secondary location. Synchronous replication, used so effectively to create perfect copies in local networks, performs poorly over longer distances.
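A back-of-the-envelope calculation shows why distance hurts synchronous replication: every write has to wait for at least one round trip to the secondary site. The Python sketch below assumes light travels through optical fiber at roughly 200 km per millisecond (about two thirds of its speed in a vacuum) and ignores switch, protocol and storage latency, so real delays will be higher. The names are illustrative.

```python
FIBER_KM_PER_MS = 200.0  # assumed propagation speed in optical fiber


def min_sync_write_penalty_ms(distance_km: float) -> float:
    """Lower bound on the delay added to each synchronous write (one round trip)."""
    return 2 * distance_km / FIBER_KM_PER_MS


if __name__ == "__main__":
    for km in (10, 100, 1000):
        print(f"{km} km: at least {min_sync_write_penalty_ms(km):.1f} ms per write")
```

Under these assumptions a site 10 km away adds about 0.1 ms per write, while a site 1,000 km away adds at least 10 ms to every write the application issues.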

Replication Method:

1) Synchronous – Every write transaction committed must be acknowledged from the
secondary site. This method enables efficient replication of data within the local
SAN environment.

2) Asynchronous – Every write transaction is acknowledged locally and then added to a
queue of writes waiting to be sent to the secondary site. With this method, some
data will normally be lost in the event of a disaster. This requires the same
bandwidth as a synchronous solution.

3) Snapshot – A consistent image of the storage subsystem is periodically transferred to the secondary site. Only the changes made since the previous snapshot must be transferred, resulting in significant savings in bandwidth. By definition, this solution produces a copy that is not up-to-date; however, increasing the frequency of the snapshots can reduce the extent of this lag.

4) Small-Aperture Snapshot – Kashya’s system offers the unique ability to take frequent snapshots, just seconds apart. This innovative feature is utilized to minimize the risk of data loss due to data corruption that typically follows rolling disasters.


Kashya’s advanced architecture can be summarized as follows:
- Positioning at the junction between the SAN and the IP infrastructure enables Kashya
  solutions to:
  - Deploy enterprise-class data protection non-disruptively and non-invasively
  - Support heterogeneous server, SAN, and storage platforms
  - Monitor SAN and WAN behavior on an ongoing basis, to maximize the data
    protection process.
- Advanced algorithms that:
  - Automatically manage the replication process, with strict adherence to user-defined policies that are tied to user-specified business objectives

About Me

Sr. Solutions Architect; Expertise: - Cloud Design & Architect - Data Center Consolidation - DC/Storage Virtualization - Technology Refresh - Data Migration - SAN Refresh - Data Center Architecture More info:- diwakar@emcstorageinfo.com
Blog Disclaimer: “The opinions expressed here are my personal opinions. Content published here is not read or approved in advance by EMC and does not necessarily reflect the views and opinions of EMC.”
EMC Storage Product Knowledge Sharing