Understanding the Noop IO Scheduler

Issue

  • How do I turn on noop scheduler for a device?
  • What are the tunables for noop scheduler and what do they do?
  • How does the logic within the scheduler work in choosing which IO to dispatch next?

Environment

  • Red Hat Enterprise Linux 4
  • Red Hat Enterprise Linux 5
  • Red Hat Enterprise Linux 6
  • Red Hat Enterprise Linux 7

Resolution

 
Enable the noop IO scheduler: one device at a time
 

# echo 'noop' > /sys/block/sda/queue/scheduler
# cat           /sys/block/sda/queue/scheduler
[noop] anticipatory deadline cfq

 
Enable the noop IO scheduler: all devices / as the default scheduler
 
You can also set noop as the default scheduler at boot time by adding "elevator=noop" to the end of the kernel line in the /etc/grub.conf file; this sets the noop scheduler for all devices. (On RHEL 7, which uses GRUB 2, add elevator=noop to the GRUB_CMDLINE_LINUX line in /etc/default/grub instead and regenerate the configuration with grub2-mkconfig -o /boot/grub2/grub.cfg.)
 

title Red Hat Enterprise Linux Server (2.6.9-67.EL)
root (hd0,0)
kernel /vmlinuz-2.6.9-67.EL ro root=/dev/vg0/lv0 elevator=noop
initrd /initrd-2.6.9-67.EL.img
 
noop tunables
 
$ ls /sys/block/sda/queue/iosched

That's right, the directory of tunables is empty - there are no tunables for noop.

 
 

Overview

'Noop' is a good description of this scheduler -- it does no elevator sorting, or any other kind of re-ordering, of the IO. It does perform some merging of back-to-back (contiguous) IO. It is basically a First-In, First-Out (FIFO) queue within the scheduler personality that holds IO until the underlying device driver is ready for it. IO requests are queued in the order they arrive at the scheduler and are dispatched to the driver in that same order, without re-ordering. Essentially, this scheduler looks like the following:

IO requests -> scheduler::[noop queue] -> Driver.

The noop scheduler is sometimes referred to as a pass-through scheduler because it simply passes incoming IO through to the driver without altering it in any substantial way. IOs arriving in A, B, C order are dispatched to the driver in the same A, B, C order.

The next IO dispatched to the driver is always the oldest IO in the noop scheduler's FIFO queue. If the underlying device driver's queue for the target device has room, the scheduler sends the incoming IO straight to the driver; otherwise, the driver calls back into the scheduler each time it completes an IO, pulling the next IO out of the noop scheduler's FIFO queue.

As an aside, if you remove all structure definitions, returns, routine names, lines containing only curly braces, #include statements, and the like from the noop scheduler's source file, you end up with a total of 12 lines of code in RHEL 5... that is how minimal this scheduler is.

 
 

Noop Uses

The noop scheduler has uses in virtualized environments, where the underlying bare-metal operating system instance may be running an IO scheduler of its own -- it makes little sense to re-order and merge IO at the virtual machine layer, consuming additional CPU, only to have the same work done again at the bare-metal layer.

Noop can also be useful with controllers that have a large amount of cache and deep LUN queues.

Some backplane RAID controller and SAN controller environments need very low IO latency, and the noop scheduler provides it. There is a cost, however, usually in total bandwidth/throughput. That is the typical tradeoff for low latency: many more IO commands are sent to storage with much smaller average transfer sizes. Since such controllers can execute only a finite number of commands per second, less work is done (less throughput) per command. By contrast, a typical scheduler that works harder to merge IO moves more data per command, resulting in higher throughput but longer latency. The additional latency is the cost of the scheduler holding on to an IO long enough to try to mate it up and merge it with an adjacent IO.

The noop scheduler is also used sometimes for testing and debug purposes to help isolate trouble spots or bottlenecks within the system.

Kernel Internal Data Structure(s)

While the IO scheduler core is under kABI control, so that it does not change across minor versions of a release, the scheduler personalities (elevator types) -- deadline, noop, cfq, and the like -- are not under kABI control, and their data structures can change between minor releases. While this happens often with cfq, the noop data structures are fairly stable.

Each /dev/sdN device has a gendisk structure. Each gendisk points to an associated request_queue structure:


genhd.h::
RHEL6
struct gendisk {
        /* major, first_minor and minors are input parameters only,
         * don't use directly.  Use disk_devt() and disk_max_parts().
         */
        int major;                      /* major number of driver */
        int first_minor;
        int minors;                     /* maximum number of minors, =1 for
                                         * disks that can't be partitioned. */

        char disk_name[DISK_NAME_LEN];  /* name of major driver */
        char *(*devnode)(struct gendisk *gd, mode_t *mode);
        /* Array of pointers to partitions indexed by partno.
         * Protected with matching bdev lock but stat and other
         * non-critical accesses use RCU.  Always access through
         * helpers.
         */
        struct disk_part_tbl *part_tbl;
        struct hd_struct part0;

        const struct block_device_operations *fops;
        struct request_queue *queue;
:

The request_queue is the main io scheduler data structure and points to an elevator_queue structure that contains elevator type information, like pointers to elevator functions for the type in use with this request_queue.


blkdev.h:: 
RHEL6
struct request_queue
{
        /*
         * Together with queue_head for cacheline sharing
         */
        struct list_head        queue_head;
        struct request          *last_merge;
        struct elevator_queue   *elevator;
        :

The elevator_queue structure contains a pointer to a list of required function pointers and other information. The list of functions is common across all elevator types, but the code behind those functions is specific to the scheduler personality currently in use by the request queue (cfq, deadline, noop, and others). Additional information, including the elevator name (the string "noop", for example), is held in the elevator_type structure, while the elevator_queue structure carries a pointer to an elevator-private data structure called elevator_data.


elevator.h::
RHEL6
struct elevator_queue
{
        struct elevator_ops *ops;
        void *elevator_data;
        struct kobject kobj;
        struct elevator_type *elevator_type;
        struct mutex sysfs_lock;
        struct hlist_head *hash;
:
};

/*
 * identifies an elevator type, such as AS or deadline
 */
struct elevator_type
{
        struct list_head list;
        struct elevator_ops ops;
        struct elv_fs_entry *elevator_attrs;
        char elevator_name[ELV_NAME_MAX];         /* e.g. "noop" */
        struct module *elevator_owner;
};

Up until this point, all the data structures referenced are under kABI control: they cannot change definition between minor versions, only across major releases. However, note that elevator_data is a void pointer! It points to a private data structure that is passed to the elevator functions but is private (non-kABI) to the elevator type (scheduler personality). Because it is private to the elevator type's code, it is free to change definition between minor releases.

We can see the elevator_queue structure (or its equivalent in RHEL 4 and 5) receive the private elevator_data pointer within the appropriate elevator type's initialization routine. For noop:


noop-iosched.c::
RHEL4
....none, no noop_data private structure exists in this release...
a check of the source code for assignments to elevator_data shows only
cfq, as (anticipatory), and deadline have assignments to elevator_data

RHEL5
static void *noop_init_queue(request_queue_t *q, elevator_t *e)
{
        struct noop_data *nd;

        nd = kmalloc(sizeof(*nd), GFP_KERNEL);
        if (!nd)
                return NULL;
        INIT_LIST_HEAD(&nd->queue);
        return nd;
}

RHEL6
static void *noop_init_queue(struct request_queue *q)
{
        struct noop_data *nd;

        nd = kmalloc_node(sizeof(*nd), GFP_KERNEL, q->node);
        if (!nd)
                return NULL;
        INIT_LIST_HEAD(&nd->queue);
        return nd;
}

RHEL7
static int noop_init_queue(struct request_queue *q, struct elevator_type *e)
{
        struct noop_data *nd;
        struct elevator_queue *eq;

        eq = elevator_alloc(q, e);
        if (!eq)
                return -ENOMEM;

        nd = kmalloc_node(sizeof(*nd), GFP_KERNEL, q->node);
        if (!nd) {
                kobject_put(&eq->kobj);
                return -ENOMEM;
        }
        eq->elevator_data = nd;

        INIT_LIST_HEAD(&nd->queue);

        spin_lock_irq(q->queue_lock);
        q->elevator = eq;
        spin_unlock_irq(q->queue_lock);
        return 0;
}

...In RHEL 5 and 6, the caller to noop_init_queue() receives the noop_data structure address upon return and updates the elevator_queue structure.


RHEL6
int elevator_init(struct request_queue *q, char *name)
{
        struct elevator_queue *eq;
:
        data = elevator_init_queue(q, eq);  /* calls eq->ops->elevator_init_fn(q) */
:
        elevator_attach(q, eq, data);       /* sets eq->elevator_data = data */
        return 0;
}

The good news for noop is that the elevator-private noop_data structure is so simple that it has remained identical across RHEL 5, 6, and 7 (RHEL 4 has no noop private data structure at all):


noop-iosched.c::
RHEL4
...none...

RHEL5
struct noop_data {
        struct list_head queue;
};

RHEL6
struct noop_data {
        struct list_head queue;
};

RHEL7
struct noop_data {
        struct list_head queue;
};