Dynamic Threading and Scheduling With Dynamic C

Dynamic C is a C-like programming language for Rabbit microprocessor-based products from ZWorld. Dynamic C includes language extensions for writing cooperatively multithreaded programs, but does not directly support dynamic creation and destruction of threads at runtime or user-defined thread scheduling algorithms. Instead, the programmer must statically allocate a separate CoData structure for each thread of execution that might be used at runtime; Dynamic C will run each of these threads sequentially. This page discusses some techniques for working around the limitations of Dynamic C's multithreading capabilities to accomplish dynamic thread management and dynamic scheduling.

Dynamic Scheduling

/****
 * Example 1: A Traditional DynC Cooperatively Multithreaded Program
 ****/
main(void) {
    /* CoData structures for two named tasks */
    CoData t1, t2;

    while(1) {
        /* Keyboard handler */
        costate {
            while(1) {
                waitfor(kbhit());
                switch(getchar()) {
                    case '-':
                        /* Stop one of the currently running tasks */
                        if      (isCoRunning(&t1)) CoPause(&t1);
                        else if (isCoRunning(&t2)) CoPause(&t2);
                        else    printf("No tasks running.\n");
                        break;

                    case '+':
                        /* Start one of the currently stopped tasks */
                        if      (!isCoRunning(&t1)) CoResume(&t1);
                        else if (!isCoRunning(&t2)) CoResume(&t2);
                        else    printf("Both tasks running.\n");
                        break;
                }
            }
        }

        /* Task 1 (initially on) */
        costate t1 init_on {
            while (1) {
                /* Task 1 work goes here. */
                printf("Task 1\n");
                yield;
            }
        }

        /* Task 2 (initially off) */
        costate t2 {
            while (1) {
                /* Task 2 work goes here. */
                printf("Task 2\n");
                yield;
            }
        }
    }
}

The code above illustrates a typical program that uses costatements for cooperative multithreading. A separate costate block is written for each task that should run concurrently; in the example above there are three such tasks: two that perform actual "work" by printing "Task 1" or "Task 2" to stdout and a third task which waits for system events and turns the other two on or off when appropriate.

The above code has the advantage of being easy to read and understand; the structure of the program makes it immediately obvious that there are three tasks that may run concurrently and it is easy to figure out what code is executed by each task. For many applications, code similar to that above is sufficient to handle the multitasking needs of the application.

One disadvantage of code written in the above style is that it is difficult to change the order in which the threads run, especially when there there are a large number of tasks. A common cooperative multithreading language feature is the "named yield" which allows a thread to yield and specify which thread should run next. Named yields are especially useful when:

Since DynamicC does not support named yields directly, programs such as the example above would have to simulate a named yield by performing a CoPause() on each intervening costatement. For example:

while(1) {
    costate foo always_on {
        printf("Foo");
        if (problem_detected()) {
            CoPause(bar);
            CoPause(baz);
            yield;
        } else {
            printf("No problem detected.\n");
        }
    }

    costate bar always_on {
        printf("Bar");
    }

    costate baz always_on {
        printf("Baz");
    }

    costate recovery_thread always_on {
        fix_problem();
        CoResume(bar);
        CoResume(baz);
    }
}

For programs with lots of threads or a large number of named yields, it grows increasingly difficult to keep track of which costatements are paused, which are running, and which ones need to be toggled at any given point in the program. It is also important to note that even when a costatement is disabled with CoPause(), there is still some overhead associated with checking the STOPPED and INIT flags of its CoData structure -- i.e. the time required to enter the desired thread is still proportional to the total number of threads in the system.

Dynamic C also allows a costatement to be associated with a pointer to a CoData structure. Although the ZWorld examples only show this is a way of simplifying the creation of multiple identical tasks (see Samples/Costate/Costate2.c in the Dynamic C distribution), it turns out we can also use this feature to implement all kinds of dynamic scheduling algorithms, including the simulation of named yields, without facing the shortcomings of CoPause and CoResume. The following example (code available here) illustrates the key concepts behind such dynamic scheduling.

/****
 * Example 2: CoData pointers for scheduling selection
 ****/

#define NUMTASKS 4

typedef struct task {
    CoData cd;
    int id;
    int cycles;
    int next;
} task;

main(void) {
    /* CoData structures for tasks */
    task t[NUMTASKS];

    /* First "on" task and "off" task */
    int firston, firstoff;

    /* Next task to execute, and the one to execute after that */
    int nexttask, pending;

    /* CoData pointer for scheduling */
    CoData* pcd;

    int i;

    /* Initialize tasks */
    for (i = 0; i < NUMTASKS; ++i) {
        t[i].id = i+1;
        t[i].cycles = 0;
        t[i].next = i+1;
        CoBegin(&t[i].cd);
    }
    t[0].next = -1;
    t[NUMTASKS-1].next = -1;
    firston = 0;
    firstoff = 1;

    nexttask = 0;
    while(1) {
        /*
         * Keyboard handler for moving tasks between enabled and disabled
         * stacks omitted for brevity.
         */

        /* Task selection code */
        if (nexttask == -1 && firston == -1)
            /* Nothing enabled, nothing to run */
            continue;
        else if (nexttask == -1)
            nexttask = firston;
        pending = t[nexttask].next;
        pcd = &t[nexttask].cd;
        ++t[nexttask].cycles;

        /* Task execution */
        costate pcd {
            while (1) {
                switch (t[nexttask].id) {
                    // Task 1 and 4's code; just print its name each time
                    case 1:
                    case 4:
                        printf("Task #%d\n", t[nexttask].id);
                        break;

                    // Task 2's code; four parts, yield between each
                    case 2:
                        printf("Task #2, part 1\n");
                        yield;
                        printf("Task #2, part 2\n");
                        yield;
                        printf("Task #2, part 3\n");
                        yield;
                        printf("Task #2, part 4\n");
                        break;

                    // Task 3's code; run once for every 10 cycles of task 1
                    // then run task #1 again immediately (usually #2 would run
                    // next and then #1...).
                    case 3:
                        waitfor(t[0].cycles % 10 == 0);
                        printf("Task #3\n");
                        pending = 0;        /* Named yield to thread #1 */
                        break;
                }
                yield;
            }
        }

        /* Select the next task to run */
        nexttask = pending;
    }
}

The code above only has one costate block (well, actually two if you include a keyboard handler for enabling/disabling threads), but it handles 4 different worker tasks. The idea behind this code is that we keep two linked lists of tasks: one for enabled tasks and one for disabled tasks. Instead of using CoPause and CoResume to turn tasks on and off, we just switch the CoData pointer between each of the tasks in the enabled list which avoids the extra overhead associated with testing costatements that have been paused. Generally we traverse the active list in order, but it is possible to dynamically reschedule tasks by setting the 'nexttask' and 'pending' variables. An example of this appears in task #3's code: setting 'pending' to 0, causes task #1 to execute next (out of order); this is effectively a named yield. If we wanted to do some really fancy scheduling we could actually re-link our list in a different order or even create multiple "enabled" lists for threads of different priorities.

Dynamic Threading

Dynamic C has no built in mechanism for creating or destroying threads at runtime. Some applications may need to use a separate thread for each application-level event and can not determine statically how many threads will be required at any given point in time. The common solution to this problem is to create enough spare CoData structures to handle the expected load of the system. Unfortunately this solution has two problems:

/****
 * Example 3: Dynamic Process Creation and Destruction
 ****/

#define XMEM_ADDR unsigned long

typedef struct Thread {
    int id;
    CoData cd;
    XMEM_ADDR next;
} Thread;

int numthreads;                     // number of threads running
int nextid;                         // next available thread ID
XMEM_ADDR firstfree;                // first free thread structure
XMEM_ADDR firstthread;              // first active thread

/****
 * newthread()
 *
 * Creates a new thread of the specified type and adds it to the
 * 'active' queue.  If there are any pre-allocated thread structures
 * left over from threads that we've killed, we'll grab on of those
 * and use it.  Otherwise we'll xalloc a new thread structure.
 ****/
void newthread() {
    XMEM_ADDR addr;
    Thread t;

#GLOBAL_INIT {
    numthreads = 0;
    nextid = 0;
    firstthread = 0;
    firstfree = 0;
}

    if (firstfree == 0) {
        // No free thread structures available; xalloc one
        addr = xalloc(sizeof(Thread));
        if (addr == 0) {
            printf("ERROR: Can not allocate any more threads (out of xmem)\n");
            return;
        }
    } else {
        // Grab the first free thread structure from the free list and use that
        // for our new thread.
        addr = firstfree;
        xmem2root(&t, addr, sizeof(Thread));
        firstfree = t.next;
    }

    // Update the thread information structure
    t.id = ++nextid;
    t.next = firstthread;
    CoBegin(&t.cd);
    root2xmem(addr, &t, sizeof(Thread));
    firstthread = addr;
    ++numthreads;
    printf("Creating thread #%d; there are now %d running.\n",
            nextid, numthreads);
}


/****
 * killthread()
 *
 * Kill the first thread on the active queue.  Since Dynamic C does not
 * allow the de-allocation of xalloc'd memory, the best we can do is put
 * it on a freelist so that it can be reused later.
 *
 * For simplicity, this just kills the first thread on the active queue
 * (i.e. the last thread created) and runs in O(1).  It would be possible 
 * to kill a specific thread by ID, by performing a linear search on the 
 * linked list; this would make thread destruction O(n).
 *
 * @return The address of the thread killed
 ****/
XMEM_ADDR killthread() {
    XMEM_ADDR addr;
    Thread t;

    addr = firstthread;
    if (addr == 0)
        // no active threads
        return 0;

    // Move thread from active list to free list
    xmem2root(&t, addr, sizeof(Thread));
    firstthread = t.next;
    t.next = firstfree;
    root2xmem(addr, &t, sizeof(Thread));
    firstfree = addr;
    --numthreads;
    printf("Killed a thread; there are now %d left.\n", numthreads);
    return addr;
}


main(void) {
    Thread t;

    // Address of next thread to run
    XMEM_ADDR nextthread;

    // CoData pointer for scheduling
    CoData* pcd;

    nextthread = 0;
    while(1) {
        // Keyboard handler
        costate {
            while(1) {
                waitfor(kbhit());
                switch(getchar()) {
                    /* Stop one of the currently running threads */
                    case '-':
                        /* Make sure there is an enabled thread */
                        if (firstthread == 0) {
                            printf("All threads stopped.\n");
                            break;
                        }
                        if (nextthread == killthread())
                            nextthread = firstthread;
                        break;

                    case '+':
                        newthread();
                        break;
                }
            }
        }

        /* Task selection code */
        if (nextthread == 0 && (nextthread = firstthread) == 0)
            continue;

        xmem2root(&t, nextthread, sizeof(Thread));
        pcd = &t.cd;

        /* Task execution */
        costate pcd {
            // In this example, all threads are identical.  If we wanted,
            // we could switch on the thread ID to run different code
            // or we could add a task type field to the thread structure
            // that would be used here to determine what to do.
            while(1) {
                printf("Thread #%d running.\n", t.id);
                yield;
            }
        }

        /* Select the next task to run */
        (nextthread = t.next) || (nextthread = firstthread);
    }
}

(I was able to create 432 processes with the above program on a R2200 board before running out of memory.)

For simplicity, the example above stores all threads in extended memory and only copies them into root memory one at a time. It should be obvious that this introduces a lot of overhead (an xmem2root() copy for every single context switch). In a real application, it would make more sense to pre-allocate a set number of CoData structures in root memory and then only use xmem for the "excess" threads. It would also be more efficient to allocate blocks of CoData structures together in extended memory and then copy a whole block of CoData's to root memory at once and execute each one sequentially. This would reduce the flexibility of the dynamic scheduler somewhat, but would increase the performance of the application.

One shortcoming of this technique is that it does not appear that cofunctions can be called from dynamically allocated threads. We are not aware of any way of allocating extra copies of indexed cofunctions so all code will have to be placed directly in the costatement.

Conclusion and Future Work

This website has described some techniques for simulating advanced CM-programming models in Dynamic C. We have not had time to formally investigate the performance impact of these techniques yet, but it is on our todo list. We are also thinking about a writing a Dynamic C preprocessor that would allow higher level specification of tasks and schedulers. For example we would like to be able to write code like:

mainloop {
    task foo init_on {
        ...
        yield(baz);             // named yield
    }

    task bar always_on {
        enable(foo);            // like CoResume
        setpriority(baz, high); // change priority
    }

    // Two identical tasks
    task baz, buz always_on {
        disable(foo);           // like CoPause, but no overhead
        yield;                  // traditional (unnamed) yield
    }
}

A precompiler should be able to expand the above code to Dynamic C code using the dynamic scheduling and threading techniques described on this website.

Dynamic Threading and Scheduling With Dynamic C

Introduction

Dynamic Scheduling

Dynamic Threading

Conclusion and Future Work