29 February 2012

Hash tables with 2 keys

Let's say you have a structure, which has an identifier (a string), an index (a number), and various data. I tried to find a good solution to create a hash table in which I would be able to search for an element after its identifier or its number. This is what I came up with:
typedef struct {
    char* identifier;
    int index;
    void* data;
} my_type_t;
So obviously, when we're adding the element to the table, we calculate the hash value of the identifier.
int hash = hash_function(my_struct->identifier);
Next, we generate the index based on the hash value. We simply keep an incremented value in each hash table slot, multiply it by the hash table size and add the hash value, and then simply add the structure to the list.
my_struct->index = (hash_table[hash]->add_structure_to_list(my_struct, hash_table[hash]);
hash_table[hash]->current_index += 1;
Now, when we want to search for an element in the hash table we do the following:
In order to look after an identifier:
int hash = hash_function(identifier);
for (element=hash_table[hash]->head;element;element=element->next) {
    if (!strcmp(element->identifier, identifier)) {
        return element;
    }
}
In order to look after an index we simply do a modulo operation. The value resulted is exactly the hash value that was generated when the element was added to the list, and will be the same as the hash value calculated by the function if applied to the identifier of the element.
int hash = index % hash_size;
for (element=hash_table[hash]->head;element;element=element->next) {
    if (element->index == index) return element;
}

06 February 2012

Read-Write Locks

Using locks inevitably serializes a multiprocessed/multithreaded application. Now this would not be a problem if the locks are gotten for very short periods of time. However, if not, the whole advantage of parallel processing is lost. A solution for that are read-write locks. Data can be read by multiple threads/processes in parallel, so as long as no process/thread modifies the data, it would be ok. For this situation, read-write locks can be used.
The way these work is as follows:
Read-Write locks actually contain 2 locks, a read lock and a write lock. The write lock is set when a process/thread is writing, while the read lock is incremented when  a process/thread is reading, and decremented when done.
To avoid reading while the data is being written, a reader gets the write lock. Once the lock is gotten, the read lock is incremented and the write lock is released again. This way, the write lock is only set for a very short time amount of time.
To avoid writing while the data is being read, a writer first gets the write lock. This will prevent any new reader to continue reading the data. The writer then waits until the reader lock decrements to 0, meaning that all readers who had the lock before the write lock was gotten, finish reading. After this, the data is modified, and then the write lock is released again, allowing the readers who were waiting for the write lock to continue their processing.
Here's the code for it (I won't add the implementation for the get_lock and release_lock function, to save space, and because I am lazy to write it here).
The Lock structure is easy, it basically looks like this:
typedef struct {
    volatile int read;
    volatile int write;
} rwlock_t;
Function for getting the read lock. The function increases the read lock, with the write lock gotten.
void rwlock_get_rd(rwlock_t *lock)
{
    get_lock(&(lock->wr));
    lock->rd += 1;
    release_lock(&(lock->wr));
}
Function for releasing the read lock. The function just decrements the read lock.
void rwlock_release_rd(rwlock_t *lock)
{
    lock->rd -= 1;
}
Function for getting the write lock. The function also waits for all the readers to finish. The get_lock_no_set function is almost the same as get_lock function, the difference being that this one doesn't modify the lock, it just waits for the lock to be 0 and then returns.
void rwlock_get_wr(rwlock_t *lock) 
{
    get_lock(&(lock->wr));
    get_lock_no_set(&(lock->rd));
}
Function for releasing the write lock.
void rwlock_release_wr(rwlock_t *lock) 
{
    release_lock(&(lock->wr));
}
Here's what would happen if you use this kind of locks. The writer has to have both locks when it writes, and while it does, both readers have to wait for it to finish before reading. The timeline shows an application with 3 reader threads and one writer (I know, the 'drawing' is bad, but meh, it's what I could come up with at one in the morning ....)
______________________________________________________________________________ Reader 1 | RD | -========= -========== -=========== | WR | -= -= -------------= ______________________________________________________________________________ Reader 2 | RD | -========= -=========== -=========== | WR | --= -= ------------= ______________________________________________________________________________ Reader 3 | RD | -========= -=========== -=========== | WR | -= -= ------------= ______________________________________________________________________________ Writer | RD | ---------============= | WR | -====================== ______________________________________________________________________________ - thread waits for lock = thread has lock