Lab 2: Graphs and Tours

Overview

There are few better ways of getting acquainted with a language than to simply sit down and write some reasonably complex code, so for this next lab we'll continue to forgo the "systems" aspect of the course — which we haven't begun to cover in earnest yet anyway — and jump back into data structures territory.

One of the common data structures richest in related algorithms and open research problems is the graph. A graph, as you probably know, is a data structure that consists of a set of vertices (aka nodes) and edges between those vertices. Edges may be directed or undirected, and often have weights associated with them. The following graph is an undirected, weighted graph that represents the distances between Chicago and some nearby suburbs.

Chicago Suburb Graph

Some things we like to do with graphs are:

Note that although finding a minimum spanning tree and solving the traveling salesman problem (TSP) seem very similar, the minimum spanning tree problem has a fairly straightforward and efficient "greedy" solution, while we still have no known efficient (fast) algorithm for the TSP, even after nearly a century of trying!

It's likely that you've already implemented the graph data structure and some related algorithms in your data structures and algorithms course, but we're going to go ahead and do it this time using C.

There are a number of different approaches to representing graphs as data structures — two common ones are the adjacency matrix and the adjacency list.

For the graph above, we would have the following adjacency matrix:

Chicago Suburb Graph Adjacency Matrix

Note that if a graph has a large number of vertices but each vertex is connected to only a few others (we call this a sparse graph), an adjacency matrix is a pretty inefficient representation. For V vertices it needs V × V entries, almost all of which would just record that there is no edge.

In that case, an adjacency list is a better choice. The following is an adjacency list representation of the Chicago suburb graph:

Chicago Suburb Graph Adjacency List

In an adjacency list, each vertex is associated with a linked list of those vertices (and corresponding edge weights) adjacent to it.

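Your task for this lab is to construct an adjacency list representation of a graph specified on the command line, and to print out the following: the adjacency list itself, a tour of the graph (as the sequence of vertices it visits), and the total length of that tour.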

Your program would be invoked at the command line as follows to construct the above Chicago suburb graph:

./graph Chicago Plainfield 30 Chicago OakPark 8 Chicago Schaumburg 30 \
        Chicago Evanston 12 OakPark Evanston 14 Schaumburg Evanston 24

Note that a '\' character at the very end of a line lets you continue the command on the next line.

The command line arguments specify each edge in the graph as a triple: the names of its two vertices followed by its weight. Spaces aren't allowed in vertex names (so "Oak Park" is written OakPark), and identical names appearing in different edges refer to the same vertex. Edge weights are integers that fit in a 32-bit int. The graph is also assumed to be connected; that is, any two vertices are joined by some path through the graph.

Another important point is that, because the graph is undirected, each edge is specified only once: the Chicago-Plainfield edge, for example, is not repeated as Plainfield-Chicago. Since neither the order of the edges nor the order of the two vertices within an edge matters, the invocation above could be written in several equivalent ways. Here's one of them:

./graph Plainfield Chicago 30 Chicago OakPark 8 OakPark Evanston 14 \
        Evanston Schaumburg 24 Schaumburg Chicago 30 Evanston Chicago 12 

Your program should handle any equivalent permutation.

Either of the above invocations should produce the following output (the order in which vertices are listed is not important):

Adjacency list:
  Chicago: OakPark(8) Evanston(12) Schaumburg(30) Plainfield(30)
  OakPark: Chicago(8) Evanston(14)
  Evanston: Chicago(12) OakPark(14) Schaumburg(24)
  Schaumburg: Chicago(30) Evanston(24)
  Plainfield: Chicago(30)

Tour path:
  Plainfield Chicago Schaumburg Evanston OakPark

Tour length: 98

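Note that there may very well be multiple possible tours; your program just needs to find one of them (not necessarily the shortest). As the sample output shows, a tour here is a path that moves along edges of the graph and visits every vertex exactly once; it does not need to return to its starting vertex.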

Preliminaries

In this lab you'll be working in the labs/2_graphlab directory.

As before, don't forget to commit your previous work and pull the latest changes from the central repository before starting!

Implementation Details

You'll be working on the following three files for this lab: main.c, graph.h, and graph.c.

We advise you to partition your work into phases so that you don't get overwhelmed. Your code should compile and run correctly before you move on to each subsequent phase.

First, you should make sure you're comfortable with the processing of command line arguments and with string handling. You should look into the atoi standard library function for converting strings to integers (and thereby easily accessing edge weights). For starters, have your program parse the command line arguments and echo the vertex names and edge weights. Comment out the code currently in the main function as you do this.

Next, check out the structure declarations given to you in the graph.h file. There are two, reproduced below:

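typedef struct vertex vertex_t;
typedef struct adj_vertex adj_vertex_t;

struct vertex {
    char *name;               /* vertex name, e.g. "Chicago" */
    adj_vertex_t *adj_list;   /* head of this vertex's adjacency list */
    vertex_t *next;           /* next vertex in the graph's vertex list */
};

struct adj_vertex {
    vertex_t *vertex;         /* the adjacent (neighbor) vertex */
    int edge_weight;          /* weight of the edge to that neighbor */
    adj_vertex_t *next;       /* next entry in the same adjacency list */
};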

Consider the following simple graph:

Simple graph

Included in main.c is sample code, listed below, that manually creates an adjacency list for the above graph (note that the code does not free the adjacency list — this is something you'll have to do!):

vertex_t *v1, *v2, *v3, *vlist_head;
adj_vertex_t *adj_v;

/* Create the three vertices; v1 is also the head of the vertex list. */
vlist_head = v1 = malloc(sizeof(vertex_t));
v1->name = "A";
v2 = malloc(sizeof(vertex_t));
v2->name = "B";
v3 = malloc(sizeof(vertex_t));
v3->name = "C";

/* Link the vertices into a list: A -> B -> C. */
v1->next = v2;
v2->next = v3;
v3->next = NULL;

/* A's adjacency list: B (weight 10). */
adj_v = v1->adj_list = malloc(sizeof(adj_vertex_t));
adj_v->vertex = v2;
adj_v->edge_weight = 10;
adj_v->next = NULL;

/* B's adjacency list: A (weight 10), then C (weight 5). */
adj_v = v2->adj_list = malloc(sizeof(adj_vertex_t));
adj_v->vertex = v1;
adj_v->edge_weight = 10;
adj_v = adj_v->next = malloc(sizeof(adj_vertex_t));
adj_v->vertex = v3;
adj_v->edge_weight = 5;
adj_v->next = NULL;

/* C's adjacency list: B (weight 5). */
adj_v = v3->adj_list = malloc(sizeof(adj_vertex_t));
adj_v->vertex = v2;
adj_v->edge_weight = 5;
adj_v->next = NULL;

This next figure depicts the structures allocated in the code and their interrelationships:

Make sure you understand how the various structure types and pointers are used to create the adjacency list.

The last bit of code you're given prints out the adjacency list for the graph:

vertex_t *vp;
printf("Adjacency list:\n");
for (vp = vlist_head; vp != NULL; vp = vp->next) {
    printf("  %s: ", vp->name);
    for (adj_v = vp->adj_list; adj_v != NULL; adj_v = adj_v->next) {
        printf("%s(%d) ", adj_v->vertex->name, adj_v->edge_weight);
    }
    printf("\n");
}

The output produced is:

Adjacency list:
  A: B(10) 
  B: A(10) C(5) 
  C: B(5) 

You should delete all the given code (except for the traversal, which you can reuse if you wish) and get your program to create an adjacency list using the command line parameters.

The graph.h file contains the following prototype, which you should provide an implementation for in graph.c.

/* This is the one function you really should implement as part of your 
 * graph data structure's public API. 
 *
 * `add_edge` adds the specified edge to the graph passed in via the 
 * first argument. If either of the edge's vertices is not already 
 * in the graph, it is added before the adjacency lists are 
 * updated. If the graph is currently empty (i.e., *vtxhead == NULL), 
 * a new graph is created, and the caller's vtxhead pointer is 
 * modified. 
 *
 * `vtxhead`: the pointer to the graph (more specifically, the head 
 *            of the list of vertex_t structures)
 * `v1_name`: the name of the first vertex of the edge to add
 * `v2_name`: the name of the second vertex of the edge to add
 * `weight` : the weight of the edge to add
 */
void add_edge (vertex_t **vtxhead, char *v1_name, char *v2_name, 
               int weight);

A correct implementation of add_edge should allow you to create a graph using a sequence of calls like this:

vertex_t *vlist_head = NULL;
add_edge(&vlist_head, "Chicago", "Plainfield", 30);
add_edge(&vlist_head, "Chicago", "OakPark", 8);
add_edge(&vlist_head, "OakPark", "Evanston", 14);
add_edge(&vlist_head, "Evanston", "Schaumburg", 24);
add_edge(&vlist_head, "Schaumburg", "Chicago", 30);
add_edge(&vlist_head, "Evanston", "Chicago", 12);

When you have this working, you're finally ready to start traversing your graph and searching for a tour. To do this, declare appropriate functions in graph.h and implement them in graph.c. You should find that the problem lends itself fairly naturally to a recursive implementation. Have fun!

Building

A simple makefile is provided for you that compiles and links 'main.c' and 'graph.c', and builds the executable 'graph'. You can start the build process with the command:

make

Sometimes, when you're running into weird problems with a build, it helps to delete all the intermediate build files and recompile from scratch. You can do this with the command:

make clean ; make

If your program builds successfully, you can run it with the command:

./graph

Of course, you'll be testing it with command line parameters, so you'll more likely do something like:

./graph A B 10 B C 5

Grading

This lab is worth a total of 40 points. Below is the rubric I will use to grade your work:

20 points: Graph construction

10 points: Tour search

10 points: Memory allocation & Code modularity