SimGrid 3.6.2
Scalable simulation of distributed systems
Lesson 8: Handling errors through exceptions


Introduction

Exceptions are a great mechanism for dealing with errors, as everyone knows.

Without exceptions, you have to rely on return values indicating whether each function call succeeded, and check the return value of every single function you call. If there is a single point in the calling sequence where you forget this check, the chain is broken and the caller won't notice the issue. In practice, handling errors without exceptions loads user code with *tons* of those stupid checks, and you lose your functional code in the middle of that miasma.

With them, you simply write your code. If you want to deal with errors (i.e., you actually know how to react to errors at this point of your code), you write a catching block. If you don't, you don't. Exceptions flow from the throwing point to the catching point without bothering you.
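For comparison, here is a minimal sketch of the return-code style described above, in plain C. The functions and error codes are hypothetical and only serve the illustration: the point is that every intermediate caller has to test and forward the code itself.

```c
#include <assert.h>

/* Hypothetical error codes and functions, for illustration only. */
enum { E_OK = 0, E_CONNECT = 1, E_SEND = 2 };

/* Each low-level step reports failure through its return value. */
static int connect_to(int port) { return port == 3001 ? E_OK : E_CONNECT; }
static int send_kill(int port) { return port > 0 ? E_OK : E_SEND; }

/* Every caller must check and forward each code by hand; dropping a
 * single "if (rc != E_OK)" silently loses the error. */
static int kill_server(int port)
{
  int rc = connect_to(port);
  if (rc != E_OK)
    return rc;
  rc = send_kill(port);
  if (rc != E_OK)
    return rc;
  return E_OK;
}
```

kill_server(3001) returns E_OK and kill_server(3000) returns E_CONNECT; with a deeper call chain, each level would need the same two-line check.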

At this point, you may be a bit surprised by the previous paragraphs. SimGrid and GRAS are written in C, and everybody knows that there are no exceptions in C, only in C++, Java and such. This is true: exceptions are not part of the C language, but they are such a great tool that we implemented an exception mechanism as part of the SimGrid library (with setjmp and longjmp, for the curious).
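To make the setjmp/longjmp remark concrete, here is a minimal sketch of how such a mechanism can be built in standard C. All names (MY_TRY, MY_CATCH, MY_END_CATCH, my_throw) are hypothetical: this is not the XBT implementation, which carries much more information (category, value, message, throw point), only the underlying idea of a stack of jmp_buf handlers.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical names: this is NOT the XBT implementation, only the idea. */
#define MAX_NESTING 16

static jmp_buf handlers[MAX_NESTING]; /* one jmp_buf per active TRY block */
static int depth = 0;                 /* number of active TRY blocks */
static int current_code = 0;          /* value carried by the last throw */

/* Jump back to the innermost active TRY block, popping it. */
static void my_throw(int code)
{
  if (depth == 0) {                   /* nobody to catch it */
    fprintf(stderr, "uncaught exception %d\n", code);
    abort();
  }
  current_code = code;
  longjmp(handlers[--depth], 1);
}

/* setjmp() returns 0 when the block is entered and 1 when my_throw()
 * lands here. The handler is popped at the end of the body (normal exit)
 * or inside my_throw() (exceptional exit). */
#define MY_TRY                                  \
  do {                                          \
    if (depth >= MAX_NESTING)                   \
      abort();                                  \
    if (setjmp(handlers[depth++]) == 0) {
#define MY_CATCH      depth--; } else {
#define MY_END_CATCH  } } while (0)

static int parse_port(int raw)
{
  if (raw < 1024 || raw > 65535)
    my_throw(42);                     /* no error code to propagate by hand */
  return raw;
}

/* Returns the port, or -code when an exception was caught. */
static int safe_parse(int raw)
{
  volatile int result = 0;            /* volatile: written under setjmp */
  MY_TRY {
    result = parse_port(raw);
  } MY_CATCH {
    result = -current_code;
  } MY_END_CATCH;
  return result;
}
```

safe_parse(3000) returns 3000, safe_parse(80) returns -42, and in both cases the handler stack is empty again afterwards. The volatile qualifier follows the C rule that non-volatile locals modified after setjmp() have indeterminate values once longjmp() returns there.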

Being "home-grown" makes our exception mechanism both stronger and weaker at the same time. It is weaker because, working within a library, we are more limited than we would be if we could change the compiler itself to add some extra checks here and specific treatment there. But it is also an advantage for us, since the exception mechanism is perfectly fitted to the distributed setting of GRAS processes. Exceptions can easily propagate over the network, as we will see in a later lesson (Lesson 10: Remote Procedure Calling (RPC)), and they contain information about the host on which they were thrown (xbt_ex_t) along with the throw point in the source code.

The syntax of XBT exceptions should not sound unfamiliar to most of you. You throw them using the THROW and THROWF macros. They take 2 arguments: an error category (of type xbt_errcat_t) and an error "value" (an integer; in practice, I often leave it to 0 in my own code). THROWF also takes a message string as an extra argument, which is a printf-like format string followed by its own arguments. So, you may have something like the following:

THROWF(system_error, 0, "Cannot connect to %s:%d because of %s", hostname, port, reason);

Then, you simply add a TRY/CATCH block around your code:

TRY {
  /* your code */
}
CATCH(e) {
  /* error handling code */
}

Another unusual point is that you must free the memory allocated to the exception with xbt_ex_free() once you have caught and handled it. There is a bit more to the picture than this (TRY_CLEANUP blocks, for example), and you should check the section Exception support for more details.
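To picture what THROWF has to record, here is a minimal sketch in standard C that builds an exception-like record holding a printf-formatted message plus the throw point (__FILE__, __LINE__, __func__), together with a matching free function. The type my_ex_t, the macro MY_EX_MAKE, my_ex_free and the category constants are hypothetical; they only mirror the description above and are not the actual xbt_ex_t layout. A real THROWF would hand such a record to a setjmp/longjmp unwinding mechanism instead of returning it.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical categories, standing in for xbt_errcat_t values. */
enum { SYSTEM_ERROR = 1, NOT_FOUND_ERROR = 2 };

/* Hypothetical record; NOT the real xbt_ex_t layout. */
typedef struct {
  int category;
  int value;          /* often just 0 */
  char *msg;          /* heap-allocated, printf-formatted */
  const char *file;   /* throw point in the source code */
  int line;
  const char *func;
} my_ex_t;

static my_ex_t my_ex_make(int category, int value, const char *file,
                          int line, const char *func, const char *fmt, ...)
{
  my_ex_t e = { category, value, NULL, file, line, func };
  va_list ap;

  va_start(ap, fmt);
  int len = vsnprintf(NULL, 0, fmt, ap);      /* first pass: measure */
  va_end(ap);
  if (len < 0)
    return e;                                 /* formatting error: no msg */

  e.msg = malloc((size_t) len + 1);
  if (e.msg != NULL) {
    va_start(ap, fmt);                        /* second pass: write */
    vsnprintf(e.msg, (size_t) len + 1, fmt, ap);
    va_end(ap);
  }
  return e;
}

/* Records the call site automatically, as a THROWF-like macro would. */
#define MY_EX_MAKE(cat, val, ...) \
  my_ex_make((cat), (val), __FILE__, __LINE__, __func__, __VA_ARGS__)

/* The handler owns the message and must release it. */
static void my_ex_free(my_ex_t *e)
{
  free(e->msg);
  e->msg = NULL;
}
```

Because the message lives on the heap, whoever handles the exception has to release it, which is the same reason XBT asks you to call xbt_ex_free() in your CATCH blocks.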

You should be very careful when using exceptions. They work great when used correctly, but there are a few golden rules you should never break. Moreover, the error messages and symptoms can get really cryptic when exceptions are misused.

The most important rule is that you must never leave a TRY block abruptly, with a return or a break for example. So you don't want to include large sections of your program in TRY blocks: if you do, it's almost certain that one day you'll put a break or a return within such a block. And believe me, finding such mistakes is a real pain.
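The reason is easiest to see on a stripped-down setjmp-based TRY. In the hypothetical sketch below (the general setjmp/longjmp idea, not the XBT code), entering a TRY pushes a jmp_buf and leaving the body normally pops it. A return inside the body skips the pop, so a jmp_buf pointing into a stack frame that no longer exists stays registered; the next throw would longjmp into it, which is undefined behavior and typically shows up as a baffling crash far away from the real mistake.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdlib.h>

/* Hypothetical TRY machinery reduced to its bookkeeping: a stack of
 * jmp_buf, pushed on entry and popped when the body completes normally. */
#define MAX_NESTING 16
static jmp_buf handlers[MAX_NESTING];
static int depth = 0;

#define MY_TRY                                  \
  do {                                          \
    if (depth >= MAX_NESTING)                   \
      abort();                                  \
    if (setjmp(handlers[depth++]) == 0) {
#define MY_CATCH      depth--; } else {
#define MY_END_CATCH  } } while (0)

static int well_behaved(int x)
{
  volatile int r = 0;
  MY_TRY {
    r = x * 2;        /* leaves the body normally: the handler is popped */
  } MY_CATCH {
    r = -1;
  } MY_END_CATCH;
  return r;
}

static int returns_inside_try(int x)
{
  MY_TRY {
    return x * 2;     /* BUG: jumps over "depth--", leaking a stale handler */
  } MY_CATCH {
    return -1;
  } MY_END_CATCH;
  return 0;           /* not reached */
}
```

Both functions return 6 for an argument of 3, but after well_behaved(3) the handler stack is empty, while after returns_inside_try(3) it still holds one stale entry.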

If you suspect this kind of error, I made a little script for you: check tools/xbt_exception_checker from the CVS. Given a set of C files, it extracts the TRY blocks and displays them on the standard output so that you can grep for return, break and similar forbidden words.
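The script itself is not reproduced here. As a rough illustration of the same idea, the hypothetical C function below scans source text for TRY blocks, matches their braces, and counts return, break or goto tokens found inside. It does not skip comments or string literals, so treat it as a crude approximation rather than a replacement for the real tool.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Does `word` start at s[i] as a whole identifier (not inside a longer one)? */
static int is_token_at(const char *s, size_t i, const char *word)
{
  size_t n = strlen(word);
  if (strncmp(s + i, word, n) != 0)
    return 0;
  if (i > 0 && (isalnum((unsigned char) s[i - 1]) || s[i - 1] == '_'))
    return 0;
  return !(isalnum((unsigned char) s[i + n]) || s[i + n] == '_');
}

/* Counts return/break/goto tokens found inside TRY { ... } blocks.
 * Naive on purpose: comments and string literals are not skipped. */
static int count_forbidden_in_try(const char *src)
{
  static const char *const forbidden[] = { "return", "break", "goto" };
  int hits = 0;

  for (size_t i = 0; src[i] != '\0'; i++) {
    if (!is_token_at(src, i, "TRY"))
      continue;
    size_t j = i + 3;
    while (src[j] != '\0' && src[j] != '{')   /* find the opening brace */
      j++;
    int level = 0;
    for (; src[j] != '\0'; j++) {             /* walk to the matching brace */
      if (src[j] == '{')
        level++;
      else if (src[j] == '}' && --level == 0)
        break;
      else
        for (size_t k = 0; k < sizeof forbidden / sizeof *forbidden; k++)
          if (is_token_at(src, j, forbidden[k]))
            hits++;
    }
    if (src[j] == '\0')
      break;
    i = j;                                     /* resume after the block */
  }
  return hits;
}
```

Since TRY_CLEANUP and identifiers such as returned are not whole-word matches, they are not reported; a return inside a CATCH block is not reported either, because only the TRY body is scanned.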

Putting exceptions into action

Okay. I hope those little warnings didn't discourage you from using exceptions, because they really are a nice mechanism. We will now modify our program a bit to take advantage of them. The only issue is that when a program runs properly, it usually doesn't raise any exception. We could protect the calls we already have with exception handling, but it wouldn't be very exciting since we know this code does not throw any exception under the conditions we use (actually, most GRAS functions may throw an exception when something goes wrong).

Instead, we will code a little game between the client and the server. We won't tell the client the exact port on which the server listens, and it will have to find it by itself. To do so, it will try to open a socket and send the kill message to each port of the search range (3000 to 3009). If it manages to close the socket after sending the message without being interrupted by an exception, it can assume that it killed the server and stop searching.
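Stripped of the networking, the client's strategy is "try each port; an exception means move on; no exception means done". The sketch below reproduces that control flow in portable C with a raw setjmp/longjmp pair. try_kill(), hidden_port and search_and_kill() are hypothetical stand-ins for the GRAS calls, not part of any API.

```c
#include <assert.h>
#include <setjmp.h>

/* Hypothetical stand-ins for the network: only hidden_port accepts the
 * kill message. A failure is reported with longjmp(), the way an
 * exception thrown deep inside the socket code would unwind the stack. */
static jmp_buf on_failure;
static int hidden_port = 3004;

static void try_kill(int port)
{
  if (port != hidden_port)
    longjmp(on_failure, 1);             /* "throw": nobody listens there */
}

/* Scans [first, last) like the client does; returns the port on which the
 * kill went through, or -1 if every attempt was interrupted. */
static int search_and_kill(int first, int last)
{
  volatile int found = -1;              /* volatile: written under setjmp */

  for (volatile int port = first; port < last && found < 0; port++) {
    if (setjmp(on_failure) == 0) {      /* "TRY" */
      try_kill(port);
      found = port;                     /* reached only if nothing was thrown */
    }
    /* else: "CATCH", the server is not on this port; try the next one */
  }
  return found;
}
```

As in the real client, success is recorded in a flag set at the very end of the protected section rather than by returning from inside it.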

To make the game a bit more fun (and to show you what an exception actually looks like when it's not caught), we add an optional command line argument to the server, asking it to cheat by opening its port elsewhere instead of within the search range:

  if (argc > 1 && !strcmp(argv[1], "--cheat")) {
    mysock = gras_socket_server(9999);
    XBT_INFO("Hi! hi! I'm not in the search range, but in 9999...");
  } else {
    mysock = gras_socket_server((rand() % 10) + 3000);
    XBT_INFO("Ok, I'm hidden on port %d. Hope for the best.",
             gras_socket_my_port(mysock));
  }

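Then, when the client detects that it didn't manage to find and destroy the server, it throws a suicide exception (sorry for the bad jokes):

  if (!found)
    THROWF(not_found_error, 0,
           "Damn, I failed to find the server! I cannot survive this humilliation.");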

Recapping everything together

Here is the output produced by this new program. Note that when the program bails out because of an uncaught exception, it displays its backtrace just like a JVM would (OK, it's a bit cruder than the JVM's, but anyway). For each function frame of the call stack, it displays the function name and its location in the source files (if it manages to retrieve it). Don't be jealous, you can display such stacks wherever you want with xbt_backtrace_display() ;)

Unfortunately, this feature is only offered under Linux for now, since I have no idea how to retrieve the call stack of the current process under other operating systems. But help is always welcome in this area too ;)
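For the curious, glibc on Linux exposes the call stack through backtrace() and backtrace_symbols_fd() from <execinfo.h>. The sketch below is one way to dump it; it only illustrates the underlying facility and makes no claim about how xbt_backtrace_display() is implemented. print_call_stack() is a hypothetical helper name.

```c
#include <assert.h>
#include <execinfo.h>
#include <unistd.h>

/* Dumps the current call stack to stderr, one frame per line, and returns
 * the number of frames captured. glibc-specific (Linux). */
static int print_call_stack(void)
{
  void *frames[64];
  int n = backtrace(frames, 64);

  backtrace_symbols_fd(frames, n, STDERR_FILENO);
  return n;
}
```

Function names generally only show up when the program is linked with -rdynamic (and never for static functions); turning addresses into file:line locations requires an extra tool such as addr2line.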

$ ./test_server & ./test_client 127.0.0.1 
[arthur:client:(27889) 0.000013] [test/INFO] Damn, the server is not on 3000
[arthur:client:(27889) 0.000729] [test/INFO] Yeah! I found the server on 3001! It's eradicated by now.
[arthur:client:(27889) 0.000767] [gras/INFO] Exiting GRAS
[arthur:server:(27886) 0.000013] [test/INFO] Ok, I'm hidden on port 3001. Hope for the best.
[arthur:server:(27886) 1.500772] test.c:15: [test/CRITICAL] Argh, killed by 127.0.0.1:1024! Bye folks, I'm out of here...
[arthur:server:(27886) 1.500819] [gras/INFO] Exiting GRAS
$
$ ./test_server --cheat & ./test_client 127.0.0.1 
[arthur:client:(27901) 0.000014] [test/INFO] Damn, the server is not on 3000
[arthur:client:(27901) 0.000240] [test/INFO] Damn, the server is not on 3001
[arthur:client:(27901) 0.000386] [test/INFO] Damn, the server is not on 3002
[arthur:client:(27901) 0.000532] [test/INFO] Damn, the server is not on 3003
[arthur:client:(27901) 0.000671] [test/INFO] Damn, the server is not on 3004
[arthur:client:(27901) 0.000815] [test/INFO] Damn, the server is not on 3005
[arthur:client:(27901) 0.000960] [test/INFO] Damn, the server is not on 3006
[arthur:client:(27901) 0.001100] [test/INFO] Damn, the server is not on 3007
[arthur:client:(27901) 0.001257] [test/INFO] Damn, the server is not on 3008
[arthur:client:(27901) 0.001396] [test/INFO] Damn, the server is not on 3009
** SimGrid: UNCAUGHT EXCEPTION received on arthur(27901): category: not found; value: 0
** Damn, I failed to find the server! I cannot survive this humilliation.
** Thrown by client() in this process
[arthur:client:(27901) 0.001475] xbt/ex.c:113: [xbt_ex/CRITICAL] Damn, I failed to find the server! I cannot survive this humilliation.

**   In client() at /home/mquinson/Code/simgrid-git/doc/gtut-files/test.c:80
$ killall test_server
[arthur:server:(27897) 0.000014] [test/INFO] Hi! hi! I'm not in the search range, but in 9999...
$
$ ./test_simulator platform.xml test.xml
[Jacquelin:server:(1) 0.000000] [test/INFO] Ok, I'm hidden on port 3000. Hope for the best.
[Boivin:client:(2) 1.500552] [test/INFO] Yeah! I found the server on 3000! It's eradicated by now.
[Boivin:client:(2) 1.500552] [gras/INFO] Exiting GRAS
[Jacquelin:server:(1) 1.500552] test.c:15: [test/CRITICAL] Argh, killed by Boivin:1024! Bye folks, I'm out of here...
[Jacquelin:server:(1) 1.500552] [gras/INFO] Exiting GRAS
$

The complete program reads:

/* Copyright (c) 2006, 2007, 2010. The SimGrid Team.
 * All rights reserved.                                                     */

/* This program is free software; you can redistribute it and/or modify it
 * under the terms of the license (GNU LGPL) which comes with this package. */

#include <stdlib.h>             /* rand() */
#include <string.h>             /* strcmp() */
#include <gras.h>

XBT_LOG_NEW_DEFAULT_CATEGORY(test, "My little example");

typedef struct {
  int killed;
} server_data_t;


int server_kill_cb(gras_msg_cb_ctx_t ctx, void *payload)
{
  gras_socket_t client = gras_msg_cb_ctx_from(ctx);
  server_data_t *globals = (server_data_t *) gras_userdata_get();

  XBT_CRITICAL("Argh, killed by %s:%d! Bye folks, I'm out of here...",
               gras_socket_peer_name(client), gras_socket_peer_port(client));

  globals->killed = 1;

  return 0;
}                               /* end_of_kill_callback */

int server(int argc, char *argv[])
{
  gras_socket_t mysock;         /* socket on which I listen */
  server_data_t *globals;

  gras_init(&argc, argv);

  globals = gras_userdata_new(server_data_t);   /* allocates one server_data_t */
  globals->killed = 0;

  gras_msgtype_declare("kill", NULL);
  gras_cb_register("kill", &server_kill_cb);

  if (argc > 1 && !strcmp(argv[1], "--cheat")) {
    mysock = gras_socket_server(9999);
    XBT_INFO("Hi! hi! I'm not in the search range, but in 9999...");
  } else {
    mysock = gras_socket_server((rand() % 10) + 3000);
    XBT_INFO("Ok, I'm hidden on port %d. Hope for the best.",
             gras_socket_my_port(mysock));
  }

  while (!globals->killed) {
    gras_msg_handle(-1);        /* blocking */
  }

  gras_exit();
  return 0;
}

int client(int argc, char *argv[])
{
  gras_socket_t mysock;         /* socket on which I listen */
  gras_socket_t toserver;       /* socket used to write to the server */
  int found;                    /* whether we found peer */
  int port;                     /* where we think that the server is */
  xbt_ex_t e;

  gras_init(&argc, argv);

  gras_msgtype_declare("kill", NULL);
  mysock = gras_socket_server_range(1024, 10000, 0, 0);

  XBT_VERB("Run little server, run. I'll get you. (sleep 1.5 sec)");
  gras_os_sleep(1.5);

  for (port = 3000, found = 0; port < 3010 && !found; port++) {
    TRY {
      toserver = gras_socket_client(argv[1], port);
      gras_msg_send(toserver, "kill", NULL);
      gras_socket_close(toserver);
      found = 1;
      XBT_INFO("Yeah! I found the server on %d! It's eradicated by now.",
               port);
    }
    CATCH(e) {
      xbt_ex_free(e);
    }
    if (!found)
      XBT_INFO("Damn, the server is not on %d", port);
  }                             /* end_of_loop */

  if (!found)
    THROWF(not_found_error, 0,
           "Damn, I failed to find the server! I cannot survive this humilliation.");


  gras_exit();
  return 0;
}

Go to Lesson 9: Exchanging simple data


The version of SimGrid documented here is v3.6.2.
Documentation of other versions can be found in their respective archive files (directory doc/html).
Generated for SimGridAPI by doxygen