1 - A slightly edited irc discussion with Josh Triplett.
   2 - Describes most data structures used in sparse.
   3 
   4 As far as the parsing structures go...
   5 The C parser exists in two main files: parse.c, which parses statements, and expression.c, which parses expressions.
   6 parse.h contains the definition of struct statement, which represents a C statement.
   7 That includes only those things which can't appear as an expression, which primarily includes control flow statements such as if, loops, switch/case, and goto.
   8 expression.h contains the definition of struct expression, which represents a C expression.  That has a lot more content, since most C constructs can appear in expressions.
   9 A series of statements forms a compound statement (STMT_COMPOUND).
  10 That appears as another struct statement which has a statement_list member.
  11 A function body consists of a compound statement.
  12 When you look at a loop body, if or else body, or case body, you'll notice that they just have a struct statement, not a statement_list; they can have multiple statements by using a compound statement.
  13 Also note that all loops get turned into a single "iterator" statement.
  14 for, while, and do-while.
  15 A symbol, then, represents a name in a C file.  A symbol might represent a variable, a function, a label, or various other things.
  16 See symbol.h.
  17 "struct symbol" represents one symbol.
  18 As with the various other structures, it has some common data and a union of sub-structures for the parts that differ between different types.
  19 Most of the interesting bits come in the NS_SYMBOL case.
  20 Among other things, it has a struct statement for the body of a function (if any), a list of symbols for the arguments, an expression for a variable initializer, and so on.
  21 Together, struct symbol, struct statement, and struct expression represent most of the abstract syntax tree for C.
  22 So, that represents most of the "front-end" of Sparse: parsing C and generating that abstract syntax tree.
  23 That much occurs in pretty much any program using the Sparse frontend.
  24 The backend varies among programs.
  25 For instance, the c2xml backend goes that far, then outputs XML.
  26 The sparse static analysis backend has a few steps: it generates linearized bytecode, does some evaluation on that, and outputs some warnings.
  27 Several other backends run that linearized bytecode stage.
  28 The linearized bytecode itself has a set of nested structures.
  29 linearize.h defines all of them.
  30 At the top level, it has struct entrypoint.
  31 That represents an entrypoint to the code, which would normally mean a function.
  32 An entrypoint has a list of basic blocks.
  33 struct basic_block.
  34 A basic block represents a series of instructions with no branches.
  35 Straight-line code.
  36 A branch only occurs at the end of a basic block, and branches can only target the beginning of a basic block.
  37 Typically, a conditional will consist of a basic block leading up to the branch, a basic block for the true case, a basic block for the false case, and a basic block where the two paths merge back together.
  38 Either the true or the false case may not exist.
  39 A loop will normally have a basic block for the loop body, which can branch to the top at the end or continue to the next basic block.
  40 So basic blocks represent a node in the control flow graph.
  41 The edges in that graph lead from one basic block to a basic block which can follow it in the execution of the program.
  42 Each basic block has a series of instructions, "struct instruction".
  43 "enum opcode" lists all the instructions.
  44 Fairly high-level instruction set, corresponding directly to bits of C.
  45 So you have an entrypoint, which has a graph of basic blocks, each of which has a list of instructions.
  46 An entrypoint also has a pointer to the first instruction.
  47 One last bit of trickiness: struct pseudo.
  48 Have you ever heard of "static single assignment" or SSA form?
  49 struct pseudo represents one of those single-assignment variables.
  50 Each one has a pointer to the symbol it represents (which may have many pseudos referencing it).
  51 Each one also has a pointer to the instruction that defines it.
  52 That covers most of the major data structures in Sparse.
  53 Now, given all that, some of the top-level stuff in sparse.c may make more sense.
  54 For instance, the context checking works in terms of basic blocks.
  55 Hopefully some of that helped you understand Sparse better.