Print this page
new smatch

*** 1,3 **** - There are some documents under the Documentation/ directory. - For parsing implicit dependencies, see smatch_scripts/implicit_dependencies. --- 1,72 ---- For parsing implicit dependencies, see smatch_scripts/implicit_dependencies. + ======= + sparse (spärs), adj,., spars-er, spars-est. + 1. thinly scattered or distributed; "a sparse population" + 2. thin; not thick or dense: "sparse hair" + 3. scanty; meager. + 4. semantic parse + [ from Latin: spars(us) scattered, past participle of + spargere 'to sparge' ] + + Antonym: abundant + + Sparse is a semantic parser of source files: it's neither a compiler + (although it could be used as a front-end for one) nor is it a + preprocessor (although it contains as a part of it a preprocessing + phase). + + It is meant to be a small - and simple - library. Scanty and meager, + and partly because of that easy to use. It has one mission in life: + create a semantic parse tree for some arbitrary user for further + analysis. It's not a tokenizer, nor is it some generic context-free + parser. In fact, context (semantics) is what it's all about - figuring + out not just what the grouping of tokens are, but what the _types_ are + that the grouping implies. + + And no, it doesn't use lex and yacc (or flex and bison). In my personal + opinion, the result of using lex/yacc tends to end up just having to + fight the assumptions the tools make. + + The parsing is done in five phases: + + - full-file tokenization + - pre-processing (which can cause another tokenization phase of another + file) + - semantic parsing. + - lazy type evaluation + - inline function expansion and tree simplification + + Note the "full file" part. Partly for efficiency, but mostly for ease of + use, there are no "partial results". The library completely parses one + whole source file, and builds up the _complete_ parse tree in memory. + + Also note the "lazy" in the type evaluation. The semantic parsing + itself will know which symbols are typedefines (required for parsing C + correctly), but it will not have calculated what the details of the + different types are. That will be done only on demand, as the back-end + requires the information. + + This means that a user of the library will literally just need to do + + struct string_list *filelist = NULL; + char *file; + + action(sparse_initialize(argc, argv, filelist)); + + FOR_EACH_PTR(filelist, file) { + action(sparse(file)); + } END_FOR_EACH_PTR(file); + + and he is now done - having a full C parse of the file he opened. The + library doesn't need any more setup, and once done does not impose any + more requirements. The user is free to do whatever he wants with the + parse tree that got built up, and needs not worry about the library ever + again. There is no extra state, there are no parser callbacks, there is + only the parse tree that is described by the header files. The action + funtion takes a pointer to a symbol_list and does whatever it likes with it. + + The library also contains (as an example user) a few clients that do the + preprocessing, parsing and type evaluation and just print out the + results. These clients were done to verify and debug the library, and + also as trivial examples of what you can do with the parse tree once it + is formed, so that users can see how the tree is organized.