#
# CDDL HEADER START
#
# The contents of this file are subject to the terms of the
# Common Development and Distribution License, Version 1.0 only
# (the "License").  You may not use this file except in compliance
# with the License.
#
# You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
# or http://www.opensolaris.org/os/licensing.
# See the License for the specific language governing permissions
# and limitations under the License.
#
# When distributing Covered Code, include this CDDL HEADER in each
# file and include the License file at usr/src/OPENSOLARIS.LICENSE.
# If applicable, add the following below this CDDL HEADER, with the
# fields enclosed by brackets "[]" replaced with your own identifying
# information: Portions Copyright [yyyy] [name of copyright owner]
#
# CDDL HEADER END
#
# Copyright (c) 1995 Sun Microsystems, Inc.  All Rights Reserved
#
#ident "%W% %E% SMI"
#
# design notes that are likely to be of general (rather than
# merely historical) interest.

Table of Contents

    Overview                    what filesync does

    Primary Data Structures
        general principles      why they exist
        key concepts            what they represent
        data structures         major structures and their contents

    Overview of Passes          main phases of program execution

    Modules                     list and descriptions of files

    Studying the Code
        active ingredients      a reading list of high points
        the whole thing         a suggested order for everything

    Gross calling structure     who calls whom

    Helpful hints               good things to know

Overview

    The purpose of this program is to compare pairs of directory
    trees with a baseline snapshot, to determine which files have
    changed, and to propagate the changes in order to bring the
    trees back into congruency.  The baseline snapshot describes
    size, ownership, ... for all files that filesync is managing
    WHEN THEY WERE LAST IN SYNC.

    The files and directory trees to be compared are determined
    by a relatively flexible (user editable) rules file, whose
    format (packingrules.4) permits files and/or trees to be
    specified explicitly, implicitly, or with wild cards.
    There are also provisions for filtering out unwanted files
    and for running programs to generate lists of files and
    directories to be included or excluded.

    The comparisons begin by comparing the structured name
    spaces.  For names that appear in both trees, the files
    are then compared on the basis of type, size, contents,
    ownership and protections.  For files that are already
    in the baseline snapshot, if the sizes and modification
    times have not changed, we do not bother to recheck the
    contents.

    The reconciliation process (resolving the differences)
    will only propagate a change if it is obvious what should
    be done (one side has changed relative to the snapshot,
    while the other has not).  If there are conflicting changes,
    the file is flagged and the user is asked to reconcile the
    differences manually.  There are, however, a few switches
    that can be used to constrain the analysis or reconciliation,
    or to force one particular side to win in case of a conflict.


Primary Data Structures

    general principles:
        we will build up an in-memory tree that represents
        the union of the name spaces found in the baseline
        and on the source and destination sides.

        keep in mind that the baseline recalls the state of
        files THE LAST TIME THEY WERE IN AGREEMENT.  If files
        have disagreed for a long time, the baseline still
        remembers what they were like when they agreed.  If
        files have never agreed, the baseline has no notion
        of how they "used to be".

    key concepts:
        a "base pair" is a pair of directories whose
        contents (or a subset of whose contents) are to
        be synchronized.
        The "base pairs" to be managed are specified in the
        packing rules file.

        associated with each "base pair" is a set of rules
        that describe which files (under those directories)
        are to be kept in sync.  Each rule is a list of:
            files and/or directories to be included
            wild cards for files or directories to be included
            programs to generate lists of names for inclusion
            file names to be ignored
            wild cards for file names to be ignored
            programs to generate lists of names for ignoring

        as a result of the "evaluation" process we build up
        (under each base pair) a tree that represents all of
        the files that we are supposed to keep in sync, and
        contains everything we need to know about each one
        of those files.  The structure of the tree mirrors
        the directory hierarchy ... actually the union of the
        three hierarchies (baseline, source and destination).

        for each file, we record interesting information (type,
        size, owner, protection, mod time) and keep separate
        note of what these values were:
            in the baseline, the last time the two sides agreed
            on the source side, as we just examined it
            on the destination side, as we just examined it

    data structures:

        there is an ordered list of "base" structures
        for each base, we maintain
            three lists of associated "rule" descriptions:
                inclusion rules
                exclusion rules
                restriction rules (from the command line)
            a "file" tree, representing all files below the bases
            a list of statistics to be printed as a summary

        for each "rule", we maintain
            some flags describing the type of rule
            the character string that is the rule

        for each "file", we maintain
            sibling and child pointers to give them tree structure
            flags to describe what we have done/should do
            "fileinfo" information from the src, dest, and baseline

            in addition there are some fields that are used
            to add the file to a list of
            files requiring reconciliation and to record what
            happened to it.

        a "fileinfo" structure contains a subset of the information
        that we obtain from a stat call:
            major/minor/inum
            type
            link count
            ownership, protection, and acls
            size
            modification time

        there is also, built up during analysis, a reconciliation
        list.  This is an ordered list of "file" structures which
        are believed to describe files that have changed and require
        reconciliation.  The ordering is important both for correctness
        and to preserve relative modification times.

Overview of passes:

    pass I (evaluate)

        stat every file that we might be interested in
        (on both src/dest sides).  This includes walking
        the trees under all directories in order to
        find out what files exist, and calling stat on
        all of them.

        the main trick in this pass is that there may be
        files we don't want to evaluate (because we are
        limiting our attention to specific files and trees).
        There is a LISTED flag kept in the database that
        tells us whether or not we need to stat/descend any
        given node.

        all restrictions and ignores take effect during this pass.

    pass II (analyze)

        given the baseline and all of the current stat information
        gained during pass I, figure out what might conceivably
        have changed and queue it for pass III.  This pass doesn't
        try to figure out what happened or who should win ... it
        merely identifies candidates for pass III.  This pass
        ignores any nodes that were not evaluated during pass I.

        the queueing process, however, determines the order in
        which the files will be processed in pass III, and the
        order is very important.

    pass III (reconcile)

        process the list of candidates, figuring out what has
        actually changed and which versions deserve to win.  If
        it is clear what needs doing, we actually do it in this
        pass.
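
    the "one side changed, the other did not" rule described in the
    Overview, applied to the three sets of per-file information
    (baseline, source, destination), can be sketched roughly as
    follows.  This is only an illustration: the structure, the enum,
    and the function names are hypothetical and are not the
    declarations in database.h or the logic of reconcile.  Treating
    "both sides changed in the same way" as needing no propagation
    is likewise an assumption of the sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>
#include <time.h>

/* hypothetical subset of the per-file status information */
struct info_sketch {
    bool    present;    /* false if the file is absent in this view */
    mode_t  mode;       /* type and protection */
    uid_t   uid;        /* ownership */
    gid_t   gid;
    off_t   size;
    time_t  modtime;
};

/* hypothetical outcome of comparing the three views of one file */
enum verdict_sketch { V_NOTHING, V_SRC_TO_DST, V_DST_TO_SRC, V_CONFLICT };

/* two views agree if both are absent or all recorded fields match */
static bool
same_info(const struct info_sketch *a, const struct info_sketch *b)
{
    assert(a != NULL && b != NULL);
    if (a->present != b->present)
        return false;
    if (!a->present)
        return true;
    return a->mode == b->mode && a->uid == b->uid && a->gid == b->gid &&
        a->size == b->size && a->modtime == b->modtime;
}

/* propagate only when exactly one side differs from the baseline */
enum verdict_sketch
decide(const struct info_sketch *base, const struct info_sketch *src,
    const struct info_sketch *dst)
{
    bool src_changed = !same_info(src, base);
    bool dst_changed = !same_info(dst, base);

    if (!src_changed && !dst_changed)
        return V_NOTHING;
    if (src_changed && !dst_changed)
        return V_SRC_TO_DST;
    if (!src_changed && dst_changed)
        return V_DST_TO_SRC;

    /* assumption: identical changes on both sides need no copy */
    if (same_info(src, dst))
        return V_NOTHING;
    return V_CONFLICT;
}
```

    a caller would fill in one info_sketch per view from the baseline
    and from stat results, and act only on the two "propagate"
    verdicts; V_CONFLICT corresponds to flagging the file for manual
    reconciliation.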

Modules

    filesync.h
        defines for limits, sizes and return codes
        declarations for global variables (mostly cmd-line parms)
        defines for default file names
        declarations for routines of general interest

    database.h
        data-structures for recording rules
        data-structures for recording information about files
        declarations for routines that operate on/with those structures

    messages.h
        the text of all localizable messages

    debug.h
        definitions and declarations for routines for error
        simulation and bit-map display.

    acls.c
        routines to get, set, compare, and display Access Control Lists
    action.c
        routines to do the real work of copying, deleting, or
        changing ownership in order to make one side agree
        with the other.
    anal.c
        routines to examine the in-core list of files and
        determine what has changed (and therefore which
        files are candidates for reconciliation).  This
        analysis includes figuring out which files should
        be links rather than copies.
    base.c
        routines to read and write the baseline file
        routines to search and manipulate the in-core base list
    debug.c
        data structures and routines, used to simulate errors
        and produce debug output, that map between bits (as found
        in various flag words) and character string names for
        their meanings.

    eval.c
        routines to build up the internal tree that describes
        the status of all of the files that are described
        by the current rules.
    files.c
        routines to manipulate file name arguments, including
        wild cards and embedded environment variables.
    ignore.c
        routines to maintain a list of names or patterns for
        files to be ignored, and to check file names against
        that list.
    main.c
        global variables, cmd-line parameter processing,
        parameter validation, error reporting, and the
        main loop.
    recon.c
        routines to examine a list of files that appear to
        have changed, and figure out what the appropriate
        reconciliation course of action is.
    rename.c
        routines to search the tree to determine whether
        or not any creates/deletes are actually renames.
    rules.c
        routines to read and write the rules file
        routines to add rules and enumerate in-core rules

    filecheck.c
        not really a part of filesync, but rather a utility
        program that is used in the test suite.  It extracts
        information about files that is not readily available
        from other unix commands.

Comments on studying the code

    if you are only interested in the "active ingredients":

        read the above notes on data structures and then

        read the structure declarations in database.h

        read the above notes overviewing the passes

        in recon.c: read reconcile

            this routine almost makes sense on its own,
            and it is unquestionably the most important
            routine in the entire program.  Everything
            else just gathers data for reconcile to use,
            or updates the books to reflect the changes.

        in eval.c: read evaluate, eval_file, walker, and note_info

            this is the main guts of pass I

        in anal.c: read analyze, check_file, check_changes & queue_file

            this is the main guts of pass II

    if you want to read the whole thing:

        the following routines do fundamentally simple things
        in simple ways, and can (for the most part) be understood
        in vacuo.  The things they do are sufficiently obvious
        that you can probably understand the more interesting
        code without having read them at all.

            base.c
            rules.c
            files.c
            debug.c
            ignore.c
            acls.c

        the following routines constitute the real meat of the
        program, and while they are broken into specialized
        modules, they probably need to be understood as an
        organic whole:

            main.c      setup and control
            eval.c      pass I
            anal.c      pass II
            recon.c     pass III
            action.c    execution and book-keeping
            rename.c    a special case for a common situation


Gross calling structure / flow of control

    main.c:main
        findfiles
        read_baseline
        read_rules
        if new rules
            add_base
            add_include
        evaluate
        analyze
        write_baseline
        write_summary

    eval.c:evaluate
        add_file_to_base
        add_glob
        add_run
        ignore_pgm
        ignore_file
        ignore_expr
        eval_file

    eval.c:eval_file
        note_info
        nftw
            walker
                note_info

    anal.c:analyze
        check_file
        reconcile

    anal.c:check_file
        check_changes
        queue_file


    recon.c:reconcile
        samedata
        samestuff
        do_copy
            copy
            do_like
            update_info
        do_like
        do_remove

Helpful Hints

    the "file" structure contains a bunch of flags.  Many of them
    just summarize what we know about the file (e.g. where it was
    found).  Others are more subtle and control the evaluation
    process or the writing out of the baseline file.  You can't
    really understand the processing unless you understand what
    these flags mean.

        F_NEW           added by a new rule

        F_LISTED        this name was generated by a rule

        F_SPARSE        this directory is an intermediate on
                        the way to a name generated by a rule
                        and should not be recursively walked.

        F_EVALUATE      this node was found in evaluation and
                        has up-to-date stat information

        F_CONFLICT      there is a conflict on this node so
                        baseline should remain unchanged

        F_REMOVE        this node should be purged from the baseline

        F_STAT_ERROR    it was impossible to stat this file
                        (or anything below it)

    the implications of these flags on processing are:

        F_NEW, F_LISTED, F_SPARSE

            affect whether or not a particular node should
            be included in the evaluation pass.

            in some situations, only new rules are interpreted.

            listed files and directories should be evaluated
            and analyzed.  sparse directories should not be
            recursively enumerated.

        F_EVALUATE

            determines whether or not a node is included
            in the analysis pass.  Only nodes that have
            been evaluated will be analyzed.

        F_CONFLICT, F_REMOVE, F_EVALUATE

            affect how a node should be written back into the
            baseline file.

            if there is a conflict or we haven't evaluated
            a node, we won't update the baseline.

            if a node is marked for removal, it will be
            excluded from the baseline when it is written out.

        F_STAT_ERROR

            if we could not get proper status information
            about a file (or the tree under it) we cannot,
            with any confidence, determine what its state
            is or do anything about it.  Such files are
            flagged as "in conflict".

            it is somewhat kinky that we put error flagged
            files on the reconciliation list.  We do this
            because this is the easiest way to pull them
            out for reporting as conflicts.
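
    the effect of F_CONFLICT, F_REMOVE, F_EVALUATE and F_STAT_ERROR
    on what gets written back to the baseline can be summarized in
    a small decision function.  This is a sketch only: the SK_ bit
    values, the enum, and the function name are hypothetical (the
    real flag definitions live in database.h), and checking removal
    before conflict is an assumption, since these notes do not say
    which takes precedence.

```c
#include <stdbool.h>

/* hypothetical bit values, chosen only for this illustration */
#define SK_F_EVALUATE   0x01    /* node has up-to-date stat information */
#define SK_F_CONFLICT   0x02    /* conflict: baseline stays unchanged */
#define SK_F_REMOVE     0x04    /* node should be purged from baseline */
#define SK_F_STAT_ERROR 0x08    /* could not stat: treated as conflict */

/* hypothetical description of what to do with one baseline entry */
enum baseline_action { BL_DROP, BL_KEEP_OLD, BL_WRITE_NEW };

enum baseline_action
baseline_action_for(unsigned flags)
{
    /* assumption: removal is checked first */
    if (flags & SK_F_REMOVE)
        return BL_DROP;

    /* conflicts (including stat errors) leave the old entry alone */
    if (flags & (SK_F_CONFLICT | SK_F_STAT_ERROR))
        return BL_KEEP_OLD;

    /* a node that was never evaluated has nothing new to record */
    if (!(flags & SK_F_EVALUATE))
        return BL_KEEP_OLD;

    return BL_WRITE_NEW;
}
```

    a baseline writer walking the file tree could call such a
    function once per node and skip, copy forward, or rewrite the
    entry accordingly.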