Print this page
Update from fsd_sep3 webrev to fsd_sep9

@@ -33,80 +33,99 @@
  * 1. Abstract.
  * The main goal of the filesystem hook framework is to provide an easy way to
  * inject client-defined behaviour into vfs/vnode calls. fsh works on
  * vfs_t granularity.
  *
+ * Note: In this document, both an fsh_t structure and hooking function for a
+ * vnodeop/vfsop is referred to as *hook*.
  *
+ *
  * 2. Overview.
  * fsh_t is the main object in the fsh. An fsh_t is a structure containing:
- *      - pointers to hooking functions (named after corresponding
- *      vnodeops/vfsops)
- *      - a pointer to an argument to pass (this is shared for all the
- *      hooks in a given fsh_t)
- *      - a pointer to the *hook remove callback* - it's being fired after a
- *      hook is removed and the hook has stopped executing. It's safe to destroy
- *      any data associated with this hook.
+ *      - pointers to hooking functions
+ *      - an argument to pass (this is shared for all the hooks in a given
+ *      fsh_t)
+ *      - a pointer to the *hook remove callback*
  *
  * The information from fsh_t is copied by the fsh and an fsh_handle_t
  * is returned. It should be used for further removing.
  *
  *
  * 3. Usage.
- * It is expected that vfs_t/vnode_t that are passed to fsh_foo() functions
- * are held by the caller when needed. fsh does no vfs_t/vnode_t locking.
+ * It is expected that vfs_t/vnode_t passed to fsh_foo() functions are held by
+ * the caller when needed. fsh does no vfs_t/vnode_t locking.
  *
- * fsh_t is a structure filled out by the client. If a client does not want
- * to add/remove a hook for function foo(), he should fill the foo field of
- * fsh_t with NULL. Every hook has a type of corresponding vfsop/vnodeop with
- * two additional arguments:
- *      - fsh_int_t *fsh_int - this argument MUST be passed to
- *      hook_next_foo(). fsh wouldn't know which hook to execute next
- *      without it
- *      - void *arg - this is the argument passed with fsh_t during
- *      installation
- *      - void (*remove_cb)(void *, fsh_handle_t) - hook remove callback
- *      (mentioned earlier); it's first argument is arg, the second is the
- *      handle
+ * fsh_t is a structure filled out by the client. It contains:
+ *      - pointers to hooking functions
+ *      - the argument passed to the hooks
+ *      - the *hook remove callback*
  *
+ * If a client does not want to add a hook for function foo(), he should fill
+ * corresponding fields with NULLs. For every vfsop/vnodeop there are two
+ * fields: pre_foo() and post_foo(). These are the functions called before and
+ * after the next hook or underlying vfsop/vnodeop.
+ *
+ * Pre hooks take:
+ *      - arg
+ *      - pointer to a field containing void* - it should be filled whenever
+ *      the client wants to have some data shared by the pre and post hooks in
+ *      the same syscall execution. This is called the *instance data*.
+ *      - pointers to the arguments passed to the underlying vfsop/vnodeop
+ * Pre hooks return void.
+ *
+ * Post hooks take:
+ *      - value returned by the previous post hook or underlying vfsop/vnodeop
+ *      - arg
+ *      - pointer to the *instance data*
+ *      - arguments passed to the underlying vfsop/vnodeop
+ * Post hooks return an int, which should be treated as the vfsop/vnodeop
+ * return value.
+ * Memory allocated by pre hook must be deallocated by the post hook.
+ *
+ * Execution path of hooks A, B, C is as follows:
+ * foo()
+ *      preA(argA, &instancepA, ...);
+ *      preB(argB, &instancepB, ...);
+ *      preC(argC, &instancepC, ...);
+ *      ret = VOP_FOO();
+ *      ret = postC(ret, argC, instancepC, ...);
+ *      ret = postB(ret, argB, instancepB, ...);
+ *      ret = postC(ret, argA, instancepA, ...);
+ *      return (ret);
+ *
  * After installation, an fsh_handle_t is returned to the caller.
  *
- * Every hook function is responsible for passing the control to the next
- * hook associated with a particular call. In order to provide an easy way to
- * modify the behaviour of a function call both before and after the
- * underlying vfsop/vnodeop (or next hook) execution, a hook has to call
- * fsh_next_foo() at some point. This function does necessary internal
- * operations and calls the next hook, until there's no hook left, then it
- * calls the underlying vfsop/vnodeop.
- * Example:
- * my_freefs(fsh_int_t *fsh_int, void *arg, vfs_t *vfsp) {
- *      cmn_err(CE_NOTE, "freefs called!\n");
- *      return (fsh_next_freefs(fsh_int, vfsp));
- * }
+ * Hook remove callback - it's a function being fired after a hook is removed
+ * and no thread is going to execute it anymore. It's safe to destroy all the
+ * data associated with this hook inside it.
  *
+ * It is guaranteed, that whenever a pre_hook() is called, there will be also
+ * post_hook() called within the same syscall.
  *
- * A client might want to fire callbacks when vfs_t's are being mounted
+ * If a hook (HNew) is installed/removed on/from a vfs_t within execution of
+ * another hook (HExec) installed on this vfs_t, the syscall that executes
+ * HExec won't fire HNew.
+ *
+ * A client might want to fire callbacks when vfs_ts are being mounted
  * or freed. There's an fsh_callback_t structure provided to install such
  * callbacks along with the API.
  * It is legal to call fsh_hook_{install,remove}() inside a mount callback
  * WITHOUT holding the vfs_t.
  *
  * After vfs_t's free callback returns, all the handles associated with the
  * hooks installed on this vfs_t are invalid and must not be used.
  *
- *
  * 4. API
  * None of the APIs should be called during interrupt context above lock
- * level. The only exceptions are fsh_next_foo() functions, which do not use
- * locks.
+ * level.
  *
  * a) fsh.h
- * Any of these functions could be called inside a hook or a hook remove
- * callback.
- * fsh_callback_{install,remove}() must not be called inside a {mount,free}
- * callback. Doing so will cause a deadlock. Other functions can be called
- * inside {mount,free} callbacks.
+ * Any of these functions could be called in a hook or a hook remove callback.
+ * The only functions that must not be called inside a {mount,free} callback are
+ * fsd_callback_{install,remove}. Using them will cause a deadlock.
  *
+ *
  * fsh_fs_enable(vfs_t *vfsp)
  * fsh_fs_disable(vfs_t *vfsp)
  *      Enables/disables fsh for a given vfs_t.
  *
  * fsh_hook_install(vfs_t *vfsp, fsh_t *hooks)

@@ -126,24 +145,20 @@
  *      fired, it is guaranteed that the hook won't be executed anymore. It is
  *      safe to remove all the internal data associated with this hook inside
  *      the hook remove callback. The hook remove callback could be called
  *      inside fsh_hook_remove().
  *
- * fsh_next_foo(fsh_int_t *fsh_int, void *arg, ARGUMENTS)
- *      This is the function which should be called once in every hook. It
- *      does the necessary internal operations and passes control to the
- *      next hook or, if there's no hook left, to the underlying
- *      vfsop/vnodeop.
  *
  * fsh_callback_install(fsh_callback_t *callback)
  * fsh_callback_remove(fsh_callback_handle_t handle)
  *      Installs/removes callbacks for vfs_t mount/free. The mount callback
  *      is executed right before domount() returns. The free callback is
  *      called right before VFS_FREEVFS() is called.
  *      The fsh_callback_install() returns a correct handle, or (-1) if
  *      hook/callback limit exceeded.
  *
+ *
  * b) fsh_impl.h (for vfs.c and vnode.c only)
  * fsh_init()
  *      This call has to be done in vfsinit(). It initialises the fsh. It
  *      is absolutely necessary that this call is made before any other fsh
  *      operation.

@@ -155,11 +170,11 @@
  * fsh_fsrec_destroy(struct fsh_fsrecord *fsrecp)
  *      Destroys an fsh_fsrecord structure. All the hooks installed on this
  *      vfs_t are then destroyed. free callback is called before this function.
  *
  * fsh_foo(ARGUMENTS)
- *      Function used to start executing the hook chain for a given call.
+ *      Function used to execute the hook chain for a given syscall.
  *
  *
  * 5. Internals.
  * fsh_int_t is an internal hook structure. It is reference counted.
  * fshi_hold() and fshi_rele() should be used whenever needed.

@@ -187,118 +202,143 @@
  *      - an rw-lock that protects the structure
  *      - a list of hooks installed on this vfs_t
  *      - a flag which tells whether fsh is enabled on this vfs_t
  *
  *
- * fsh_prepare_fsrec rule:
+ * fsh_fsrec_prepare rule:
  * Every function that needs vfsp->vfs_fshrecord has to call
- * fsh_prepare_fsrec() first. If and only if the call is made, it is safe to
+ * fsh_fsrec_prepare() first. If and only if the call is made, it is safe to
  * use vfsp->vfs_fshrecord.
  *
  * Unfortunately, because of unexpected behaviour of some filesystems (no use
  * of vfs_alloc()/vfs_init()) there's no good place to initialise the
  * fsh_fshrecord_t structure. The approach being used here is to check if it's
  * initialised in every call. Because of the fact that no lock could be used
  * here (the same problem with initialisation), a spinlock is used.  This is
- * explained in more detail in a comment before fsh_prepare_fsrec(). After
+ * explained in more detail in a comment before fsh_fsrec_prepare(). After
  * calling fsh_preapre_fsrec() it's completely safe to keep the vfs_fshrecord
  * pointer locally, because it won't be changed until vfs_free() is called.
  *
- * The only exception from the fsh_prepare_fsrec() rule is vfs_free(),
- * where there is expected that no other fsh calls would be made for the
+ * Exceptions from this rule:
+ * - vfs_free() - it is expected that no other fsh calls would be made for the
  * vfs_t that's being freed. That's why vfs_fshrecord could be only NULL or a
  * valid pointer and could not be concurrently accessed.
+ * - fshi_rele() - fsh_hook_install() comes before first fshi_rele() call;
+ * the fsh_fsrecord_t has been initialised there
  *
+ *
  * When there are no fsh functions (that use a particular fsh_fsrecord_t)
  * executing, the vfs_fshrecord pointer won't be equal to fsh_res_ptr. It
  * would be NULL or a pointer to an initialised fsh_fsrecord_t.
  *
+ * It is required and sufficient to check if fsh_fsrecord_t is not NULL before
+ * passing it to fsh_fsrec_destroy. We don't have to check if it is not equal
+ * to fsh_res_ptr, because all the fsh API calls involving this vfs_t should
+ * end before vfs_free() is called (outside the fsh, fsh_fsrecord is never
+ * equal to fsh_res_ptr). That is guaranteed by the explicit requirement that
+ * the caller of fsh API holds the vfs_t when needed. fsh_hook_remove() must not
+ * be called either, because the handles are invalidated after free callback has
+ * fired.
  *
+ *
  * Callbacks:
  * Mount callbacks are executed by a call to fsh_exec_mount_callbacks() right
  * before returning from domount()@vfs.c.
  *
  * Free callbacks are executed by a call to fsh_exec_free_callbacks() right
  * before calling VFS_FREEVFS(), after vfs_t's reference count drops to 0.
  *
  *
- * fsh_next_foo(fsh_int_t *fshi, ARGUMENTS)
- *      This function is quite simple. It takes the fsh_int_t and passes control
- *      to the next hook or to the underlying vnodeop/vfsop.
- *
- *
  * 6. Locking
  * a) public
  * fsh does no vfs_t nor vnode_t locking. It is expected that whenever it is
  * needed, the client does that.
  *
- * fsh_callback_{install,remove} must not be called inside a callback, because
- * it will cause a deadlock.
+ * No locks are held across hooks or hook remove callbacks execution. It is
+ * safe to use fsh API inside hooks and hook remove callbacks.
  *
- * b) internal
+ * fsh_cb_lock is held across {mount,free} callbacks. Calling
+ * fsh_callback_{install,remove} inside of a callback will cause a deadlock.
+ *
+ * b) internals
  * Locking diagram:
  *
- *     fsh_hook_install()    fsh_hook_remove()   fsh_fsrec_destroy()
+ *     fsh_hook_remove()          fsh_hook_install()   fsh_fsrec_destroy()
  *           |                     |                |
  *           |                     |                |
  *           +------------------+  |   +------------+
+ *           |                  |         |   |
+ *           |                  V         |   |
+ *           V               +------------|---|-+
+ *      fshi_rele()          |  fsh_lock  |   | |
+ *      (sometimes)          +------------|---|-+
  *                              |  |   |
- *                              V  V   V
- *                              fsh_lock
+ *                                 |      +---+-- fshfsr_lock, RW_WRITER -+
  *                                 |   |
- *                                 |   +----- fshfsr_lock, RW_WRITER ---+
- *                                 |                                    |
  *                                 V                                    |
  *               +---------------------------------------+              |
  *               |               fsh_map                 |              |
  *               |                                       |              |
- *          +----|-> vfsp->vfs_fshrecord->fshfsr_list <--|--------------+
+ *          +----|-> vfsp->vfs_fshrecord->fshfsr_list <--|----------------+
  *          |    +------------------------------^--------+
  *          |                                   |
  *          |                                   |
  * fshfsr_lock, RW_READER              fshfsr_lock, RW_WRITER
  *          |                                   |
  *          |                                   |
  *   fsh_read(),                            fshi_rele()
  *   fsh_write(),
- *   ...,                               Might be called from:
- *   fsh_next_read(),                    fsh_hook_remove()
- *   fsh_next_write(),                   fsh_read(), fsh_write(), ...
- *   ...                                 fsh_next_read(), fsh_next_write(), ...
+ *   ...                                Might be called from:
+ *                                        fsh_hook_remove()
+ *                                        fsh_read(), fsh_write(), ...
  *
+ *
  * fsh_lock is a global lock for adminsitrative path (fsh_hook_install,
  * fsh_hook_remove) and fsh_fsrec_destroy() (which is semi-administrative, since
  * it destroys the unremoved hooks). It is used only when fsh_map needs to be
  * locked. The usage of this lock guarantees that the data in fsh_map and
  * fshfsr_lists is consistent.
+ *
+ * In order to make calling callbacks inside callbacks possible, fsh_cb_owner is
+ * set by fsh_exec_{mount,free} callbacks to the thread that owns the
+ * fsh_cb_lock.  It's always checked if we are owners of the mutex before
+ * entering it.
+ *
  */
 
 
 /* Internals */
-struct fsh_int {
+typedef struct fsh_int {
         fsh_handle_t    fshi_handle;
         fsh_t           fshi_hooks;
         vfs_t           *fshi_vfsp;
 
         kmutex_t        fshi_lock;
         uint64_t        fshi_ref;
         uint64_t        fshi_doomed;    /* changed inside fsh_lock */
 
         /* next node in fshfsr_list */
-        list_node_t     fshi_next;
+        list_node_t     fshi_node;
 
         /* next node in fsh_map */
         list_node_t     fshi_global;
-};
+} fsh_int_t;
 
 typedef struct fsh_callback_int {
         fsh_callback_t  fshci_cb;
         fsh_callback_handle_t fshci_handle;
-        list_node_t     fshci_next;
+        list_node_t     fshci_node;
 } fsh_callback_int_t;
 
 
+typedef struct fsh_exec {
+        fsh_int_t       *fshe_fshi;
+        void            *fshe_instance;
+        list_node_t     fshe_node;
+} fsh_exec_t;
+
+
 static kmutex_t fsh_lock;
 
 /*
  * fsh_fsrecord_t is the main internal structure. It's content is protected
  * by fshfsr_lock. The fshfsr_list is a list of fsh_int_t hook entries for

@@ -316,11 +356,13 @@
 static list_t fsh_map;
 
 /*
  * Global list of fsh_callback_int_t.
  */
-static krwlock_t fsh_cblist_lock;
+static kmutex_t fsh_cb_lock;
+static kmutex_t fsh_cb_owner_lock;
+static kthread_t *fsh_cb_owner;
 static list_t fsh_cblist;
 
 /*
  * A reserved pointer for fsh purposes. It is used because of the method
  * chosen for solving concurrency issues with vfs_fshrecord. The full

@@ -333,11 +375,11 @@
 
 int fsh_limit = INT_MAX;
 static id_space_t *fsh_idspace;
 
 /*
- * fsh_prepare_fsrec()
+ * fsh_fsrec_prepare()
  *
  * Important note:
  * Before using this function, fsh_init() MUST be called. We do that in
  * vfsinit()@vfs.c.
  *

@@ -364,11 +406,11 @@
  *      vfs_fshrecord is already initialised, so we can use it. It won't change
  *      until vfs_free() is called. It can't happen when someone is holding
  *      the vfs_t, which is expected from the caller of fsh API.
  */
 static void
-fsh_prepare_fsrec(vfs_t *vfsp)
+fsh_fsrec_prepare(vfs_t *vfsp)
 {
         fsh_fsrecord_t *fsrec;
 
         while ((fsrec = atomic_cas_ptr(&vfsp->vfs_fshrecord, NULL,
             fsh_res_ptr)) == fsh_res_ptr)

@@ -389,21 +431,21 @@
  * These functions must NOT be called in a hook.
  */
 void
 fsh_fs_enable(vfs_t *vfsp)
 {
-        fsh_prepare_fsrec(vfsp);
+        fsh_fsrec_prepare(vfsp);
 
         rw_enter(&vfsp->vfs_fshrecord->fshfsr_lock, RW_WRITER);
         vfsp->vfs_fshrecord->fshfsr_enabled = 1;
         rw_exit(&vfsp->vfs_fshrecord->fshfsr_lock);
 }
 
 void
 fsh_fs_disable(vfs_t *vfsp)
 {
-        fsh_prepare_fsrec(vfsp);
+        fsh_fsrec_prepare(vfsp);
 
         rw_enter(&vfsp->vfs_fshrecord->fshfsr_lock, RW_WRITER);
         vfsp->vfs_fshrecord->fshfsr_enabled = 0;
         rw_exit(&vfsp->vfs_fshrecord->fshfsr_lock);
 }

@@ -410,18 +452,10 @@
 
 /*
  * API used for installing hooks. fsh_handle_t is returned for further
  * actions (currently just removing) on this set of hooks.
  *
- * fsh_t fields:
- * - arg - argument passed to every hook
- * - remove_cb - remove callback, called after a hook is removed and all the
- *      threads stops executing it
- * - read, write, ... - pointers to hooks for corresponding vnodeops/vfsops;
- *      if there is no hook desired for an operation, it should be set to
- *      NULL
- *
  * It's important that the hooks are executed in LIFO installation order (they
  * are added to the head of the hook list).
  *
  * The caller is expected to hold the vfs_t.
  *

@@ -431,11 +465,11 @@
 fsh_hook_install(vfs_t *vfsp, fsh_t *hooks)
 {
         fsh_handle_t    handle;
         fsh_int_t       *fshi;
 
-        fsh_prepare_fsrec(vfsp);
+        fsh_fsrec_prepare(vfsp);
 
         if ((handle = id_alloc(fsh_idspace)) == -1)
                 return (-1);
 
         fshi = kmem_alloc(sizeof (*fshi), KM_SLEEP);

@@ -497,18 +531,16 @@
         if (destroy) {
                 /*
                  * At this point, we are sure that fsh_hook_remove() has been
                  * called, that's why we don't remove the fshi from fsh_map.
                  * fsh_hook_remove() did that already.
+                 * There is also no need to call fsh_fsrec_prepare() here.
                  */
                 fsh_fsrecord_t *fsrecp;
 
-                if (fshi->fshi_hooks.remove_cb != NULL)
-                        (*fshi->fshi_hooks.remove_cb)(
-                            fshi->fshi_hooks.arg, fshi->fshi_handle);
                 /*
-                 * We don't have to call fsh_prepare_fsrec() here.
+                 * We don't have to call fsh_fsrec_prepare() here.
                  * fsh_fsrecord_t is already initialised, because we've found a
                  * mapping for the given handle.
                  */
                 fsrecp = fshi->fshi_vfsp->vfs_fshrecord;
                 ASSERT(fsrecp != NULL);

@@ -516,10 +548,14 @@
 
                 rw_enter(&fsrecp->fshfsr_lock, RW_WRITER);
                 list_remove(&fsrecp->fshfsr_list, fshi);
                 rw_exit(&fsrecp->fshfsr_lock);
 
+                if (fshi->fshi_hooks.remove_cb != NULL)
+                        (*fshi->fshi_hooks.remove_cb)(
+                            fshi->fshi_hooks.arg, fshi->fshi_handle);
+
                 id_free(fsh_idspace, fshi->fshi_handle);
                 mutex_destroy(&fshi->fshi_lock);
                 kmem_free(fshi, sizeof (*fshi));
         }
 }

@@ -583,14 +619,13 @@
  * The first argument of these callbacks is the vfs_t that is mounted/freed.
  * The second one is the fshc_arg.
  *
  * fsh_callback_handle_t is filled out by this function.
  *
- * This function must NOT be called in a callback, because it will cause
- * a deadlock.
- *
  * Returns (-1) if hook/callback limit exceeded.
+ *
+ * Calling this function in a {mount,free} callback will cause a deadlock.
  */
 fsh_callback_handle_t
 fsh_callback_install(fsh_callback_t *callback)
 {
         fsh_callback_int_t *fshci;

@@ -601,42 +636,41 @@
 
         fshci = (fsh_callback_int_t *)kmem_alloc(sizeof (*fshci), KM_SLEEP);
         (void) memcpy(&fshci->fshci_cb, callback, sizeof (fshci->fshci_cb));
         fshci->fshci_handle = handle;
 
-        /* If it is called in a {mount,free} callback, causes deadlock. */
-        rw_enter(&fsh_cblist_lock, RW_WRITER);
+        mutex_enter(&fsh_cb_lock);
         list_insert_head(&fsh_cblist, fshci);
-        rw_exit(&fsh_cblist_lock);
+        mutex_exit(&fsh_cb_lock);
 
         return (handle);
 }
 
 /*
  * API for removing global mount/free callbacks.
  *
- * This function must NOT be called in a callback, because it will cause
- * a deadlock.
- *
  * Returns (-1) if callback wasn't found, 0 otherwise.
+ *
+ * Calling this function in a {mount,free} callback will cause a deadlock.
  */
 int
 fsh_callback_remove(fsh_callback_handle_t handle)
 {
         fsh_callback_int_t *fshci;
 
-        /* If it is called in a {mount,free} callback, causes deadlock. */
-        rw_enter(&fsh_cblist_lock, RW_WRITER);
+        mutex_enter(&fsh_cb_lock);
+
         for (fshci = list_head(&fsh_cblist); fshci != NULL;
             fshci = list_next(&fsh_cblist, fshci)) {
                 if (fshci->fshci_handle == handle) {
                         list_remove(&fsh_cblist, fshci);
                         break;
                 }
         }
-        rw_exit(&fsh_cblist_lock);
 
+        mutex_exit(&fsh_cb_lock);
+
         if (fshci == NULL)
                 return (-1);
 
         kmem_free(fshci, sizeof (*fshci));
         id_free(fsh_idspace, handle);

@@ -657,19 +691,38 @@
 void
 fsh_exec_mount_callbacks(vfs_t *vfsp)
 {
         fsh_callback_int_t *fshci;
         fsh_callback_t *cb;
+        int fsh_context;
 
-        rw_enter(&fsh_cblist_lock, RW_READER);
+        mutex_enter(&fsh_cb_owner_lock);
+        fsh_context = fsh_cb_owner == curthread;
+        mutex_exit(&fsh_cb_owner_lock);
+
+        if (!fsh_context) {
+                mutex_enter(&fsh_cb_lock);
+                mutex_enter(&fsh_cb_owner_lock);
+                fsh_cb_owner = curthread;
+                mutex_exit(&fsh_cb_owner_lock);
+        }
+
+        ASSERT(MUTEX_HELD(&fsh_cb_lock));
+
         for (fshci = list_head(&fsh_cblist); fshci != NULL;
             fshci = list_next(&fsh_cblist, fshci)) {
                 cb = &fshci->fshci_cb;
                 if (cb->fshc_mount != NULL)
                         (*(cb->fshc_mount))(vfsp, cb->fshc_arg);
         }
-        rw_exit(&fsh_cblist_lock);
+
+        if (!fsh_context) {
+                mutex_enter(&fsh_cb_owner_lock);
+                fsh_cb_owner = NULL;
+                mutex_exit(&fsh_cb_owner_lock);
+                mutex_exit(&fsh_cb_lock);
+        }
 }
 
 /*
  * This function is executed right before VFS_FREEVFS() is called in
  * vfs_rele()@vfs.c. We are sure that it's called only after fsh_init().

@@ -681,28 +734,47 @@
 void
 fsh_exec_free_callbacks(vfs_t *vfsp)
 {
         fsh_callback_int_t *fshci;
         fsh_callback_t *cb;
+        int fsh_context;
 
-        rw_enter(&fsh_cblist_lock, RW_READER);
+        mutex_enter(&fsh_cb_owner_lock);
+        fsh_context = fsh_cb_owner == curthread;
+        mutex_exit(&fsh_cb_owner_lock);
+
+        if (!fsh_context) {
+                mutex_enter(&fsh_cb_lock);
+                mutex_enter(&fsh_cb_owner_lock);
+                fsh_cb_owner = curthread;
+                mutex_exit(&fsh_cb_owner_lock);
+        }
+
+        ASSERT(MUTEX_HELD(&fsh_cb_lock));
+
         for (fshci = list_head(&fsh_cblist); fshci != NULL;
             fshci = list_next(&fsh_cblist, fshci)) {
                 cb = &fshci->fshci_cb;
                 if (cb->fshc_free != NULL)
                         (*(cb->fshc_free))(vfsp, cb->fshc_arg);
         }
-        rw_exit(&fsh_cblist_lock);
+
+        if (!fsh_context) {
+                mutex_enter(&fsh_cb_owner_lock);
+                fsh_cb_owner = NULL;
+                mutex_exit(&fsh_cb_owner_lock);
+                mutex_exit(&fsh_cb_lock);
+        }
 }
 
 /*
  * API for vnode.c/vfs.c to start executing the fsh for a given operation.
  *
  * fsh_xxx() tries to find the first non-NULL xxx hook on the fshfsr_list. If it
  * does, it executes it. If not, underlying vnodeop/vfsop is called.
  *
- * These interfaces are using fsh_res_ptr (in fsh_prepare_fsrec()), so it's
+ * These interfaces are using fsh_res_ptr (in fsh_fsrec_prepare()), so it's
  * absolutely necessary to call fsh_init() before using them. That's done in
  * vfsinit().
  *
  * While these functions are executing, it's expected that necessary vfs_t's
  * are held so that vfs_free() isn't called. vfs_free() expects that noone

@@ -717,139 +789,251 @@
         caller_context_t *ct)
 {
         int ret;
         fsh_fsrecord_t *fsrecp;
         fsh_int_t *fshi;
+        fsh_exec_t *fshe;
+        list_t exec_list;
 
-        fsh_prepare_fsrec(vp->v_vfsp);
+        fsh_fsrec_prepare(vp->v_vfsp);
         fsrecp = vp->v_vfsp->vfs_fshrecord;
 
         rw_enter(&fsrecp->fshfsr_lock, RW_READER);
         if (!(fsrecp->fshfsr_enabled)) {
                 rw_exit(&fsrecp->fshfsr_lock);
-                return ((*(vp->v_op->vop_read))(vp, uiop, ioflag, cr, ct));
+                return ((*vp->v_op->vop_read)(vp, uiop, ioflag, cr, ct));
         }
 
+        list_create(&exec_list, sizeof (fsh_exec_t),
+            offsetof(fsh_exec_t, fshe_node));
+
         for (fshi = list_head(&fsrecp->fshfsr_list); fshi != NULL;
             fshi = list_next(&fsrecp->fshfsr_list, fshi)) {
-                if (fshi->fshi_hooks.read != NULL)
-                        if (fshi_hold(fshi))
-                                break;
+                if (fshi->fshi_hooks.pre_read != NULL ||
+                    fshi->fshi_hooks.post_read != NULL) {
+                        if (fshi_hold(fshi)) {
+                                fshe = kmem_alloc(sizeof (*fshe), KM_SLEEP);
+                                fshe->fshe_fshi = fshi;
+                                list_insert_tail(&exec_list, fshe);
         }
+                }
+        }
         rw_exit(&fsrecp->fshfsr_lock);
 
-        if (fshi == NULL)
-                return ((*(vp->v_op->vop_read))(vp, uiop, ioflag, cr, ct));
+        /* Execute pre hooks */
+        for (fshe = list_head(&exec_list); fshe != NULL;
+            fshe = list_next(&exec_list, fshe)) {
+                if (fshe->fshe_fshi->fshi_hooks.pre_read != NULL)
+                        (*fshe->fshe_fshi->fshi_hooks.pre_read)(
+                            fshe->fshe_fshi->fshi_hooks.arg,
+                            &fshe->fshe_instance,
+                            &vp, &uiop, &ioflag, &cr, &ct);
+        }
 
-        ret = (*fshi->fshi_hooks.read)(fshi, fshi->fshi_hooks.arg,
+        ret = (*vp->v_op->vop_read)(vp, uiop, ioflag, cr, ct);
+
+        /* Execute post hooks */
+        while ((fshe = list_remove_tail(&exec_list)) != NULL) {
+                if (fshe->fshe_fshi->fshi_hooks.post_read != NULL)
+                        ret = (*fshe->fshe_fshi->fshi_hooks.post_read)(
+                            ret, fshe->fshe_fshi->fshi_hooks.arg,
+                            fshe->fshe_instance,
             vp, uiop, ioflag, cr, ct);
-        fshi_rele(fshi);
+                fshi_rele(fshe->fshe_fshi);
+                kmem_free(fshe, sizeof (*fshe));
+        }
+        list_destroy(&exec_list);
+
         return (ret);
 }
 
 int
 fsh_write(vnode_t *vp, uio_t *uiop, int ioflag, cred_t *cr,
         caller_context_t *ct)
 {
-        fsh_int_t *fshi;
         int ret;
         fsh_fsrecord_t *fsrecp;
+        fsh_int_t *fshi;
+        fsh_exec_t *fshe;
+        list_t exec_list;
 
-        fsh_prepare_fsrec(vp->v_vfsp);
+        fsh_fsrec_prepare(vp->v_vfsp);
         fsrecp = vp->v_vfsp->vfs_fshrecord;
 
         rw_enter(&fsrecp->fshfsr_lock, RW_READER);
-        if (!(vp->v_vfsp->vfs_fshrecord->fshfsr_enabled)) {
+        if (!(fsrecp->fshfsr_enabled)) {
                 rw_exit(&fsrecp->fshfsr_lock);
-                return ((*(vp->v_op->vop_write))(vp, uiop, ioflag, cr, ct));
+                return ((*vp->v_op->vop_write)(vp, uiop, ioflag, cr, ct));
         }
 
+        list_create(&exec_list, sizeof (fsh_exec_t),
+            offsetof(fsh_exec_t, fshe_node));
+
         for (fshi = list_head(&fsrecp->fshfsr_list); fshi != NULL;
             fshi = list_next(&fsrecp->fshfsr_list, fshi)) {
-                if (fshi->fshi_hooks.write != NULL)
-                        if (fshi_hold(fshi))
-                                break;
+                if (fshi->fshi_hooks.pre_write != NULL ||
+                    fshi->fshi_hooks.post_write != NULL) {
+                        if (fshi_hold(fshi)) {
+                                fshe = kmem_alloc(sizeof (*fshe), KM_SLEEP);
+                                fshe->fshe_fshi = fshi;
+                                list_insert_tail(&exec_list, fshe);
         }
+                }
+        }
         rw_exit(&fsrecp->fshfsr_lock);
 
-        if (fshi == NULL)
-                return ((*(vp->v_op->vop_write))(vp, uiop, ioflag, cr, ct));
+        /* Execute pre hooks */
+        for (fshe = list_head(&exec_list); fshe != NULL;
+            fshe = list_next(&exec_list, fshe)) {
+                if (fshe->fshe_fshi->fshi_hooks.pre_write != NULL)
+                        (*fshe->fshe_fshi->fshi_hooks.pre_write)(
+                            fshe->fshe_fshi->fshi_hooks.arg,
+                            &fshe->fshe_instance,
+                            &vp, &uiop, &ioflag, &cr, &ct);
+        }
 
-        ret = (*fshi->fshi_hooks.write)(fshi, fshi->fshi_hooks.arg,
+        ret = (*vp->v_op->vop_write)(vp, uiop, ioflag, cr, ct);
+
+        /* Execute post hooks */
+        while ((fshe = list_remove_tail(&exec_list)) != NULL) {
+                if (fshe->fshe_fshi->fshi_hooks.post_write != NULL)
+                        ret = (*fshe->fshe_fshi->fshi_hooks.post_write)(
+                            ret, fshe->fshe_fshi->fshi_hooks.arg,
+                            fshe->fshe_instance,
             vp, uiop, ioflag, cr, ct);
-        fshi_rele(fshi);
+                fshi_rele(fshe->fshe_fshi);
+                kmem_free(fshe, sizeof (*fshe));
+        }
+        list_destroy(&exec_list);
+
         return (ret);
 }
 
 int
 fsh_mount(vfs_t *vfsp, vnode_t *mvp, struct mounta *uap, cred_t *cr)
 {
+        int ret;
         fsh_fsrecord_t *fsrecp;
         fsh_int_t *fshi;
-        int ret;
+        fsh_exec_t *fshe;
+        list_t exec_list;
 
-        fsh_prepare_fsrec(vfsp);
+        fsh_fsrec_prepare(vfsp);
         fsrecp = vfsp->vfs_fshrecord;
 
         rw_enter(&fsrecp->fshfsr_lock, RW_READER);
         if (!(fsrecp->fshfsr_enabled)) {
                 rw_exit(&fsrecp->fshfsr_lock);
-                return ((*(vfsp->vfs_op->vfs_mount))(vfsp, mvp, uap, cr));
+                return ((*vfsp->vfs_op->vfs_mount)(vfsp, mvp, uap, cr));
         }
 
+        list_create(&exec_list, sizeof (fsh_exec_t),
+            offsetof(fsh_exec_t, fshe_node));
+
         for (fshi = list_head(&fsrecp->fshfsr_list); fshi != NULL;
             fshi = list_next(&fsrecp->fshfsr_list, fshi)) {
-                if (fshi->fshi_hooks.mount != NULL)
-                        if (fshi_hold(fshi))
-                                break;
+                if (fshi->fshi_hooks.pre_mount != NULL ||
+                    fshi->fshi_hooks.post_mount != NULL) {
+                        if (fshi_hold(fshi)) {
+                                fshe = kmem_alloc(sizeof (*fshe), KM_SLEEP);
+                                fshe->fshe_fshi = fshi;
+                                list_insert_tail(&exec_list, fshe);
         }
+                }
+        }
         rw_exit(&fsrecp->fshfsr_lock);
 
-        if (fshi == NULL)
-                return ((*(vfsp->vfs_op->vfs_mount))(vfsp, mvp, uap, cr));
+        /* Execute pre hooks */
+        for (fshe = list_head(&exec_list); fshe != NULL;
+            fshe = list_next(&exec_list, fshe)) {
+                if (fshe->fshe_fshi->fshi_hooks.pre_mount != NULL)
+                        (*fshe->fshe_fshi->fshi_hooks.pre_mount)(
+                            &fshe->fshe_fshi->fshi_hooks.arg,
+                            &fshe->fshe_instance,
+                            &vfsp, &mvp, &uap, &cr);
+        }
 
-        ret = (*fshi->fshi_hooks.mount)(fshi, fshi->fshi_hooks.arg,
+        ret = (*vfsp->vfs_op->vfs_mount)(vfsp, mvp, uap, cr);
+
+        /* Execute post hooks */
+        while ((fshe = list_remove_tail(&exec_list)) != NULL) {
+                if (fshe->fshe_fshi->fshi_hooks.post_mount != NULL)
+                        ret = (*fshe->fshe_fshi->fshi_hooks.post_mount)(
+                            ret, fshe->fshe_fshi->fshi_hooks.arg,
+                            fshe->fshe_instance,
             vfsp, mvp, uap, cr);
-        fshi_rele(fshi);
+                fshi_rele(fshe->fshe_fshi);
+                kmem_free(fshe, sizeof (*fshe));
+        }
+        list_destroy(&exec_list);
+
         return (ret);
 }
 
 int
 fsh_unmount(vfs_t *vfsp, int flag, cred_t *cr)
 {
+        int ret;
         fsh_fsrecord_t *fsrecp;
         fsh_int_t *fshi;
-        int ret;
+        fsh_exec_t *fshe;
+        list_t exec_list;
 
-        fsh_prepare_fsrec(vfsp);
+        fsh_fsrec_prepare(vfsp);
         fsrecp = vfsp->vfs_fshrecord;
 
         rw_enter(&fsrecp->fshfsr_lock, RW_READER);
         if (!(fsrecp->fshfsr_enabled)) {
                 rw_exit(&fsrecp->fshfsr_lock);
-                return ((*(vfsp->vfs_op->vfs_unmount))(vfsp, flag, cr));
+                return ((*vfsp->vfs_op->vfs_unmount)(vfsp, flag, cr));
         }
 
+        list_create(&exec_list, sizeof (fsh_exec_t),
+            offsetof(fsh_exec_t, fshe_node));
+
         for (fshi = list_head(&fsrecp->fshfsr_list); fshi != NULL;
             fshi = list_next(&fsrecp->fshfsr_list, fshi)) {
-                if (fshi->fshi_hooks.unmount != NULL)
-                        if (fshi_hold(fshi))
-                                break;
+                if (fshi->fshi_hooks.pre_unmount != NULL ||
+                    fshi->fshi_hooks.post_unmount != NULL) {
+                        if (fshi_hold(fshi)) {
+                                fshe = kmem_alloc(sizeof (*fshe), KM_SLEEP);
+                                fshe->fshe_fshi = fshi;
+                                list_insert_tail(&exec_list, fshe);
         }
+                }
+        }
         rw_exit(&fsrecp->fshfsr_lock);
 
-        if (fshi == NULL)
-                return ((*(vfsp->vfs_op->vfs_unmount))(vfsp, flag, cr));
+        /* Execute pre hooks */
+        for (fshe = list_head(&exec_list); fshe != NULL;
+            fshe = list_next(&exec_list, fshe)) {
+                if (fshe->fshe_fshi->fshi_hooks.pre_unmount != NULL)
+                        (*fshe->fshe_fshi->fshi_hooks.pre_unmount)(
+                            fshe->fshe_fshi->fshi_hooks.arg,
+                            &fshe->fshe_instance,
+                            &vfsp, &flag, &cr);
+        }
 
-        ret = (*fshi->fshi_hooks.unmount)(fshi, fshi->fshi_hooks.arg,
+        ret = (*vfsp->vfs_op->vfs_unmount)(vfsp, flag, cr);
+
+        /* Execute post hooks */
+        while ((fshe = list_remove_tail(&exec_list)) != NULL) {
+                if (fshe->fshe_fshi->fshi_hooks.post_unmount != NULL)
+                        ret = (*fshe->fshe_fshi->fshi_hooks.post_unmount)(
+                            ret, fshe->fshe_fshi->fshi_hooks.arg,
+                            fshe->fshe_instance,
             vfsp, flag, cr);
-        fshi_rele(fshi);
+                fshi_rele(fshe->fshe_fshi);
+                kmem_free(fshe, sizeof (*fshe));
+        }
+        list_destroy(&exec_list);
+
         return (ret);
 }
 
 /*
- * This is the funtion used by fsh_prepare_fsrec() to allocate a new
+ * This is the funtion used by fsh_fsrec_prepare() to allocate a new
  * fsh_fsrecord. This function is called by the first function which
  * access the vfs_fshrecord and finds out it's NULL.
  */
 static fsh_fsrecord_t *
 fsh_fsrec_create()

@@ -856,32 +1040,24 @@
 {
         fsh_fsrecord_t *fsrecp;
 
         fsrecp = (fsh_fsrecord_t *)kmem_zalloc(sizeof (*fsrecp), KM_SLEEP);
         list_create(&fsrecp->fshfsr_list, sizeof (fsh_int_t),
-            offsetof(fsh_int_t, fshi_next));
+            offsetof(fsh_int_t, fshi_node));
         rw_init(&fsrecp->fshfsr_lock, NULL, RW_DRIVER, NULL);
         fsrecp->fshfsr_enabled = 1;
         return (fsrecp);
 }
 
 
 /*
- * This call can be used ONLY in vfs_free(). It's assumed that no other
- * fsh calls using the vfs_t that owns the fsh_fsrecord to be destroyed
- * are executing while a call to fsh_fsrec_destroy() is made. With this
- * assumptions, no concurrency issues occur.
+ * This call must be used ONLY in vfs_free().
  *
- * Before calling this function outside the fsh, it's sufficient and
- * required to check if the passed fsh_fsrecord * is not NULL. We don't
- * have to check if it is not equal to fsh_res_ptr, because all the fsh API
- * calls involving this vfs_t should end before vfs_free() is called
- * (outside the fsh, fsh_fsrecord is never equal to fsh_res_ptr). That is
- * guaranteed by the explicit requirement that the caller of fsh API holds
- * the vfs_t when needed.
+ * It is required and sufficient to check if fsh_fsrecord_t is not NULL before
+ * passing it to fsh_fsrec_destroy.
  *
- * All the remaining hooks are being removed.
+ * All the remaining hooks are being removed here.
  */
 void
 fsh_fsrec_destroy(struct fsh_fsrecord *volatile fsrecp)
 {
         fsh_int_t *fshi;

@@ -889,12 +1065,13 @@
         VERIFY(fsrecp != NULL);
 
         _NOTE(CONSTCOND)
         while (1) {
                 mutex_enter(&fsh_lock);
-                /* No need here to hold fshfsr_lock */
+                rw_enter(&fsrecp->fshfsr_lock, RW_WRITER);
                 fshi = list_remove_head(&fsrecp->fshfsr_list);
+                rw_exit(&fsrecp->fshfsr_lock);
                 if (fshi == NULL) {
                         mutex_exit(&fsh_lock);
                         break;
                 }
                 ASSERT(fshi->fshi_doomed == 0);

@@ -902,10 +1079,11 @@
                 mutex_exit(&fsh_lock);
 
                 if (fshi->fshi_hooks.remove_cb != NULL)
                         (*fshi->fshi_hooks.remove_cb)(fshi->fshi_hooks.arg,
                             fshi->fshi_handle);
+
                 id_free(fsh_idspace, fshi->fshi_handle);
                 mutex_destroy(&fshi->fshi_lock);
                 kmem_free(fshi, sizeof (*fshi));
 
         }

@@ -920,138 +1098,20 @@
  * before every other fsh call.
  */
 void
 fsh_init(void)
 {
-        rw_init(&fsh_cblist_lock, NULL, RW_DRIVER, NULL);
+        mutex_init(&fsh_cb_lock, NULL, MUTEX_DRIVER, NULL);
+        mutex_init(&fsh_cb_owner_lock, NULL, MUTEX_DRIVER, NULL);
         list_create(&fsh_cblist, sizeof (fsh_callback_int_t),
-            offsetof(fsh_callback_int_t, fshci_next));
+            offsetof(fsh_callback_int_t, fshci_node));
 
         mutex_init(&fsh_lock, NULL, MUTEX_DRIVER, NULL);
 
         list_create(&fsh_map, sizeof (fsh_int_t), offsetof(fsh_int_t,
             fshi_global));
 
-        /* See comment above fsh_prepare_fsrec() */
+        /* See comment above fsh_fsrec_prepare() */
         fsh_res_ptr = (void *)-1;
 
         fsh_idspace = id_space_create("fsh", 0, fsh_limit);
-}
-
-/*
- * These functions are used to pass control to the next hook or underlying
- * vop or vfsop. It's client doesn't have to worry about any locking.
- */
-int
-fsh_next_read(fsh_int_t *fshi, vnode_t *vp, uio_t *uiop, int ioflag,
-        cred_t *cr, caller_context_t *ct)
-{
-        int ret;
-        fsh_fsrecord_t *fsrecp = vp->v_vfsp->vfs_fshrecord;
-
-        /*
-         * The passed fshi is the previous hook (the one from which we've been
-         * called). We need to find the next one.
-         */
-        rw_enter(&fsrecp->fshfsr_lock, RW_READER);
-        for (fshi = list_next(&fsrecp->fshfsr_list, fshi); fshi != NULL;
-            fshi = list_next(&fsrecp->fshfsr_list, fshi)) {
-                if (fshi->fshi_hooks.read != NULL)
-                        if (fshi_hold(fshi))
-                                break;
-        }
-        rw_exit(&fsrecp->fshfsr_lock);
-
-        if (fshi == NULL)
-                return ((*vp->v_op->vop_read)(vp, uiop, ioflag, cr, ct));
-
-        ret = (*fshi->fshi_hooks.read)(fshi, fshi->fshi_hooks.arg,
-            vp, uiop, ioflag, cr, ct);
-        fshi_rele(fshi);
-        return (ret);
-}
-
-int
-fsh_next_write(fsh_int_t *fshi, vnode_t *vp, uio_t *uiop, int ioflag,
-        cred_t *cr, caller_context_t *ct)
-{
-        fsh_fsrecord_t *fsrecp = vp->v_vfsp->vfs_fshrecord;
-        int ret;
-
-        /*
-         * The passed fshi is the previous hook (the one from which we've been
-         * called). We need to find the next one.
-         */
-        rw_enter(&fsrecp->fshfsr_lock, RW_READER);
-        for (fshi = list_next(&fsrecp->fshfsr_list, fshi); fshi != NULL;
-            fshi = list_next(&fsrecp->fshfsr_list, fshi)) {
-                if (fshi->fshi_hooks.write != NULL)
-                        if (fshi_hold(fshi))
-                                break;
-        }
-        rw_exit(&fsrecp->fshfsr_lock);
-
-        if (fshi == NULL)
-                return ((*vp->v_op->vop_write)(vp, uiop, ioflag, cr, ct));
-
-        ret = (*fshi->fshi_hooks.write)(fshi, fshi->fshi_hooks.arg,
-            vp, uiop, ioflag, cr, ct);
-        fshi_rele(fshi);
-        return (ret);
-}
-
-int
-fsh_next_mount(fsh_int_t *fshi, vfs_t *vfsp, vnode_t *mvp, struct mounta *uap,
-        cred_t *cr)
-{
-        fsh_fsrecord_t *fsrecp = vfsp->vfs_fshrecord;
-        int ret;
-
-        /*
-         * The passed fshi is the previous hook (the one from which we've been
-         * called). We need to find the next one.
-         */
-        rw_enter(&fsrecp->fshfsr_lock, RW_READER);
-        for (fshi = list_next(&fsrecp->fshfsr_list, fshi); fshi != NULL;
-            fshi = list_next(&fsrecp->fshfsr_list, fshi)) {
-                if (fshi->fshi_hooks.mount != NULL)
-                        if (fshi_hold(fshi))
-                                break;
-        }
-        rw_exit(&fsrecp->fshfsr_lock);
-
-        if (fshi == NULL)
-                return ((*(vfsp->vfs_op->vfs_mount))(vfsp, mvp, uap, cr));
-
-        ret = (*fshi->fshi_hooks.mount)(fshi, fshi->fshi_hooks.arg,
-            vfsp, mvp, uap, cr);
-        fshi_rele(fshi);
-        return (ret);
-}
-
-int
-fsh_next_unmount(fsh_int_t *fshi, vfs_t *vfsp, int flag, cred_t *cr)
-{
-        fsh_fsrecord_t *fsrecp = vfsp->vfs_fshrecord;
-        int ret;
-
-        /*
-         * The passed fshi is the previous hook (the one from which we've been
-         * called). We need to find the next one.
-         */
-        rw_enter(&fsrecp->fshfsr_lock, RW_READER);
-        for (fshi = list_next(&fsrecp->fshfsr_list, fshi); fshi != NULL;
-            fshi = list_next(&fsrecp->fshfsr_list, fshi)) {
-                if (fshi->fshi_hooks.unmount != NULL)
-                        if (fshi_hold(fshi))
-                                break;
-        }
-        rw_exit(&fsrecp->fshfsr_lock);
-
-        if (fshi == NULL)
-                return ((*vfsp->vfs_op->vfs_unmount)(vfsp, flag, cr));
-
-        ret = (*fshi->fshi_hooks.unmount)(fshi, fshi->fshi_hooks.arg,
-            vfsp, flag, cr);
-        fshi_rele(fshi);
-        return (ret);
 }