PROC(4) | File Formats and Configurations | PROC(4) |
proc
—
/proc can be mounted on any mount point, in addition to the standard /proc mount point, and can be mounted several places at once. Such additional mounts are allowed in order to facilitate the confinement of processes to subtrees of the file system via chroot(2) and yet allow such processes access to commands like ps(1).
Standard system calls are used to access /proc files: open(2), close(2), read(2), and write(2) (including readv(2), writev(2), pread(2), and pwrite(2)). Most files describe process state and can only be opened for reading. ctl and lwpctl (control) files permit manipulation of process state and can only be opened for writing. as (address space) files contain the image of the running process and can be opened for both reading and writing. An open for writing allows process control; a read-only open allows inspection but not control. In this document, we refer to the process as open for reading or writing if any of its associated /proc files is open for reading or writing.
In general, more than one process can open the same /proc file at the same time. Exclusive open is an advisory mechanism provided to allow controlling processes to avoid collisions with each other. A process can obtain exclusive control of a target process, with respect to other cooperating processes, if it successfully opens any /proc file in the target process for writing (the as or ctl files, or the lwpctl file of any lwp) while specifying O_EXCL in the open(2). Such an open will fail if the target process is already open for writing (that is, if an as, ctl, or lwpctl file is already open for writing). There can be any number of concurrent read-only opens; O_EXCL is ignored on opens for reading. It is recommended that the first open for writing by a controlling process use the O_EXCL flag; multiple controlling processes usually result in chaos.
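The exclusive-open recommendation above can be sketched as follows. This is only an illustration: the helper names proc_path and open_ctl_exclusive are hypothetical, and the path layout /proc/pid/ctl is assumed from the file names used elsewhere in this document.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Format "/proc/<pid>/<file>" into buf.
 * Returns 0 on success, -1 if the result does not fit.
 */
int
proc_path(char *buf, size_t size, pid_t pid, const char *file)
{
    int n;

    assert(buf != NULL && file != NULL);
    n = snprintf(buf, size, "/proc/%ld/%s", (long)pid, file);
    return ((n < 0 || (size_t)n >= size) ? -1 : 0);
}

/*
 * Try to become the exclusive controller of pid by making the first
 * open for writing an O_EXCL open of its ctl file. Returns a file
 * descriptor, or -1 with errno set (for example, because the target
 * is already open for writing by another controlling process).
 */
int
open_ctl_exclusive(pid_t pid)
{
    char path[64];

    if (proc_path(path, sizeof (path), pid, "ctl") != 0)
        return (-1);
    return (open(path, O_WRONLY | O_EXCL));
}
```

Because exclusive open is advisory, this only protects against other controlling processes that also pass O_EXCL on their first open for writing.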
If a process opens one of its own /proc files for writing, the open succeeds regardless of O_EXCL and regardless of whether some other process has the process open for writing. Self-opens do not count when another process attempts an exclusive open. (A process cannot exclude a debugger by opening itself for writing and the application of a debugger cannot prevent a process from opening itself.) All self-opens for writing are forced to be close-on-exec (see the F_SETFD operation of fcntl(2)).
Data may be transferred from or to any locations in the address space of the traced process by applying lseek(2) to position the as file at the virtual address of interest followed by read(2) or write(2) (or by using pread(2) or pwrite(2) for the combined operation). The address-map files /proc/pid/map and /proc/pid/xmap can be read to determine the accessible areas (mappings) of the address space. I/O transfers may span contiguous mappings. An I/O request extending into an unmapped area is truncated at the boundary. A write request beginning at an unmapped virtual address fails with EIO; a read request beginning at an unmapped virtual address returns zero (an end-of-file indication).
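A minimal sketch of reading target memory through the as file, using pread(2) for the combined seek-and-read described above. The helper names open_as_readonly and as_read are hypothetical, and the cast of the virtual address to off_t assumes the address fits in a non-negative file offset.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

/* Open /proc/<pid>/as read-only (inspection without control). */
int
open_as_readonly(pid_t pid)
{
    char path[64];
    int n = snprintf(path, sizeof (path), "/proc/%ld/as", (long)pid);

    if (n < 0 || (size_t)n >= sizeof (path))
        return (-1);
    return (open(path, O_RDONLY));
}

/*
 * Read up to len bytes starting at virtual address vaddr.
 * Returns the number of bytes read, which may be less than len if the
 * request extends into an unmapped area; 0 if vaddr itself is unmapped
 * (end-of-file indication); or -1 with errno set on error.
 */
ssize_t
as_read(int as_fd, uintptr_t vaddr, void *buf, size_t len)
{
    assert(buf != NULL);
    /* pread(2) positions and reads in one call, leaving the offset alone. */
    return (pread(as_fd, buf, len, (off_t)vaddr));
}
```

A caller should treat a short count as "the mapping ended here" rather than as an error, and can consult the map file to find where the next mapping begins.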
Information and control operations are provided through additional files. <procfs.h> contains definitions of data structures and message formats used with these files. Some of these definitions involve the use of sets of flags. The set types sigset_t, fltset_t, and sysset_t correspond, respectively, to signal, fault, and system call enumerations defined in <sys/signal.h>, <sys/fault.h>, and <sys/syscall.h>. Each set type is large enough to hold flags for its own enumeration. Although they are of different sizes, they have a common structure and can be manipulated by these macros:
prfillset(&set);             /* turn on all flags in set */
premptyset(&set);            /* turn off all flags in set */
praddset(&set, flag);        /* turn on the specified flag */
prdelset(&set, flag);        /* turn off the specified flag */
r = prismember(&set, flag);  /* != 0 iff flag is turned on */
One of prfillset() or premptyset() must be used to initialize set before it is used in any other operation. flag must be a member of the enumeration corresponding to set.
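To illustrate how one family of macros can serve set types of different sizes, here is a sketch built on hypothetical stand-in types. The demo_ names, the word counts, and the assumption that flags are numbered from 1 are all illustrative; the real definitions live in <procfs.h> and may differ.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-ins: same structure, different sizes. */
typedef struct { uint32_t word[4]; } demo_sigset_t;   /* 128 flags */
typedef struct { uint32_t word[16]; } demo_sysset_t;  /* 512 flags */

/* sizeof (*(sp)) lets the same macro adapt to either set type. */
#define demo_fillset(sp)    memset((sp), 0xff, sizeof (*(sp)))
#define demo_emptyset(sp)   memset((sp), 0, sizeof (*(sp)))

/* Assumption: flags are numbered from 1, so flag n is stored at bit n-1. */
#define demo_addset(sp, flag) \
    ((sp)->word[((flag) - 1) / 32] |= (uint32_t)1 << (((flag) - 1) % 32))
#define demo_delset(sp, flag) \
    ((sp)->word[((flag) - 1) / 32] &= ~((uint32_t)1 << (((flag) - 1) % 32)))
#define demo_ismember(sp, flag) \
    (((sp)->word[((flag) - 1) / 32] >> (((flag) - 1) % 32)) & 1u)

/* Build a set holding only flag; the set is initialized first, as required. */
void
demo_single(demo_sigset_t *sp, int flag)
{
    assert(flag >= 1 && flag <= (int)(8 * sizeof (*sp)));
    demo_emptyset(sp);
    demo_addset(sp, flag);
}
```

The pattern mirrors the rule stated above: initialize with a fill or empty operation, then add or delete individual flags that belong to the set's own enumeration.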
Every process contains at least one light-weight process, or lwp. Each lwp represents a flow of execution that is independently scheduled by the operating system. All lwps in a process share its address space as well as many other attributes. Through the use of lwpctl and ctl files as described below, it is possible to affect individual lwps in a process or to affect all of them at once, depending on the operation.
When the process has more than one lwp, a representative lwp is chosen by the system for certain process status files and control operations. The representative lwp is a stopped lwp only if all of the process's lwps are stopped; is stopped on an event of interest only if all of the lwps are so stopped (excluding PR_SUSPENDED lwps); is in a PR_REQUESTED stop only if there are no other events of interest to be found; or, failing everything else, is in a PR_SUSPENDED stop (implying that the process is deadlocked). See the description of the status file for definitions of stopped states. See the PCSTOP control operation for the definition of “event of interest”.
The representative lwp remains fixed (it will be chosen again on the next operation) as long as all of the lwps are stopped on events of interest or are in a PR_SUSPENDED stop and the PCRUN control operation is not applied to any of them.
When applied to the process control file, every /proc control operation that must act on an lwp uses the same algorithm to choose which lwp to act upon. Together with synchronous stopping (see PCSET), this enables a debugger to control a multiple-lwp process using only the process-level status and control files if it so chooses. More fine-grained control can be achieved using the lwp-specific files.
The system supports two process data models, the traditional 32-bit data model in which ints, longs and pointers are all 32 bits wide (the ILP32 data model), and on some platforms the 64-bit data model in which longs and pointers, but not ints, are 64 bits in width (the LP64 data model). In the LP64 data model some system data types, notably size_t, off_t, time_t and dev_t, grow from 32 bits to 64 bits as well.
The /proc interfaces described here are available to both 32-bit and 64-bit controlling processes. However, many operations attempted by a 32-bit controlling process on a 64-bit target process will fail with EOVERFLOW because the address space range of a 32-bit process cannot encompass a 64-bit process or because the data in some 64-bit system data type cannot be compressed to fit into the corresponding 32-bit type without loss of information. Operations that fail in this circumstance include reading and writing the address space, reading the address-map files, and setting the target process's registers. There is no restriction on operations applied by a 64-bit process to either a 32-bit or a 64-bit target process.
The format of the contents of any /proc file depends on the data model of the observer (the controlling process), not on the data model of the target process. A 64-bit debugger does not have to translate the information it reads from a /proc file for a 32-bit process from 32-bit format to 64-bit format. However, it usually has to be aware of the data model of the target process. The pr_dmodel field of the status files indicates the target process's data model.
To help deal with system data structures that are read from 32-bit processes, a 64-bit controlling program can be compiled with the C preprocessor symbol _SYSCALL32 defined before system header files are included. This makes explicit 32-bit fixed-width data structures (like struct stat32) visible to the 64-bit program. See types32.h(3HEAD).
ENOENT.
Although process state and consequently the contents of /proc files can change from instant to instant, a single read(2) of a /proc file is guaranteed to return a sane representation of state; that is, the read will be atomic with respect to the state of the process. No such guarantee applies to successive reads applied to a /proc file for a running process. In addition, atomicity is not guaranteed for I/O applied to the as (address-space) file for a running process or for a process whose address space contains memory shared by another running process.
A number of structure definitions are used to describe the files. These structures may grow by the addition of elements at the end in future releases of the system and it is not legitimate for a program to assume that they will not.
typedef struct pstatus {
    int pr_flags;            /* flags (see below) */
    int pr_nlwp;             /* number of active lwps in the process */
    int pr_nzomb;            /* number of zombie lwps in the process */
    pid_t pr_pid;            /* process id */
    pid_t pr_ppid;           /* parent process id */
    pid_t pr_pgid;           /* process group id */
    pid_t pr_sid;            /* session id */
    id_t pr_aslwpid;         /* obsolete */
    id_t pr_agentid;         /* lwp-id of the agent lwp, if any */
    sigset_t pr_sigpend;     /* set of process pending signals */
    uintptr_t pr_brkbase;    /* virtual address of the process heap */
    size_t pr_brksize;       /* size of the process heap, in bytes */
    uintptr_t pr_stkbase;    /* virtual address of the process stack */
    size_t pr_stksize;       /* size of the process stack, in bytes */
    timestruc_t pr_utime;    /* process user cpu time */
    timestruc_t pr_stime;    /* process system cpu time */
    timestruc_t pr_cutime;   /* sum of children's user times */
    timestruc_t pr_cstime;   /* sum of children's system times */
    sigset_t pr_sigtrace;    /* set of traced signals */
    fltset_t pr_flttrace;    /* set of traced faults */
    sysset_t pr_sysentry;    /* set of system calls traced on entry */
    sysset_t pr_sysexit;     /* set of system calls traced on exit */
    char pr_dmodel;          /* data model of the process */
    taskid_t pr_taskid;      /* task id */
    projid_t pr_projid;      /* project id */
    zoneid_t pr_zoneid;      /* zone id */
    lwpstatus_t pr_lwp;      /* status of the representative lwp */
} pstatus_t;
pr_flags is a bit-mask holding the following process flags. For convenience, it also contains the lwp flags for the representative lwp, described later.
pr_nlwp is the total number of active lwps in the process. pr_nzomb is the total number of zombie lwps in the process. A zombie lwp is a non-detached lwp that has terminated but has not been reaped with thr_join(3C) or pthread_join(3C).
pr_pid, pr_ppid, pr_pgid, and pr_sid are, respectively, the process ID, the ID of the process's parent, the process's process group ID, and the process's session ID.
pr_aslwpid is obsolete and is always zero.
pr_agentid is the lwp-ID for the /proc agent lwp (see the PCAGENT control operation). It is zero if there is no agent lwp in the process.
pr_sigpend identifies asynchronous signals pending for the process.
pr_brkbase is the virtual address of the process heap and pr_brksize is its size in bytes. The address formed by the sum of these values is the process break (see brk(2)). pr_stkbase and pr_stksize are, respectively, the virtual address of the process stack and its size in bytes. (Each lwp runs on a separate stack; the distinguishing characteristic of the process stack is that the operating system will grow it when necessary.)
pr_utime, pr_stime, pr_cutime, and pr_cstime are, respectively, the user CPU and system CPU time consumed by the process, and the cumulative user CPU and system CPU time consumed by the process's children, in seconds and nanoseconds.
pr_sigtrace and pr_flttrace contain, respectively, the set of signals and the set of hardware faults that are being traced (see PCSTRACE and PCSFAULT).
pr_sysentry and pr_sysexit contain, respectively, the sets of system calls being traced on entry and exit (see PCSENTRY and PCSEXIT).
pr_dmodel indicates the data model of the process. Possible values are:
The pr_taskid, pr_projid, and pr_zoneid fields contain, respectively, the numeric IDs of the task, project, and zone in which the process was running.
The constant PR_MODEL_NATIVE reflects the data model of the controlling process, that is, its value is PR_MODEL_ILP32 or PR_MODEL_LP64 according to whether the controlling process has been compiled as a 32-bit program or a 64-bit program, respectively.
pr_lwp contains the status information for the representative lwp:
typedef struct lwpstatus {
    int pr_flags;               /* flags (see below) */
    id_t pr_lwpid;              /* specific lwp identifier */
    short pr_why;               /* reason for lwp stop, if stopped */
    short pr_what;              /* more detailed reason */
    short pr_cursig;            /* current signal, if any */
    siginfo_t pr_info;          /* info associated with signal or fault */
    sigset_t pr_lwppend;        /* set of signals pending to the lwp */
    sigset_t pr_lwphold;        /* set of signals blocked by the lwp */
    struct sigaction pr_action; /* signal action for current signal */
    stack_t pr_altstack;        /* alternate signal stack info */
    uintptr_t pr_oldcontext;    /* address of previous ucontext */
    short pr_syscall;           /* system call number (if in syscall) */
    short pr_nsysarg;           /* number of arguments to this syscall */
    int pr_errno;               /* errno for failed syscall */
    long pr_sysarg[PRSYSARGS];  /* arguments to this syscall */
    long pr_rval1;              /* primary syscall return value */
    long pr_rval2;              /* second syscall return value, if any */
    char pr_clname[PRCLSZ];     /* scheduling class name */
    timestruc_t pr_tstamp;      /* real-time time stamp of stop */
    timestruc_t pr_utime;       /* lwp user cpu time */
    timestruc_t pr_stime;       /* lwp system cpu time */
    uintptr_t pr_ustack;        /* stack boundary data (stack_t) address */
    ulong_t pr_instr;           /* current instruction */
    prgregset_t pr_reg;         /* general registers */
    prfpregset_t pr_fpreg;      /* floating-point registers */
} lwpstatus_t;
pr_flags is a bit-mask holding the following lwp flags. For convenience, it also contains the process flags, described previously.
pr_lwpid names the specific lwp.
pr_why and pr_what together describe, for a stopped lwp, the reason for the stop. Possible values of pr_why and the associated pr_what are:
pr_cursig names the current signal, that is, the next signal to be delivered to the lwp, if any. pr_info, when the lwp is in a PR_SIGNALLED or PR_FAULTED stop, contains additional information pertinent to the particular signal or fault (see <sys/siginfo.h>).
pr_lwppend identifies any synchronous or directed signals pending for the lwp. pr_lwphold identifies those signals whose delivery is being blocked by the lwp (the signal mask).
pr_action contains the signal action information pertaining to the current signal (see sigaction(2)); it is undefined if pr_cursig is zero. pr_altstack contains the alternate signal stack information for the lwp (see sigaltstack(2)).
pr_oldcontext, if not zero, contains the address on the lwp stack of a ucontext structure describing the previous user-level context (see ucontext.h(3HEAD)). It is non-zero only if the lwp is executing in the context of a signal handler.
pr_syscall is the number of the system call, if any, being executed by the lwp; it is non-zero if and only if the lwp is stopped on PR_SYSENTRY or PR_SYSEXIT, or is asleep within a system call (PR_ASLEEP is set). If pr_syscall is non-zero, pr_nsysarg is the number of arguments to the system call and pr_sysarg contains the actual arguments.
pr_rval1, pr_rval2, and pr_errno are defined only if the lwp is stopped on PR_SYSEXIT or if the PR_VFORKP flag is set. If pr_errno is zero, pr_rval1 and pr_rval2 contain the return values from the system call. Otherwise, pr_errno contains the error number for the failing system call (see <sys/errno.h>).
pr_clname contains the name of the lwp's scheduling class.
pr_tstamp, if the lwp is stopped, contains a time stamp marking when the lwp stopped, in real time seconds and nanoseconds since an arbitrary time in the past.
pr_utime is the amount of user level CPU time used by this LWP.
pr_stime is the amount of system level CPU time used by this LWP.
pr_ustack is the virtual address of the stack_t that contains the stack boundaries for this LWP. See getustack(2) and _stack_grow(3C).
pr_instr contains the machine instruction to which the lwp's program counter refers. The amount of data retrieved from the process is machine-dependent. On SPARC-based machines, it is a 32-bit word. On x86-based machines, it is a single byte. In general, the size is that of the machine's smallest instruction. If PR_PCINVAL is set, pr_instr is undefined; this occurs whenever the lwp is not stopped or when the program counter refers to an invalid virtual address.
pr_reg is an array holding the contents of a stopped lwp's general registers.
The preceding constants are listed in <sys/regset.h>.
Note that a 32-bit process can run on an x86 64-bit system, using the constants listed above.
pr_fpreg is a structure holding the contents of the floating-point registers.
SPARC registers, both general and floating-point, as seen by a 64-bit controlling process are the V9 versions of the registers, even if the target process is a 32-bit (V8) process. V8 registers are a subset of the V9 registers.
If the lwp is not stopped, all register values are undefined.
typedef struct psinfo {
    int pr_flag;              /* process flags (DEPRECATED: see below) */
    int pr_nlwp;              /* number of active lwps in the process */
    int pr_nzomb;             /* number of zombie lwps in the process */
    pid_t pr_pid;             /* process id */
    pid_t pr_ppid;            /* process id of parent */
    pid_t pr_pgid;            /* process id of process group leader */
    pid_t pr_sid;             /* session id */
    uid_t pr_uid;             /* real user id */
    uid_t pr_euid;            /* effective user id */
    gid_t pr_gid;             /* real group id */
    gid_t pr_egid;            /* effective group id */
    uintptr_t pr_addr;        /* address of process */
    size_t pr_size;           /* size of process image in Kbytes */
    size_t pr_rssize;         /* resident set size in Kbytes */
    dev_t pr_ttydev;          /* controlling tty device (or PRNODEV) */
    ushort_t pr_pctcpu;       /* % of recent cpu time used by all lwps */
    ushort_t pr_pctmem;       /* % of system memory used by process */
    timestruc_t pr_start;     /* process start time, from the epoch */
    timestruc_t pr_time;      /* cpu time for this process */
    timestruc_t pr_ctime;     /* cpu time for reaped children */
    char pr_fname[PRFNSZ];    /* name of exec'ed file */
    char pr_psargs[PRARGSZ];  /* initial characters of arg list */
    int pr_wstat;             /* if zombie, the wait() status */
    int pr_argc;              /* initial argument count */
    uintptr_t pr_argv;        /* address of initial argument vector */
    uintptr_t pr_envp;        /* address of initial environment vector */
    char pr_dmodel;           /* data model of the process */
    taskid_t pr_taskid;       /* task id */
    projid_t pr_projid;       /* project id */
    poolid_t pr_poolid;       /* pool id */
    zoneid_t pr_zoneid;       /* zone id */
    ctid_t pr_contract;       /* process contract id */
    lwpsinfo_t pr_lwp;        /* information for representative lwp */
} psinfo_t;
Some of the entries in psinfo, such as pr_addr, refer to internal kernel data structures and should not be expected to retain their meanings across different versions of the operating system.
psinfo_t.pr_flag is a deprecated interface that should no longer be used. Applications currently relying on the SSYS bit in pr_flag should migrate to checking PR_ISSYS in the pstatus structure's pr_flags field.
pr_pctcpu and pr_pctmem are 16-bit binary fractions in the range 0.0 to 1.0 with the binary point to the right of the high-order bit (1.0 == 0x8000). pr_pctcpu is the summation over all lwps in the process.
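The binary-fraction encoding can be turned into a percentage with a single division. The sketch below relies only on the stated scaling (1.0 == 0x8000); the function name is hypothetical.

```c
#include <assert.h>

/*
 * Convert a 16-bit binary fraction, as found in pr_pctcpu or
 * pr_pctmem, to a percentage in the range 0.0 to 100.0.
 */
double
pct_from_binary_fraction(unsigned short frac)
{
    assert(frac <= 0x8000);     /* documented range is 0.0 to 1.0 */
    return ((double)frac * 100.0 / 0x8000);
}
```

For example, a stored value of 0x4000 corresponds to 50 percent.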
pr_lwp contains the ps(1) information for the representative lwp. If the process is a zombie, pr_nlwp, pr_nzomb, and pr_lwp.pr_lwpid are zero and the other fields of pr_lwp are undefined:
typedef struct lwpsinfo {
    int pr_flag;              /* lwp flags (DEPRECATED: see below) */
    id_t pr_lwpid;            /* lwp id */
    uintptr_t pr_addr;        /* internal address of lwp */
    uintptr_t pr_wchan;       /* wait addr for sleeping lwp */
    char pr_stype;            /* synchronization event type */
    char pr_state;            /* numeric lwp state */
    char pr_sname;            /* printable character for pr_state */
    char pr_nice;             /* nice for cpu usage */
    short pr_syscall;         /* system call number (if in syscall) */
    char pr_oldpri;           /* pre-SVR4, low value is high priority */
    char pr_cpu;              /* pre-SVR4, cpu usage for scheduling */
    int pr_pri;               /* priority, high value = high priority */
    ushort_t pr_pctcpu;       /* % of recent cpu time used by this lwp */
    timestruc_t pr_start;     /* lwp start time, from the epoch */
    timestruc_t pr_time;      /* cpu time for this lwp */
    char pr_clname[PRCLSZ];   /* scheduling class name */
    char pr_name[PRFNSZ];     /* name of system lwp */
    processorid_t pr_onpro;   /* processor which last ran this lwp */
    processorid_t pr_bindpro; /* processor to which lwp is bound */
    psetid_t pr_bindpset;     /* processor set to which lwp is bound */
    lgrp_id_t pr_lgrp;        /* home lgroup */
} lwpsinfo_t;
Some of the entries in lwpsinfo, such as pr_addr, pr_wchan, pr_stype, pr_state, and pr_name, refer to internal kernel data structures and should not be expected to retain their meanings across different versions of the operating system.
lwpsinfo_t.pr_flag is a deprecated interface that should no longer be used.
pr_pctcpu is a 16-bit binary fraction, as described above. It represents the CPU time used by the specific lwp. On a multi-processor machine, the maximum value is 1/N, where N is the number of CPUs.
pr_contract is the id of the process contract of which the process is a member. See contract(4) and process(4).
typedef struct prcred {
    uid_t pr_euid;       /* effective user id */
    uid_t pr_ruid;       /* real user id */
    uid_t pr_suid;       /* saved user id (from exec) */
    gid_t pr_egid;       /* effective group id */
    gid_t pr_rgid;       /* real group id */
    gid_t pr_sgid;       /* saved group id (from exec) */
    int pr_ngroups;      /* number of supplementary groups */
    gid_t pr_groups[1];  /* array of supplementary groups */
} prcred_t;
The array of associated supplementary groups in pr_groups is of variable length; the cred file contains all of the supplementary groups. pr_ngroups indicates the number of supplementary groups. (See also the PCSCRED and PCSCREDX control operations.)
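Because pr_groups is declared with one element but may hold more, a reader needs a buffer larger than sizeof (prcred_t) when there are several supplementary groups. The sketch below computes such a size using a local stand-in struct (demo_prcred_t, hypothetical) that mirrors the layout shown above; a real program would use prcred_t from <procfs.h>.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>

/* Local stand-in mirroring the prcred layout shown above. */
typedef struct demo_prcred {
    uid_t pr_euid;
    uid_t pr_ruid;
    uid_t pr_suid;
    gid_t pr_egid;
    gid_t pr_rgid;
    gid_t pr_sgid;
    int pr_ngroups;
    gid_t pr_groups[1];     /* really pr_ngroups elements */
} demo_prcred_t;

/* Bytes needed to hold a prcred carrying ngroups supplementary groups. */
size_t
prcred_size(int ngroups)
{
    assert(ngroups >= 0);
    if (ngroups <= 1)
        return (sizeof (demo_prcred_t));    /* one slot is built in */
    return (sizeof (demo_prcred_t) +
        (size_t)(ngroups - 1) * sizeof (gid_t));
}
```

One plausible approach is to size the buffer for the system's maximum number of supplementary groups, read the cred file once, and then trust pr_ngroups for the number of valid entries.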
typedef struct prpriv {
    uint32_t pr_nsets;        /* number of privilege set */
    uint32_t pr_setsize;      /* size of privilege set */
    uint32_t pr_infosize;     /* size of supplementary data */
    priv_chunk_t pr_sets[1];  /* array of sets */
} prpriv_t;
The actual dimension of the pr_sets[] field is pr_sets[pr_nsets][pr_setsize], which is followed by additional information about the process state, pr_infosize bytes in size.
The full size of the structure can be computed using PRIV_PRPRIV_SIZE(prpriv_t *).
typedef struct prsecflags {
    uint32_t pr_version;        /* ABI Versioning of this structure */
    secflagset_t pr_effective;  /* Effective flags */
    secflagset_t pr_inherit;    /* Inheritable flags */
    secflagset_t pr_lower;      /* Lower flags */
    secflagset_t pr_upper;      /* Upper flags */
} prsecflags_t;
The pr_version field is a version number for the structure, currently PRSECFLAGS_VERSION_1.
<sys/auxv.h>). The values are those that were passed by the operating system as startup information to the dynamic linker.
<sys/sysi86.h>, one element for each active LDT entry.
typedef struct prmap {
    uintptr_t pr_vaddr;        /* virtual address of mapping */
    size_t pr_size;            /* size of mapping in bytes */
    char pr_mapname[PRMAPSZ];  /* name in /proc/pid/object */
    offset_t pr_offset;        /* offset into mapped object, if any */
    int pr_mflags;             /* protection and attribute flags */
    int pr_pagesize;           /* pagesize for this mapping in bytes */
    int pr_shmid;              /* SysV shared memory identifier */
} prmap_t;
typedef struct prxmap {
    uintptr_t pr_vaddr;        /* virtual address of mapping */
    size_t pr_size;            /* size of mapping in bytes */
    char pr_mapname[PRMAPSZ];  /* name in /proc/pid/object */
    offset_t pr_offset;        /* offset into mapped object, if any */
    int pr_mflags;             /* protection and attribute flags */
    int pr_pagesize;           /* pagesize for this mapping in bytes */
    int pr_shmid;              /* SysV shared memory identifier */
    dev_t pr_dev;              /* device of mapped object, if any */
    uint64_t pr_ino;           /* inode of mapped object, if any */
    size_t pr_rss;             /* pages of resident memory */
    size_t pr_anon;            /* pages of resident anonymous memory */
    size_t pr_locked;          /* pages of locked memory */
    uint64_t pr_hatpagesize;   /* pagesize of mapping */
} prxmap_t;
pr_vaddr is the virtual address of the mapping within the traced process and pr_size is its size in bytes. pr_mapname, if it does not contain a null string, contains the name of a file in the object directory (see below) that can be opened read-only to obtain a file descriptor for the mapped file associated with the mapping. This enables a debugger to find object file symbol tables without having to know the real path names of the executable file and shared libraries of the process. pr_offset is the 64-bit offset within the mapped file (if any) to which the virtual address is mapped.
pr_mflags is a bit-mask of protection and attribute flags:
A contiguous area of the address space having the same underlying mapped object may appear as multiple mappings due to varying read, write, and execute attributes. The underlying mapped object does not change over the range of a single mapping. An I/O operation to a mapping marked MA_SHARED fails if applied at a virtual address not corresponding to a valid page in the underlying mapped object. A write to a MA_SHARED mapping that is not marked MA_WRITE fails. Reads and writes to private mappings always succeed. Reads and writes to unmapped addresses fail.
pr_pagesize is the page size for the mapping, currently always the system pagesize.
pr_shmid is the shared memory identifier, if any, for the mapping. Its value is -1 if the mapping is not System V shared memory. See shmget(2).
pr_dev is the device of the mapped object, if any, for the mapping. Its value is PRNODEV (-1) if the mapping does not have a device.
pr_ino is the inode of the mapped object, if any, for the mapping. Its contents are only valid if pr_dev is not PRNODEV.
pr_rss is the number of resident pages of memory for the mapping. The number of resident bytes for the mapping may be determined by multiplying pr_rss by the page size given by pr_pagesize.
pr_anon is the number of resident anonymous memory pages (pages which are private to this process) for the mapping.
pr_locked is the number of locked pages for the mapping. Pages which are locked are always resident in memory.
pr_hatpagesize is the size, in bytes, of the HAT (MMU) translation for the mapping. pr_hatpagesize may be different than pr_pagesize. The possible values are hardware architecture specific, and may change over a mapping's lifetime.
If an entry refers to a regular file, it can be opened with normal file system semantics but, to ensure that the controlling process cannot gain greater access than the controlled process, with no file access modes other than its read/write open modes in the controlled process. If an entry refers to a directory, it can be accessed with the same semantics as /proc/pid/cwd. An attempt to open any other type of entry fails with EACCES.
typedef struct prfdinfov2 {
    int pr_fd;                  /* file descriptor number */
    mode_t pr_mode;             /* (see st_mode in stat(2)) */
    uint64_t pr_ino;            /* inode number */
    uint64_t pr_size;           /* file size */
    int64_t pr_offset;          /* current offset of file descriptor */
    uid_t pr_uid;               /* owner's user id */
    gid_t pr_gid;               /* owner's group id */
    major_t pr_major;           /* major number of device containing file */
    minor_t pr_minor;           /* minor number of device containing file */
    major_t pr_rmajor;          /* major number (if special file) */
    minor_t pr_rminor;          /* minor number (if special file) */
    int pr_fileflags;           /* (see F_GETXFL in fcntl(2)) */
    int pr_fdflags;             /* (see F_GETFD in fcntl(2)) */
    short pr_locktype;          /* (see F_GETLK in fcntl(2)) */
    pid_t pr_lockpid;           /* process holding file lock (see F_GETLK) */
    int pr_locksysid;           /* sysid of locking process (see F_GETLK) */
    pid_t pr_peerpid;           /* peer process (socket, door) */
    int pr_filler[25];          /* reserved for future use */
    char pr_peername[PRFNSZ];   /* peer process name */
#if __STDC_VERSION__ >= 199901L
    char pr_misc[];             /* self describing structures */
#else
    char pr_misc[1];
#endif
} prfdinfov2_t;
The pr_misc element points to a list of additional miscellaneous data items, each of which has a header of type pr_misc_header_t specifying the size and type, and some data which immediately follow the header.
typedef struct pr_misc_header {
    uint_t pr_misc_size;
    uint_t pr_misc_type;
} pr_misc_header_t;
The pr_misc_size field is the sum of the sizes of the header and the associated data. The end of the list is indicated by a header with a zero size.
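Walking the pr_misc list follows directly from the size rule just described: advance by pr_misc_size until a zero-size header is found. The sketch below uses a local stand-in header type (demo_misc_header_t, hypothetical) with the same two fields, and adds bounds checks that the text does not require but that seem prudent for data read from a file.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Local stand-in mirroring pr_misc_header_t as shown above. */
typedef struct demo_misc_header {
    unsigned int pr_misc_size;  /* header plus data, in bytes */
    unsigned int pr_misc_type;
} demo_misc_header_t;

/*
 * Find the first item of the given type in a pr_misc list that
 * occupies len bytes at misc. On success, returns a pointer to the
 * item's data and stores the data length in *datalen. Returns NULL
 * if no such item exists or the list is malformed.
 */
const void *
misc_find(const void *misc, size_t len, unsigned int type, size_t *datalen)
{
    const unsigned char *p = misc;
    demo_misc_header_t hdr;

    assert(misc != NULL && datalen != NULL);
    while (len >= sizeof (hdr)) {
        memcpy(&hdr, p, sizeof (hdr));  /* avoid unaligned access */
        if (hdr.pr_misc_size == 0)
            break;                      /* end-of-list marker */
        if (hdr.pr_misc_size < sizeof (hdr) || hdr.pr_misc_size > len)
            break;                      /* malformed or truncated */
        if (hdr.pr_misc_type == type) {
            *datalen = hdr.pr_misc_size - sizeof (hdr);
            return (p + sizeof (hdr));
        }
        p += hdr.pr_misc_size;          /* size covers header and data */
        len -= hdr.pr_misc_size;
    }
    return (NULL);
}
```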
The following miscellaneous data types can be present:
getsockname() within the process.
getpeername() within the process.
getsockopt(SO_LINGER) within the process.
getsockopt(SO_SNDBUF) within the process.
getsockopt(SO_RCVBUF) within the process.
getsockopt(IPPROTO_IP, IP_NEXTHOP) within the process.
getsockopt(IPPROTO_IPV6, IPV6_NEXTHOP) within the process.
getsockopt(SO_TYPE) within the process.
getsockopt(IPPROTO_TCP, TCP_CONGESTION) within the process. This is a character array containing the name of the congestion algorithm in use for the socket.
The object directory makes it possible for a controlling process to gain access to the object file and any shared libraries (and consequently the symbol tables) without having to know the actual path names of the executable files.
A read(2) of the page data file descriptor returns structured page data and atomically clears the page data maintained for the file by the system. That is to say, each read returns data collected since the last read; the first read returns data collected since the file was opened. When the call completes, the read buffer contains the following structure as its header and thereafter contains a number of section header structures and associated byte arrays that must be accessed by walking linearly through the buffer.
typedef struct prpageheader {
    timestruc_t pr_tstamp;  /* real time stamp, time of read() */
    ulong_t pr_nmap;        /* number of address space mappings */
    ulong_t pr_npage;       /* total number of pages */
} prpageheader_t;
The header is followed by pr_nmap prasmap structures and associated data arrays. The prasmap structure contains the following elements:
typedef struct prasmap {
    uintptr_t pr_vaddr;        /* virtual address of mapping */
    ulong_t pr_npage;          /* number of pages in mapping */
    char pr_mapname[PRMAPSZ];  /* name in /proc/pid/object */
    offset_t pr_offset;        /* offset into mapped object, if any */
    int pr_mflags;             /* protection and attribute flags */
    int pr_pagesize;           /* pagesize for this mapping in bytes */
    int pr_shmid;              /* SysV shared memory identifier */
} prasmap_t;
Each section header is followed by pr_npage bytes, one byte for each page in the mapping, plus 0-7 null bytes at the end so that the next prasmap structure begins on an eight-byte aligned boundary. Each data byte may contain these flags:
If the read buffer is not large enough to contain all of the page data, the read fails with E2BIG and the page data is not cleared. The required size of the read buffer can be determined through fstat(2). Application of lseek(2) to the page data file descriptor is ineffective; every read starts from the beginning of the file. Closing the page data file descriptor terminates the system overhead associated with collecting the data.
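When walking the page data buffer, the per-page byte array that follows each prasmap structure is padded with 0-7 null bytes so that the next structure is eight-byte aligned. A small sketch of that rounding, under the assumption that each prasmap structure itself starts on an eight-byte boundary and has a size that is a multiple of eight (the function name is hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Number of bytes occupied by the per-page array of a mapping with
 * npage pages, including the trailing null padding.
 */
size_t
pagedata_array_len(size_t npage)
{
    assert(npage <= (size_t)-1 - 7);    /* guard against wraparound */
    return ((npage + 7) & ~(size_t)7);
}
```

The walker would then advance by sizeof (prasmap_t) plus this length to reach the next section header.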
More than one page data file descriptor for the same process can be opened, up to a system-imposed limit per traced process. A read of one does not affect the data being collected by the system for the others. An open of the page data file will fail with ENOMEM if the system-imposed limit would be exceeded.
typedef struct prusage {
    id_t pr_lwpid;            /* lwp id. 0: process or defunct */
    int pr_count;             /* number of contributing lwps */
    timestruc_t pr_tstamp;    /* real time stamp, time of read() */
    timestruc_t pr_create;    /* process/lwp creation time stamp */
    timestruc_t pr_term;      /* process/lwp termination time stamp */
    timestruc_t pr_rtime;     /* total lwp real (elapsed) time */
    timestruc_t pr_utime;     /* user level CPU time */
    timestruc_t pr_stime;     /* system call CPU time */
    timestruc_t pr_ttime;     /* other system trap CPU time */
    timestruc_t pr_tftime;    /* text page fault sleep time */
    timestruc_t pr_dftime;    /* data page fault sleep time */
    timestruc_t pr_kftime;    /* kernel page fault sleep time */
    timestruc_t pr_ltime;     /* user lock wait sleep time */
    timestruc_t pr_slptime;   /* all other sleep time */
    timestruc_t pr_wtime;     /* wait-cpu (latency) time */
    timestruc_t pr_stoptime;  /* stopped time */
    ulong_t pr_minf;          /* minor page faults */
    ulong_t pr_majf;          /* major page faults */
    ulong_t pr_nswap;         /* swaps */
    ulong_t pr_inblk;         /* input blocks */
    ulong_t pr_oublk;         /* output blocks */
    ulong_t pr_msnd;          /* messages sent */
    ulong_t pr_mrcv;          /* messages received */
    ulong_t pr_sigs;          /* signals received */
    ulong_t pr_vctx;          /* voluntary context switches */
    ulong_t pr_ictx;          /* involuntary context switches */
    ulong_t pr_sysc;          /* system calls */
    ulong_t pr_ioch;          /* chars read and written */
} prusage_t;
Microstate accounting is now continuously enabled. Previously, this information was an estimate if microstate accounting was not enabled; it is now never an estimate and represents the time the process has spent in various states.
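The per-state times can be combined by the reader. The sketch below adds the three CPU-time members (pr_utime, pr_stime, pr_ttime) into one value. It assumes timestruc_t has the tv_sec/tv_nsec layout of struct timespec, and it uses a local stand-in structure (struct cpu_times, a hypothetical name) rather than the real prusage_t so that it compiles on its own. Treating the sum of these three fields as "total CPU time" is an interpretation made for illustration.

```c
#include <assert.h>
#include <time.h>

/* Local stand-in for the three CPU-time members of prusage_t. */
struct cpu_times {
	struct timespec utime;	/* corresponds to pr_utime */
	struct timespec stime;	/* corresponds to pr_stime */
	struct timespec ttime;	/* corresponds to pr_ttime */
};

/* Add two normalized timespec values, carrying nanoseconds into seconds. */
static struct timespec
ts_add(struct timespec a, struct timespec b)
{
	struct timespec r;

	r.tv_sec = a.tv_sec + b.tv_sec;
	r.tv_nsec = a.tv_nsec + b.tv_nsec;
	if (r.tv_nsec >= 1000000000L) {
		r.tv_sec++;
		r.tv_nsec -= 1000000000L;
	}
	return (r);
}

/* Sum of user, system call, and other trap CPU time. */
static struct timespec
total_cpu(const struct cpu_times *ct)
{
	assert(ct != NULL);
	return (ts_add(ts_add(ct->utime, ct->stime), ct->ttime));
}
```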
    typedef struct prheader {
        long pr_nent;        /* number of entries */
        size_t pr_entsize;   /* size of each entry, in bytes */
    } prheader_t;
The lwpstatus structure may grow by the addition of elements at the end in future releases of the system. Programs must use pr_entsize in the file header to index through the array. These comments apply to all /proc files that include a prheader structure (lpsinfo and lusage, below).
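Indexing by pr_entsize rather than by sizeof can be sketched as follows. The prheader_t definition is a local mirror of the structure shown in this page so that the example is self-contained (real code would include <procfs.h>), and pr_entry() is a hypothetical helper name. The bounds checks are a design choice of this sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Local mirror of prheader_t; real code includes <procfs.h> instead. */
typedef struct prheader {
	long	pr_nent;	/* number of entries */
	size_t	pr_entsize;	/* size of each entry, in bytes */
} prheader_t;

/*
 * Return a pointer to entry i of the array that follows the header in buf,
 * or NULL if i is out of range or the entry would extend past buflen.
 * The stride is pr_entsize, never sizeof of a compiled-in structure.
 */
static const void *
pr_entry(const void *buf, size_t buflen, long i)
{
	const prheader_t *hdr = buf;
	size_t off;

	assert(buf != NULL);
	if (buflen < sizeof (*hdr) || i < 0 || i >= hdr->pr_nent)
		return (NULL);
	off = sizeof (*hdr) + (size_t)i * hdr->pr_entsize;
	if (off > buflen || hdr->pr_entsize > buflen - off)
		return (NULL);
	return ((const char *)buf + off);
}
```

Because the stride comes from the file header, a program built against an older, smaller entry structure keeps landing on the start of each entry if the structure grows.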
THREAD_NAME_MAX bytes representing the LWP
name; the buffer is zero-filled if the thread name is shorter than the buffer.
If no thread name is set, the buffer contains the empty string. A read with a
buffer shorter than THREAD_NAME_MAX bytes is not
guaranteed to be NUL-terminated. Writing to this file will set the LWP name
for the specific lwp. This file may not be present in older operating system
versions. THREAD_NAME_MAX may increase in the future;
clients should be prepared for this.
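Since a short read is not guaranteed to be NUL-terminated, a client can terminate the name itself. The following is a minimal sketch; lwpname_copy() is a hypothetical helper name, and the caller is assumed to have already read n bytes from the file into src.

```c
#include <assert.h>
#include <string.h>

/*
 * Copy up to n bytes read from an lwp name file into dst (capacity dstlen)
 * and always NUL-terminate the result. Returns the length of the resulting
 * string, which is shorter than n when the source was zero-filled.
 */
static size_t
lwpname_copy(char *dst, size_t dstlen, const char *src, size_t n)
{
	assert(dst != NULL && src != NULL && dstlen > 0);
	if (n > dstlen - 1)
		n = dstlen - 1;		/* leave room for the terminator */
	memcpy(dst, src, n);
	dst[n] = '\0';
	return (strlen(dst));
}
```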
<sys/regset.h>, with the
values of those SPARC register windows that could not be stored on the stack
when the lwp stopped. Conditions under which register windows are not stored
on the stack are: the stack pointer refers to nonexistent process memory or
the stack pointer is improperly aligned. If the lwp is not stopped or if there
are no register windows that could not be stored on the stack, the file is
empty (the usual case).
<procfs.h>, with the values of
the lwp's extra state registers. If the lwp is not stopped, all register
values are undefined. See also the PCSXREG
control operation, below.
<sys/regset.h>, containing the
values of the lwp's platform-dependent ancillary state registers. If the lwp
is not stopped, all register values are undefined. See also the
PCSASRS control operation, below.
Multiple control messages may be combined in a single write(2) (or writev(2)) to a control file, but no partial writes are permitted. That is, each control message, operation code plus operand, if any, must be presented in its entirety to the write(2) and not in pieces over several system calls. If a control operation fails, no subsequent operations contained in the same write(2) are attempted.
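As a minimal sketch of combining two control messages for a single write(2), the helper below packs a PCDSTOP message (no operand) followed by a PCTWSTOP message (one long operand, the timeout in milliseconds). It assumes each operation code is written as a long followed directly by its operand. The X_ constants are placeholders so that the example compiles on its own; real code must use PCDSTOP and PCTWSTOP from <procfs.h>. pack_stop_and_wait() is a hypothetical name.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Placeholder operation codes for illustration only; real code must take
 * PCDSTOP and PCTWSTOP from <procfs.h>.
 */
#define	X_PCDSTOP	2L
#define	X_PCTWSTOP	4L

/*
 * Fill msg with "direct to stop" followed by "wait up to timeout_ms for the
 * stop". Returns the number of bytes to hand to one write(2) on a control
 * file; the whole buffer must go out in that single call.
 */
static size_t
pack_stop_and_wait(long msg[3], long timeout_ms)
{
	assert(msg != NULL);
	msg[0] = X_PCDSTOP;	/* operation code, no operand */
	msg[1] = X_PCTWSTOP;	/* operation code ... */
	msg[2] = timeout_ms;	/* ... and its long operand */
	return (3 * sizeof (long));
}
```

The caller would then issue one write(2) of the returned length on the open control file descriptor. If the first operation fails, the second is not attempted.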
Descriptions of the allowable control messages follow. In all
cases, writing a message to a control file for a process or lwp that has
terminated elicits the error ENOENT.
PCTWSTOP is identical to PCWSTOP except that it enables the operation to time out, to avoid waiting forever for a process or lwp that may never stop on an event of interest. PCTWSTOP takes a long operand specifying a number of milliseconds; the wait will terminate successfully after the specified number of milliseconds even if the process or lwp has not stopped; a timeout value of zero makes the operation identical to PCWSTOP.
An “event of interest” is either a
PR_REQUESTED stop or a stop that has been specified in the
process's tracing flags (set by PCSTRACE,
PCSFAULT, PCSENTRY, and
PCSEXIT). PR_JOBCONTROL
and PR_SUSPENDED stops are specifically not events of
interest. (An lwp may stop twice due to a stop signal, first showing
PR_SIGNALLED if the signal is traced and again showing
PR_JOBCONTROL if the lwp is set running without clearing
the signal.) If PCSTOP or PCDSTOP is
applied to an lwp that is stopped, but not on an event of interest, the stop
directive takes effect when the lwp is restarted by the competing mechanism.
At that time, the lwp enters a PR_REQUESTED stop before
executing any user-level code.
A write of a control message that blocks is interruptible by a signal so that, for example, an alarm(2) can be set to avoid waiting forever for a process or lwp that may never stop on an event of interest. If PCSTOP is interrupted, the lwp stop directives remain in effect even though the write(2) returns an error. (Use of PCTWSTOP with a non-zero timeout is recommended over PCWSTOP with an alarm(2).)
A system process (indicated by the PR_ISSYS
flag) never executes at user level, has no user-level address space visible
through /proc, and cannot be stopped. Applying one
of these operations to a system process or any of its lwps elicits the error
EBUSY.
When applied to an lwp control file, PCRUN
clears any outstanding directed-stop request and makes the specific lwp
runnable. The operation fails with EBUSY if the
specific lwp is not stopped on an event of interest or has not been directed
to stop or if the agent lwp exists and this is not the agent lwp (see
PCAGENT).
When applied to the process control file, a representative lwp is
chosen for the operation as described for /proc/pid/status.
The operation fails with EBUSY if the representative
lwp is not stopped on an event of interest or has not been directed to stop
or if the agent lwp exists. If PRSTEP or
PRSTOP was requested, the representative lwp is made
runnable and its outstanding directed-stop request is cleared; otherwise all
outstanding directed-stop requests are cleared and, if it was stopped on an
event of interest, the representative lwp is marked
PR_REQUESTED. If, as a consequence, all lwps are in the
PR_REQUESTED or PR_SUSPENDED stop state,
all lwps showing PR_REQUESTED are made runnable.
If a signal that is included in an lwp's held signal set (the signal mask) is sent to the lwp, the signal is not received and does not cause a stop until it is removed from the held signal set, either by the lwp itself or by setting the held signal set with PCSHOLD.
<sys/siginfo.h>). If the
specified signal number is zero, the current signal is cleared. The semantics
of this operation are different from those of kill(2) in
that the signal is delivered to the lwp immediately after execution is resumed
(even if it is being blocked) and an additional PR_SIGNALLED
stop does not intervene even if the signal is traced. Setting the current
signal to SIGKILL terminates the process immediately.
EINVAL) to attempt to delete SIGKILL.
<sys/fault.h> and include the
following. Some of these may not occur on all processors; there may be
processor-specific faults in addition to these.
When not traced, a fault normally results in the posting of a signal to the lwp that incurred the fault. If an lwp stops on a fault, the signal is posted to the lwp when execution is resumed unless the fault is cleared by PCCFAULT or by the PRCFAULT option of PCRUN. FLTPAGE is an exception; no signal is posted. The pr_info field in the lwpstatus structure identifies the signal to be sent and contains machine-specific information about the fault.
When entry to a system call is being traced, an lwp stops after having begun the call to the system but before the system call arguments have been fetched from the lwp. When exit from a system call is being traced, an lwp stops on completion of the system call just prior to checking for signals and returning to user level. At this point, all return values have been stored into the lwp's registers.
If an lwp is stopped on entry to a system call
(PR_SYSENTRY) or when sleeping in an interruptible system
call (PR_ASLEEP is set), it may be instructed to go
directly to system call exit by specifying the PRSABORT
flag in a PCRUN control message. Unless exit from the
system call is being traced, the lwp returns to user level showing
EINTR.
    typedef struct prwatch {
        uintptr_t pr_vaddr;  /* virtual address of watched area */
        size_t pr_size;      /* size of watched area in bytes */
        int pr_wflags;       /* watch type flags */
    } prwatch_t;
pr_vaddr specifies the virtual address of an area of memory to be watched in the controlled process. pr_size specifies the size of the area, in bytes. pr_wflags specifies the type of memory access to be monitored as a bit-mask of the following flags:
If pr_wflags is non-empty, a watched area is established for the virtual address range specified by pr_vaddr and pr_size. If pr_wflags is empty, any previously-established watched area starting at the specified virtual address is cleared; pr_size is ignored.
A watchpoint is triggered when an lwp in the traced process makes a memory reference that covers at least one byte of a watched area and the memory reference is as specified in pr_wflags. When an lwp triggers a watchpoint, it incurs a watchpoint trap. If FLTWATCH is being traced, the lwp stops; otherwise, it is sent a SIGTRAP signal; if SIGTRAP is being traced and is not blocked, the lwp stops.
The watchpoint trap occurs before the instruction completes unless WA_TRAPAFTER was specified, in which case it occurs after the instruction completes. If it occurs before completion, the memory is not modified. If it occurs after completion, the memory is modified (if the access is a write access).
Physical i/o is an exception for watchpoint traps. In this instance, there is no guarantee that memory before the watched area has already been modified (or in the case of WA_TRAPAFTER, that the memory following the watched area has not been modified) when the watchpoint trap occurs and the lwp stops.
pr_info in the lwpstatus structure contains information pertinent to the watchpoint trap. In particular, the si_addr field contains the virtual address of the memory reference that triggered the watchpoint, and the si_code field contains one of TRAP_RWATCH, TRAP_WWATCH, or TRAP_XWATCH, indicating read, write, or execute access, respectively. The si_trapafter field is zero unless WA_TRAPAFTER is in effect for this watched area; non-zero indicates that the current instruction is not the instruction that incurred the watchpoint trap. The si_pc field contains the virtual address of the instruction that incurred the trap.
A watchpoint trap may be triggered while executing a system call
that makes reference to the traced process's memory. The lwp that is
executing the system call incurs the watchpoint trap while still in the
system call. If it stops as a result, the lwpstatus
structure contains the system call number and its arguments. If the lwp does
not stop, or if it is set running again without clearing the signal or
fault, the system call fails with EFAULT. If
WA_TRAPAFTER was specified, the memory reference will have
completed and the memory will have been modified (if the access was a write
access) when the watchpoint trap occurs.
If more than one of WA_READ, WA_WRITE, and WA_EXEC is specified for a watched area, and a single instruction incurs more than one of the specified types, only one is reported when the watchpoint trap occurs. The precedence is WA_EXEC, WA_READ, WA_WRITE (WA_EXEC and WA_READ take precedence over WA_WRITE), unless WA_TRAPAFTER was specified, in which case it is WA_WRITE, WA_READ, WA_EXEC (WA_WRITE takes precedence).
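The precedence rule can be written out as a small function. This is only an illustration of the rule as stated: the X_WA_ values are placeholders so that the example compiles on its own (real code takes the WA_ flags from <procfs.h>), and reported_access() is a hypothetical name.

```c
#include <assert.h>

/* Placeholder flag values for illustration; use the WA_* flags in practice. */
#define	X_WA_EXEC	0x01
#define	X_WA_WRITE	0x02
#define	X_WA_READ	0x04

/*
 * Given the set of access types one instruction incurred on a watched area,
 * return the single type that is reported. Without trap-after the order is
 * exec, read, write; with trap-after it is write, read, exec.
 */
static int
reported_access(int incurred, int trapafter)
{
	static const int before[] = { X_WA_EXEC, X_WA_READ, X_WA_WRITE };
	static const int after[] = { X_WA_WRITE, X_WA_READ, X_WA_EXEC };
	const int *order = trapafter ? after : before;
	int i;

	assert((incurred & ~(X_WA_EXEC | X_WA_WRITE | X_WA_READ)) == 0);
	for (i = 0; i < 3; i++) {
		if (incurred & order[i])
			return (order[i]);
	}
	return (0);	/* nothing incurred, nothing reported */
}
```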
PCWATCH fails with EINVAL if an attempt is made to specify overlapping
watched areas or if pr_wflags contains flags other than
those specified above. It fails with ENOMEM if an
attempt is made to establish more watched areas than the system can support
(the system can support thousands).
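A controlling process that tracks its own watched areas can check for overlap before writing PCWATCH, rather than relying on the EINVAL failure. The sketch below uses a local mirror of prwatch_t so that it is self-contained; watch_overlaps() is a hypothetical helper, and treating a zero-length area as overlapping nothing is an assumption of this sketch, not documented behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Local mirror of prwatch_t; real code includes <procfs.h> instead. */
typedef struct prwatch {
	uintptr_t pr_vaddr;	/* virtual address of watched area */
	size_t pr_size;		/* size of watched area in bytes */
	int pr_wflags;		/* watch type flags */
} prwatch_t;

/*
 * Return 1 if the byte ranges [pr_vaddr, pr_vaddr + pr_size) of a and b
 * share at least one byte, 0 otherwise. The comparison is done on the
 * distance between start addresses to avoid overflow at the top of the
 * address space.
 */
static int
watch_overlaps(const prwatch_t *a, const prwatch_t *b)
{
	assert(a != NULL && b != NULL);
	if (a->pr_size == 0 || b->pr_size == 0)
		return (0);	/* assumption: empty areas overlap nothing */
	if (a->pr_vaddr <= b->pr_vaddr)
		return (b->pr_vaddr - a->pr_vaddr < a->pr_size);
	return (a->pr_vaddr - b->pr_vaddr < b->pr_size);
}
```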
The child of a vfork(2) borrows the parent's
address space. When a vfork(2) is executed by a traced
process, all watched areas established for the parent are suspended until
the child terminates or performs an exec(2). Any watched
areas established independently in the child are cancelled when the parent
resumes after the child's termination or exec(2).
PCWATCH fails with EBUSY if
applied to the parent of a vfork(2) before the child has
terminated or performed an exec(2). The
PR_VFORKP flag is set in the pstatus
structure for such a parent process.
Certain accesses of the traced process's address space by the operating system are immune to watchpoints. The initial construction of a signal stack frame when a signal is delivered to an lwp will not trigger a watchpoint trap even if the new frame covers watched areas of the stack. Once the signal handler is entered, watchpoint traps occur normally. On SPARC based machines, register window overflow and underflow will not trigger watchpoint traps, even if the register window save areas cover watched areas of the stack.
Watched areas are not inherited by child processes, even if the traced process's inherit-on-fork mode, PR_FORK, is set (see PCSET, below). All watched areas are cancelled when the traced process performs a successful exec(2).
It is an error (EINVAL) to specify flags
other than those described above or to apply these operations to a system
process. The current modes are reported in the pr_flags
field of /proc/pid/status and
/proc/pid/lwp/lwp/lwpstatus.
On SPARC based systems, only the condition-code bits of the processor-status register (R_PSR) of SPARC V8 (32-bit) processes can be modified by PCSREG. Other privileged registers cannot be modified at all.
On x86-based systems, only certain bits of the flags register (EFL) can be modified by PCSREG: these include the condition codes, direction-bit, and overflow-bit.
PCSREG fails with EBUSY if the lwp is not stopped on an event of interest.
EBUSY if the lwp
is not stopped on an event of interest.
EINVAL) is returned if the system does not
support floating-point operations (no floating-point hardware and the system
does not emulate floating-point machine instructions).
PCSFPREG fails with EBUSY if the lwp
is not stopped on an event of interest.
EINVAL) is returned if the system
does not support extra state registers. PCSXREG fails with
EBUSY if the lwp is not stopped on an event of
interest.
EINVAL) is returned if either the target process or
the controlling process is not a 64-bit SPARC V9 process. Most of the
ancillary state registers are privileged registers that cannot be modified.
Only those that can be modified are set; all others are silently ignored.
PCSASRS fails with EBUSY if the lwp
is not stopped on an event of interest.
The PCAGENT operation fails with
EBUSY unless the process is fully stopped via
/proc, that is, unless all of the lwps in the
process are stopped either on events of interest or on
PR_SUSPENDED, or are stopped on
PR_JOBCONTROL and have been directed to stop via
PCDSTOP. It fails with EBUSY if an
agent lwp already exists. It fails with ENOMEM if
system resources for creating new lwps have been exhausted.
Any PCRUN operation applied to the process
control file or to the control file of an lwp other than the agent lwp fails
with EBUSY as long as the agent lwp exists. The
agent lwp must be caused to terminate by executing the
SYS_lwp_exit system call trap before the process can be
restarted.
Once the agent lwp is created, its lwp-ID can be found by reading the process status file. To facilitate opening the agent lwp's control and status files, the directory name /proc/pid/lwp/agent is accepted for lookup operations as an invisible alias for /proc/pid/lwp/lwpid, lwpid being the lwp-ID of the agent lwp (invisible in the sense that the name “agent” does not appear in a directory listing of /proc/pid/lwp obtained from ls(1), getdents(2), or readdir(3C)).
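Using the alias, the path of the agent lwp's control file can be formed without first looking up its lwp-ID. A minimal sketch, assuming the standard /proc mount point; agent_ctl_path() is a hypothetical helper name.

```c
#include <assert.h>
#include <stdio.h>
#include <sys/types.h>

/*
 * Format the path of the agent lwp's lwpctl file for process pid into buf.
 * Returns the path length on success, or -1 if buf is too small.
 */
static int
agent_ctl_path(char *buf, size_t buflen, pid_t pid)
{
	int n;

	assert(buf != NULL && buflen > 0);
	n = snprintf(buf, buflen, "/proc/%ld/lwp/agent/lwpctl", (long)pid);
	if (n < 0 || (size_t)n >= buflen)
		return (-1);	/* output error or truncated */
	return (n);
}
```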
The purpose of the agent lwp is to perform operations in the controlled process on behalf of the controlling process: to gather information not directly available via /proc files, or in general to make the process change state in ways not directly available via /proc control operations. To make use of an agent lwp, the controlling process must be capable of making it execute system calls (specifically, the SYS_lwp_exit system call trap). The register values given to the agent lwp on creation are typically the registers of the representative lwp, so that the agent lwp can use its stack.
If the controlling process neglects to force the agent lwp to execute the SYS_lwp_exit system call (due to either logic error or fatal failure on the part of the controlling process), the agent lwp will remain in the target process. For purposes of being able to debug these otherwise rogue agents, information as to the creator of the agent lwp is reflected in that lwp's spymaster file in /proc. Should the target process generate a core dump with the agent lwp in place, this information will be available via the NT_SPYMASTER note in the core file (see core(4)).
The agent lwp is not allowed to execute any variation of the
SYS_fork or SYS_exec system call traps.
Attempts to do so yield ENOTSUP to the agent lwp.
Symbolic constants for system call trap numbers like
SYS_lwp_exit and SYS_lwp_create can be
found in the header file <sys/syscall.h>.
    typedef struct priovec {
        void *pio_base;      /* buffer in controlling process */
        size_t pio_len;      /* size of read/write request in bytes */
        off_t pio_offset;    /* virtual address in target process */
    } priovec_t;
These operations have the same effect as
pread(2) and pwrite(2), respectively, of
the target process's address space file. The difference is that more than
one PCREAD or PCWRITE control operation
can be written to the control file at once, and they can be interspersed
with other control operations in a single write to the control file. This is
useful, for example, when planting many breakpoint instructions in the
process's address space, or when stepping over a breakpointed instruction.
Unlike pread(2) and pwrite(2), no
provision is made for partial reads or writes; if the operation cannot be
performed completely, it fails with EIO.
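Batching several PCWRITE operations, as when planting many breakpoints, might look like the sketch below. It packs one operation code plus one priovec_t per request into a byte buffer intended for a single write(2) on the control file. The priovec_t definition is a local mirror, X_PCWRITE is a placeholder value (real code uses PCWRITE from <procfs.h>), pack_pcwrite() is a hypothetical name, and the layout assumes the operand follows its long operation code directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

/* Local mirror of priovec_t; real code includes <procfs.h> instead. */
typedef struct priovec {
	void *pio_base;		/* buffer in controlling process */
	size_t pio_len;		/* size of read/write request in bytes */
	off_t pio_offset;	/* virtual address in target process */
} priovec_t;

/* Placeholder operation code for illustration; use PCWRITE in practice. */
#define	X_PCWRITE	26L

/*
 * Pack n PCWRITE messages into out. Returns the number of bytes to write
 * in one write(2), or 0 if out is too small to hold all n messages.
 */
static size_t
pack_pcwrite(unsigned char *out, size_t outlen, const priovec_t *iov,
    size_t n)
{
	const long op = X_PCWRITE;
	size_t need = n * (sizeof (op) + sizeof (*iov));
	size_t off = 0, i;

	assert(out != NULL && iov != NULL);
	if (need > outlen)
		return (0);
	for (i = 0; i < n; i++) {
		memcpy(out + off, &op, sizeof (op));	/* operation code */
		off += sizeof (op);
		memcpy(out + off, &iov[i], sizeof (iov[i]));	/* operand */
		off += sizeof (iov[i]);
	}
	return (off);
}
```

Because no partial writes are performed, a failure of any one request in the batch surfaces as an error on the write(2), and later requests in the same buffer are not attempted.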
EPERM.
The limit set of the target process cannot be grown. The other privilege sets must be subsets of the intersection of the effective set of the calling process with the new limit set of the target process or subsets of the original values of the sets in the target process.
If any of the above restrictions are not met,
EPERM is returned. If the structure written is
improperly formatted, EINVAL is returned.
A process that is missing the basic privilege {PRIV_PROC_INFO} cannot see any processes under /proc that it cannot send a signal to.
A process that has {PRIV_PROC_OWNER} asserted in its effective set can open any file for reading. To manipulate or control a process, the controlling process must have at least as many privileges in its effective set as the target process has in its effective, inheritable, and permitted sets. The limit set of the controlling process must be a superset of the limit set of the target process. Additional restrictions apply if any of the uids of the target process are 0. See privileges(5).
Even if held by a privileged process, an open process or lwp file
descriptor (other than file descriptors for the world-readable files)
becomes invalid if the traced process performs an exec(2)
of a setuid/setgid object file or an object file that the traced process
cannot read. Any operation performed on an invalid file descriptor, except
close(2), fails with EAGAIN. In
this situation, if any tracing flags are set and the process or any lwp file
descriptor is open for writing, the process will have been directed to stop
and its run-on-last-close flag will have been set (see
PCSET). This enables a controlling process
(if it has permission) to reopen the /proc files to
get new valid file descriptors, close the invalid file descriptors, unset
the run-on-last-close flag (if desired), and proceed. Just closing the
invalid file descriptors causes the traced process to resume execution with
all tracing flags cleared. Any process not currently open for writing via
/proc, but that has left-over tracing flags from a
previous open, and that executes a setuid/setgid or unreadable object file,
will not be stopped but will have all its tracing flags cleared.
To wait for one or more of a set of processes or lwps to stop or terminate, /proc file descriptors (other than those obtained by opening the cwd or root directories or by opening files in the fd or object directories) can be used in a poll(2) system call. When requested and returned, either of the polling events POLLPRI or POLLWRNORM indicates that the process or lwp stopped on an event of interest. Although they cannot be requested, the polling events POLLHUP, POLLERR, and POLLNVAL may be returned. POLLHUP indicates that the process or lwp has terminated. POLLERR indicates that the file descriptor has become invalid. POLLNVAL is returned immediately if POLLPRI or POLLWRNORM is requested on a file descriptor referring to a system process (see PCSTOP). The requested events may be empty to wait simply for termination.
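The returned events can be mapped to the conditions described here with a small classifier. This is a sketch: the enum and classify_revents() are hypothetical names, and the order in which the bits are tested is a choice made for illustration. The POLLWRNORM fallback only matters on systems whose <poll.h> hides that name.

```c
#include <assert.h>
#include <poll.h>

#ifndef POLLWRNORM
#define	POLLWRNORM	POLLOUT	/* fallback if the header hides POLLWRNORM */
#endif

/* Hypothetical classification of poll(2) results on a /proc descriptor. */
enum proc_event {
	PE_NONE,		/* nothing of interest reported */
	PE_STOPPED,		/* stopped on an event of interest */
	PE_TERMINATED,		/* process or lwp has terminated */
	PE_INVALID_FD,		/* file descriptor has become invalid */
	PE_SYSTEM_PROC		/* descriptor refers to a system process */
};

/* Map the revents field of a struct pollfd to one of the cases above. */
static enum proc_event
classify_revents(short revents)
{
	if (revents & POLLNVAL)
		return (PE_SYSTEM_PROC);
	if (revents & POLLERR)
		return (PE_INVALID_FD);
	if (revents & POLLHUP)
		return (PE_TERMINATED);
	if (revents & (POLLPRI | POLLWRNORM))
		return (PE_STOPPED);
	return (PE_NONE);
}
```

A caller would set events to POLLPRI (or leave it empty to wait only for termination), call poll(2), and pass each returned revents value to the classifier.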
E2BIG
EACCES
EAGAIN
EBUSY
EINVAL
EINTR
EIO
ENOENT
ENOMEM
ENOSYS
EOVERFLOW
EPERM
An attempt was made to control a process of which the E, P, and I privilege sets were not a subset of the effective set of the controlling process, or the limit set of the controlling process was not a superset of the limit set of the controlled process.
Any of the uids of the target process are 0 or an attempt was made to change any of the uids to 0 using PCSCRED and the security policy imposed additional restrictions. See privileges(5).
<procfs.h>.
On SPARC based machines, the types gregset_t and
fpregset_t defined in <sys/regset.h> are similar
to but not the same as the types prgregset_t and
prfpregset_t defined in <procfs.h>.
December 3, 2019 | illumos |