11554 Want TCP_CONGESTION socket option
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
--- old/usr/src/man/man7p/tcp.7p
+++ new/usr/src/man/man7p/tcp.7p
1 1 '\"
2 2 .\" This file and its contents are supplied under the terms of the
3 3 .\" Common Development and Distribution License ("CDDL"), version 1.0.
4 4 .\" You may only use this file in accordance with the terms of version
5 5 .\" 1.0 of the CDDL.
6 6 .\"
7 7 .\" A full copy of the text of the CDDL should have accompanied this
8 8 .\" source. A copy of the CDDL is also available via the Internet at
9 9 .\" http://www.illumos.org/license/CDDL.
10 10 .\"
11 11 .\"
12 12 .\" Copyright (c) 2006, Sun Microsystems, Inc. All Rights Reserved.
13 13 .\" Copyright (c) 2011 Nexenta Systems, Inc. All rights reserved.
14 14 .\" Copyright 2019 Joyent, Inc.
15 15 .\" Copyright 1989 AT&T
16 16 .\"
17 17 .Dd "Jan 07, 2019"
18 18 .Dt TCP 7P
19 19 .Os
20 20 .Sh NAME
21 21 .Nm tcp ,
22 22 .Nm TCP
23 23 .Nd Internet Transmission Control Protocol
24 24 .Sh SYNOPSIS
25 25 .In sys/socket.h
26 26 .In netinet/in.h
27 27 .In netinet/tcp.h
28 28 .Bd -literal
29 29 s = socket(AF_INET, SOCK_STREAM, 0);
30 30 s = socket(AF_INET6, SOCK_STREAM, 0);
31 31 t = t_open("/dev/tcp", O_RDWR);
32 32 t = t_open("/dev/tcp6", O_RDWR);
33 33 .Ed
34 34 .Sh DESCRIPTION
35 35 TCP is the virtual circuit protocol of the Internet protocol family.
36 36 It provides reliable, flow-controlled, in-order, two-way transmission of data.
37 37 It is a byte-stream protocol layered above the Internet Protocol
38 38 .Po Sy IP Pc ,
39 39 or the Internet Protocol Version 6
40 40 .Po Sy IPv6 Pc ,
41 41 the Internet protocol family's
42 42 internetwork datagram delivery protocol.
43 43 .Pp
44 44 Programs can access TCP using the socket interface as a
45 45 .Dv SOCK_STREAM
46 46 socket type, or using the Transport Level Interface
47 47 .Po Sy TLI Pc
48 48 where it supports the connection-oriented
49 49 .Po Dv T_COTS_ORD Pc
50 50 service type.
51 51 .Pp
52 52 A checksum over all data helps TCP provide reliable communication.
53 53 Using a window-based flow control mechanism that makes use of positive
54 54 acknowledgements, sequence numbers, and a retransmission strategy, TCP can
55 55 usually recover when datagrams are damaged, delayed, duplicated or delivered
56 56 out of order by the underlying medium.
57 57 .Pp
58 58 TCP provides several socket options, defined in
59 59 .In netinet/tcp.h
60 60 and described throughout this document,
61 61 which may be set using
62 62 .Xr setsockopt 3SOCKET
63 63 and read using
64 64 .Xr getsockopt 3SOCKET .
65 65 The
66 66 .Fa level
67 67 argument for these calls is the protocol number for TCP, available from
68 68 .Xr getprotobyname 3SOCKET .
69 69 IP level options may also be used with TCP.
70 70 See
71 71 .Xr ip 7P
72 72 and
73 73 .Xr ip6 7P .
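The calls described above can be sketched as follows. This is a minimal illustration, not part of the manual page: the helper names tcp_level() and tcp_set_bool() are hypothetical, and the fallback to IPPROTO_TCP when the protocols database cannot be read is an assumption made for robustness.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <netdb.h>
#include <unistd.h>

/*
 * Look up the protocol number for TCP with getprotobyname().
 * Assumption: fall back to IPPROTO_TCP if the lookup fails.
 */
static int
tcp_level(void)
{
	struct protoent *pe = getprotobyname("tcp");

	return (pe != NULL ? pe->p_proto : IPPROTO_TCP);
}

/*
 * Hypothetical helper: set a boolean TCP-level option such as
 * TCP_NODELAY, then read it back with getsockopt().
 * Returns 1 or 0 for the value now in effect, or -1 on error.
 */
static int
tcp_set_bool(int fd, int opt, int on)
{
	int val = on;
	socklen_t len = sizeof (val);

	if (setsockopt(fd, tcp_level(), opt, &val, sizeof (val)) == -1)
		return (-1);
	if (getsockopt(fd, tcp_level(), opt, &val, &len) == -1)
		return (-1);

	return (val != 0);
}
```

A caller would typically write tcp_set_bool(fd, TCP_NODELAY, 1) on a freshly created socket and treat a return of -1 as a reason to inspect errno.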
74 74 .Ss "Listening And Connecting"
75 75 TCP uses IP's host-level addressing and adds its own per-host
76 76 collection of
77 77 .Dq port addresses .
78 78 The endpoints of a TCP connection are
79 79 identified by the combination of an IPv4 or IPv6 address and a TCP
80 80 port number.
81 81 Although other protocols, such as the User Datagram Protocol
82 82 .Po Sy UDP Pc ,
83 83 may use the same host and port address format, the port space of these
84 84 protocols is distinct.
85 85 See
86 86 .Xr inet 7P
87 87 and
88 88 .Xr inet6 7P
89 89 for details on
90 90 the common aspects of addressing in the Internet protocol family.
91 91 .Pp
92 92 Sockets utilizing TCP are either
93 93 .Dq active
94 94 or
95 95 .Dq passive .
96 96 Active sockets
97 97 initiate connections to passive sockets.
98 98 Passive sockets must have their local IPv4 or IPv6 address and TCP port number
99 99 bound with the
100 100 .Xr bind 3SOCKET
101 101 system call after the socket is created.
102 102 If an active socket has not been bound by the time
103 103 .Xr connect 3SOCKET
104 104 is called, then the operating system will choose a local address and port for
105 105 the application.
106 106 By default, TCP sockets are active.
107 107 A passive socket is created by calling the
108 108 .Xr listen 3SOCKET
109 109 system call after binding, which establishes a queueing parameter for the
110 110 passive socket.
111 111 Connections to the passive socket can then be received using the
112 112 .Xr accept 3SOCKET
113 113 system call.
114 114 Active sockets use the
115 115 .Xr connect 3SOCKET
116 116 call, optionally after binding, to initiate connections.
117 117 .Pp
118 118 If incoming connection requests include an IP source route option, then the
119 119 reverse source route will be used when responding.
120 120 .Pp
121 121 By using the special value
122 122 .Dv INADDR_ANY
123 123 with IPv4, or the unspecified
124 124 address (all zeroes) with IPv6, the local IP address can be left
125 125 unspecified in the
126 126 .Fn bind
127 127 call by either active or passive TCP
128 128 sockets.
129 129 This feature is usually used if the local address is either unknown or
130 130 irrelevant.
131 131 If left unspecified, the local IP address will be bound at connection time to
132 132 the address of the network interface used to service the connection.
133 133 For passive sockets, this is the destination address used by the connecting
134 134 peer.
135 135 For active sockets, this is usually an address on the same subnet as the
136 136 destination or default gateway address, although the rules can be more complex.
137 137 See
138 138 .Sy "Source Address Selection"
139 139 in
140 140 .Xr inet6 7P
141 141 for a detailed discussion of how this works in IPv6.
142 142 .Pp
143 143 Note that no two TCP sockets can be bound to the same port unless the bound IP
144 144 addresses are different.
145 145 IPv4
146 146 .Dv INADDR_ANY
147 147 and IPv6 unspecified addresses compare as equal to any IPv4 or IPv6 address.
148 148 For example, if a socket is bound to
149 149 .Dv INADDR_ANY
150 150 or the unspecified address and port
151 151 .Em N ,
152 152 no other socket can bind to port
153 153 .Em N ,
154 154 regardless of the binding address.
155 155 This special consideration of
156 156 .Dv INADDR_ANY
157 157 and the unspecified address can be changed using the socket option
158 158 .Dv SO_REUSEADDR .
159 159 If
160 160 .Dv SO_REUSEADDR
161 161 is set on a socket doing a bind, IPv4
162 162 .Dv INADDR_ANY
163 163 and the IPv6 unspecified address do not compare as equal to any IP address.
164 164 This means that as long as the two sockets are not both bound to
165 165 .Dv INADDR_ANY ,
166 166 the unspecified address, or the same IP address, then the two sockets can be
167 167 bound to the same port.
168 168 .Pp
169 169 If an application does not want to allow another socket using the
170 170 .Dv SO_REUSEADDR
171 171 option to bind to a port its socket is bound to, the
172 172 application can set the socket-level
173 173 .Po Dv SOL_SOCKET Pc
174 174 option
175 175 .Dv SO_EXCLBIND
176 176 on a socket.
177 177 The
178 178 option values of 0 and 1 mean disabling and enabling the option respectively.
179 179 Once this option is enabled on a socket, no other socket can be bound to the
180 180 same port.
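The binding rules above can be observed with a short sketch. The helper bind_loopback() is hypothetical; it binds to the IPv4 loopback address only, and it requests SO_EXCLBIND solely where the header defines it, since that option is not available on every platform.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: create a TCP socket bound to 127.0.0.1:port.
 * A port of 0 lets the system choose one. On success the bound port
 * is stored in *bound and the descriptor is returned. On failure -1
 * is returned and errno holds the error from the failing call.
 */
static int
bind_loopback(in_port_t port, in_port_t *bound)
{
	struct sockaddr_in sin;
	socklen_t len = sizeof (sin);
	int fd, err;

	if ((fd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return (-1);

#ifdef SO_EXCLBIND
	{
		/* Refuse to share the port, even with SO_REUSEADDR users. */
		int on = 1;
		(void) setsockopt(fd, SOL_SOCKET, SO_EXCLBIND, &on,
		    sizeof (on));
	}
#endif

	memset(&sin, 0, sizeof (sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = htons(port);

	if (bind(fd, (struct sockaddr *)&sin, sizeof (sin)) == -1 ||
	    getsockname(fd, (struct sockaddr *)&sin, &len) == -1) {
		err = errno;
		(void) close(fd);
		errno = err;
		return (-1);
	}

	*bound = ntohs(sin.sin_port);
	return (fd);
}
```

Calling bind_loopback() a second time with the port returned by the first call is expected to fail with EADDRINUSE, because both sockets name the same address and port.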
181 181 .Ss "Sending And Receiving Data"
182 182 Once a connection has been established, data can be exchanged using the
183 183 .Xr read 2
184 184 and
185 185 .Xr write 2
186 186 system calls.
187 187 If, after sending data, the local TCP receives no acknowledgements from its
188 188 peer for a period of time (for example, if the remote machine crashes), the
189 189 connection is closed and an error is returned.
190 190 .Pp
191 191 When a peer is sending data, it will only send up to the advertised
192 192 .Dq receive window ,
193 193 which is determined by how much more data the recipient can fit in its buffer.
194 194 Applications can use the socket-level option
195 195 .Dv SO_RCVBUF
196 196 to increase or decrease the receive buffer size.
197 197 Similarly, the socket-level option
198 198 .Dv SO_SNDBUF
199 199 can be used to allow TCP to buffer more unacknowledged and unsent data locally.
200 200 .Pp
201 201 Under most circumstances, TCP will send data when it is written by the
202 202 application.
203 203 When outstanding data has not yet been acknowledged, though, TCP will gather
204 204 small amounts of output to be sent as a single packet once an acknowledgement
205 205 has been received.
206 206 Usually referred to as Nagle's Algorithm (RFC 896), this behavior helps prevent
207 207 flooding the network with many small packets.
208 208 .Pp
209 209 However, for some highly interactive clients (such as remote shells or
210 210 windowing systems that send a stream of keypresses or mouse events), this
211 211 batching may cause significant delays.
212 212 To disable this behavior, TCP provides a boolean socket option,
213 213 .Dv TCP_NODELAY .
214 214 .Pp
215 215 Conversely, for other applications, it may be desirable for TCP not to send out
216 216 any data until a full TCP segment can be sent.
217 217 To enable this behavior, an application can use the TCP-level socket option
218 218 .Dv TCP_CORK .
219 219 When set to a non-zero value, TCP will only send out a full TCP segment.
220 220 When
221 221 .Dv TCP_CORK
222 222 is set to zero after it has been enabled, all currently buffered data is sent
223 223 out (as permitted by the peer's receive window and the current congestion
224 224 window).
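The cork-then-release pattern described above might look like the following sketch. The helper name send_corked() is hypothetical, and it assumes an already connected socket and payloads small enough that each write() completes in one call.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: write a header and a body while TCP_CORK is
 * set, so TCP holds partial segments until the cork is removed.
 * Returns the total number of bytes written, or -1 on error.
 */
static ssize_t
send_corked(int fd, const void *hdr, size_t hlen, const void *body,
    size_t blen)
{
	int on = 1, off = 0;
	ssize_t h, b;

	if (setsockopt(fd, IPPROTO_TCP, TCP_CORK, &on, sizeof (on)) == -1)
		return (-1);

	h = write(fd, hdr, hlen);
	b = (h == -1) ? -1 : write(fd, body, blen);

	/* Remove the cork even after a failed write to flush the buffer. */
	if (setsockopt(fd, IPPROTO_TCP, TCP_CORK, &off, sizeof (off)) == -1)
		return (-1);

	return ((h == -1 || b == -1) ? -1 : h + b);
}
```

Clearing the option is what releases any data still buffered, so the helper always attempts it before returning.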
225 225 .Pp
226 226 TCP provides an urgent data mechanism, which may be invoked using the
227 227 out-of-band provisions of
228 228 .Xr send 3SOCKET .
229 229 The caller may mark one byte as
230 230 .Dq urgent
231 231 with the
232 232 .Dv MSG_OOB
233 233 flag to
234 234 .Xr send 3SOCKET .
235 235 This sets an
236 236 .Dq urgent pointer
237 237 pointing to this byte in the TCP stream.
238 238 The receiver on the other side of the stream is notified of the urgent data by a
239 239 .Dv SIGURG
240 240 signal.
241 241 The
242 242 .Dv SIOCATMARK
243 243 .Xr ioctl 2
244 244 request returns a value indicating whether the stream is at the urgent mark.
245 245 Because the system never returns data across the urgent mark in a single
246 246 .Xr read 2
247 247 call, it is possible to
248 248 advance to the urgent data in a simple loop which reads data, testing the
249 249 socket with the
250 250 .Dv SIOCATMARK
251 251 .Fn ioctl
252 252 request, until it reaches the mark.
253 253 .Ss "Congestion Control"
254 254 TCP follows the congestion control algorithm described in RFC 2581, and
255 255 also supports the initial congestion window (cwnd) changes in RFC 3390.
256 256 The initial cwnd calculation can be overridden by the socket option
257 257 .Dv TCP_INIT_CWND .
258 258 An application can use this option to set the initial cwnd to a
259 259 specified number of TCP segments.
260 260 This applies to the cases when the connection
261 261 first starts and restarts after an idle period.
262 262 The process must have the
263 263 .Dv PRIV_SYS_NET_CONFIG
264 264 privilege if it wants to specify a number greater than that
265 265 calculated by RFC 3390.
266 +.Pp
267 +The operating system also provides alternative algorithms that may be more
268 +appropriate for your application, including the CUBIC congestion control
269 +algorithm described in RFC 8312.
270 +These can be configured system-wide using
271 +.Xr ipadm 1M ,
272 +or on a per-connection basis with the TCP-level socket option
273 +.Dv TCP_CONGESTION ,
274 +whose argument is the name of the algorithm to use
275 +.Pq for example Dq cubic .
276 +If the requested algorithm does not exist, then
277 +.Fn setsockopt
278 +will fail, and
279 +.Va errno
280 +will be set to
281 +.Er ENOENT .
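A per-connection request for a congestion control algorithm could be sketched as below. The helper set_congestion() is hypothetical, and passing strlen(alg) as the option length is an assumption about how the name should be sized; the text above only states that the argument is the algorithm name.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: ask TCP to use the named congestion control
 * algorithm on fd. Returns 0 on success. On failure returns -1 with
 * errno set by setsockopt(), for example ENOENT when the requested
 * algorithm does not exist.
 */
static int
set_congestion(int fd, const char *alg)
{
	return (setsockopt(fd, IPPROTO_TCP, TCP_CONGESTION, alg,
	    strlen(alg)));
}
```

An application that prefers CUBIC but can live without it might call set_congestion(fd, "cubic") and, on ENOENT, simply continue with the system default.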
266 282 .Ss "TCP Keep-Alive"
267 283 Since TCP determines whether a remote peer is no longer reachable by timing out
268 284 waiting for acknowledgements, a host that never sends any new data may never
269 285 notice a peer that has gone away.
270 286 While consumers can avoid this problem by sending their own periodic heartbeat
271 287 messages (Transport Layer Security does this, for example),
272 288 TCP describes an optional keep-alive mechanism in RFC 1122.
273 289 Applications can enable it using the socket-level option
274 290 .Dv SO_KEEPALIVE .
275 291 When enabled, the first keep-alive probe is sent out after a TCP connection is
276 292 idle for two hours.
277 293 If the peer does not respond to the probe within eight minutes, the TCP
278 294 connection is aborted.
279 295 An application can alter the probe behavior using the following TCP-level
280 296 socket options:
281 297 .Bl -tag -offset indent -width 16m
282 298 .It Dv TCP_KEEPALIVE_THRESHOLD
283 299 Determines the interval for sending the first probe.
284 300 The option value is specified as an unsigned integer in milliseconds.
285 301 The system default is controlled by the TCP
286 302 .Nm ndd
287 303 parameter
288 304 .Cm tcp_keepalive_interval .
289 305 The minimum value is ten seconds.
290 306 The maximum is ten days, while the default is two hours.
291 307 .It Dv TCP_KEEPALIVE_ABORT_THRESHOLD
292 308 If TCP does not receive a response to the probe, then this option determines
293 309 how long to wait before aborting a TCP connection.
294 310 The option value is an unsigned integer in milliseconds.
295 311 The value zero indicates that TCP should never time
296 312 out and abort the connection when probing.
297 313 The system default is controlled by the TCP
298 314 .Nm ndd
299 315 parameter
300 316 .Cm tcp_keepalive_abort_interval .
301 317 The default is eight minutes.
302 318 .It Dv TCP_KEEPIDLE
303 319 This option, like
304 320 .Dv TCP_KEEPALIVE_THRESHOLD ,
305 321 determines the interval for sending the first probe, except that
306 322 the option value is an unsigned integer in
307 323 .Sy seconds .
308 324 It is provided primarily for compatibility with other Unix flavors.
309 325 .It Dv TCP_KEEPCNT
310 326 This option specifies the number of keep-alive probes that should be sent
311 327 without any response from the peer before aborting the connection.
312 328 .It Dv TCP_KEEPINTVL
313 329 This option specifies the interval in seconds between successive,
314 330 unacknowledged keep-alive probes.
315 331 .El
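The options in this list can be combined as in the following sketch. The helper enable_keepalive() and its parameter names are hypothetical; it uses the seconds-based options because those are the ones the text describes as compatible with other Unix flavors.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <unistd.h>

/*
 * Hypothetical helper: enable keep-alive on fd and tune the probes.
 *   idle_secs   idle time before the first probe (TCP_KEEPIDLE)
 *   intvl_secs  time between unanswered probes (TCP_KEEPINTVL)
 *   count       unanswered probes before abort (TCP_KEEPCNT)
 * Returns 0 on success, or -1 with errno set by setsockopt().
 */
static int
enable_keepalive(int fd, int idle_secs, int intvl_secs, int count)
{
	int on = 1;

	if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on,
	    sizeof (on)) == -1 ||
	    setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_secs,
	    sizeof (idle_secs)) == -1 ||
	    setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &intvl_secs,
	    sizeof (intvl_secs)) == -1 ||
	    setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &count,
	    sizeof (count)) == -1)
		return (-1);

	return (0);
}
```

For example, enable_keepalive(fd, 60, 10, 5) asks for a first probe after one minute of idleness and an abort after five unanswered probes sent ten seconds apart.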
316 332 .Ss "Additional Configuration"
317 333 illumos supports TCP Extensions for High Performance (RFC 7323)
318 334 which includes the window scale and timestamp options, and Protection Against
319 335 Wrap Around Sequence Numbers
320 336 .Po Sy PAWS Pc .
321 337 Note that if timestamps are negotiated on
322 338 a connection, received segments without timestamps on that connection are
323 339 silently dropped per the suggestion in the RFC. illumos also supports Selective
324 340 Acknowledgment
325 341 .Po Sy SACK Pc
326 342 capabilities (RFC 2018) and Explicit Congestion
327 343 Notification
328 344 .Po Sy ECN Pc
329 345 mechanism (RFC 3168).
330 346 .Pp
331 347 Turn on the window scale option in one of the following ways:
332 348 .Bl -bullet -offset indent -width 4m
333 349 .It
334 350 An application can set
335 351 .Dv SO_SNDBUF
336 352 or
337 353 .Dv SO_RCVBUF
338 354 size in the
339 355 .Fn setsockopt
340 356 option to be larger than 64K.
341 357 This must be done
342 358 .Em before
343 359 the program calls
344 360 .Fn listen
345 361 or
346 362 .Fn connect ,
347 363 because the window scale
348 364 option is negotiated when the connection is established.
349 365 Once the connection
350 366 has been made, it is too late to increase the send or receive window beyond the
351 367 default TCP limit of 64K.
352 368 .It
353 369 For all applications, use
354 370 .Xr ndd 1M
355 371 to modify the configuration parameter
356 372 .Cm tcp_wscale_always .
357 373 If
358 374 .Cm tcp_wscale_always
359 375 is set to
360 376 .Sy 1 ,
361 377 the
362 378 window scale option will always be set when connecting to a remote system.
363 379 If
364 380 .Cm tcp_wscale_always
365 381 is
366 382 .Sy 0 ,
367 383 the window scale option will be set only if
368 384 the user has requested a send or receive window larger than 64K.
369 385 The default value of
370 386 .Cm tcp_wscale_always
371 387 is
372 388 .Sy 1 .
373 389 .It
374 390 Regardless of the value of
375 391 .Cm tcp_wscale_always ,
376 392 the window scale option
377 393 will always be included in a connect acknowledgement if the connecting system
378 394 has used the option.
379 395 .El
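The first method in the list above can be sketched as follows. The helper request_large_window() is hypothetical and must be called before listen() or connect(); the value it returns is whatever the system reports, which may differ from the size requested.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <unistd.h>

/*
 * Hypothetical helper: request send and receive buffers of the given
 * size on a socket that has not yet called listen() or connect(), so
 * that the window scale option can be negotiated at connection time.
 * Returns the receive buffer size reported by the system, or -1 on
 * error.
 */
static int
request_large_window(int fd, int bytes)
{
	int got = 0;
	socklen_t len = sizeof (got);

	if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes,
	    sizeof (bytes)) == -1 ||
	    setsockopt(fd, SOL_SOCKET, SO_SNDBUF, &bytes,
	    sizeof (bytes)) == -1)
		return (-1);

	if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &got, &len) == -1)
		return (-1);

	return (got);
}
```

Reading the size back is a deliberate choice: the system may round or clamp the request, and the caller can compare the result against the 64K threshold discussed above.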
380 396 .Pp
381 397 Turn on SACK capabilities in the following way:
382 398 .Bl -bullet -offset indent -width 4m
383 399 .It
384 400 Use
385 401 .Nm ndd
386 402 to modify the configuration parameter
387 403 .Cm tcp_sack_permitted .
388 404 If
389 405 .Cm tcp_sack_permitted
390 406 is set to
391 407 .Sy 0 ,
392 408 TCP will not accept SACK or send out SACK information.
393 409 If
394 410 .Cm tcp_sack_permitted
395 411 is
396 412 set to
397 413 .Sy 1 ,
398 414 TCP will not initiate a connection with the SACK permitted option in the
399 415 .Sy SYN
400 416 segment, but will respond with the SACK permitted option in the
401 417 .Sy SYN|ACK
402 418 segment if an incoming connection request has the SACK permitted option.
403 419 This means that TCP will only accept SACK information if the other side of the
404 420 connection also accepts SACK information.
405 421 If
406 422 .Cm tcp_sack_permitted
407 423 is set to
408 424 .Sy 2 ,
409 425 it will both initiate and accept connections with SACK information.
410 426 The default for
411 427 .Cm tcp_sack_permitted
412 428 is
413 429 .Sy 2
414 430 .Pq active enabled .
415 431 .El
416 432 .Pp
417 433 Turn on the TCP ECN mechanism in the following way:
418 434 .Bl -bullet -offset indent -width 4m
419 435 .It
420 436 Use
421 437 .Nm ndd
422 438 to modify the configuration parameter
423 439 .Cm tcp_ecn_permitted .
424 440 If
425 441 .Cm tcp_ecn_permitted
426 442 is set to
427 443 .Sy 0 ,
428 444 then TCP will not negotiate with a peer that supports the ECN mechanism.
429 445 If
430 446 .Cm tcp_ecn_permitted
431 447 is set to
432 448 .Sy 1
433 449 when initiating a connection, TCP will not tell a peer that it supports the
434 450 .Sy ECN
435 451 mechanism.
436 452 However, it will tell a peer that it supports the
437 453 .Sy ECN
438 454 mechanism when accepting a new incoming connection request if the peer
439 455 indicates that it supports the
440 456 .Sy ECN
441 457 mechanism in the
442 458 .Sy SYN
443 459 segment.
444 460 If
445 461 .Cm tcp_ecn_permitted
446 462 is set to 2, in addition to negotiating with a peer on the
447 463 .Sy ECN
448 464 mechanism when accepting connections, TCP will indicate in the outgoing
449 465 .Sy SYN
450 466 segment that it supports the
451 467 .Sy ECN
452 468 mechanism when TCP makes active outgoing connections.
453 469 The default for
454 470 .Cm tcp_ecn_permitted
455 471 is 1.
456 472 .El
457 473 .Pp
458 474 Turn on the timestamp option in the following way:
459 475 .Bl -bullet -offset indent -width 4m
460 476 .It
461 477 Use
462 478 .Nm ndd
463 479 to modify the configuration parameter
464 480 .Cm tcp_tstamp_always .
465 481 If
466 482 .Cm tcp_tstamp_always
467 483 is
468 484 .Sy 1 ,
469 485 the timestamp option will always be set
470 486 when connecting to a remote machine.
471 487 If
472 488 .Cm tcp_tstamp_always
473 489 is
474 490 .Sy 0 ,
475 491 the timestamp option will not be set when connecting to a remote system.
476 492 The
477 493 default for
478 494 .Cm tcp_tstamp_always
479 495 is
480 496 .Sy 0 .
481 497 .It
482 498 Regardless of the value of
483 499 .Cm tcp_tstamp_always ,
484 500 the timestamp option will
485 501 always be included in a connect acknowledgement (and all succeeding packets) if
486 502 the connecting system has used the timestamp option.
487 503 .El
488 504 .Pp
489 505 Use the following procedure to turn on the timestamp option only when the
490 506 window scale option is in effect:
491 507 .Bl -bullet -offset indent -width 4m
492 508 .It
493 509 Use
494 510 .Nm ndd
495 511 to modify the configuration parameter
496 512 .Cm tcp_tstamp_if_wscale .
497 513 Setting
498 514 .Cm tcp_tstamp_if_wscale
499 515 to
500 516 .Sy 1
501 517 will cause the timestamp option
502 518 to be set when connecting to a remote system, if the window scale option has
503 519 been set.
504 520 If
505 521 .Cm tcp_tstamp_if_wscale
506 522 is
507 523 .Sy 0 ,
508 524 the timestamp option will
509 525 not be set when connecting to a remote system.
510 526 The default for
511 527 .Cm tcp_tstamp_if_wscale
512 528 is
513 529 .Sy 1 .
514 530 .El
515 531 .Pp
516 532 Protection Against Wrap Around Sequence Numbers
517 533 .Po Sy PAWS Pc
518 534 is always used when the
519 535 timestamp option is set.
520 536 .Pp
521 537 The operating system also supports multiple methods of generating initial sequence numbers.
522 538 One of these methods is the improved technique suggested in RFC 1948.
523 539 We
524 540 .Em HIGHLY
525 541 recommend that you set sequence number generation parameters as
526 542 close to boot time as possible.
527 543 This prevents sequence number problems on
528 544 connections that use the same connection-ID as ones that used a different
529 545 sequence number generation.
530 546 The
531 547 .Sy svc:/network/initial:default
532 548 service configures the initial sequence number generation.
533 549 The service reads the value contained in the configuration file
534 550 .Pa /etc/default/inetinit
535 551 to determine which method to use.
536 552 .Pp
537 553 The
538 554 .Pa /etc/default/inetinit
539 555 file is an unstable interface, and may change in future releases.
540 556 .Sh EXAMPLES
541 557 .Ss Example 1: Connecting to a server
542 558 .Bd -literal
543 559 $ gcc -std=c99 -Wall -lsocket -o client client.c
544 560 $ cat client.c
545 561 #include <sys/socket.h>
546 562 #include <netinet/in.h>
547 563 #include <netinet/tcp.h>
548 564 #include <netdb.h>
549 565 #include <stdio.h>
550 566 #include <string.h>
551 567 #include <unistd.h>
552 568
553 569 int
554 570 main(int argc, char *argv[])
555 571 {
556 572 struct addrinfo hints, *gair, *p;
557 573 int fd, rv, rlen;
558 574 char buf[1024];
559 575 int y = 1;
560 576
561 577 if (argc != 3) {
562 578 fprintf(stderr, "%s <host> <port>\\n", argv[0]);
563 579 return (1);
564 580 }
565 581
566 582 memset(&hints, 0, sizeof (hints));
567 583 hints.ai_family = PF_UNSPEC;
568 584 hints.ai_socktype = SOCK_STREAM;
569 585
570 586 if ((rv = getaddrinfo(argv[1], argv[2], &hints, &gair)) != 0) {
571 587 fprintf(stderr, "getaddrinfo() failed: %s\\n",
572 588 gai_strerror(rv));
573 589 return (1);
574 590 }
575 591
576 592 for (p = gair; p != NULL; p = p->ai_next) {
577 593 if ((fd = socket(
578 594 p->ai_family,
579 595 p->ai_socktype,
580 596 p->ai_protocol)) == -1) {
581 597 perror("socket() failed");
582 598 continue;
583 599 }
584 600
585 601 if (connect(fd, p->ai_addr, p->ai_addrlen) == -1) {
586 602 close(fd);
587 603 perror("connect() failed");
588 604 continue;
589 605 }
590 606
591 607 break;
592 608 }
593 609
594 610 if (p == NULL) {
595 611 fprintf(stderr, "failed to connect to server\\n");
596 612 return (1);
597 613 }
598 614
599 615 freeaddrinfo(gair);
600 616
601 617 if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &y,
602 618 sizeof (y)) == -1) {
603 619 perror("setsockopt(SO_KEEPALIVE) failed");
604 620 return (1);
605 621 }
606 622
607 623 while ((rlen = read(fd, buf, sizeof (buf))) > 0) {
608 624 fwrite(buf, rlen, 1, stdout);
609 625 }
610 626
611 627 if (rlen == -1) {
612 628 perror("read() failed");
613 629 }
614 630
615 631 fflush(stdout);
616 632
617 633 if (close(fd) == -1) {
618 634 perror("close() failed");
619 635 }
620 636
621 637 return (0);
622 638 }
623 639 $ ./client 127.0.0.1 8080
624 640 hello
625 641 $ ./client ::1 8080
626 642 hello
627 643 .Ed
628 644 .Ss Example 2: Accepting client connections
629 645 .Bd -literal
630 646 $ gcc -std=c99 -Wall -lsocket -o server server.c
631 647 $ cat server.c
632 648 #include <sys/socket.h>
633 649 #include <netinet/in.h>
634 650 #include <netinet/tcp.h>
635 651 #include <netdb.h>
636 652 #include <stdio.h>
637 653 #include <string.h>
638 654 #include <unistd.h>
639 655 #include <arpa/inet.h>
640 656
641 657 void
642 658 logmsg(struct sockaddr *s, int bytes)
643 659 {
644 660 char dq[INET6_ADDRSTRLEN];
645 661
646 662 switch (s->sa_family) {
647 663 case AF_INET: {
648 664 struct sockaddr_in *s4 = (struct sockaddr_in *)s;
649 665 inet_ntop(AF_INET, &s4->sin_addr, dq, sizeof (dq));
650 666 fprintf(stdout, "sent %d bytes to %s:%d\\n",
651 667 bytes, dq, ntohs(s4->sin_port));
652 668 break;
653 669 }
654 670 case AF_INET6: {
655 671 struct sockaddr_in6 *s6 = (struct sockaddr_in6 *)s;
656 672 inet_ntop(AF_INET6, &s6->sin6_addr, dq, sizeof (dq));
657 673 fprintf(stdout, "sent %d bytes to [%s]:%d\\n",
658 674 bytes, dq, ntohs(s6->sin6_port));
659 675 break;
660 676 }
661 677 default:
662 678 fprintf(stdout, "sent %d bytes to unknown client\\n",
663 679 bytes);
664 680 break;
665 681 }
666 682 }
667 683
668 684 int
669 685 main(int argc, char *argv[])
670 686 {
671 687 struct addrinfo hints, *gair, *p;
672 688 int sfd, cfd;
673 689 int slen, wlen, rv;
674 690
675 691 if (argc != 3) {
676 692 fprintf(stderr, "%s <port> <message>\\n", argv[0]);
677 693 return (1);
678 694 }
679 695
680 696 slen = strlen(argv[2]);
681 697
682 698 memset(&hints, 0, sizeof (hints));
683 699 hints.ai_family = PF_UNSPEC;
684 700 hints.ai_socktype = SOCK_STREAM;
685 701 hints.ai_flags = AI_PASSIVE;
686 702
687 703 if ((rv = getaddrinfo(NULL, argv[1], &hints, &gair)) != 0) {
688 704 fprintf(stderr, "getaddrinfo() failed: %s\\n",
689 705 gai_strerror(rv));
690 706 return (1);
691 707 }
692 708
693 709 for (p = gair; p != NULL; p = p->ai_next) {
694 710 if ((sfd = socket(
695 711 p->ai_family,
696 712 p->ai_socktype,
697 713 p->ai_protocol)) == -1) {
698 714 perror("socket() failed");
699 715 continue;
700 716 }
701 717
702 718 if (bind(sfd, p->ai_addr, p->ai_addrlen) == -1) {
703 719 close(sfd);
704 720 perror("bind() failed");
705 721 continue;
706 722 }
707 723
708 724 break;
709 725 }
710 726
711 727 if (p == NULL) {
712 728 fprintf(stderr, "server failed to bind()\\n");
713 729 return (1);
714 730 }
715 731
716 732 freeaddrinfo(gair);
717 733
718 734 if (listen(sfd, 1024) != 0) {
719 735 perror("listen() failed");
720 736 return (1);
721 737 }
722 738
723 739 fprintf(stdout, "waiting for clients...\\n");
724 740
725 741 for (int times = 0; times < 5; times++) {
726 742 struct sockaddr_storage stor;
727 743 socklen_t alen = sizeof (stor);
728 744 struct sockaddr *addr = (struct sockaddr *)&stor;
729 745
730 746 if ((cfd = accept(sfd, addr, &alen)) == -1) {
731 747 perror("accept() failed");
732 748 continue;
733 749 }
734 750
735 751 wlen = 0;
736 752
737 753 do {
738 754 wlen += write(cfd, argv[2] + wlen, slen - wlen);
739 755 } while (wlen < slen);
740 756
741 757 logmsg(addr, wlen);
742 758
743 759 if (close(cfd) == -1) {
744 760 perror("close(cfd) failed");
745 761 }
746 762 }
747 763
748 764 if (close(sfd) == -1) {
749 765 perror("close(sfd) failed");
750 766 }
751 767
752 768 fprintf(stdout, "finished.\\n");
753 769
754 770 return (0);
755 771 }
756 772 $ ./server 8080 $'hello\\n'
757 773 waiting for clients...
758 774 sent 6 bytes to [::ffff:127.0.0.1]:59059
759 775 sent 6 bytes to [::ffff:127.0.0.1]:47448
760 776 sent 6 bytes to [::ffff:127.0.0.1]:54949
761 777 sent 6 bytes to [::ffff:127.0.0.1]:55186
762 778 sent 6 bytes to [::1]:62256
763 779 finished.
764 780 .Ed
765 781 .Sh DIAGNOSTICS
766 782 A socket operation may fail if:
767 783 .Bl -tag -offset indent -width 16m
768 784 .It Er EISCONN
769 785 A
770 786 .Fn connect
771 787 operation was attempted on a socket on which a
772 788 .Fn connect
773 789 operation had already been performed.
774 790 .It Er ETIMEDOUT
775 791 A connection was dropped due to excessive retransmissions.
776 792 .It Er ECONNRESET
777 793 The remote peer forced the connection to be closed (usually because the remote
778 794 machine has lost state information about the connection due to a crash).
779 795 .It Er ECONNREFUSED
780 796 The remote peer actively refused connection establishment (usually because no
781 797 process is listening to the port).
782 798 .It Er EADDRINUSE
783 799 A
784 800 .Fn bind
785 801 operation was attempted on a socket with a network address/port pair that has
786 802 already been bound to another socket.
787 803 .It Er EADDRNOTAVAIL
788 804 A
789 805 .Fn bind
790 806 operation was attempted on a socket with a network address for which no network
791 807 interface exists.
792 808 .It Er EACCES
793 809 A
794 810 .Fn bind
795 811 operation was attempted with a
796 812 .Dq reserved
797 813 port number and the effective user ID of the process was not the privileged user.
798 814 .It Er ENOBUFS
799 815 The system ran out of memory for internal data structures.
800 816 .El
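One of the errors listed above can be reproduced with a small probe. This sketch is illustrative only: the helper probe_loopback() is hypothetical and connects to the IPv4 loopback address, reporting the outcome as an errno value.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <errno.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: try to connect to 127.0.0.1:port and report
 * the outcome as an errno value. Returns 0 if the connection was
 * established, ECONNREFUSED if nothing is listening on the port, or
 * another errno value from socket() or connect().
 */
static int
probe_loopback(in_port_t port)
{
	struct sockaddr_in sin;
	int fd, err = 0;

	if ((fd = socket(AF_INET, SOCK_STREAM, 0)) == -1)
		return (errno);

	memset(&sin, 0, sizeof (sin));
	sin.sin_family = AF_INET;
	sin.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
	sin.sin_port = htons(port);

	if (connect(fd, (struct sockaddr *)&sin, sizeof (sin)) == -1)
		err = errno;

	(void) close(fd);
	return (err);
}
```

Saving errno before close() matters here, because close() may itself change errno and hide the reason the connection failed.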
801 817 .Sh SEE ALSO
802 818 .Xr svcs 1 ,
803 819 .Xr ndd 1M ,
804 820 .Xr svcadm 1M ,
805 821 .Xr ioctl 2 ,
806 822 .Xr read 2 ,
807 823 .Xr write 2 ,
808 824 .Xr accept 3SOCKET ,
809 825 .Xr bind 3SOCKET ,
810 826 .Xr connect 3SOCKET ,
811 827 .Xr getprotobyname 3SOCKET ,
812 828 .Xr getsockopt 3SOCKET ,
813 829 .Xr listen 3SOCKET ,
814 830 .Xr send 3SOCKET ,
815 831 .Xr smf 5 ,
816 832 .Xr inet 7P ,
817 833 .Xr inet6 7P ,
818 834 .Xr ip 7P ,
819 835 .Xr ip6 7P
820 836 .Rs
821 837 .%A "K. Ramakrishnan"
822 838 .%A "S. Floyd"
823 839 .%A "D. Black"
824 840 .%T "The Addition of Explicit Congestion Notification (ECN) to IP"
825 841 .%R "RFC 3168"
826 842 .%D "September 2001"
827 843 .Re
828 844 .Rs
829 845 .%A "M. Mathis"
830 846 .%A "J. Mahdavi"
831 847 .%A "S. Floyd"
832 848 .%A "A. Romanow"
833 849 .%T "TCP Selective Acknowledgment Options"
834 850 .%R "RFC 2018"
835 851 .%D "October 1996"
836 852 .Re
837 853 .Rs
838 854 .%A "S. Bellovin"
839 855 .%T "Defending Against Sequence Number Attacks"
840 856 .%R "RFC 1948"
841 857 .%D "May 1996"
842 858 .Re
843 859 .Rs
844 860 .%A "D. Borman"
845 861 .%A "B. Braden"
846 862 .%A "V. Jacobson"
847 863 .%A "R. Scheffenegger, Ed."
848 864 .%T "TCP Extensions for High Performance"
849 865 .%R "RFC 7323"
850 866 .%D "September 2014"
851 867 .Re
852 868 .Rs
853 869 .%A "Jon Postel"
854 870 .%T "Transmission Control Protocol - DARPA Internet Program Protocol Specification"
855 871 .%R "RFC 793"
856 872 .%C "Network Information Center, SRI International, Menlo Park, CA."
857 873 .%D "September 1981"
858 874 .Re
859 875 .Sh NOTES
860 876 The
861 877 .Sy tcp
862 878 service is managed by the service management facility,
863 879 .Xr smf 5 ,
864 880 under the service identifier
865 881 .Sy svc:/network/initial:default .
866 882 .Pp
867 883 Administrative actions on this service, such as enabling, disabling, or
868 884 requesting restart, can be performed using
869 885 .Xr svcadm 1M .
870 886 The service's
871 887 status can be queried using the
872 888 .Xr svcs 1
873 889 command.