doc/pull.md - librsync - Git at Google

 # Callback API {#api_pull}

 As an alternative to \ref api_streaming, librsync provides a "pull-mode"
 callback interface where it will repeatedly call application-provided
 callbacks to get more input data and to accept output data.

 Pull jobs are also created using rs_sig_begin(), rs_loadsig_begin,()
 rs_delta_begin(), rs_patch_begin().

 However, rather than calling rs_job_iter(), the application should then call
 rs_job_drive(), passing an input and an output callback. rs_job_drive() takes
 an opaque pointer for both the input and output callback: this could be a
 `FILE*` or some similar object telling them what to read and write.

 ## Non-blocking IO

 The librsync interface allows non-blocking streaming processing of data.
 This means that the library will accept input and produce output when it
 suits the application. If nonblocking file IO is used and the IO
 callbacks support it, then librsync will never block waiting for IO.

 Normally callbacks will read/write the whole buffer when they're called,
 but in some cases they might not be able to process all of it, or
 perhaps not process any at all. This might happen if the callbacks are
 connected to a nonblocking socket. Either of two things can happen in
 this case. If the callback returns ::RS_BLOCKED, then rs_job_iter() will
 also return ::RS_BLOCKED shortly.

 When an IO callback blocks, it is the responsibility of the application
 to work out when it will be able to make progress and therefore when it
 is worth calling rs_job_iter() again. Typically this involves a mechanism
 like `poll` or `select` to wait for the file descriptor to be ready.

 ## Blocking IO

 The IO callbacks are allowed to block.
 This will of course mean that the application's call to
 rs_job_drive() will also block.

 ## Partial completion

 IO callbacks are also allowed to process or provide only part of the requested
 data, as will commonly happen with socket IO.

 The library might not get as much input as it wanted when it is first
 called.  If it gets a partial read, it needs to hold onto that
 valuable and irreplaceable data.

 It cannot keep it on the stack, because it will be lost if the read
 blocks.  It needs to be kept in the job structure, or in somewhere
 referenced from there.

 The state function probably cannot proceed until it has all the needed
 input.  So possibly this can be expressed at a high level of the job
 structure.  Or perhaps it should just be done by each particular state
 function.

 When the library has output to write out, the callback might not be
 able to accept all of it at the time it is called.  Deferred outgoing
 data needs to be stored in a buffer referenced from the job structure.

 I think it's always OK to try to flush this when entering rs_job_iter.
 I think it's OK to not do anything else until all the outgoing data
 has been flushed.

 In many cases we would like to pass a pointer into the input (or
 pread) buffer straight to the output callback.  In other cases, we
 need a different buffer to build up literal outgoing data.

 librsync deals with short, bounded-size headers and checksums, and
 with arbitrarily-large streaming data.  Although the commands are of
 bounded size, they are not of fixed size, because there are different
 encodings to suit different situations.

 The situation is very similar to fetching variable-length headers from
 a socket.  We cannot read the whole command in a single input, because
 we don't know how long it is.  As a general principle I think we
 should *not* read in too much data and buffer it, because this
 complicates things.  Therefore we need to read the type byte first,
 and then possibly read some parameters.
	# Callback API {#api_pull}

	As an alternative to \ref api_streaming, librsync provides a "pull-mode"
	callback interface where it will repeatedly call application-provided
	callbacks to get more input data and to accept output data.

	Pull jobs are also created using rs_sig_begin(), rs_loadsig_begin,()
	rs_delta_begin(), rs_patch_begin().

	However, rather than calling rs_job_iter(), the application should then call
	rs_job_drive(), passing an input and an output callback. rs_job_drive() takes
	an opaque pointer for both the input and output callback: this could be a
	`FILE*` or some similar object telling them what to read and write.

	## Non-blocking IO

	The librsync interface allows non-blocking streaming processing of data.
	This means that the library will accept input and produce output when it
	suits the application. If nonblocking file IO is used and the IO
	callbacks support it, then librsync will never block waiting for IO.

	Normally callbacks will read/write the whole buffer when they're called,
	but in some cases they might not be able to process all of it, or
	perhaps not process any at all. This might happen if the callbacks are
	connected to a nonblocking socket. Either of two things can happen in
	this case. If the callback returns ::RS_BLOCKED, then rs_job_iter() will
	also return ::RS_BLOCKED shortly.

	When an IO callback blocks, it is the responsibility of the application
	to work out when it will be able to make progress and therefore when it
	is worth calling rs_job_iter() again. Typically this involves a mechanism
	like `poll` or `select` to wait for the file descriptor to be ready.

	## Blocking IO

	The IO callbacks are allowed to block.
	This will of course mean that the application's call to
	rs_job_drive() will also block.

	## Partial completion

	IO callbacks are also allowed to process or provide only part of the requested
	data, as will commonly happen with socket IO.

	The library might not get as much input as it wanted when it is first
	called. If it gets a partial read, it needs to hold onto that
	valuable and irreplaceable data.

	It cannot keep it on the stack, because it will be lost if the read
	blocks. It needs to be kept in the job structure, or in somewhere
	referenced from there.

	The state function probably cannot proceed until it has all the needed
	input. So possibly this can be expressed at a high level of the job
	structure. Or perhaps it should just be done by each particular state
	function.

	When the library has output to write out, the callback might not be
	able to accept all of it at the time it is called. Deferred outgoing
	data needs to be stored in a buffer referenced from the job structure.

	I think it's always OK to try to flush this when entering rs_job_iter.
	I think it's OK to not do anything else until all the outgoing data
	has been flushed.

	In many cases we would like to pass a pointer into the input (or
	pread) buffer straight to the output callback. In other cases, we
	need a different buffer to build up literal outgoing data.

	librsync deals with short, bounded-size headers and checksums, and
	with arbitrarily-large streaming data. Although the commands are of
	bounded size, they are not of fixed size, because there are different
	encodings to suit different situations.

	The situation is very similar to fetching variable-length headers from
	a socket. We cannot read the whole command in a single input, because
	we don't know how long it is. As a general principle I think we
	should not read in too much data and buffer it, because this
	complicates things. Therefore we need to read the type byte first,
	and then possibly read some parameters.