WL#4782: Improve documentation of backup stream library

Affects: Prototype Only   —   Status: On-Hold

The in-code documentation of backup stream library is not satisfactory.

Things to take into account in the documentation:
- memory allocation: which memory should be freed by user and which is freed by
the library.
- which members are filled with bstream_rd_header() and other functions.

Here are some comments from Ingo.

>> Does that mean that the documentation provided in stream_v1_service.h is
>> not clear enough? Any suggestions how to improve it?
> 
> 
> Yes. I would like to see an overview. What is the stream library, how
> does it work? What functions are to be called in which order? Which
> call-back functions does it call on read and on write? Some words about
> "catalog coordinates". What are requirements for the catalog that cannot
> be freely chosen by the application?
> 
> Full, complete function comments. Mention of which data is allocated by
> the stream library, what needs to be copied by the application, and what
> needs to be freed by the application.
> 
> ...
>>> +  /**
>>> +    Function reading bytes from the underlying media.
>>> +    For specification, see @c bstream_read_part() function.
>>> +  */
>>> +  typedef int (*as_read_m)(void *, bstream_blob*, bstream_blob);
>>>
>>> However, bstream_read_part() is the caller of the as_read_m typed
>>> function. So this comment basically says "Go and figure out yourself,
>>> how it is used." That's what I had to do.
>>>
>> The intention was to say that as_read_m, which has the same parameters
>> as bstream_read_part() should also implement the same behaviour (both
>> functions are reading bytes from the underlying stream).
>>
>> I don't like copying the same documentation in two places because then
>> they are bond to diverge eventually.
>>
>> Would such change in the documentation explain it better?
> 
> 
> Perhaps. At least the added parameter descriptions are valuable. But if
> both functions "should also implement the same behaviour", why do we
> have two of them at all?
> 
> And keeping the documentation somewhere deep inside a file, which an
> implementor should normally not need to look at, doesn't seem right.
> 
> If you move the main documentation to as_read_m() and point to it from
> bstream_read_part() that would be more acceptable IMHO.
> 
> ...
>>> The 'data' parameter is a reference to an st_blob structure, which
>>> contains two pointers, 'begin' and 'end'. When the function is called,
>>> the pointers point into a buffer to receive the data from the stream.
>>> They are to be placed starting at 'begin' and added up to 'end'. On
>>> return of the function, the 'end' pointer is adjusted towards 'begin',
>>> if less than the requested amount of data was read from the stream.
>> I think it is 'begin' pointer which is adjusted towards 'end':
>>
>>   buf  [---------------=================--------]
>>
>>   data blob before reading:
>>
>>                       [=================]
>>
>>   data blob after reading:
>>
>>                        **********[======]
>>                        ^          ^
>>                        |          space remaining free
>>                        bytes which have been read
>>
> 
> 
> Unfortunately, we seem to have a completely different view of things.
> 
> While you may know, where begin and end of the different buffer objects
> point to, I can just guess.
> 
> If I am right, that in the middle row, we have data.begin at the first
> '=' and data.end at ']', and in the lower row data.begin at the first
> '*' and data.end at '[', then, in my view, data.end references a lower
> memory address as before the call, while data.begin stayed the same.
> This is what I call "the 'end' pointer is adjusted towards 'begin'".
> 
> To release me from guessing, and give me some certainty, I suggest to
> change the chart similar to this (or a correct version thereof):
> 
>   buf  [---------------=================--------]
>         ^                                       ^
>         envelope.begin                          envelope.end
> 
>   data blob before reading:
> 
>                       [=================]
>                        ^                ^
>                        data.begin       data.end
> 
>   data blob after reading:
> 
>                        data.begin
>                        |         data.end
>                        v         v
>                        **********[======]
>                        ^          ^
>                        |          space remaining free
>                        bytes which have been read
> 
> 
> ...
>>>> What is the intention of envelope?
>>> From the reference implementation and the calling function, I guess it
>>> is the buffer as it has been allocated and into which the 'data'
>>> pointers point. It is not necessary that 'data->begin' is at
>>> 'envelope->begin', nor that 'data->end' is at 'envelope->end'. But both
>>> 'data' pointers must be between the 'envelope' pointers.
>>>
>>> However, the parameter is never used. Not in my implementation, nor in
>>> the reference implementation. I don't even have an idea, what it could
>>> be useful for.
>>>
>> If I remember correctly the intention was to not exclude a following,
>> very remote I agree, use scenario:
>>
>> Suppose that the underlying stream is organized so that the proper
>> content is wrapped in some kind of headers/footers. For example
>> checksums or something like this. Having such interface to the stream
>> read function allows the implementation to use the provided buffer to
>> store the header/footer data if there is enough space for it. So the
>> implementation could work like this:
>>
>> If next thing in the stream is N-bytes long header followed by data then
>> do the following:
>>  1. See if there is N-bytes available between beginning of the buffer and
>>     start of the area where data should be stored.
>>  2. If yes, then save these N-bytes on a side.
>>  3. Read into the buffer (the envelope) the header followed with as much
>> data as
>>     needed (and can fit) aligned so that data falls in the correct place.
>>  4. Do header processing.
>>  5. Restore the N-bytes overwriting the header.
>>
>> This way implementation can use the provided buffer for low-level reads
>> and avoid copying big amounts of data (as compared to the presumably
>> short header/footer).
>>
>> I'm not going to defend this design or argue that we need this level of
>> generality. I designed it like this because I like general designs.
>> Later it was not refused by reviewers and... here it is!
> 
> 
> I do not aim at changing the design. If I had a wish free, I'd wish that
> parts of the above is included in the function documentation.
> 
> The design is a bit unusual for a cache. So it deserves more
> documentation as usual.