WL#4782: Improve documentation of backup stream library
Affects: Prototype Only
—
Status: On-Hold
The in-code documentation of backup stream library is not satisfactory. Things to take into account in the documentation: - memory allocation: which memory should be freed by user and which is freed by the library. - which members are filled with bstream_rd_header() and other functions. Here are some comments from Ingo. >> Does that mean that the documentation provided in stream_v1_service.h is >> not clear enough? Any suggestions how to improve it? > > > Yes. I would like to see an overview. What is the stream library, how > does it work? What functions are to be called in which order? Which > call-back functions does it call on read and on write? Some words about > "catalog coordinates". What are requirements for the catalog that cannot > be freely chosen by the application? > > Full, complete function comments. Mention of which data is allocated by > the stream library, what needs to be copied by the application, and what > needs to be freed by the application. > > ... >>> + /** >>> + Function reading bytes from the underlying media. >>> + For specification, see @c bstream_read_part() function. >>> + */ >>> + typedef int (*as_read_m)(void *, bstream_blob*, bstream_blob); >>> >>> However, bstream_read_part() is the caller of the as_read_m typed >>> function. So this comment basically says "Go and figure out yourself, >>> how it is used." That's what I had to do. >>> >> The intention was to say that as_read_m, which has the same parameters >> as bstream_read_part() should also implement the same behaviour (both >> functions are reading bytes from the underlying stream). >> >> I don't like copying the same documentation in two places because then >> they are bond to diverge eventually. >> >> Would such change in the documentation explain it better? > > > Perhaps. At least the added parameter descriptions are valuable. But if > both functions "should also implement the same behaviour", why do we > have two of them at all? > > And keeping the documentation somewhere deep inside a file, which an > implementor should normally not need to look at, doesn't seem right. > > If you move the main documentation to as_read_m() and point to it from > bstream_read_part() that would be more acceptable IMHO. > > ... >>> The 'data' parameter is a reference to an st_blob structure, which >>> contains two pointers, 'begin' and 'end'. When the function is called, >>> the pointers point into a buffer to receive the data from the stream. >>> They are to be placed starting at 'begin' and added up to 'end'. On >>> return of the function, the 'end' pointer is adjusted towards 'begin', >>> if less than the requested amount of data was read from the stream. >> I think it is 'begin' pointer which is adjusted towards 'end': >> >> buf [---------------=================--------] >> >> data blob before reading: >> >> [=================] >> >> data blob after reading: >> >> **********[======] >> ^ ^ >> | space remaining free >> bytes which have been read >> > > > Unfortunately, we seem to have a completely different view of things. > > While you may know, where begin and end of the different buffer objects > point to, I can just guess. > > If I am right, that in the middle row, we have data.begin at the first > '=' and data.end at ']', and in the lower row data.begin at the first > '*' and data.end at '[', then, in my view, data.end references a lower > memory address as before the call, while data.begin stayed the same. > This is what I call "the 'end' pointer is adjusted towards 'begin'". > > To release me from guessing, and give me some certainty, I suggest to > change the chart similar to this (or a correct version thereof): > > buf [---------------=================--------] > ^ ^ > envelope.begin envelope.end > > data blob before reading: > > [=================] > ^ ^ > data.begin data.end > > data blob after reading: > > data.begin > | data.end > v v > **********[======] > ^ ^ > | space remaining free > bytes which have been read > > > ... >>>> What is the intention of envelope? >>> From the reference implementation and the calling function, I guess it >>> is the buffer as it has been allocated and into which the 'data' >>> pointers point. It is not necessary that 'data->begin' is at >>> 'envelope->begin', nor that 'data->end' is at 'envelope->end'. But both >>> 'data' pointers must be between the 'envelope' pointers. >>> >>> However, the parameter is never used. Not in my implementation, nor in >>> the reference implementation. I don't even have an idea, what it could >>> be useful for. >>> >> If I remember correctly the intention was to not exclude a following, >> very remote I agree, use scenario: >> >> Suppose that the underlying stream is organized so that the proper >> content is wrapped in some kind of headers/footers. For example >> checksums or something like this. Having such interface to the stream >> read function allows the implementation to use the provided buffer to >> store the header/footer data if there is enough space for it. So the >> implementation could work like this: >> >> If next thing in the stream is N-bytes long header followed by data then >> do the following: >> 1. See if there is N-bytes available between beginning of the buffer and >> start of the area where data should be stored. >> 2. If yes, then save these N-bytes on a side. >> 3. Read into the buffer (the envelope) the header followed with as much >> data as >> needed (and can fit) aligned so that data falls in the correct place. >> 4. Do header processing. >> 5. Restore the N-bytes overwriting the header. >> >> This way implementation can use the provided buffer for low-level reads >> and avoid copying big amounts of data (as compared to the presumably >> short header/footer). >> >> I'm not going to defend this design or argue that we need this level of >> generality. I designed it like this because I like general designs. >> Later it was not refused by reviewers and... here it is! > > > I do not aim at changing the design. If I had a wish free, I'd wish that > parts of the above is included in the function documentation. > > The design is a bit unusual for a cache. So it deserves more > documentation as usual.
Copyright (c) 2000, 2024, Oracle Corporation and/or its affiliates. All rights reserved.