Opened 2 years ago

Last modified 2 years ago

#294 assigned refactoring

better scheme for communication of file tree bits for synch files and other octopi

Reported by: Fred T. Hamster
Owned by: bugdock
Priority: minor
Milestone: milestone4
Component: feistymeow-octopus
Version:
Keywords: synch, efficiency
Cc:

Description (last modified by Fred T. Hamster)

problem case:
synch_files was eating way too much memory or cpu (not sure which),
because using it to copy pics to the usb drive just killed the machine.

solution:
change how synching tells the other side which files and dirs it has.

more detail:
just skitter along the tree instead of holding the whole list of files in memory.
rest assured that we won't traverse the same files twice;
then just send the stream of files we saw over to the other side
in batches and let the other side reply with the files it wants, whenever it's ready.
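
a rough sketch of the "skitter along the tree and send batches" idea, in python just for illustration. the name walk_in_batches, the entry layout and the batch size are all made up here; none of it is existing cromp / octopus code. it leans on os.walk not following symlinked dirs by default, which is what keeps us from seeing the same files twice.

```python
import os
import stat
from typing import Iterator, List, Tuple

# hypothetical entry layout: (path relative to root, is_dir, size, mtime)
Entry = Tuple[str, bool, int, float]


def walk_in_batches(root: str, batch_size: int = 256) -> Iterator[List[Entry]]:
    """yield lists of at most batch_size entries found under root.

    only the current batch is held in memory, never the whole tree.
    """
    batch: List[Entry] = []
    # os.walk does not follow symlinked dirs by default, so each real
    # directory is visited exactly once.
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort gives a stable traversal order
        for name in dirnames + sorted(filenames):
            full = os.path.join(dirpath, name)
            try:
                st = os.lstat(full)
            except OSError:
                continue  # vanished while we were walking; skip it
            rel = os.path.relpath(full, root)
            batch.append((rel, stat.S_ISDIR(st.st_mode), st.st_size, st.st_mtime))
            if len(batch) >= batch_size:
                yield batch
                batch = []
    if batch:
        yield batch  # last, possibly short, batch
```

the sender would loop over walk_in_batches(source_dir) and ship each batch as it appears, so the first batch can go out long before the walk is finished.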

thus we add some new cromp objects or whatever, or new commands on existing objects, so that there's a way to say:

=> here's new data about my side: here's a batch of files and dirs rooted someplace.
both sides can use that to tell the other side what's there, if we're talking about bidirectional synchronization.
but generally for a synch files op, the server has the source and the client wants to know.

ah, a decision point here. does the client or server do the work of comparing?
let's assume the client does it for now. server gets busy.

so, the conversation looks like this:
client asks server "what you got?"
server starts sending these packets that are mini rooted trees.
client would need to assemble its view of the tree from these.
=> can it do this piecemeal and not need to keep a whole tree around?

probably so, if we can tell the full starting path of each tree chunk. after all, the client either already has a path to there or it doesn't. the file there already exists or it doesn't.
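
here's a small sketch of that piecemeal check, in python for illustration only. TreeChunk and wanted_from_chunk are hypothetical names, and the rule for "want it" (missing, different size, or older locally) is just an assumption; the ticket doesn't pin one down. the point is that each chunk gets judged against the local disk by itself, with no assembled tree kept around.

```python
import os
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class TreeChunk:
    """hypothetical packet: a mini tree rooted someplace."""
    root: str  # where the chunk hangs, relative to the synch root
    entries: List[Tuple[str, bool, int, float]]  # (name under root, is_dir, size, mtime)


def wanted_from_chunk(local_root: str, chunk: TreeChunk) -> List[str]:
    """return the paths from this chunk that the client should ask for."""
    wanted: List[str] = []
    for name, is_dir, size, mtime in chunk.entries:
        rel = os.path.join(chunk.root, name)
        try:
            st = os.stat(os.path.join(local_root, rel))
        except FileNotFoundError:
            wanted.append(rel)  # we don't have it at all
            continue
        if is_dir:
            continue  # dir already exists locally; nothing to fetch here
        # assumed comparison rule: size mismatch or an older local copy
        if st.st_size != size or st.st_mtime < mtime:
            wanted.append(rel)
    return wanted
```

once a chunk has been checked it can be thrown away; only the list of wanted paths survives, and even that can be sent off right away.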

so after the client hears about all the stuff the server's got, and has made its mind up about what it wants from the server side,
it can then just send a bunch of file and dir names.
a file name, given as a full path, tells the server that the client wants that file.
a dir name tells the server that the client wants the metadata for that dir (time+date, attributes).
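
the server's half of that could look roughly like this (python, illustration only). answer_request and the reply dictionaries are invented for the sketch and are not existing cromp objects. it also refuses paths that climb out of the synch root, which the ticket doesn't mention but seems like a sane assumption.

```python
import os
import stat
from typing import Any, Dict


def answer_request(server_root: str, rel_path: str) -> Dict[str, Any]:
    """answer one requested name: file contents for a file, metadata for a dir."""
    root = os.path.realpath(server_root)
    full = os.path.realpath(os.path.join(root, rel_path))
    # assumption: requests must stay inside the synch root
    if os.path.commonpath([root, full]) != root:
        raise ValueError("request escapes the synch root: " + rel_path)
    st = os.stat(full)
    reply: Dict[str, Any] = {
        "path": rel_path,
        "mtime": st.st_mtime,
        "mode": stat.S_IMODE(st.st_mode),
    }
    if stat.S_ISDIR(st.st_mode):
        reply["kind"] = "dir_meta"  # a dir name means "tell me its metadata"
    else:
        reply["kind"] = "file"  # a file name means "send me the file"
        # a real version would stream the file in pieces rather than
        # reading it all at once
        with open(full, "rb") as f:
            reply["data"] = f.read()
    return reply
```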

all of these pieces can be sent over, even while the server is still barfing up trees; the client can start asking based on what it already knows about its own tree (if it checks for the things the server says it has), and the client can be receiving files and dirs from the server even while the server is still iterating around its own trees.
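
a toy version of that overlap, in python, with a thread standing in for the server and a small bounded queue standing in for the connection. run_exchange is a made-up name and the "chunks" are just lists of path strings; the real thing would go over cromp. the bounded queue is the interesting bit: the server can only get a couple of chunks ahead, so neither side piles up the whole tree.

```python
import queue
import threading
from typing import Iterable, List, Optional, Set


def run_exchange(server_chunks: Iterable[List[str]], client_has: Set[str]) -> List[str]:
    """stream chunks from a 'server' thread; return the paths the client asks for."""
    # at most 2 chunks in flight; put() blocks when the client falls behind
    pipe: "queue.Queue[Optional[List[str]]]" = queue.Queue(maxsize=2)
    requests: List[str] = []

    def server() -> None:
        for chunk in server_chunks:
            pipe.put(chunk)
        pipe.put(None)  # sentinel: server is done listing

    t = threading.Thread(target=server)
    t.start()
    while True:
        chunk = pipe.get()
        if chunk is None:
            break
        # the client decides per chunk, while the server keeps walking
        for path in chunk:
            if path not in client_has:
                requests.append(path)
    t.join()
    return requests
```

in a fuller version the requests would be sent back as they are found instead of being collected in a list.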

eventually the flood of file and dir transfer requests peters out. that's the client being done asking.
the server eventually answers all the requests with data. that's the server being done supplying.
now, after all the file data has arrived and every file is tagged with its proper timestamp,
the client has to go through and apply all the directory timestamp and attribute info.
(this assumes we want the copy to look just like the original in terms of time stamps and permissions)
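
that last pass over the directories might be sketched like so (python, illustration only; apply_dir_metadata and the dict layout are hypothetical). it has to wait until the files are in place because writing into a directory bumps its modification time. going deepest first is a choice made for this sketch, so that a restrictive mode on a parent can't get in the way of children still waiting their turn.

```python
import os
from typing import Dict, Tuple


def apply_dir_metadata(local_root: str, dir_meta: Dict[str, Tuple[float, int]]) -> None:
    """apply (mtime, mode) to each directory once all files have landed.

    dir_meta maps a path relative to local_root to (mtime, permission bits).
    """
    # deepest paths first: count path separators to get the depth
    for rel in sorted(dir_meta, key=lambda p: p.count(os.sep), reverse=True):
        mtime, mode = dir_meta[rel]
        full = os.path.join(local_root, rel)
        os.chmod(full, mode)
        # access time is set to the same value, just to keep the sketch simple
        os.utime(full, (mtime, mtime))
```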

so, this is a longwinded bug. more of a feature request.
it indicates redoing some of the communication objects in the cromp / octopus support for file transfers. many of the parts could already be there.

but the most important thing is that we want to lose the massive memory footprint and the slow start before any work gets done; the client should start getting files right away, not have to traverse a whole huge tree before anything can be done.

we might also want to consider whether the server needs to iterate all its data before it can start serving requests.
and does it need to hold onto all that stuff?
maybe a caching mechanism would be better than total knowledge; we get to like 10000 files, and the whole thing falls apart due to memory usage and scale.

the revised, more packety scheme seems much more user friendly, since results start to show up sooner. not sure if it's really more efficient... but it probably is, because holding onto so much memory can crush virtual memory, which in turn makes everything much slower once you're hitting the limit.

Change History (2)

comment:1 Changed 2 years ago by Fred T. Hamster

Description: modified

comment:2 Changed 2 years ago by Fred T. Hamster

Owner: changed from Fred T. Hamster to bugdock
Status: changed from new to assigned