The smaller the chunk size, the greater the number of chunks. Should I remove those files from all my backups and create a separate backup for them (possibly one for each repository)? The default chunk size in Duplicati is 100K, and 4M in Duplicacy. 2. However, if the total size of your repository is less than 1G, a chunk size of 64K or 128K won't create too many chunks.

Regarding the approach of trusting the backend, I wonder if anyone can give some indication as to what trusting implies in practice. If duplicacy trusts the storage backend by default, doesn't that already minimize the chances that such problems become known (before it is too late)?

If the attempt was made shortly after the interruption, Dropbox would not accept it, but if you waited a few minutes and ran the command again, the upload worked. Of course, there can be some cases that may not be detected by hash verification (for instance, if there is a bug in reading the file during backup), but that is very unlikely.

4) If the upload was so small, why has the total size of the remote grown so much?

The "trusting vs not trusting" decision affects the implementation of backup software. (P.S.: Also posted on the Duplicati forum.)

I think an average chunk size of 1M should be good enough for general cases. Yes, this kind of open discussion is what moves open source software along (and what helps users make informed decisions).

I know there are a lot of mysterious things going on, but one thing seems to be clear: with 128K chunks, we are not seeing an increasing gap between duplicati storage use and duplicacy storage use, right? None above 1MB. I agree, that's what I did in this last backup and in the previous one (with 1M chunks). The addition of backup software as a middle layer only makes it worse if the software is ill-devised. I'm also thinking of doing a test with local storage for the mbox files I mentioned above.

A user is seeing a lot of "Failed to read symlink" messages reported by Duplicacy, whereas CrashPlan would just silently back up the empty folders without backing up any files there.

Ironically, this means that duplicati is actually doing a (much) better job at deduplication between revisions... Hm, this actually further worsens the space-use disadvantage of duplicacy: I believe the original tests posted on the duplicati forum did not include multiple revisions.

SizeOfAddedFiles: 0

It's a shame that neither Dropbox nor Google Drive provides an easy way to view the size of a folder.

Duplicati vs Duplicacy vs others? But the uploads on 26 and 27 January made me change my mind (if not the one on the 26th, then definitely the one on the 27th).

But this is a tradeoff we have to make, because the main goal is cross-client deduplication, which is completely worth the loss in deduplication efficiency on a single computer. Now that we have a nested chunk structure for all storages, and multi-threading support, it perhaps makes sense to change the default size to 1M.

So now we're anxiously awaiting your results to see how much improvement we can get out of 128K chunks.
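To make the chunk-count arithmetic above concrete, here is a rough back-of-the-envelope sketch (my own illustration, not from the thread): it assumes a 1 GB repository, that the average chunk size is hit exactly, and it ignores metadata chunks.

```
# Rough chunk-count estimate for a 1 GB repository at different average chunk sizes.
# Assumes the average is hit exactly and ignores metadata chunks.
repo_bytes=$((1024 * 1024 * 1024))   # 1 GB
for chunk_bytes in $((64 * 1024)) $((128 * 1024)) $((1024 * 1024)) $((4 * 1024 * 1024)); do
    printf '%4dK average chunks -> ~%d chunks (and roughly as many upload requests)\n' \
        $((chunk_bytes / 1024)) $((repo_bytes / chunk_bytes))
done
```

So even at 64K or 128K, a sub-1G repository stays in the low tens of thousands of chunks, while larger repositories multiply both the chunk count and the number of upload requests.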
Does anyone know if this refers only to the Dropbox application and its sync behavior, or is it rather the values from Rclone that should be compared?

Thanks a lot, towerbr, for the testing.

(Copied from the discussion at https://github.com/gilbertchen/duplicacy/issues/334#issuecomment-360641594): There is a way to retrospectively check the effect of different chunk sizes on the deduplication efficiency.

gchen: duplicacy (4M chunks, variable) uses x GB

In the end, the space used in the backend (covering the 3 versions, of course) was as follows: with these few (tiny) changes, Duplicati added 24 MB to the backend and Duplicacy 425 MB.

So by "average chunk size" you mean that it should not be 1M fixed chunks?

You can create a new repository with the same repository id and storage url, but add a new storage with -c 128k being the only argument, then restore the repository to the revision before the database rebuild, back up to the additional storage, and finally restore the repository to the revision after the database rebuild and back up to the additional storage (see the command sketch below).

Total data loss? I knew from the beginning that by adopting a relatively large chunk size, we were going to lose the deduplication battle to competitors.

SizeOfModifiedFiles: 877320341
SizeOfAddedFiles: 2961

But then again, you will only save space at the cost of eliminating a revision compared to duplicati.

large ones) when they are changed.

So this is basically what I'm after: what kind of software failures are we up against? By disabling compression and encryption, and applying an optimization to the hash function, they were able to achieve the same or even slightly better performance (than Duplicacy with compression and encryption), but the CPU utilization was still significantly higher.

File chunks: 178 total, 922,605K bytes; 26 new, 176,002K bytes, 124,391K bytes uploaded

I'm puzzled too. If you want to try, you can follow the steps in one of my posts above to replay the backups -- basically restoring from existing revisions in your Dropbox storage and then backing up to a local storage configured to use the variable-size chunking algorithm.

We don't know the exact figures for the 1MB-chunk test, but we assume the waste of space was reduced.

Are you worried that it might be so and are testing it, or do you think it is so but are sticking to duplicacy for other reasons?

Then the number of upload requests is also greater, which makes total uploading more time-consuming. Maybe. So the remaining 700 GB were wasted, and I decided to use them for backup. 1. Nope, the idea is to do an off-site backup.
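Going back to the replay procedure described a few paragraphs up (new repository with the same repository id and storage url, an additional -c 128k storage, then restore and back up around the database rebuild): here is a minimal sketch, assuming the standard duplicacy CLI commands (init, add, restore, backup). The repository id, storage URLs, local path and revision numbers are placeholders, not values from this thread.

```
# Replay sketch: placeholder ids, paths and revision numbers.
mkdir replay && cd replay
duplicacy init my-repo-id dropbox://backups/my-repo       # same repository id and storage url as the original backup
duplicacy add -c 128k local-128k my-repo-id /mnt/replay   # additional storage with 128K average chunk size

duplicacy restore -r 10                                   # revision before the database rebuild
duplicacy backup -storage local-128k

duplicacy restore -r 11 -overwrite                        # revision after the database rebuild
duplicacy backup -storage local-128k
```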
File chunks: 7288 total, 892,984K bytes; 2362 new, 298,099K bytes, 251,240K bytes uploaded
Metadata chunks: 6 total, 598K bytes; 6 new, 598K bytes, 434K bytes uploaded

The only problem: even with a backup this simple and small, on the second and third runs Duplicati showed me a "warning", but I checked the log, and it seems to be a behavior already known to Duplicati.
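As a side note on reading these summary lines: a small helper like the following (hypothetical, not part of either tool) can pull the totals out of a saved backup log and print how much of the file data was actually uploaded. It assumes the exact "File chunks:" format quoted above, with sizes in K and comma thousands separators; backup.log is a placeholder file name.

```
# Compute the upload ratio from the "File chunks:" summary line of a backup log.
# Strip thousands separators, drop everything up to the "File chunks:" label,
# then fields 3 and 9 are the total and uploaded sizes.
grep 'File chunks:' backup.log | tr -d ',' | sed 's/.*File chunks://' | awk '{
    total = $3; uploaded = $9
    sub(/K/, "", total); sub(/K/, "", uploaded)
    printf "uploaded %d K of %d K of file data (%.1f%%)\n", uploaded, total, 100 * uploaded / total
}'
```

For the run quoted above that comes out to roughly 251,240K of 892,984K, i.e. about 28% of the file data re-uploaded.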