Replicator Fix: Backfill Jobs Now Retry Failed Hub Calls
An exponential backoff on Hub calls during backfill aims to prevent the data loss seen when gRPC requests failed and were never retried.
Commit Details:
fix(replicator): handle Hub errors in backfill (#1809)

## Motivation

I tried to sync the replicator several times and found that data can be lost during the initial backfill: if a gRPC request to the Hub fails, the request does not appear to be retried and is silently dropped.

## Change Summary

I added exponential backoff to the backfill jobs.

## Merge Checklist

_Choose all relevant options below by adding an `x` now or at any time before submitting for review_

- [X] PR title adheres to the [conventional commits](https://www.conventionalcommits.org/en/v1.0.0/) standard
- [X] PR has a [changeset](https://github.com/farcasterxyz/hub-monorepo/blob/main/CONTRIBUTING.md#35-adding-changesets)
- [ ] PR has been tagged with a change label(s) (e.g. documentation, feature, bugfix, or chore)
- [ ] PR includes [documentation](https://github.com/farcasterxyz/hub-monorepo/blob/main/CONTRIBUTING.md#32-writing-docs) if necessary.
- [X] All [commits have been signed](https://github.com/farcasterxyz/hub-monorepo/blob/main/CONTRIBUTING.md#22-signing-commits)

## Additional Context

During the initial backfill, the logs contained many errors like these:

```
ERROR (1): Job failed {"jobName":"BackfillFidRegistration","jobId":"246281","reason":"Unable to backfill","errorName":"Error","errorMessage":"Unable to backfill","errorStack":"Error: Unable to backfill\n at getOnChainEventsByFidInBatchesOf (/home/node/app/apps/replicator/build/hub.js:18:23)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async Object.run (/home/node/app/apps/replicator/build/jobs/backfillFidRegistration.js:19:26)\n at async /home/node/app/node_modules/bullmq/dist/cjs/classes/child-processor.js:69:33"}
ERROR (1): Job failed {"jobName":"BackfillFidUserData","jobId":"5069991","reason":"Unable to fetch UserData messages for FID 69477","errorName":"Error","errorMessage":"Unable to fetch UserData messages for FID 69477","errorStack":"Error: Unable to fetch UserData messages for FID 69477\n at getUserDataByFidInBatchesOf (/home/node/app/apps/replicator/build/hub.js:129:19)\n at process.processTicksAndRejections (node:internal/process/task_queues:95:5)\n at async Object.run (/home/node/app/apps/replicator/build/jobs/backfillFidUserData.js:12:26)\n at async /home/node/app/node_modules/bullmq/dist/cjs/classes/child-processor.js:69:33"}
```

Constraint errors still appear after the backfill, but the initial backfill itself needs to be more resilient.

---

## PR-Codex overview

The focus of this PR is to enhance error handling in the `replicator` app when interacting with the Hub.

### Detailed summary

- Added a `retryHubCallWithExponentialBackoff` function that handles errors with an exponential backoff strategy.
- Updated functions to retry calls to Hub methods with the backoff mechanism when they fail.
- Improved error messages for backfilling events, proofs, casts, reactions, links, verifications, and user data.
- Added warning logs during the retry process.
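The Change Summary only states that backoff was added, and the PR-Codex summary names a `retryHubCallWithExponentialBackoff` helper without showing it. The TypeScript below is a minimal sketch of the general technique, not the code merged in #1809: `retryWithExponentialBackoff`, `backoffDelayMs`, `RetryOptions`, `Page`, and `fetchAllPages` are hypothetical names, and the default attempt count and delays are arbitrary. It assumes a failed Hub call surfaces as a rejected promise; if the client reports failures as an error result instead, the wrapped call would need to turn that into a thrown `Error` first. Each page of a batched fetch (in the spirit of `getUserDataByFidInBatchesOf`) is retried on its own, so a transient failure on one page is retried rather than ending the job with the remaining pages unfetched.

```typescript
// Minimal sketch, NOT the code merged in #1809. All names here are hypothetical.

// Hypothetical retry settings; the defaults below are arbitrary examples.
interface RetryOptions {
  maxAttempts: number; // total attempts, including the first call
  baseDelayMs: number; // wait before the first retry
  maxDelayMs: number; // upper bound for any single wait
}

const defaultRetryOptions: RetryOptions = { maxAttempts: 5, baseDelayMs: 500, maxDelayMs: 30000 };

// Wait before retry number `retry` (0-based): baseDelayMs * 2^retry, capped at maxDelayMs.
function backoffDelayMs(retry: number, opts: RetryOptions): number {
  return Math.min(opts.baseDelayMs * 2 ** retry, opts.maxDelayMs);
}

function sleep(ms: number): Promise<void> {
  return new Promise<void>((resolve) => setTimeout(resolve, ms));
}

// Calls `call` until it resolves or maxAttempts is used up, waiting longer after each failure.
// Assumes a failed Hub call rejects; after the last attempt the error is rethrown (wrapped)
// so the job fails loudly instead of silently dropping data.
async function retryWithExponentialBackoff<T>(
  call: () => Promise<T>,
  label: string,
  opts: RetryOptions = defaultRetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt < opts.maxAttempts; attempt++) {
    try {
      return await call();
    } catch (err) {
      lastError = err;
      if (attempt === opts.maxAttempts - 1) break; // out of attempts
      const delay = backoffDelayMs(attempt, opts);
      console.warn(`${label}: attempt ${attempt + 1}/${opts.maxAttempts} failed, retrying in ${delay} ms`);
      await sleep(delay);
    }
  }
  throw new Error(`${label} failed after ${opts.maxAttempts} attempts: ${String(lastError)}`);
}

// Hypothetical shape of one page of a batched Hub response.
interface Page<T> {
  items: T[];
  nextPageToken?: string;
}

// Collects every page, retrying each page request independently, so a transient
// error on one page is retried instead of ending the loop with pages missing.
async function fetchAllPages<T>(
  fetchPage: (pageToken?: string) => Promise<Page<T>>,
  label: string,
  opts: RetryOptions = defaultRetryOptions,
): Promise<T[]> {
  const all: T[] = [];
  let token: string | undefined = undefined;
  do {
    const page: Page<T> = await retryWithExponentialBackoff(() => fetchPage(token), label, opts);
    all.push(...page.items);
    token = page.nextPageToken;
  } while (token);
  return all;
}
```

Capping each wait with `maxDelayMs` keeps a long Hub outage from producing ever-longer pauses. A production version might also add random jitter so that many workers do not retry in lockstep; this sketch omits it.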