Has anyone been in a similar situation?
I have a ClickHouse database with 300 million rows (~500 GB), and my task is to migrate the data to a StarRocks database. Both expose a MySQL-compatible client interface.
The problem is that the ClickHouse schema is just a string column holding a JSON representation, while the target database has 10 tables, so I have to parse the JSON and route its properties to the appropriate table.
My method: export 1 million records at a time as a CSV file (because it's faster than a plain SELECT statement), keep a cursor so the next run pulls the next million, process the data in Python, and send it as a PUT request to StarRocks, since StarRocks exposes an HTTP endpoint for loading files (this is the fastest way I found).
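For context, a minimal sketch of the load side, assuming StarRocks' Stream Load HTTP endpoint (`PUT /api/{db}/{table}/_stream_load`); the FE host, database/table names, and credentials here are hypothetical placeholders:

```python
import base64
import urllib.request

STARROCKS_FE = "http://starrocks-fe:8030"  # hypothetical FE host/port
DB = "target_db"                           # hypothetical database name

def stream_load_url(db: str, table: str, fe: str = STARROCKS_FE) -> str:
    """Build the Stream Load endpoint URL for one target table."""
    return f"{fe}/api/{db}/{table}/_stream_load"

def load_csv(table: str, csv_bytes: bytes, label: str,
             user: str = "root", password: str = "") -> None:
    """PUT one CSV batch into a StarRocks table via Stream Load."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    req = urllib.request.Request(
        stream_load_url(DB, table),
        data=csv_bytes,
        method="PUT",
        headers={
            "label": label,               # unique per load; lets StarRocks dedupe retries
            "column_separator": ",",
            "Expect": "100-continue",     # Stream Load expects this header
            "Authorization": f"Basic {token}",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())      # Stream Load replies with a JSON status body

# e.g. load_csv("events", b"1,foo\n2,bar\n", "events-batch-00042")
```

One batch per table per run, each with a unique label, makes retries after a crash safe.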
The problem is that once I pass ~30 million rows, pulling a batch of 1 million goes from about 1 second to 20 minutes, and past ~50 million it takes around 40 minutes. Any solutions, please?
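To clarify the symptom, here is a sketch of the two paging strategies (table and column names are hypothetical): OFFSET-based paging degrades with depth because the server still has to read and discard every skipped row, while keyset paging on a monotonic column resumes from the last seen value and costs roughly the same per page:

```python
# Hypothetical source table `events` with a monotonic numeric column `id`.
BATCH = 1_000_000

def offset_page_query(page: int) -> str:
    """OFFSET paging: cost grows with depth, since skipped rows are still read."""
    return (f"SELECT raw_json FROM events ORDER BY id "
            f"LIMIT {BATCH} OFFSET {page * BATCH}")

def keyset_page_query(last_id: int) -> str:
    """Keyset paging: resumes after the last seen id, so page cost stays flat."""
    return (f"SELECT raw_json, id FROM events WHERE id > {last_id} "
            f"ORDER BY id LIMIT {BATCH}")
```

At page 30 the OFFSET variant asks the server to skip 30 million rows, which matches the point where the pulls started slowing down.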
submitted by /u/Youth-Character