
How to migrate and process data in Java with low memory usage and better speed

I have an application that receives DBF files containing a flat table, which it normalizes, converts, and inserts into a MySQL database. The files have grown to between 250,000 and 270,000 records, the process now consumes about 1.5 - 2 GB of memory, and I have to repeat it in parts.

During the process I check whether a record already exists, using HashMaps that hold only the unique keys of the records (so I don't keep all of their information in memory). The problem is that as more DBF files are loaded into the database, the total number of records grows, and every import becomes slower and consumes much more memory; one table easily reaches 770,000 records and another exceeds 2,000,000.

example:

The DBF file has 91 fields, and the normalized database has 5 main tables, so I have to iterate over the DBF file 5 times.

Migrating a whole DBF file of 250,000 records can take me about 1 hour.

How can I minimize the time and the RAM consumption, or at least keep the RAM consumption lower? I am getting several `Exception in thread "main" java.lang.OutOfMemoryError: Java heap space` errors.


The DBF file structure has a lot of redundant data and looks like this (columns):

  • survey code
  • overall survey data
  • group data within the survey
  • details of the people

example:

| survey_code | survey_columns | group_survey | more_data | person_details |
|:------------|:---------------|:-------------|:----------|:---------------|
| 1           | ...            | 1            | ...       | 1-oscar        |
| 1           | ...            | 1            | ...       | 2-juan         |
| 1           | ...            | 2            | ...       | 1-catalina     |
| 2           | ...            | 1            | ...       | 7-john         |

Note: as you can see, there is considerable data redundancy.

The structure of the MySQL database is like this:

Survey (table)
  id
  survey_data

Group_in_survey (table)
  survey_id      
  group_data

Person (table)
  group_id
  person_details   

The import process is divided according to the tables, in the order defined by their dependencies; for each table, it runs through all the records in the DBF file.

Before starting, I query the database and load into a HashMap the information necessary to verify whether records already exist before they are added to the MySQL database.

Also, every time a record is added to the MySQL database, that record is also added to the HashMap, since part of the process is filtering out duplicates. This is done in each of the processes.
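
In outline, the duplicate check looks roughly like this (a simplified sketch: `buildKey` and `insertIntoMySql` are placeholders for the real key columns and JDBC insert of each target table):

```java
import java.util.HashMap;
import java.util.Map;

// Minimal sketch of the key-only duplicate check described above.
public class SurveyImporter {

    // natural key of the record -> generated id in MySQL
    private final Map<String, Long> existingSurveys = new HashMap<>();

    /** Returns the survey id, inserting the row only if its key was not seen before. */
    public long importSurvey(Object[] dbfRow) {
        String key = buildKey(dbfRow);
        Long id = existingSurveys.get(key);
        if (id == null) {
            id = insertIntoMySql(dbfRow);      // INSERT ... and read the generated key
            existingSurveys.put(key, id);      // remember it so later rows are skipped
        }
        return id;
    }

    private String buildKey(Object[] dbfRow) {
        return String.valueOf(dbfRow[0]);      // placeholder: e.g. the survey_code column
    }

    private long insertIntoMySql(Object[] dbfRow) {
        return 0L;                             // JDBC insert omitted in this sketch
    }
}
```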


From what I understand, you are inserting your records into the HashMap first to get rid of duplicates, and then you copy them to the MySQL database. If that is the case, you can try the HugeCollections library and its HugeMap instead of a plain HashMap.
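
A minimal sketch of the idea: code the duplicate check against the `Map` interface so that a larger-than-heap implementation such as HugeCollections' HugeMap can be dropped in place of `HashMap`. The construction of the HugeMap itself is library-specific and not shown here.

```java
import java.util.Map;

// Depend only on the Map interface; the backing map can be a HashMap today
// and a HugeMap (or another off-heap map) later, without changing this code.
public class DuplicateFilter {

    private final Map<String, Long> seen;

    public DuplicateFilter(Map<String, Long> backingMap) {
        this.seen = backingMap;
    }

    /** Records the key and returns true only the first time it is seen. */
    public boolean markIfNew(String key, long generatedId) {
        return seen.putIfAbsent(key, generatedId) == null;
    }
}
```

Today you would construct it with `new HashMap<>()`; with the library, the same filter can wrap a HugeMap instance instead.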


I think I'd prefer processing the flat file, stripping duplicates first and saving to a temporary file, then doing a full bulk import. This way you'll never run into memory issues, and you'll have a backup of your first step if anything goes wrong.
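
A sketch of that two-step approach, assuming the DBF rows are read one at a time as string arrays and that `LOAD DATA LOCAL INFILE` is enabled on both the MySQL server and the JDBC connection (the table name and key column are placeholders):

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.HashSet;
import java.util.Iterator;
import java.util.Set;

public class TwoStepImport {

    // Step 1: stream the rows, keep only the keys in memory,
    // and write each unique row to a temporary CSV file.
    static Path dedupToTempFile(Iterator<String[]> dbfRows) throws IOException {
        Path tmp = Files.createTempFile("survey", ".csv");
        Set<String> seenKeys = new HashSet<>();
        try (BufferedWriter out = Files.newBufferedWriter(tmp)) {
            while (dbfRows.hasNext()) {
                String[] row = dbfRows.next();
                String key = row[0];                 // placeholder: the survey_code column
                if (seenKeys.add(key)) {             // only the first occurrence is written
                    out.write(String.join(",", row));
                    out.newLine();
                }
            }
        }
        return tmp;
    }

    // Step 2: bulk-load the temporary file into MySQL with a single statement.
    static void bulkLoad(Connection conn, Path csv) throws SQLException {
        try (Statement st = conn.createStatement()) {
            st.execute("LOAD DATA LOCAL INFILE '" + csv.toAbsolutePath()
                    + "' INTO TABLE Survey FIELDS TERMINATED BY ','");
        }
    }
}
```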


It seems that you are doing something strange there. It looks like you are parsing the whole DBF file, loading it into memory, and then inserting the records one by one into the database (hopefully not creating a connection each time).

As far as I know, DBF files are easily read by stream readers, so there is no need to load everything into memory. I think it would be better to do it in batches: read 100 records, then insert all of them into the database. Alternatively, you can create an SQL file with the inserts and run the whole file on MySQL (but that would take loads of HDD space).

That way the whole process should take only as much memory as is needed for one batch of rows, and memory use should stay stable.
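
A sketch of that streaming + batching approach, assuming the javadbf library (`com.linuxense.javadbf`) for reading the DBF file and a JDBC `PreparedStatement` batch; the INSERT statement and column indexes are placeholders to adapt to your real tables:

```java
import com.linuxense.javadbf.DBFReader;

import java.io.FileInputStream;
import java.io.InputStream;
import java.sql.Connection;
import java.sql.PreparedStatement;

public class StreamingDbfImport {

    private static final int BATCH_SIZE = 100;

    static void importFile(String dbfPath, Connection conn) throws Exception {
        // placeholder INSERT; adjust the columns to the real table layout
        String sql = "INSERT INTO Survey (id, survey_data) VALUES (?, ?)";
        try (InputStream in = new FileInputStream(dbfPath);
             PreparedStatement ps = conn.prepareStatement(sql)) {

            DBFReader reader = new DBFReader(in);  // reads the file as a stream of rows
            int pending = 0;
            Object[] row;
            while ((row = reader.nextRecord()) != null) {
                ps.setObject(1, row[0]);           // placeholder: survey code field
                ps.setObject(2, row[1]);           // placeholder: survey data field
                ps.addBatch();
                if (++pending == BATCH_SIZE) {     // flush every 100 rows
                    ps.executeBatch();
                    pending = 0;
                }
            }
            if (pending > 0) {
                ps.executeBatch();                 // flush the final partial batch
            }
        }
    }
}
```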

I would need to know a bit more about the structure of the DBF file and your tables to say anything more.
