About Koen Serneels
Last week I was asked to write something in Java that is able to split a single 30GB XML file into smaller parts of configurable file size. The consumer of the file is going to be a middle-ware application that has problems with the large size of the XML. Under the hood it uses some kind of DOM parsing technique that causes it to run out of memory after a while. Since it’s vendor based middle-ware, we are not able to correct this ourselves. Our best option is to create some pre-processing tool that will first split the big file in multiple smaller chunks before they are processed by the middle-ware.
The XML file comes with a corresponding W3C schema, consisting of a mandatory header part followed by a content element which has several 0..* data elements nested. For the demo code I re-created the schema in simplified form:
The header is neglectable in size. A single data element repetition is also pretty small, lets say less less then 50kB. The XML is so big because of the number of repetitions of the data element. The requirements are that:
Source : http://www.javacodegeeks.com/2013/08/splitting-large-xml-files-in-java.html