Thursday, June 23, 2016

Parsing fixed width text with Java

...the unnecessarily complicated way.

Recently I was asked to parse text files with fixed-width columns (see also ISAM).
The task is menial, but I had two different schemas and fairly large files (77 fields and about 21,000 records).
The schemas were described in a table with the name and the width of each field.
Here's what I did:

  1. pasted the table into Excel
  2. used a formula to clean up the field names (removing spaces, hyphens, parentheses, whatever)
  3. used another formula to build a Java declaration for each field
  4. created a class (with Lombok) with the record structure

Here's a (short) example.

field name | field width | field name (Java) | Java declaration
flg-store  | 1           | flg_store         | @Column(width=1, index=47) private String flg_store;
Code NEW   | 10          | code_new          | @Column(width=10, index=58) private String code_new;
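
The clean-up and declaration-building formulas from steps 2 and 3 are not shown in the post, but the same transformation can be sketched in Java. This is only an illustration: `DeclarationBuilder`, `toJavaName` and `toDeclaration` are hypothetical names, and the rule "lowercase, then replace every run of non-alphanumeric characters with an underscore" is an assumption about what the Excel formula did.

```java
import java.util.Locale;

// Hypothetical helper: mirrors the Excel formulas, not the author's actual code.
class DeclarationBuilder {

    // Turn a raw field name such as "Code NEW" into a Java identifier such as "code_new".
    static String toJavaName(String rawName) {
        String cleaned = rawName.trim().toLowerCase(Locale.ROOT)
                .replaceAll("[^a-z0-9]+", "_")  // spaces, hyphens, parentheses become "_"
                .replaceAll("^_+|_+$", "");     // drop leading and trailing underscores
        // A Java identifier cannot be empty or start with a digit, so add a prefix if needed.
        if (cleaned.isEmpty() || Character.isDigit(cleaned.charAt(0))) {
            return "f_" + cleaned;
        }
        return cleaned;
    }

    // Build the annotated field declaration for one row of the schema table.
    static String toDeclaration(String rawName, int width, int index) {
        return String.format("@Column(width=%d, index=%d) private String %s;",
                width, index, toJavaName(rawName));
    }

    public static void main(String[] args) {
        System.out.println(toDeclaration("flg-store", 1, 0));
        System.out.println(toDeclaration("Code NEW", 10, 1));
    }
}
```

With 77 fields a spreadsheet formula is probably still the quickest route; a helper like this mostly pays off if the schema changes often.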

And here's the class I built:

import lombok.Data;
@Data public class Item {
 @Column(width=1, index=0) private String flg_store;
 @Column(width=10, index=1) private String code_new;
}

So the objects of this class will hold the data parsed from each record. The instructions for parsing are attached to each field via annotations.
Of course any other configuration method would do (properties, XML, JSON, even the Excel file), but I find that having the code and the schema all in the same place is very effective.
Here is the annotation:

import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;

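@Retention(RetentionPolicy.RUNTIME) // must be visible to reflection at runtime
@Target(ElementType.FIELD)
public @interface Column {
 public int width();
 public int index();
}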

OK, so how do we use this information? I found it easiest to first build a list of parsing instructions and then use it to parse each line into an object.


import it.digiwrite.anoto.utility.ReflectionHelper;

import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

import lombok.Data;

public class FixedWidthParser<T> {
 @Data static class FieldDescriptor {
  int index;
  int width;
  String fieldName;
 }

 // parsing instructions, one per annotated field, sorted by column index
 List<FieldDescriptor> mapIndexFieldName;

 public FixedWidthParser(Class<T> what) {
  mapIndexFieldName = new ArrayList<FieldDescriptor>();
  for (Field field : what.getDeclaredFields()) {
   Column column = field.getAnnotation(Column.class);
   if (column == null) continue; // not a column, skip it
   FieldDescriptor fd = new FieldDescriptor();
   fd.index = column.index();
   fd.width = column.width();
   fd.fieldName = field.getName();
   mapIndexFieldName.add(fd);
  }
  // getDeclaredFields() makes no promise about ordering, so sort explicitly
  mapIndexFieldName.sort(Comparator.comparingInt(FieldDescriptor::getIndex));
 }

 public T parseLine(String line, Class<T> cla22) {
  try {
   int lastPosition = 0;
   T item = cla22.getDeclaredConstructor().newInstance();
   ReflectionHelper rh = new ReflectionHelper(item); // 1
   for (int i = 0; i < mapIndexFieldName.size(); i++) {
    FieldDescriptor fd = mapIndexFieldName.get(i);
    String token = line.substring(lastPosition, lastPosition + fd.width);
    lastPosition = lastPosition + fd.width;
    rh.set(fd.fieldName, token.trim()); // 2
   }
   return item;
  } catch (Exception e) {
   // keep the original exception as the cause
   throw new RuntimeException("Error parsing line: " + line, e);
  }
 }
}
  1. ReflectionHelper is a utility class to wrap an object; here it is used only to...
  2. ...easily set a field via reflection
  3. this code uses generics
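
Since ReflectionHelper is not shown here, the following is a minimal self-contained sketch of the same idea using only `java.lang.reflect`. All the names in it (`Col`, `SampleRecord`, `PlainReflectionParser`) are hypothetical stand-ins chosen so that they do not clash with the classes above, and setting the field directly with `Field.set` is an assumption about what the helper does internally.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Constructor;
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical stand-in for the Column annotation.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.FIELD)
@interface Col {
    int width();
    int index();
}

// Hypothetical record with the same layout as the example line "Y0000011111".
class SampleRecord {
    @Col(width = 1, index = 0) String flag;
    @Col(width = 10, index = 1) String code;
}

class PlainReflectionParser<T> {
    private final Class<T> type;
    private final List<Field> columns = new ArrayList<>();

    PlainReflectionParser(Class<T> type) {
        this.type = type;
        for (Field f : type.getDeclaredFields()) {
            if (f.isAnnotationPresent(Col.class)) {
                f.setAccessible(true); // allow writing private fields
                columns.add(f);
            }
        }
        // Reflection does not guarantee declaration order, so sort by the declared index.
        columns.sort(Comparator.comparingInt((Field f) -> f.getAnnotation(Col.class).index()));
    }

    T parseLine(String line) {
        try {
            Constructor<T> ctor = type.getDeclaredConstructor();
            ctor.setAccessible(true);
            T item = ctor.newInstance();
            int position = 0;
            for (Field f : columns) {
                int width = f.getAnnotation(Col.class).width();
                // Cut the next column out of the line and store it, trimmed, in the field.
                f.set(item, line.substring(position, position + width).trim());
                position += width;
            }
            return item;
        } catch (ReflectiveOperationException | IndexOutOfBoundsException e) {
            throw new RuntimeException("Error parsing line: " + line, e);
        }
    }
}
```

Calling `new PlainReflectionParser<SampleRecord>(SampleRecord.class).parseLine("Y0000011111")` should give a record whose `flag` is "Y" and whose `code` is "0000011111". A line shorter than the sum of the widths ends up in the `RuntimeException` branch.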

And how do we use this?


import static org.junit.Assert.assertEquals;

import org.junit.Test;

public class FixedWidthParserTest {
 @Test
 public void flusso() {
  String line = "Y0000011111";
  FixedWidthParser<Item> fwp = new FixedWidthParser<Item>(Item.class);
  Item item = fwp.parseLine(line, Item.class);
  assertEquals("Y", item.getFlg_store());
  assertEquals("0000011111", item.getCode_new());
 }
}
