Define Unicode aliases for character categories
Character categories are listed at http://www.fileformat.info/info/unicode/category/index.htm
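In Java, the regular expression matcher can be used to check whether a character belongs to a Unicode category:

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // \p{Nd} matches any character in the "Number, decimal digit" category
    final Pattern p = Pattern.compile("\\p{Nd}");
    final Matcher matcher = p.matcher("1");
    System.out.println(matcher.matches()); // prints true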
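The same check can be wrapped in a small helper that takes the category alias and a code point. This is only a sketch: the class name CategoryCheck and the method inCategory are hypothetical names chosen for illustration, and the pattern is recompiled on every call for simplicity. The code point is converted with Character.toChars so that characters outside the Basic Multilingual Plane are handled as well.

```java
import java.util.regex.Pattern;

public class CategoryCheck {

    // Hypothetical helper: true if codePoint belongs to the general category
    // named by alias (for example "Nd" or "Lu").
    static boolean inCategory(String alias, int codePoint) {
        Pattern p = Pattern.compile("\\p{" + alias + "}");
        // toChars yields a surrogate pair for supplementary code points
        String s = new String(Character.toChars(codePoint));
        return p.matcher(s).matches();
    }

    public static void main(String[] args) {
        System.out.println(inCategory("Nd", '1'));
        System.out.println(inCategory("Nd", 'a'));
        System.out.println(inCategory("Lu", 'A'));
    }
}
```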
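Using java.lang.Character, it is also possible to collect the code points defined in each category:

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    // key: category constant returned by Character.getType, value: code points in that category
    Map<Integer, List<Integer>> map = new HashMap<Integer, List<Integer>>();
    int found = 0; // total number of defined code points
    for (int i = Character.MIN_CODE_POINT; i <= Character.MAX_CODE_POINT; i++) {
        if (Character.isDefined(i)) {
            final int type = Character.getType(i);
            List<Integer> list = map.get(type);
            if (list == null) {
                list = new ArrayList<Integer>();
                map.put(type, list);
            }
            list.add(i);
            found++;
        }
    }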
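This can be done before generating the Java code. Each category alias can then be replaced by the code points stored in the map for that category. Note that the map is keyed by the int constants returned by Character.getType (for example Character.DECIMAL_DIGIT_NUMBER for Nd), so each alias first has to be translated to its constant.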
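A possible shape for this replacement step is sketched below. It is an illustration, not a finished design: the class CategoryAliases, its methods, and the decision to store contiguous code points as [first, last] ranges are all assumptions, and the alias table is deliberately partial (only Lu, Ll, Nd and Zs are mapped). Storing ranges instead of individual code points keeps the expanded character classes short.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CategoryAliases {

    // Partial, illustrative table from Character.getType values to the
    // two-letter aliases; returns null for categories not listed here.
    static String alias(int type) {
        switch (type) {
            case Character.UPPERCASE_LETTER:     return "Lu";
            case Character.LOWERCASE_LETTER:     return "Ll";
            case Character.DECIMAL_DIGIT_NUMBER: return "Nd";
            case Character.SPACE_SEPARATOR:      return "Zs";
            default:                             return null;
        }
    }

    // Scans all code points once and records, per alias, the inclusive
    // [first, last] ranges of consecutive code points in that category.
    static Map<String, List<int[]>> collectRanges() {
        Map<String, List<int[]>> ranges = new TreeMap<String, List<int[]>>();
        for (int cp = Character.MIN_CODE_POINT; cp <= Character.MAX_CODE_POINT; cp++) {
            if (!Character.isDefined(cp)) {
                continue;
            }
            String alias = alias(Character.getType(cp));
            if (alias == null) {
                continue;
            }
            List<int[]> list = ranges.get(alias);
            if (list == null) {
                list = new ArrayList<int[]>();
                ranges.put(alias, list);
            }
            int[] last = list.isEmpty() ? null : list.get(list.size() - 1);
            if (last != null && last[1] == cp - 1) {
                last[1] = cp;                    // extend the current range
            } else {
                list.add(new int[] {cp, cp});    // start a new range
            }
        }
        return ranges;
    }

    // True if cp falls inside one of the given inclusive ranges.
    static boolean contains(List<int[]> ranges, int cp) {
        for (int[] r : ranges) {
            if (r[0] <= cp && cp <= r[1]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        for (Map.Entry<String, List<int[]>> e : collectRanges().entrySet()) {
            System.out.println(e.getKey() + ": " + e.getValue().size() + " ranges");
        }
    }
}
```

The generator could call collectRanges() once at startup and substitute each alias with the ranges found for it. The exact set of code points depends on the Unicode version supported by the Java runtime in use.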