it.jrc.entitymatcher
Class JRCNames

java.lang.Object
  extended by it.jrc.entitymatcher.JRCNames

public class JRCNames
extends java.lang.Object

Copyright 2011 Joint Research Centre of the European Commission

Extract from the full LICENSE AGREEMENT

By using, copying or distributing this Software or any portion thereof, YOU (the "User") ACCEPT ALL TERMS AND CONDITIONS OF THIS LICENCE, including in particular the limitations on use, transferability, warranty and liability.

The following terms and conditions are enforceable against you and any legal entity that obtained the Software and on whose behalf it is used. If you are agreeing to these terms on behalf of a company or other legal entity, you represent that you have the legal authority to bind that company or legal entity to these terms. IF YOU DO NOT HAVE SUCH AUTHORITY OR IF YOU DO NOT WISH TO BE BOUND TO THESE TERMS DO NOT USE THIS SOFTWARE.

The European Union (hereinafter "the Licensor") is the owner of the copyright and other intellectual and industrial property rights, trade secrets, and know-how related to the software JRC-Names over which it has the power of disposal regardless of geographical or other limitations.


Constructor Summary
JRCNames()
          Default constructor.
 
Method Summary
 void initKnownEntities(java.io.BufferedReader br, java.util.Hashtable<java.lang.String,java.lang.StringBuilder> entities, java.util.HashMap<java.lang.Integer,java.lang.String> names, java.util.ArrayList<java.lang.String> code2lan)
          Parse the entities file.
static void main(java.lang.String[] args)
          Command-line entry point: dumps all name variants for a given name, or parses a text file.
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

JRCNames

public JRCNames()
Default constructor.

Method Detail

initKnownEntities

public void initKnownEntities(java.io.BufferedReader br,
                              java.util.Hashtable<java.lang.String,java.lang.StringBuilder> entities,
                              java.util.HashMap<java.lang.Integer,java.lang.String> names,
                              java.util.ArrayList<java.lang.String> code2lan)
                       throws java.lang.Exception
Parse the entities file.

Parameters:
br - a buffered reader for the entities file, used to read the file line by line
entities - a Hashtable in which the information structure associated with each entity is stored
names - contains the 'preferred display name' for each entity ID
code2lan - stores the language codes. The INDEX of a language code in this list is what is stored in the entity information structure.
Throws:
java.lang.Exception - on any failure, I/O or otherwise; the method declares a single broad exception type rather than specific ones
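
The arguments described above could be prepared roughly as follows. This is a minimal sketch, not code from the library: the class EntityFileSetup and the method openGzip are hypothetical names introduced for illustration, and the UTF-8 encoding of the entities file is an assumption (the documentation only states that the file is gzip-compressed).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.Hashtable;
import java.util.zip.GZIPInputStream;

// Hypothetical helper, not part of it.jrc.entitymatcher.
final class EntityFileSetup {

    // Empty containers whose types match the parameters of initKnownEntities.
    final Hashtable<String, StringBuilder> entities = new Hashtable<>();
    final HashMap<Integer, String> names = new HashMap<>();
    final ArrayList<String> code2lan = new ArrayList<>();

    // Opens a gzip-compressed text file for line-by-line reading.
    // Assumption: the entities file is UTF-8 encoded.
    static BufferedReader openGzip(Path file) throws IOException {
        InputStream raw = Files.newInputStream(file);
        try {
            return new BufferedReader(
                    new InputStreamReader(new GZIPInputStream(raw), StandardCharsets.UTF_8));
        } catch (IOException e) {
            raw.close(); // do not leak the file handle if the gzip header is invalid
            throw e;
        }
    }
}
```

With JRC-Names on the classpath, the call would then presumably take the form new JRCNames().initKnownEntities(br, setup.entities, setup.names, setup.code2lan), where br comes from openGzip and setup is an EntityFileSetup instance.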

main

public static void main(java.lang.String[] args)
                 throws java.lang.Exception
Command-line entry point: dumps all name variants for a given name, or parses a text file.

Parameters:
args - the command-line arguments:
    entity file - the gzip file containing the JRC names definition
    Option 1:
        name - dumps all name variants for this name
    Option 2:
        text file - UTF-8 encoded file containing the text to be parsed
        language code - the language code that corresponds to the language of the text file
Throws:
java.lang.Exception
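
The two invocation options can be pictured as a small argument dispatcher. The sketch below is illustrative only: the class JrcNamesArgs, its Mode enum, and the rule that two arguments select the name lookup while three select text parsing are all assumptions, since the documentation does not state how main tells the two options apart or in what order it expects the arguments.

```java
// Hypothetical argument holder, not part of it.jrc.entitymatcher.
final class JrcNamesArgs {

    enum Mode { NAME_LOOKUP, TEXT_PARSE }

    final Mode mode;
    final String entityFile;   // gzip file with the JRC names definition
    final String name;         // set only in NAME_LOOKUP mode
    final String textFile;     // set only in TEXT_PARSE mode
    final String languageCode; // set only in TEXT_PARSE mode

    private JrcNamesArgs(Mode mode, String entityFile, String name,
                         String textFile, String languageCode) {
        this.mode = mode;
        this.entityFile = entityFile;
        this.name = name;
        this.textFile = textFile;
        this.languageCode = languageCode;
    }

    // Assumption: the argument count distinguishes the two options.
    static JrcNamesArgs parse(String[] args) {
        if (args.length == 2) {
            return new JrcNamesArgs(Mode.NAME_LOOKUP, args[0], args[1], null, null);
        }
        if (args.length == 3) {
            return new JrcNamesArgs(Mode.TEXT_PARSE, args[0], null, args[1], args[2]);
        }
        throw new IllegalArgumentException(
                "expected: <entity file> <name> | <entity file> <text file> <language code>");
    }
}
```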