To use Amazon Comprehend’s custom entity recognition:
- Provide a dataset for training the model: either a set of annotated documents, or a list of entities with their type labels (such as PRODUCT_CODES) plus a set of documents containing those entities.
- You can train a model on up to 12 custom entities at once.
- Once your model is trained, each entity detection job can search for up to 12 of the custom entities you trained for.
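The two training-data options above can be sketched as a small helper that assembles the parameters for a training request. This is only an illustration: `build_recognizer_request` and its arguments are hypothetical names, the 12-type limit is taken from these notes, and the dictionary keys follow the shape of the boto3 `create_entity_recognizer` call as I understand it, so check them against the current API reference before use.

```python
# Limit as stated in these notes (assumption: verify against current AWS quotas).
MAX_ENTITY_TYPES = 12


def build_recognizer_request(name, role_arn, entity_types, documents_uri,
                             *, annotations_uri=None, entity_list_uri=None,
                             language_code="en"):
    """Assemble keyword arguments for a custom entity recognizer training job.

    Hypothetical helper: it only builds a dictionary and makes no AWS call.
    Exactly one of annotations_uri / entity_list_uri must be given, mirroring
    the "annotated documents OR entity list" choice described above.
    """
    if (annotations_uri is None) == (entity_list_uri is None):
        raise ValueError("provide exactly one of annotations_uri or entity_list_uri")
    if not 1 <= len(entity_types) <= MAX_ENTITY_TYPES:
        raise ValueError(f"expected 1 to {MAX_ENTITY_TYPES} entity types")

    input_config = {
        "EntityTypes": [{"Type": t} for t in entity_types],
        "Documents": {"S3Uri": documents_uri},
    }
    if annotations_uri is not None:
        input_config["Annotations"] = {"S3Uri": annotations_uri}
    else:
        input_config["EntityList"] = {"S3Uri": entity_list_uri}

    return {
        "RecognizerName": name,
        "DataAccessRoleArn": role_arn,
        "InputDataConfig": input_config,
        "LanguageCode": language_code,
    }


if __name__ == "__main__":
    # Placeholder bucket, role, and recognizer names for illustration only.
    request = build_recognizer_request(
        "demo-recognizer",
        "arn:aws:iam::123456789012:role/ExampleComprehendRole",
        ["PRODUCT_CODES"],
        "s3://example-bucket/documents/",
        entity_list_uri="s3://example-bucket/entity_list.csv",
    )
    print(request["InputDataConfig"])
```

With AWS credentials configured, the resulting dictionary could presumably be passed on as `boto3.client("comprehend").create_entity_recognizer(**request)`; that step is left out here so the sketch runs without network access.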
Training Custom Entity Recognizers
- Custom entity recognition analyzes your documents to find entities specific to your needs, rather than limiting you to the preset entity types already available.
- You can identify almost any kind of entity by providing enough training data for your model to learn it effectively.
Two ways to provide data to Amazon Comprehend
- Annotations
- Uses an annotation list that gives the location of your entities in a large number of documents, so Amazon Comprehend can train on both the entity and its context.
- Increases accuracy, because the model sees the context around the custom entity you are seeking.
- Useful when the meaning of an entity is ambiguous and context-dependent.
- For example, "Amazon" could refer either to the river in Brazil or to the online retailer Amazon.com.
- Entity Lists
- Provides only limited context: Amazon Comprehend trains on just a list of the specific entities in order to identify the custom entity.
- Supplied as a comma-separated values (CSV) file with the following columns:
- Text – The text of an entry example exactly as seen in the accompanying document corpus
- e.g. Brno
- Type – The customer-defined entity type.
- An uppercase, underscore-separated string such as MANAGER or SENIOR_MANAGER
- Up to 12 entity types can be trained per model.
- A minimum of 200 entity matches are needed per entity
- e.g. CITY
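As a rough illustration of the entity list format described above, the sketch below writes a two-column CSV and checks the constraints from these notes. The function names, the regular expression for type names, and the plain substring matching used to count occurrences are all assumptions made for the example; they are not how Amazon Comprehend itself validates or matches entities.

```python
import csv
import io
import re
from collections import Counter

# Assumption: "uppercase, underscore separated" is read as letters joined by single underscores.
TYPE_PATTERN = re.compile(r"^[A-Z]+(?:_[A-Z]+)*$")
MAX_ENTITY_TYPES = 12        # limit per model, as stated in these notes
MIN_MATCHES_PER_TYPE = 200   # minimum matches per entity, as stated in these notes


def build_entity_list_csv(entries):
    """Return CSV text with a Text,Type header from (text, type) pairs."""
    entries = list(entries)
    types = {etype for _, etype in entries}
    for etype in types:
        if not TYPE_PATTERN.match(etype):
            raise ValueError(f"entity type {etype!r} is not uppercase/underscore separated")
    if len(types) > MAX_ENTITY_TYPES:
        raise ValueError(f"more than {MAX_ENTITY_TYPES} entity types")

    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["Text", "Type"])
    writer.writerows(entries)
    return buffer.getvalue()


def count_matches(entries, documents):
    """Count exact substring occurrences of each entry text, summed per type."""
    counts = Counter()
    for text, etype in entries:
        counts[etype] += sum(doc.count(text) for doc in documents)
    return counts


def types_below_minimum(entries, documents):
    """List the entity types with fewer matches in the corpus than the minimum."""
    counts = count_matches(entries, documents)
    return sorted(t for t in {e for _, e in entries} if counts[t] < MIN_MATCHES_PER_TYPE)


if __name__ == "__main__":
    sample = [("Brno", "CITY"), ("Jane Doe", "SENIOR_MANAGER")]
    print(build_entity_list_csv(sample))
    print(types_below_minimum(sample, ["Jane Doe moved to Brno."]))
```

Running `types_below_minimum` on a real corpus before training gives an early warning when an entity type has too few examples, although the actual match count used by the service may differ from this simple substring count.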
Sources
- https://docs.aws.amazon.com/comprehend/latest/dg/training-recognizers.html