Author: Robert Šípek
Universal Dependencies (UD)
- a framework for consistent annotation of grammar (parts of speech, morphological features, and syntactic dependencies) across different human languages.
- cross-linguistically consistent treebank annotation for many languages
- the annotation scheme is based on:
  - an evolution of (universal) Stanford dependencies
  - Google universal part-of-speech tags (Petrov)
  - the Interset interlingua for morphosyntactic tagsets (Zeman)
- goal: provide a universal inventory of categories and guidelines so that similar constructions are annotated consistently across languages, while allowing language-specific extensions where necessary
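UD treebanks are distributed in the tab-separated CoNLL-U format, where each token line carries ten fields, including the universal POS tag, the morphological features, and the dependency head and relation. The sketch below reads one sentence using only the Python standard library. The sample sentence is hand-written for illustration and not taken from any treebank, and the helper names (parse_feats, parse_sentence) are hypothetical.

```python
# The ten CoNLL-U columns, in order.
FIELDS = ("id", "form", "lemma", "upos", "xpos",
          "feats", "head", "deprel", "deps", "misc")

# Hand-written illustrative sentence (an assumption, not real treebank data).
SAMPLE = "\n".join([
    "# text = The dog barks.",
    "1\tThe\tthe\tDET\tDT\tDefinite=Def|PronType=Art\t2\tdet\t_\t_",
    "2\tdog\tdog\tNOUN\tNN\tNumber=Sing\t3\tnsubj\t_\t_",
    "3\tbarks\tbark\tVERB\tVBZ\tNumber=Sing|Person=3|Tense=Pres\t0\troot\t_\tSpaceAfter=No",
    "4\t.\t.\tPUNCT\t.\t_\t3\tpunct\t_\t_",
])


def parse_feats(value):
    """Turn 'Number=Sing|Person=3' into a dict; '_' means no features."""
    if value == "_":
        return {}
    return dict(pair.split("=", 1) for pair in value.split("|"))


def parse_sentence(block):
    """Parse one CoNLL-U sentence block into a list of token dicts."""
    tokens = []
    for line in block.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and sentence-level comments
        columns = line.split("\t")
        if len(columns) != len(FIELDS):
            raise ValueError(f"expected {len(FIELDS)} columns, got {len(columns)}")
        token = dict(zip(FIELDS, columns))
        if not token["id"].isdigit():
            continue  # skip multiword ranges (1-2) and empty nodes (1.1)
        token["id"] = int(token["id"])
        token["head"] = int(token["head"])
        token["feats"] = parse_feats(token["feats"])
        tokens.append(token)
    return tokens


tokens = parse_sentence(SAMPLE)
forms = {t["id"]: t["form"] for t in tokens}
forms[0] = "ROOT"  # head 0 marks the root of the sentence
for t in tokens:
    print(f'{t["form"]:<6} {t["upos"]:<6} {t["deprel"]:<6} head={forms[t["head"]]}')
```

Because the column layout and the universal tag and relation inventories are shared, the same reader should work on a treebank for any language; only the language-specific XPOS column and optional relation subtypes differ.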
spaCy: New language & Training model
Statistical models
- predict linguistic attributes in context:
  - part-of-speech tags
  - syntactic dependencies
  - named entities
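A minimal sketch of reading those three kinds of predictions from a loaded spaCy pipeline. It assumes spaCy and the English model en_core_web_sm are installed; the helper names annotate and main, and the example sentence, are illustrative choices. The spaCy import sits inside main() so the helper itself has no third-party dependency.

```python
def annotate(nlp, text):
    """Run a loaded pipeline on text and collect its predictions."""
    doc = nlp(text)
    # Per token: surface form, coarse POS tag, dependency label, head word.
    tokens = [(t.text, t.pos_, t.dep_, t.head.text) for t in doc]
    # Per named entity: the entity text and its predicted label.
    entities = [(e.text, e.label_) for e in doc.ents]
    return tokens, entities


def main():
    import spacy  # third-party; assumed to be installed

    nlp = spacy.load("en_core_web_sm")  # assumed model name
    tokens, entities = annotate(nlp, "Apple is looking at buying a U.K. startup.")
    for text, pos, dep, head in tokens:
        print(f"{text:<10} {pos:<6} {dep:<10} head={head}")
    for text, label in entities:
        print(f"{text:<10} {label}")
```

Calling main() prints one row per token and one per entity. The exact tags and labels depend on the model and its version, so no output is shown here.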
spaCy: Introduction
spaCy: Dependency Parsing
displaCy ENT
- JS library
- fetches JSON-formatted named entity annotations and transforms them into semantic HTML
- wraps each entity in <mark>, the HTML5 element for highlighted text
- assigns each entity the data attribute data-entity
- displays and styles the labels using only CSS selectors
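The transformation described above can be sketched in a few lines of Python. This is not the displaCy ENT library itself, which is JavaScript. It is an illustration that assumes each JSON annotation has start, end, and type keys, that spans do not overlap, and that labels are lowercased in the attribute. The function name render_entities is hypothetical.

```python
import json
from html import escape


def render_entities(text, spans):
    """Wrap each annotated span of text in a <mark data-entity="..."> element."""
    parts = []
    cursor = 0
    for span in sorted(spans, key=lambda s: s["start"]):
        # Plain text between the previous entity and this one.
        parts.append(escape(text[cursor:span["start"]]))
        entity = escape(text[span["start"]:span["end"]])
        label = escape(span["type"].lower(), quote=True)
        parts.append(f'<mark data-entity="{label}">{entity}</mark>')
        cursor = span["end"]
    parts.append(escape(text[cursor:]))  # trailing text after the last entity
    return "".join(parts)


text = "Ada Lovelace was born in London."
# Assumed annotation shape: character offsets plus an entity type.
spans = json.loads(
    '[{"start": 0, "end": 12, "type": "PERSON"},'
    ' {"start": 25, "end": 31, "type": "GPE"}]'
)
markup = render_entities(text, spans)
print(markup)
# <mark data-entity="person">Ada Lovelace</mark> was born in <mark data-entity="gpe">London</mark>.
```

With the label stored in the attribute, a stylesheet can color one entity type with a selector such as mark[data-entity="person"] and show the label with content: attr(data-entity) on a pseudo-element, so no extra markup is needed.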
NLP
- information extraction systems
- natural language understanding systems
- NLP is a subfield of artificial intelligence concerned with interactions between computers and human languages
- NLP is the process by which computers analyze, understand, and derive meaning from human language