--- Parsers ---

-- spaCy --
|- Philosophy - Fast, easy to use, industrial strength
|- Models - Pre-trained: "en_core_web_trf", "en_core_web_sm/md/lg"
|- Output - Dependency labels by default; the English models use a ClearNLP-style
   label set (e.g. "dobj", "pobj") rather than strict Universal Dependencies (UD)
|- Use - Very easy; a few lines of code parse a sentence and expose its dependencies
|- Integration - Works very well with the Python data science stack; networkx integrates easily
|- Performance - Larger models are very accurate, smaller models are very fast

-- Stanford Stanza --
|- Philosophy - Native Python, modern successor to Stanford CoreNLP
   Research oriented, highly accurate
|- Models - Pre-trained on different treebanks; can handle complex grammatical structures
|- Output - Universal Dependencies
|- Use - Good API; clean and fits well with Python
|- Integration - Integrates well with other Python libraries
|- Performance - Accuracy is among the best available
   Slower than spaCy's non-transformer models

-- AllenNLP --
|- Philosophy - Research first, built on Python
   Designed for "state of the art" deep learning models in NLP
   "Go-to choice if you plan to modify or train your own models"
|- Models - Biaffine dependency parser is the most widely used (highly accurate)
|- Output - Universal Dependencies
|- Use - More difficult than spaCy or Stanza; requires a better understanding of the library's abstractions
|- Integration - Excellent in the Python ecosystem; overkill if only a pre-trained model is needed
|- Performance - State-of-the-art accuracy; inference can be slower due to model complexity

-- Spark NLP --
|- Philosophy - Built on Apache Spark for scalable, distributed NLP processing
   For massive datasets in a distributed computing environment
|- Models - Provides its own annotated models, often with transformer architectures
|- Output - Universal Dependencies
|- Use - Good if familiar with the Spark ML API
   Setup is more involved than for the pure Python libraries
|- Integration - Ideal for big data pipelines
   Unnecessarily heavy for single-corpus analysis
|- Performance - Very high accuracy
   Designed for speed and scale on clusters

-- Overall --
|- spaCy or Stanza -
   spaCy is much simpler to set up and use
    - pick it for a robust, highly accurate system that can be set up quickly and relatively simply
   Stanza is more complex and needs more setup
    - pick it to maximise baseline parsing accuracy in exchange for speed and simplicity

-- Choice --
|- spaCy - Use spaCy initially; if parsing errors appear, switch to Stanza to check the issues
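
-- Sketch: cross-checking two parses --
One possible way to do the "switch to Stanza to check issues" step is to reduce both parses to a common arc format and list the tokens whose heads differ. The sketch below is only an illustration under stated assumptions: `Arc`, `spacy_arcs`, `stanza_arcs` and `head_disagreements` are hypothetical helper names, both parsers are assumed to tokenize the sentence identically, and the sample arcs are hand-written rather than real parser output. The two adapters rely on `Token.i`, `Token.head` and `Token.dep_` in spaCy and on `Sentence.words`, `Word.head` and `Word.deprel` in Stanza, but they are not exercised here.

```python
from __future__ import annotations

from typing import NamedTuple


class Arc(NamedTuple):
    """One dependency arc in a parser-neutral form."""
    index: int  # 0-based position of the token in the sentence
    text: str
    head: int   # 0-based position of the head; -1 marks the root
    label: str


def spacy_arcs(doc) -> list[Arc]:
    """Convert a single-sentence spaCy Doc (in spaCy the root is its own head)."""
    return [
        Arc(t.i, t.text, -1 if t.head.i == t.i else t.head.i, t.dep_)
        for t in doc
    ]


def stanza_arcs(sentence) -> list[Arc]:
    """Convert a Stanza Sentence (Stanza heads are 1-based, 0 is the root)."""
    return [
        Arc(i, w.text, w.head - 1, w.deprel)
        for i, w in enumerate(sentence.words)
    ]


def head_disagreements(a: list[Arc], b: list[Arc]) -> list[tuple[Arc, Arc]]:
    """Return the arc pairs whose heads differ; tokenization must match."""
    if [x.text for x in a] != [x.text for x in b]:
        raise ValueError("tokenizations differ; align tokens before comparing")
    return [(x, y) for x, y in zip(a, b) if x.head != y.head]


# Hand-written example arcs (NOT real parser output) for
# "I saw the man with a telescope", differing only in where
# "telescope" attaches.
words = ["I", "saw", "the", "man", "with", "a", "telescope"]
parse_a = [Arc(i, w, h, l) for i, (w, (h, l)) in enumerate(zip(words, [
    (1, "nsubj"), (-1, "root"), (3, "det"), (1, "obj"),
    (6, "case"), (6, "det"), (1, "obl"),
]))]
parse_b = [Arc(i, w, h, l) for i, (w, (h, l)) in enumerate(zip(words, [
    (1, "nsubj"), (-1, "root"), (3, "det"), (1, "obj"),
    (6, "case"), (6, "det"), (3, "nmod"),
]))]

diffs = head_disagreements(parse_a, parse_b)
for left, right in diffs:
    print(f"{left.text}: head {left.head} ({left.label}) vs head {right.head} ({right.label})")
```

Only heads are compared because the label sets are not the same across parsers (for example "dobj" in spaCy's English models versus "obj" in UD). Annotation schemes can also attach some words differently, prepositions in particular, so a reported disagreement is a prompt to inspect the sentence rather than proof that either parser is wrong.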