
Serialization bug in ContextSpellChecker when training in cluster mode #1018

Open
albertoandreottiATgmail opened this issue Aug 22, 2020 · 1 comment

albertoandreottiATgmail (Contributor) commented Aug 22, 2020

Description

When training the ContextSpellChecker in cluster mode, we get the error below.

vocabTest = ['name1', 'name2', 'name3']
# context dependent spell checker
spell_checker1 = sparknlp.annotator.ContextSpellCheckerApproach()\
    .setInputCols(["token"])\
    .setOutputCol("spell")\
    .setLanguageModelClasses(1400)\
    .addVocabClass(label="_NAME_", vocab=vocabTest)
# pipeline (document_assembler, sentence_detector, tokenizer, and finisher
# are assumed to be defined earlier)
pipeline1 = Pipeline().setStages([document_assembler,
                                  sentence_detector,
                                  tokenizer,
                                  spell_checker1,
                                  finisher])
model = pipeline1.fit(df)
An error occurred while calling o1430.fit.
: java.io.NotSerializableException: com.johnsnowlabs.nlp.annotators.spell.context.ContextSpellCheckerApproach$$anon$1
Serialization stack:

Expected Behavior

It should train in the same manner it does when running locally.

Current Behavior

It fails while serializing the ContextSpellCheckerApproach stage.

@albertoandreottiATgmail albertoandreottiATgmail changed the title Potential serialization bug in ContextSpellChecker Serialization bug in ContextSpellChecker when training in cluster mode Aug 24, 2020
albertoandreottiATgmail (Contributor, Author) commented:

Adding more information:

Py4JJavaError: An error occurred while calling o402.fit.
: java.io.NotSerializableException: com.johnsnowlabs.nlp.annotators.spell.context.ContextSpellCheckerApproach$$anon$1
Serialization stack:
- object not serializable (class: com.johnsnowlabs.nlp.annotators.spell.context.ContextSpellCheckerApproach$$anon$1, value: com.johnsnowlabs.nlp.annotators.spell.context.ContextSpellCheckerApproach$$anon$1@59accddd)
- element of array (index: 2)
- array (class [Lcom.johnsnowlabs.nlp.annotators.spell.context.parser.SpecialClassParser;, size 3)
- field (class: scala.collection.mutable.WrappedArray$ofRef, name: array, type: class [Ljava.lang.Object;)
- object (class scala.collection.mutable.WrappedArray$ofRef, WrappedArray(com.johnsnowlabs.nlp.annotators.spell.context.parser.DateToken$@3c712a44, com.johnsnowlabs.nlp.annotators.spell.context.parser.NumberToken$@28800c15, com.johnsnowlabs.nlp.annotators.spell.context.ContextSpellCheckerApproach$$anon$1@59accddd))
at org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scal
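For context, this is a hypothesis rather than something confirmed in the thread: `ContextSpellCheckerApproach$$anon$1` in the trace is a compiler-generated anonymous Scala class, presumably the `SpecialClassParser` instance created for the `_NAME_` vocab class, and it does not implement `java.io.Serializable`. Training locally never ships the stage between JVMs, but in cluster mode Spark serializes it to send to executors, which is where it fails. A minimal, plain-JDK Java sketch of the same failure mode (no Spark or Spark NLP code involved; the interface and names are illustrative only):

```java
import java.io.ByteArrayOutputStream;
import java.io.NotSerializableException;
import java.io.ObjectOutputStream;

public class AnonSerialization {
    // An interface that does NOT extend Serializable, standing in for
    // the parser type an anonymous implementation is created for.
    interface VocabParser {
        String label();
    }

    public static void main(String[] args) throws Exception {
        // Anonymous class instance, analogous to Scala's $$anon$1.
        VocabParser nameParser = new VocabParser() {
            public String label() { return "_NAME_"; }
        };

        try (ObjectOutputStream out =
                 new ObjectOutputStream(new ByteArrayOutputStream())) {
            // This is effectively what Spark does when shipping a stage
            // to executors; it throws because the anonymous class is not
            // Serializable.
            out.writeObject(nameParser);
            System.out.println("serialized OK");
        } catch (NotSerializableException e) {
            System.out.println("NotSerializableException: " + e.getMessage());
        }
    }
}
```

If this hypothesis is right, the fix on the library side would be making the anonymous parser class (or a named replacement for it) extend `Serializable`.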
