Integration failures on linux.cs #1629

Closed
lintool opened this issue Sep 11, 2023 · 0 comments

lintool commented Sep 11, 2023

At d8dc5b3, setting up a new environment on linux.cs:

deactivate

cd ~/virtualenv/
python -m venv pyserini-dev
source ~/virtualenv/pyserini-dev/bin/activate

export JAVA_HOME=/usr/lib/jvm/java-1.11.0-openjdk-amd64

pip install torch faiss-cpu
pip install -e .

Running integrations:

python -m unittest discover -s integrations/sparse
python -m unittest discover -s integrations/dense
python -m unittest discover -s integrations/clprf
python -m unittest discover -s integrations/papers

dense and papers pass; sparse and clprf have failures.

For sparse, these were the failures:

======================================================================
ERROR: test_reranking (test_lucenesearcher_check_ltr_msmarco_document.TestLtrMsmarcoDocument)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/u4/jimmylin/pyserini/integrations/sparse/test_lucenesearcher_check_ltr_msmarco_document.py", line 50, in test_reranking
    result = subprocess.check_output(f'python tools/scripts/msmarco/msmarco_doc_eval.py --judgments tools/topics-and-qrels/qrels.msmarco-doc.dev.txt --run ltr_test/{outp}', shell=True).decode(sys.stdout.encoding)
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python tools/scripts/msmarco/msmarco_doc_eval.py --judgments tools/topics-and-qrels/qrels.msmarco-doc.dev.txt --run ltr_test/run.ltr.msmarco-pass-doc.test.trec' returned non-zero exit status 1.

======================================================================
ERROR: test_reranking (test_lucenesearcher_check_ltr_msmarco_passage.TestLtrMsmarcoPassage)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/u4/jimmylin/pyserini/integrations/sparse/test_lucenesearcher_check_ltr_msmarco_passage.py", line 50, in test_reranking
    result = subprocess.check_output(f'python tools/scripts/msmarco/msmarco_passage_eval.py tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt ltr_test/{outp}', shell=True).decode(sys.stdout.encoding)
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command 'python tools/scripts/msmarco/msmarco_passage_eval.py tools/topics-and-qrels/qrels.msmarco-passage.dev-subset.txt ltr_test/run.ltr.msmarco-passage.test.tsv' returned non-zero exit status 1.

----------------------------------------------------------------------
Ran 60 tests in 36698.206s

FAILED (errors=2)

To reproduce:

python -m unittest integrations.sparse.test_lucenesearcher_check_ltr_msmarco_passage.TestLtrMsmarcoPassage
python -m unittest integrations.sparse.test_lucenesearcher_check_ltr_msmarco_document.TestLtrMsmarcoDocument

More detailed error trace:

Traceback (most recent call last):
  File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "/u4/jimmylin/pyserini/pyserini/search/lucene/ltr/__main__.py", line 291, in <module>
    batch_info = searcher.search(dev, queries)
  File "/u4/jimmylin/pyserini/pyserini/search/lucene/ltr/_search_msmarco.py", line 243, in search
    for dev_extracted in self.batch_extract(dev, queries, self.fe):
  File "/u4/jimmylin/pyserini/pyserini/search/lucene/ltr/_search_msmarco.py", line 208, in batch_extract
    print(group.mean())
  File "/u4/jimmylin/virtualenv/pyserini-dev/lib/python3.10/site-packages/pandas/core/frame.py", line 11338, in mean
    result = super().mean(axis, skipna, numeric_only, **kwargs)
  File "/u4/jimmylin/virtualenv/pyserini-dev/lib/python3.10/site-packages/pandas/core/generic.py", line 11969, in mean
    return self._stat_function(
  File "/u4/jimmylin/virtualenv/pyserini-dev/lib/python3.10/site-packages/pandas/core/generic.py", line 11926, in _stat_function
    return self._reduce(
  File "/u4/jimmylin/virtualenv/pyserini-dev/lib/python3.10/site-packages/pandas/core/frame.py", line 11207, in _reduce
    res = df._mgr.reduce(blk_func)
  File "/u4/jimmylin/virtualenv/pyserini-dev/lib/python3.10/site-packages/pandas/core/internals/managers.py", line 1459, in reduce
    nbs = blk.reduce(func)
  File "/u4/jimmylin/virtualenv/pyserini-dev/lib/python3.10/site-packages/pandas/core/internals/blocks.py", line 377, in reduce
    result = func(self.values)
  File "/u4/jimmylin/virtualenv/pyserini-dev/lib/python3.10/site-packages/pandas/core/frame.py", line 11139, in blk_func
    return op(values, axis=axis, skipna=skipna, **kwds)
  File "/u4/jimmylin/virtualenv/pyserini-dev/lib/python3.10/site-packages/pandas/core/nanops.py", line 147, in f
    result = alt(values, axis=axis, skipna=skipna, **kwds)
  File "/u4/jimmylin/virtualenv/pyserini-dev/lib/python3.10/site-packages/pandas/core/nanops.py", line 404, in new_func
    result = func(values, axis=axis, skipna=skipna, mask=mask, **kwargs)
  File "/u4/jimmylin/virtualenv/pyserini-dev/lib/python3.10/site-packages/pandas/core/nanops.py", line 720, in nanmean
    the_sum = _ensure_numeric(the_sum)
  File "/u4/jimmylin/virtualenv/pyserini-dev/lib/python3.10/site-packages/pandas/core/nanops.py", line 1678, in _ensure_numeric
    raise TypeError(f"Could not convert {x} to numeric")
TypeError: Could not convert ['1000000100000410000061000017100003010000831000097100013100017010002010002321000272100031910004591000461000472100050910005191000574100058510006191000678100068110007981000864100090610009511000959100110810012791001381100139710014541001810100190310019261001981100199910020581002145100214810021971002238100225210022741002330100242610024821002501002554100258410025851002596100271610027371002887100288910029381002940100299710030031003006100301510030710031141003210100321310032391003277100329910033191003329100333410033511003359100344510034811003482100350710035571003561100359010036031003641003695100383110038491003875100388010038841003973100399710041671004191100419910042281004233100424010042431004254100425810043221004493100492110049401004949100511310051311005163100519110054751005476100550010055201005586100559510056531005678100579810058881005907100594910060001006161006199100645910064891006509100657810065801006611006751100679110068521006911100692210069871007242100738210074731007550100760610076281007673100769110076961007875100793410079591007972100820810085151008516100883010089111008947100895110089681008977100897910090231009109100918310092371009388100952710096101009668100969510097241009742100974910099591009961100999410100481010057101005910101511010173101027710102871010524101052710105371010607101061510106701010700101100310110181011021101104410111201011140101116610112481011328101138110113821011512101152910116181011663101171310117211011811101186010119251012026101232810123291012431101246410125471012780101286510128661013114101322910132671013304101336710134241013492101357010135791013592101361510137971013965101411510141311014132101421010142421014264101451101469710148841014885101491110150551015307101534710155561015641101571015766101601310160151016154101625410162811016406101646010165471016565101658310166111016676101670310167901016869101687910169151016943101720410172761017348101747610174981017524101752910175371017605101768710176921017706101773410177731017775101783
01017892101795210179661017971101803210180561018359101852510186581018807101891810191791019200101923610192621019356101940510194141019433101947010196021019649101970510197241019783101978710198301020198102024410203761020510205001020710102072410209071020915102099910210531021065102117010212411021277102131810213241021327102144610215321021554102160510216391021679102168210216951021797102190010219071021931102197110220221022124102213210221781022198102235910223701022410102244210225771022620102262110226301022712102273510227621022769102278210228321022907102291110230251023111102336310237821023838102385010240341024069102416610241761024221102428810243001024305102431210245281024591102459210245991024667102466910246721024727102483510248931024904102495010250610251881025259102527010252901025348102548310256241025714102580110258951025991102609810261481026258102627102637210264102671110267681026775102678910267991026951026991102717810272091027373102761027650102766910277851027812102781710278651027919102809810281311028179102853810285551028598102860810286521028670102871110287421028752102875310287551028796102900310290161029030102903110290581029124102918110292911029402102949210295441029552102961710296811029694102977210297911029908102990910301761030215103023010302711030324103037810303811030388103044610304511030617103062310307221030823103092410310321031033103104710310541031118103117310312103124010312510314561031502103158010316791031682103168410318611031909103191010319761031999103201110320191032074103215610321821032198103228110323411032758103282210330071033092103320510332491033250103329610333981033443103353410335801033652103370310337181033725103375910339121033927103396210340391034050103413610341721034204103440910344461034587103459510346661034679103468010347031034761103483910348451035006103507810350981035228103524710352781035321103536710353791035383103541010355351035719103580510358611035874103593110360021036005103621410362441036380103638510365421036627103665610366751036782103678410368001036999103703310
37104103711610371881037250103730210373411037373103740710376621037686103768910377221037781103781710378261037872103788110381611038184103852710385921038678103868510387191038724103875510388301038849103885910388711038879103900210390521039195103929810393461039361103949510395211039586103972810397461040022104003010400381040064104008210400881040099104023810403121040353104040910404611040507104053210406841040694104070310407931040848104095910410431041050104114610411591041226104147310415201041703104171410417531041924104194810419511042099104215810423641042399104242610424881042507104262610426761042752104280010429010429781043064104333710434131043545104356810435871043658104370210438151043914104395510439691043995104404110442441044249104475510448091045071104507210451351045203104520810452271045229104534710453741045494104552710455401045554104556710457091045717104582610458531045855104604210460471046093104616110463841046387104646310464751046520104656710465691046648104673610467501046931104695210469691047010104701210470881047138104715210471601047162104726910473651047386104754810475561047592104759910476251047629104764210476621047700104770210477081047738104779410478331047843104785410479131047917104798710481851048281104828210483031048359104836110483631048377104838110485651048585104864210488761048917104899510490851049200104922110493291049368104945610494841049767104977410497911049867104989410499261049955105000710500331050231105025310502751050670105069510507471050778105085710509231051095105110810511121051211105121410512231051229105125710512791051285105130710513391051352105137210514221051475105152010515301051571105172310517551051808105188610519021051942105194310519901052089105211510522741052414105242110524271052563105258510526101052615105264010527171052948105296510529851053111105321910532531053611105371610538961053901105393110539921053997105402310540711054186105418910543281054339105443810544401054450105445110544681054593105459510546101054707105492310549581054969105499910551251055176105519710553511
05549105550510557171055889105592110559401056057105606010561591056163105621110562651056303105640510564201056437105644610564821056548105658010566441056726105674210567581056850105695010570071057015105709105709810571121057139105716810572511057270105733410573671057446105747610574881057539105763110576561057708105775710579371057996105803610581001058140105814110581421058165105818210582711058284105832510584151058425105844210584701058515105860110586041058717105879210588221058853105888510589521058978105904510590771059253105928710594201059421105944210594961059504105960110596191059646105969810598011059820105997010600391060040106030510603421060391106046210604961060566106061610606231060795106086810608811061167106121010612371061251061324106138210614721061762106199410621901062223106223310623321062334106235010624571062511106258910626031062609106268710627441062784106292810629611063177106334910633711063461106347810636071063644106365910637021063709106375810637651063777106389210639741064140106415510641951064206'] to numeric
Traceback (most recent call last):
  File "/u4/jimmylin/pyserini/tools/scripts/msmarco/msmarco_passage_eval.py", line 184, in <module>
    main()
  File "/u4/jimmylin/pyserini/tools/scripts/msmarco/msmarco_passage_eval.py", line 173, in main
    metrics = compute_metrics_from_files(path_to_reference, path_to_candidate)
  File "/u4/jimmylin/pyserini/tools/scripts/msmarco/msmarco_passage_eval.py", line 158, in compute_metrics_from_files
    qids_to_ranked_candidate_passages = load_candidate(path_to_candidate)
  File "/u4/jimmylin/pyserini/tools/scripts/msmarco/msmarco_passage_eval.py", line 75, in load_candidate
    with open(path_to_candidate,'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'ltr_test/run.ltr.msmarco-passage.test.tsv'

The critical line is:

  File "/u4/jimmylin/pyserini/pyserini/search/lucene/ltr/_search_msmarco.py", line 208, in batch_extract
    print(group.mean())

I believe this error has been there all along but was previously swallowed and ignored; it now surfaces, possibly because of an updated version of an underlying library.
Commenting out the above line appears to fix the problem.
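If the debug print is worth keeping, a minimal sketch of an alternative follows. It assumes the frame mixes string id columns with numeric feature columns; the column names and the `safe_mean` helper are hypothetical and not taken from `_search_msmarco.py`. One plausible explanation for the change in behaviour (not confirmed here) is that pandas 2.0 changed the default of `numeric_only` in `DataFrame.mean()` to `False`, so string columns now raise instead of being dropped.

```python
import pandas as pd

# Hypothetical stand-in for the per-query frame; column names are assumptions.
group = pd.DataFrame({
    "pid": ["1000000", "1000004", "1000006"],  # string ids, as in the error message
    "score": [1.0, 2.0, 3.0],                  # a numeric feature column
})


def safe_mean(frame: pd.DataFrame) -> pd.Series:
    """Average only the numeric columns and skip string columns such as ids."""
    return frame.mean(numeric_only=True)


means = safe_mean(group)
print(means)
```

Passing `numeric_only=True` explicitly should behave the same way on older and newer pandas versions, which avoids depending on the default.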

For clprf, these were the failures:

Traceback (most recent call last):
  File "/u4/jimmylin/pyserini/./scripts/classifier_prf/rank_trec_covid.py", line 315, in <module>
    rank(args.new_qrels, args.base, args.tmp_base, args.qrels, args.index, R, args.output, args.alpha, args.clf, args.vectorizer, args.tag)
  File "/u4/jimmylin/pyserini/./scripts/classifier_prf/rank_trec_covid.py", line 286, in rank
    map_score,ndcg_score = evaluate(new_qrels, output_path)
  File "/u4/jimmylin/pyserini/./scripts/classifier_prf/rank_trec_covid.py", line 204, in evaluate
    ndcg_score = str(subprocess.check_output(cmd1, shell=True)).split('\\t')[-1].split('\\n')[0]
  File "/usr/lib/python3.10/subprocess.py", line 421, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 526, in run
    raise CalledProcessError(retcode, process.args,
subprocess.CalledProcessError: Command '../anserini/tools/eval/trec_eval.9.0.4/trec_eval -c -M1000 -m all_trec ./tools/topics-and-qrels/qrels.covid-round5.txt integrations/tmp5548/runs/covidex.r5.d2q.1s.lr.tfidf.R12.A0.6.txt  | grep 'ndcg_cut_20 '' returned non-zero exit status 1.
E
======================================================================
ERROR: test_cross_validation (test_clprf.TestSearchIntegration)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/u4/jimmylin/pyserini/integrations/clprf/test_clprf.py", line 108, in test_cross_validation
    score = parse_score(stdout, 'map')
  File "/u4/jimmylin/pyserini/integrations/utils.py", line 49, in parse_score
    while 'Results' not in lines[0]:
IndexError: list index out of range

======================================================================
ERROR: test_bm25 (test_trec_covid_r3.TestSearchIntegration)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/u4/jimmylin/pyserini/integrations/clprf/test_trec_covid_r3.py", line 75, in test_bm25
    with open(f'{self.tmp}/output.json') as json_file:
FileNotFoundError: [Errno 2] No such file or directory: './integrations/tmp2200/output.json'

======================================================================
ERROR: test_bm25 (test_trec_covid_r4.TestSearchIntegration)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/u4/jimmylin/pyserini/integrations/clprf/test_trec_covid_r4.py", line 81, in test_bm25
    with open(f'{self.tmp}/output.json') as json_file:
FileNotFoundError: [Errno 2] No such file or directory: './integrations/tmp6202/output.json'

======================================================================
ERROR: test_round5 (test_trec_covid_r5.TestSearchIntegration)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/u4/jimmylin/pyserini/integrations/clprf/test_trec_covid_r5.py", line 83, in test_round5
    with open(f'{self.tmp}/output.json') as json_file:
FileNotFoundError: [Errno 2] No such file or directory: './integrations/tmp5548/output.json'

----------------------------------------------------------------------
Ran 44 tests in 30207.120s

FAILED (errors=4)

To reproduce:

python -m unittest integrations.clprf.test_clprf.TestSearchIntegration.test_cross_validation

python -m unittest integrations.clprf.test_trec_covid_r3
python -m unittest integrations.clprf.test_trec_covid_r4
python -m unittest integrations.clprf.test_trec_covid_r5

For the first error, ../anserini/tools/ wasn't checked out and compiled, so trec_eval could not be run. (We should change the path.)
The remaining errors were flaky tests caused by network issues while downloading.
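A side note on why the first error shows up as "exit status 1" rather than something clearer: with `shell=True` and a pipe, the reported status is that of the last command, so a missing `trec_eval` surfaces as `grep` finding no match. The sketch below illustrates this and adds an explicit existence check. It assumes a POSIX shell with `grep` available; `find_trec_eval`, `run_pipeline`, and the demo binary name are hypothetical helpers for illustration, not code from the repo.

```python
import os
import subprocess

# Path taken from the failing command above.
TREC_EVAL = "../anserini/tools/eval/trec_eval.9.0.4/trec_eval"


def find_trec_eval(candidates):
    """Return the first candidate that is an executable file, else None."""
    for path in candidates:
        if os.path.isfile(path) and os.access(path, os.X_OK):
            return path
    return None


def run_pipeline(cmd):
    """Run a shell pipeline without raising, capturing stdout and stderr."""
    return subprocess.run(cmd, shell=True, capture_output=True, text=True)


# The binary on the left of the pipe does not exist, so grep reads empty
# input, matches nothing, and its exit status becomes the pipeline's status.
result = run_pipeline("./no-such-binary-for-demo | grep 'ndcg_cut_20 '")
print(result.returncode)
print(result.stderr.strip())

if find_trec_eval([TREC_EVAL]) is None:
    print(f"trec_eval not found at {TREC_EVAL}; check out and build anserini first")
```

Checking for the binary up front would turn the opaque `CalledProcessError` into a message that names the missing dependency.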
