Add entry point for extracting datasets from TEI #4

Draft: wants to merge 47 commits into base: master
Commits
5e8ecfd
make gradle build and add github actions
lfoppiano Mar 29, 2024
d545b5d
read grobid-home from configuration
lfoppiano Mar 29, 2024
33648de
disable superfluous tests
lfoppiano Mar 29, 2024
49a07b6
fix build
lfoppiano Mar 29, 2024
5d2872e
add simple test on analyzer to get started
lfoppiano Mar 29, 2024
8bc2987
enable jacoco report
lfoppiano Mar 29, 2024
fd84d88
fix build docker
lfoppiano Mar 29, 2024
ffb5bea
disable docker build for the moment
lfoppiano Mar 29, 2024
bb48f37
add parameter to enable/disable sentence segmentation for TEI processing
lfoppiano Apr 18, 2024
f05f68b
Update docker build (#1)
lfoppiano Apr 26, 2024
981ac95
implement tei processing for datasets
lfoppiano Apr 26, 2024
d668625
fix output JSON streaming
lfoppiano Apr 26, 2024
33d4f13
Merge branch 'master' into add-tei-processing-dataset
lfoppiano May 1, 2024
288850f
add the rest of the processing
lfoppiano May 2, 2024
12dcc37
disable broken tests
lfoppiano May 2, 2024
23c2dd5
add XML JATS entry point
lfoppiano May 2, 2024
0213c78
add CC-BY sample documents
lfoppiano May 2, 2024
52ffc23
revert to the original port
lfoppiano May 2, 2024
4448437
enable TEI processing in UI - javascript joy
lfoppiano May 2, 2024
4aad23d
correct parameter
lfoppiano May 2, 2024
6989335
attach URLs obtained from Grobid's TEI
lfoppiano May 6, 2024
7f0cdd5
fix frontend
lfoppiano May 7, 2024
1c5ff72
fix github action
lfoppiano May 7, 2024
4cd7390
fix wrong ifs - thanks intellij!
lfoppiano May 9, 2024
df86b81
avoid exception when entities are empty
lfoppiano May 9, 2024
843463c
avoid injecting null stuff
lfoppiano May 9, 2024
1b1da5f
reduce the timeout for checking the disambiguation service
lfoppiano May 12, 2024
75dd711
fix the convention for sentence segmentation and enable it
lfoppiano May 20, 2024
758f418
update examples
lfoppiano May 21, 2024
91fe70d
add sequence (sentence, paragraph) identifier in each mention
lfoppiano May 21, 2024
cc1cd2a
Fix sentence switch
lfoppiano May 21, 2024
c58502e
Fix incorrect xpath on children
lfoppiano May 23, 2024
6977bda
Cleanup text when extracting from XML, normalise unicode character, r…
lfoppiano Jun 4, 2024
cc01140
Fix bug in the xpaths that were used wrongly to select sentences or p…
lfoppiano Jun 4, 2024
3c3af44
Try to get possible sections in the <back> in which the das is hidden…
lfoppiano Jun 4, 2024
7b6fe06
update to grobid 0.8.1, and catch up other changes
lfoppiano Sep 14, 2024
2162720
retrieve URLs from the TEI XML in all the sections that are of interest
lfoppiano Oct 13, 2024
a2b5bbb
update github actions
lfoppiano Oct 13, 2024
e3a4890
fix xpath to fall back into div into TEI/back
lfoppiano Oct 13, 2024
371f520
cleanup
lfoppiano Oct 13, 2024
1483aab
fix reference mapping
lfoppiano Oct 13, 2024
4ab67a6
fix references extraction
lfoppiano Oct 14, 2024
774dd78
fix regression
lfoppiano Oct 22, 2024
b18454b
cosmetics
lfoppiano Oct 22, 2024
962f7eb
fix regressions in the way we attach references from TEI
lfoppiano Oct 22, 2024
3b343c6
allow xml:id to be string using a wrapper that generates integer to m…
lfoppiano Jan 1, 2025
f58c493
fix extraction of urls that are not well formed (supplementary-materi…
lfoppiano Jan 2, 2025

update examples
lfoppiano committed May 21, 2024
commit 758f418c35b0f9805279f0233ded1a4a95c3bddf