Tokenization with exception patterns #700

Merged 53 commits on Jan 2, 2017
Changes from 1 commit
Commits
5b00039
First steps towards the Hungarian tokenizer code.
oroszgy Dec 7, 2016
90d22db
Added Hungarian resource files.
oroszgy Dec 8, 2016
0289b8c
Additional abbreviation tests.
oroszgy Dec 8, 2016
2051726
Passing Hungarian abbrev tests.
oroszgy Dec 10, 2016
0cf2144
Adding partial hyphen and quote handling support.
oroszgy Dec 10, 2016
c035928
Partial Hungarian number tokenization is added.
oroszgy Dec 20, 2016
366b3f8
Merge branch 'master' into hu_tokenizer
oroszgy Dec 20, 2016
6add156
Refactored language data structure
oroszgy Dec 20, 2016
23956e7
Improved partial support for tokenizing Hungarian numbers
oroszgy Dec 20, 2016
3d5306a
Added further testcases.
oroszgy Dec 20, 2016
ab2f6ea
Removed data files from tests.
oroszgy Dec 21, 2016
35aa547
Hungarian module is exposed in spacy.
oroszgy Dec 21, 2016
1748549
Added exception pattern mechanism to the tokenizer.
oroszgy Dec 21, 2016
d9c59c4
Maintaining backward compatibility.
oroszgy Dec 21, 2016
c5c0ed9
fixed minor typo
fnorf Dec 22, 2016
642803d
Merge pull request #702 from fnorf/patch-1
ines Dec 22, 2016
fdf4776
Added Swedish abbreviations
Dec 22, 2016
7f411fd
Remove exceptions containing whitespace / no special chars
ines Dec 23, 2016
11ec02d
Separate inline icon and help cursor classes
ines Dec 23, 2016
cc051dd
Add resources page to usage docs
ines Dec 23, 2016
48b03b4
Fix formatting and wording
ines Dec 23, 2016
12bb0aa
Fix license formatting for GitHub's parser
ines Dec 23, 2016
1d64527
Update Spanish tokenizer
ines Dec 23, 2016
1436b9f
Fix formatting and consistency
ines Dec 23, 2016
207555f
Fix spelling
ines Dec 23, 2016
3a9be4d
Updated token exception handling mechanism to allow the usage of arbi…
oroszgy Dec 23, 2016
72b61b6
Typo fix.
oroszgy Dec 23, 2016
45e045a
Unicode/UTF8 compatibility for Python2
oroszgy Dec 23, 2016
8785706
Reformat stop words for better readability
ines Dec 23, 2016
b893126
Use link mixin instead of plain link markup
ines Dec 24, 2016
f6f6e02
Make links detect target automatically and replace false with null fo…
ines Dec 24, 2016
6dd8ae1
Update README.md
ines Dec 25, 2016
b7becae
Fix typo
ines Dec 25, 2016
ade7487
Accepted contributor agreement.
oroszgy Dec 26, 2016
ef8f310
Merge branch 'hu_tokenizer' of github.com:oroszgy/spaCy into hu_token…
oroszgy Dec 26, 2016
78f754d
Merge pull request #705 from oroszgy/hu_tokenizer
ines Dec 26, 2016
223142d
Update CONTRIBUTORS.md
ines Dec 26, 2016
ad3669c
Merge pull request #703 from magnusburton/master
ines Dec 27, 2016
ce4539d
Allow the vocabulary to grow to 10,000, to prevent cold-start problem.
honnibal Dec 27, 2016
cade536
Merge branch 'master' of ssh://github.com/explosion/spaCy
honnibal Dec 27, 2016
f62db78
Increment version
honnibal Dec 27, 2016
e80dad8
Update version
ines Dec 27, 2016
decb743
Update README.rst
ines Dec 27, 2016
d158595
Add Hungarian to alpha support overview
ines Dec 27, 2016
9f24eb3
Update CONTRIBUTORS.md
ines Dec 27, 2016
14295f9
Update README.rst
ines Dec 27, 2016
f112e77
Add PART to tag map
petterhh Dec 28, 2016
9d39e78
Merge pull request #713 from petterhh/patch-1
ines Dec 28, 2016
623d94e
Whitespace
honnibal Dec 30, 2016
3e8d9c7
Test interaction of token_match and punctuation
honnibal Dec 30, 2016
9936a1b
Merge branch 'tokenization_w_exception_patterns' of https://github.co…
syllog1sm Dec 30, 2016
3ba7c16
Fix URL tests
syllog1sm Dec 30, 2016
fde53be
Move whole token match inside _split_affixes.
syllog1sm Dec 30, 2016
Update CONTRIBUTORS.md
ines authored Dec 27, 2016
commit 9f24eb3fd9e848bc2bac1052a07f1e2c6da86172
1 change: 1 addition & 0 deletions CONTRIBUTORS.md
@@ -4,6 +4,7 @@ This is a list of everyone who has made significant contributions to spaCy, in a

* Adam Bittlingmayer, [@bittlingmayer](https://github.com/bittlingmayer)
* Andreas Grivas, [@andreasgrv](https://github.com/andreasgrv)
* Bhargav Srinivasa, [@bhargavvader](https://github.com/bhargavvader)
* Chris DuBois, [@chrisdubois](https://github.com/chrisdubois)
* Christoph Schwienheer, [@chssch](https://github.com/chssch)
* Dafne van Kuppevelt, [@dafnevk](https://github.com/dafnevk)