
Parsing Floats, Integers, Money, and Dates

Mark Robinson edited this page Jun 22, 2021 · 15 revisions

Floats, Integers, Money, and Dates have one thing in common: their format depends on the locale...

This Wikipedia page explores the different date formats used in each country.
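As a small illustration of why date formats cannot be guessed, the sketch below parses the same string with two different format patterns using Ruby's standard `Date.strptime`. The sample value is made up for this example.

```ruby
require "date"

# The same raw CSV value, read with two different conventions.
raw = "01/02/2021"

us_style = Date.strptime(raw, "%m/%d/%Y") # month first (e.g. United States)
eu_style = Date.strptime(raw, "%d/%m/%Y") # day first (e.g. most of Europe)

puts us_style.iso8601 # 2021-01-02
puts eu_style.iso8601 # 2021-02-01
```

Both calls succeed, so the data alone cannot say which reading is correct. Only knowledge of the file's origin can.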

This Wikipedia page explores how different countries use different Decimal Separators and which character represents the decimal "point".

Check out India and China in the examples below... you can't even assume that numbers are grouped in triplets.

| Style | Countries |
|-------|-----------|
| 1,234,567.89 | Australia, Canada (English-speaking, unofficial), China, Hong Kong, Ireland, Israel, Japan, Korea, Malaysia, Mexico, New Zealand, Pakistan, Philippines, Singapore, Taiwan, Thailand, United Kingdom, United States |
| 1234567.89   | SI style (English version), Canada (English-speaking), China, Sri Lanka, Switzerland (officially encouraged for currency numbers) |
| 1234567,89   | SI style (French version), Albania, Austria, Belgium (French), Belgium (Dutch: alternative), Brazil, Bulgaria, Canada (French-speaking), Costa Rica, Croatia, Czech Republic, Denmark, Estonia, Finland, France, Germany, Greece, Hungary, Italy, Kosovo, Latin Europe, Netherlands (alternative), Norway, Peru, Poland, Portugal, Romania, Russia, Serbia, Slovakia, Slovenia, South Africa, Spain (official), Sweden, Switzerland (officially encouraged for non-currency numbers), Ukraine | 
| 1,234,567·89 | Ireland, Malaysia, Malta, Philippines, Singapore, Taiwan, Thailand, United Kingdom (older, typically hand written)[32] | 
| 1.234.567,89 | Argentina, Austria, Belgium (Dutch: most common), Bosnia and Herzegovina, Brazil, Chile, Croatia,[33][34] Denmark, Greece, Indonesia, Italy, Netherlands (most common), Portugal, Romania, Russia, Slovenia, Spain,[35] Sweden (not recommended), Turkey | 
| 12,34,567.89 | Bangladesh, India (see Indian Numbering System) | 
| 1'234'567.89 | Switzerland (printed, computing, currency, everyday use), Liechtenstein |
| 1'234'567,89 | Switzerland (handwriting) |
| 1.234.567'89 | Spain (handwriting) |
| 123,4567.89  | China (based on powers of 10 000 — see Chinese numerals) |

Monetary amounts vary from country to country in the same way. Besides the numbers being formatted differently, the currency sign can appear before or after the numerical value, depending on the country.
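A minimal sketch of what parsing such amounts could look like, assuming the caller already knows which grouping and decimal separators the file uses. `parse_money` is a hypothetical helper written for this page, not part of SmarterCSV. It returns a `BigDecimal` so that no cents are lost to binary floating point.

```ruby
require "bigdecimal"

# Hypothetical helper, not part of SmarterCSV.
# group:   the grouping (thousands) separator used in the file
# decimal: the decimal separator used in the file
def parse_money(str, group:, decimal:)
  # Keep only digits, the sign, and the two separators. This drops a
  # currency symbol or code whether it sits before or after the number.
  # Caveat: a currency label that itself contains a separator character
  # (e.g. "Rs.") would need extra handling.
  kept = str.gsub(/[^\d#{Regexp.escape(group + decimal)}+\-]/, "")
  BigDecimal(kept.delete(group).tr(decimal, "."))
end

us = parse_money("$1,234.56",    group: ",", decimal: ".")
de = parse_money("1.234,56 €",   group: ".", decimal: ",")
ch = parse_money("CHF 1'234.50", group: "'", decimal: ".")
```

All three calls yield a plain decimal amount; the currency itself is discarded, so keep it in a separate column if it matters.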

TL;DR

SmarterCSV does not automatically parse money, dates, integers and floats, because there is so much variation in how they can be formatted in different countries.

There are built-in helpers for numerical values, but these assume only the Ruby way of writing numbers, e.g. 123456.78 for floats and 123456 for integers.

Country-specific formats are deliberately not covered by SmarterCSV.
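To show what "the Ruby way of writing numbers" means in practice, here is how Ruby's own core conversions behave. This illustrates plain Ruby, not SmarterCSV's internal helpers; `strict_float` is a hypothetical wrapper added for the example.

```ruby
# Kernel#Integer and Kernel#Float are strict: they accept Ruby-style
# numbers and raise ArgumentError on anything else.
Integer("123456")   # => 123456
Float("123456.78")  # => 123456.78

# String#to_i and String#to_f are lenient: they stop at the first
# character they cannot use, which silently truncates localized input.
"123,456".to_i       # => 123
"1.234.567,89".to_f  # => 1.234

# Hypothetical wrapper: returns nil instead of raising on bad input.
def strict_float(str)
  Float(str)
rescue ArgumentError
  nil
end

strict_float("123456.78") # => 123456.78
strict_float("123,456")   # => nil
```

The strict forms are usually the safer choice for CSV data, because a value that is not in Ruby notation is rejected instead of being quietly cut short.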

If we were to implement a built-in way to parse European floats like 123,456, how could we distinguish that from the integer value 123,456 in US notation? If SmarterCSV saw 123,456, what should the result be: the float 123.456 or the integer 123456?
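The ambiguity can be made concrete with two small parsers. Both are hypothetical helpers written for this illustration. Each is correct for its own locale, and they disagree on the same input.

```ruby
# Hypothetical: US notation, where "," groups thousands.
def parse_us_integer(str)
  Integer(str.delete(","), 10)
end

# Hypothetical: European notation, where "." groups thousands
# and "," is the decimal separator.
def parse_european_float(str)
  Float(str.delete(".").tr(",", "."))
end

raw = "123,456"
parse_us_integer(raw)     # => 123456
parse_european_float(raw) # => 123.456
```

Neither result is wrong, which is why the choice has to come from the user rather than from the library.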

This is why SmarterCSV does not handle automatic parsing of floats, integers, money and dates...

Instead, SmarterCSV version 2.0 allows the user to define their own custom Procs to handle hash_transformations, so country-specific formats can be parsed in a user-defined way if so desired.

If you have to implement your own Proc, it is probably a good idea to first strip the grouping (thousands) separators, normalize the decimal separator to a period, and then convert the normalized number.
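The normalize-then-convert approach could be sketched as follows. `number_parser` is a hypothetical factory written for this page. It returns a lambda for one specific locale, and values that do not look like numbers are passed through unchanged. How the lambda is registered with SmarterCSV depends on the options of the version you use and is not shown here.

```ruby
# Hypothetical factory, not part of SmarterCSV.
# group:   grouping separator to strip, e.g. "," or "." or "'"
# decimal: decimal separator to normalize to "."
def number_parser(group:, decimal:)
  lambda do |value|
    return value unless value.is_a?(String)

    # Step 1: strip grouping separators. Step 2: normalize the decimal sign.
    normalized = value.strip.delete(group).tr(decimal, ".")

    # Step 3: convert only if the result is in plain Ruby notation.
    if normalized.match?(/\A[-+]?\d+\z/)
      Integer(normalized, 10)
    elsif normalized.match?(/\A[-+]?\d+\.\d+\z/)
      Float(normalized)
    else
      value # not a number: leave untouched
    end
  end
end

german = number_parser(group: ".", decimal: ",")
indian = number_parser(group: ",", decimal: ".")
swiss  = number_parser(group: "'", decimal: ".")

german.call("1.234.567,89") # => 1234567.89
indian.call("12,34,567.89") # => 1234567.89
swiss.call("1'234'567.89")  # => 1234567.89
german.call("n/a")          # => "n/a"
```

Because the grouping separators are removed without regard to their position, the same lambda handles triplets, the Indian 2-2-3 grouping, and the Chinese four-digit grouping.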
