proposal: arbitrary-radix integer literals #28256
Comments
In Go, I have never wanted to write an integer literal with radix other than 2, 8, 10, or 16. I have also never read code that would have used such literals, had they existed. Therefore, the benefit seems extremely low. The fact that the existing hexadecimal syntax doesn't fit directly into the proposed syntax but requires a special case of 0 ≡ 16 significantly detracts from the appeal. |
I like the idea of removing the leading-zero octal notation. |
@cespare I would have formulated your 2nd paragraph slightly differently: The fact that the existing hexadecimal syntax neatly fits directly into the proposed syntax significantly adds to the appeal. :-) |
While I see the appeal of having a consistent syntax, I fear this would become a very obscure feature. I never felt the need for anything other than binary, octal, decimal and hexadecimal integer constants. Binary integer literals are useful in many cases involving bit twiddling, octal is useful for file permissions, hexadecimal is useful for compact notation of bytes. But ternary or base 21 seems to be useful for obfuscation only. I do like the idea of changing the notation for octals; right now it's still the confusing C notation. And I do like the uniform notation you propose. I would just disallow anything other than base 2, 8, 10 and 16 to avoid such obfuscation. Otherwise, could you please show us a few production open source code bases where the use of such arbitrary radix integer constants would have been beneficial? |
I'd be ok with the restriction to 2, 8, 10, and 16, but why? It would make things (a tiny bit) more complicated; the only reason I'd see is that it might perhaps eliminate errors (somebody might write 9x066 rather than 8x066 for a file permission). I agree that most programmers may not care much about the flexibility here, they'll be just fine that they can write down numbers in all the commonly used radixes (2, 8, 10, 16) w/o extra cost (one extra char for octal) and use a single, uniform notation. Personally, I think that not having arbitrary radix notation is what prevents us from thinking it might be useful. Now usefulness alone is not a criterion for adding something to the language, but in this case it would address the desire for a binary notation and simplify what we already have, and remove restrictions. Seems like a win-win to me. Keep in mind that there's really strong support for adding binary integer literals, so no matter what, we'd have to make changes in all the same places. The difference is just whether we add one more special case, or whether we simplify all the code in favor of a uniform notation. Finally, there's also the educational aspect of Go: Having a simple, uniform mechanism here rather than an agglomeration of historical notations seems like a nice cleanup. Btw., Smalltalk supports arbitrary radix notation, too, using the same syntax but with an 'r' instead of an 'x'. Using the 'x' lets the most common other base notation fit neatly into the system. |
Because that's 32 = 36-4 fewer bases you need to understand when reading code.
Hexadecimal is certainly useful. Binary and octal seem marginally useful. Other bases just don't seem useful at all. Certainly their value isn't worth burdening the reader with them. |
I am interested in smalltalk you mentioned. Could you point out any open
source smalltalk project that uses arbitrary bases to good effect? My
feeling is that the feature will be rarely used and more of a source of
frustration due to typos as you mentioned.
|
I don't think we should use this proposed syntax with such a restriction. I think that, if anything, we should just add the
I don't agree that this proposal is uniform; it introduces more ways of writing the same integer literals:
|
@beoran I don't know of a Smalltalk playground offhand (which doesn't require installation), but there is of course Squeak (https://en.wikipedia.org/wiki/Squeak). For documentation see the famous "Blue Book", http://stephane.ducasse.free.fr/FreeBooks/BlueBook/Bluebook.pdf, literals with radixes are described on page 19. And the examples there are limited to radix 8 and 16. Again, I have no strong feelings regarding restricting a radix to 2, 8, 10, 16, but I also don't think it matters much - people won't use crazy radixes for no good reason. (I suspect it's the small radixes that are interesting. For instance, I can see how I'd use a small-n (3, 5, etc.) radix to encode multiple values of n states in a single int, e.g. for some state on a game board.) In summary, it really doesn't matter all that much; what people seem to want is binary integer literals, and there's a specific proposal for that. It happens to do what all other languages do (which is good) but it also happens to introduce yet another notation. I've submitted this proposal because I think it's a viable alternative. Especially if we're considering removing/improving the octal notation (which would be a Go 2 item) we'd have to have some replacement. This proposal would resolve all those issues in one fell swoop. Personally, I think this is a more elegant approach for the whole problem of different radix integers, but I'm biased, of course. I think the decisions that need to be made are:
I think the decision for 2) should take into account:
|
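The small-radix use case mentioned above (several n-state values stored in a single int) can be sketched in today's Go without any new literal syntax. The following is a minimal illustration, not code from the proposal; `packCells` and `unpackCells` are hypothetical names. It treats the cells as the digits of a base-`radix` number, most significant digit first.

```go
package main

import "fmt"

// packCells (hypothetical helper) folds cells into one int by treating
// them as digits of a base-radix number, most significant digit first.
func packCells(cells []int, radix int) (int, error) {
	v := 0
	for _, c := range cells {
		if c < 0 || c >= radix {
			return 0, fmt.Errorf("cell value %d out of range for radix %d", c, radix)
		}
		v = v*radix + c
	}
	return v, nil
}

// unpackCells (hypothetical helper) reverses packCells for n cells.
func unpackCells(v, n, radix int) []int {
	cells := make([]int, n)
	for i := n - 1; i >= 0; i-- {
		cells[i] = v % radix
		v /= radix
	}
	return cells
}

func main() {
	// A 3x3 board with three states per cell: 0 = empty, 1 = X, 2 = O.
	board := []int{1, 2, 0, 0, 1, 0, 2, 0, 1}
	packed, err := packCells(board, 3)
	if err != nil {
		panic(err)
	}
	fmt.Println(packed, unpackCells(packed, len(board), 3))
}
```

Under the proposed notation, a three-cell state such as `{1, 2, 0}` could be written directly as `3x120`, which is the value 15 that `packCells` computes for it.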
@cespare Not to be facetious, but with the 0b notation there will also forever be two ways of writing a "hex" number: 0x2a and 0b00101010 . I'd see that as a much bigger problem - there will be plenty of people arguing that one is better than the other. Realistically, with the radix notation, people will stick to the shorter 0x notation rather than 16x (but either way, the actual hex number looks the same). What you are saying really was one of the reasons for not including 0b from day one: There's already a suitable notation, namely 0x. |
There is also the suggestion to support intN for all N from @jimmyfrasche:
And several real world uses immediately occurred to me:
I can see game states similarly benefitting from intN. In contrast, I can't think of any real world use cases for arbitrary radix constants. Just another data point. |
To answer your questions, I think, 1. yes we need binary constants because they are useful for bit masks and other bit twiddling. And 3. Dropping C style octals and replacing them is a good idea, because C style octals are a source of beginner bugs. Though I would probably go for 0o765 notation, although seeing the Smalltalk precedent 08x765 would also be ok. |
The choice of The referenced issue calling for binary literals has quite a lot of voices saying they're not needed. I won't paste in arguments from there to here, but it's not clear to me that the case for needing them has been made. More literal bases is a step towards Perl's There's more than one way to do it and away from Go's Yes, that was a deliberate mistake to show few readers will want to count a run of the same digit so then there will be calls for underscores, a la Ada, as separators, with arguments over where to separate. It doesn't matter those programmers won't be coding on your project; you and I will still have to read their project. It's a shame octal nabbed 0755 instead of 0o755, no capital O allowed, but other than that things seem fine as they are. And deprecation of 0755 for a new octal format can be done, as gri outlined, without adding base 2 or base 2-36. |
It's hard to believe there are programmers that can mentally reason about 64 digit binary literals but not hex. The digits are not zero padded so to even determine what bit is set, you need to determine the number of digits in the number. Easy with base16, but are there really any examples of binary integer literals serving a useful purpose other than tables of constants rendered in a monospaced font that are rigorously whitespace aligned or zero padded? The gofmt is not going to move these numbers to the right either. Small values will be difficult to see clearly. I suppose that could be solved by using 2b01 and 2b10 though. |
To make binary literals more readable some languages also allow the use of separators. For example, in C# you can write something like this In fact, C# allows underscore to be used in any numeric literal, not only in binary ones. In my opinion, even for 64-bit literals binary representation would be much more readable if you need very specific bits to be set. Hex values always require a bit more thinking and conversion in your head even if you know hex perfectly well. It's simple for one byte values but gets harder as you go further (the argument about counting digits applies here even for hex) and add values with multiple bits set where simple pattern of |
Sorry, I didn't mean to drag this back to a rerun of #19308, but to point out that widening the choice of ways to do something, write 0xffe, ripples out into formatting and tools. What demand there is for base 2-36 could be lessened by two things touched on in earlier comments. Keith gave an example for it being easier to read the manual multiplication and addition for a base 23 number. Syntax for array multiplication, AKA Hadamard product, perhaps introduced for vector instructions, would give an alternative. As he said, an exponent operator would help.
That would also allow for mixed-radix numbers; units of time being a common example. Josh referred to intN for all N, e.g. int12. That might be too general, and uintN for 1 ≤ N ≤ 32 good enough for most cases. Verilog has something similar and combined with a bit-catenate operator allows I'm not strongly arguing for either of these, just pointing out that if there is any movement towards them then they overlap with the need for a base 2-36 notation. |
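The "manual multiplication and addition" for a base-23 or mixed-radix number referred to above is Horner's rule. As a rough sketch only (the helper name `mixedRadix` and its argument layout are assumptions made for this example), it can be written as an ordinary function:

```go
package main

import "fmt"

// mixedRadix (hypothetical helper) evaluates digits by Horner's rule.
// digits[0] is the most significant digit and is unbounded above;
// radices[i] is the radix of digits[i+1].
func mixedRadix(digits, radices []int) (int, error) {
	if len(digits) != len(radices)+1 {
		return 0, fmt.Errorf("need exactly one more digit than radices")
	}
	v := digits[0]
	for i, r := range radices {
		d := digits[i+1]
		if d < 0 || d >= r {
			return 0, fmt.Errorf("digit %d out of range for radix %d", d, r)
		}
		v = v*r + d
	}
	return v, nil
}

func main() {
	// 1 day, 2 hours, 3 minutes, 4 seconds, expressed in seconds.
	secs, err := mixedRadix([]int{1, 2, 3, 4}, []int{24, 60, 60})
	fmt.Println(secs, err)

	// A uniform radix is the special case where every radix is equal.
	n, err := mixedRadix([]int{1, 2, 3}, []int{23, 23})
	fmt.Println(n, err)
}
```

A function like this is evaluated at run time, so unlike a literal its result cannot be used as a Go constant.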
Also Perl.
|
Also python.
|
Java,
|
I expect there's quite a few languages that permit underscore in some numeric literals. Ada was just the first I encountered. Like ditching 0751 as the octal syntax, these underscores would seem to be orthogonal to whether base 2-36 is required. They can be an aid to readability on long literals, but also allow more formatting choice by the author, and disagreement with everyone else. (Perl accepts It's tempting to dictate the allowable formats, e.g. integers must either have no underscores, or have them every three digits from the right: |
Although I'd normally welcome improvements to the numeric aspects of the language, I'm finding it very hard to get enthused about this proposal. The demand just doesn't seem to be there for bases other than 2, 8, 10 and 16 and, even if it was, I don't think the change could be made in isolation. People would then be asking for a simple way to print these numbers out. Currently the formatted print functions in the standard library support only the standard bases with their %b, %o, %d and %x verbs so new verbs would need to be added to print out values for arbitrary bases. In other words what are already very complicated functions would become even more so. Nor do I like the proposed syntax. The use of the letter 'x' as a divider seems inappropriate as the other radixes have nothing to do with hex and for the highest radixes it's even a digit itself. I also dislike the discontinuity for hex itself when 16 suddenly becomes 0. It's worth remembering that we already have support for radixes from 2 to 36 in the Although on balance I'd support it, I'm not even sure that adding binary literals (with a 0b prefix) is such a great idea unless a digit separator (such as _) is introduced at the same time. The reality is that once you get past one or two bytes, binary literals become unreadable. As for octal, if one surveys the current state of C family languages, the traditional ones (C, C++, Java) all use the leading zero notation and the newer ones (Swift, Rust) use an 0o prefix. It seems to me that compatibility with the former is much more important for Go and that the leading zero notation should therefore be retained. As no one appears to be seriously complaining about this, it's just not worth the hassle of changing it. Having said that, if binary literals are introduced, then for the sake of consistency I wouldn't necessarily be against adding an alternative 0o prefix for octal with people being advised to prefer that unless they were using |
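For context on the point about existing library support: the standard `strconv` package parses and formats integers in any base from 2 to 36, while the `fmt` verbs cover only bases 2, 8, 10 and 16. The sketch below shows both; `convertBase` is a hypothetical helper written for this example, not a library function.

```go
package main

import (
	"fmt"
	"strconv"
)

// convertBase (hypothetical helper) re-expresses the digits in s,
// written in base from, as digits in base to. Both bases are expected
// to lie in 2..36; strconv.FormatInt panics outside that range.
func convertBase(s string, from, to int) (string, error) {
	n, err := strconv.ParseInt(s, from, 64)
	if err != nil {
		return "", err
	}
	return strconv.FormatInt(n, to), nil
}

func main() {
	// Arbitrary bases go through strconv at run time.
	fmt.Println(convertBase("2a", 16, 2))
	fmt.Println(convertBase("42", 10, 3))
	fmt.Println(convertBase("z", 36, 10))

	// The fmt verbs cover only the four standard bases.
	fmt.Printf("%b %o %d %x\n", 42, 42, 42, 42)
}
```

Values produced this way are computed at run time, so they are no substitute for literals where a constant is required.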
I don't think Go needs compatibility with any language, especially C/C++. Go is already so different from the C family of languages that there's no point in clinging to them. If we're going to look elsewhere we should really look at what modern languages are doing, not the ancient ones that are riddled with questionable design decisions and years of backwards compatibility. If we were to add |
I'm not denying that the leading zero syntax for octal was a questionable design decision for C in the first place. It's more a question of what people expect and anyone coming to Go from the traditional languages is going to expect it to deal with octal literals in the same way. Also it's not just a matter of |
Given the feedback so far, I am going to narrow the proposal as follows:
but only permit 0, 2, and 8 as radix prefix; i.e., integer literals are either decimal, hexadecimal (0x), binary (2x), or octal (8x), and the radix_digit must be within the range 0...radix. (If we wanted to add radix 10 and 16 for regularity, that would be fine, too.)
This fixes potential confusion with octals, and the stream-lined notation can be extended trivially (and in the obvious manner) should there ever be a need for another radix. The analogous alternative to this reduced proposal would be #19308 (binary integer literals) modified/extended such that we remove octals in the current form and add the "0o" prefix for octals instead. This alternative would be less regular in notation and extending it would require inventing a new prefix, but otherwise would be about the same. Independently of this, one might consider #28493 for improving readability of long literals. |
I'm still not keen on the 2x and 8x prefixes and would much prefer your alternative notation of 0b and (if we must change octal) 0o. That would be consistent with the verbs in the formatted print statements and also with what Swift and Rust do. If we are to have binary literals then, in the interests of readability, I think #28493 is a necessity and it would also help with other long numbers. |
Hi @griesemer, I realise from your opening Discussion that reusing the 0b and 0o have fans because they continue this mnemonic use of the letter. If you want a syntax open to future radixes then adopting a new letter avoids thwarting what's already learnt, e.g. r for radix in 8r32. Rewriting octal is already being considered, partially to avoid the beginner error of leading zeroes on base10. If that just leaves 0x as an oddity, given a new 0r syntax, then, 16r0fc0 is at least consistent, but it's noisy compared to the leaner 0x0fc0 that we all love, and parse without thinking. :-) |
That's a good point about I find it difficult to imagine any base outside 2, 8, 10 or 16 becoming popular in the future but, if one did, then other languages would also be under pressure to support it. Perhaps a consensus might then emerge on the best notation to use which Go could follow rather than coming up with its own. |
Bases other than 2, 8, 10, and 16 are extensively used for example in the handling of Bitcoin, Ethereum, and IPFS (all of which have existing implementations in Go). Whilst it's true that all these projects exist and thrive without having base 32 and base 58 literals available, there is no good reason why programmers who frequently use those bases should make their code less readable or less expressive. I think Robert's proposal is perfect; he doesn't seem to have overlooked anything. If I were forced to complain about anything, it would be that I'd like to see this feature support up to base 58 for the reasons stated above, but I reckon that may be a little too much to ask, because there are various different base encodings for bases above 36 (for example, the alphabet for Bitcoin's base 58 encoding is crafted to remove ambiguity in numbers as read by humans; that's the reason there are no Bitcoin base 58 addresses containing the character l (lowercase L), to avoid confusion with the number 1). So that's what makes me think the base 36 upper bound is good enough: it corrects the glaring omission of base 2, it's a consistent syntax for any integer literal, it promotes readability, and it makes it easy to learn one single rule for all bases. It's as perfect a solution as you can get. Good work, @griesemer. |
@htrob ISTM that you're really arguing here for a base32 encoding to be added. base58 would be out of the question because, unless we distinguish between upper and lower case letters (which wouldn't fit in with hex), we simply don't have enough potential digits and, even if we did, some of them are omitted by base58 as you've pointed out yourself. With the exception of base32hex, base32 also suffers from having several different alphabets which are not consistent with the original proposal My view is that it's best to process them as strings or byte slices as we do now. |
Or perhaps, for the more exotic bases, better compile-time evaluation of pure functions applied to constant strings. E.g.
That by itself wouldn't make the result eligible for use as a constant, however. |
@alanfo This is the part where I said "but I reckon that may be a little too much to ask", which you may have missed. |
@htrob Well, I detailed my concerns about the original proposal at some length in my first post to this thread. But, as @griesemer has since narrowed it to only allow 0, 2 and 8 as radix prefixes, there's not much point in going over the same ground again. The question now is whether 2x or 8x should be preferred to the more familiar 0b and 0o which he offered as an alternative. As I don't like the use of |
It's always been clear what's your position, and I still honestly believe it's wrong. I already explained exactly why I think it's more practical, readable, and useful to the new Go programmer going forward to accept @griesemer's proposal, which I believe made a far more solid argument than "I don't like". |
The status quo in integer literal is IMO more than sufficient wrt what's needed. Even plain decimal only would be perfectly enough, just use a comment. const LaunchMask = 141836999991328 // 1000 0001 0000 0000 0000 0000 0000 0000 0010 0000 0010 0000 But I don't want to see such monstrosities, as the comment is, in source code. As a comment it's just fine. |
@cznic with comment you introduced an even bigger problem that's common to comments in general - they could be out of date or plain wrong. Single bit error is enough to throw people off that will inevitably rely on these comments. And no amount of testing would catch that. Even code review may not always catch when at some point someone decides to format it like so const LaunchMask = 141836999991328 // 10000001 00000000 00000000 00000000 0010000 00100000 Good luck catching an error. At least with binary literals you can write tests. |
Tests can be written without them as well. func TestFoo(t *testing.T) {
n, err := strconv.ParseUint(strings.Replace("1000 0001 0000 0000 0000 0000 0000 0000 0010 0000 0010 0000", " ", "", -1), 2, 64)
if err != nil {
t.Fatal(err)
}
if g, e := n, uint64(141836999991328); g != e {
t.Fatal(g, e)
}
} Also, I have yet to see a test that tests for the equality of a constant against a literal value. I think Can we estimate the share of programs that would ever use, per this proposal, something like an int literal My guess is it might be well under one per mille and that's another reason why I'm not in favor of the proposal. |
@cznic I fail to see the relevance of this test to my argument. My point is, your comment might be wrong. People will see it, rely on it and report bugs or simply waste time until discovering that the comment was wrong and they need to manually check the bits in the calculator. One can argue that no comment at all would be better. I'm not talking about testing the exact value of a constant. Its value could mean some feature flags that you could pass to your function during tests. Very common for libraries to have constants with default flags set. With wrong comment tests would be green. With an error in a binary literal tests could immediately catch it.
Binary literals are useful even for much smaller literals. Share of programs would be meaningless as it very much depends on the nature of a program. Binary network protocols, stuff that deals with hardware, emulators - they all could benefit from this proposal. But if we take some REST API service - it doesn't need binary or even hex literals. |
But those are IMO way better readable when written in hex. |
I am going to retract and close this proposal. With the reduction to 3 radixes at best (0x, 2x, 8x), it doesn't really bring enough "bang for the buck"; especially so if we keep the existing octal notation. Thanks to the initial supporters, but there doesn't seem to be enough community support for this idea at this stage of Go. If we are going to introduce binary integer literals, we should follow established practice in other languages and go with proposal #19308. If we want to introduce another octal notation, we may want to go with the 0o prefix (another more established convention). Closing. |
After seeing what a bike shed this became, I think closing this is the best
idea. Thanks for your continued efforts.
|
Will #19308 be re-opened? Not much more to say, but it seems odd to request feedback in a blog post and then link to a locked issue.
That may be more clear than present. It would also be a simple feature to sort out breaking language changes:
|
I unlocked #19308. |
I've brought up this idea several times before informally. I'm filing this issue now for the formal documentation trail.
Currently, Go permits octal, decimal, and hexadecimal integer literals. There's a pending proposal for binary integer literals (#19308) which has wide support.
Proposal:
This is a fully backward-compatible proposal for arbitrary-radix integer literals. We change the integer literal syntax to the following:
with
representing the digit values 0 to 35 (for a maximum radix of 36). The radix must be a decimal literal between 0 and 36, expressing the radix; with the radix value 0 having the same meaning as 16, and the value 1 being invalid.
Examples:
Discussion:
The beauty of this approach is that it permits arbitrary radix notation, which removes any future need to expand this again and removes the need for the extra notation for hexadecimal numbers because they are just part of this notation, while at the same time being fully backward-compatible. The commonly accepted notation for binary integer literals and the respective notation here have the same length and the proposed notation here seems just as intuitive (e.g., 0b1001100 == 2x1001100).
We could go a step further and remove octal literals from the language since they are also easily expressed with this notation, but that's a step that would not be backward-compatible. One way to make that happen w/o introducing bugs would be to disallow non-zero decimal numbers that start with a 0; octal numbers in existing code would then lead to a compiler error and could be fixed. It would also be trivial to have them fixed automatically with a simple tool. Finally, removing octals would eliminate another (albeit mostly academic) issue with them; see #28253. If octals were not supported anymore, one could condense the integer literal syntax to:
Implementation:
The implementation is straightforward. It would likely slightly simplify some of the scanning code for numeric literals because with this proposal all such literals simply start with a decimal_lit. If that value is zero, or between 2 and 36, a subsequent 'x' indicates the actual literal value in that radix. The respective number conversion routines are trivial and would need minimal adjustments.
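As a rough illustration of the scanning rule just described, the sketch below interprets a literal string under the proposed notation. It is not the compiler's scanner; `parseRadixLiteral` is a hypothetical helper that leans on `strconv` for the digit conversion and also accepts legacy leading-zero octals, since the proposal keeps them.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseRadixLiteral (hypothetical helper) reads the proposed notation:
// a decimal radix, an 'x', then digits in that radix. Radix 0 stands
// for 16, radix 1 is invalid, and 36 is the maximum.
func parseRadixLiteral(s string) (int64, error) {
	// The radix prefix is decimal, so the first 'x' is the separator
	// even when 'x' is itself a digit (radix 34 and above).
	i := strings.IndexByte(s, 'x')
	if i < 0 {
		// No 'x': plain decimal, or a legacy leading-zero octal.
		if len(s) > 1 && s[0] == '0' {
			return strconv.ParseInt(s[1:], 8, 64)
		}
		return strconv.ParseInt(s, 10, 64)
	}
	radix, err := strconv.ParseUint(s[:i], 10, 8)
	if err != nil {
		return 0, fmt.Errorf("bad radix in %q: %v", s, err)
	}
	switch {
	case radix == 0:
		radix = 16 // 0x keeps its existing hexadecimal meaning
	case radix == 1 || radix > 36:
		return 0, fmt.Errorf("invalid radix %d in %q", radix, s)
	}
	digits := s[i+1:]
	if digits == "" || digits[0] == '+' || digits[0] == '-' {
		return 0, fmt.Errorf("missing or signed digits in %q", s)
	}
	return strconv.ParseInt(digits, int(radix), 64)
}

func main() {
	for _, lit := range []string{"2x1001100", "0x2a", "8x755", "35xx1", "1x0"} {
		v, err := parseRadixLiteral(lit)
		fmt.Println(lit, v, err)
	}
}
```

Here `2x1001100` yields 76, the same value as `0b1001100` in the binary-literal proposal, and `1x0` is rejected because radix 1 is invalid.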
Impact:
Hard to say. It may be sufficient to just add another notation for binary integer literals per #19308. Or we could do this and lay the issue to rest for good.
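The Discussion above suggests that leading-zero octal literals could be found and fixed by a simple tool. The sketch below shows one way such a tool might locate them with the standard `go/scanner` package; the helpers `legacyOctals` and `isLegacyOctal`, and the `8x` rewrite in `main`, are assumptions made for this example and not part of the proposal.

```go
package main

import (
	"fmt"
	"go/scanner"
	"go/token"
)

// isLegacyOctal reports whether lit is a C-style octal literal:
// a leading 0 followed only by the digits 0-7.
func isLegacyOctal(lit string) bool {
	if len(lit) < 2 || lit[0] != '0' {
		return false
	}
	for _, c := range lit[1:] {
		if c < '0' || c > '7' {
			return false
		}
	}
	return true
}

// legacyOctals (hypothetical helper) returns the leading-zero octal
// integer literals that appear in the Go source src.
func legacyOctals(src []byte) []string {
	fset := token.NewFileSet()
	file := fset.AddFile("src.go", fset.Base(), len(src))
	var s scanner.Scanner
	s.Init(file, src, nil, 0) // nil error handler: scan errors are ignored
	var found []string
	for {
		_, tok, lit := s.Scan()
		if tok == token.EOF {
			break
		}
		if tok == token.INT && isLegacyOctal(lit) {
			found = append(found, lit)
		}
	}
	return found
}

func main() {
	src := []byte("package p\n\nconst a, b, c = 0755, 0x1F, 10\n")
	for _, lit := range legacyOctals(src) {
		// Show the literal next to its spelling in the proposed notation.
		fmt.Println(lit, "->", "8x"+lit[1:])
	}
}
```

Because the scanner tokenizes the source, digits inside strings and comments are not matched; a real tool would also need token positions to rewrite the file in place.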