

Why can't identifiers start with a number?


The video <a href="" author="Kevin Powell">I'm not sure how much longer I can wait!</a> is an excellent introduction to sub-grids in CSS. But I was more interested in something he told his viewers: <bq>you can use numbers in classes, but if you have a class or id that starts with a number, it's invalid. [...] It's one of those weird things in CSS that sometimes trips people up.</bq>

I immediately thought to myself, "It's not weird. Every programming language is like that." Then I thought, "I bet this guy only knows CSS, so he doesn't have anything to compare it to." Then, "Wait...why <i>can't</i> you start an identifier with a number?" And, finally, "I bet it's a lexing/parsing thing."

<h>Parser or lexer?</h>

I've written several parsers for medium-sized languages, and my gut feeling is that letting an identifier start with a number is a surefire way of making the lexer more ambiguous or of pushing more work into the parsing stage. For example, if <c>25L</c> can be either an identifier <i>or</i> a long integer, then the parser has to figure out from context which one it is (e.g. by checking whether an identifier with that name has been declared). If it can only be a number, then it comes out of the lexer as a number token and the parser doesn't have to disambiguate.

Even if your language doesn't allow numeric suffixes, you'd still have the problem with an identifier like <c>25</c>, which would be legal <i>unless</i> you introduce the additional restriction that an identifier must contain <i>at least</i> one alphabetic character. In that case, though, you might as well require that the identifier <i>start</i> with an alphabetic character and avoid the whole ambiguity. With that common (not weird!) rule, the disambiguation happens in the lexer, where it is simpler and cheaper, performance-wise.

<h>Unresolvable ambiguity</h>

It's actually worse than that, though.
In the case of a programming language, you can see how the following would result in an ambiguity for the compiler:

<code>
var 3 = 5; // I'm <i>already</i> confused
           //...the compiler gets it, though
var a = 3; // Now the compiler's confused as well
</code>

Did the developer assign the value <c>3</c> to <c>a</c>, or the value of the variable <c>3</c> (i.e. <c>5</c>)? Not only is this a terrible idea for readability; the compiler literally cannot resolve this ambiguity without additional information. So there <i>have</i> to be restrictions on identifier names to avoid clashes not only with reserved words (e.g. <c>if</c>) but also with manifest constants (e.g. <c>3</c>).

<h>Where's the problem with CSS?</h>

In the case of CSS, where you do have suffixes (e.g. <c>25px</c>) but you can't really mix class identifiers with values, it's possible that you could get away without any ambiguities <i>right now</i>. So it's not <i>weird</i> that you can't start an identifier with a number (it's perfectly natural for developers), but in the case of CSS, the restriction isn't strictly required for unambiguous processing.

As you can see below, though, allowing it would still be confusing for the user. What if we had a class named "3"? It's not very expressive (we'd more likely call the class something like "3-part-panel"), but it's the pathological case. A class called "3px" would be even worse.

<code>
/* Hypothetical: none of these selectors is valid CSS today */
.3-part-panel { /* This is fine */ }
.3 { /* Weird, but OK */ }
.3px { /* Now you're just being obnoxious */ }
</code>

Do we actually get any ambiguities, though? I don't think so. I think that, in this case, the authors of CSS just used the "standard" (not weird!) definition of an identifier. It's only people who use CSS without any exposure to other programming languages (or to parsing/lexing) who think it's "weird" that an identifier can't start with a number.

The only place where you could get an ambiguity is with CSS custom properties.
In that case, though, <iq>[a] custom property is any property whose name starts with two dashes</iq>, according to <a href="" author="" source="W3C">CSS Custom Properties for Cascading Variables Module Level 1</a>. So variable names in CSS are even <i>more</i> restricted than in most programming languages. Is that weird? Again, no. As with the programming-language examples above, the result is more clarity for the user. For example, if CSS didn't require the prefix, the following would declare a few custom properties with deliberately obnoxious names.

<code>
:root {
  red: #F33;
  color: #FF0;
  0: 1;
  3px: 1px;
}

.error-text {
  color: var(red);
  background-color: var(color);
  border-width: var(3px);
  opacity: var(0);
}
</code>

Although I've chosen confusing names and values, this doesn't, at first glance, seem to cause any ambiguities. As with the examples above, though, it does force implementations to handle enumerated values (e.g. all of the named colors) in the parser rather than in the lexer. If the word "red" could not be used as a variable name, then it could (possibly) be recognized as its own token in the lexer, (possibly) improving performance. The same goes for property names: if custom properties can use the same names as built-in properties, then the lexer can't tell them apart. Even so, there is no ambiguity in the values, because a custom property's value must be <i>resolved</i> explicitly with the CSS function <c>var()</c>.

The problem is worse than that, though. There is an actual ambiguity that isn't obvious because we're using the <c>:root</c> pseudo-class<fn>. The example below, using the <c>html</c> element selector, makes it clearer.

<code>
html {
  color: #F33; /* Is this setting the color...
                  ...or declaring a variable named "color"? */
}
</code>

This is an ambiguity that the parser cannot resolve. That's why the CSS designers settled on a prefix for custom properties.
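To make the lexer argument concrete, here is a minimal sketch in Python of a toy tokenizer that applies the rules discussed above: a word starting with a digit is always a number (possibly with a suffix like <c>L</c> or <c>px</c>), a word starting with two dashes is always a custom property, and anything else must be an ordinary identifier that starts with a letter. The function names <c>classify</c> and <c>classify_declaration</c> and the exact character sets are hypothetical; these are deliberately simplified rules, not the actual tokenizer from the CSS Syntax specification or from any browser. The point is only that the leading character(s) select exactly one rule, so no symbol table or parser context is needed.

```python
import re

# Hypothetical, simplified token rules (NOT the real CSS tokenizer).
DIGITS = frozenset("0123456789")
NUMBER = re.compile(r"[0-9]+[A-Za-z]*")            # 25, 25L, 3px
IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")     # a25, red, color
CUSTOM_PROPERTY = re.compile(r"--[A-Za-z0-9_-]+")  # --red, --3px


def classify(word: str) -> str:
    """Classify a single word by its leading character(s).

    The leading character(s) select exactly one candidate rule, so the lexer
    never needs a symbol table or parser context to decide what a word is.
    """
    if word.startswith("--"):
        kind, pattern = "custom-property", CUSTOM_PROPERTY
    elif word[:1] in DIGITS:
        # A leading digit always means a number; identifiers can't start here.
        kind, pattern = "number", NUMBER
    else:
        kind, pattern = "identifier", IDENT
    if pattern.fullmatch(word) is None:
        # e.g. "3-part-panel": rejected outright instead of guessed at.
        raise ValueError(f"invalid {kind}: {word!r}")
    return kind


def classify_declaration(name: str) -> str:
    """Decide what a declaration such as `name: value` does from its name alone."""
    kind = classify(name)
    if kind == "custom-property":
        return "declares a custom property"
    if kind == "identifier":
        # Whether it is a *known* standard property is a separate, later check.
        return "sets a standard property"
    raise ValueError(f"not a property name: {name!r}")


if __name__ == "__main__":
    for word in ["25L", "a25", "3px", "--3px", "red", "--red"]:
        print(f"{word} -> {classify(word)}")
    print(classify_declaration("color"))    # sets a standard property
    print(classify_declaration("--color"))  # declares a custom property
```

With these rules, <c>classify("25L")</c> returns <c>"number"</c> and <c>classify("a25")</c> returns <c>"identifier"</c>, while <c>classify("3-part-panel")</c> raises a <c>ValueError</c> instead of guessing. Likewise, <c>classify_declaration("color")</c> and <c>classify_declaration("--color")</c> give different answers from the name alone, which is what the <c>--</c> prefix buys in the <c>html { color: #F33; }</c> example.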
So, to a layman or user of CSS, naming restrictions on class or custom-property identifiers may seem arbitrary and "weird", but they are a logical requirement for processing the grammar unambiguously.

<hr>

<ft>If you know where I'm headed, then fine, it's obvious <i>to you</i>. Congratulations. I didn't see it immediately, so I'm writing it this way.</ft>