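Title: An Interesting Quirk of PHP's base_convert Function
Author: Demon
Link: https://demon.tw/programming/php-base_convert.html
Copyright: All articles on this blog are licensed under the Creative Commons "Attribution-NonCommercial-ShareAlike 2.5 China Mainland" license.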
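PHP's base_convert function converts numbers between arbitrary bases (any base from 2 to 36). That much is common knowledge. So, without actually running it, use common sense to predict what this line prints: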
echo base_convert('https://demon.tw', 16, 10);
If your answer is 222, congratulations, you got it right. The line above actually does exactly the same thing as this one:
echo base_convert('de', 16, 10);
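In other words, base_convert silently ignores every character that is not a valid digit in the source base. To see why, let's look at its C source. base_convert is defined in ext/standard/math.c in the PHP source tree (the listing below comes from an older PHP release):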
/* {{{ proto string base_convert(string number, int frombase, int tobase)
   Converts a number in a string from any base <= 36 to any base <= 36 */
PHP_FUNCTION(base_convert)
{
    zval **number, **frombase, **tobase, temp;
    char *result;

    if (ZEND_NUM_ARGS() != 3 || zend_get_parameters_ex(3, &number, &frombase, &tobase) == FAILURE) {
        WRONG_PARAM_COUNT;
    }
    convert_to_string_ex(number);
    convert_to_long_ex(frombase);
    convert_to_long_ex(tobase);

    if (Z_LVAL_PP(frombase) < 2 || Z_LVAL_PP(frombase) > 36) {
        php_error_docref(NULL TSRMLS_CC, E_WARNING, "Invalid `from base' (%ld)", Z_LVAL_PP(frombase));
        RETURN_FALSE;
    }
    if (Z_LVAL_PP(tobase) < 2 || Z_LVAL_PP(tobase) > 36) {
        php_error_docref(NULL TSRMLS_CC, E_WARNING, "Invalid `to base' (%ld)", Z_LVAL_PP(tobase));
        RETURN_FALSE;
    }

    if(_php_math_basetozval(*number, Z_LVAL_PP(frombase), &temp) != SUCCESS) {
        RETURN_FALSE;
    }
    result = _php_math_zvaltobase(&temp, Z_LVAL_PP(tobase) TSRMLS_CC);
    RETVAL_STRING(result, 0);
}
The first few lines merely parse and validate the arguments. The real work is done by _php_math_basetozval and _php_math_zvaltobase; _php_math_basetozval is defined as follows:
/* {{{ _php_math_basetozval */
/*
 * Convert a string representation of a base(2-36) number to a zval.
 */
PHPAPI int _php_math_basetozval(zval *arg, int base, zval *ret)
{
    long num = 0;
    double fnum = 0;
    int i;
    int mode = 0;
    char c, *s;
    long cutoff;
    int cutlim;

    if (Z_TYPE_P(arg) != IS_STRING || base < 2 || base > 36) {
        return FAILURE;
    }

    s = Z_STRVAL_P(arg);

    cutoff = LONG_MAX / base;
    cutlim = LONG_MAX % base;

    for (i = Z_STRLEN_P(arg); i > 0; i--) {
        c = *s++;

        /* might not work for EBCDIC */
        if (c >= '0' && c <= '9')
            c -= '0';
        else if (c >= 'A' && c <= 'Z')
            c -= 'A' - 10;
        else if (c >= 'a' && c <= 'z')
            c -= 'a' - 10;
        else
            continue;

        if (c >= base)
            continue;

        switch (mode) {
        case 0: /* Integer */
            if (num < cutoff || (num == cutoff && c <= cutlim)) {
                num = num * base + c;
                break;
            } else {
                fnum = num;
                mode = 1;
            }
            /* fall-through */
        case 1: /* Float */
            fnum = fnum * base + c;
        }
    }

    if (mode == 1) {
        ZVAL_DOUBLE(ret, fnum);
    } else {
        ZVAL_LONG(ret, num);
    }

    return SUCCESS;
}
/* }}} */
The whole function is tedious to read; the part that matters is this:
for (i = Z_STRLEN_P(arg); i > 0; i--) {
    c = *s++;

    /* might not work for EBCDIC */
    if (c >= '0' && c <= '9')
        c -= '0';
    else if (c >= 'A' && c <= 'Z')
        c -= 'A' - 10;
    else if (c >= 'a' && c <= 'z')
        c -= 'a' - 10;
    else
        continue;

    if (c >= base)
        continue;
    /* ... */
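The loop walks through the string one character at a time. Any character outside [0-9a-zA-Z] hits continue, which jumps straight to the next iteration, so such characters have no effect on the conversion. Likewise, whenever the digit value c is greater than or equal to base, the loop also continues, so letters (and digits) that are not valid in the source base are ignored as well. Is this a bug in base_convert, or did its designers intend it?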