“我只要四个字符”
N年前,我在上海某房地产经纪公司做技术。
我们的产品经理想要一个功能:“在咱们的服务的域名后面加上4个字符,点击后打开相关的页面。很简单吧,只要4个字符。”
说实话,此前我虽然看过短地址服务,但是并没有理解里面的算法,因此心里没有底。不过“不行”这种话咱们做技术的正常没法说出口,于是先答应下来。
接下来——你猜的没错——首先是百度(谷歌),然而并没有找到理想的算法,大部分是通过MD5加密,然而这种算法是会有碰撞的问题的。接着又去内部其他服务线了解,也没有,基本都是域名加id的。完蛋。。。于是我寻思是不是找产品PK一下,让他接受域名加id的方案,毕竟这是公司多年传统做法。但是“不行”这种话实在说不出口,于是决定试一试。于是经过几天的找资料,冥思苦想,终于有了下面的代码(修改过)
public class ShortUrl {
/**
* The max int value which has 30 bits (5 characters)
*/
private static int MAX_I=0x3fffffff;
/**
* The max long value which has 60 bits (10 characters)
*/
private static long MAX_L=0xfffffffffffffffL;
private static final byte[] URL_SAFE_ENCODE_TABLE = { 'A', 'B', 'C', 'D',
'E', 'F', 'G', 'H', 'I', 'J', 'K', 'L', 'M', 'N', 'O', 'P', 'Q', 'R',
'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z', 'a', 'b', 'c', 'd', 'e', 'f',
'g', 'h', 'i', 'j', 'k', 'l', 'm', 'n', 'o', 'p', 'q', 'r', 's', 't',
'u', 'v', 'w', 'x', 'y', 'z', '0', '1', '2', '3', '4', '5', '6', '7',
'8', '9', '-', '_' };
private static byte[] intTo6bitByte(int i) {
byte[] targets = new byte[6];
targets[5] = (byte) (i & 0x3F);
targets[4] = (byte) (i >> 6 & 0x3F);
targets[3] = (byte) (i >> 12 & 0x3F);
targets[2] = (byte) (i >> 18 & 0x3F);
targets[1] = (byte) (i >> 24 & 0x3F);
targets[0] = (byte) (i >> 30 & 0x3F);
return targets;
}
private static byte[] longTo6bitByte(long i) {
byte[] targets = new byte[11];
targets[10] = (byte) (i & 0x3F);
targets[9] = (byte) (i >> 6 & 0x3F);
targets[8] = (byte) (i >> 12 & 0x3F);
targets[7] = (byte) (i >> 18 & 0x3F);
targets[6] = (byte) (i >> 24 & 0x3F);
targets[5] = (byte) (i >> 30 & 0x3F);
targets[4] = (byte) (i >> 36 & 0x3F);
targets[3] = (byte) (i >> 42 & 0x3F);
targets[2] = (byte) (i >> 48 & 0x3F);
targets[1] = (byte) (i >> 54 & 0x3F);
targets[0] = (byte) (i >> 60 & 0x3F);
return targets;
}
/**
* never collision!!
* @param i
* @return
*/
private static int intMix(int i) {
i = (i + (~(i << 13) & MAX_I)& MAX_I)& MAX_I;
i = i ^ (i >>> 9);
i = (i + ((i << 3)& MAX_I))& MAX_I;
i = i ^ (i >>> 6);
i = (i + ((i << 10)& MAX_I))& MAX_I;
i = i ^ (i >>> 14);
return i;
}
/**
* never collision!!
* @param i
* @return
*/
private static long longMix(long i) {
i = (i + (~(i << 29) & MAX_L)& MAX_L)& MAX_L;
i = i ^ (i >>> 25);
i = (i + ((i << 11)& MAX_L))& MAX_L;
i = i ^ (i >>> 15);
i = (i + ((i << 19)& MAX_L))& MAX_L;
i = i ^ (i >>> 21);
i = (i + ((i << 5)& MAX_L))& MAX_L;
i = i ^ (i >>> 6);
return i;
}
public static String createToken(int i) {
if ( i < 0 || i > MAX_I) {
throw new IllegalArgumentException();
}
i = intMix(i);
StringBuilder *** = new StringBuilder();
byte[] bytes = intTo6bitByte(i);
for (int j = 1; j < 6; j++) {
***.append((char) URL_SAFE_ENCODE_TABLE[bytes[j]]);
}
return ***.toString();
}
public static String createToken(long i) {
if ( i < 0 || i > MAX_L) {
throw new IllegalArgumentException();
}
i = longMix(i);
StringBuilder *** = new StringBuilder();
byte[] bytes = longTo6bitByte(i);
for (int j = 1; j < 11; j++) {
***.append((char) URL_SAFE_ENCODE_TABLE[bytes[j]]);
}
return ***.toString();
}
public static void main(String[] args) throws Exception {
for (int i = 1; i < 100; i++) {
System.out.println(createToken(i));
}
for (int i = 9540123; i < 9540223; i++) {
System.out.println(createToken(i));
}
for (long i = 1; i < 100; i++) {
System.out.println(createToken(i));
}
for (long i = 87572396021L; i < 87572396121L; i++) {
System.out.println(createToken(i));
}
}
}
永不碰撞
这个程序可以把有序的整数(自增主键)转化到相同数值范围内另外一个整数,并且是一一对应的,也即是说每个都不会碰撞。然后通过base64转化成字符就可以得到一个短地址字符串。
各位看官是否能猜出里面算法的核心思想吗?