I'm currently working on getting saving/loading working. I'd just like to rant a bit about Java.
I know, it's nothing new. And the fact that I'm pre-crippling myself by using Java 1.1 will garner me no sympathy whatsoever. And I'm almost certainly approaching the problem from completely the wrong direction.
Thudgame is designed so that it'll connect through a socket, and send stuff to the server, and get stuff back. Java can do this fine.
Thudgame is designed to accept various things from the server. Depending on the first word of the line* (which will always be 7-bit ASCII), the remainder of the line will be in ISO8859-1 (Latin-1), UTF-8, or binary data encoded in Base64, or binary data gzipped and encoded in Base64. Four possible encodings.
Fair enough. Trivial, trivial stuff, in any normal language. You get a line (in all encodings, the line endings are the same), you tokenise it by whitespace (the same in all encodings), and you use the first word to establish what's being sent, and how to deal with all the other words.
But in Java, anything that becomes a String (or a StringBuffer) will be corrupted and munged into Unicode, so that's no use for binary data. Also, if it uses the wrong encoding (which it will 3/4 of the time if we haven't read the first word to tell us what the encoding is), then it throws an exception and goes a bit wobbly.
Fair enough, then, we want arrays of bytes. How can we read a line of un-formatted bytes in?
Well, back in Java 1.0, we could almost use DataInputStream.readLine(). Except that's deprecated now, and anyway, it returned a String.
It's been deprecated in favour of BufferedReader.readLine(). Unfortunately that also only does Strings.
But wait! There's StreamTokenizer! A whole class specially for splitting streams into tokens! But it stores the tokens as Strings.
What about Java's other tokenizer, StringTokenizer? Of course, it also only works on Strings.
OK, so we roll our own ByteTokenizer class from scratch.
We just read it in character by character, and stuff the bytes into a dynamically expanding byte array until the end of the line! But... there's no way to dynamically extend arrays.
But there's the Vector class! That allows you to add stuff onto the end! Except, it doesn't let you add bytes or other primitive types: only object references.
That's OK though, that's kind of like C. We roll our own ByteVector class from scratch!
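A minimal sketch of what such a growable byte buffer might look like. Only the name ByteVector comes from the post; the methods and the doubling strategy are assumptions made for illustration.

```java
// Hypothetical sketch: the class name is from the post, the API is assumed.
class ByteVector {
    private byte[] data;
    private int size;

    ByteVector(int initialCapacity) {
        data = new byte[Math.max(1, initialCapacity)];
    }

    // Appends one byte, doubling the backing array when it is full.
    void add(byte b) {
        if (size == data.length) {
            byte[] bigger = new byte[data.length * 2];
            System.arraycopy(data, 0, bigger, 0, size);
            data = bigger;
        }
        data[size++] = b;
    }

    int size() {
        return size;
    }

    byte get(int index) {
        if (index < 0 || index >= size) {
            throw new ArrayIndexOutOfBoundsException(index);
        }
        return data[index];
    }

    // Forgets the contents but keeps the allocated array for reuse.
    void clear() {
        size = 0;
    }

    // Returns a copy trimmed to the number of bytes actually stored.
    byte[] toByteArray() {
        byte[] copy = new byte[size];
        System.arraycopy(data, 0, copy, 0, size);
        return copy;
    }
}
```

Because the array lives inside the object, the reallocation stays hidden from callers, and clear() lets one instance be reused for every line.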
Whoo! On the home stretch now! And we've only had to create two new classes! Now all we need to do is convert the array of bytes we've read in to the right kind of object, using the correct encoding.
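For reference, reading one line as raw bytes might look roughly like this. It is a sketch, not the Thudgame code: RawLineReader is a made-up name, and the standard java.io.ByteArrayOutputStream stands in for the hand-rolled ByteVector so the example is self-contained. It assumes lines end in "\n" with an optional "\r" before it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper: reads a line without ever turning it into a String.
class RawLineReader {
    // Returns the bytes before the next '\n' (minus one trailing '\r'),
    // or null if the stream ended before any byte was read.
    static byte[] readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        boolean sawAny = false;
        int b;
        while ((b = in.read()) != -1) {
            sawAny = true;
            if (b == '\n') {
                break;
            }
            buf.write(b);
        }
        if (!sawAny) {
            return null;
        }
        byte[] line = buf.toByteArray();
        if (line.length > 0 && line[line.length - 1] == '\r') {
            byte[] trimmed = new byte[line.length - 1];
            System.arraycopy(line, 0, trimmed, 0, trimmed.length);
            return trimmed;
        }
        return line;
    }
}
```

Reading a byte at a time from a raw socket stream is slow, so in practice the socket's InputStream would probably be wrapped in a BufferedInputStream first.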
On the way, we stumble over the fact that Java doesn't allow pass-by-reference. It just simply forbids it. It passes primitive types by value, and object references by value. So you can modify an object by passing it to a function, but you can't make it into a different object. This is important for arrays, because it means that you can't resize an array if you need to, in a function that is passed the array but doesn't return it.
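A small illustration of the point, with made-up method names: a method can change what is inside the caller's array, but reassigning the parameter only changes the method's own copy of the reference. The usual workaround is to return the new array.

```java
// Hypothetical demo class; none of these names come from the Thudgame source.
class PassByValueDemo {
    // Visible to the caller: the array object itself is shared.
    static void fill(byte[] buf) {
        buf[0] = 42;
    }

    // Invisible to the caller: only the local reference is replaced.
    static void growInPlace(byte[] buf) {
        buf = new byte[buf.length * 2];
    }

    // Workaround: hand the bigger array back and let the caller reassign.
    static byte[] grow(byte[] buf) {
        byte[] bigger = new byte[buf.length * 2];
        System.arraycopy(buf, 0, bigger, 0, buf.length);
        return bigger;
    }
}
```

So a caller has to write something like `buf = PassByValueDemo.grow(buf);`, or keep the array inside an object that does the reallocation itself.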
So, we can use new String(byte[] data, int offset, int length, String encoding) with encoding set to "UTF-8" or "ISO-8859-1", and we've got our first two encodings.
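As a sketch of that step, here is the same byte slice decoded both ways. LineDecoder is a hypothetical wrapper; the only real API used is the String(byte[], int, int, String) constructor named above.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical wrapper around the String constructor mentioned in the text.
class LineDecoder {
    static String decode(byte[] data, int offset, int length, String encoding) {
        try {
            return new String(data, offset, length, encoding);
        } catch (UnsupportedEncodingException e) {
            // Only happens if the JVM does not know the encoding name.
            throw new RuntimeException("Unsupported encoding: " + encoding);
        }
    }

    public static void main(String[] args) {
        // "caf" followed by the two-byte UTF-8 sequence for e-acute.
        byte[] data = {'c', 'a', 'f', (byte) 0xC3, (byte) 0xA9};
        System.out.println(decode(data, 0, data.length, "UTF-8").length());      // 4 characters
        System.out.println(decode(data, 0, data.length, "ISO-8859-1").length()); // 5 characters
    }
}
```

Latin-1 maps every byte to exactly one character, so decoding with it never fails; it just gives the wrong characters when the data was really UTF-8.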
Then we roll our own Base64/BZip encoding/decoding utility class (*), create a ByteArrayInputStream from the array, stream it through a reader in the utility class, and we've got decoded Base64 with optional BZip.
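The shape of that decoding chain can be sketched as follows. Note the assumptions: this uses java.util.Base64 (Java 8 and later, so not available on Java 1.1) rather than the Base64 class credited in the footnote, and it uses gzip via java.util.zip because the JDK has no built-in bzip2. Base64Payload and its methods are made-up names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.Base64;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical utility: Base64 decoding with optional gzip decompression.
class Base64Payload {
    static byte[] decode(byte[] base64Ascii, boolean gzipped) throws IOException {
        byte[] decoded = Base64.getDecoder().decode(base64Ascii);
        if (!gzipped) {
            return decoded;
        }
        // Stream the decoded bytes through the decompressor.
        InputStream in = new GZIPInputStream(new ByteArrayInputStream(decoded));
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] chunk = new byte[512];
        int n;
        while ((n = in.read(chunk)) != -1) {
            out.write(chunk, 0, n);
        }
        in.close();
        return out.toByteArray();
    }

    // Demo-only inverse, so the round trip can be tried without a server.
    static byte[] encodeGzipped(byte[] raw) throws IOException {
        ByteArrayOutputStream compressed = new ByteArrayOutputStream();
        GZIPOutputStream gz = new GZIPOutputStream(compressed);
        gz.write(raw);
        gz.close();
        return Base64.getEncoder().encode(compressed.toByteArray());
    }
}
```

Since Base64 output is plain 7-bit ASCII, the input array can be sliced straight out of the raw line without any character decoding.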
And that cost about 500 lines and multiple wasted days chasing down dead ends that turned out to be impossible. Sure, I could have kludged it in a lot fewer lines, but the classes here should be useful and reusable elsewhere.
Just in case you missed it there, that means that to read in a single line, I have to create a new ByteVector object, parse it character by character into an array, reallocating and copying into a new array whenever it runs out of space. Then scan through the array to find the first occurrence of the space character, convert that slice of the array to a new String object in order to compare it to the command string, then convert the remainder of the array into a ByteArrayInputStream, chain it with a new Base64InputStream, and read it into a new byte array...
And that's just to read in a single line.
(Actually, I reuse the ByteVector, and do an array comparison with a bunch of short, static arrays, which saves me from tokenizing the main array, and from creating any Strings unless I need to, and a bunch of other simplifications. It even runs smoothly on my crusty old laptop when I spam it from the server. But still! Yuck!)
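The array-comparison trick might look something like the sketch below. The command words are borrowed from the PHP snippet in this post; CommandMatcher and startsWithWord are hypothetical names, and the real client may well do it differently.

```java
// Hypothetical sketch: match the first word of a raw line without any Strings.
class CommandMatcher {
    static final byte[] CMD_UTF8 = {'U', 'T', 'F', '8'};
    static final byte[] CMD_BASE64 = {'B', 'A', 'S', 'E', '6', '4'};
    static final byte[] CMD_BASE64GZIP = {'B', 'A', 'S', 'E', '6', '4', 'G', 'Z', 'I', 'P'};

    // True if the first lineLength bytes of line begin with word, and the word
    // is followed by a space or by the end of the line.
    static boolean startsWithWord(byte[] line, int lineLength, byte[] word) {
        if (lineLength < word.length) {
            return false;
        }
        for (int i = 0; i < word.length; i++) {
            if (line[i] != word[i]) {
                return false;
            }
        }
        return lineLength == word.length || line[word.length] == ' ';
    }
}
```

Checking for the space after the word matters here, because BASE64 is a prefix of BASE64GZIP.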
In PHP, it's far fewer lines, including three per encoding type to decode. Most importantly, the MASSIVE faff to read in the bytes is just a single assignment in the while(). Half a line. *sob* I hate you, Java.
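    while ($line = fgets($sock, 512)) {
        // Assumed split, left out of the original snippet: break the line into words.
        $params = explode(' ', rtrim($line, "\r\n"));
        // substr() needs a start offset of 0 to test the beginning of the line.
        if (substr($line, 0, 4) === 'UTF8') {
            $str = utf8_decode($params[2]);
        }
        elseif (substr($line, 0, 6) === 'LATIN1') {
            $str = $params[2];
        }
        // Test the longer word first, since 'BASE64' is a prefix of 'BASE64GZIP'.
        elseif (substr($line, 0, 10) === 'BASE64GZIP') {
            $data = bzdecompress(base64_decode($params[2]));
        }
        elseif (substr($line, 0, 6) === 'BASE64') {
            $data = base64_decode($params[2]);
        }
    }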
[* I've actually simplified significantly for the sake of this rant. The encoding depends on multiple factors. The "command word" (third word in the line), is used to distinguish between the BASE64 binary data and the strings. The Byte Order Marker (BOM) at the beginning of the strings is used to distinguish between the two encodings (default is Latin-1), and the decoded value of the Base64 stuff is checked and it automatically uncompresses on the fly if it wants. The whole sexy Base64 magic is not something I can claim credit for: it comes from Mr Harder's work at http://iharder.sourceforge.net/base64/ ]

In the end, I ended up doing the string tokenisation after converting them to Strings, and just using the Byte Order Marker as the magic cookie to tell what I wanted to decode them into. Then, once they're decoded, I tokenize and sort them to wherever needed by command word. The data strings are decoded as strings, since the Base64 encoding survives being stringified.
This all means that any encoding without a BOM will show up wrong, but that shouldn't be a problem, touch wood, since all user messages from the client should be UTF-8 with a BOM.
It'll only become an issue when people find their way onto the server with other IRC clients, which is something I intend to protect against anyway.
So, odds are it'll only be an issue for betatesters.
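The BOM-as-magic-cookie idea described above can be sketched like this: if the bytes start with the UTF-8 BOM, skip it and decode as UTF-8, otherwise fall back to Latin-1. BomSniffer is a made-up name, and the real client handles more cases than this.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical sketch of picking the decoder from the byte order mark.
class BomSniffer {
    // UTF-8 BOM (0xEF 0xBB 0xBF) written as signed bytes.
    static final byte[] BOM_UTF8 = {-17, -69, -65};

    static boolean startsWith(byte[] data, int length, byte[] prefix) {
        if (length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    // Decodes the first length bytes: UTF-8 if a BOM is present (the BOM
    // itself is skipped), Latin-1 otherwise.
    static String decode(byte[] data, int length) {
        try {
            if (startsWith(data, length, BOM_UTF8)) {
                return new String(data, BOM_UTF8.length, length - BOM_UTF8.length, "UTF-8");
            }
            return new String(data, 0, length, "ISO-8859-1");
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(e.toString());
        }
    }
}
```

This is exactly where BOM-less UTF-8 goes wrong: it takes the Latin-1 branch, and every multi-byte character comes out as two or three junk characters.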
===
Today's job - daemonise the server!
Trouble there is, PHP (which the server processes are written in) requires compiling with --enable-pcntl if you want to be able to fork(). And that's not the default. So I've written a "daemonize" script for it, so if PHP's not been compiled that way, it'll try to nohup itself, and if that doesn't work, it'll call a perl wrapper around itself, to fork it into the background. So we win all three ways.
That's the first kludge I've had to do on the serverside, though: I'm not proud of it, and will make sure the real server has properly recompiled PHP. Mind you, there's a second kludge in there, too: I couldn't find how to redirect the STDIN/OUT/ERR constants from within a PHP script, so the calling wrapper redirects them to the relevant logs.
The long and the short of it is, the client can now be used to connect to the server, and move the pieces around. Endgame detection using all that tokenized kerfuffle above is on the way, next day or two I hope.
So far, we have maybe five betatesters, but nobody's talking on here. Speak up! It's good motivation for me when I can hear people looking at my stuff and saying "what a crock!" - it makes me fix the crockiness faster!
But first... sleep! Before those darn builders next door start smashing things loudly again.
Yeah, getting some feedback now. So I'm feeling all fired up and motivated, and with a big mug of coffee at my side, I've decided sleep is for the WEAK! I'm typoing even more than normal, though, so I think I'll have to pack it in eventually, when I become incoherent.
Here's the declaration of the BOMs. It appears to work, but I'm sure there has to be a nicer way of doing it. This is rather ghastly.
    /*
    private static final byte[] BOM_UTF32LE = {0xFF, 0xFE, 0x00, 0x00}; // UTF-32/UCS-4, little endian.
    private static final byte[] BOM_UTF32BE = {0x00, 0x00, 0xFE, 0xFF}; // UTF-32/UCS-4, big endian.
    private static final byte[] BOM_UTF16LE = {0xFF, 0xFE};             // UTF-16/UCS-2, little endian (Microsoft)
    private static final byte[] BOM_UTF16BE = {0xFE, 0xFF};             // UTF-16/UCS-2, big endian.
    private static final byte[] BOM_UTF8    = {0xEF, 0xBB, 0xBF};       // UTF-8 (what ThudChat uses)
    */
    // The above, converted to signed, because of java's lack of unsigned bytes.
    private static final byte[] BOM_UTF32LE = {-1, -2, 0, 0};
    private static final byte[] BOM_UTF32BE = {0, 0, -2, -1};
    private static final byte[] BOM_UTF16LE = {-1, -2};
    private static final byte[] BOM_UTF16BE = {-2, -1};
    private static final byte[] BOM_UTF8    = {-17, -69, -65};
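One possibly nicer way, offered as a sketch: keep the hex and narrow each value explicitly, either with a (byte) cast per element or with a small helper that takes ints. The Boms class and the bytes() helper are invented for this example.

```java
// Hypothetical sketch: hex-readable BOM constants despite signed bytes.
class Boms {
    // Narrows each int to a byte, so 0xEF becomes -17 and so on.
    static byte[] bytes(int[] values) {
        byte[] out = new byte[values.length];
        for (int i = 0; i < values.length; i++) {
            out[i] = (byte) values[i];
        }
        return out;
    }

    // Option 1: explicit casts in the initialiser.
    static final byte[] BOM_UTF8 = {(byte) 0xEF, (byte) 0xBB, (byte) 0xBF};

    // Option 2: the helper, which keeps the line free of casts.
    static final byte[] BOM_UTF32LE = bytes(new int[] {0xFF, 0xFE, 0x00, 0x00});
}
```

Going the other way, `b & 0xFF` turns a signed byte back into its 0 to 255 value, which helps when printing bytes for debugging.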