The online world is chiefly realised through the immersive internet protocol.

Connection and avatar

The basic protocol begins with a handshake.

The handshake requires a handle from the client, a client software identifier, the protocol version used by the client, and the IP of the server, along with the name of the requested resource (the latter quite akin to HTTP).
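Since the protocol is in-world fiction, no real wire format exists; the sketch below merely bundles the five handshake fields into one hypothetical message. Every field name and the newline-delimited encoding are assumptions, chosen only to make the field list concrete:

```python
from dataclasses import dataclass

@dataclass
class Handshake:
    """Hypothetical handshake message; all names are illustrative."""
    handle: str            # the client's chosen handle
    client_id: str         # client software identifier
    protocol_version: int  # protocol version spoken by the client
    server_ip: str         # IP of the server being addressed
    resource: str          # name of the requested resource, as in HTTP

    def encode(self) -> bytes:
        # A simple newline-delimited wire form, purely for illustration.
        fields = [self.handle, self.client_id, str(self.protocol_version),
                  self.server_ip, self.resource]
        return ("\n".join(fields) + "\n").encode("utf-8")

hs = Handshake("wanderer", "genAesis/4.2", 7, "203.0.113.9", "/plaza")
print(hs.encode())
```

The "genAesis/4.2" identifier and the address are invented placeholders; only the five-field structure comes from the text.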

It permits two avatar 'prototypes' per connection, sent on initial connection with the server: one 'human appearance', which unhacked neural connectors auto-render as the 'offline appearance', and one 'preferred avatar', which can be freely designed. The server can REQUEST either again (if it has expired in its cache, was otherwise lost to it, or for any arbitrary reason), and either can be UPDATEd while connected. The client can refuse to send either prototype in response to any such hail. The server can refuse to render either for any reason, or alter the prototypes as it sees fit for any duration it sees fit.
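The two-slot arrangement and the REQUEST/UPDATE verbs can be sketched from the client's side. The verb names come from the text; the class, the category labels, and the refusal mechanism are assumptions about how a client might behave:

```python
# Hypothetical client-side prototype handling for the two avatar slots.
HUMAN, PREFERRED = "human_appearance", "preferred_avatar"

class Client:
    def __init__(self, human=None, preferred=None):
        # The two prototype slots sent on initial connection.
        self.prototypes = {HUMAN: human, PREFERRED: preferred}

    def on_request(self, category, willing=True):
        """Server hailed us with REQUEST; we may comply or refuse."""
        if not willing or self.prototypes.get(category) is None:
            return None  # refusal: the client sends nothing
        return self.prototypes[category]

    def update(self, category, data):
        """Client pushes an UPDATE for one slot while connected."""
        if category not in self.prototypes:
            raise ValueError("only two prototype categories exist")
        self.prototypes[category] = data

c = Client(human=b"scan-data", preferred=b"designed-avatar")
print(c.on_request(HUMAN))                      # complies
print(c.on_request(PREFERRED, willing=False))   # refuses: None
```

Note that nothing here constrains the server: per the text, it may still refuse to render or alter whatever it receives.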

The data format of the avatar prototype data is not defined by the protocol, but the de-facto established standard format contains meta-data plus quick-access fields for avatar dimensions and quirks (computation of the latter is especially arduous). The format's meta-data is fairly free-for-all, much like the XML format of today, and MMORPG avatar creation kits tend to populate avatars with a lot of extra information about properties (e.g. magic spells and the like) that let the avatar interact non-physically or at long range with the environment, something the protocol does not inherently support. (In plain English: most servers will expect your avatar to be self-contained and to have a hard surface. If your avatar deviates, whether those traits are permitted depends heavily on the experience and willingness of the server.)
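A plausible shape for such a prototype blob, with every key invented for illustration; only the three parts named above (free-form meta-data, quick-access dimensions, precomputed quirks) are from the text:

```python
import json

# Invented example of the de-facto prototype format. The "spells" entry
# mimics the non-physical extras MMORPG creation kits stuff into the
# free-for-all meta-data; the protocol itself knows nothing of them.
avatar = {
    "meta": {
        "creator": "avatar-kit",       # free-form, XML-spirited
        "spells": ["minor_glow"],      # non-physical interaction hint
    },
    "dimensions": {"height_m": 1.78, "reach_m": 0.74},  # quick-access
    "quirks": ["soft_surface"],        # precomputed: deriving these
                                       # from geometry is arduous
}

def server_accepts(proto):
    # Sketch of a conservative server: it wants a self-contained,
    # hard-surfaced avatar and rejects anything quirkier.
    return "soft_surface" not in proto.get("quirks", [])

blob = json.dumps(avatar)
print(server_accepts(json.loads(blob)))  # this avatar deviates
```

A permissive, experienced server might instead inspect the quirks and render them anyway; the acceptance rule above is one guess, not part of any standard.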

Trying to send more than two prototypes - or more than one of the respective category - results in the excess avatar notations being ignored. Most servers discard the later, excess avatar information rather than overwriting what was sent first with what was sent last, but this depends on the server. It is not even technically illegal (per the protocol) for a server to store all avatars and use them as their metadata indicates, so the colloquial rule here sensibly tends to be a simple “If you send glitched data, expect glitches.”
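The common first-wins behaviour can be sketched as a server-side slot filler. This models only the majority case described above; a server that keeps everything, or that lets later sends overwrite earlier ones, would be equally legal:

```python
# Hypothetical server-side slot filling for incoming prototypes.
# Majority behaviour: first arrival per category wins, excess ignored.
VALID = ("human_appearance", "preferred_avatar")

def fill_slots(incoming):
    slots = {}
    for category, data in incoming:
        if category in VALID and category not in slots:
            slots[category] = data  # first of its category is kept
        # excess or unknown notations are silently dropped
    return slots

msgs = [("preferred_avatar", "v1"), ("preferred_avatar", "v2"),
        ("human_appearance", "scan"), ("human_appearance", "scan2")]
print(fill_slots(msgs))
```

"If you send glitched data, expect glitches" means exactly this ambiguity: nothing guarantees which of the duplicates survives.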

Streaming

The data format that streams between client and server is a mix of audiovisual cues and proprioception and balance meta-data, all fed to the client machine in one stream (rather than in parallel). Brief connection issues tend to be patched over locally by client software, provided enough about the world is known from the data received so far. True glitches thus tend to corrupt the whole experience rather than just 'one sense', unless the glitch sits deep in the server's rendering routine, before the senses are merged into the stream (if they are generated separately at all; genAesis's strength is that it doesn't do this).
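The single-stream point can be made concrete with a trivial sketch: each tick, all sense payloads are interleaved into one frame before transmission, so a corrupted frame degrades every sense at once. All names and the frame shape are assumptions:

```python
# Sketch of the merged single-stream idea; one frame per tick carries
# every sense, rather than four parallel per-sense streams.
def merge_tick(visual, audio, proprioception, balance):
    return {"visual": visual, "audio": audio,
            "proprioception": proprioception, "balance": balance}

frame = merge_tick(b"img-data", b"pcm-data",
                   {"arm_l": 0.3}, {"tilt": 0.0})

# Losing this one frame loses all four senses for the tick, which is
# why brief gaps are patched locally from world knowledge instead.
print(sorted(frame))
```

A server that rendered the senses separately and merged late (the case genAesis avoids) could, by contrast, glitch one sense in isolation.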

The most interesting format is probably the visual one, which tends to be a mix of imagery pre-rendered by the server and raw three-dimensional data, in a proportion assessed by the server. If bandwidth permits, it will often be entirely pre-rendered, which is the standard, simply because that gives the server proper control over what is viewed. For sites geared less toward visual importance and more toward content, the other extreme is usually true.
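One way to picture the server's choice is as a heuristic over bandwidth and site type. The threshold and ratios below are entirely invented; the text only establishes that the server decides, that ample bandwidth usually means fully pre-rendered, and that content-first sites sit at the other extreme:

```python
# Invented heuristic for the pre-rendered vs. raw-3D split.
FULL_PRERENDER_MBPS = 100  # assumed threshold, not canonical

def choose_mix(bandwidth_mbps, visual_site=True):
    if not visual_site:
        # Content-first sites: ship raw 3D, let the client render.
        return {"prerendered": 0.0, "model_data": 1.0}
    if bandwidth_mbps >= FULL_PRERENDER_MBPS:
        # Standard case: full pre-render, full server control of view.
        return {"prerendered": 1.0, "model_data": 0.0}
    # In between: degrade gracefully with available bandwidth.
    share = bandwidth_mbps / FULL_PRERENDER_MBPS
    return {"prerendered": share, "model_data": 1.0 - share}

print(choose_mix(150))
print(choose_mix(50))
```

Real servers would presumably weigh far more than bandwidth (scene complexity, trust in the client renderer), but the two poles match the text.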

As for data sent to the server, this tends to come in the form of motor control cues, psychological and emotive data (if requested by the server and not forbidden by the client(!)), and abstract(!) vocalisation. It is expressly up to the server how it wishes to render the resulting action and audio.
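The upstream side can be sketched the same way; the gate on emotive data (server must request it, client must not forbid it) is from the text, while the message shape and all field names are assumptions:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Upstream:
    """Hypothetical client-to-server tick; names invented."""
    motor_cues: dict                    # e.g. {"step": 1.0}
    vocalisation: Optional[str] = None  # abstract, not rendered audio
    emotive: Optional[dict] = None      # doubly-gated, see below

def build_upstream(motor, vocal, emotive, server_requested, client_allows):
    # Emotive/psychological data goes out only when the server asked
    # for it AND the client has not forbidden it.
    gated = emotive if (server_requested and client_allows) else None
    return Upstream(motor_cues=motor, vocalisation=vocal, emotive=gated)

u = build_upstream({"step": 1.0}, "greet", {"mood": "calm"},
                   server_requested=True, client_allows=False)
print(u.emotive)  # None: the client forbade it
```

How "greet" becomes actual audio, and how the motor cues become visible motion, is then entirely the server's business, as the text says.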