XEP-136 and XEP-59 implementation comments

This article discusses issues that arose during the implementation of the XEP-0136 specification, which describes a protocol for server-side instant messaging history, and XEP-59, which provides a way to page through a result set and limit the number of results.

It is essentially a copy of a letter sent to the XMPP discussion mailing list; I’ve put it here so that more people have access to it and can provide their feedback and comments.

For most of the issues listed, solutions are proposed.

While working on mod_archive_odbc (XEP-136 support for ejabberd) and libwsw (a library for XEP-136 support on the client side), I discovered several issues with the XEP-136 standard, which I’d like to present here in the hope that they can still be addressed before XEP-136 goes “gold”.

For the issues that have proposed solutions, support for those solutions has been implemented and verified in mod_archive_odbc and libwsw, so they’re certainly feasible.

Update: the sections “Who changed it?” and “List collections for Bare JID / Domain” are no longer relevant, as the proposals changed during discussion on the XEP standards mailing list. I currently have no time to describe the changes here, but might do so in the future.

Replication

Replication in XEP-136 has at least two flaws which make its usage somewhat problematic.

Duplicate items

The specification states that the client should use a specially prepared RSM “after” element to specify the point to start from.

IMHO this is quite a bad decision on its own, for at least two reasons:

  1. This violates XEP-59, item 2.2: “The requesting entity MUST treat all UIDs as opaque”.
  2. It differs from all other XEP-136 commands, which use “start” / “end” attributes to specify the required range, thus creating an unwanted and confusing “special case”.

As if that were not enough, it creates yet another problem.

Consider several changes made at the same moment, so that the change times of the affected conversations are equal. This happens, for example, as part of a “remove range” request, but it can also happen without one if several messages arrive in different conversations at the same time while auto-archiving is enabled.

Now suppose someone issues a “modified” query and, because of the RSM “limit” clause, the query stops at one of the collections that share the same change time. The next query will either list again all collections with that time that were already sent to the client, or skip all remaining collections with that time, because the server has no way to determine where to resume when it has only the “change time” value in the “after” element.

Both cases are quite bad. The first requires additional filtering on the client side and in some cases may put the client into an infinite loop (if the “limit” size is less than the number of collections with the same change time). The second means that some data will simply be missing, thus destroying synchronization between client and server.
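
The two failure modes can be reproduced with a small simulation. The following Python sketch is illustrative only: the change log, the numeric timestamps and the `page` and `replicate` helpers are made up for this example and are not taken from XEP-136 or mod_archive_odbc.

```python
from typing import List, Tuple

# Hypothetical change log: (collection id, change time). Three collections
# share the change time 100, e.g. after a single "remove range" request.
CHANGES: List[Tuple[str, int]] = [
    ("a", 90), ("b", 100), ("c", 100), ("d", 100), ("e", 110),
]


def page(after: int, limit: int, inclusive: bool) -> List[Tuple[str, int]]:
    """Return up to `limit` changes, resuming from a bare change time."""
    if inclusive:
        hits = [c for c in CHANGES if c[1] >= after]
    else:
        hits = [c for c in CHANGES if c[1] > after]
    return hits[:limit]


def replicate(inclusive: bool, limit: int = 2, max_pages: int = 10) -> List[str]:
    """Page through the log as a client would; return the ids received."""
    seen: List[str] = []
    after = 0
    for _ in range(max_pages):  # cap the loop so the sketch always terminates
        items = page(after, limit, inclusive)
        if not items:
            break
        seen.extend(cid for cid, _ in items)
        after = items[-1][1]  # the only resume point: the last change time
    return seen
```

With a page size of 2, `replicate(inclusive=False)` receives a, b and e, so c and d are lost, while `replicate(inclusive=True)` keeps receiving b and c until the page cap stops it and never reaches d or e.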

Proposal: change section “10. Replication” by removing the references to the “after” and “last” elements and stating that the replication start date should be specified using the “start” attribute of the “modified” command, with an additional note that collections whose change time is exactly equal to the “start” time are NOT included in the result (thus, “start” effectively works as “after”).

So, rephrasing the query from Example 57:

<iq type='get' id='sync1'>
  <modified xmlns='http://www.xmpp.org/extensions/xep-0136.html#ns'
            start='1469-07-21T01:14:47Z'>
    <set xmlns='http://jabber.org/protocol/rsm'>
      <max>50</max>
    </set>
  </modified>
</iq>

Alternatively, it may make sense to use an “after” attribute in the “modified” command to highlight the difference from “start”; I’m not sure which solution is better.

Then the “modified” command would work just like any other command, and RSM would be used consistently to page through results without any ambiguity.
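
A rough Python sketch of how paging could look under this proposal: the “start” value stays fixed and exclusive, and the client only echoes back the opaque UID of the last item it received. The change log, the helper names and the `time@sequence` UID format are assumptions made for the example (the format is merely modelled on the UID values shown in the result example); the client never interprets the UID.

```python
from typing import List, Optional, Tuple

# Hypothetical change log: (collection id, change time).
CHANGES: List[Tuple[str, int]] = [
    ("a", 90), ("b", 100), ("c", 100), ("d", 100), ("e", 110),
]


def modified(start: int, max_items: int,
             after_uid: Optional[str] = None) -> List[Tuple[str, str]]:
    """Server side: changes strictly after `start`, paged by opaque UID."""
    hits = [(f"{t}@{i}", cid) for i, (cid, t) in enumerate(CHANGES) if t > start]
    if after_uid is not None:
        uids = [uid for uid, _ in hits]
        hits = hits[uids.index(after_uid) + 1:]  # resume right after that item
    return hits[:max_items]


def replicate(start: int, max_items: int = 2) -> List[str]:
    """Client side: keep `start` fixed, pass the last UID back unchanged."""
    seen: List[str] = []
    after_uid: Optional[str] = None
    while True:
        items = modified(start, max_items, after_uid)
        if not items:
            return seen
        seen.extend(cid for _, cid in items)
        after_uid = items[-1][0]
```

Because the resume point identifies a single item rather than a timestamp, collections that share a change time are neither repeated nor skipped.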

This change can be made with complete backward compatibility: if the server finds an RSM “after” element that contains a datetime, it uses the old mechanism; if “after” is absent or is not a datetime, it uses the new one.
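
The compatibility check could look roughly like the Python sketch below. The function names and the two mode labels are hypothetical, and the parser only accepts the plain UTC form used in the examples of this article; a real server would need a full XEP-0082 datetime parser.

```python
from datetime import datetime
from typing import Optional


def parse_utc(value: Optional[str]) -> Optional[datetime]:
    """Parse e.g. '1469-07-21T01:14:47Z'; return None for anything else."""
    if value is None:
        return None
    try:
        return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    except ValueError:
        return None


def replication_mode(rsm_after: Optional[str]) -> str:
    """Choose the mechanism from the content of the RSM "after" element."""
    if parse_utc(rsm_after) is not None:
        return "legacy"           # old clients send a datetime in "after"
    return "start-exclusive"      # absent or an opaque UID: use "start"
```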

Who changed it?

Typically, a client performs replication when it has a local cache of collections / messages, in order to synchronize that cache with the server. It therefore makes sense for the client to also use this cache for the collections it uploads itself.

However, implementing this strictly according to XEP-136 means that the client has no way to determine whether the changes received in replication were made by this client or not. It will therefore have to re-fetch the entire collection even if the “changed” item in the replication results was caused by its own upload, thus downloading the same collection it has just uploaded to the server and already holds in its local cache.

Proposal: extend the replication answer with a “by” attribute that specifies the full JID of the entity that made the change. A client that receives replication results can then check whether the change was made by itself, and discard those changes that are already cached locally.

Example:

<iq type='result' to='romeo@montague.net/orchard' id='sync1' >
  <modified xmlns='http://www.xmpp.org/extensions/xep-0136.html#ns'>
    <removed by='romeo@montague.net/pda'
             with='balcony@house.capulet.com'
             start='1469-07-21T03:16:37Z'/>

…..

This change can be made with complete backward compatibility, as it just extends the answer format.
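
On the client side, the filtering could be sketched in Python as follows. The function name and the sample stanza are illustrative assumptions; only the proposed “by” attribute and the namespace come from the text above. Items without a “by” attribute are kept, so the sketch also works against a server that does not implement the extension.

```python
import xml.etree.ElementTree as ET
from typing import List, Optional, Tuple

ARCHIVE_NS = "{http://www.xmpp.org/extensions/xep-0136.html#ns}"

Change = Tuple[str, Optional[str], Optional[str]]  # (kind, with, start)


def foreign_changes(modified_xml: str, own_jid: str) -> List[Change]:
    """Return the changes that were NOT made by `own_jid`."""
    result: List[Change] = []
    for item in ET.fromstring(modified_xml):
        if not item.tag.startswith(ARCHIVE_NS):
            continue  # e.g. the RSM <set/> element
        if item.get("by") == own_jid:
            continue  # our own upload, already in the local cache
        kind = item.tag[len(ARCHIVE_NS):]
        result.append((kind, item.get("with"), item.get("start")))
    return result


# Made-up sample payload for the sketch.
SAMPLE = """<modified xmlns='http://www.xmpp.org/extensions/xep-0136.html#ns'>
  <removed by='romeo@montague.net/pda'
           with='juliet@capulet.com/chamber'
           start='1469-07-21T02:56:15Z'/>
  <changed by='romeo@montague.net/orchard'
           with='balcony@house.capulet.com'
           start='1469-07-21T03:16:37Z'/>
</modified>"""
```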

XEP-59: detecting the change

While implementing caching in the client, I ran into the problem that it may be dangerous to fetch collections when the client is not synchronized with the server: if the client maintains some internal state based on received results, and it receives results after some change was made without knowing about it, its internal state may become corrupted.

Consider the following example: the client builds indices for the collections it fetches by using the RSM “index” attribute, so that it can answer indexed requests locally. However, these local indices are valid only until a change is made on the server side; after that they have to be rebuilt using replication.

If the client now fetches some collections after a change has happened on the server and is unable to discover that, it will corrupt its indices by inserting the fetched collections into the local cache, as the indices may already be shifted. They will then be shifted once more when replication is performed, as the client cannot tell at replication time which collections were fetched before the change and which after it.

Please note that this is just one possible scenario; there may be others. All such scenarios require some way of determining whether fetched results are “valid”, which translates to “were they changed compared to some fixed point in time?”

One possible solution is to perform replication before and after each query to the server, and to discard the results of the query just performed if a change turns out to have taken place. However, this seems to be an unacceptably high overhead, as the client has to perform 3 queries instead of 1.

Proposal: add a “changed” tag to the RSM result which, when present, indicates the datetime of the most recent change to the items affected by the query. Computing this value typically shouldn’t be problematic (it certainly wasn’t for the XEP-136 implementation), and it can be made optional, as is done with “index”, for cases where it is hard to calculate.

Example:

<iq type='result' to='romeo@montague.net/orchard' id='sync1' >
  <modified xmlns='http://www.xmpp.org/extensions/xep-0136.html#ns'>
    <removed by='romeo@montague.net/pda'
             with='juliet@capulet.com/chamber'
             start='1469-07-21T02:56:15Z'/>
...

    <changed by='romeo@montague.net/orchard'
             with='balcony@house.capulet.com'
             start='1469-07-21T03:16:37Z'/>
    <set xmlns="http://jabber.org/protocol/rsm">
      <first index="0" >63362086582@1</first>
      <last>63362092915@51</last>
      <changed>1469-07-21T04:22:39Z</changed>
      <count>1372</count>
    </set>
  </modified>
</iq>
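
A client could use the proposed “changed” value roughly as in the Python sketch below. The function name, the idea of storing a “last synchronized” timestamp and the conservative handling of a missing “changed” element are assumptions of this sketch, not part of the proposal itself.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

RSM_NS = "{http://jabber.org/protocol/rsm}"
FMT = "%Y-%m-%dT%H:%M:%SZ"  # plain UTC form used in the examples


def results_match_cache(rsm_set_xml: str, last_sync: str) -> bool:
    """True if nothing covered by the query changed after `last_sync`."""
    changed = ET.fromstring(rsm_set_xml).findtext(RSM_NS + "changed")
    if changed is None:
        # "changed" is optional; without it the client cannot tell,
        # so it falls back to replicating before trusting the results.
        return False
    return datetime.strptime(changed, FMT) <= datetime.strptime(last_sync, FMT)


# RSM part of the example result above.
SAMPLE_SET = """<set xmlns="http://jabber.org/protocol/rsm">
  <first index="0">63362086582@1</first>
  <last>63362092915@51</last>
  <changed>1469-07-21T04:22:39Z</changed>
  <count>1372</count>
</set>"""
```

If the function returns False, the client would run replication first and only then merge the fetched collections into its indices, which costs extra queries only when something actually changed.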

Inconsistencies or omissions

Start attribute

Usage of the “start” attribute seems to be inconsistent:

  • For “8.1 Retrieving a List of Collections” it is “If only ‘start’ is specified then all collections on or after that date should be returned.”
  • For “8.3 Removing a Collection” it is “If the end date is in the future then then all collections after the start date are removed.”

I assume it’s just a typo and for 8.3 it should be “on or after” instead of “after”, no?

Remove JID from prefs

I’m not sure whether this is a problem, but there seems to be no way to remove a JID from user prefs once it is there. I’m not really comfortable with this: even if some JIDs are removed from your roster and all their collections are removed as well, you still have them in prefs with no possibility of removing them, and this list will grow over time.

Wouldn’t it make sense to specify that uploading an item with all tags besides the JID being empty removes this JID from prefs, in the same way as is done for links and extra info in chats?

JIDs prefs: ambiguity

Possibly related to the previous item: what should happen if, during a prefs upload, some attributes for a JID are not specified although they were present earlier? Should they be reset to their default values, or remain as they were before the update?

Taking the previous item into account, I see two possibilities:

  1. All missing attributes remain as they are, unless none are specified, in which case the request is treated as a removal request.
  2. All missing attributes are reset to their default values, unless none are specified, in which case the request is treated as a removal request.

The second option seems more logical to me, as the removal behavior then follows almost automatically from the general case.
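
The second option could be sketched in Python as follows. The function name and the default values are placeholders invented for the example; which defaults a real server applies is an assumption here, not something taken from XEP-136.

```python
from typing import Dict, Optional

Prefs = Dict[str, Dict[str, Optional[str]]]

# Placeholder defaults, chosen only for this sketch.
DEFAULTS: Dict[str, Optional[str]] = {"save": "body", "otr": "concede", "expire": None}


def upload_item(prefs: Prefs, jid: str, attrs: Dict[str, Optional[str]]) -> None:
    """Apply one uploaded prefs item according to the second option."""
    if not attrs:
        prefs.pop(jid, None)                 # nothing specified: removal request
    else:
        prefs[jid] = {**DEFAULTS, **attrs}   # missing attributes fall back to defaults
```

For example, uploading `{"save": "false"}` for a JID that previously had `otr` set resets `otr` to its default, and uploading an empty item afterwards removes the JID from prefs altogether.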

Resource modification when auto archiving

When auto archiving, the initial message may not be enough to determine the full JID of the recipient: if the conversation is initiated by the client whose server performs auto archiving, and the client does not know which resource it should use, it will send the message to the bare JID, thus starting auto archiving into a collection with the bare JID.

However, once a reply is received, the full JID is known. It could therefore make sense to adjust the initial collection by changing its JID to the full JID. Otherwise we will either start a new collection after the reply is received, or continue recording into the bare JID collection, thus effectively eliminating the use of resources in the “with” attribute altogether.

Of course, one possibility would be to simply drop all resources from JIDs and store only bare JIDs, but that seems too limiting and inconvenient.

The proposal here is to specify the algorithm an implementation should use to track conversations, making a best effort to determine and correct JIDs when additional information becomes available.

In mod_archive_odbc I’ve implemented a tracking algorithm, but it’s possible I’ve missed some points because of my limited knowledge of XMPP standards or lack of experience with XMPP-related development. I provide a description of the algorithm in the appendix - please feel free to comment on it or take it as the basis for inclusion in XEP-136, if it appears to be useful.

Various small notes

Duplicate messages times

In “5.3 Uploading Messages to a Collection” it’s specified that “If the collection already exists then the server MUST append the messages to the existing collection.” However, it does not say what should be done if the time of some of the messages is equal to the time of messages that already exist in the collection.

I assume it follows from the “append the messages” clause that duplicate entries should be created, but it would be good to state this explicitly to avoid ambiguity.

Malformed XML in examples

“Example 21. Private chat linked to later groupchat” and “Example 24. Private chat with attributes form” contain malformed XML: the first message in each chat opens as “to” but closes as “from”.

List collections for Bare JID / Domain

There seems to be no way to list collections solely for a service JID, as according to XEP-136 such a request is treated as a domain JID request.

For example, when trying to list all collections for icq.example.com you will instead get all collections of all users at icq.example.com - even if you wanted to receive collections ONLY for icq.example.com.

I do not think this is a major problem, as the results can be filtered on the client side; the only drawback is a large amount of extra traffic. It can probably be left as it is, but a note on the subject in the specification would be nice.

File format

In my experience, limiting one file to the conversation with just one JID is too restrictive - in many cases dealing with a single file for all JIDs is much more convenient than dealing with a bunch of files.

On the other hand, I can imagine cases where it’s better to split the data into small files.

Therefore, the restriction could probably just be removed by allowing a “with” attribute in the “chat” items stored in the file and making the “with” attribute of the “archive” tag optional. This doesn’t seem like a big change, but it would certainly make the file format more usable in cases where one big file is preferred, such as backup or import / export.

Appendix: conversations tracking

Here is the approach used in mod_archive_odbc.

It’s assumed that information about all active collections being recorded is stored in a dictionary.

The dictionary has two levels: the first-level key is the bare JID, the second-level key is the thread. If the thread is not present, {no_thread, Resource} is used instead.

The algorithm for choosing the collection to use when a message is received is as follows:

  1. If a thread is specified in the message, just use the keys of both levels normally, reusing the existing collection if there’s a match or creating a new one if no matching collection is found.
  2. If no first-level key exists for this JID: create a new collection and keys at both levels from the available information; the second-level key will be {no_thread, Resource}, with Resource being either empty or non-empty.
  3. Otherwise use the first-level key to get access to the second-level keys, then:

    • If the resource IS specified: search the second-level keys for a matching resource:
      • if found - just use the corresponding collection.
      • if not, search for a second-level key with an empty resource. If found, use its collection and rewrite the empty resource of both the key and the collection to the new one. If not - create a new collection + key.
    • If the resource IS NOT specified: use the most recent second-level key, or create a new collection if none exists.
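
For readers who prefer code, here is a minimal Python sketch of the steps above. It is not the mod_archive_odbc code (which is written for ejabberd); the class and method names are invented, a logical counter stands in for “most recent”, and treating “most recent second-level key” as the most recently used key of any kind, thread keys included, is one possible reading of the description.

```python
from dataclasses import dataclass
from itertools import count
from typing import Dict, Hashable, Optional


@dataclass
class Collection:
    with_jid: str        # JID stored in the collection's "with" attribute
    last_used: int = 0   # logical clock value of the latest message


class Tracker:
    def __init__(self) -> None:
        # bare JID -> (thread | ("no_thread", resource)) -> collection
        self.active: Dict[str, Dict[Hashable, Collection]] = {}
        self._clock = count(1)

    def collection_for(self, bare: str, resource: str = "",
                       thread: Optional[str] = None) -> Collection:
        full = f"{bare}/{resource}" if resource else bare
        known = bare in self.active
        second = self.active.setdefault(bare, {})
        if thread is not None:                       # step 1: thread given
            coll = second.get(thread)
            if coll is None:
                coll = second[thread] = Collection(full)
        elif not known:                              # step 2: new bare JID
            coll = second[("no_thread", resource)] = Collection(full)
        elif resource:                               # step 3, resource given
            coll = second.get(("no_thread", resource))
            if coll is None:
                # Reuse a collection started for the bare JID, if any,
                # and rewrite its empty resource to the new one.
                coll = second.pop(("no_thread", ""), None)
                if coll is None:
                    coll = Collection(full)
                else:
                    coll.with_jid = full
                second[("no_thread", resource)] = coll
        elif second:                                 # step 3, no resource
            coll = max(second.values(), key=lambda c: c.last_used)
        else:
            coll = second[("no_thread", "")] = Collection(full)
        coll.last_used = next(self._clock)
        return coll
```

A typical sequence: a message sent to the bare JID `juliet@capulet.com` starts a collection; when the reply arrives from resource `chamber`, the same collection is returned and its “with” JID becomes `juliet@capulet.com/chamber`; a later message from a different resource starts a new collection.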