February 06, 2007

Authenticated Distributed Search (OpenSearch, OpenID)

I've been working on Drupal distributed search for a while now, releasing a beta of the OpenSearch Aggregator as well as a release of the OpenSearch feed module. The aggregator has a friendly UI for setting up any number of sources and the feed contains relevance information from the Drupal search system. Results are also cached on the aggregator for performance reasons.

More information about these modules can be found in my earlier blog posts about OpenSearch.

The ultimate goal however is to set up distributed search for a Bryght client between a network of secure Drupal sites. The searches for logged-in users should include content that is visible to them across all the different Drupal sites.

OpenID is the obvious choice as an identity mechanism for the users, but it does not immediately help us with the authentication. After some research, I've written a document that details possible approaches and solutions. Because we're talking about frontier technology here, it seemed best to repost it publicly to solicit feedback from anyone interested. I could certainly use some extra opinions on this, as it is all very new to me.

Essential requirements

The most basic requirement can be summed up as follows:

We have a search aggregator (master) and a group of sites that return results (slaves). Whenever a search is performed by a user logged in on the master, the master contacts all the slaves and passes along the identity of the user making the request. This information needs to be unambiguous and secure. The slaves then return results that respect the user's access permissions to the master, which aggregates and caches the results for browsing.

All communication to the slaves is done by the master. For practical and performance reasons, none of this can be implemented on the browser/client-side (other than maybe a trivial form/redirect to log-in once).
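The fan-out described above can be sketched roughly as follows. This is only an illustration in Python, not the actual aggregator module: the `fetch` callable, the result dictionaries and the `relevance` key are all assumptions made for the example.

```python
def aggregate_search(user, keywords, slaves, fetch):
    """Query every slave on behalf of `user` and merge the results.

    `fetch(slave, keywords, user)` is a hypothetical callable that returns
    a list of dicts, each with at least a numeric "relevance" key.
    """
    merged = []
    for slave in slaves:
        try:
            merged.extend(fetch(slave, keywords, user))
        except OSError:
            # Assumption: an unreachable slave is skipped rather than
            # failing the whole search.
            continue
    # Highest relevance first, as reported by each slave's feed.
    merged.sort(key=lambda result: result["relevance"], reverse=True)
    return merged
```

In a real setup `fetch` would issue the HTTP request to the slave's OpenSearch feed, with the user's identity attached by one of the methods below.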

1. Naive implementation

A very simple implementation could use each user's e-mail address as the global identifier for identity, and map it to the local Drupal uid on each site. The request could be signed using a simple keyed hash (HMAC) with a shared secret key that is set on all the participating sites (thus encoding trust). A query from the master to a slave might look like:

GET http://example.com/opensearch/node/keywords?user=name@example.com&hmac=123456789abcdef
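As a minimal sketch of how such a signed query could be produced and checked, here is a Python illustration. The choice of SHA-256, the exact string that gets signed and the function names are assumptions for the example; the scheme above only calls for a keyed hash over the request with a shared secret.

```python
import hashlib
import hmac
from urllib.parse import parse_qsl, urlencode, urlsplit

SHARED_SECRET = b"change-me"  # hypothetical key, set by hand on every site


def _message(base_url, user):
    # Assumption: the signature covers the URL path plus the user parameter.
    return f"{base_url}?{urlencode({'user': user})}".encode("utf-8")


def sign_query(base_url, user, secret=SHARED_SECRET):
    """Master side: append an HMAC of the request to the query string."""
    digest = hmac.new(secret, _message(base_url, user), hashlib.sha256).hexdigest()
    return f"{base_url}?{urlencode({'user': user, 'hmac': digest})}"


def verify_query(url, secret=SHARED_SECRET):
    """Slave side: recompute the HMAC and compare in constant time."""
    parts = urlsplit(url)
    params = dict(parse_qsl(parts.query))
    received = params.get("hmac", "")
    base_url = f"{parts.scheme}://{parts.netloc}{parts.path}"
    expected = hmac.new(
        secret, _message(base_url, params.get("user", "")), hashlib.sha256
    ).hexdigest()
    return hmac.compare_digest(received, expected)
```

Note that this sketch has no timestamp or nonce, so a captured URL could be replayed. A real implementation would probably want to sign one of those as well.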

There are several problems with this:

The user needs to register manually on each of the participating sites using the same e-mail address.
A shared secret needs to be manually set on each of the participating sites.
All participating sites must be part of a completely trusted, closed network.

2. Improvements: OpenID log-in

The first low-hanging fruit is to use distributed OpenID for the end users. This avoids explicit registration on each of the participating sites, as the Drupal OpenID module does this for us when you log in for the first time. It also gives us a globally unique identifier (the user's OpenID), which is verified (by DNS) and which we can cross-reference the local numeric uids with.
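To make the cross-referencing concrete, here is a small Python sketch of a per-site account table keyed by OpenID. The class and method names are hypothetical, and it assumes the OpenID has already been verified by the normal log-in process before `uid_for` is called.

```python
class LocalAccounts:
    """Per-site mapping from a verified OpenID to a local numeric uid."""

    def __init__(self):
        self._uid_by_openid = {}
        self._next_uid = 1

    def uid_for(self, openid):
        """Return the local uid, creating the account on first log-in."""
        uid = self._uid_by_openid.get(openid)
        if uid is None:
            uid = self._next_uid
            self._next_uid += 1
            self._uid_by_openid[openid] = uid
        return uid
```

The same OpenID can end up with a different uid on every site, which is why the OpenID itself, and not the uid, has to travel between master and slaves.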

However, OpenID is distributed in nature, and does not immediately help us when we want to have the master prove to the slave that it is allowed to fetch content for a certain user. The only entity that knows and holds the keys/cookies to which sites a user is logged in to is the user's browser (User-Agent), while the home site knows only which sites are allowed and have been logged in to in the past.

The trust between master and slave would still have to be implemented through some other means, for example again using a shared secret key and HMAC verification:

GET http://example.com/opensearch/node/keywords?user=user.openid.com&hmac=123456789abcdef

Instead of authenticating every request, we could also implement an authenticated 'back door' through which the master can force log-ins on the slaves without doing actual OpenID authentication with the home server. The result would be a session cookie for each slave that can be used normally by the master:

POST http://example.com/trust/?user=user.openid.com&hmac=123456789abcdef
=> Cookie is set

GET http://example.com/opensearch/node/keywords
Cookie: PHPSESSID=123456789abcdef123456789abcdef

This backdoor login would be provided by another module, and would have to rely on e.g. a DNS or IP-based whitelist of allowed hosts, optionally with SSL to ensure confidentiality.
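On the slave side, the back door might look something like the following Python sketch. Everything here is an assumption for illustration: the whitelist contents, the in-memory session table, the use of SHA-256, and the function names. A real module would hook into Drupal's own session handling instead.

```python
import hashlib
import hmac
import secrets

ALLOWED_MASTERS = {"192.0.2.10"}  # hypothetical IP whitelist of trusted masters
SHARED_SECRET = b"change-me"      # hypothetical shared key
SESSIONS = {}                     # session id -> OpenID (stand-in for real sessions)


def sign_user(user, secret=SHARED_SECRET):
    """Master side: HMAC over the OpenID being logged in."""
    return hmac.new(secret, user.encode("utf-8"), hashlib.sha256).hexdigest()


def trust_login(remote_ip, user, signature):
    """Slave side: return a new session id, or None if the request is refused."""
    if remote_ip not in ALLOWED_MASTERS:
        return None  # caller is not a whitelisted master
    if not hmac.compare_digest(signature, sign_user(user)):
        return None  # signature does not match the shared secret
    session_id = secrets.token_hex(16)
    SESSIONS[session_id] = user
    return session_id
```

The returned session id plays the role of the PHPSESSID cookie in the example above.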

Access control would be respected by Drupal using the normal session mechanism and the OpenSearch client module would not need to be altered. The cookies can be stored along with the local user account on the master, and aggregated OpenSearch data would be cached on the master per user.
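Caching per user matters because two users may see different results for the same keywords. A rough Python sketch of such a cache is below; the class name, the time-to-live and the injectable clock are assumptions for the example and not how the aggregator module actually stores its data.

```python
import time


class PerUserCache:
    """Cache aggregated results separately for each (user, keywords) pair."""

    def __init__(self, ttl_seconds=300.0, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # (user, keywords) -> (expiry time, results)

    def put(self, user, keywords, results):
        self._entries[(user, keywords)] = (self._clock() + self._ttl, results)

    def get(self, user, keywords):
        """Return cached results, or None if missing or expired."""
        entry = self._entries.get((user, keywords))
        if entry is None:
            return None
        expires_at, results = entry
        if self._clock() >= expires_at:
            del self._entries[(user, keywords)]
            return None
        return results
```

Keying on the user as well as the keywords means one user's restricted results are never served to another user from the cache.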

3. Search master as home server

A possible solution is to restrict the OpenID home server to be the search master.

This would allow the master to log in to the slaves directly, as it can produce all the necessary cryptographic tokens without needing user action. No modifications are needed to the slaves. Once the master has logged in to each site, it holds a valid session cookie for each (as in (2)).

4. Multi-login extension to OpenID

Another solution would require an extension to OpenID, both to the client module for Drupal and the code that runs the home server. Still, it would allow the home server to be any other server (or even a set of servers), provided the home server supports the custom extension.

When the user logs in to the search master, the module would know that it will need access to each of the search slaves. When it sends a request to the home server, it would not only ask for a log-in to itself, but also for each of the slaves. The user would get a single screen on the home site to log-in to (with correct notification that this is a multi-login), and is returned to the search master.

The search master logs in the user locally, and uses the cryptographic tokens for each of the slaves to log into them. The slaves can verify the log-in tokens using direct communication with the OpenID provider. Like in (3), normal Drupal cookies are returned to the master and used to perform the searches.
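To show the shape of this flow, here is a toy Python model of the home server's part. It is purely hypothetical: no such extension exists in the OpenID specs, the class and method names are made up, and the random single-use tokens checked by calling back to the home server merely stand in for whatever cryptographic tokens a real spec would define.

```python
import secrets


class HomeServer:
    """Toy home server that issues one log-in token per requested site."""

    def __init__(self):
        self._issued = {}  # token -> (openid, site)

    def multi_login(self, openid, sites):
        """Called after the user approves the multi-login screen."""
        tokens = {}
        for site in sites:
            token = secrets.token_urlsafe(16)
            self._issued[token] = (openid, site)
            tokens[site] = token
        return tokens

    def check(self, token, openid, site):
        """Called directly by a slave; each token is valid only once."""
        return self._issued.pop(token, None) == (openid, site)
```

The master would receive the `tokens` dictionary when the user returns from the home site, and present each token to the matching slave. The slave calls `check` and, on success, hands back a normal session cookie.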

5. Other ideas

Of the above methods, only (2) is really immediately practical. Using OpenID at the base does not help if you still need proprietary extensions, or if you take away the ability to choose a home site. And if we do want to do (4) properly, then we need to develop an actual spec that respects the principles and security of OpenID. Not an easy job.

The downside of (2) is that there is no actual proof that the log-in took place, and that we rely on the shared key to ensure trust.

However, because the whole Identity 2.0 space is still developing, I think it would be silly to try and build something elaborate that implements some sort of utopian federation/whatchamacallit model. It would be an insane amount of work and would not be future proof or even useful today as very few real-world services would support it. I think we just have to wait here to see what develops, once OpenID gains some more widespread use and people get more comfortable with these concepts (which should happen in 2007).

5.1. SAML

SAML has been suggested as a standardized way of passing along security assertions. However, SAML is really a competitor to OpenID and started as a way of doing single sign-on between trusted sites.

The main difference is that OpenID is tied to a particular HTTP exchange pattern, while SAML better separates the message from the delivery method. Still, SAML is based on exactly the same principles, so assertions have to be generated and signed by the home server. So, they are still useless if we want the master to securely prove the log-in of a certain OpenID. Of course, we could encapsulate the message from master to slave in an unsigned SAML message, but that would really defeat the point of using SAML in the first place.

SAML itself doesn't do trust either. The slaves would still have to have a whitelist that includes only the master and which would be verified again by DNS/IP or SSL.

5.2. OpenID Proof-of-login token

The big functionality hole in OpenID is that the only one who can verify the cryptographic tokens for a log-in is the intended recipient (the relying party). There is no way in the specs to 'forward' the assertion by the home server that the user logged in to the relying party in a way that makes the information verifiable for everyone.

An extension to OpenID for this would be good. It's essentially a variant on the multi-login idea, where, instead of asking the user to log-in to the cloud of sites, the user only logs in to the master, and the master sends proof of this log-in to the slaves:

POST http://example.com/trust/?user=user.openid.com&....crypto-here.... => Cookie is set

The slaves would again have a whitelist based on IP/DNS or SSL. The main difference with (4) is that here, trust is again set by the administrator rather than explicitly given by the user through OpenID.

Conclusion

(4) sounds like the cleanest solution, as it does not rely on explicit whitelisting for trust. The user simply has to check that the master is not asking for a log-in to an external site. However, it would require an extension to the OpenID log-in process on the home server, including the UI (as the extra log-ins have to be communicated somehow), so it is unlikely that it would be implemented easily.

(5.2) sounds better in this light, because only changes under the hood need to be made. The home server would simply send along extra cryptographic tokens that can be forwarded to other parties. Depending on security issues, these tokens could be requested in general (by amount) or for specific recipients (for each specific slave).

Both solutions require serious crypto and spec work to do properly.

Tags: Bryght, Drupal, OpenID, OpenSearch, SAML

Hackery, Math & Design

Steven Wittens