Wednesday, March 29, 2017

Website Authentication through SAML

Single Sign On (SSO) is a way to separate the authentication mechanism from the rest of the service, such as a webpage or mobile app. The first benefit of using SSO is that the service does not need to implement its own authentication logic. The second benefit is that the same authentication mechanism can be used across multiple services. The third benefit is that a service can support multiple authentication options. Single Sign On does not mean that multiple services share users. A user of one service does not have to be able to access all services.

SAML is a Single Sign On solution. Most webpages about SAML describe it at a high, very abstract level because the specification allows some flexibility to the implementation. But I've only seen SAML implemented one way and I thought it would be useful to explicitly describe that scenario: Someone using SAML to log into a website.

Important SAML Vocabulary:

User: In our case, the browser. More generically, whatever is trying to access the SP.
SP (Service Provider): In our case, the protected web page the User wants to access. More generically, whatever service the User is trying to access. The SP relies on the IdP for authentication.
IdP (Identity Provider): In our case, the webapp that authenticates users and creates SAML Assertions. More generically, whatever is responsible for identification.
SAML Assertion: A message created by the IdP and used by the SP. This message identifies a User who is authenticated.

How SAML Works:

Before an actual request comes in, the IdP and SP are configured to know about each other, to know each other's SAML-relevant urls, and to trust each other (typically by exchanging metadata that includes each side's signing certificate). The IdP and SP will never directly communicate with each other (unless you're using Single Logout).

The complete SAML flow looks like this:

  1. The User requests a protected resource (say, the index page) from the SP. The SP sees the User has no session in the SP (by checking a header or cookie in the HTTP Request). So the SP redirects the User to the IdP url that is used for generating SAML Assertions.
  2. The User requests a SAML Assertion for the SP from the IdP. The IdP has no session for the user, so redirects the User to a url (either local or remote) where they must authenticate.
  3. The User requests the login page and then fills out the login form.
  4. The User posts the login information and is authenticated to the IdP. The user is redirected back to the IdP url to get a SAML Assertion.
  5. The User requests a SAML Assertion for the SP from the IdP. The IdP now has a session for the User, so it creates a signed (and optionally encrypted) SAML Assertion and returns it to the User. The IdP redirects the User to the SP url that is used for validating SAML Assertions.
  6. The User posts the SAML Assertion to the SP url. The SP validates the SAML Assertion. Using information included in the SAML Assertion (for example, an email or special uid), the SP can match the SAML Assertion to an internal user. The SP will create a session and redirect the User to the original resource they requested.
  7. The User requests a protected resource from the SP. The SP sees the User has a valid session and serves the resource.
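
The trust between IdP and SP in steps 5 and 6 can be sketched with a plain digital signature. This is only an illustration: real SAML Assertions are XML documents protected with XML Signature, and the class and method names below (SamlTrustSketch, sign, validate) are made up for this example. The point it tries to show is that the SP only needs the IdP's public key, configured ahead of time, to validate what the User carries over.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.PrivateKey;
import java.security.PublicKey;
import java.security.Signature;

// Hypothetical, simplified stand-in for a SAML Assertion: a subject string plus a signature.
class SamlTrustSketch {

    // Setup: the IdP owns a key pair; the SP is configured with the public half.
    static KeyPair newIdpKeys() {
        try {
            KeyPairGenerator generator = KeyPairGenerator.getInstance("RSA");
            generator.initialize(2048);
            return generator.generateKeyPair();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Step 5 (IdP side): sign the identifier of the authenticated User.
    static byte[] sign(PrivateKey idpPrivateKey, String subject) {
        try {
            Signature signer = Signature.getInstance("SHA256withRSA");
            signer.initSign(idpPrivateKey);
            signer.update(subject.getBytes(StandardCharsets.UTF_8));
            return signer.sign();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    // Step 6 (SP side): accept the subject only if the signature matches the IdP's public key.
    static boolean validate(PublicKey idpPublicKey, String subject, byte[] signature) {
        try {
            Signature verifier = Signature.getInstance("SHA256withRSA");
            verifier.initVerify(idpPublicKey);
            verifier.update(subject.getBytes(StandardCharsets.UTF_8));
            return verifier.verify(signature);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        KeyPair idpKeys = newIdpKeys();
        byte[] signature = sign(idpKeys.getPrivate(), "alice@example.com");
        System.out.println("untouched assertion accepted: "
                + validate(idpKeys.getPublic(), "alice@example.com", signature));
        System.out.println("altered assertion accepted: "
                + validate(idpKeys.getPublic(), "mallory@example.com", signature));
    }
}
```

Notice that the SP never calls the IdP here. Everything it needs arrives with the User's request or was configured beforehand.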

The Confusing Bits

Does the IdP ever contact the SP for Authentication?

No. At least not that I've ever seen. The SAML Assertion is transported by the User. The SAML Assertion is signed by the IdP (and can also be encrypted), so the SP can detect any tampering and there's little need to be concerned about the Assertion being misused. Note that if the IdP is being used for Single Logout, then the IdP will need to be able to contact the SP.

IdP Initiated vs. SP Initiated?

You see these terms when you're looking into SAML. It seems confusing, but all it really means is whether the User goes to the SP first or the IdP first. So, following the flow I listed, it decides whether you start at Step 1 or Step 2. There's no other difference. It doesn't mean that SP communicates with the IdP. SP Initiated is almost certainly what you need to do and gives you more flexibility in what resource the User gets at the end.

Logout?

Logout is a special case that requires extra decisions. If a user logs out or is removed from an SP, should they be logged out of all SPs? Probably not. If a user's session or the user themselves is removed from the IdP, should they immediately be logged out of all SPs? This is up to the security requirements. If the User is not logged out immediately, the User will remain authenticated to the SP until the User needs a new session. This could be a long time. If the User is logged out immediately, then the IdP will need a way to tell the SPs whenever a SAML Assertion is invalidated. This functionality probably requires the IdP to be able to contact the SPs. There's no single right way of handling logout; it all depends on how the security of the system needs to work.

So how do I implement this?

OAuth is much more popular nowadays, but in case you need to support SAML for whatever reason, implementation is fairly straightforward. You generally write your own authentication piece, configure an IdP to know about you, and then point to the IdP. The IdP I'm familiar with is OpenAM, which was once upon a time called OpenSSO.

Tuesday, March 28, 2017

SSL Example

How does SSL work? This is something that every developer should understand, even if they never use the information. It's quickest to explain what happens when I use a service like Mochimarks, my bookmarking app that supports HTTPS.

Before we start, note that Mochimarks has a private key and a public key. Our fundamental assumption (backed up by proofs and research and studies and so on) is that when you encrypt a message using the public key, the message can realistically only be decrypted with the private key. This kind of encryption is called Asymmetric Encryption. The alternative is Symmetric Encryption, where a single key is used by both sides for encryption and decryption.


Secondly, note that there exists an entity called the Certificate Authority (CA), which exists to manage certificates. Each browser and OS comes with a list of Certificate Authorities and corresponding public keys that it knows it can trust.


Now let's step through the flow when I try to access Mochimarks.

  1. My browser tries to connect to the Mochimarks server by sending an http request.
  2. The Mochimarks server replies with a redirect that tells my browser to use the https url instead.
  3. My browser sends an https request.
  4. The Mochimarks server replies with a certificate. This certificate is digitally signed by a Certificate Authority and includes the Mochimarks server's public key.
  5. My browser checks the https certificate and agrees to use https to connect. The least intuitive step here is how my browser decides it can trust the certificate. First, my browser determines the CA that signed the certificate and checks that CA against its built-in list of trusted CAs. It then uses the CA's public key from that list to verify the digital signature on the certificate. Only the CA's private key could have produced a signature that verifies, so a valid signature shows that the CA vouches for the Mochimarks public key. The browser does not need to contact the CA to do this. The browser also checks that the certificate was issued for the Mochimarks domain and has not expired.
  6. My browser chooses a random new symmetric key K to use for its connection to Mochimarks. It encrypts K under Mochimarks's public key. A third party cannot decrypt K without the private key, which only Mochimarks has. My browser sends the encrypted K to Mochimarks.
  7. Mochimarks decrypts K using its private key. Now both my browser and the Mochimarks server know K, but no one else does.
  8. Anytime my browser wants to send something to Mochimarks, it encrypts it under K; the Mochimarks server decrypts the message upon receipt. Anytime the Mochimarks server wants to send something to my browser, it also encrypts it under K, which my browser decrypts upon receipt. So at this point, we're using symmetric encryption.

Why bother with Symmetric Encryption at all when we could use Asymmetric Encryption for everything? Asymmetric Encryption is much slower (more computationally expensive, more data, more steps), so while it's fine to do it once per session, it would cause problems if it were used for every request.
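
Steps 6 through 8 can be sketched with the standard Java crypto classes. This is a toy that runs both sides in one method, not how a TLS library is actually written: the class name HybridEncryptionSketch, the key sizes, and the choice of RSA-OAEP and AES-GCM are assumptions made for the example.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.KeyPair;
import java.security.KeyPairGenerator;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

class HybridEncryptionSketch {
    private static final String RSA_MODE = "RSA/ECB/OAEPWithSHA-256AndMGF1Padding";
    private static final String AES_MODE = "AES/GCM/NoPadding";

    // Sends one message from "browser" to "server" and returns what the server decrypted.
    static String roundTrip(String message) {
        try {
            // The server's long-lived asymmetric key pair.
            KeyPairGenerator rsa = KeyPairGenerator.getInstance("RSA");
            rsa.initialize(2048);
            KeyPair serverKeys = rsa.generateKeyPair();

            // Step 6: the browser picks a random symmetric key K ...
            KeyGenerator aes = KeyGenerator.getInstance("AES");
            aes.init(128);
            SecretKey browserK = aes.generateKey();

            // ... and encrypts K under the server's public key.
            Cipher wrap = Cipher.getInstance(RSA_MODE);
            wrap.init(Cipher.ENCRYPT_MODE, serverKeys.getPublic());
            byte[] encryptedK = wrap.doFinal(browserK.getEncoded());

            // Step 7: the server recovers K with its private key.
            Cipher unwrap = Cipher.getInstance(RSA_MODE);
            unwrap.init(Cipher.DECRYPT_MODE, serverKeys.getPrivate());
            SecretKey serverK = new SecretKeySpec(unwrap.doFinal(encryptedK), "AES");

            // Step 8: the browser encrypts traffic under K (a fresh nonce per message).
            byte[] nonce = new byte[12];
            new SecureRandom().nextBytes(nonce);
            Cipher encrypt = Cipher.getInstance(AES_MODE);
            encrypt.init(Cipher.ENCRYPT_MODE, browserK, new GCMParameterSpec(128, nonce));
            byte[] ciphertext = encrypt.doFinal(message.getBytes(StandardCharsets.UTF_8));

            // The server decrypts with its own copy of K.
            Cipher decrypt = Cipher.getInstance(AES_MODE);
            decrypt.init(Cipher.DECRYPT_MODE, serverK, new GCMParameterSpec(128, nonce));
            return new String(decrypt.doFinal(ciphertext), StandardCharsets.UTF_8);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("GET /bookmarks"));
    }
}
```

The slow RSA operation happens once, to move K. Every message after that only pays for the much cheaper AES operation.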

Monday, March 27, 2017

A Record / CNAME

An A record (an address record, sometimes written "A-Name") maps a domain name directly to an IP Address. A CNAME is an alias record that points to another domain name, which is then resolved in turn. Two domains can share an IP Address either way, but with a CNAME the IP Address is stored in only one place. If there's ever a problem with the IP chosen, the IP Address can be changed once at the target and every alias follows, subject to DNS caching.

If using A records, my.Foo.com would point to 1.1.1.1 and my.Bar.com would point to 1.1.1.1 to share IP Addresses. If you want my.Bar.com to point to a new IP Address, it can be a huge pain. With CNAME, my.Foo.com could point to subdomain1.myregistrar.com, which points to 1.1.1.1. my.Bar.com can point to subdomain2.myregistrar.com, which points to 1.1.1.1. When I want my.Bar.com to go somewhere different, I have control of subdomain2.myregistrar.com so that's an easy change.

The downside here is that according to the DNS specifications, a Root Domain (like foo.com or bar.com) can't have a CNAME record, because a CNAME can't coexist with the other records (such as SOA and NS) that the root of a zone must have.
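
To make the indirection concrete, here is a toy in-memory resolver. It is not a real DNS client: the class DnsAliasSketch and its methods are invented for this example, names are written in lowercase because the map lookups are case-sensitive (real DNS names are not), and 2.2.2.2 is just a placeholder for "the new IP Address".

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical toy resolver: two maps standing in for a zone's address and alias records.
class DnsAliasSketch {
    private final Map<String, String> addressRecords = new HashMap<>(); // name -> IP Address
    private final Map<String, String> aliasRecords = new HashMap<>();   // alias -> target name

    void putAddress(String name, String ip) {
        addressRecords.put(name, ip);
    }

    void putAlias(String alias, String target) {
        aliasRecords.put(alias, target);
    }

    // Follows aliases until an address record is found; returns null if nothing resolves.
    String resolve(String name) {
        String current = name;
        for (int hops = 0; hops < 8; hops++) { // hop limit guards against alias loops
            String target = aliasRecords.get(current);
            if (target == null) {
                return addressRecords.get(current);
            }
            current = target;
        }
        return null;
    }

    public static void main(String[] args) {
        DnsAliasSketch dns = new DnsAliasSketch();
        dns.putAlias("my.foo.com", "subdomain1.myregistrar.com");
        dns.putAlias("my.bar.com", "subdomain2.myregistrar.com");
        dns.putAddress("subdomain1.myregistrar.com", "1.1.1.1");
        dns.putAddress("subdomain2.myregistrar.com", "1.1.1.1");

        // Moving my.bar.com only requires changing the record behind its alias.
        dns.putAddress("subdomain2.myregistrar.com", "2.2.2.2");
        System.out.println("my.foo.com -> " + dns.resolve("my.foo.com"));
        System.out.println("my.bar.com -> " + dns.resolve("my.bar.com"));
    }
}
```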

Friday, March 24, 2017

Runnable vs. Thread in Java

Runnable (an interface you implement) vs. Thread (a class you extend): Basically, always use Runnable. It's less tied to a concurrency model (meaning it works with Threads and Futures) and allows reuse of threads. Executors work with Threads and Futures... well, technically only with Runnables, but Thread and FutureTask both implement Runnable. A big fan of futures suggested 'Future<Object> f = new FutureTask<Object>(runnable, null)', but I think that's pretty ugly.
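
As a small illustration of that flexibility, the sketch below runs one Runnable three ways: on a dedicated Thread, on an executor, and wrapped in a FutureTask. The class name RunnableSketch and the counter are made up for the example.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.FutureTask;
import java.util.concurrent.atomic.AtomicInteger;

class RunnableSketch {

    // Runs the same task three different ways and returns how many times it ran.
    static int runThreeWays() {
        AtomicInteger counter = new AtomicInteger();
        Runnable task = () -> counter.incrementAndGet();

        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            // 1. A dedicated Thread that is thrown away afterwards.
            Thread thread = new Thread(task);
            thread.start();
            thread.join();

            // 2. An executor, which reuses its worker thread across tasks.
            pool.execute(task);

            // 3. A FutureTask wrapper, which is itself a Runnable the executor can run.
            FutureTask<Object> future = new FutureTask<Object>(task, null);
            pool.execute(future);

            // The single worker runs tasks in submission order, so once the
            // future completes, the task submitted in step 2 has finished too.
            future.get();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } catch (ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
        return counter.get();
    }

    public static void main(String[] args) {
        System.out.println("task ran " + runThreeWays() + " times");
    }
}
```

Had the task been written as a Thread subclass instead, only the first of the three options would read naturally.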

Thursday, March 23, 2017

ACID, BASE, and CAP

ACID

The claim to fame for relational databases is they make the ACID promise:


Atomicity - A transaction is all or nothing
Consistency - Only valid data is written to the database
Isolation - Concurrent transactions behave as if they were executed serially, so they don't see each other's partial results
Durability - Once a transaction is committed, its data won't disappear, even after a crash

The "problem" with ACID is that it may be giving you too much at the cost of performance. Fulfilling these promises has a big impact on scalability. It trips you up when you are trying to scale a system across multiple nodes.


Downtime is generally unacceptable for cloud applications, so your system needs to be reliable. Reliability requires multiple nodes to handle machine failures. To make scalable systems that can handle lots and lots of reads and writes, you need many more nodes.


Once you try to scale ACID across many machines you hit problems with network failures and delays. The algorithms don't work in a distributed environment at any acceptable speed.


BASE


Large distributed systems designed around the CAP trade-offs usually aren't ACID. They are BASE (har har):


Basically Available - System seems to work all the time
Soft State - It doesn't have to be consistent all the time
Eventually Consistent - Becomes consistent at some later time

CAP


If you can't have all of the ACID guarantees, what can you have? According to the CAP theorem, a distributed system can have at most two of the following three characteristics at the same time:


Consistency - Your data is correct all the time. What you write is what you read.
Availability - You can read and write your data all the time
Partition Tolerance - If the network splits so that some nodes can't talk to each other (or one or more nodes fail), the system still works and becomes consistent when the nodes reconnect.

Wednesday, March 22, 2017

Type Erasure in Java

When generics are used, they're converted into compile-time checks and execution-time casts. Type erasure means that at execution time, there's no way of figuring out that, say, a List<String> is a list of String, because that information has been erased. Type Erasure exists in order to keep Java bytecode backwards compatible with old JVM versions and with code written before generics existed. Instead of implementing generics through erasure, generics could be implemented through "reification." These would be called reified generics and would retain the type information. For a pretty good introductory exploration of generics in Java, see http://beust.com/weblog/2011/07/29/erasure-vs-reification/.
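
Both halves of that first sentence can be seen in a few lines. In this sketch (the class name ErasureSketch is made up), the first method shows that two differently parameterized lists share one runtime class, and the second shows that the only runtime check is the cast the compiler inserts where an element is read.

```java
import java.util.ArrayList;
import java.util.List;

class ErasureSketch {

    // After erasure, List<String> and List<Integer> are the same class at execution time.
    static boolean sameRuntimeClass() {
        List<String> strings = new ArrayList<>();
        List<Integer> numbers = new ArrayList<>();
        return strings.getClass() == numbers.getClass();
    }

    // Going through a raw type bypasses the compile-time check. Nothing fails
    // when the Integer is added; the failure comes from the compiler-inserted
    // cast to String when the element is read back.
    @SuppressWarnings({"unchecked", "rawtypes"})
    static boolean castFailsOnlyOnRead() {
        List<String> strings = new ArrayList<>();
        List raw = strings;
        raw.add(42); // allowed at execution time: the element type is erased
        try {
            String first = strings.get(0);
            return false;
        } catch (ClassCastException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("same runtime class: " + sameRuntimeClass());
        System.out.println("cast fails only on read: " + castFailsOnlyOnRead());
    }
}
```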

Tuesday, March 21, 2017

Lat / Lon and Bounding Areas

To follow up on the post made yesterday, here are some notes and code for some latitude / longitude bounding area logic.

Deciding if a point is within a bounding box on a globe

You need two cases to account for the special case where the bounding box crosses over the 180 degree meridian.  Here's some sample code:

    if (upperLeftLong > lowerRightLong &&
        latitude <= upperLeftLat &&
        latitude >= lowerRightLat &&
        (longitude >= upperLeftLong ||
         longitude <= lowerRightLong))
    {
      // Special case when the 180 degree meridian is crossed: the point only
      // has to be east of the left edge OR west of the right edge
      // The point is within the bounding area
    }
    else if (latitude <= upperLeftLat &&
             latitude >= lowerRightLat &&
             longitude >= upperLeftLong &&
             longitude <= lowerRightLong)
    {
      // Normal case
      // The point is within the bounding area
    }
    else
    {
      // The point is not within the bounding area
    }

Deciding if a point is within a bounding circle on a globe:

In the project I was working on, someone else had written a function that determines whether or not a coordinate is within a bounding circle on Earth.  The bounding circle was defined by a center coordinate and a radius.  The function was not always working and the original author was long gone, so I looked at it myself.  It turns out the problem was that the equation being used was for a 2 dimensional plane, which the Earth is not.  So I just needed to replace it with an equation for determining if a point is within a circle on a sphere. This is what I found:

acos(sin(lat1)*sin(lat2)+cos(lat1)*cos(lat2)*cos(lon2-lon1))*EarthRadius <= Radius

The latitudes and longitudes must be in radians. This is the spherical law of cosines and should work well for our purposes. The Haversine Formula is more precise for very small distances, but seems more computationally expensive, and we're not using small values anyway. For further optimization, see: http://www.movable-type.co.uk/scripts/latlong-db.html
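
Here is a minimal Java sketch of that check. The class and method names are made up, the inputs are taken in degrees and converted, and 6371 km is an assumed mean Earth radius. The clamp before acos is an extra precaution, not part of the formula: floating-point rounding can push the cosine slightly outside [-1, 1] for nearly identical points, which would make acos return NaN.

```java
class BoundingCircleSketch {
    static final double EARTH_RADIUS_KM = 6371.0; // assumed mean Earth radius

    // Great-circle distance via the spherical law of cosines; inputs in degrees.
    static double distanceKm(double lat1Deg, double lon1Deg, double lat2Deg, double lon2Deg) {
        double lat1 = Math.toRadians(lat1Deg);
        double lat2 = Math.toRadians(lat2Deg);
        double deltaLon = Math.toRadians(lon2Deg - lon1Deg);

        double cosAngle = Math.sin(lat1) * Math.sin(lat2)
                + Math.cos(lat1) * Math.cos(lat2) * Math.cos(deltaLon);

        // Guard against rounding error taking the value just outside acos's domain.
        cosAngle = Math.max(-1.0, Math.min(1.0, cosAngle));
        return Math.acos(cosAngle) * EARTH_RADIUS_KM;
    }

    // True if the point lies within radiusKm of the circle's center.
    static boolean withinCircle(double centerLatDeg, double centerLonDeg, double radiusKm,
                                double latDeg, double lonDeg) {
        return distanceKm(centerLatDeg, centerLonDeg, latDeg, lonDeg) <= radiusKm;
    }

    public static void main(String[] args) {
        // One degree of longitude along the equator, measured across the 180 degree meridian.
        System.out.println(distanceKm(0.0, 179.5, 0.0, -179.5) + " km");
        System.out.println(withinCircle(0.0, 179.5, 112.0, 0.0, -179.5));
    }
}
```

Unlike the bounding box, this needs no special case at the 180 degree meridian, because the cosine of the longitude difference is the same whichever way around the globe you measure it.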