Dave's part of the internet

Not all of this will be relevant. Or even useful.

Authentication on Hue using SAML SSO and Azure Part 1

2021-01-25 Dave

As I’ve mentioned before, there will be some niche information on this blog, much of it driven by my experiences on professional and personal projects. This is a prime example: I found it difficult to find resources for this online and had to do a lot of trial and error myself to get it over the line. The goal of this post is to save someone the heartache I suffered through to accomplish it.

Some high-level information for people who haven’t arrived here via desperate Google searches for this exact solution:

  • HUE is a UI for querying datastores. It has long been tied to the Hadoop suite of products and is offered with services such as AWS EMR.
  • SAML is an XML-based security protocol that facilitates Single Sign-On (SSO) (Fantastic Guide here)
  • Azure is Microsoft’s public cloud platform; the part we’ll be focusing on is its SSO component.
  • To operate, SAML authentication depends on an Identity Provider (IdP), in this case Azure, and a Service Provider (SP), in this case HUE. While both try to make this easy for end users, the vast number of potential software combinations can lead to some troublesome cases.

Some useful links to get people started as well as ones I found particularly helpful:


Step 1 - Libraries

First and foremost, on any hosts that make up the cluster that HUE will be working with, we need to ensure the following libraries are installed:

RHEL/CentOS

yum install git gcc python-devel swig openssl xmlsec1 xmlsec1-openssl

Ubuntu/Debian

apt-get install git gcc python-dev swig openssl xmlsec1 libxmlsec1-openssl
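Before moving on, it can help to confirm that the command-line tools from these packages are actually on the PATH of each host. A minimal sketch, assuming the executables are named git, gcc, swig, openssl and xmlsec1 (the -devel/-dev headers and the xmlsec1 OpenSSL plugin don’t install standalone commands, so they aren’t checked here):

```python
import shutil

# Executables expected from the packages above (assumption: standard names).
REQUIRED_TOOLS = ["git", "gcc", "swig", "openssl", "xmlsec1"]


def missing_tools(tools, which=shutil.which):
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(REQUIRED_TOOLS)
    if missing:
        print("Missing: " + ", ".join(missing))
    else:
        print("All required tools found.")
```

The which parameter defaults to shutil.which but can be swapped out, which makes the check easy to test without touching the real PATH.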

Step 2 - Azure SSO Application setup

In the Azure SSO UI, setup is handled by a fairly intuitive wizard. You will need a few of the URLs HUE uses for login, logout, etc., and once the application is set up there will be an option to download a metadata.xml file and a certificate file. Both are needed for HUE’s SAML configuration.

In my Azure application setup, I had a custom attribute created containing the value I wanted to map to the HUE username. We built usernames from a combination of existing Azure attributes, so this was done on the Azure side and then mapped to a custom attribute named “uid”, which Azure returned in its SAML responses.
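If you want to confirm what Azure is actually sending, the attributes can be read straight out of a SAML assertion. A minimal sketch using only Python’s standard library; the sample assertion is a hypothetical, heavily trimmed example (real Azure responses are signed and carry many more elements), and the helper names are illustrative:

```python
import base64
import xml.etree.ElementTree as ET

# SAML 2.0 assertion namespace, as defined by the SAML 2.0 core specification.
SAML_NS = {"saml": "urn:oasis:names:tc:SAML:2.0:assertion"}
ATTRIBUTE_TAG = "{urn:oasis:names:tc:SAML:2.0:assertion}Attribute"

# Hypothetical, trimmed assertion carrying the custom "uid" attribute.
SAMPLE = """<saml:Assertion xmlns:saml="urn:oasis:names:tc:SAML:2.0:assertion">
  <saml:AttributeStatement>
    <saml:Attribute Name="uid">
      <saml:AttributeValue>jdoe</saml:AttributeValue>
    </saml:Attribute>
  </saml:AttributeStatement>
</saml:Assertion>"""


def decode_saml_response(saml_response_b64):
    """Decode a base64 SAMLResponse form field (HTTP-POST binding) to XML text."""
    return base64.b64decode(saml_response_b64).decode("utf-8")


def saml_attributes(assertion_xml):
    """Map each saml:Attribute Name to the list of its AttributeValue texts."""
    root = ET.fromstring(assertion_xml)
    attributes = {}
    # iter() walks the whole tree, so this also works on a full samlp:Response.
    for attr in root.iter(ATTRIBUTE_TAG):
        values = [v.text for v in attr.findall("saml:AttributeValue", SAML_NS)]
        attributes[attr.get("Name")] = values
    return attributes
```

Against the sample, saml_attributes(SAMPLE) returns {'uid': ['jdoe']}. For a real login, the SAMLResponse field your browser posts to HUE is base64-encoded, so pass it through decode_saml_response first.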

The URLs you will need to provide are all paths appended to your HUE base URL. For example, if the base URL is https://hue.mydomain.net, then the following can be derived:
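As a sketch of that derivation, assuming HUE’s libsaml app serves its endpoints under /saml2/ with the paths below (an assumption; check them against your HUE version before entering them in Azure). Azure’s Identifier field is left out on purpose: it has to match the entity_id you set in hue.ini, which in this article is the base URL itself.

```python
# Assumption: HUE serves its SAML endpoints under /saml2/ with these paths.
# Verify against your HUE version before entering them in Azure.
SAML_PATHS = {
    "Reply URL (Assertion Consumer Service)": "/saml2/acs/",
    "Sign-on URL": "/saml2/login/",
    "Logout URL (Single Logout Service)": "/saml2/ls/",
    "SP metadata": "/saml2/metadata/",
}


def derive_saml_urls(base_url, paths=SAML_PATHS):
    """Append each SAML path to the HUE base URL, avoiding a double slash."""
    base = base_url.rstrip("/")
    return {label: base + path for label, path in paths.items()}


if __name__ == "__main__":
    for label, url in derive_saml_urls("https://hue.mydomain.net").items():
        print(f"{label}: {url}")
```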

Also, don’t forget to grant access to any users who need the application through Azure group membership. At the time of posting, nested groups don’t work for these lookups, so make sure the group connected to the application contains the users directly, with only one level of membership.

Step 3 - HUE configuration files

On the node running the HUE server, you will need to deploy the metadata.xml and certificate (.cer) files that Azure generated. HUE also expects a key (.pem) file as part of its SAML configuration, used for encryption and decryption of requests. This file can reuse the contents of the certificate file with the header swapped, replacing:

-----BEGIN CERTIFICATE-----

with

-----BEGIN RSA PRIVATE KEY-----

and likewise for the footer (-----END CERTIFICATE----- becomes -----END RSA PRIVATE KEY-----). These files should be deployed somewhere the “hue” user can access, and they must be readable by that user. For illustration, I’ll use “/opt/hue/saml” as the path for all of these files, which you’ll see referenced in the configuration below.
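The header/footer swap can be scripted so it is repeatable when Azure rotates the certificate. A minimal sketch of the workaround described above, assuming the Azure certificate was downloaded in Base64 (PEM) form; the helper names are hypothetical. Note that this only changes the labels: the body is still the certificate, so the result is not a real private key.

```python
from pathlib import Path

# Label swaps described above: certificate header/footer -> RSA key header/footer.
LABEL_SWAPS = [
    ("-----BEGIN CERTIFICATE-----", "-----BEGIN RSA PRIVATE KEY-----"),
    ("-----END CERTIFICATE-----", "-----END RSA PRIVATE KEY-----"),
]


def relabel_certificate(pem_text):
    """Return pem_text with the certificate header and footer swapped."""
    if "-----BEGIN CERTIFICATE-----" not in pem_text:
        raise ValueError("input does not look like a PEM certificate")
    for old, new in LABEL_SWAPS:
        pem_text = pem_text.replace(old, new)
    return pem_text


def write_key_file(cert_path, key_path):
    """Read the certificate at cert_path and write the relabelled copy to key_path."""
    key_text = relabel_certificate(Path(cert_path).read_text())
    Path(key_path).write_text(key_text)
```

Call write_key_file with your own certificate and key paths under /opt/hue/saml, then make sure the new file is owned by, or at least readable by, the “hue” user (e.g. with chown).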

Step 4 - hue.ini file configuration

The hue.ini file is where all of HUE’s core configuration lives. We need to make a number of changes to this file to enable SAML authentication:

[desktop]
redirect_whitelist="^\/.*$,^https:\/\/login\.microsoftonline\.com\/12345678-1234-1234-1234-abcdefghijkl\/.*$"
[[auth]]
backend=libsaml.backend.SAML2Backend
[libsaml]
xmlsec_binary=/usr/bin/xmlsec1
entity_id=https://hue.mydomain.net
metadata_file=/opt/hue/saml/metadata.xml
key_file=/opt/hue/saml/saml.pem
cert_file=/opt/hue/saml/saml.cer
attribute_map_dir=/opt/hue/saml/attribute_mapping
user_attribute_mapping='{"uid":"username"}'
want_response_signed=false
username_source=attributes

  • redirect_whitelist is a comma-separated list of regexes that redirect URLs must match. The Azure login URL it needs to cover is tied to your Azure SSO application and can be found in your metadata file.

  • backend selects SAML as HUE’s authentication backend so HUE knows what to expect. This value won’t change.

  • xmlsec_binary is the location of your xmlsec1 binary. A handy command for finding it:

     sudo find / -name 'xmlsec1'
    
  • entity_id is the URL that HUE lives at. The SAML URLs are derived from it.

  • metadata_file is the XML file you got from Azure. It needs the full path, including the file name.

  • key_file is the key file you derived from the certificate file. The full path and file name are also required.

  • cert_file is the certificate you got from Azure. Full path and name required.

  • attribute_map_dir points to the directory that holds our custom mapping. We need it because of the disconnect between Azure <-> HUE <-> Django; you’ll read more about it in Step 5.

  • user_attribute_mapping is the map that HUE will TRY to use. The uid is sent from Azure as per my note in Step 2; however, the uid on the left of this map is the pysaml attribute, which is what gets mapped to Django’s username. You don’t have to use uid and can map another attribute instead, but Azure’s built-in attribute names are fairly lengthy, e.g. “http://schemas.xmlsoap.org/ws/2005/05/identity/claims/emailaddress”.

  • want_response_signed controls whether HUE verifies a signature on IdP responses. As Azure seems to have issues with this, I set it to false to get past the problem.

  • username_source tells HUE where in the SAML response to find the username. Setting it to attributes means that one of the attributes sent by Azure, either default or custom, is mapped to the username HUE uses for authentication. This allows some flexibility if you want to use names or email addresses, for example.

  • create_users_on_login is not set in the configuration above, but it is a libsaml option that can, in theory, create users once SSO has authenticated them. For additional security, we instead created users in HUE’s Django backend as part of the application setup, so every SAML authentication request maps to an existing user in the Django database.
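Because hue.ini values are easy to mistype, a quick offline check of the two trickiest ones, redirect_whitelist and user_attribute_mapping, can save a restart cycle. A minimal sketch, assuming you copy the values in by hand rather than parsing hue.ini, and that the whitelist is split on commas as in the example above (so none of your regexes may contain a comma). The tenant ID is the same placeholder used in the configuration.

```python
import json
import re

# Values copied by hand from hue.ini (tenant ID is a placeholder).
REDIRECT_WHITELIST = (
    r"^\/.*$,"
    r"^https:\/\/login\.microsoftonline\.com\/12345678-1234-1234-1234-abcdefghijkl\/.*$"
)
# The outer single quotes used in hue.ini are not part of the value.
USER_ATTRIBUTE_MAPPING = '{"uid":"username"}'


def redirect_allowed(url, whitelist=REDIRECT_WHITELIST):
    """Return True if url matches any of the comma-separated whitelist regexes."""
    patterns = [p.strip() for p in whitelist.split(",") if p.strip()]
    return any(re.match(p, url) for p in patterns)


def parse_attribute_mapping(raw=USER_ATTRIBUTE_MAPPING):
    """Parse user_attribute_mapping and check it is a JSON object of strings."""
    mapping = json.loads(raw)
    if not isinstance(mapping, dict) or not all(
        isinstance(k, str) and isinstance(v, str) for k, v in mapping.items()
    ):
        raise ValueError("user_attribute_mapping must map strings to strings")
    return mapping
```

A malformed mapping (for example, single quotes inside the JSON) fails loudly in json.loads here instead of surfacing later as a confusing login error.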

Step 5 - Attribute Mapping

If you have followed the steps so far exactly, feel free to copy the code below into a file in the directory you set for attribute_map_dir in hue.ini. However, if you’re adapting your own configuration, or you just want to know a bit more about why these steps are needed, I highly recommend reading the thread in the second link I put at the top of the article.

In a nutshell, there is a chain of mapping that happens which is called out in the link above:

SAML response attribute/value ====> pysaml attribute/value

pysaml attribute/value ====> djangosaml “username” attribute

djangosaml user ====> Hue user
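To make that chain concrete, here is a hypothetical, much-simplified model of the hops. It is not the actual pysaml2 or djangosaml2 code, only an illustration of how data flows through the two maps, and of why an attribute that is missing from the first map never reaches the username.

```python
# Simplified illustration only; not the real pysaml2/djangosaml2 logic.

# Hop 1: SAML response attribute name -> pysaml attribute name (the "fro" map).
FRO_MAP = {"uid": "uid"}

# Hop 2: pysaml attribute name -> djangosaml user field (user_attribute_mapping).
USER_ATTRIBUTE_MAPPING = {"uid": "username"}


def resolve_username(saml_attributes, fro_map=FRO_MAP,
                     user_mapping=USER_ATTRIBUTE_MAPPING):
    """Follow the mapping chain and return the username, or None."""
    # Hop 1: rename known response attributes; in this model, unknown ones are dropped.
    pysaml_attrs = {fro_map[name]: values
                    for name, values in saml_attributes.items()
                    if name in fro_map}
    # Hop 2: find the pysaml attribute that feeds Django's "username" field.
    for pysaml_name, django_field in user_mapping.items():
        if django_field == "username" and pysaml_attrs.get(pysaml_name):
            # Hop 3: this value is what gets matched to the existing HUE user.
            return pysaml_attrs[pysaml_name][0]
    return None
```

For example, resolve_username({"uid": ["jdoe"]}) yields "jdoe", while a response carrying only a long Azure claim name that isn’t in FRO_MAP yields None.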

The uid parameter doesn’t map natively to any of the existing parameters in pysaml. There is an existing mapping for uid in its config, but it looks for a separate OID (seriously, read the thread here to learn more), so we need to create our own mapping instead. To do that, we create the following Python file:

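# saml_uri.py
# Custom pysaml2 attribute map, placed in the attribute_map_dir directory.
MAP = {
    # Attribute name format this map applies to
    "identifier": "urn:oasis:names:tc:SAML:2.0:attrname-format:uri",
    # SAML response attribute name -> pysaml attribute name
    "fro": {
        'uid': 'uid',
    },
    # pysaml attribute name -> SAML attribute name
    "to": {
        'uid': 'uid',
    },
}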

This file maps the uid coming from the Azure SAML response to the uid that pysaml understands, bypassing the OID lookup. In theory, this is where you can map any of your SAML response attributes to any of the attributes pysaml can handle (the full list is at desktop/libs/libsaml/attribute-maps/SAML2.py in HUE deployments).
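If you end up mapping more than just uid, keeping “fro” and “to” in sync by hand gets error-prone. A minimal sketch of a hypothetical helper (not part of HUE or pysaml2) that builds the MAP dict from the “fro” direction, derives “to” as its inverse, and renders the result as a map file like saml_uri.py:

```python
import pprint

URI_FORMAT = "urn:oasis:names:tc:SAML:2.0:attrname-format:uri"


def build_attribute_map(fro, identifier=URI_FORMAT):
    """Build a pysaml2-style MAP dict whose "to" section mirrors "fro".

    fro maps SAML response attribute names to pysaml attribute names.
    """
    to = {pysaml_name: saml_name for saml_name, pysaml_name in fro.items()}
    if len(to) != len(fro):
        # Two response attributes targeting one pysaml name can't be inverted.
        raise ValueError("two SAML attributes map to the same pysaml name")
    return {"identifier": identifier, "fro": dict(fro), "to": to}


def render_map_file(attribute_map):
    """Render the dict as the Python source of a map file."""
    return "MAP = " + pprint.pformat(attribute_map, sort_dicts=False) + "\n"


if __name__ == "__main__":
    print(render_map_file(build_attribute_map({"uid": "uid"})))
```

With {"uid": "uid"} this produces the same mapping as saml_uri.py above. Any extra pairs you add (say, an Azure claim URI to a pysaml name such as mail) are up to you and should be checked against the pysaml attribute list mentioned above.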

I have no idea why they use “fro” instead of “from”.

Conclusion

And that’s it! This is the configuration I successfully implemented. If you’re having problems doing the same, feel free to contact me on Twitter.