Locking down the default warehouse in Hive

If you are running a secured cluster, you may wish to segregate the data in HDFS so that different users can access different data. If you do this, you will probably want to apply that segregated directory structure consistently across all your services and tools.

For example, you will want to use only external tables in Hive, with each table's location set to a specific directory in your segregated HDFS structure. You will also want to lock down the default Hive warehouse location (/apps/hive/warehouse or /user/hive/warehouse) so that users don't put data into an insecure location that everyone can access.
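
For illustration, such an external table might look like the following sketch (the table name, columns, and path are hypothetical examples, not from the cluster described here):

CREATE EXTERNAL TABLE events (
    event_id BIGINT,
    event_time STRING
)
STORED AS TEXTFILE
LOCATION '/data/sales/events';  -- a directory in your segregated structure, not the default warehouse

Because the table is external, dropping it only removes the metadata; the data under /data/sales/events stays put.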

The most intuitive way to lock down the default warehouse is to remove all of its permissions, e.g. "hadoop fs -chmod 000 /apps/hive/warehouse". However, if you then try to create an external table while the internal warehouse sits at 000 permissions, you will get an error similar to this:

Authorization failed:java.security.AccessControlException: action WRITE
not permitted on path hdfs://<hostname>:8020/apps/hive/warehouse for user
anastetsky. Use show grant to get more details.

Looks like Hive is still trying to write to the internal warehouse even when creating an external table!

This is a Hive bug. Hive seems to require that the warehouse directory have the "write" permission, even for external tables. If you unlock the warehouse and try again, Hive doesn't appear to actually write anything to it; it just wants the directory to have the "write" permission.

There is a workaround.

You can set the warehouse to 222 instead of 000, which gives everyone "write" permission on it, but no "read" or "execute" (see the sketch at the end of this post). Now creating an external table works. But won't users be able to create internal tables and not notice their mistake until they try (and fail) to read them?

No, because creating an internal table and actually writing data to the internal warehouse also requires the "execute" permission, which the warehouse does not have (it's set to 222). You would get an error like the following:

FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. MetaException(message:Got exception: org.apache.hadoop.security.AccessControlException Permission denied: user=anastetsky, access=EXECUTE, inode="/apps/hive/warehouse":hive:hdfs:d-w--w--w-
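
To recap, the workaround boils down to something like this (a sketch only; the warehouse path and the table from the earlier example are assumptions, and the exact error text will vary with your cluster):

# Give the default warehouse "write" but no "read" or "execute"
hadoop fs -chmod 222 /apps/hive/warehouse

# Creating an external table now succeeds
hive -e "CREATE EXTERNAL TABLE events (event_id BIGINT, event_time STRING)
         STORED AS TEXTFILE LOCATION '/data/sales/events'"

# Creating an internal table still fails with the EXECUTE error above,
# because writing into /apps/hive/warehouse needs "execute" on the directory
hive -e "CREATE TABLE scratch (x INT)"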
