Wednesday, 9 March 2016

Hive metastore on MySQL database

######################################
## Hive on mysql
#####################################
Step1:
To install MySQL on a Red Hat system:
$ sudo yum install mysql-server

Starting Mysql service
$ sudo service mysqld start

Step2:
To install the MySQL connector on a Red Hat 6 system:
Install mysql-connector-java and symbolically link the JAR into Hive's lib directory (here /opt/hive/lib/).

$ sudo yum install mysql-connector-java
$ ln -s /usr/share/java/mysql-connector-java.jar /opt/hive/lib/mysql-connector-java.jar

Step3:
$ sudo /usr/bin/mysql_secure_installation
[...]
Enter current password for root (enter for none):
OK, successfully used password, moving on...
[...]
Set root password? [Y/n] y
New password:
Re-enter new password:
Remove anonymous users? [Y/n] Y
[...]
Disallow root login remotely? [Y/n] N
[...]
Remove test database and access to it [Y/n] Y
[...]
Reload privilege tables now? [Y/n] Y
All done!

Step4:
To make sure the MySQL server starts at boot:

On Red Hat systems:
$ sudo /sbin/chkconfig mysqld on
$ sudo /sbin/chkconfig --list mysqld
mysqld          0:off   1:off   2:on    3:on    4:on    5:on    6:off

Step5:
$ mysql -u root -p
Enter password:
mysql> CREATE DATABASE metastore;
mysql> CREATE USER 'hive'@'ec2-54-210-74-58.compute-1.amazonaws.com' IDENTIFIED BY 'hive';
mysql> GRANT all on *.* to 'hive'@'ec2-54-210-74-58.compute-1.amazonaws.com' identified by 'hive';
mysql> flush privileges;
------------------------------------------------------
Note: Procedure to export and import dump for other cluster setup
backup:
$mysqldump -u root -p[root_password] [database_name] > dumpfilename.sql

restore:
$mysql -u root -p[root_password] [database_name] < dumpfilename.sql
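For example, to migrate the metastore database created in Step 5 to another cluster (a hedged sketch; assumes root access on both MySQL servers and that the target database does not already exist):
$ mysqldump -u root -p metastore > metastore_backup.sql
# on the target cluster's MySQL server:
$ mysql -u root -p -e "CREATE DATABASE metastore"
$ mysql -u root -p metastore < metastore_backup.sql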
-------------------------------------------------------

Step6:
Adding/Modifying the properties in hive-site.xml

<property>
  <name>javax.jdo.option.ConnectionURL</name>
  <value>jdbc:mysql://localhost/metastore</value>
  <description>the URL of the MySQL database</description>
</property>

<property>
  <name>javax.jdo.option.ConnectionDriverName</name>
  <value>com.mysql.jdbc.Driver</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionUserName</name>
  <value>hive</value>
</property>

<property>
  <name>javax.jdo.option.ConnectionPassword</name>
  <value>hive</value>
</property>

<property>
  <name>datanucleus.autoCreateSchema</name>
  <value>false</value>
</property>

<property>
  <name>datanucleus.fixedDatastore</name>
  <value>true</value>
</property>

<property>
  <name>hive.metastore.uris</name>
  <value>thrift://localhost:9083</value>
  <description>IP address (or fully-qualified domain name) and port of the metastore host</description>
</property>
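
Because datanucleus.autoCreateSchema is set to false above, the metastore tables must be created in MySQL before Hive is used. A minimal sketch using Hive's schematool (shipped with Hive 0.12 and later; assumes it is on the PATH and picks up the hive-site.xml above):
# create the metastore schema in the MySQL database configured above
$ schematool -dbType mysql -initSchema
# verify the schema version and connection afterwards
$ schematool -dbType mysql -info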

---------------------------------------------------
Step7:

Start the Hive metastore (Thrift) service
hive --service metastore
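
A quick, hedged way to confirm the metastore is really backed by MySQL (the table names below come from the standard Hive metastore schema):
# the metastore should be listening on the port from hive.metastore.uris
$ netstat -tlnp | grep 9083
# create a table through Hive, then confirm it is recorded in MySQL
$ hive -e "CREATE TABLE IF NOT EXISTS metastore_smoke_test (id INT)"
$ mysql -u root -p -e "SELECT TBL_NAME FROM metastore.TBLS"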

Apache Ambari for HDP installation (screenshots)

Welcome page:
(screenshot)


Nodes Registration and checks:
Now we can target hosts for installation with a full listing of host names or regular expressions (in situations where there are many nodes with similar names):
(screenshot)
The next step is node registration, with Ambari doing all of the heavy lifting for us. An interface to track progress and drill down into log files is made available:
(screenshot)
Upon registration completion, a detailed view of the host checks that ran, and options to re-run them, is also available:
(screenshot)


Service Selection:
Next, we select which high-level components we want for the cluster. Dependency checks are all built in, so there is no need to worry about which services are prerequisites for others:
(screenshot)
After service selection, node-specific service assignments are as simple as checking boxes:
(screenshot)
Database credential and database selection:
(screenshot)
Review Board:
(screenshot)

Installation:
Ambari will now execute the actual installation and the necessary smoke tests on all nodes in the cluster. Sit back and relax, Ambari will perform the heavy lifting yet again:
(screenshot)
Detailed drill-downs are available to monitor progress:
(screenshots)
Ambari tracks all progress and activities for you, dynamically updating the interface:
(screenshot)

Dashboard:
We have our Hortonworks Data Platform cluster up and running, ready for that high-priority POC:
(screenshot)


IPA_RANGER_HDP_2.2


#####  Authorization & Audit: allow users to specify access policies and enable audit around Hadoop from a central location via a UI, integrated with LDAP

- Goals:
  - Install Apache Ranger on HDP 2.1
  - Sync users between Apache Ranger and LDAP
  - Configure HDFS & Hive to use Apache Ranger
  - Define HDFS & Hive Access Policy For Users
  - Log into Hue as the end user and note the authorization policies being enforced

- Pre-requisites:
  - At this point you should have set up an LDAP VM and a kerberized HDP sandbox. We will take this as the starting point and set up Ranger

- Contents:
  - [Install Ranger and its User/Group sync agent](https://github.com/abajwa-hw/security-workshops/blob/master/Setup-ranger-21.md#install-ranger-and-its-usergroup-sync-agent)
  - [Setup HDFS repo in Ranger](https://github.com/abajwa-hw/security-workshops/blob/master/Setup-ranger-21.md#setup-hdfs-repo-in-ranger)
  - [HDFS Audit Exercises in Ranger](https://github.com/abajwa-hw/security-workshops/blob/master/Setup-ranger-21.md#hdfs-audit-exercises-in-ranger)
  - [Setup Hive repo in Ranger](https://github.com/abajwa-hw/security-workshops/blob/master/Setup-ranger-21.md#setup-hive-repo-in-ranger)
  - [Hive Audit Exercises in Ranger](https://github.com/abajwa-hw/security-workshops/blob/master/Setup-ranger-21.md#hive-audit-exercises-in-ranger)
  - [Setup HBase repo in Ranger](https://github.com/abajwa-hw/security-workshops/blob/master/Setup-ranger-21.md#setup-hbase-repo-in-ranger)
  - [HBase audit exercises in Ranger](https://github.com/abajwa-hw/security-workshops/blob/master/Setup-ranger-21.md#hbase-audit-exercises-in-ranger)
 
- Video:
  - <a href="http://www.youtube.com/watch?feature=player_embedded&v=Qi-fxJTNhtY" target="_blank"><img src="http://img.youtube.com/vi/Qi-fxJTNhtY/0.jpg" alt="Authorization and Audit" width="240" height="180" border="10" /></a>

---------------------------


 
#####  Install Ranger and its User/Group sync agent


- Download Ranger policymgr (security webUI portal) and ugsync (User and Group Agent to sync users from LDAP to webUI)
```
mkdir /tmp/xasecure
cd /tmp/xasecure
wget http://public-repo-1.hortonworks.com/HDP-LABS/Projects/XA-Secure/3.5.001/xasecure-policymgr-3.5.001-release.tar
wget http://public-repo-1.hortonworks.com/HDP-LABS/Projects/XA-Secure/3.5.001/xasecure-uxugsync-3.5.001-release.tar
tar -xvf xasecure-uxugsync-3.5.001-release.tar
tar -xvf xasecure-policymgr-3.5.001-release.tar
```

- Configure/install policymgr
```
cd /tmp/xasecure/xasecure-policymgr-3.5.001-release
vi install.properties
```
- No changes needed: just confirm the below are set this way:
```
authentication_method=NONE
remoteLoginEnabled=true
authServiceHostName=localhost
authServicePort=5151
```

- Start Ranger Admin
```
./install.sh
#enter hortonworks for the passwords
#You should see "XAPolicyManager has started successfully"
```

- Install user/groups sync agent (ugsync)
```
yum install ranger-usersync
#to uninstall: yum remove ranger_2_2_0_0_2041-usersync ranger-usersync
```
- Configure ugsync to pull users from LDAP
```
cd /tmp/xasecure/xasecure-uxugsync-3.5.001-release
vi install.properties

POLICY_MGR_URL = http://sandbox.hortonworks.com:6080
SYNC_SOURCE = ldap
SYNC_LDAP_URL = ldap://ipa.hortonworks.com:389
SYNC_LDAP_BIND_DN = uid=admin,cn=users,cn=accounts,dc=hortonworks,dc=com
SYNC_LDAP_BIND_PASSWORD = hortonworks
SYNC_LDAP_USER_SEARCH_BASE = cn=users,cn=accounts,dc=hortonworks,dc=com
SYNC_LDAP_USER_NAME_ATTRIBUTE = uid
```
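
- Optionally sanity-check the LDAP settings above before running the installer; a hedged sketch using ldapsearch (from openldap-clients) with the same bind DN and search base:
```
ldapsearch -x -H ldap://ipa.hortonworks.com:389 \
  -D "uid=admin,cn=users,cn=accounts,dc=hortonworks,dc=com" -w hortonworks \
  -b "cn=users,cn=accounts,dc=hortonworks,dc=com" uid | grep '^uid:'
```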

- Install the service
```
./install.sh
```

- Start the service
```
./start.sh
```
- confirm Agent/Ranger started
```
ps -ef | grep UnixAuthenticationService
ps -ef|grep proc_ranger
```

- Open log file to confirm agent was able to import users/groups from LDAP
```
tail -f /var/log/uxugsync/unix-auth-sync.log
```

- Open the web UI and log in as admin/admin. Your LDAP users and groups should appear in the Ranger UI under Users/Groups

http://sandbox.hortonworks.com:6080

---------------------







#####  Setup HDFS repo in Ranger

- In the Ranger UI, under PolicyManager tab, click the + sign next to HDFS and enter below (most values come from HDFS configs in Ambari):
```
Repository name: hdfs_sandbox
Username: xapolicymgr
Password: hortonworks
fs.default.name: hdfs://sandbox.hortonworks.com:8020
hadoop.security.authorization: true
hadoop.security.authentication: kerberos
hadoop.security.auth_to_local: (copy from HDFS configs)
dfs.datanode.kerberos.principal: dn/_HOST@HORTONWORKS.COM
dfs.namenode.kerberos.principal: nn/_HOST@HORTONWORKS.COM
dfs.secondary.namenode.kerberos.principal: nn/_HOST@HORTONWORKS.COM
Common Name For Certificate: (leave this empty)
```

- Make sure the MySQL connection works before setting up the HDFS plugin
```
mysql -u xalogger -phortonworks -h localhost xasecure
```

- Setup Ranger HDFS plugin


**Note: if this were a multi-node cluster, you would run these steps on the host running the NameNode**

```
cd /tmp/xasecure
wget http://public-repo-1.hortonworks.com/HDP-LABS/Projects/XA-Secure/3.5.001/xasecure-hadoop-3.5.001-release.tar
tar -xvf xasecure-hadoop-3.5.001-release.tar
cd xasecure-hadoop-3.5.001-release
vi install.properties

POLICY_MGR_URL=http://sandbox.hortonworks.com:6080
REPOSITORY_NAME=hdfs_sandbox
XAAUDIT.DB.HOSTNAME=localhost
XAAUDIT.DB.DATABASE_NAME=xasecure
XAAUDIT.DB.USER_NAME=xalogger
XAAUDIT.DB.PASSWORD=hortonworks
```
- Start agent
```
./install.sh
```

- Edit HDFS settings via Ambari, under HDFS > Configs :
```
dfs.permissions.enabled = true
```

- Before restarting HDFS, add the below snippet to the bottom of hadoop-config.sh (opened below) so the Hadoop Security Agent starts with the NameNode service:
```
vi /usr/lib/hadoop/libexec/hadoop-config.sh
if [ -f ${HADOOP_CONF_DIR}/xasecure-hadoop-env.sh ]
then
 . ${HADOOP_CONF_DIR}/xasecure-hadoop-env.sh
fi
```

- Restart HDFS via Ambari

- Create an HDFS dir and attempt to access it before/after adding a user-level Ranger HDFS policy
```
#run as root
su hdfs -c "hdfs dfs -mkdir /rangerdemo"
su hdfs -c "hdfs dfs -chmod 700 /rangerdemo"
```

- The HDFS agent should now show up in the Ranger UI under Audit > Agents. Also, under the Audit > Big Data tab you can see an audit trail of which user accessed HDFS, at what time, and with what result


##### HDFS Audit Exercises in Ranger:
```
su ali
hdfs dfs -ls /rangerdemo
#should fail saying "Failed to find any Kerberos tgt"
klist
kinit
#enter hortonworks as password. You may need to enter this multiple times if it asks you to change it
hdfs dfs -ls /rangerdemo
#this should fail with "Permission denied"
```
- Open the audit report and filter on "REPOSITORY TYPE"="HDFS" and "USER"="ali" to see how the denied request was logged

- Add a policy in Ranger under PolicyManager > hdfs_sandbox > Add New Policy:
```
Resource path: /rangerdemo
Recursive: True
User: ali and give read, write, execute
Save > OK and wait 30s
```
- now this should succeed
```
hdfs dfs -ls /rangerdemo
```
- Now look at the audit reports for the above and filter on "REPOSITORY TYPE"="HDFS" and "USER"="ali" to see how the allowed request was logged

- Attempt to access dir *before* adding group level Ranger HDFS policy
```
su hr1
hdfs dfs -ls /rangerdemo
#should fail saying "Failed to find any Kerberos tgt"
klist
kinit
#enter hortonworks as password. You may need to enter this multiple times if it asks you to change it
hdfs dfs -ls /rangerdemo
#this should fail with "Permission denied". View the audit page for the new activity
```

- Add the hr group to the existing policy in Ranger:
  - Under the Policy Manager tab, click the "/rangerdemo" link
  - Under group, add "hr" and give read, write, execute
  - Save > OK and wait 30s. While you wait you can review the summary of policies under the Analytics tab

- Attempt to access the dir *after* adding the group-level Ranger HDFS policy; this should pass now. View the audit page for the new activity
```
hdfs dfs -ls /rangerdemo
```

- Even though we did not directly grant access to the hr1 user, it inherited access because it is part of the hr group.

---------------------










#####  Setup Hive repo in Ranger

- In Ambari, add the admins group to hadoop.proxyuser.hive.groups and restart HDFS:
hadoop.proxyuser.hive.groups: users, hr, admins


- In the Ranger UI, under PolicyManager tab, click the + sign next to Hive and enter below to create a Hive repo:

```
Repository name= hive_sandbox
username= xapolicymgr
password= hortonworks
jdbc.driverClassName= org.apache.hive.jdbc.HiveDriver
jdbc.url= jdbc:hive2://sandbox:10000/
Click Add
```
- Install the Hive plugin

**Note: if this were a multi-node cluster, you would run these steps on the host running Hive**

```
cd /tmp/xasecure
wget http://public-repo-1.hortonworks.com/HDP-LABS/Projects/XA-Secure/3.5.001/xasecure-hive-3.5.001-release.tar
tar -xvf xasecure-hive-3.5.001-release.tar
cd xasecure-hive-3.5.001-release
vi install.properties

POLICY_MGR_URL=http://sandbox.hortonworks.com:6080
REPOSITORY_NAME=hive_sandbox
XAAUDIT.DB.HOSTNAME=localhost
XAAUDIT.DB.DATABASE_NAME=xasecure
XAAUDIT.DB.USER_NAME=xalogger
XAAUDIT.DB.PASSWORD=hortonworks
```

- Start Hive plugin
```
./install.sh
```

- Replace the contents of this file with the below
```
vi /var/lib/ambari-server/resources/stacks/HDP/2.0.6/services/HIVE/package/templates/startHiveserver2.sh.j2

HIVE_SERVER2_OPTS="  -hiveconf hive.log.file=hiveserver2.log -hiveconf hive.log.dir=$5 -hiveconf hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator "
{% if hive_authorization_enabled == True and str(hdp_stack_version).startswith('2.1') %}
# HiveServer 2 -hiveconf options
#HIVE_SERVER2_OPTS="${HIVE_SERVER2_OPTS} -hiveconf hive.security.authenticator.manager=org.apache.hadoop.hive.ql.security.SessionStateUserAuthenticator -hiveconf hive.security.authorization.manager=org.apache.hadoop.hive.ql.security.authorization.plugin.sqlstd.SQLStdHiveAuthorizerFactory "
{% endif %}
HIVE_CONF_DIR=$4 /usr/lib/hive/bin/hiveserver2 -hiveconf hive.metastore.uris=" " ${HIVE_SERVER2_OPTS} > $1 2> $2 &
echo $!|cat>$3
```
- Restart the Ambari server and agent
```
/etc/init.d/ambari-server stop
/etc/init.d/ambari-server start

/etc/init.d/ambari-agent stop
/etc/init.d/ambari-agent start
```

- Copy Ranger files to /etc/hive/conf
```
cd /etc/hive/conf.server/
cp xasecure-* ../conf/
```
- Make the below Hive config changes in Ambari:
```
hive.security.authorization.manager = com.xasecure.authorization.hive.authorizer.XaSecureAuthorizer
hive.security.authorization.enabled = true
hive.exec.pre.hooks = org.apache.hadoop.hive.ql.hooks.ATSHook,com.xasecure.authorization.hive.hooks.XaSecureHivePreExecuteRunHook
hive.exec.post.hooks = org.apache.hadoop.hive.ql.hooks.ATSHook,com.xasecure.authorization.hive.hooks.XaSecureHivePostExecuteRunHook

#add to Custom hive-site.xml
hive.semantic.analyzer.hook = com.xasecure.authorization.hive.hooks.XaSecureSemanticAnalyzerHook
hive.server2.custom.authentication.class = com.xasecure.authentication.hive.LoginNameAuthenticator
hive.conf.restricted.list = hive.exec.driver.run.hooks, hive.server2.authentication, hive.metastore.pre.event.listeners, hive.security.authorization.enabled,hive.security.authorization.manager, hive.semantic.analyzer.hook, hive.exec.post.hooks
```

- Now restart Hive from Ambari. If Hive fails to start because the metastore is not coming up, click Hive > Summary > MySQL Server > Start MySQL Server

- You may also need to start the DataNode if it went down (Ambari > HDFS > Service Actions > Restart DataNodes)

- Restart Hive once again if it did not restart cleanly

- Restart hue to make it aware of Ranger changes
```
service hue restart
```

- As an LDAP user, perform some Hive activity
```
su ali
kinit
#kinit: Client not found in Kerberos database while getting initial credentials
kinit ali
#hortonworks

beeline
!connect jdbc:hive2://sandbox.hortonworks.com:10000/default;principal=hive/sandbox.hortonworks.com@HORTONWORKS.COM
#hit enter twice
use default;
```
- Check Audit > Agent in Ranger policy manager UI to ensure Hive agent shows up now


#####  Hive Audit Exercises in Ranger


- Create a user dir in HDFS for your LDAP user, e.g. ali
```
su  hdfs -c "hdfs dfs -mkdir /user/ali"
su hdfs -c "hdfs dfs -chown ali /user/ali"
```

- Sign out of Hue and sign back in as ali/hortonworks

- Run the below queries using the Beeswax Hue interface or beeline
```
show tables;
use default;
```
- Check Audit > Agent in Ranger policy manager UI to ensure Hive agent shows up now

- Create hive policies in Ranger for user ali
```
db name: default
table: sample_07
col name: code description
user: ali and check "select"
Add
```

```
db name: default
table: sample_08
col name: *
user: ali and check "select"
Add
```
- Save and wait 30s. You can review the Hive policies in the Ranger UI under the Analytics tab

- These will not work, as the user does not have access to all columns of sample_07
```
desc sample_07;
select * from sample_07 limit 1;
```
- these should work
```
select code,description from sample_07 limit 1;
desc sample_08;
select * from sample_08 limit 1;
```

- Now look at the audit reports for the above and notice that audit reports for Beeswax queries show up in Ranger


- Create hive policies in Ranger for group legal
```
db name: default
table: sample_08
col name: code description
group: legal and check "select"
Add
```

- Save and wait 30s

- create user dir for legal1
```
su hdfs -c "hdfs dfs -mkdir /user/legal1"
su hdfs -c "hdfs dfs -chown legal1 /user/legal1"
```

- This time let's try running the queries via the Beeline interface
```
su legal1
klist
kinit
beeline
!connect jdbc:hive2://sandbox.hortonworks.com:10000/default;principal=hive/sandbox.hortonworks.com@HORTONWORKS.COM
#Hit enter twice when it prompts for password
```

- These should not work: "user does not have select privilege"
```
desc sample_08;
select * from sample_08;
```

- these should work
```
select code,description from sample_08 limit 5;
```

- Now look at the audit reports for the above and notice that audit reports for beeline queries show up in Ranger

---------------------


#####  Setup HBase repo in Ranger

- Start HBase using Ambari

**Note: if this were a multi-node cluster, you would run these steps on the host running HBase**

- **TODO: add HBase plugin config steps**

#####  HBase audit exercises in Ranger
```
su ali
klist
hbase shell
list 'default'
create 't1', 'f1'
#ERROR: org.apache.hadoop.hbase.security.AccessDeniedException: Insufficient permissions for user 'ali/sandbox.hortonworks.com@HORTONWORKS.COM (auth:KERBEROS)' (global, action=CREATE)
```
---------------------

- Using Ranger, we have successfully added authorization policies and audit reports to our secure cluster from a central location.

IPA_KNOX_HDP_2.2

## Enable Perimeter Security: enable Knox to work with a Kerberos-enabled cluster and provide perimeter security through a single endpoint

- Goals:
  - Configure KNOX to authenticate against FreeIPA
  - Configure WebHDFS & Hiveserver2 to support HDFS & JDBC/ODBC access over HTTP
  - Use Excel to securely access Hive via KNOX

- Why?
  - Enables Perimeter Security so there is a single point of cluster access using Hadoop REST APIs, JDBC and ODBC calls

- Contents
  - [Pre-requisite steps](https://github.com/abajwa-hw/security-workshops/blob/master/Setup-knox-22.md#pre-requisite-steps)
  - [Setup Knox repo](https://github.com/abajwa-hw/security-workshops/blob/master/Setup-knox-22.md#setup-knox-repo)
  - [Knox WebHDFS audit exercises in Ranger](https://github.com/abajwa-hw/security-workshops/blob/master/Setup-knox-22.md#knox-webhdfs-audit-exercises-in-ranger)
  - [Setup Hive to go over Knox](https://github.com/abajwa-hw/security-workshops/blob/master/Setup-knox-22.md#setup-hive-to-go-over-knox)
  - [Knox exercises to check Hive setup](https://github.com/abajwa-hw/security-workshops/blob/master/Setup-knox-22.md#knox-exercises-to-check-hive-setup)
  - [Download data over HTTPS via Knox/Hive](https://github.com/abajwa-hw/security-workshops/blob/master/Setup-knox-22.md#download-data-over-https-via-knoxhive)

**Note: if this were a multi-node cluster, you would run these steps on the host running Knox**

###### Configure Knox to use IPA

- Add the below to HDFS config via Ambari and restart HDFS:
```
hadoop.proxyuser.knox.groups = *
hadoop.proxyuser.knox.hosts = sandbox.hortonworks.com
```

- Start Knox using Ambari (it comes pre-installed with HDP 2.2)

- Try out a WebHDFS request. The guest user is defined in the demo LDAP that Knox ships with, which is why this works.
```
curl -iv -k -u guest:guest-password https://sandbox.hortonworks.com:8443/gateway/default/webhdfs/v1/?op=LISTSTATUS
```

- Confirm that the demo LDAP has this user by going to Ambari > Knox > Config > Advanced users-ldif
![Image](../master/screenshots/knox-default-ldap.png?raw=true)

- To configure Knox to use IPA LDAP instead of the demo one, in Ambari, under Knox > Configs > Advanced Topology:
  - First, modify the below ```<value>``` entries:
  ```                    
                    <param>
                        <name>main.ldapRealm.userDnTemplate</name>
                        <value>uid={0},cn=users,cn=accounts,dc=hortonworks,dc=com</value>
                    </param>
                     <param>
                        <name>main.ldapRealm.contextFactory.url</name>
                       <value>ldap://ldap.hortonworks.com:389</value>
                    </param>                    
  ```
  - Then, add these params directly under the above params (before the ```</provider>``` tag):
  ```                  
                    <param>
                        <name>main.ldapRealm.authorizationEnabled</name>
                        <value>true</value>
                    </param>
                    <param>
                        <name>main.ldapRealm.searchBase</name>
                        <value>cn=groups,cn=accounts,dc=hortonworks,dc=com</value>
                    </param>        
                    <param>
                        <name>main.ldapRealm.memberAttributeValueTemplate</name>
                        <value>uid={0},cn=users,cn=accounts,dc=hortonworks,dc=com</value>
                    </param>
  ```
- Restart Knox via Ambari

- Re-try the WebHDFS request. After the above change we can pass in user credentials from IPA.
```
curl -iv -k -u ali:hortonworks https://sandbox.hortonworks.com:8443/gateway/default/webhdfs/v1/?op=LISTSTATUS
```

- Notice the guest user no longer works because we did not create it in IPA
```
curl -iv -k -u guest:guest-password https://sandbox.hortonworks.com:8443/gateway/default/webhdfs/v1/?op=LISTSTATUS
```
- Next, let's set up the Ranger plugin for Knox

###### Pre-requisite steps

- Export certificate to ~/knox.crt
```
cd /var/lib/knox/data/security/keystores
keytool -exportcert -alias gateway-identity -keystore gateway.jks -file ~/knox.crt
#hit enter
```
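
- Optionally confirm the export worked; keytool can print the certificate details straight from the file (a quick hedged check):
```
keytool -printcert -file ~/knox.crt | head -5
```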

- Import ~/knox.crt
```
cd ~
. /etc/ranger/admin/conf/java_home.sh

cp $JAVA_HOME/jre/lib/security/cacerts cacerts.withknox
keytool -import -trustcacerts -file knox.crt   -alias knox  -keystore cacerts.withknox
#Enter "changeit" as password
#Type yes
```
- Copy cacerts.withknox to ranger conf dir
```
cp cacerts.withknox /etc/ranger/admin/conf
```

- vi /etc/ranger/admin/conf/ranger-admin-env-knox_cert.sh
```
#!/bin/bash                                                                                  
certs_with_knox=/etc/ranger/admin/conf/cacerts.withknox
export JAVA_OPTS="$JAVA_OPTS -Djavax.net.ssl.trustStore=${certs_with_knox}"
```

- Restart service
```
chmod +x /etc/ranger/admin/conf/ranger-admin-env-knox_cert.sh
service ranger-admin stop
service ranger-admin start
```

- Verify that the javax.net.ssl.trustStore property was applied
```
ps -ef | grep proc_rangeradmin
```
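
- A slightly narrower check (plain shell filtering the same ps output for the flag set above; a sketch, not required):
```
ps -ef | grep proc_rangeradmin | tr ' ' '\n' | grep '^-Djavax.net.ssl.trustStore='
```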
###### Setup Knox repo


- In the Ranger UI, under the PolicyManager tab, click the + sign next to Knox and enter the below to create a Knox repo:

```
Repository Name: knox_sandbox
Username: rangeradmin@HORTONWORKS.COM
Password: hortonworks
knox.url= https://sandbox.hortonworks.com:8443/gateway/admin/api/v1/topologies/
```
![Image](../master/screenshots/ranger-knox-setup.png?raw=true)

- Click Test (it's OK if it gives an error). Then add the repository.

- Install Knox plugin

```
cd /usr/hdp/2.2.0.0-2041/ranger-knox-plugin
vi install.properties

POLICY_MGR_URL=http://sandbox.hortonworks.com:6080
REPOSITORY_NAME=knox_sandbox

XAAUDIT.DB.IS_ENABLED=true
XAAUDIT.DB.FLAVOUR=MYSQL
XAAUDIT.DB.HOSTNAME=localhost
XAAUDIT.DB.DATABASE_NAME=ranger_audit
XAAUDIT.DB.USER_NAME=rangerlogger
XAAUDIT.DB.PASSWORD=hortonworks
```

- Enable Ranger Knox plugin
```
./enable-knox-plugin.sh
```

- To enable Ranger Knox plugin, in Ambari, under Knox > Configs > Advanced Topology, add the below under ```<gateway>```
```
<provider>
<role>authorization</role>
        <name>XASecurePDPKnox</name>
        <enabled>true</enabled>
</provider>
```

- Restart Knox via Ambari

- Find out your topology name e.g. default
```
ls /etc/knox/conf/topologies/*.xml
```

#####  Knox WebHDFS audit exercises in Ranger

- Submit a WebHDFS request to the topology using curl (replace default with your topology name)
```
curl -iv -k -u ali:hortonworks https://sandbox.hortonworks.com:8443/gateway/default/webhdfs/v1/?op=LISTSTATUS
curl -iv -k -u paul:hortonworks https://sandbox.hortonworks.com:8443/gateway/default/webhdfs/v1/?op=LISTSTATUS
```

- These should result in an HTTP 403 error and show up as Denied results in the Ranger Audit
![Image](../master/screenshots/ranger-knox-denied.png?raw=true)

- Add policy in Ranger PolicyManager > hdfs_knox > Add new policy
  - Policy name: test
  - Topology name: default
  - Service name: WEBHDFS
  - Group permissions: sales and check Allow
  - User permissions: ali and check Allow
  - Save > OK
  - ![Image](../master/screenshots/ranger-knox-policy.png?raw=true)
 
- While waiting 30s for the policy to be activated, review the Analytics tab
![Image](../master/screenshots/ranger-knox-analytics.png?raw=true)

- Re-run the WebHDFS request and notice this time it succeeds
```
curl -iv -k -u ali:hortonworks https://sandbox.hortonworks.com:8443/gateway/default/webhdfs/v1/?op=LISTSTATUS
curl -iv -k -u paul:hortonworks https://sandbox.hortonworks.com:8443/gateway/default/webhdfs/v1/?op=LISTSTATUS
```
![Image](../master/screenshots/ranger-knox-allowed.png?raw=true)

#####  Setup Hive to go over Knox

- In Ambari, under Hive > Configs, set the below and restart the Hive component. Note that in this mode you will not be able to run queries through Hue
```
hive.server2.transport.mode = http
```
- Give users access to the JKS file. This is OK since it is only a truststore, not keys!
```
chmod a+rx /var/lib/knox
chmod a+rx /var/lib/knox/data
chmod a+rx /var/lib/knox/data/security
chmod a+rx /var/lib/knox/data/security/keystores
chmod a+r /var/lib/knox/data/security/keystores/gateway.jks
```
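
- To confirm an end user can actually read the truststore, a quick hedged check (the keystore path and password match the beeline connect string used in the next section):
```
su - ali -c "keytool -list -keystore /var/lib/knox/data/security/keystores/gateway.jks -storepass knox | head -5"
```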

#### Knox exercises to check Hive setup

- Run a beeline query connecting through Knox. Note that the beeline connect string is different when connecting via Knox
```
su ali
beeline
!connect jdbc:hive2://sandbox.hortonworks.com:8443/;ssl=true;sslTrustStore=/var/lib/knox/data/security/keystores/gateway.jks;trustStorePassword=knox;transportMode=http;httpPath=gateway/default/hive
#enter ali/hortonworks
!q
```
- This fails with HTTP 403. On reviewing the attempt in Ranger Audit, we see that the request was denied
![Image](../master/screenshots/ranger-knox-hive-denied.png?raw=true)

- To fix this, we can add a Knox policy in Ranger:
  - Policy name: knox_hive
  - Topology name: default
  - Service name: HIVE
  - User permissions: ali and check Allow
  - Click Add
  - ![Image](../master/screenshots/ranger-knox-hive-policy.png?raw=true)
 
- Review the Analytics tab while waiting 30s for the policy to take effect.
![Image](../master/screenshots/ranger-knox-hive-analytics.png?raw=true)

- Now re-run the connect command above and run some queries:
```
su ali
beeline
!connect jdbc:hive2://sandbox.hortonworks.com:8443/;ssl=true;sslTrustStore=/var/lib/knox/data/security/keystores/gateway.jks;trustStorePassword=knox;transportMode=http;httpPath=gateway/default/hive
#enter ali/hortonworks

#these should pass
desc sample_08;
select * from sample_08;
select code, description from sample_07;

#these should fail
desc sample_07;
select * from sample_07;

!q
```

#### Download data over HTTPS via Knox/Hive

- On a Windows machine, install the Hive ODBC driver from http://hortonworks.com/hdp/addons and set up an ODBC connection:
  - name: securedsandbox
  - host:<sandboxIP>
  - port:8443
  - database:default
  - Hive server type: Hive Server 2
  - Mechanism: HTTPS
  - HTTP Path: gateway/default/hive
  - Username: ali
  - pass: hortonworks
  - ![Image](../master/screenshots/ODBC-knox-hive.png?raw=true)
 
- In Excel import data via Knox by navigating to:
  - Data tab
  - From other Datasources
  - From dataconnection wizard
  - ODBC DSN
  - ODBC name (e.g. securedsandbox)
  - enter password hortonworks and ok
  - choose sample_07 and Finish
  - Click Yes
  - Properties
  - Definition
  - you can change the query in the text box
  - OK
  - OK

- Notice that in the Knox repository, Ranger Audit shows the HIVE access was allowed
![Image](../master/screenshots/ranger-knox-hive-allowed.png?raw=true)

- With this we have shown how HiveServer2 can transport data over HTTPS using Knox for existing users defined in the enterprise LDAP, without them having to request a Kerberos ticket. Authorization and audit of such transactions can also be done via Ranger

- For more info on Knox you can refer to the doc: http://knox.apache.org/books/knox-0-5-0/knox-0-5-0.html

IPA_KERBEROS_SETUP_HDP2.2

Kerberos configuration at IPA Client:
-----------------------------------------------
- Check whether krb5.conf was updated by ipa-client

- Log in to Ambari (if the server is not started, execute /root/start_ambari.sh) by opening http://ec2-54-172-53-173.compute-1.amazonaws.com:8080 and then:
  - Admin -> Security -> click "Enable Security"
  - On the "Get Started" page, click Next
  - On "Configure Services", click Next to accept defaults
  - On "Create Principals and Keytabs", click "Download CSV". Save to the sandbox by running "vi /root/sanbox-principal-keytab-list.csv" and pasting the content
  - Without pressing "Apply", go back to the terminal

- Edit host-principal-keytab-list.csv and move the entry containing 'rm.service.keytab' to the top of the file. Also add the hue and knox principals at the end, making sure there are no empty lines at the end
  Add to host-principal-keytab-list.csv:
  ```
  ec2-54-172-53-173.compute-1.amazonaws.com,Hue,hue/ec2-54-172-53-173.compute-1.amazonaws.com@AMAZONAWS.COM,hue.service.keytab,/etc/security/keytabs,hue,hadoop,400
  ```
On IPA Server
-------------------------------------------
  for i in `awk -F"," '/service/ {print $3}' host-principal-keytab-list.csv` ; do ipa service-add $i ; done
  ipa user-add hdfs  --first=HDFS --last=HADOOP --homedir=/var/lib/hadoop-hdfs --shell=/bin/bash
  ipa user-add ambari-qa  --first=AMBARI-QA --last=HADOOP --homedir=/home/ambari-qa --shell=/bin/bash
  ipa user-add storm  --first=STORM --last=HADOOP --homedir=/home/storm --shell=/bin/bash
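
Optionally verify that the accounts and service principals were created; a hedged sanity check (output labels vary slightly between IPA versions):
  ipa user-show hdfs
  ipa service-find --sizelimit=0 | grep -ci principal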


On IPA_NN
-------------------------------------------
awk -F"," '/ec2-54-172-53-173.compute-1.amazonaws.com/ {print "ipa-getkeytab -s ec2-54-86-17-4.compute-1.amazonaws.com -p "$3" -k /etc/security/keytabs/"$4";chown "$6":"$7" /etc/security/keytabs/"$4";chmod "$8" /etc/security/keytabs/"$4}' host-principal-keytab-list.csv | sort -u > gen_keytabs_NN.sh
chmod +x gen_keytabs_NN.sh

mkdir -p /etc/security/keytabs/
chown root:hadoop /etc/security/keytabs/
./gen_keytabs_NN.sh
chmod 440 /etc/security/keytabs/hue.service.keytab

Copy the below keytabs to all the DataNodes (i.e. IPA_DN); a copy sketch follows this list
hdfs.headless.keytab
smokeuser.headless.keytab
storm.service.keytab
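
A sketch of copying those keytabs over (assumes root ssh access to the DataNode; the hostname is the IPA_DN node used throughout this setup):
for kt in hdfs.headless.keytab smokeuser.headless.keytab storm.service.keytab ; do
  scp /etc/security/keytabs/$kt root@ec2-54-173-54-193.compute-1.amazonaws.com:/etc/security/keytabs/
done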

On IPA_DN
---------------------------------------------
awk -F"," '/ec2-54-173-54-193.compute-1.amazonaws.com@AMAZONAWS.COM/ {print "ipa-getkeytab -s ec2-54-86-17-4.compute-1.amazonaws.com -p "$3" -k /etc/security/keytabs/"$4";chown "$6":"$7" /etc/security/keytabs/"$4";chmod "$8" /etc/security/keytabs/"$4}' host-principal-keytab-list.csv | sort -u >> gen_keytabs_DN.sh
chmod +x gen_keytabs_DN.sh

mkdir -p /etc/security/keytabs/
chown root:hadoop /etc/security/keytabs/
./gen_keytabs_DN.sh
-----------------------------------------------

- List the keytabs
```
ls -la /etc/security/keytabs/*.keytab | wc -l
```

- Check that keytab info can be accessed by klist
```
klist -ekt /etc/security/keytabs/nn.service.keytab
```

- Verify you can kinit as the Hadoop components. This should not return any errors
```
@Node1
[root@IPA_NN ~]$ kinit -V -kt /etc/security/keytabs/dn.service.keytab dn/ec2-54-172-53-173.compute-1.amazonaws.com@AMAZONAWS.COM
Using default cache: /tmp/krb5cc_0
Using principal: dn/ec2-54-172-53-173.compute-1.amazonaws.com@AMAZONAWS.COM
Using keytab: /etc/security/keytabs/dn.service.keytab
Authenticated to Kerberos v5
[root@IPA_NN ~]$  kinit -V -kt /etc/security/keytabs/dn.service.keytab dn/ec2-54-173-54-193.compute-1.amazonaws.com@AMAZONAWS.COM
Using default cache: /tmp/krb5cc_0
Using principal: dn/ec2-54-173-54-193.compute-1.amazonaws.com@AMAZONAWS.COM
Using keytab: /etc/security/keytabs/dn.service.keytab
kinit: Keytab contains no suitable keys for dn/ec2-54-173-54-193.compute-1.amazonaws.com@AMAZONAWS.COM while getting initial credentials

@Node2
[root@IPA_DN ~]$ kinit -V -kt /etc/security/keytabs/dn.service.keytab dn/ec2-54-172-53-173.compute-1.amazonaws.com@AMAZONAWS.COM
Using default cache: /tmp/krb5cc_0
Using principal: dn/ec2-54-172-53-173.compute-1.amazonaws.com@AMAZONAWS.COM
Using keytab: /etc/security/keytabs/dn.service.keytab
kinit: Keytab contains no suitable keys for dn/ec2-54-172-53-173.compute-1.amazonaws.com@AMAZONAWS.COM while getting initial credentials
[root@IPA_DN ~]$  kinit -V -kt /etc/security/keytabs/dn.service.keytab dn/ec2-54-173-54-193.compute-1.amazonaws.com@AMAZONAWS.COM
Using default cache: /tmp/krb5cc_0
Using principal: dn/ec2-54-173-54-193.compute-1.amazonaws.com@AMAZONAWS.COM
Using keytab: /etc/security/keytabs/dn.service.keytab
Authenticated to Kerberos v5

[root@IPA_NN ~]$ kinit -V -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@AMAZONAWS.COM
Using default cache: /tmp/krb5cc_0
Using principal: hdfs@AMAZONAWS.COM
Using keytab: /etc/security/keytabs/hdfs.headless.keytab
Authenticated to Kerberos v5


[root@IPA_DN ~]$  kinit -V -kt /etc/security/keytabs/hdfs.headless.keytab hdfs@AMAZONAWS.COM
Using default cache: /tmp/krb5cc_0
Using principal: hdfs@AMAZONAWS.COM
Using keytab: /etc/security/keytabs/hdfs.headless.keytab
Authenticated to Kerberos v5
```
- Click Apply in Ambari to enable security and restart all the components

If the wizard errors out towards the end due to a component not starting up, it's not a problem: you should be able to start it manually via Ambari

Install Hue
-------------------------------------------

1. Modify/add the below-mentioned properties.

Ambari-->HDFS-->Config-->hdfs-site

<property>
  <name>dfs.webhdfs.enabled</name>
  <value>true</value>
</property>
Modify the core-site.xml file.

Ambari-->HDFS-->Config-->core-site

<property>
  <name>hadoop.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hue.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hcat.groups</name>
  <value>*</value>
</property>
<property>
  <name>hadoop.proxyuser.hcat.hosts</name>
  <value>*</value>
</property>

Ambari-->Hive-->Config-->webhcat-site

<property>
  <name>webhcat.proxyuser.hue.hosts</name>  
  <value>*</value>
</property>
<property>  
  <name>webhcat.proxyuser.hue.groups</name>
  <value>*</value>
</property>

Ambari-->Oozie-->Config-->oozie-site

<property>
  <name>oozie.service.ProxyUserService.proxyuser.hue.hosts</name>
  <value>*</value>
</property>
<property>  
  <name>oozie.service.ProxyUserService.proxyuser.hue.groups</name>
  <value>*</value>
</property>

2. Restart all the services (HDFS, MapReduce, YARN, Oozie and Hive)
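
Once the services are back up, a quick hedged check that WebHDFS responds on the kerberized cluster (uses the hue service keytab generated earlier and the NameNode host used throughout; requires curl with SPNEGO support):
kinit -kt /etc/security/keytabs/hue.service.keytab hue/ec2-54-172-53-173.compute-1.amazonaws.com@AMAZONAWS.COM
curl -s --negotiate -u : "http://ec2-54-172-53-173.compute-1.amazonaws.com:50070/webhdfs/v1/?op=LISTSTATUS"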

3. Install Hue
yum install hue

4. Changes in hue.ini
vi /etc/hue/conf/hue.ini

 # Webserver listens on this address and port
  http_host=ec2-54-172-53-173.compute-1.amazonaws.com
  http_port=8888

 [[hdfs_clusters]]

    [[[default]]]
      # Enter the filesystem uri
      fs_defaultfs=hdfs://ec2-54-172-53-173.compute-1.amazonaws.com:8020

      # Use WebHdfs/HttpFs as the communication mechanism. To fallback to
      # using the Thrift plugin (used in Hue 1.x), this must be uncommented
      # and explicitly set to the empty value.
      webhdfs_url=http://ec2-54-172-53-173.compute-1.amazonaws.com:50070/webhdfs/v1/

       security_enabled=true

  [[yarn_clusters]]

    [[[default]]]
      # Whether to submit jobs to this cluster
      submit_to=true

       security_enabled=true

      # Resource Manager logical name (required for HA)
      ## logical_name=

      # URL of the ResourceManager webapp address (yarn.resourcemanager.webapp.address)
      resourcemanager_api_url=http://ec2-54-172-53-173.compute-1.amazonaws.com:8088

      # URL of Yarn RPC address (yarn.resourcemanager.address)
      resourcemanager_rpc_url=http://ec2-54-172-53-173.compute-1.amazonaws.com:8050

      # URL of the ProxyServer API
      proxy_api_url=http://ec2-54-172-53-173.compute-1.amazonaws.com:8088

      # URL of the HistoryServer API
      history_server_api_url=http://ec2-54-172-53-173.compute-1.amazonaws.com:19888

      # URL of the NodeManager API
      node_manager_api_url=http://ec2-54-172-53-173.compute-1.amazonaws.com:8042

  [liboozie]
  # The URL where the Oozie service runs on. This is required in order for
  # users to submit jobs.
  oozie_url=http://ec2-54-173-54-193.compute-1.amazonaws.com:11000/oozie

  security_enabled=true

  [beeswax]

  # Host where Hive server Thrift daemon is running.
  # If Kerberos security is enabled, use fully-qualified domain name (FQDN).
  hive_server_host=ec2-54-172-53-173.compute-1.amazonaws.com

  # Port where HiveServer2 Thrift server runs on.
  hive_server_port=10000
 
  [hcatalog]
  templeton_url=http://ec2-54-172-53-173.compute-1.amazonaws.com:50111/templeton/v1/
  security_enabled=true


5. Hue config changes needed to make Hue work on an LDAP-enabled, kerberized cluster

Goals:

Kerberos-enable Hue and integrate it with FreeIPA's directory.
Now that Kerberos has been enabled on the sandbox VM and LDAP has also been set up, we can configure Hue for this setup.

Edit the Kerberos principal to Hadoop user mapping to add Hue. Under Ambari > HDFS > Configs > hadoop.security.auth_to_local, add the hue entry below, above DEFAULT. If the other entries are missing, add them too:

        RULE:[2:$1@$0]([rn]m@.*)s/.*/yarn/
        RULE:[2:$1@$0](jhs@.*)s/.*/mapred/
        RULE:[2:$1@$0]([nd]n@.*)s/.*/hdfs/
        RULE:[2:$1@$0](hm@.*)s/.*/hbase/
        RULE:[2:$1@$0](rs@.*)s/.*/hbase/
        RULE:[2:$1@$0](hue/ec2-54-172-53-173.compute-1.amazonaws.com@.*AMAZONAWS.COM)s/.*/hue/      
        DEFAULT      
Allow hive to impersonate users from whichever LDAP groups you choose:
hadoop.proxyuser.hive.groups = users, sales, legal, admins

(note: use * to allow all user groups)
Restart HDFS via Ambari.


Edit /etc/hue/conf/hue.ini, uncommenting/changing the properties below to make it Kerberos-aware.
NOTE: Update the below properties in their respective sections/blocks.
Change all instances of "security_enabled" to true.

 [[kerberos]]

    # Path to Hue's Kerberos keytab file
     hue_keytab=/etc/security/keytabs/hue.service.keytab

    # Kerberos principal name for Hue
     hue_principal=hue/ec2-54-172-53-173.compute-1.amazonaws.com@AMAZONAWS.COM

    # Path to kinit
     kinit_path=/usr/bin/kinit

    ## Frequency in seconds with which Hue will renew its keytab. Default 1h.
     reinit_frequency=3600

    ## Path to keep Kerberos credentials cached.
     ccache_path=/tmp/hue_krb5_ccache
 



Make changes to /etc/hue/conf/hue.ini to set backend to LDAP:
NOTE: Update the below properties in their respective sections/blocks.
backend=desktop.auth.backend.LdapBackend
pam_service=login
base_dn="DC=amazonaws,DC=com"
ldap_url=ldap://ec2-54-86-17-4.compute-1.amazonaws.com
ldap_username_pattern="uid=<username>,cn=users,cn=accounts,dc=amazonaws,dc=com"
create_users_on_login=true
user_filter="objectclass=person"
user_name_attr=uid
group_filter="objectclass=*"
group_name_attr=cn
Restart Hue

- Access HDFS as Hue user
```
su - hue
#Create a kerberos ticket for the user
kinit -kt /etc/security/keytabs/hue.service.keytab hue/ec2-54-172-53-173.compute-1.amazonaws.com@AMAZONAWS.COM
#verify that hue user can now get ticket and can access HDFS
klist
hadoop fs -ls /user
#you should get the listing results
exit
```

NOTE: for the error
Fail: Execution of 'hadoop --config /etc/hadoop/conf fs -mkdir `rpm -q hadoop | grep -q "hadoop-1" || echo "-p"` /app-logs /mapred /mapred/system /mr-history/tmp /mr-history/done && hadoop --config /etc/hadoop/conf fs -chmod -R 777 /app-logs && hadoop --config /etc/hadoop/conf fs -chmod  777 /mr-history/tmp && hadoop --config /etc/hadoop/conf fs -chmod  1777 /mr-history/done && hadoop --config /etc/hadoop/conf fs -chown  mapred /mapred && hadoop --config /etc/hadoop/conf fs -chown  hdfs /mapred/system && hadoop --config /etc/hadoop/conf fs -chown  yarn:hadoop /app-logs && hadoop --config /etc/hadoop/conf fs -chown  mapred:hadoop /mr-history/tmp /mr-history/done' returned 1. 15/03/12 08:25:56 WARN ipc.Client: Exception encountered while connecting to the server : javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
mkdir: Failed on local exception: java.io.IOException: javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]; Host Details : local host is: "ec2-54-172-53-173.compute-1.amazonaws.com/172.31.8.33"; destination host is: "ec2-54-172-53-173.compute-1.amazonaws.com":8020;

Solution:
Download Java Cryptography Extension (JCE) Unlimited Strength Jurisdiction Policy Files JDK7
http://www.oracle.com/technetwork/java/javase/downloads/jce-7-download-432124.html

copy the local_policy.jar and US_export_policy.jar to $JAVA_HOME/jre/lib/security/

IPA_SERVER_AND_CLIENT_SETUP_HDP2.2

IPA: Identity, Policy, Audit

IPA Server:
#####################################

Turn off firewall
------------------------------
service iptables save
service iptables stop
chkconfig iptables off

Host Configuration:
---------------------------
/etc/hosts
Add the host entry and domain name
127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
172.31.43.121    ec2-54-86-17-4.compute-1.amazonaws.com          IPA

/etc/sysconfig/network
Add the hostname entry to it
HOSTNAME=ec2-54-86-17-4.compute-1.amazonaws.com


Apply OS updates
------------------------------
yum -y update

Installation:
---------------------------
yum -y install ipa-server

DNS setup
-------------------------
Install the below packages
yum -y install bind
yum -y install bind-dyndb-ldap

starting dns setup
ipa-server-install --setup-dns

The log file for this installation can be found in /var/log/ipaserver-install.log
==============================================================================
This program will set up the IPA Server.

This includes:
  * Configure a stand-alone CA (dogtag) for certificate management
  * Configure the Network Time Daemon (ntpd)
  * Create and configure an instance of Directory Server
  * Create and configure a Kerberos Key Distribution Center (KDC)
  * Configure Apache (httpd)
  * Configure DNS (bind)

To accept the default shown in brackets, press the Enter key.

Existing BIND configuration detected, overwrite? [no]: yes
Enter the fully qualified domain name of the computer
on which you're setting up server software. Using the form
<hostname>.<domainname>
Example: master.example.com.


Server host name [ec2-54-86-17-4.compute-1.amazonaws.com]: ec2-54-86-17-4.compute-1.amazonaws.com

Warning: skipping DNS resolution of host ec2-54-86-17-4.compute-1.amazonaws.com
The domain name has been determined based on the host name.

Please confirm the domain name [compute-1.amazonaws.com]: amazonaws.com

The kerberos protocol requires a Realm name to be defined.
This is typically the domain name converted to uppercase.

Please provide a realm name [AMAZONAWS.COM]: AMAZONAWS.COM
Certain directory server operations require an administrative user.
This user is referred to as the Directory Manager and has full access
to the Directory for system management tasks and will be added to the
instance of directory server created for IPA.
The password must be at least 8 characters long.

Directory Manager password:      manager123
Password (confirm):              manager123

The IPA server requires an administrative user, named 'admin'.
This user is a regular system account used for IPA server administration.

IPA admin password: admin123
Password (confirm): admin123

Do you want to configure DNS forwarders? [yes]: yes
Enter the IP address of DNS forwarder to use, or press Enter to finish.
Enter IP address for a DNS forwarder: 8.8.8.8
DNS forwarder 8.8.8.8 added
Enter IP address for a DNS forwarder:
Do you want to configure the reverse zone? [yes]: yes
Please specify the reverse zone name [43.31.172.in-addr.arpa.]: 43.31.172.in-addr.arpa.
Using reverse zone 43.31.172.in-addr.arpa.

The IPA Master Server will be configured with:
Hostname:      ec2-54-86-17-4.compute-1.amazonaws.com
IP address:    172.31.43.121
Domain name:   amazonaws.com
Realm name:    AMAZONAWS.COM

BIND DNS server will be configured to serve IPA domain with:
Forwarders:    8.8.8.8
Reverse zone:  43.31.172.in-addr.arpa.

Continue to configure the system with these values? [no]: yes

The following operations may take some minutes to complete.
Please wait until the prompt is returned.

Configuring NTP daemon (ntpd)
  [1/4]: stopping ntpd
  [2/4]: writing configuration
  [3/4]: configuring ntpd to start on boot
  [4/4]: starting ntpd
Done configuring NTP daemon (ntpd).
Configuring directory server for the CA (pkids): Estimated time 30 seconds
  [1/3]: creating directory server user
  [2/3]: creating directory server instance
  [3/3]: restarting directory server
Done configuring directory server for the CA (pkids).
Configuring certificate server (pki-cad): Estimated time 3 minutes 30 seconds
  [1/21]: creating certificate server user
  [2/21]: creating pki-ca instance
  [3/21]: configuring certificate server instance
  [4/21]: disabling nonces
  [5/21]: creating CA agent PKCS#12 file in /root
  [6/21]: creating RA agent certificate database
  [7/21]: importing CA chain to RA certificate database
  [8/21]: fixing RA database permissions
  [9/21]: setting up signing cert profile
  [10/21]: set up CRL publishing
  [11/21]: set certificate subject base
  [12/21]: enabling Subject Key Identifier
  [13/21]: setting audit signing renewal to 2 years
  [14/21]: configuring certificate server to start on boot
  [15/21]: restarting certificate server
  [16/21]: requesting RA certificate from CA
  [17/21]: issuing RA agent certificate
  [18/21]: adding RA agent as a trusted user
  [19/21]: configure certificate renewals
  [20/21]: configure Server-Cert certificate renewal
  [21/21]: Configure HTTP to proxy connections
Done configuring certificate server (pki-cad).
Configuring directory server (dirsrv): Estimated time 1 minute
  [1/38]: creating directory server user
  [2/38]: creating directory server instance
  [3/38]: adding default schema
  [4/38]: enabling memberof plugin
  [5/38]: enabling winsync plugin
  [6/38]: configuring replication version plugin
  [7/38]: enabling IPA enrollment plugin
  [8/38]: enabling ldapi
  [9/38]: disabling betxn plugins
  [10/38]: configuring uniqueness plugin
  [11/38]: configuring uuid plugin
  [12/38]: configuring modrdn plugin
  [13/38]: enabling entryUSN plugin
  [14/38]: configuring lockout plugin
  [15/38]: creating indices
  [16/38]: enabling referential integrity plugin
  [17/38]: configuring ssl for ds instance
  [18/38]: configuring certmap.conf
  [19/38]: configure autobind for root
  [20/38]: configure new location for managed entries
  [21/38]: restarting directory server
  [22/38]: adding default layout
  [23/38]: adding delegation layout
  [24/38]: adding replication acis
  [25/38]: creating container for managed entries
  [26/38]: configuring user private groups
  [27/38]: configuring netgroups from hostgroups
  [28/38]: creating default Sudo bind user
  [29/38]: creating default Auto Member layout
  [30/38]: adding range check plugin
  [31/38]: creating default HBAC rule allow_all
  [32/38]: Upload CA cert to the directory
  [33/38]: initializing group membership
  [34/38]: adding master entry
  [35/38]: configuring Posix uid/gid generation
  [36/38]: enabling compatibility plugin
  [37/38]: tuning directory server
  [38/38]: configuring directory to start on boot
Done configuring directory server (dirsrv).
Configuring Kerberos KDC (krb5kdc): Estimated time 30 seconds
  [1/10]: adding sasl mappings to the directory
  [2/10]: adding kerberos container to the directory
  [3/10]: configuring KDC
  [4/10]: initialize kerberos container
  [5/10]: adding default ACIs
  [6/10]: creating a keytab for the directory
  [7/10]: creating a keytab for the machine
  [8/10]: adding the password extension to the directory
  [9/10]: starting the KDC
  [10/10]: configuring KDC to start on boot
Done configuring Kerberos KDC (krb5kdc).
Configuring kadmin
  [1/2]: starting kadmin
  [2/2]: configuring kadmin to start on boot
Done configuring kadmin.
Configuring ipa_memcached
  [1/2]: starting ipa_memcached
  [2/2]: configuring ipa_memcached to start on boot
Done configuring ipa_memcached.
Configuring the web interface (httpd): Estimated time 1 minute
  [1/13]: setting mod_nss port to 443
  [2/13]: setting mod_nss password file
  [3/13]: enabling mod_nss renegotiate
  [4/13]: adding URL rewriting rules
  [5/13]: configuring httpd
  [6/13]: setting up ssl
  [7/13]: setting up browser autoconfig
  [8/13]: publish CA cert
  [9/13]: creating a keytab for httpd
  [10/13]: clean up any existing httpd ccache
  [11/13]: configuring SELinux for httpd
  [12/13]: restarting httpd
  [13/13]: configuring httpd to start on boot
Done configuring the web interface (httpd).
Applying LDAP updates
Restarting the directory server
Restarting the KDC
Configuring DNS (named)
  [1/9]: adding DNS container
  [2/9]: setting up our zone
  [3/9]: setting up reverse zone
  [4/9]: setting up our own record
  [5/9]: setting up kerberos principal
  [6/9]: setting up named.conf
  [7/9]: restarting named
  [8/9]: configuring named to start on boot
  [9/9]: changing resolv.conf to point to ourselves
Done configuring DNS (named).

Global DNS configuration in LDAP server is empty
You can use 'dnsconfig-mod' command to set global DNS options that
would override settings in local named.conf files

Restarting the web server
==============================================================================
Setup complete

Next steps:
1. You must make sure these network ports are open:
for port in 80 443 389 636 88 464 53 123 ; do netstat -aunt | grep $port ; done

TCP Ports:
 * 80, 443: HTTP/HTTPS
 * 389, 636: LDAP/LDAPS
 * 88, 464: kerberos
 * 53: bind
UDP Ports:
 * 88, 464: kerberos
 * 53: bind
 * 123: ntp

2. You can now obtain a kerberos ticket using the command: 'kinit admin'
  This ticket will allow you to use the IPA tools (e.g., ipa user-add)
  and the web user interface.

Be sure to back up the CA certificate stored in /root/cacert.p12
This file is required to create replicas. The password for this
file is the Directory Manager password

-----x---X------------------

chkconfig ipa on



Others
------------------------
In /etc/ssh/sshd_config, enable the below parameters:
PasswordAuthentication yes
ChallengeResponseAuthentication no
GSSAPIAuthentication yes

Import business users into LDAP
---------------------------------------------
Obtain a Kerberos ticket for the admin user using the admin123 password set up earlier:

kinit admin

Setup LDAP users, groups, passwords
```
ipa group-add marketing --desc marketing
ipa group-add legal --desc legal
ipa group-add hr --desc hr
ipa group-add sales --desc sales
ipa group-add finance --desc finance


#Setup LDAP users
ipa user-add  ali --first=ALI --last=BAJWA
ipa user-add  paul --first=PAUL --last=HEARMON
ipa user-add legal1 --first=legal1 --last=legal1
ipa user-add legal2 --first=legal2 --last=legal2
ipa user-add legal3 --first=legal3 --last=legal3
ipa user-add hr1 --first=hr1 --last=hr1
ipa user-add hr2 --first=hr2 --last=hr2
ipa user-add hr3 --first=hr3 --last=hr3
ipa user-add xapolicymgr --first=XAPolicy --last=Manager
ipa user-add rangeradmin --first=Ranger --last=Admin

#Add users to groups
ipa group-add-member sales --users=ali,paul
ipa group-add-member finance --users=ali,paul
ipa group-add-member legal --users=legal1,legal2,legal3
ipa group-add-member hr --users=hr1,hr2,hr3
ipa group-add-member admins --users=xapolicymgr,rangeradmin

#Set passwords for accounts: hortonworks
echo hortonworks >> tmp.txt
echo hortonworks >> tmp.txt

ipa passwd ali < tmp.txt
ipa passwd paul < tmp.txt
ipa passwd legal1 < tmp.txt
ipa passwd legal2 < tmp.txt
ipa passwd legal3 < tmp.txt
ipa passwd hr1 < tmp.txt
ipa passwd hr2 < tmp.txt
ipa passwd hr3 < tmp.txt
ipa passwd xapolicymgr < tmp.txt
ipa passwd rangeradmin < tmp.txt
rm -f tmp.txt
```
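
- Equivalently, the passwords can be set in a loop; a sketch relying on the same behaviour as the tmp.txt trick above (ipa passwd reads the new password twice from stdin):
```
for u in ali paul legal1 legal2 legal3 hr1 hr2 hr3 xapolicymgr rangeradmin ; do
  printf 'hortonworks\nhortonworks\n' | ipa passwd $u
done
```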
- Use JXplorer to browse the LDAP structure we just set up:
com->amazonaws->accounts->users
com->amazonaws->accounts->groups

- Click on the paul user and notice its attributes. Some important ones:
uid, uidNumber, posixaccount, person, krbPrincipalName

- Click on the hr group and notice its attributes. Some important ones:
cn, gidNumber, posixgroup

Sync time with the NTP server to ensure time is up to date
----------------------------------------------------------
```
service ntpd stop
ntpdate pool.ntp.org
service ntpd start
```
- Set password policy (optional)
```
ipa pwpolicy-mod --maxlife=0 --minlife=0 global_policy
```

###########################
IPA Client
######################

Note: ipa-client installation should be done on all the nodes of the cluster.

IPA server host entry update:
------------------------
vi /etc/hosts
172.31.25.255    ec2-54-173-54-193.compute-1.amazonaws.com       IPA_DN
172.31.26.0    ec2-54-172-53-173.compute-1.amazonaws.com       IPA_NN
172.31.43.121    ec2-54-86-17-4.compute-1.amazonaws.com          IPA


service iptables stop
service ip6tables stop
chkconfig ip6tables off
chkconfig iptables off



Installation
--------------------------------
yum -y install ipa-client openldap-clients

Sync time with the NTP server to ensure time is up to date
----------------------------------------------------------
service ntpd stop
ntpdate pool.ntp.org
service ntpd start

Setting up the IPA client with valid parameters
-------------------------------
ipa-client-install

DNS discovery failed to determine your DNS domain
Provide the domain name of your IPA server (ex: example.com): amazonaws.com
Provide your IPA server name (ex: ipa.example.com): ec2-54-86-17-4.compute-1.amazonaws.com
The failure to use DNS to find your IPA server indicates that your resolv.conf file is not properly configured.
Autodiscovery of servers for failover cannot work with this configuration.
If you proceed with the installation, services will be configured to always access the discovered server for all operations and will not fail over to other servers in case of failure.
Proceed with fixed values and no DNS discovery? [no]: yes
Hostname: ec2-54-172-53-173.compute-1.amazonaws.com
Realm: AMAZONAWS.COM
DNS Domain: amazonaws.com
IPA Server: ec2-54-86-17-4.compute-1.amazonaws.com
BaseDN: dc=amazonaws,dc=com

Continue to configure the system with these values? [no]: yes
User authorized to enroll computers: admin
Synchronizing time with KDC...
Unable to sync time with IPA NTP server, assuming the time is in sync. Please check that 123 UDP port is opened.
Password for admin@AMAZONAWS.COM:
Successfully retrieved CA cert
    Subject:     CN=Certificate Authority,O=AMAZONAWS.COM
    Issuer:      CN=Certificate Authority,O=AMAZONAWS.COM
    Valid From:  Thu Feb 19 13:10:06 2015 UTC
    Valid Until: Mon Feb 19 13:10:06 2035 UTC

Enrolled in IPA realm AMAZONAWS.COM
Created /etc/ipa/default.conf
New SSSD config will be created
Configured sudoers in /etc/nsswitch.conf
Configured /etc/sssd/sssd.conf
Configured /etc/krb5.conf for IPA realm AMAZONAWS.COM
trying https://ec2-54-86-17-4.compute-1.amazonaws.com/ipa/xml
Forwarding 'env' to server u'https://ec2-54-86-17-4.compute-1.amazonaws.com/ipa/xml'
Hostname (ec2-54-86-17-4.compute-1.amazonaws.com)
updated DNS records.
Adding SSH public key from /etc/ssh/ssh_host_dsa_key.pub
Adding SSH public key from /etc/ssh/ssh_host_rsa_key.pub
Forwarding 'host_mod' to server u'https://ec2-54-86-17-4.compute-1.amazonaws.com/ipa/xml'
updated DNS SSHFP records.
SSSD enabled
Configuring amazonaws.com as NIS domain
Configured /etc/openldap/ldap.conf
NTP enabled
Configured /etc/ssh/ssh_config
Configured /etc/ssh/sshd_config
Client configuration complete.
-------------------------
Verify the IPA server users
-------------------------


id ali
id paul
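
To confirm Kerberos authentication works for the IPA users, obtain a ticket and inspect it (a quick check; note that a password set by an admin via ipa passwd is typically expired on first use, so kinit will prompt for a password change):

kinit ali
klist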


Monday, 21 April 2014

Configuring Hadoop High Availability Cluster (Two NameNode Services)

Configuring a High Availability Hadoop Cluster (QJournal Method)

Steps:
Step 1: Download and configure Zookeeper
Step 2: Hadoop configuration and high availability settings
Step 3: Creating folders for the Hadoop cluster and setting file permissions
Step 4: HDFS service and file system format

Steps in detail:
Step 1: Download and configure Zookeeper

1.1 Download the Zookeeper software package from https://www.apache.org/dist/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
[cluster@n3:~]$wget https://www.apache.org/dist/zookeeper/zookeeper-3.4.5/zookeeper-3.4.5.tar.gz
Extract the archive:
[cluster@n3:~]$tar -zxvf zookeeper-3.4.5.tar.gz

1.2 Zookeeper-related configuration files are located under:
Configuration files: /home/cluster/zookeeper-3.4.5/conf
Binary executables: /home/cluster/zookeeper-3.4.5/bin
The main configuration file is
/home/cluster/zookeeper-3.4.5/conf/zoo.cfg
Create it from the sample (run from the conf directory):
[cluster@n3:~]$cp zoo_sample.cfg zoo.cfg
Modify zoo.cfg as per this installation guide:
[cluster@n3:~]$nano /home/cluster/zookeeper-3.4.5/conf/zoo.cfg
tickTime=2000
clientPort=3000
initLimit=5
syncLimit=2
dataDir=/home/cluster/zookeeper/data/
dataLogDir=/home/cluster/zookeeper/log/
server.1=n3:2888:3888
server.2=n4:2889:3889
Save & Exit!
Note: if both servers are hosted on the same physical machine as separate instances, each server entry must use its own port numbers, e.g. n3:2888:3888 and n4:2889:3889.
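
A hypothetical single-machine variant of those server entries could look like the sketch below (each instance would also need its own dataDir and clientPort, since two processes cannot bind the same ports):
server.1=localhost:2888:3888
server.2=localhost:2889:3889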


1.3 Create the folder structure for the Zookeeper data and logs as defined in zoo.cfg; repeat the following step on all cluster nodes (n3 & n4)
[cluster@n3:~]$mkdir -p /home/cluster/zookeeper/data/
[cluster@n3:~]$mkdir -p /home/cluster/zookeeper/log/

1.4 Create the myid file in /home/cluster/zookeeper/data/ and assign each node its ID (n3=1 & n4=2).
[cluster@n3:~]$nano /home/cluster/zookeeper/data/myid
1
Save and Exit!
[cluster@n4~]$nano /home/cluster/zookeeper/data/myid
2
Save & Exit!
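
Equivalently, the myid files can be written with a single command on each node:
[cluster@n3:~]$echo 1 > /home/cluster/zookeeper/data/myid
[cluster@n4~]$echo 2 > /home/cluster/zookeeper/data/myid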

Step 2: Hadoop configuration and high availability settings

Download the latest Hadoop 2.x.x release.
2.1 Add/modify the following lines in the hadoop-env.sh file to apply the environment variable settings.
[cluster@n3:~]$ nano /home/cluster/hadoop-2.2.0-cdh5.0.0-beta-2/etc/hadoop/hadoop-env.sh
export JAVA_HOME=/usr/lib/jvm/jdk1.7.0_45/
export HADOOP_COMMON_LIB_NATIVE_DIR=/home/cluster/hadoop-2.2.0-cdh5.0.0-beta-2/lib/native/
export HADOOP_OPTS="-Djava.library.path=/home/cluster/hadoop-2.2.0-cdh5.0.0-beta-2/lib/native/"
2.2 Add the following lines to the core-site.xml file, within the <configuration> tag, to configure journaling, the default FS, the temp directory & the HDFS cluster.
[cluster@n3:~]$nano /home/cluster/hadoop-2.2.0-cdh5.0.0-beta-2/etc/hadoop/core-site.xml
<property>
    <name>hadoop.tmp.dir</name>
    <value>/hdfs/dfs/tmp</value>
</property>
<property>
    <name>fs.defaultFS</name>
    <value>hdfs://mycluster</value>
</property>
<property>
  <name>dfs.journalnode.edits.dir</name>
  <value>/hdfs/dfs/journal/data</value>
</property>
2.3 Add the following lines to the hdfs-site.xml file, within the <configuration> tag, to configure the DFS nameservice, high availability, Zookeeper & failover.
[cluster@n3:~]$nano /home/cluster/hadoop-2.2.0-cdh5.0.0-beta-2/etc/hadoop/hdfs-site.xml
<property>
    <name>dfs.name.dir</name>
    <value>/hdfs/dfs/nn</value>
</property>
<property>
    <name>dfs.data.dir</name>
    <value>/hdfs/dfs/dn</value>
 </property>
 <property>
    <name>dfs.nameservices</name>
    <value>mycluster</value>
    <final>true</final>
 </property>
<property>
  <name>dfs.ha.namenodes.mycluster</name>
  <value>n3,n4</value>
  <final>true</final>
</property>
 <property>
    <name>dfs.namenode.rpc-address.mycluster.n3</name>
    <value>n3:8020</value>
 </property>
 <property>
    <name>dfs.namenode.http-address.mycluster.n3</name>
    <value>n3:50070</value>
 </property>
 <property>
    <name>dfs.namenode.secondaryhttp-address.mycluster.n3</name>
    <value>n3:50090</value>
 </property>
 <property>
    <name>dfs.namenode.rpc-address.mycluster.n4</name>
    <value>n4:8020</value>
 </property>
 <property>
    <name>dfs.namenode.http-address.mycluster.n4</name>
    <value>n4:50070</value>
 </property>
 <property>
    <name>dfs.namenode.secondaryhttp-address.mycluster.n4</name>
    <value>n4:50090</value>
 </property>
<property>
  <name>dfs.namenode.shared.edits.dir</name>
  <value>qjournal://n3:8485;n4:8485/mycluster</value>
</property>
<property>
  <name>dfs.ha.automatic-failover.enabled</name>
  <value>true</value>
</property>
<property>
  <name>ha.zookeeper.quorum</name>
  <value>n3:3000,n4:3000</value>
</property>
<property>
  <name>dfs.ha.fencing.methods</name>
  <value>sshfence</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.private-key-files</name>
  <value>/home/cluster/.ssh/id_rsa</value>
</property>
<property>
  <name>dfs.ha.fencing.ssh.connect-timeout</name>
  <value>30000</value>
</property>
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
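
As an optional sanity check (a sketch, run from the Hadoop home path after saving the files), the HA keys can be read back with hdfs getconf; with the configuration above the expected values are mycluster and n3,n4:
[cluster@n3:~]$bin/hdfs getconf -confKey dfs.nameservices
[cluster@n3:~]$bin/hdfs getconf -confKey dfs.ha.namenodes.mycluster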

2.4 Add the datanodes to the slaves configuration file as shown below.
[cluster@n3:~]$nano /home/cluster/hadoop-2.2.0-cdh5.0.0-beta-2/etc/hadoop/slaves
n3
n4
Save & Exit!

Step 3: Creating folders for the Hadoop cluster and setting file permissions


3.1 Create the folder structure for the journalnode as defined in core-site.xml; repeat the following step on all cluster nodes (n3 & n4)
[cluster@n3:~]$mkdir -p /hdfs/dfs/journal/data

3.2 Create the temp folder for the Hadoop cluster as defined in core-site.xml; repeat the following step on all cluster nodes (n3 & n4)
[cluster@n3:~]$mkdir -p /hdfs/dfs/tmp

3.3 Create the datanode and namenode folders for the Hadoop cluster as defined in hdfs-site.xml; repeat the following step on all cluster nodes (n3 & n4)
[cluster@n3:~]$mkdir -p /hdfs/dfs/dn
[cluster@n3:~]$mkdir -p /hdfs/dfs/nn
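
The directories from steps 3.1-3.3 can also be created on both nodes in one pass (a sketch, assuming passwordless SSH from n3 to n3 and n4, which the sshfence fencing method requires anyway):
[cluster@n3:~]$for host in n3 n4; do ssh $host "mkdir -p /hdfs/dfs/journal/data /hdfs/dfs/tmp /hdfs/dfs/dn /hdfs/dfs/nn"; done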

3.4 Copy the Hadoop and Zookeeper installations configured on n3 to n4, for example as shown below.
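A sketch, assuming the same /home/cluster layout and the cluster user on n4:
[cluster@n3:~]$scp -r /home/cluster/zookeeper-3.4.5 /home/cluster/hadoop-2.2.0-cdh5.0.0-beta-2 cluster@n4:/home/cluster/
Remember that /home/cluster/zookeeper/data/myid on n4 must contain 2, as set in step 1.4.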

Step 4: HDFS service and file system format

4.1 Start the Zookeeper service once on each node running Zookeeper; repeat the step below on all such cluster nodes (n3 & n4).
Go to the Zookeeper 3.4.5 binary path, i.e. /home/cluster/zookeeper-3.4.5/bin, then execute the commands below:
[cluster@n3:~]$./zkServer.sh start
[cluster@n4~]$./zkServer.sh start
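
To confirm the quorum has formed, check each instance's role; one node should report itself as "leader" and the other as "follower":
[cluster@n3:~]$./zkServer.sh status
[cluster@n4~]$./zkServer.sh status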

4.2 Format the Zookeeper failover state (ZKFC) from n3
Go to the Hadoop home path, i.e. /home/cluster/hadoop-2.2.0-cdh5.0.0-beta-2, then execute the command below:
[cluster@n3:~]$bin/hdfs zkfc -formatZK
Before formatting the namenode (step 4.3), start the journalnode on all cluster nodes (n3 & n4):
[cluster@n3:~]$sbin/hadoop-daemon.sh start journalnode

4.3 Format the namenode on n3
Go to the Hadoop home path, i.e. /home/cluster/hadoop-2.2.0-cdh5.0.0-beta-2, then execute the command below:
[cluster@n3:~]$bin/hdfs namenode -format

4.4 Copy the namenode metadata to the standby namenode (n4 in this guide); the bootstrap command below is run on n4 (standby).
First make sure the namenode service is running on the master node (n3).
Go to the Hadoop home path, i.e. /home/cluster/hadoop-2.2.0-cdh5.0.0-beta-2, then execute the command below:
[cluster@n3:~]$sbin/hadoop-daemon.sh start namenode
Then on n4,
go to the Hadoop home path, i.e. /home/cluster/hadoop-2.2.0-cdh5.0.0-beta-2, then execute the command below:
[cluster@n4~]$bin/hdfs namenode -bootstrapStandby


Start the Hadoop services:

[cluster@n3:~]$cd /home/cluster/hadoop-2.2.0-cdh5.0.0-beta-2/sbin
./stop-all.sh
Then start again:
./start-dfs.sh
Run jps to check the services running on n3 & n4:
[cluster@n3 sbin]$ jps
17588 DFSZKFailoverController
16966 DataNode
24142 Jps
4268 QuorumPeerMain
16745 NameNode
17276 JournalNode
[cluster@n4 bin]$ jps
14357 DFSZKFailoverController
2369 QuorumPeerMain
13906 DataNode
23689 Jps
15458 NameNode
14112 JournalNode



Verifying Automatic Failover

After the initial deployment of a cluster with automatic failover enabled, you should test its operation. To do so, first locate the active NameNode. As mentioned above, you can tell which node is active by visiting the NameNode web interfaces.
Once you have located your active NameNode, you can cause a failure on that node. For example, you can use kill -9 <pid of NN> to simulate a JVM crash. Or you can power-cycle the machine or its network interface to simulate different kinds of outages. After you trigger the outage you want to test, the other NameNode should automatically become active within several seconds. The amount of time required to detect a failure and trigger a failover depends on the configuration of ha.zookeeper.session-timeout.ms, but defaults to 5 seconds.
If the test does not succeed, you may have a misconfiguration. Check the logs for the zkfc daemons as well as the NameNode daemons in order to further diagnose the issue.
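
The active and standby roles can also be checked from the command line using the NameNode IDs n3 and n4 defined in hdfs-site.xml (the output of each command depends on which node currently holds the active role). From the Hadoop home path:
[cluster@n3:~]$bin/hdfs haadmin -getServiceState n3
[cluster@n3:~]$bin/hdfs haadmin -getServiceState n4
After killing the active NameNode's process (use the NameNode pid from the jps output above), re-run the commands; the surviving NameNode should report "active" within a few seconds.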




Thank You.