On October 28th, Microsoft finally released a new Azure service called “Windows Azure HDInsight Service”, our Hadoop offering in the cloud:
Along with this core service, a series of additional components has also been released to integrate the Big Data world with familiar Microsoft BI tools and on-premises technologies. The most important is a new ODBC driver that permits connection to Hadoop HIVE:
Microsoft® Hive ODBC Driver is a connector to Apache Hadoop Hive available as part of HDInsight clusters.
Microsoft® Hive ODBC Driver enables Business Intelligence, Analytics and Reporting on data in Apache Hive.
You can download the driver from the link below:
Microsoft® Hive ODBC Driver
This driver can be installed on 32-bit or 64-bit versions of Windows 7, Windows 8, Windows Server 2008 R2 and Windows Server 2012, and allows connection to “Windows Azure HDInsight Service” (v1.6 and later) and “Windows Azure HDInsight Emulator” (v1.0.0.0 and later). You should install the version that matches the bitness of the application where you will be using the ODBC driver. Both packages can be installed on the same machine if you need both versions of the driver; by default, they are installed in different paths:
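As a quick way to check which driver bitness a given application needs, here is a small illustrative Python sketch (my own, not part of the driver package): the pointer size of the running process tells you whether it is 32-bit or 64-bit, and the ODBC driver must match the application, not the operating system.

```python
import struct

# Pointer size determines whether this process is 32-bit or 64-bit;
# the Hive ODBC driver bitness must match the *application* using it.
bits = struct.calcsize("P") * 8
print(f"This process is {bits}-bit; install the {bits}-bit Hive ODBC driver.")
```

For example, a 32-bit Excel running on 64-bit Windows still needs the 32-bit driver.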
Depending on the version installed, you may need to use a different version of the ODBC Data Source Administrator; on a 64-bit machine, the 32-bit version (“odbcad32.exe”) is located in “C:\Windows\SysWOW64”. On Windows 8/Windows Server 2012, you can find the right version easily using the “Search” function:
If there is a version mismatch between SQL Server and the ODBC driver, you will receive the error message below when you try to execute any query:
OLE DB provider 'MSDASQL' for linked server 'HiveSample' returned message '[Microsoft][ODBC Driver Manager] The specified DSN contains an architecture mismatch between the Driver and Application'.
Msg 7303, Level 16, State 1, Line 1 Cannot initialize the data source object of OLE DB provider 'MSDASQL' for linked server 'HiveSample'.
The setup process is really straightforward; there is nothing to configure or change until the end:
As you can see in the second screenshot, this driver has been developed in collaboration with Simba Technologies Inc. At the end of the installation, it is recommended to go into the installation folder (by default, for 64-bit, “C:\Program Files\Microsoft Hive ODBC Driver”) and look at the following files/documents:
Includes material furnished by Simba Technologies, Inc. Used under license. Note: While Microsoft is not the author of the files below, Microsoft is offering you a license subject to the terms of the Microsoft Software License Terms for Microsoft Hive ODBC Driver (the “Microsoft Program”). Microsoft reserves all other rights. The notices below are provided for informational purposes only and are not the license terms under which Microsoft distributes these files.
Once installed, a sample data source is already pre-configured, but it needs modifications if you want to reuse it to connect to your HDInsight cluster:
Click “Configure” and adjust the parameters as shown in the screenshot below:
NOTE: If you tested a beta version of this ODBC driver with an earlier HDInsight release, be aware that the TCP port used has changed from 563 to 443.
Insert the “User Name” and “Password” you chose when you provisioned the HDInsight cluster, then click the “Test” button; if everything is correct, you should receive the following output:
Also be very careful with the “Advanced Options” button; there are some important parameters to be aware of:
Now it’s time to open SQL Server Management Studio and create a “Linked Server” definition using the TSQL statement below:
EXEC master.dbo.sp_addlinkedserver @server = N'HiveSample', @srvproduct=N'HIVE', @provider=N'MSDASQL', @datasrc=N'Sample Microsoft Hive DSN', @provstr=N'Provider=MSDASQL.1;Persist Security Info=True;User ID=<<user_name>>; Password=<<password>>;'
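If you script this setup, the @provstr value is just a semicolon-delimited key/value list. The sketch below is a hypothetical Python helper (the function name and layout are mine, not part of any Microsoft tool) that assembles the same provider string used in the TSQL example above:

```python
def build_provider_string(user: str, password: str) -> str:
    """Assemble the MSDASQL provider string passed to sp_addlinkedserver.
    The key/value layout mirrors the TSQL example in the post."""
    parts = {
        "Provider": "MSDASQL.1",
        "Persist Security Info": "True",
        "User ID": user,
        "Password": password,
    }
    return ";".join(f"{k}={v}" for k, v in parts.items()) + ";"

print(build_provider_string("admin", "secret"))
# Provider=MSDASQL.1;Persist Security Info=True;User ID=admin;Password=secret;
```

Generating the string this way avoids the quoting mistakes that are easy to make when editing the one-line TSQL statement by hand.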
You can now see the new “Linked Server” definition along with the sample table that exists in the HIVE data warehouse on the HDInsight cluster:
Now I want to run a simple SELECT statement to access data in the default HIVE sample table:
select * from [HiveSample].[HIVE].[default].hivesampletable
Unfortunately, something went wrong and I got the following error:
OLE DB provider 'MSDASQL' for linked server 'HiveSample' returned message 'Requested conversion is not supported.'.
Msg 7341, Level 16, State 2, Line 1 Cannot get the current row value of column '[MSDASQL].clientid' from OLE DB provider 'MSDASQL' for linked server 'HiveSample'.
This seemed very strange to me, since [clientid] in HIVE is a basic “string” type and the ODBC connector should be able to translate it. Trusting the error message above, I was able to execute the query successfully by introducing an explicit type conversion in TSQL using the CONVERT function, as shown below, and changing the “Default string column length” parameter (in the ODBC DSN advanced options) to 8000:
select convert(varchar(8000),[clientid]) from [HiveSample].[HIVE].[default].hivesampletable
The reason for the error above is that HIVE does *not* provide the maximum data length for “string” columns in the column metadata, so you need to use explicit cast/convert techniques in SQL Server TSQL to deal with it. Obviously you can use numbers lower than 8000, which is the maximum character length you can specify for VARCHAR without using the MAX qualifier; it's up to you, but be sure to set the ODBC DSN advanced options parameters accordingly.
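To make the length issue concrete, here is a small Python sketch of the behavior (an illustration in my own words, not the driver's actual code): when HIVE reports no maximum length, the consumer falls back to the configured default string column length, and longer values are cut off.

```python
def fit_to_column(value: str, default_string_column_length: int = 8000) -> str:
    """Simulate the fallback: HIVE reports no max length for string
    columns, so the DSN's 'Default string column length' applies and
    anything longer is truncated."""
    return value[:default_string_column_length]

long_value = "x" * 10000
print(len(fit_to_column(long_value)))      # truncated to 8000
print(fit_to_column("short string"))       # short values pass through unchanged
```

This is why the CONVERT length in TSQL and the DSN setting must agree: if the DSN value is smaller than your data, the truncation happens silently before SQL Server ever sees the value.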
IMPORTANT: Be very careful with the parameter values contained in the ODBC “Advanced Options” property page, as described earlier in this post, since you may retrieve incorrect data due to precision loss and/or truncation.
In addition to the four-part syntax used above, you can also use the OPENQUERY syntax, as in the example below:
select * from openquery(HiveSample, 'select sessionid from hivesampletable')
If you want to check the native HIVE data type of a specific table, you can use the following PowerShell command:
Invoke-Hive 'describe hivesampletable'
NOTE: In order to use the PowerShell cmdlets for HDInsight, you need to follow the instructions below:
Install and configure PowerShell for HDInsight
Another useful query, again submitted via PowerShell, shows all the tables present in the HIVE metastore:
Invoke-Hive 'show tables'
When you submit queries to HDInsight HIVE using the ODBC connector, be aware that every query is translated into a Hadoop Map-Reduce job, so execution time may be long. If your SQL Server installation normally uses a query timeout different from the default value of 0 (infinite wait), you may have to change it; otherwise, you will get an error before HDInsight is able to process your query/job. In the properties of the “Linked Server”, you can see and, if necessary, change the connection and query timeout values:
You may also have to change the instance-wide SQL Server “Query Timeout” value, which defaults to 600 seconds (10 minutes), using the script below as an example for infinite wait:
sp_configure 'show advanced options',1
go
reconfigure
sp_configure 'remote query timeout (s)',0
go
reconfigure
go
Finally, remember that this is v1 of this new technology, so not all HIVE aspects are covered by the driver; I would recommend that you check whether everything you need is included. This is a short list of what is and is not yet supported:
If you are a SQL Server DBA, and therefore an expert in the TSQL language, and want to know exactly which HiveQL syntax is supported, you can check the official Hadoop HIVE documentation on the Apache web site:
That’s all SQL Server folks…. Welcome to the Big data world!
The Cloudera Altus Director Azure Plugin is an implementation of the Cloudera Altus Director Service Provider Interface for the Microsoft Azure cloud platform.
Copyright and License
Copyright © 2016-2018 Cloudera. Licensed under the Apache License, Version 2.0.
Cloudera, the Cloudera logo, and any other product or service names or slogans contained in this document are trademarks of Cloudera and its suppliers or licensors, and may not be copied, imitated or used, in whole or in part, without the prior written permission of Cloudera or the applicable trademark holder.
Hadoop and the Hadoop elephant logo are trademarks of the Apache Software Foundation. Amazon Web Services, the 'Powered by Amazon Web Services' logo, Amazon Elastic Compute Cloud, EC2, Amazon Relational Database Service, and RDS are trademarks of Amazon.com, Inc. or its affiliates in the United States and/or other countries. All other trademarks, registered trademarks, product names and company names or logos mentioned in this document are the property of their respective owners. Reference to any products, services, processes or other information, by trade name, trademark, manufacturer, supplier or otherwise does not constitute or imply endorsement, sponsorship or recommendation thereof by us.
Complying with all applicable copyright laws is the responsibility of the user. Without limiting the rights under copyright, no part of this document may be reproduced, stored in or introduced into a retrieval system, or transmitted in any form or by any means (electronic, mechanical, photocopying, recording, or otherwise), or for any purpose, without the express written permission of Cloudera.
Cloudera may have patents, patent applications, trademarks, copyrights, or other intellectual property rights covering subject matter in this document. Except as expressly provided in any written license agreement from Cloudera, the furnishing of this document does not give you any license to these patents, trademarks, copyrights, or other intellectual property. For information about patents covering Cloudera products, see http://tiny.cloudera.com/patents.
The information in this document is subject to change without notice. Cloudera shall not be liable for any damages resulting from technical errors or omissions which may be present in this document, or from use of this document.
These instructions describe how to run unit and live tests for the Cloudera Altus Director Azure Plugin.
Prerequisites

Required environmental information
Before using this plugin you will need:
Resource limits
Per the Azure Reference Architecture (PDF), the supported instance types used in deploying a CDH cluster are large. Even with small test clusters you'll hit the default resource limits per subscription.
Increase the limits, especially core count, by filing a support ticket on the Azure portal.
Building everything
To build everything, from Launchpad's base directory run:
Unit tests
Unit tests are simple. From Launchpad's base directory run:
Live tests
Live tests are more complicated and require the following:
To run all Live Tests, from Launchpad's base directory run:
To run a specific test, from Launchpad's base directory run (note the change from -am clean install to test):
These instructions describe how to deploy the Cloudera Altus Director Azure Plugin.
Prerequisites

Required fields
Before using this plugin you will need:
Pre-deployment setup

Forward and reverse DNS
Cloudera Distribution of Hadoop (CDH) and Cloudera Altus Director require forward and reverse hostname resolution. This is not currently supported by Azure (Azure only supports forward resolution), so you must provide name resolution using your own DNS server.
More instructions.
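As a quick sanity check from a given machine, both directions can be exercised with the Python standard library. The sketch below is my own illustration (not a Cloudera tool): it forward-resolves a hostname to an IP, then reverse-resolves that IP back to a name.

```python
import socket

def check_forward_reverse(hostname: str) -> tuple[str, str]:
    """Forward-resolve the hostname to an IP, then reverse-resolve
    the IP back to a name; CDH needs both directions to succeed."""
    ip = socket.gethostbyname(hostname)                 # forward lookup
    name, _aliases, _addrs = socket.gethostbyaddr(ip)   # reverse lookup
    return ip, name

ip, name = check_forward_reverse("localhost")
print(ip, name)
```

Run it with your cluster hostnames after configuring your DNS server; if the reverse lookup raises an error or returns a different name, the DNS setup is not yet suitable for CDH.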
IMPORTANT: The Cloudera Enterprise Reference Architecture for Azure Deployments (PDF) is the authoritative document for supported deployment configurations in Azure.
There are two files that the Cloudera Altus Director Azure Plugin uses to change settings:
The files and their uses are explained below.