Posts

APACHE-CASSANDRA INSTALLATION ON UBUNTU

Apache Cassandra is a distributed, scalable, high-performance NoSQL database provided by the Apache Software Foundation. It is designed to handle huge amounts of data across many commodity servers, providing high availability without a single point of failure: data is placed on different machines with a replication factor greater than one.

Step 1: Before installing Cassandra you need to install Java 1.8 or higher: https://www.oracle.com/in/java/technologies/javase/javase8-archive-downloads.html

Step 2: Python is needed to run Cassandra, so install Python 2.7:
sudo apt-add-repository universe
sudo apt update
sudo apt install python2-minimal
python2 -V

Step 3: Download Cassandra 3.11.15: https://cassandra.apache.org/_/download.ht...
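As a rough sketch of the steps that follow the truncated download link (the tarball name and extraction path below are assumptions based on Cassandra's usual packaging, not taken from the post):

tar -zxvf apache-cassandra-3.11.15-bin.tar.gz    # extract the downloaded tarball
cd apache-cassandra-3.11.15
bin/cassandra -f                                 # start Cassandra in the foreground
bin/cqlsh                                        # in another terminal, open the CQL shell

bin/cassandra and bin/cqlsh are the standard launch scripts shipped inside the tarball; the -f flag keeps the server in the foreground so you can watch the startup logs.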

Data Migration Using Apache Sqoop

Data Migration Using Apache Sqoop

Sqoop: SQL + Hadoop = Sqoop. Sqoop is one of the components of the Hadoop ecosystem; it was initially developed and maintained by Cloudera. Sqoop is a data pipeline tool used to transfer data between an RDBMS (Relational Database Management System) and Hadoop, and it works with any RDBMS.

There are three ways to import from an RDBMS to Hadoop:
1) RDBMS to HDFS
2) RDBMS to Hive
3) RDBMS to HBase

We can export to an RDBMS from HDFS and Hive only:
1) HDFS to RDBMS
2) Hive to RDBMS

Here we use MySQL (RDBMS) commands for import:

1) MySQL to HDFS
bin/sqoop import --connect jdbc:mysql://localhost/db --username root --password 123 --table Persons -m 1

2) MySQL to Hive
bin/sqoop import --connect jdbc:mysql://localhost/db --username root --password 123 --table test --hive-table mysqltohive --create-hive-table --hive-import -m 1

3) MySQL to HBase
bin/sqoop import --connect jdbc:mysql://localhost/test --username root --password root --table...
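The excerpt stops before showing an export command. A minimal sketch of the HDFS-to-RDBMS direction (the Persons table, db database, and export directory here are hypothetical; an empty target table must already exist in MySQL):

bin/sqoop export --connect jdbc:mysql://localhost/db --username root --password 123 --table Persons --export-dir /user/data/persons -m 1

Here --export-dir points at the HDFS directory holding the data, and Sqoop maps its records onto the columns of the existing Persons table.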

APACHE-HIVE INSTALLATION ON UBUNTU

Apache-Hive Installation on Top of Hadoop

Note: Before this installation you should have Hadoop installed.

Step 1: https://downloads.apache.org/hive/hive-3.1.2/ is the download link for apache-hive-3.1.2; through that link you can download Hive. If you want any other or later version, use the link https://hive.apache.org/general/downloads/. Now start Hadoop.

Step 2: Once you have downloaded Hive, extract the Hive file and set the Hive environment paths in your .bashrc file. The command to open the .bashrc file is "gedit .bashrc". Add the paths to your .bashrc file:
export HIVE_HOME=<your hive path .......>
export HIVE_CONF_DIR=<your hive conf path.....>
Then execute the command "source .bashrc".

Step 3: Install MySQL; it is needed for Hive. (Note: Hive has an inbuilt RDBMS, Derby.) The command to install mysql-server is "sudo apt-get install mysql-server". If you want to log in to MySQL, the command is "sudo mysql -u root -p".

Step 4: Download mysql co...
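Assuming Hive was extracted to /home/user/apache-hive-3.1.2-bin (a hypothetical path, not given in the post), the .bashrc entries from step 2 would look like:

export HIVE_HOME=/home/user/apache-hive-3.1.2-bin     # where the Hive tarball was extracted
export HIVE_CONF_DIR=$HIVE_HOME/conf                  # Hive's bundled conf directory
export PATH=$PATH:$HIVE_HOME/bin

Once the metastore database is configured (the excerpt is cut off before that step), the schema is usually initialized once with Hive's schematool, e.g. "schematool -dbType mysql -initSchema", and Hive is then started with the "hive" command.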

APACHE-SPARK INSTALLATION ON UBUNTU

SPARK INSTALLATION ON UBUNTU

Step 1: Download Java using the link below:
https://www.oracle.com/in/java/technologies/javase/javase8-archive-downloads.html
Extract the tar file using the command:
tar -zxvf jdk-8u202-linux-x64.tar.gz

Step 2: Download Spark. Here is the direct download link:
https://dlcdn.apache.org/spark/spark-3.4.0/spark-3.4.0-bin-hadoop3.tgz
In that link you get the latest Spark version, 3.4.0, which you can click and download. If you want other versions, the link below lists the latest and older releases:
https://spark.apache.org/downloads.html
Extract the tar file using the command:
tar -zxvf <your spark path...>
Example: tar -zxvf /home/user/spark-3.4.0-bin-hadoop3.tgz

Step 3: Open the .bashrc file using the command "gedit .bashrc" and set your Spark and Java paths:
export JAVA_HOME=/home/user/jdk1.8.0_202
export SPA...
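The excerpt is cut off mid-export; as a sketch, the remaining .bashrc entries typically look like this (the Spark path below is an assumption matching the example extraction above, not taken from the post):

export JAVA_HOME=/home/user/jdk1.8.0_202                  # extracted JDK directory
export SPARK_HOME=/home/user/spark-3.4.0-bin-hadoop3      # extracted Spark directory (assumed path)
export PATH=$PATH:$JAVA_HOME/bin:$SPARK_HOME/bin

After running "source .bashrc", the install can be verified by launching spark-shell from a new terminal.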

Hadoop Single Node Installation On Ubuntu

WHAT IS BIG DATA

Big data refers to extremely large and complex data sets that cannot be effectively managed, processed, or analyzed using traditional data processing applications. It encompasses the four V's: volume, velocity, variety, and veracity.

Volume: Big data refers to a vast amount of data generated from various sources, such as social media, sensors, transactions, and more. It typically exceeds the capacity of traditional database systems.

Velocity: Big data is generated at high speed and often in real time. Data is continuously produced, collected, and processed rapidly, requiring efficient and timely analysis.

Variety: Big data includes various types of data, such as structured, unstructured, and semi-structured data. Structured data refers to organized data in a fixed format, while unstructured data is more flexible, including text, images, videos, social media posts, and more.

Veracity: Big data can have issues with accuracy, reliability, and trustworthiness. Veraci...