0x90e's Blog

Cuckoo Install Failure(pillow)

Posted on 2017-12-20 | In Security

Basic Info:

Error Message: Failed building wheel for pillow
OS: Ubuntu 16.04 LTS
Cuckoo: 2.0.5
Python Version: 2.7
Python Virtual Environment: Conda

Spark SQL DataFrame RDD

Posted on 2017-12-14 | In Big data

DataFrame interoperating with RDDs

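Spark SQL supports two ways to convert an existing RDD into a DataFrame:
- Use objects of a specific type and let reflection infer the schema of the RDD, then convert it to a DataFrame
  - Limitation: in Scala 2.10 a case class can have at most 22 fields
- Build a schema through the programmatic interface and apply it to an existing RDD to obtain a DataFrame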
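
The two approaches can be sketched roughly as follows. This is a minimal illustration rather than code from the original post: the `Person` case class, the sample rows, and the names `RddToDataFrameDemo`, `byReflection`, and `bySchema` are all assumptions made for the example.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical record type; reflection derives the schema from its fields.
case class Person(name: String, age: Int)

object RddToDataFrameDemo {
  // Approach 1: map each line to a case class and let Spark infer the schema.
  def byReflection(spark: SparkSession, lines: RDD[String]): DataFrame = {
    import spark.implicits._
    lines.map(_.split(",")).map(p => Person(p(0), p(1).trim.toInt)).toDF()
  }

  // Approach 2: build the schema by hand and apply it to an RDD[Row].
  def bySchema(spark: SparkSession, lines: RDD[String]): DataFrame = {
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = true)))
    val rows = lines.map(_.split(",")).map(p => Row(p(0), p(1).trim.toInt))
    spark.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder().master("local[2]")
      .appName("RddToDataFrameDemo")
      .getOrCreate()
    // Assumed sample input in "name,age" form.
    val lines = spark.sparkContext.parallelize(Seq("Michael,29", "Andy,30"))
    byReflection(spark, lines).printSchema()
    bySchema(spark, lines).show()
    spark.stop()
  }
}
```

The reflection route is shorter when the record type is known at compile time; the explicit schema is the usual choice when the columns are only known at runtime.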

Spark SQL DataFrame Operations

Posted on 2017-12-08 | In Big data

Code

import org.apache.spark.sql.SparkSession
object DataFrameDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder().master("local[2]")
      .appName("DataFrameDemo")
      .getOrCreate()
    val df = spark.read.format("json").load("file/people.json")
    println("Displays the content of the DataFrame to stdout")
    df.show()
    println("Print the schema in a tree format")
    df.printSchema()
    println("Select only the name column")
    df.select("name").show()
    // This import is needed to use the $-notation
    import spark.implicits._
    println("Select everybody, but increment the age by 1")
    df.select($"name", $"age" + 1).show()
    println("Select people older than 21")
    df.filter($"age" > 21).show()
    spark.stop()
  }
}

Spark SQL DataSet and DataFrame

Posted on 2017-12-05 | In Big data

DataSet

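A distributed collection of data
Supported since Spark 1.6
Builds on the strengths of RDDs:
- Strong typing
- Lambda functions
- Dataset operations run on Spark SQL's optimized execution engine
A Dataset can be constructed from JVM objects or produced by transformations such as map and flatMap
The Dataset API is available in Java and Scala; Python does not support it yet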
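
As a rough sketch of the points above, the snippet below builds a Dataset from JVM objects and then derives new ones with typed lambdas. The `User` case class, the sample values, and the names `DatasetDemo`, `fromObjects`, `namesOlderThan`, and `words` are assumptions introduced only for illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type giving the Dataset its compile-time element type.
case class User(name: String, age: Long)

object DatasetDemo {
  // Build a Dataset directly from JVM objects.
  def fromObjects(spark: SparkSession): Dataset[User] = {
    import spark.implicits._
    Seq(User("Andy", 30), User("Justin", 19)).toDS()
  }

  // Typed lambdas: the compiler checks that `age` and `name` exist on User.
  def namesOlderThan(spark: SparkSession, users: Dataset[User], minAge: Long): Dataset[String] = {
    import spark.implicits._
    users.filter(_.age > minAge).map(_.name)
  }

  // flatMap turns each input line into zero or more output elements.
  def words(spark: SparkSession, lines: Seq[String]): Dataset[String] = {
    import spark.implicits._
    lines.toDS().flatMap(_.split(" "))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder().master("local[2]")
      .appName("DatasetDemo")
      .getOrCreate()
    namesOlderThan(spark, fromObjects(spark), 21L).show()
    words(spark, Seq("hello spark", "hello dataset")).show()
    spark.stop()
  }
}
```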

Spark SQL Architecture

Posted on 2017-12-02 | In Big data

Spark SQL working with Hive

Posted on 2017-11-28 | In Big data

Spark SQL

Spark SQL Introduction

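Spark SQL is the Apache Spark module for processing structured data.
It is not limited to SQL: it can also read and transform data from external sources such as Hive and Parquet.
SQL statements and the DataFrame API can be used directly inside a Spark program.
External data sources such as Hive, Avro, Parquet, and ORC are accessed in a uniform way, and joins across different sources are supported.
Existing Hive deployments are accessed through the Hive MetaStore.
JDBC and ODBC connections are supported.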
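
To make the "SQL statements or DataFrame API" point concrete, here is a small sketch that runs the same query both ways on in-memory data. The sample rows, the view name `people`, and the names `SqlAndDataFrameDemo`, `samplePeople`, `viaSql`, and `viaApi` are assumptions for the example; external sources such as Hive or Parquet and JDBC/ODBC access are not shown.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SqlAndDataFrameDemo {
  // Assumed in-memory sample data standing in for an external source.
  def samplePeople(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("Andy", 30), ("Justin", 19)).toDF("name", "age")
  }

  // Register the DataFrame as a temporary view and query it with SQL.
  def viaSql(spark: SparkSession, people: DataFrame): DataFrame = {
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age > 21")
  }

  // The same query expressed with the DataFrame API.
  def viaApi(spark: SparkSession, people: DataFrame): DataFrame = {
    import spark.implicits._
    people.filter($"age" > 21).select("name")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder().master("local[2]")
      .appName("SqlAndDataFrameDemo")
      .getOrCreate()
    val people = samplePeople(spark)
    viaSql(spark, people).show()
    viaApi(spark, people).show()
    spark.stop()
  }
}
```

Both forms go through the same optimizer, so the choice between them is mostly a matter of style and of how the query is assembled.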

Compare Alluxio with HDFS

Posted on 2017-11-08 | In Big data

Purpose

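On Hadoop, run the same operation against a file stored in Alluxio and in HDFS, and compare the amount of data read and the execution speed.
On Spark, run the same operation against a file stored in Alluxio and in HDFS, and compare the amount of data read and the execution speed.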

Prepare the Test File

# The test file is about 18 MB (19014993 bytes)
[hadoop@testmain ~]# ll page_views.dat
-rwxr-xr-x 1 root root 19014993 Nov  5 00:21 page_views.dat
# Upload the file to Alluxio
[hadoop@testmain ~]# alluxio fs mkdir /wordcount/input/
Successfully created directory /wordcount/input/
[root@testmain ~]# alluxio fs copyFromLocal page_views.dat /wordcount/input/
# Upload the file to HDFS
[hadoop@testmain ~]$ hdfs dfs -put page_views.dat /wordcount/input/

Alluxio Installation

Posted on 2017-11-08 | In Big data

Download Alluxio

# Extract the downloaded tarball (or one you built yourself)
[root@testmain ~]# tar -zxvf alluxio-1.6.0-hadoop-2.8-bin.tar.gz -C /opt/software
[root@testmain ~]# cd /opt/software
[root@testmain software]# ln -s alluxio-1.6.0-hadoop-2.8 alluxio
# Change the file owner to the Linux user that starts Alluxio
[root@testmain software]# sudo chown -R hadoop:hadoop alluxio-1.6.0-hadoop-2.8/
[root@testmain software]# sudo chown -R hadoop:hadoop alluxio
# Configure environment variables by adding the two export lines below to /etc/profile
[root@testmain software]# sudo vim /etc/profile
export ALLUXIO_HOME="/opt/software/alluxio"
export PATH="$ALLUXIO_HOME/bin:$PATH"
[root@testmain software]# source /etc/profile

HBase Installation

Posted on 2017-11-02 | In Big data

Basic Configuration

Basic Information

OS: CentOS 6.5 64bit
HBase: hbase-1.2.0-cdh5.7.0
JDK: 8u144

Hive Built-in Functions

Posted on 2017-09-25 | In Big data

Built-in Functions

Apache Hive Built-in Functions

Listing Built-in Functions with Hive Commands

Show the built-in functions supported by the current session
