0x90e's Blog

Chase Excellence,
Success will follow.


Cuckoo Install Failure(pillow)

Posted on 2017-12-20 | In Security

Basic Info:

  • Error Message: Failed building wheel for pillow
  • OS: Ubuntu 16.04 LTS
  • Cuckoo: 2.0.5
  • Python Version: 2.7
  • Python Virtual Environment: Conda

Spark SQL DataFrame RDD

Posted on 2017-12-14 | In Big data

DataFrame interoperating with RDDs

  • DataFrame interoperating with RDDs
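  • Spark SQL supports two ways to convert an existing RDD into a DataFrame:
    • Use objects of a specific type and infer the RDD's schema through reflection, then convert the RDD into a DataFrame
      • Limitation: case classes in Scala 2.10 support at most 22 fields
    • Use the programmatic interface to build a schema and apply it to an existing RDD to obtain a DataFrame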
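
The two approaches can be sketched as follows. This is a minimal illustration, not code from the post: the `Person` case class, the sample rows, and the object and method names are hypothetical, and only the Spark calls (`toDF`, `createDataFrame`, `StructType`) come from the Spark SQL API.

```scala
import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

// Hypothetical record type, defined at the top level so Spark can derive an encoder for it.
case class Person(name: String, age: Int)

object RddToDataFrameSketch {

  // Approach 1: reflection. Column names and types are inferred from the case class.
  def viaReflection(spark: SparkSession, rdd: RDD[(String, Int)]): DataFrame = {
    import spark.implicits._
    rdd.map { case (name, age) => Person(name, age) }.toDF()
  }

  // Approach 2: build the schema programmatically and apply it to an RDD[Row].
  def viaSchema(spark: SparkSession, rdd: RDD[(String, Int)]): DataFrame = {
    val schema = StructType(Seq(
      StructField("name", StringType, nullable = true),
      StructField("age", IntegerType, nullable = true)
    ))
    val rows = rdd.map { case (name, age) => Row(name, age) }
    spark.createDataFrame(rows, schema)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("RddToDataFrameSketch")
      .getOrCreate()

    // Made-up sample data standing in for an existing RDD.
    val rdd = spark.sparkContext.parallelize(Seq(("Alice", 29), ("Bob", 35)))

    viaReflection(spark, rdd).printSchema()
    viaSchema(spark, rdd).show()

    spark.stop()
  }
}
```

The reflection route is shorter when the record type is known at compile time; the programmatic route is the usual choice when column names or types are only known at runtime.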

Spark SQL DataFrame Operations

Posted on 2017-12-08 | In Big data

Code

import org.apache.spark.sql.SparkSession

object DataFrameDemo {

  def main(args: Array[String]): Unit = {
    val spark = SparkSession
      .builder().master("local[2]")
      .appName("DataFrameDemo")
      .getOrCreate()

    val df = spark.read.format("json").load("file/people.json")

    println("Displays the content of the DataFrame to stdout")
    df.show()

    println("Print the schema in a tree format")
    df.printSchema()

    println("Select only the name column")
    df.select("name").show()

    // This import is needed to use the $-notation
    import spark.implicits._

    println("Select everybody, but increment the age by 1")
    df.select($"name", $"age" + 1).show()

    println("Select people older than 21")
    df.filter($"age" > 21).show()

    spark.stop()
  }
}

Spark SQL DataSet and DataFrame

Posted on 2017-12-05 | In Big data

  • Spark SQL DataSet and DataFrame

DataSet

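  • A distributed collection of data
  • Supported since Spark 1.6
  • Extends RDDs with:
    • Strong typing
    • Lambda functions
    • DataSet operations run on Spark SQL's optimized execution engine
  • Can be constructed from JVM objects, or by applying transformations such as map and flatMap
  • Available in Java and Scala; Python does not yet support the DataSet API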
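
A minimal sketch of these properties, assuming a local Spark session. The `PageView` case class, the sample values, and the object and method names are hypothetical and exist only for this illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Hypothetical record type used only for this illustration.
case class PageView(url: String, hits: Long)

object DatasetSketch {

  // Build a Dataset directly from JVM objects.
  def fromObjects(spark: SparkSession): Dataset[PageView] = {
    import spark.implicits._
    Seq(PageView("/home", 10L), PageView("/about", 3L)).toDS()
  }

  // Typed transformations with lambdas: the compiler checks that `hits` and `url` exist.
  def popularUrls(spark: SparkSession, views: Dataset[PageView], minHits: Long): Dataset[String] = {
    import spark.implicits._
    views
      .filter((v: PageView) => v.hits >= minHits)
      .map((v: PageView) => v.url)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("DatasetSketch")
      .getOrCreate()

    popularUrls(spark, fromObjects(spark), minHits = 5L).show()

    spark.stop()
  }
}
```

A misspelled field such as `v.hit` would fail at compile time here, whereas the equivalent untyped column expression on a DataFrame would only fail when the query is analyzed at runtime.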

Spark SQL Architecture

Posted on 2017-12-02 | In Big data


Spark SQL working with Hive

Posted on 2017-11-28 | In Big data

Spark SQL

Spark SQL Introduction

  • Spark SQL
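  • Spark SQL is the Apache Spark module for processing structured data.
  • It is not limited to SQL; it also supports operations and transformations on external data sources such as Hive and Parquet.
  • SQL statements or the DataFrame API can be used directly inside a Spark program.
  • External data sources such as Hive, Avro, Parquet, and ORC are accessed in a uniform way, and joins across different data sources are supported.
  • Existing Hive deployments are accessed through the MetaStore.
  • JDBC and ODBC access is supported.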
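
To illustrate the point about using SQL statements or the DataFrame API inside a Spark program, here is a minimal sketch that expresses the same query both ways. It uses a small in-memory sample instead of Hive or Parquet; the data, the view name `people`, and the object and method names are assumptions for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object SqlAndApiSketch {

  // Made-up in-memory sample; a real job would load Hive, Parquet, JSON, etc.
  def samplePeople(spark: SparkSession): DataFrame = {
    import spark.implicits._
    Seq(("Alice", 29), ("Bob", 17)).toDF("name", "age")
  }

  // Variant 1: register a temporary view and query it with a SQL statement.
  def adultsViaSql(spark: SparkSession, people: DataFrame): DataFrame = {
    people.createOrReplaceTempView("people")
    spark.sql("SELECT name FROM people WHERE age >= 18")
  }

  // Variant 2: the same query written with the DataFrame API.
  def adultsViaApi(people: DataFrame): DataFrame =
    people.filter(people("age") >= 18).select("name")

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[2]")
      .appName("SqlAndApiSketch")
      .getOrCreate()

    val people = samplePeople(spark)
    adultsViaSql(spark, people).show()
    adultsViaApi(people).show()

    spark.stop()
  }
}
```

Both variants go through the same Spark SQL optimizer, so the choice between them is mostly a matter of readability.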

Compare Alluxio with HDFS

Posted on 2017-11-08 | In Big data

Purpose

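  1. On Hadoop, run operations on files stored in Alluxio and in HDFS, and compare the amount of data read and the execution speed.
  2. On Spark, run operations on files stored in Alluxio and in HDFS, and compare the amount of data read and the execution speed.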

Prepare the Test File

# The test file is about 18.1 MB (19014993 bytes)
[hadoop@testmain ~]# ll page_views.dat
-rwxr-xr-x 1 root root 19014993 Nov 5 00:21 page_views.dat
# Upload the file to Alluxio
[hadoop@testmain ~]# alluxio fs mkdir /wordcount/input/
Successfully created directory /wordcount/input/
[root@testmain ~]# alluxio fs copyFromLocal page_views.dat /wordcount/input/
# Upload the file to HDFS
[hadoop@testmain ~]$ hdfs dfs -put page_views.dat /wordcount/input/

Alluxio Installation

Posted on 2017-11-08 | In Big data

Download Alluxio

  • Download Alluxio
    # Extract the downloaded tarball, or one you compiled yourself
    [root@testmain ~]# tar -zxvf alluxio-1.6.0-hadoop-2.8-bin.tar.gz -C /opt/software
    [root@testmain ~]# cd /opt/software
    [root@testmain software]# ln -s alluxio-1.6.0-hadoop-2.8 alluxio
    # Change the file owner to the Linux user that starts Alluxio
    [root@testmain software]# sudo chown -R hadoop:hadoop alluxio-1.6.0-hadoop-2.8/
    [root@testmain software]# sudo chown -R hadoop:hadoop alluxio
    # Configure environment variables
    [root@testmain software]# sudo vim /etc/profile
    export ALLUXIO_HOME="/opt/software/alluxio"
    export PATH="$ALLUXIO_HOME/bin:$PATH"
    [root@testmain software]# source /etc/profile

HBase Installation

Posted on 2017-11-02 | In Big data

HBase Installation

Basic Configuration

Basic Information

  • OS: CentOS 6.5 64bit
  • HBase: hbase-1.2.0-cdh5.7.0
  • JDK: 8u144

Hive Built-in Functions

Posted on 2017-09-25 | In Big data

Built-in Functions

  • Apache Hive Built-in Functions

Listing Built-in Functions with the Hive Command

  • Show the built-in functions supported by the current session
© 2016 — 2020 0x90e