Setup ELK (ElasticSearch, Logstash, Kibana) Environment
Posted on 2018-01-22 | In Big data
This article describes how to set up an ELK environment, and how to start and use it.
For the ElasticSearch installation, see this article.
Basic Info
OS: CentOS 7.3
Nginx Version: 1.8.0
ElasticSearch Version: 5.5.0
Logstash Version: 5.5.0
Kibana Version: 5.5.0
Read more »

Nginx Installation
Posted on 2018-01-22 | In Misc
Basic Info
OS: CentOS 7.3
Nginx Version: 1.8.0
PCRE Version: 8.36
zlib Version: 1.2.8
OpenSSL Version: 1.0.1j
Read more »

ElasticSearch Installation
Posted on 2018-01-11 | In Big data
Basic Info
OS Version: CentOS 7.3
Java Version: 1.8.0_151
ElasticSearch Version: 5.5.0
Read more »

Spark SQL Configuration and Performance Tuning
Posted on 2018-01-05 | In Big data
Configuration Properties
Spark SQL Performance Tuning
The properties worth paying attention to are listed below, together with the corresponding Spark source code.
Spark source code: org.apache.spark.sql.internal.SQLConf.scala
buildConf gives the name of a property that can be passed in via --conf, for example --conf spark.sql.sources.default
createWithDefault gives the property's default value
Read more »
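As a rough illustration of how such property names end up on the command line, here is a minimal sketch that turns a dictionary of properties into `--conf key=value` arguments for `spark-submit`. The helper name `conf_args`, the script name `app.py`, and the chosen values are assumptions made for this example; they do not come from the original post.

```python
import shlex


def conf_args(props):
    """Turn a dict of Spark properties into spark-submit --conf arguments.

    conf_args is a hypothetical helper for illustration only.
    Keys are sorted so the resulting command line is deterministic.
    """
    args = []
    for key, value in sorted(props.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Example values are assumptions; check SQLConf.scala for the defaults
# that apply to the Spark version in use.
props = {
    "spark.sql.sources.default": "parquet",
    "spark.sql.shuffle.partitions": 200,
}

# app.py is a placeholder application script.
command = ["spark-submit", *conf_args(props), "app.py"]
print(shlex.join(command))
```

Building the argument list instead of concatenating strings keeps each property as a separate `--conf` pair, which is the form `spark-submit` expects.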

Spark SQL External Data Source
Posted on 2017-12-28 | In Big data
Spark SQL Data Sources
External Data Source API
Why
Different data sources each use their own way of reading and writing, which is very inconvenient
Data has to be converted between different input and output formats
Read more »
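
Spark SQL Thrift Server
Posted on 2017-12-25 | In Big data
Introduction
The "Standard Connectivity" mentioned in the Spark SQL overview refers to the Thrift Server.
The Thrift Server provides a JDBC/ODBC interface. Users connect to the Thrift Server over JDBC/ODBC and then access or process data through Spark SQL.
Once started, the Thrift Server launches a single Spark SQL application, and every client that connects over JDBC/ODBC shares that application's resources. In other words, data is shared between clients.
Compared with Spark Application
spark-shell and spark-sql are each a Spark Application: every job submission requests its own resources, and resources are independent between jobs.
The Thrift Server is a single Spark Application no matter how many clients connect. It requests resources only once, and data can be shared between clients.
Read more »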

Spark SQL Built-in and UDF Functions
Posted on 2017-12-24 | In Big data
The functions commonly used in Spark SQL fall into two categories, built-in functions and UDFs (user-defined functions). The examples that follow show how to use each.
Read more »

Spark SQL DataFrame API Programming
Posted on 2017-12-23 | In Big data
This article gives several examples of commonly used DataFrame APIs.
The commonly used DataFrame APIs live in Dataset.scala and functions.scala under org.apache.spark.sql; reading that source code is a good way to get familiar with DataFrame.
Read more »

Cuckoo Network Analysis Failure
Posted on 2017-12-23 | In Security
Basic Info
Error Message: CuckooOperationalError: Error running tcpdump to sniff the network traffic during the analysis
OS: Ubuntu 16.04 LTS
Cuckoo: 2.0.5
Python Version: 2.7
Read more »

Cuckoo Installation
Posted on 2017-12-22 | In Security
Basic Info
Host OS: Ubuntu 16.04 LTS
Guest OS: Windows 7 x64 Professional SP1
Cuckoo: 2.0.5
Virtualization software: VirtualBox 5.0.40
Python Version: 2.7.14
Python Virtual Environment: Conda
All pip installations on the Host OS in this article are run inside a Conda virtual environment, which makes Python library management easier.
Skipping the virtual environment (Conda/Virtualenv) does not affect installing or using Cuckoo.
Read more »