DataX & DataWeb 安装记录

DataX & DataX-Web 安装记录

简介

DataX 是一个异构数据源离线同步工具,致力于实现包括关系型数据库 (MySQL、Oracle 等)、HDFS、Hive、ODPS、HBase、FTP 等各种异构数据源之间稳定高效的数据同步功能。

DataX 本身作为离线数据同步框架,采用 Framework +
plugin 架构构建。将数据源读取和写入抽象成为 Reader/Writer 插件,纳入到整个同步框架中。

image.png

Reader: Reader 为数据采集模块,负责采集数据源的数据,将数据发送给 Framework。
Writer: Writer 为数据写入模块,负责不断向 Framework 取数据,并将数据写入到目的端。
Framework: Framework 用于连接 reader 和 writer,作为两者的数据传输通道,并处理缓冲,流控,并发,数据转换等核心技术问题。

  1. DataX 完成单个数据同步的作业,我们称之为 Job,DataX 接受到一个 Job 之后,将启动一个进程来完成整个作业同步过程。DataX Job 模块是单个作业的中枢管理节点,承担了数据清理、子任务切分 (将单一作业计算转化为多个子 Task)、TaskGroup 管理等功能。
  2. DataXJob 启动后,会根据不同的源端切分策略,将 Job 切分成多个小的 Task (子任务),以便于并发执行。Task 便是 DataX 作业的最小单元,每一个 Task 都会负责一部分数据的同步工作。
  3. 切分多个 Task 之后,DataX Job 会调用 Scheduler 模块,根据配置的并发数据量,将拆分成的 Task 重新组合,组装成 TaskGroup (任务组)。每一个 TaskGroup 负责以一定的并发运行完毕分配好的所有 Task,默认单个任务组的并发数量为5。
  4. 每一个 Task 都由 TaskGroup 负责启动,Task 启动后,会固定启动 Reader—>Channel—>Writer 的线程来完成任务同步工作。
  5. DataX 作业运行起来之后,Job 监控并等待多个 TaskGroup 模块任务完成,等待所有 TaskGroup 任务完成后 Job 成功退出。否则,异常退出,进程退出值非 0DataX 调度流程。
  6. 举例来说,用户提交了一个 DataX 作业,并且配置了 20 个并发,目的是将一个 100 张分表的 mysql 数据同步到 odps 里面。DataX 的调度决策思路是:DataXJob 根据分库分表切分成了 100 个 Task。根据 20 个并发,DataX 计算共需要分配 4 个 TaskGroup。4 个 TaskGroup 平分切分好的 100 个 Task,每一个 TaskGroup 负责以5个并发共计运行25个 Task。

安装 DataX

准备

  1. JDK 1.8.201 环境以上
  2. Python 2.7 (Python3 也可以但是需要修改部分文件)
  3. MySQL 数据库

安装

解压到指定目录 -C 后面的参数为想要解压到的目录

1
tar -zxvf datax.tar.gz -C /home/centos/

自检

1
2
cd /home/centos/datax/bin
./datax.py ../job/job.json

image.png

安装成功。

安装 DataX-Web

准备

安装

  1. 解压到指定目录
    1
    tar -zxvf datax-web-2.1.2.tar.gz -C /home/centos/
    1. 进入解压后的目录,找到bin目录下面的install.sh文件,如果选择交互式的安装,则直接执行
      1
      ./bin/install.sh

在交互过程中,如果服务器上存在 MySQL 则会出现填写 MySQL 信息的交互语句;如果服务器上没有安装 MySQL 则不显示。

在交互模式下,对各个模块的 package 压缩包的解压以及 configure 配置脚本的调用,都会请求用户确认, 可根据提示查看是否安装成功,如果没有安装成功,可以重复尝试;

如果不想使用交互模式(推荐),跳过确认过程,则执行以下命令安装

1
./bin/install.sh --force

使用强制安装模式的话需要的把 datax-web-2.1.2/bin/db 目录下的 datax_web.sql 文件在数据库中执行,创建一个 DataX_Web 使用的数据库,并在此数据库下运行这个 SQL 文件。下面是这个 SQL 文件的内容:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
/*
Navicat Premium Data Transfer

Source Server : localhost
Source Server Type : MySQL
Source Server Version : 50725
Source Host : localhost:3306
Source Schema : datax_web

Target Server Type : MySQL
Target Server Version : 50725
File Encoding : 65001

Date: 15/12/2019 22:27:10
*/

SET NAMES utf8mb4;
SET FOREIGN_KEY_CHECKS = 0;

-- ----------------------------
-- Table structure for job_group
-- ----------------------------
DROP TABLE IF EXISTS `job_group`;
CREATE TABLE `job_group` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`app_name` varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '执行器AppName',
`title` varchar(12) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '执行器名称',
`order` int(11) NOT NULL DEFAULT 0 COMMENT '排序',
`address_type` tinyint(4) NOT NULL DEFAULT 0 COMMENT '执行器地址类型:0=自动注册、1=手动录入',
`address_list` varchar(512) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '执行器地址列表,多地址逗号分隔',
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 2 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;

-- ----------------------------
-- Records of job_group
-- ----------------------------
INSERT INTO `job_group` VALUES (1, 'datax-executor', 'datax执行器', 1, 0, NULL);

-- ----------------------------
-- Table structure for job_info
-- ----------------------------
DROP TABLE IF EXISTS `job_info`;
CREATE TABLE `job_info` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`job_group` int(11) NOT NULL COMMENT '执行器主键ID',
`job_cron` varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '任务执行CRON',
`job_desc` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
`add_time` datetime(0) NULL DEFAULT NULL,
`update_time` datetime(0) NULL DEFAULT NULL,
`author` varchar(64) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '作者',
`alarm_email` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '报警邮件',
`executor_route_strategy` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '执行器路由策略',
`executor_handler` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '执行器任务handler',
`executor_param` varchar(512) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '执行器任务参数',
`executor_block_strategy` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '阻塞处理策略',
`executor_timeout` int(11) NOT NULL DEFAULT 0 COMMENT '任务执行超时时间,单位秒',
`executor_fail_retry_count` int(11) NOT NULL DEFAULT 0 COMMENT '失败重试次数',
`glue_type` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT 'GLUE类型',
`glue_source` mediumtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL COMMENT 'GLUE源代码',
`glue_remark` varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'GLUE备注',
`glue_updatetime` datetime(0) NULL DEFAULT NULL COMMENT 'GLUE更新时间',
`child_jobid` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '子任务ID,多个逗号分隔',
`trigger_status` tinyint(4) NOT NULL DEFAULT 0 COMMENT '调度状态:0-停止,1-运行',
`trigger_last_time` bigint(13) NOT NULL DEFAULT 0 COMMENT '上次调度时间',
`trigger_next_time` bigint(13) NOT NULL DEFAULT 0 COMMENT '下次调度时间',
`job_json` text CHARACTER SET utf8 COLLATE utf8_general_ci NULL COMMENT 'datax运行脚本',
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 7 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;


-- ----------------------------
-- Table structure for job_jdbc_datasource
-- ----------------------------
DROP TABLE IF EXISTS `job_jdbc_datasource`;
CREATE TABLE `job_jdbc_datasource` (
`id` bigint(20) NOT NULL AUTO_INCREMENT COMMENT '自增主键',
`datasource_name` varchar(200) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '数据源名称',
`datasource_group` varchar(200) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT 'Default' COMMENT '数据源分组',
`jdbc_username` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '用户名',
`jdbc_password` varchar(200) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '密码',
`jdbc_url` varchar(500) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT 'jdbc url',
`jdbc_driver_class` varchar(200) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'jdbc驱动类',
`status` tinyint(1) NOT NULL DEFAULT 1 COMMENT '状态:0删除 1启用 2禁用',
`create_by` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '创建人',
`create_date` datetime(0) NULL DEFAULT CURRENT_TIMESTAMP(0) COMMENT '创建时间',
`update_by` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '更新人',
`update_date` datetime(0) NULL DEFAULT NULL COMMENT '更新时间',
`comments` varchar(1000) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '备注',
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 6 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci COMMENT = 'jdbc数据源配置' ROW_FORMAT = Dynamic;


-- ----------------------------
-- Table structure for job_lock
-- ----------------------------
DROP TABLE IF EXISTS `job_lock`;
CREATE TABLE `job_lock` (
`lock_name` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '锁名称',
PRIMARY KEY (`lock_name`) USING BTREE
) ENGINE = InnoDB CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;

-- ----------------------------
-- Records of job_lock
-- ----------------------------
INSERT INTO `job_lock` VALUES ('schedule_lock');

-- ----------------------------
-- Table structure for job_log
-- ----------------------------
DROP TABLE IF EXISTS `job_log`;
CREATE TABLE `job_log` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`job_group` int(11) NOT NULL COMMENT '执行器主键ID',
`job_id` int(11) NOT NULL COMMENT '任务,主键ID',
`job_desc` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`executor_address` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '执行器地址,本次执行的地址',
`executor_handler` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '执行器任务handler',
`executor_param` varchar(512) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '执行器任务参数',
`executor_sharding_param` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '执行器任务分片参数,格式如 1/2',
`executor_fail_retry_count` int(11) NULL DEFAULT 0 COMMENT '失败重试次数',
`trigger_time` datetime(0) NULL DEFAULT NULL COMMENT '调度-时间',
`trigger_code` int(11) NOT NULL COMMENT '调度-结果',
`trigger_msg` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL COMMENT '调度-日志',
`handle_time` datetime(0) NULL DEFAULT NULL COMMENT '执行-时间',
`handle_code` int(11) NOT NULL COMMENT '执行-状态',
`handle_msg` text CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL COMMENT '执行-日志',
`alarm_status` tinyint(4) NOT NULL DEFAULT 0 COMMENT '告警状态:0-默认、1-无需告警、2-告警成功、3-告警失败',
`process_id` varchar(20) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'datax进程Id',
`max_id` bigint(20) NULL DEFAULT NULL COMMENT '增量表max id',
PRIMARY KEY (`id`) USING BTREE,
INDEX `I_trigger_time`(`trigger_time`) USING BTREE,
INDEX `I_handle_code`(`handle_code`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 0 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;

-- ----------------------------
-- Table structure for job_log_report
-- ----------------------------
DROP TABLE IF EXISTS `job_log_report`;
CREATE TABLE `job_log_report` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`trigger_day` datetime(0) NULL DEFAULT NULL COMMENT '调度-时间',
`running_count` int(11) NOT NULL DEFAULT 0 COMMENT '运行中-日志数量',
`suc_count` int(11) NOT NULL DEFAULT 0 COMMENT '执行成功-日志数量',
`fail_count` int(11) NOT NULL DEFAULT 0 COMMENT '执行失败-日志数量',
PRIMARY KEY (`id`) USING BTREE,
UNIQUE INDEX `i_trigger_day`(`trigger_day`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 28 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;

-- ----------------------------
-- Records of job_log_report
-- ----------------------------
INSERT INTO `job_log_report` VALUES (20, '2019-12-07 00:00:00', 0, 0, 0);
INSERT INTO `job_log_report` VALUES (21, '2019-12-10 00:00:00', 77, 52, 23);
INSERT INTO `job_log_report` VALUES (22, '2019-12-11 00:00:00', 9, 2, 11);
INSERT INTO `job_log_report` VALUES (23, '2019-12-13 00:00:00', 9, 48, 74);
INSERT INTO `job_log_report` VALUES (24, '2019-12-12 00:00:00', 10, 8, 30);
INSERT INTO `job_log_report` VALUES (25, '2019-12-14 00:00:00', 78, 45, 66);
INSERT INTO `job_log_report` VALUES (26, '2019-12-15 00:00:00', 24, 76, 9);
INSERT INTO `job_log_report` VALUES (27, '2019-12-16 00:00:00', 23, 85, 10);

-- ----------------------------
-- Table structure for job_logglue
-- ----------------------------
DROP TABLE IF EXISTS `job_logglue`;
CREATE TABLE `job_logglue` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`job_id` int(11) NOT NULL COMMENT '任务,主键ID',
`glue_type` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'GLUE类型',
`glue_source` mediumtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL COMMENT 'GLUE源代码',
`glue_remark` varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT 'GLUE备注',
`add_time` datetime(0) NULL DEFAULT NULL,
`update_time` datetime(0) NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 1 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;

-- ----------------------------
-- Table structure for job_registry
-- ----------------------------
DROP TABLE IF EXISTS `job_registry`;
CREATE TABLE `job_registry` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`registry_group` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
`registry_key` varchar(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
`registry_value` varchar(191) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
`update_time` datetime(0) NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE,
INDEX `i_g_k_v`(`registry_group`, `registry_key`, `registry_value`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 26 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;



-- ----------------------------
-- Table structure for job_user
-- ----------------------------
DROP TABLE IF EXISTS `job_user`;
CREATE TABLE `job_user` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`username` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '账号',
`password` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '密码',
`role` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '角色:0-普通用户、1-管理员',
`permission` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '权限:执行器ID列表,多个逗号分割',
PRIMARY KEY (`id`) USING BTREE,
UNIQUE INDEX `i_username`(`username`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 10 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;

-- ----------------------------
-- Records of job_user
-- ----------------------------
INSERT INTO `job_user` VALUES (1, 'admin', '$2a$10$2KCqRbra0Yn2TwvkZxtfLuWuUP5KyCWsljO/ci5pLD27pqR3TV1vy', 'ROLE_ADMIN', NULL);



/**
v2.1.1脚本更新
*/
ALTER TABLE `job_info`
ADD COLUMN `replace_param` VARCHAR(100) NULL DEFAULT NULL COMMENT '动态参数' AFTER `job_json`,
ADD COLUMN `jvm_param` VARCHAR(200) NULL DEFAULT NULL COMMENT 'jvm参数' AFTER `replace_param`,
ADD COLUMN `time_offset` INT(11) NULL DEFAULT '0'COMMENT '时间偏移量' AFTER `jvm_param`;
/**
增量改版脚本更新
*/
ALTER TABLE `job_info` DROP COLUMN `time_offset`;
ALTER TABLE `job_info`
ADD COLUMN `inc_start_time` DATETIME NULL DEFAULT NULL COMMENT '增量初始时间' AFTER `jvm_param`;

-- ----------------------------
-- Table structure for job_template
-- ----------------------------
DROP TABLE IF EXISTS `job_template`;
CREATE TABLE `job_template` (
`id` int(11) NOT NULL AUTO_INCREMENT,
`job_group` int(11) NOT NULL COMMENT '执行器主键ID',
`job_cron` varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '任务执行CRON',
`job_desc` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL,
`add_time` datetime(0) NULL DEFAULT NULL,
`update_time` datetime(0) NULL DEFAULT NULL,
`user_id` int(11) NOT NULL COMMENT '修改用户',
`alarm_email` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '报警邮件',
`executor_route_strategy` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '执行器路由策略',
`executor_handler` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '执行器任务handler',
`executor_param` varchar(512) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '执行器参数',
`executor_block_strategy` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '阻塞处理策略',
`executor_timeout` int(11) NOT NULL DEFAULT 0 COMMENT '任务执行超时时间,单位秒',
`executor_fail_retry_count` int(11) NOT NULL DEFAULT 0 COMMENT '失败重试次数',
`glue_type` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT 'GLUE类型',
`glue_source` mediumtext CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL COMMENT 'GLUE源代码',
`glue_remark` varchar(128) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'GLUE备注',
`glue_updatetime` datetime(0) NULL DEFAULT NULL COMMENT 'GLUE更新时间',
`child_jobid` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '子任务ID,多个逗号分隔',
`trigger_last_time` bigint(13) NOT NULL DEFAULT 0 COMMENT '上次调度时间',
`trigger_next_time` bigint(13) NOT NULL DEFAULT 0 COMMENT '下次调度时间',
`job_json` text CHARACTER SET utf8 COLLATE utf8_general_ci NULL COMMENT 'datax运行脚本',
`jvm_param` varchar(200) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'jvm参数',
`project_id` int(11) NULL DEFAULT NULL COMMENT '所属项目Id',
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 22 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;

/**
添加数据源字段
*/
ALTER TABLE `job_jdbc_datasource`
ADD COLUMN `datasource` VARCHAR(45) NOT NULL COMMENT '数据源' AFTER `datasource_name`;

/**
添加分区字段
*/
ALTER TABLE `job_info`
ADD COLUMN `partition_info` VARCHAR(100) NULL DEFAULT NULL COMMENT '分区信息' AFTER `inc_start_time`;

/**
2.1.1版本新增----------------------------------------------------------------------------------------------
*/

/**
最近一次执行状态
*/
ALTER TABLE `job_info`
ADD COLUMN `last_handle_code` INT(11) NULL DEFAULT '0' COMMENT '最近一次执行状态' AFTER `partition_info`;

/**
zookeeper地址
*/
ALTER TABLE `job_jdbc_datasource`
ADD COLUMN `zk_adress` VARCHAR(200) NULL DEFAULT NULL AFTER `jdbc_driver_class`;

ALTER TABLE `job_info`
CHANGE COLUMN `executor_timeout` `executor_timeout` INT(11) NOT NULL DEFAULT '0' COMMENT '任务执行超时时间,单位分钟' ;

/**
用户名密码改为非必填
*/
ALTER TABLE `job_jdbc_datasource`
CHANGE COLUMN `jdbc_username` `jdbc_username` VARCHAR(100) CHARACTER SET 'utf8mb4' NULL DEFAULT NULL COMMENT '用户名' ,
CHANGE COLUMN `jdbc_password` `jdbc_password` VARCHAR(200) CHARACTER SET 'utf8mb4' NULL DEFAULT NULL COMMENT '密码' ;
/**
添加mongodb数据库名字段
*/
ALTER TABLE `job_jdbc_datasource`
ADD COLUMN `database_name` VARCHAR(45) NULL DEFAULT NULL COMMENT '数据库名' AFTER `datasource_group`;
/**
添加执行器资源字段
*/
ALTER TABLE `job_registry`
ADD COLUMN `cpu_usage` DOUBLE NULL AFTER `registry_value`,
ADD COLUMN `memory_usage` DOUBLE NULL AFTER `cpu_usage`,
ADD COLUMN `load_average` DOUBLE NULL AFTER `memory_usage`;

-- ----------------------------
-- Table structure for job_permission
-- ----------------------------
DROP TABLE IF EXISTS `job_permission`;
CREATE TABLE `job_permission` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT '主键',
`name` varchar(50) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NOT NULL COMMENT '权限名',
`description` varchar(11) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT '权限描述',
`url` varchar(255) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`pid` int(11) NULL DEFAULT NULL,
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 3 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;



ALTER TABLE `job_info`
ADD COLUMN `replace_param_type` varchar(255) NULL COMMENT '增量时间格式' AFTER `last_handle_code`;


ALTER TABLE `job_info`
ADD COLUMN `project_id` int(11) NULL COMMENT '所属项目id' AFTER `job_desc`;

ALTER TABLE `job_info`
ADD COLUMN `reader_table` VARCHAR(255) NULL COMMENT 'reader表名称' AFTER `replace_param_type`,
ADD COLUMN `primary_key` VARCHAR(50) NULL COMMENT '增量表主键' AFTER `reader_table`,
ADD COLUMN `inc_start_id` VARCHAR(20) NULL COMMENT '增量初始id' AFTER `primary_key`,
ADD COLUMN `increment_type` TINYINT(4) NULL COMMENT '增量类型' AFTER `inc_start_id`,
ADD COLUMN `datasource_id` BIGINT(11) NULL COMMENT '数据源id' AFTER `increment_type`;

CREATE TABLE `job_project` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'key',
`name` varchar(100) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL COMMENT 'project name',
`description` varchar(200) CHARACTER SET utf8mb4 COLLATE utf8mb4_general_ci NULL DEFAULT NULL,
`user_id` int(11) NULL DEFAULT NULL COMMENT 'creator id',
`flag` tinyint(4) NULL DEFAULT 1 COMMENT '0 not available, 1 available',
`create_time` datetime(0) NULL DEFAULT CURRENT_TIMESTAMP(0) COMMENT 'create time',
`update_time` datetime(0) NULL DEFAULT CURRENT_TIMESTAMP(0) COMMENT 'update time',
PRIMARY KEY (`id`) USING BTREE
) ENGINE = InnoDB AUTO_INCREMENT = 1 CHARACTER SET = utf8mb4 COLLATE = utf8mb4_general_ci ROW_FORMAT = Dynamic;


ALTER TABLE `job_info`
CHANGE COLUMN `author` `user_id` INT(11) NOT NULL COMMENT '修改用户' ;

ALTER TABLE `job_info`
CHANGE COLUMN `increment_type` `increment_type` TINYINT(4) NULL DEFAULT 0 COMMENT '增量类型' ;

配置

  1. 修改文件 datax-web/modules/datax-admin/conf/application.yml,修改网页端口号、用户名、密码以及数据库连接地址和数据库名。
1
2
3
4
5
6
7
8
server:
port: 8020 # 网页端口号
spring:
datasource:
username: xxxxx # 数据库用户名
password: xxxxxxxx # 数据库密码
url: jdbc:mysql://192.169.54.xxx:xxxx/datax_web?serverTimezone=Asia/Shanghai&useLegacyDatetimeCode=false&useSSL=false&nullNamePatternMatchesAll=true&useUnicode=true&characterEncoding=UTF-8
driver-class-name: com.mysql.jdbc.Driver
  1. 修改 datax-web/modules/datax-admin/conf/bootstrap.properties
1
2
3
4
5
DB_HOST=192.169.54.xxx # 数据库地址
DB_PORT=xxxx # 数据库端口
DB_USERNAME=xxxxx # 数据库用户名
DB_PASSWORD=xxxxxxxx # 数据库密码
DB_DATABASE=datax_web # 库名
  1. 修改 datax-web/modules/datax-admin/bin/env.properties
1
2
3
4
5
WEB_LOG_PATH=${BIN}/../logs
WEB_CONF_PATH=${BIN}/../conf

DATA_PATH=${BIN}/../data
SERVER_PORT=8020 # 网页端口号
  1. 修改datax-web/modules/datax-executor/conf/application.yml
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
server:
port: 8022 # 网页端口号

logging:
config: classpath:logback.xml
path: ${data.path}/applogs/executor/jobhandler # 日志路径

datax:
job:
admin:
addresses: http://127.0.0.1:8020 # 网页地址端口
executor:
appname: datax-executor
ip:
port: 9999 # 执行器端口号,多个executor的话需要配置不同的端口号
logpath: ${data.path}/applogs/executor/jobhandler
logretentiondays: 30
accessToken:

executor:
jsonpath: /opt/softwares/datax/script # 脚本文件存储地址

pypath: /opt/softwares/datax/bin/datax.py # DataX安装路径
  1. 修改 datax-web/modules/datax-executor/bin/env.properties
1
2
3
4
5
6
7
8
9
10
11
12
13
SERVICE_LOG_PATH=${BIN}/../logs
SERVICE_CONF_PATH=${BIN}/../conf
DATA_PATH=${BIN}/../data

JSON_PATH=${BIN}/../json

EXECUTOR_PORT=9999 # 执行器端口号

DATAX_ADMIN_PORT=8020 # 网页端口号

PYTHON_PATH=/opt/softwares/datax/bin/datax.py # DataX安装路径

SERVER_PORT=9504 # 服务端口号

启动

1
2
3
4
5
6
7
8
cd /home/centos/datax-web/bin

# 同时启动控制器和执行器
./start-all.sh

# 如果想单独启动,则使用命令
./start.sh -m datax-admin
./start.sh -m datax-executor

注意:datax-admin 启动较慢,如果 datax-executor 先启动了但是无法获取与尚未完全启动的 datax-admin 通讯可能会报错,但无需处理。datax-executor 每会隔三十秒重新通讯直至成功。如果不想看到报错也可以分开启动,当 datax-admin 完全启动后在启动 datax-executor。

启动成功如下:
image.png

image.png

可以在 datax-web/modules/datax-admin/bin/console.outdatax-web/modules/datax-executor/bin/console.out 两个文件中查看启动控制台输出日志.

登录访问

浏览器中输入 http://ip:网页端口号/index.html ,访问 Datax-Web 网页。

  • 默认用户名:admin
  • 默认密码:123456

可以在 datax-web/modules/datax-admin/bin/console.out 中查看具体信息。

1
2
3
4
5
6
7
8
9
10
11
12
13
21:16:16.769 admin [main] INFO  o.a.c.h.Http11NioProtocol - Starting ProtocolHandler ["http-nio-8020"]
21:16:16.796 admin [main] INFO o.s.b.w.e.t.TomcatWebServer - Tomcat started on port(s): 8020 (http) with context path ''
21:16:16.799 admin [main] INFO c.w.d.a.DataXAdminApplication - Started DataXAdminApplication in 233.764 seconds (JVM running for 234.493)
21:16:18.000 admin [datax-web, admin JobScheduleHelper#scheduleThread] INFO c.w.d.a.c.t.JobScheduleHelper - >>>>>>>>> init datax-web admin scheduler success.
21:16:47.842 admin [main] INFO c.w.d.a.DataXAdminApplication - Access URLs:
----------------------------------------------------------
Local-API: http://127.0.0.1:8020/doc.html
External-API: http://192.169.54.74:8020/doc.html
web-URL: http://127.0.0.1:8020/index.html
----------------------------------------------------------
21:17:04.213 admin [http-nio-8020-exec-1] INFO o.a.c.c.C.[.[.[/] - Initializing Spring DispatcherServlet 'dispatcherServlet'
21:17:04.213 admin [http-nio-8020-exec-1] INFO o.s.w.s.DispatcherServlet - Initializing Servlet 'dispatcherServlet'
21:17:04.225 admin [http-nio-8020-exec-1] INFO o.s.w.s.DispatcherServlet - Completed initialization in 12 ms

image.png

注意:必须要 /index.html 否则无法正常访问。

参考

  1. Datax 与 Datax-Web 安装部署_dataxweb安装-CSDN博客

DataX & DataWeb 安装记录
https://hexo.leelurker.com/posts/64530
作者
LeeLurker
发布于
2024年7月8日
许可协议