您好,登录后才能下订单哦!
密码登录
登录注册
点击 登录注册 即表示同意《亿速云用户服务条款》
# SQL-Hive中Select From怎么用
## 一、基础语法与概念
### 1.1 SELECT FROM语句的核心作用
SELECT FROM是HiveQL(Hive Query Language)中最基础且最常用的查询语句,主要用于:
- 从Hive表中提取特定列的数据
- 配合WHERE等子句实现条件过滤
- 作为复杂查询的基础构建块
基本语法结构:
```sql
SELECT [ALL | DISTINCT] select_expr, select_expr, ...
FROM table_reference
[WHERE where_condition]
[GROUP BY col_list]
[HAVING having_condition]
[ORDER BY col_list]
[LIMIT number];
-- 查询所有列
SELECT * FROM employees;
-- 查询特定列
SELECT employee_id, name, department
FROM employees;
-- 使用列别名
SELECT employee_id AS id,
name AS "员工姓名",
salary*12 AS annual_salary
FROM employees;
-- 获取不重复的部门列表
SELECT DISTINCT department FROM employees;
-- 多列组合去重
SELECT DISTINCT department, job_title
FROM employees;
-- 基本比较
SELECT * FROM sales
WHERE amount > 1000;
-- 日期比较(Hive需使用特定格式)
SELECT * FROM orders
WHERE order_date >= '2023-01-01';
-- NULL值判断
SELECT * FROM customers
WHERE phone_number IS NOT NULL;
-- AND/OR组合
SELECT * FROM products
WHERE category = 'Electronics'
AND price BETWEEN 500 AND 2000;
-- NOT运算符
SELECT * FROM employees
WHERE NOT department = 'HR';
-- IN运算符
SELECT * FROM students
WHERE grade IN ('A', 'B');
-- LIKE模糊匹配
SELECT * FROM products
WHERE product_name LIKE '%Pro%';
-- RLIKE正则匹配
SELECT * FROM logs
WHERE message RLIKE 'error[0-9]+';
-- 单列排序
SELECT * FROM employees
ORDER BY hire_date DESC;
-- 多列排序
SELECT * FROM sales
ORDER BY region ASC, amount DESC;
-- 返回前10条记录
SELECT * FROM large_table
LIMIT 10;
-- 分页模拟(Hive 2.0+)
SELECT * FROM table
ORDER BY id
LIMIT 10 OFFSET 20;
SELECT
COUNT(*) AS total_employees,
AVG(salary) AS avg_salary,
MAX(salary) AS max_salary,
MIN(salary) AS min_salary,
SUM(salary) AS total_payroll
FROM employees;
-- 按部门统计
SELECT
department,
COUNT(*) AS emp_count,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department;
-- 多列分组
SELECT
department,
job_level,
COUNT(*)
FROM employees
GROUP BY department, job_level;
-- 筛选分组结果
SELECT
department,
AVG(salary) AS avg_salary
FROM employees
GROUP BY department
HAVING avg_salary > 10000;
-- 查询特定分区(显著提升性能)
SELECT * FROM sales
WHERE dt='2023-01-15'
AND region='APAC';
-- 显示分区信息
SHOW PARTITIONS sales;
-- 创建分桶表示例
CREATE TABLE bucketed_users (
id INT,
name STRING
) CLUSTERED BY (id) INTO 4 BUCKETS;
-- 对分桶列使用WHERE条件
SELECT * FROM bucketed_users
WHERE id = 123;
-- 按百分比抽样
SELECT * FROM huge_table
TABLESAMPLE(10 PERCENT);
-- 按数据量抽样
SELECT * FROM huge_table
TABLESAMPLE(100 ROWS);
-- WHERE子句中的子查询
SELECT * FROM employees
WHERE salary > (
SELECT AVG(salary) FROM employees
);
-- FROM子句中的子查询
SELECT dept.avg_salary
FROM (
SELECT department, AVG(salary) AS avg_salary
FROM employees
GROUP BY department
) dept
WHERE dept.avg_salary > 8000;
WITH
dept_stats AS (
SELECT department, COUNT(*) AS emp_count
FROM employees
GROUP BY department
),
high_emp_depts AS (
SELECT department
FROM dept_stats
WHERE emp_count > 50
)
SELECT * FROM employees
WHERE department IN (
SELECT department FROM high_emp_depts
);
– 推荐 SELECT id, name, status FROM large_table;
2. **分区裁剪**:充分利用分区字段过滤
```sql
-- 高效查询
SELECT * FROM partitioned_table
WHERE year=2023 AND month=6;
合理使用并行:
SET hive.exec.parallel=true;
SET hive.exec.parallel.thread.number=8;
JOIN优化:
-- 确保大表在JOIN右侧
SELECT /*+ MAPJOIN(small) */ a.*
FROM big_table a JOIN small_table b
ON a.id = b.id;
数据倾斜:
-- 检查键值分布
SELECT key_column, COUNT(*)
FROM table
GROUP BY key_column
ORDER BY 2 DESC LIMIT 10;
内存配置:
SET hive.map.aggr.hash.percentmemory=0.5;
SET mapreduce.reduce.memory.mb=4096;
执行计划分析:
EXPLN EXTENDED
SELECT count(*) FROM large_table;
查询设计原则:
格式规范建议:
-- 良好的格式化示例
SELECT
e.employee_id,
e.name,
d.department_name
FROM
employees e
JOIN departments d ON e.dept_id = d.dept_id
WHERE
e.status = 'ACTIVE'
AND e.hire_date > '2020-01-01'
ORDER BY
e.salary DESC
LIMIT 100;
监控与调优:
通过掌握这些SELECT FROM的使用技巧,您将能够高效地从Hive中提取所需数据,为大数据分析奠定坚实基础。 “`
免责声明:本站发布的内容(图片、视频和文字)以原创、转载和分享为主,文章观点不代表本网站立场,如果涉及侵权请联系站长邮箱:is@yisu.com进行举报,并提供相关证据,一经查实,将立刻删除涉嫌侵权内容。