
lose weight 💪🏻


If you really want to do it you can! Just keep on trying your best and practice whenever you have time.

Diligence is not a race against time; it is the steady drip of water that wears through rock.

Set your mind to it and you can do it!

Plan Time Topic Level2
1. 9:00~10:00 Career SparkSQL
Work BI
Real-Time Data Warehouse / Flink
1. Covid Covid Covid
1. 8:10~9:00 data warehouse SQL / BI / Spark
2. 9:10~10:00 project
1. 6:30~7:30 English 1.1 IELTS Writing (Morning)
1.2 EF English (Evening)
2. 7:40~8:10 Monkey SQL (猴子SQL) 2.1 SQL Cartesian product /kɑːˈtiːzɪən,kɑːˈtiːʒ(ə)n/
3. 8:20~8:50 2022 leetcode 3.1 binary-search
3.2 dfs + stack
3.3 dynamic programming
3.4 sliding window & hash
4. 9:00~9:50 spark basic 4.1 mr vs spark (4)
4.2 rdd / dataframe / dataset
4.3 rdd operations - transformation + action
4.4 cache + persist
4.5 spark join

1. SQL

No. Question Answer
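1. ✅SQL: How to find duplicate data? group by column having count(column) > 1 (use > n for values that appear more than n times)
2. ✅SQL: How to find the Nth highest value? order by column desc limit N-1, 1 (skip N-1 rows, take 1)
3. ✅SQL: How to find data that is not in another table? t1 left join t2, where t2.field IS NULL
4. ✅SQL: How to compare dates?
197. Rising Temperature
self-join + datediff

DATEDIFF(w1.recordDate, w2.recordDate) = 1 AND w1.Temperature > w2.Temperature;
5. ✅SQL: Number and share of people whose average score across subjects is above 80 sum(case when … then 1 else 0 end), count(b.id)
join (select avg(score) from t group by id)
6. SQL: Content that appears N times in a row? Method 2: window function, lead, where
7. SQL: Classic top-N problem window function: row_number() over (partition by … order by …)
8. SQL: Interview essential: do you know SQL window functions?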
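
The patterns in the table above can be tried out with a small sketch. It runs the queries through Python's built-in `sqlite3` module; the tables `scores`, `blocked`, and `weather` and their rows are made up for illustration. Two assumptions to note: SQLite has no `DATEDIFF`, so the date comparison uses `julianday()` differences instead, and `ROW_NUMBER()` needs SQLite 3.25 or newer.

```python
import sqlite3

# In-memory database with hypothetical sample data (not from the notes).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE scores (id INTEGER, name TEXT, dept TEXT, score INTEGER);
INSERT INTO scores VALUES
  (1, 'ann', 'bi', 90), (2, 'bob', 'bi', 80), (3, 'cat', 'bi', 70),
  (4, 'dan', 'dw', 85), (5, 'ann', 'dw', 60);
CREATE TABLE blocked (name TEXT);
INSERT INTO blocked VALUES ('bob');
CREATE TABLE weather (id INTEGER, recordDate TEXT, temperature INTEGER);
INSERT INTO weather VALUES
  (1, '2022-01-01', 10), (2, '2022-01-02', 25), (3, '2022-01-03', 20);
""")


def q(sql, params=()):
    """Run a query and return all rows as a list of tuples."""
    return conn.execute(sql, params).fetchall()


# 1. Duplicates: group by the column and keep groups with more than one row.
duplicates = q("SELECT name FROM scores GROUP BY name HAVING COUNT(name) > 1")

# 2. Nth highest: sort descending, skip n-1 rows, take one.
n = 2
nth_highest = q(
    "SELECT DISTINCT score FROM scores ORDER BY score DESC LIMIT 1 OFFSET ?",
    (n - 1,),
)

# 3. Rows not in another table: LEFT JOIN, then keep rows with no match.
#    The check must be IS NULL; "= NULL" never evaluates to true.
not_blocked = q("""
    SELECT DISTINCT s.name
    FROM scores s LEFT JOIN blocked b ON s.name = b.name
    WHERE b.name IS NULL
    ORDER BY s.name
""")

# 4. Date comparison (Rising Temperature): self-join on "one day apart".
rising = q("""
    SELECT w1.id
    FROM weather w1 JOIN weather w2
      ON julianday(w1.recordDate) - julianday(w2.recordDate) = 1
    WHERE w1.temperature > w2.temperature
""")

# 5. Top-N per group: number the rows inside each partition, then filter.
top2_per_dept = q("""
    SELECT dept, name, score FROM (
        SELECT dept, name, score,
               ROW_NUMBER() OVER (PARTITION BY dept ORDER BY score DESC) AS rn
        FROM scores
    ) WHERE rn <= 2
    ORDER BY dept, rn
""")
```

In MySQL, pattern 2 is usually written as `LIMIT N-1, 1` (offset first, row count second) and pattern 4 with `DATEDIFF`, as in the table.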


2. Data Warehouse BI


First few minutes of self-introduction

What is the main problem a data warehouse is meant to solve?

For data from MySQL and other business systems, it:

💘① ingests the data and stores its history;
💘② cleans and processes it (ETL);
💘③ manages it effectively through layered construction;
💘④ fits the data needs of the business systems.

In the end, it provides a database that meets the data usage needs of business scenarios.


What are the benefits of the hierarchical (layered) design of a data warehouse?

Data warehouse layering is the design of data:

💘① from disorder to order
💘② from detail to summary
💘③ from summary to application.

The main purpose is to:

💘① improve the efficiency of data use
💘② reduce repeated development (each layer has a different granularity, so a new application-layer table can be built directly on the existing summary layer)
💘③ make problems easy to locate
💘④ unify data calibers (metric definitions), among other issues.
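
A minimal sketch of the layering idea, assuming four hypothetical tables (`raw_orders`, `detail_orders`, `summary_daily_orders`, `app_best_day`) and made-up rows. It uses Python's built-in `sqlite3` only to show the flow from disorder to order, detail to summary, and summary to application; a real warehouse would use its own engine and naming scheme.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Raw layer: data as it arrives from the business system (dirty, duplicated).
CREATE TABLE raw_orders (order_id INTEGER, user_id INTEGER, amount TEXT, order_date TEXT);
INSERT INTO raw_orders VALUES
  (1, 10, '20.5', '2022-06-01'),
  (2, 11, 'bad',  '2022-06-01'),
  (3, 10, '30',   '2022-06-02'),
  (3, 10, '30',   '2022-06-02');

-- Detail layer: from disorder to order (drop bad amounts, de-duplicate, cast types).
CREATE TABLE detail_orders AS
SELECT DISTINCT order_id, user_id, CAST(amount AS REAL) AS amount, order_date
FROM raw_orders
WHERE amount GLOB '[0-9]*';

-- Summary layer: from detail to summary (one row per day).
CREATE TABLE summary_daily_orders AS
SELECT order_date, COUNT(*) AS order_cnt, SUM(amount) AS total_amount
FROM detail_orders
GROUP BY order_date;

-- Application layer: built from the summary layer, not from raw data again.
CREATE TABLE app_best_day AS
SELECT order_date, total_amount
FROM summary_daily_orders
ORDER BY total_amount DESC
LIMIT 1;
""")

summary = conn.execute(
    "SELECT * FROM summary_daily_orders ORDER BY order_date"
).fetchall()
best_day = conn.execute("SELECT * FROM app_best_day").fetchall()
```

The application table reuses the existing summary table, which is the "reduce repeated development" point: a second report at daily granularity would start from `summary_daily_orders` as well.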

Convenient data lineage tracking: when the data in an application-layer table has a problem, lineage tracking lets us quickly locate the tables it depends on. Because the hierarchy is clear, everything can be traced easily; without a hierarchy, the dependencies look like a tangled spider web.
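
Lineage tracking can be sketched as a walk over a dependency graph. The `LINEAGE` mapping, its table names, and the `upstream_tables` helper below are hypothetical, written only to illustrate the idea; real platforms derive lineage from job metadata or SQL parsing.

```python
from collections import deque

# Hypothetical lineage: each table maps to the tables it is built from.
LINEAGE = {
    "app_best_day": ["summary_daily_orders"],
    "summary_daily_orders": ["detail_orders"],
    "detail_orders": ["raw_orders"],
    "raw_orders": [],
}


def upstream_tables(table, lineage):
    """Return every upstream table of `table`, nearest first (breadth-first)."""
    seen = set()
    order = []
    queue = deque(lineage.get(table, []))
    while queue:
        current = queue.popleft()
        if current in seen:
            continue  # a table shared by several branches is reported once
        seen.add(current)
        order.append(current)
        queue.extend(lineage.get(current, []))
    return order


# When app_best_day looks wrong, check its upstream tables in this order.
trace = upstream_tables("app_best_day", LINEAGE)
```

With clean layers the trace is a short chain from the application table back to the raw data; without layers the same walk can fan out across many unrelated tables.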

  1. What is a topic in a data warehouse? What problem does it solve?

A topic in a data warehouse is a way to:

classify and abstract, at a high level, the business meaning of data and the requirements placed on it.

For example: 💘① user topic, 💘② order topic, 💘③ event topic, 💘④ financial topic, etc.

The main problem it solves is sorting data into categories, which makes the data easier for the business to use and easier for the warehouse to process according to data requirements.

What points do you consider in data modeling? (The interviewer then gives you a random business scenario and asks how you would design the model for it.)

Pick the project that impressed you the most, describe it, and explain why it impressed you.


What is the largest amount of data you have processed, and how did you optimize when you ran into performance problems?

Your understanding of the data center, and the difference between the data warehouse and the data lake

There is little difference between the two at the practical level;
it is just that the former carries higher strategic expectations at the conceptual level /kənˈsɛptʃʊəl/

data-center vs data-warehouse: The green highlight is the difference.

A data warehouse is mainly defined in terms of BI, but in real-world scenarios /sɪˈnɑːrɪəʊ/

it is not only used for reports: it already contains user_profile data and outputs data to business_systems.

The main process of MapReduce: what does the shuffle do in the map stage and in the reduce stage?

The difference between SORT BY and ORDER BY

The difference between bucketing and partitioning, and the mechanism behind each

Talk about your understanding of metadata management and data asset management

What do you think are your strengths and weaknesses for this position?

Talk about your understanding of the skills required for this position. If you took this position, what would your work plan be for the next six months?

✨1️⃣ SQL, Python
✨2️⃣ Spark, Hadoop, Hive, MPP, Flink
✨3️⃣ Data Warehouse BI (methodology: dimensional modeling, data governance, etc.)
✨4️⃣ Business knowledge
✨5️⃣ Computer basics

work ideas:

  1. platform & tools - DDP/DAMP/US (data dev platform / data assets management platform / unified scheduler system)
  2. business model
  3. data process, data flow…

Based on your understanding of traditional data warehouses, what kind of business has real-time requirements?

3. Spark

  1. history / advantages of spark

  2. Why Spark is faster than MapReduce

  3. What is the principle behind Spark data skew, and what are the solutions in different scenarios? What are the solutions for data skew under an MPP architecture?

3.1 advanced


How to build an offline data warehouse with SparkSQL

Spark SQL

4. Project

5. Leetcode

6. Bhv

2022.06.18 Movies by Robert V.
《Passengers》2016 by Jennifer Lawrence / Chris Pratt

Pierce Brosnan Hosts the 2019 Breakthrough Prize Ceremony

<LIGHT YEARS AWAY> by G.E.M. It is also the theme song of the movie "Space Traveler" in China.
Lionel Richie: 2019 Breakthrough Prize Ceremony

7. YouTube

Day in the Life of a Tencent Working