About

I am a father, sports lover, and programmer living in Suzhou, China.

I used to work at Microsoft (2014-2017), AISpeech (2017-2018), a start-up (2018), and Mobvoi (2018-2021). Now I work at Horizon Robotics, focusing on in-vehicle intelligent interaction. I also teach children basketball and coach an adult basketball team in my spare time.

2022 blogging plan

Notes on NLP tools: Hugging Face, spaCy, Rasa.
Notes on voice interface design and a survey of industrial VUI platforms.
Plan to learn about autonomous driving technology.

2021 review

This year I only posted two articles, both about WeNet. One is a short note explaining RPE (Relative Positional Embedding) in WeNet; the other is a very long tutorial on the design and implementation of WeNet. I spent a lot of time and effort on the latter. I hope it helps advanced users and ASR learners better understand WeNet and the implementation of E2E ASR systems. In the second half of 2021, I changed jobs and spent most of my time on in-vehicle voice applications and NLP technology. I still spent no time on deep learning compilers. Maybe I am not so interested in this area.

2021 blogging plan

New trends in E2E ASR.
Deep Learning compiler.

2020 review

I only posted 3 articles on Kaldi nnet3 and ivector, and 2 on CTC decoding.

I wrote some notes on transducer and attention-based ASR models, especially on comparing their streaming methods. But they are still a draft and could not be put online. Maybe this year I will publish them after reading more papers and refining the notes.

Haven’t spent much time on TVM.

2020 blogging plan

Here I list my 2020 blogging plan, which I will review at the end of 2020. This helps me track my focus each year.

Notes about Kaldi. Kaldi is a great open-source ASR tool, which covers every aspect of traditional ASR and shows a consistent code style and detailed comments. Thanks to Dan and the Kaldi contributors for giving such a gift to the world. I mainly plan to learn 3 parts of Kaldi.
- Nnet3. Learn how a neural network engine is designed.
- Decoder. I already know the basic decoder in Kaldi and hope to understand more.
- Ivector. This feature is magical. It helps greatly in some SID and ASR tasks, and I want to understand this model better.
Streaming E2E ASR: CTC/RNN-T/Attention. E2E has dominated the SOTA in all AI applications except ASR. Although it has given SOTA results in special domains or scenarios, it is hard to debug and hard to combine with separate LMs. Google has even used RNN-T in its on-device ASR product.
If I have more time, I hope to learn about deep learning system design, such as PyTorch/TVM.

github / zhihu / linkedin