Data Coding 101 – Intro To Bash – ep4 (with video)

后端存储 Data36

Data Coding in Bash – this is episode 4! Let’s continue with some basic bash best practices. Today I’ll show you 9 tips, that will make your data coding life in the command line much easier.

This week I’ll also start experimenting with video format. Feel free to feedback in comments if you liked it or not. I first suggest to watch the video, and I’ll list the tools and commands below anyway. So you can jump on your Terminal and try them after watching the vid!

Note: If you are new here, I suggest to start with these previous data coding and/or bash articles:

And here’s the video:

Do Bash + Data Coding Faster!

Bash best practice #1

Tab key:Any time when you start typing in the command line and you hit the TAB key, it automatically extends your typed-in text, so you can spare some characters.

Bash best practice #2

Up/down arrows:They help you to bring back your previous commands. So if you misspelled something or if you want to do a small modification, you don’t have to type the whole command again.

Bash best practice #3

history –» This bash tool prints your recently used commands on your terminal screen. (Pro tip: try it with grep ! Eg. history |grep 'cut' will list you all the commands you have used and contained cut .)

Bash best practice #4

CTRL + R or clear : It cleans your screen. Better for your mind, better for your eyes!

Do Bash + Data Coding Cleaner!

First install CSVKit!

sudo pip install csvkit

Note: First I’ve heard about the CSVKit bash tools in this book: Jeroen Janssen – Data Science at the Command Line .

Then try csvlook and csvstat :

Bash best practice #5

csvlook helps you to see your csv files in a much cleaner, much processable-by-humans format. Eg. here’s a short sample from our flightdelays.csv file:

cat 2007.csv |head |cut -d',' -f12,13,14,15| csvlook

Bash best practice #6

csvstat gives you back some basic statistics about your dataset. Try:

cat demo1.csv | csvstat

(Even median is there! Remember last time how hard it was to get it?)

Note: csvstat is unfortunately not so great with bigger files. So you can’t use it for the flightdelays.csv for example, because that file is way too big.

Bash best practice #7

Enter-enter-enter

This will sound dummy, but I assure you this is a real problem and this is a real solution for it. This is what data scientists do, when they use command line in real life. The problem: when you cat a file on your screen, then another one right after, it’s really hard to find the first row of the second file. The main reason is, that the prompt looks like every other line in your files. If you’ve watched the video above, you have seen, that I colored my prompt. That’s part of the solution, but to make sure I will find the first line of my second file, before the second cat I usually hit 10-15 blank enters.

Right?

Do Bash + Data Coding Smarter!

As I’ve mentionedearlier, these articles of mine are just the beginning and they give you the approach and the basic bash tools. But on the long term I suggest 2 major ways to continuously extend your knowledge.

Bash best practice #8

man –» this is a bash command to learn more about specific command line tools. Eg. try:

man cat –» and you will get into the manual of cat . It works for almost every command. ( man cut , man grep , etc…) The good thing in it is that in each manual you can find a great list for all the options for the given command.

Best practice #9

Googling + StackOverflow

I know this is something I should not even mention, but still: if you have a question, you can be sure that someone has already asked it and another one has already answered it somewhere on the internet. So just don’t forget to use Google first. Most of the answers are on a website called StackOverflow – by the way. If it’s accidentally not there, Stackoverflow is still awesome, because you can also ask questions there. There are many nice and smart people there, from whom you will get an answer fast, so don’t be shy! :wink:

A great book about Data Science at the Command Line

And eventually it’s time to mention a great book about Bash!

Jeroen Janssen – Data Science at the Command Line

As far as I know, this is the one and only book that writes about bash as a tool for data scientists. It comes with many great tips and bash best practices! It assumes that you have some initial Python and/or R knowledge, but if you don’t, I still recommend it. If you have read my Data Coding 101 articles about bash so far, it won’t cause any issue the understand the most of this nice book!

Conclusion

Today we went through some great tools to make your job in bash cleaner, faster and smarter.

Next week I’ll show you two major control flow components of bash: the if command and the while loops. (And they are even more important, as we are gonna use the same logic later in Python and R as well!)

If you don’t want to miss any of my new data contents (articles, videos, e-books, etc), subscribe to myNewsletter!

Cheers,

Tomi Mester

稿源:Data36 (源链) | 关于 | 阅读提示

本站遵循[CC BY-NC-SA 4.0]。如您有版权、意见投诉等问题,请通过eMail联系我们处理。
酷辣虫 » 后端存储 » Data Coding 101 – Intro To Bash – ep4 (with video)

喜欢 (0)or分享给?

专业 x 专注 x 聚合 x 分享 CC BY-NC-SA 4.0

使用声明 | 英豪名录

登录

忘记密码 ?

切换登录

注册