Space of a muser

Tricks of shell

Note: some commands are taken from the commandlinefu website.

Combining find with xargs

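Combining find -print0 with xargs -0 is extremely useful when dealing with multiple files whose names may contain spaces: file names are separated by a null byte instead of whitespace, so each name reaches the command as a single argument.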

find genomes/ -name '*.fa' -print0 | \
    xargs -0 -I{} kraken-build --add-to-library {} --db "$DBNAME"
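To see why the null-delimited form matters, here is a small self-contained sketch. The scratch directory and the file names are made up for illustration, and printf stands in for kraken-build, so only the behaviour of find and xargs is being shown.

```shell
# Scratch directory with hypothetical file names, one of which contains a space.
tmpdir=$(mktemp -d)
touch "$tmpdir/e coli.fa" "$tmpdir/yeast.fa"

# Null-delimited: each file name reaches the command as exactly one argument.
safe=$(find "$tmpdir" -name '*.fa' -print0 | xargs -0 -I{} printf '[%s]\n' {} | wc -l)

# Whitespace-delimited: "e coli.fa" is split into two separate arguments.
unsafe=$(find "$tmpdir" -name '*.fa' | xargs -n1 printf '[%s]\n' | wc -l)

echo "with -print0/-0: $safe arguments; without: $unsafe arguments"
rm -rf "$tmpdir"
```

With two files on disk, the first pipeline should hand over two arguments and the second three, which is exactly the kind of silent breakage -print0 and -0 avoid.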

Using NR==FNR in awk

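Filter file2 on a specific field (column 5 here), using the list of values in file1. NR==FNR is true only while awk is reading the first file, so the first block stores every line of file1 as an array key and next skips the remaining rules. The first command keeps the rows of file2 whose column 5 is in the list; the second prints columns 1 and 5 of every row, overwriting column 5 with column 1 where it matches the list.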

awk -F"\t" 'NR==FNR{a[$0];next}($5 in a)' OFS="\t" file1 file2

awk -F"\t" 'NR==FNR{a[$0];next}{if ($5 in a){$5=$1}print $1,$5}' OFS="\t" file1 file2
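A worked run of the first command may make the idiom easier to follow. The two files below are invented for illustration; only the awk program itself comes from the text above.

```shell
tmpdir=$(mktemp -d)
# file1: hypothetical list of wanted values, one per line.
printf 'geneA\ngeneC\n' > "$tmpdir/file1"
# file2: hypothetical tab-separated table; column 5 is matched against file1.
printf 'r1\tx\ty\tz\tgeneA\nr2\tx\ty\tz\tgeneB\nr3\tx\ty\tz\tgeneC\n' > "$tmpdir/file2"

# While NR==FNR, awk is still reading file1: store each line as a key, then skip ahead.
# For file2, the bare condition ($5 in a) prints only rows whose column 5 is a stored key.
kept=$(awk -F"\t" 'NR==FNR{a[$0];next}($5 in a)' OFS="\t" "$tmpdir/file1" "$tmpdir/file2")
printf '%s\n' "$kept"
rm -rf "$tmpdir"
```

Rows r1 and r3 survive, r2 is dropped. One caveat worth knowing: if file1 is empty, NR==FNR also holds while reading file2, so the idiom quietly treats file2 as the list.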

Concatenating several stdouts

(awk ...; paste ...)
cat <(awk...) <(sed...)
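A minimal sketch of the grouping form, with two printf calls standing in for the elided awk and paste commands (they are placeholders, not the original commands). Both groups send their combined output into a single pipe.

```shell
# Subshell grouping: both outputs are concatenated, then sorted together.
combined=$( (printf 'b\n'; printf 'a\n') | sort )

# Brace grouping does the same without starting an extra subshell.
# Note the spaces after { and the semicolon before }.
combined2=$( { printf 'b\n'; printf 'a\n'; } | sort )

printf '%s\n' "$combined"
```

The brace form is usually enough; the parenthesised form is handy when the grouped commands should not change the current shell's variables or working directory.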

Tricks of paste

Converting every n rows into one row with n columns. Each - tells paste to read from stdin, so four dashes join every four input lines into one row.

paste -d"\t" - - - -
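A quick way to try this out is to feed paste a few numbered lines. The input here is made up; any stream works.

```shell
# Eight hypothetical input lines become two rows of four tab-separated columns.
rows=$(printf '%s\n' 1 2 3 4 5 6 7 8 | paste -d"\t" - - - -)
printf '%s\n' "$rows"
```

This is convenient for record formats that span a fixed number of lines, for example putting each four-line FASTQ record on one row before filtering it with awk.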

Using next in awk

We have file1 as follows

day1
task1	success
task2	success
task3	failed
day2
task1	success
task2	failed
task3	failed
day3
task1	success
task2	success
task3	success

but we want this

day1	task1	success
day1	task2	success
day1	task3	failed
day2	task1	success
day2	task2	failed
day2	task3	failed
day3	task1	success
day3	task2	success
day3	task3	success

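Each day line is stored in d and next jumps straight to the following input line, so only task lines reach the print block:
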
awk '/^day/{d=$0;next;}{print d, $1, $2}' OFS="\t" file1

Rename a file from an old name to a new one

mv filename.{old,new}

Retrieve the external IP address

curl ifconfig.me

Remove all files in a folder that do not match the given extensions

shopt -s extglob  # bash needs extglob enabled for the !(...) pattern
rm !(*.foo|*.bar|*.baz)

Create a script from the last command

echo "!!" > last.sh

Reuse the last command

# all parameters of the last command
!*
# the last parameter
!$
# the nth parameter
!:n
# rerun the last command with the first match of foo replaced by boo, usually to correct a typo
^foo^boo
# the last command without its last argument
!:-

Backticks (``) are evil

echo "The date is: $(date +%D)"

# A simple example of command substitution with $() instead of backticks.
# $() has several advantages. First, it nests without any escaping:
program1 $(program2 $(program3 $(program4)))
# versus
program1 `program2 \`program3 \\\`program4\\\`\``
# Second, it is easier to read than trying to tell a backtick from a single quote: `'.
# The only drawback of $() is that it is not fully portable. If your script must run
# on the archaic Bourne shell, or on old versions of the C shell or Korn shell, then
# backticks are appropriate; otherwise we should all get into the habit of using $().
# Your future script maintainers will thank you for the cleaner code.
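The nesting point can be checked with a runnable sketch. The echo commands are placeholders for real programs; the aim is only to compare how much escaping each style needs at three levels.

```shell
# Three levels of nesting with $(): no escaping needed.
nested=$(echo outer $(echo middle $(echo inner)))

# The same with backticks: every extra level needs another layer of backslashes.
legacy=`echo outer \`echo middle \\\`echo inner\\\`\``

echo "$nested"
echo "$legacy"
```

Both variables end up holding the same string, but only one of the two lines is pleasant to edit later.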

Display a block of text with matched range

awk '/start_pattern/,/stop_pattern/' file.txt
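As a small illustration, here is the range pattern applied to an invented six-line input, with BEGIN and END standing in for start_pattern and stop_pattern.

```shell
# Print everything from the line matching BEGIN through the line matching END, inclusive.
block=$(printf 'header\nBEGIN\nline 1\nline 2\nEND\nfooter\n' | awk '/BEGIN/,/END/')
printf '%s\n' "$block"
```

The matching lines themselves are included in the output. If the stop pattern never appears, awk keeps printing until the end of the file.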

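Remove duplicate entries without sorting

# keep the first occurrence of each whole line
awk '!x[$0]++' file
# keep the first line for each distinct value of the first field
awk '!x[$1]++' file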
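The one-liners are terse, so a short run on made-up input may help. The counter x[key]++ evaluates to 0 the first time a key is seen, so the negation is true only for first occurrences, and a true pattern with no action prints the line.

```shell
# Whole-line keys: later repeats of b and a are dropped, original order is kept.
unique=$(printf 'b\na\nb\nc\na\n' | awk '!x[$0]++')

# First-field keys: only the first line for each key survives.
first=$(printf 'k1 x\nk2 y\nk1 z\n' | awk '!x[$1]++')

printf '%s\n' "$unique"
printf '%s\n' "$first"
```

Unlike sort -u, this keeps the input order, at the cost of holding every distinct key in memory.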

Find Duplicate Files (based on size first, then MD5 hash)

find -not -empty -type f -printf "%s\n" | sort -rn | uniq -d | xargs -I{} -n1 find -type f -size {}c -print0 | xargs -0 md5sum | sort | uniq -w32 --all-repeated=separate

fdupes -r .

Define a quick calculator function

? () { echo "$*" | bc -l; }

Recursively remove all empty folders

find . -type d -empty -delete
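To try this safely before pointing it at real data, the sketch below builds a throwaway tree (the directory names are invented) and runs the command inside it.

```shell
tmpdir=$(mktemp -d)
# Hypothetical tree: a/b/c is a chain of empty directories, keep/ holds one file.
mkdir -p "$tmpdir/a/b/c" "$tmpdir/keep"
touch "$tmpdir/keep/file.txt"

# -delete processes the deepest entries first, so c is removed before b, and b before a.
(cd "$tmpdir" && find . -type d -empty -delete)

# List what is left, one path per line turned into a space-separated string.
remaining=$(cd "$tmpdir" && find . -mindepth 1 | sort | tr '\n' ' ')
echo "$remaining"
rm -rf "$tmpdir"
```

The whole a/b/c chain disappears in one pass because each parent becomes empty just before find tests it, while keep/ is left alone.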

Create a quick .bak file

cp file.txt{,.bak}

Search recursively to find a word or phrase in certain file types, such as C code

find . -name "*.[ch]" -exec grep -i -H "search phrase" {} \;

Exclude multiple columns using AWK

awk '{$1=$3=""}1' file
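One detail is easy to miss: the fields are emptied, but their separators stay. The sketch below shows this on a made-up four-column line, followed by one possible way to squeeze the leftover blanks (that second command is an addition, not part of the original one-liner).

```shell
# Blanking $1 and $3 empties the fields but keeps their separators.
blanked=$(echo "a b c d" | awk '{$1=$3=""}1')

# Possible follow-up: $0=$0 re-splits the record, $1=$1 rebuilds it with single separators.
squeezed=$(echo "a b c d" | awk '{$1=$3=""; $0=$0; $1=$1}1')

printf '[%s]\n' "$blanked" "$squeezed"
```

The first result keeps a leading space and a double space where the columns used to be; the second is the compact form.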

List multiple files matching a brace pattern

ls mydir/pg_{cds_snp,snp_diversity,kb_cov,kb_window_snp_count}.txt

Split a FASTA file into single-sequence files using AWK

awk '/^>/{close(file); file="seq_" (++d)} {print > file}' < input.fa
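Here is a self-contained run on a tiny, invented two-record FASTA file. The close() call is an addition to the original one-liner: it keeps only one output file open at a time, which matters for awk implementations that limit the number of open files.

```shell
tmpdir=$(mktemp -d)
# Hypothetical two-record FASTA file.
printf '>seq one\nACGT\nACGT\n>seq two\nGGCC\n' > "$tmpdir/input.fa"

# Each header line closes the previous output and starts a new one;
# every line, header included, is written to the current file.
(cd "$tmpdir" && awk '/^>/{close(file); file="seq_" (++d)} {print > file}' < input.fa)

nfiles=$(ls "$tmpdir" | grep -c '^seq_')
second=$(cat "$tmpdir/seq_2")
echo "$nfiles files; seq_2 holds:"
printf '%s\n' "$second"
rm -rf "$tmpdir"
```

The sketch assumes the input starts with a header line; any text before the first > would have no output file to go to.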