Introduction to Unix & Linux: Basics-1

A comprehensive beginner's guide to mastering the command line, file systems, and essential Unix/Linux concepts.
The Origins: Unix and the Birth of Linux
1
1970s - Unix Created
Unix was created at Bell Labs by Ken Thompson and Dennis Ritchie. Designed as a multi-user, multitasking operating system with a powerful command line interface, Unix revolutionized computing by introducing concepts that remain fundamental today.
2
1991 - Linux Launched
Linux, a Unix-like operating system, was launched by Linus Torvalds as an open-source kernel. This free alternative to proprietary Unix systems democratized access to powerful computing tools and sparked a global collaboration movement.
3
Today - Everywhere
Today, Linux powers everything from smartphones to supercomputers, showcasing its remarkable adaptability and widespread adoption. From Android devices to the world's fastest supercomputers, Linux has become the backbone of modern computing infrastructure.
Common Terminologies
Before diving into hands-on bioinformatics analysis, understanding these fundamental terms used in Unix-based systems is essential. They form the bedrock for interacting with Linux, command-line tools, and specialized bioinformatics software.
1
OS Foundation
Unix
A family of multitasking, multi-user operating systems. Linux and macOS are notable variants. Most bioinformatics tools are developed for Unix-based platforms, ensuring consistent command behavior across systems like Linux (Ubuntu, CentOS), macOS, and HPC servers.
2
User Interface
Terminal
The application where users input commands. It provides a text-based interface to interact with the computer, passing user input to the shell without executing commands itself. Examples include the Linux Terminal, macOS Terminal, MobaXterm, and Google Colab's terminal.
3
Command Interpreter
Shell
A program that interprets user commands. It acts as a crucial translator between user input and system actions, reading commands typed in the terminal and communicating with the operating system to perform tasks. Without a shell, typed instructions would be meaningless to the computer.
4
Standard Shell
Bash
The Bourne Again SHell is the most commonly used shell interpreter. It's the default on most Linux systems and is widely adopted in bioinformatics pipelines, high-performance computing, and cloud servers. While other shells exist (zsh, sh, fish), Bash remains the industry standard.
Why Unix/Linux? The Power of the Command Line
While Graphical User Interfaces (GUIs) are intuitive for simple tasks, they become inefficient and prone to error for repetitive operations. Imagine extracting specific data from thousands of files manually – a task that would be tedious and time-consuming with a GUI.
The Unix shell, a Command-Line Interface (CLI), automates such repetitive tasks rapidly and accurately. It enables powerful scripting, combining tools into robust pipelines, handling large datasets, and interacting with remote servers and supercomputers, making it an essential skill for advanced computing.
Efficiency & Automation
Text-based shell interface enables powerful automation, scripting, and remote management, dramatically boosting productivity and allowing repetitive tasks to be completed in seconds.
Speed & Precision
Commands are terse but powerful, designed for efficiency and speed in executing complex tasks. What takes dozens of clicks in a GUI can be accomplished with a single line of code.
Case Sensitivity
A critical distinction: 'File' is not the same as 'file' in this environment, requiring precise command input. This strictness ensures accuracy and prevents ambiguous operations.
Scripting Capabilities
Shells like Bash and Zsh provide robust scripting and command chaining capabilities for advanced workflows, enabling you to automate entire processes and analyses.
Advanced Command Line Usage
The Unix shell allows you to run complex operations with just a few commands, interact seamlessly with high-performance computing servers, and write reproducible analysis scripts. Whether you're processing gigabytes of genomic data or managing hundreds of files simultaneously, the command line provides unmatched power and flexibility.
Navigating the File System: Basic Commands
Linux distributions like Ubuntu and Kali already have a terminal available. On Ubuntu, press Ctrl + Alt + T to open it quickly.
Essential Navigation Commands
The basic syntax of a command is: command -options argument. For example, ls -l Documents would list the contents of the Documents directory in a detailed long format.
pwd — Print current directory path
ls — List files and directories; options: -l (detailed), -a (all files including hidden)
cd — Change directory; special shortcuts: .. (up one level), ~ (home directory)
Most commands support a --help option to display usage information, e.g., ls --help
These commands are your compass for navigating the structured world of Unix/Linux file systems, allowing you to move through directories and inspect their contents with ease.
cd /mnt/c/Users/hp/Downloads
Understanding Files and Folders in Ubuntu
The filesystem manages files and directories, organizing data hierarchically like a tree. Files hold information, while directories (or folders) contain files or other directories.
The image above illustrates a typical directory structure, showing user home directories like "larry," "imhotep," and "ubuntu" within the home directory. The home directory itself is located at the root of the filesystem, represented by a single / (slash). This root is the top-most directory from which all other directories branch.
Understanding Your Location
When using the shell, we navigate this hierarchy. Your current location is called the current working directory. To find out where you are, use the pwd command:
pwd
/home/ubuntu
The output shows the path of your current working directory. Here it is /home/ubuntu, the home directory of the user ubuntu, which is the default location when you open a new terminal.
Understanding Paths
This way of specifying locations is called a path:
/ at the start denotes the root directory
home is a folder within the root
Subsequent / characters act as separators between folders
ubuntu is the final folder in this specific path
Note that the / character has two meanings: it represents the root directory when at the beginning of a path, and it acts as a separator within a path.
Getting Your Terminal Ready
To fully participate in the workshop and leverage the power of the command line, ensure your system is equipped with a functional terminal application. Follow the instructions below based on your operating system:
Windows 10/11: MobaXterm
Download the MobaXterm Portable edition.
Unzip the file and copy the folder to your Desktop.
Run MobaXterm_Personal_XX.X.exe.
Allow network access if prompted.
Start a local terminal session.
Install nano by typing apt install nano, then y twice.
Mac OS: Built-in Terminal
Open Spotlight Search (⌘ + space) and type "terminal" to launch.
Optional: For advanced features, consider installing iTerm2.
Linux: Built-in Terminal
Your Linux distribution already includes a terminal.
On Ubuntu, you can typically open it by pressing Ctrl + Alt + T.
Verify nano is installed by typing nano --version. If not, use sudo apt install nano.
Google Colab: Your Web-Based Terminal
Google Colab is a free, web-based Jupyter notebook environment that requires no setup, making it an excellent platform for learning Unix/Linux basics. It's accessible from any device with a browser, streamlining your learning experience.
Why Choose Google Colab?
✅ No Installation Required
Jump straight into coding without the hassle of software setup.
✅ Accessible Anywhere
Work from any device with internet access and a web browser.
✅ Free & Integrated
Completely free with a Google account, and your work saves directly to Google Drive.
✅ Pre-installed Tools
Comes with essential Unix/Linux tools ready to use, plus automatic code execution.
Getting Started with Google Colab
01
Create/Login to Google Account
Visit accounts.google.com/Login to create a new account or log in with your existing Gmail.
02
Access Google Colab
Open your web browser and navigate to colab.research.google.com. Sign in with your Google account credentials.
03
Create a New Notebook
From the Colab welcome screen, click "New notebook" (or go to File → New notebook). This will open a blank Jupyter notebook, ready for your commands.
04
Test Unix/Linux Commands
In the first code cell, type !echo "Hello from Colab!" (note the exclamation mark before the command). Press Ctrl + Enter to run. You should see "Hello from Colab!" as output.
05
Command Format in Colab
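Unix/Linux commands in a Colab code cell must be prefixed with an exclamation mark: ![Unix Command]. Each ! command runs in its own temporary subshell, so a directory change made with !cd does not carry over to the next command; use %cd when you want the change to persist.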
Listing Files and Changing Directories
Listing Files
To view the contents of your current directory, use the ls (listing) command:
ls

Documents    Downloads    Music        Public
Desktop      Movies       Pictures     Templates
This displays all visible files and folders in your current location, giving you an overview of what's available.
Changing Directory
The cd ("change directory") command changes your current working directory. You can specify a directory using an absolute path (starting from the root /), or a relative path (relative to your current directory).
To navigate to a specific path:
cd /content/sample_data/
We can check our current location with pwd. To move up one directory (to the parent directory), use ..:
cd ..
The shell interprets ~ (tilde) at the start of a path as your user's home directory (e.g., /home/ubuntu), making it a quick shortcut to return home.
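The three ways of naming a destination (absolute path, .., and ~) can be tried safely in a throwaway directory. This is a minimal sketch: the sandbox location comes from mktemp -d, and the project/data folder names are made up for illustration.

```shell
# Create a throwaway directory tree so nothing real is touched.
sandbox=$(mktemp -d)
mkdir -p "$sandbox/project/data"

# Absolute path: starts at the root (/), so it works from anywhere.
cd "$sandbox/project/data"
start=$(pwd)

# Relative path: .. means "the parent of where I am now".
cd ..
parent=$(pwd)

# ~ expands to your home directory; a plain cd does the same.
cd ~
home_dir=$(pwd)

echo "started in:  $start"
echo "after cd ..: $parent"
echo "after cd ~:  $home_dir"

# Remove the sandbox again.
rm -r "$sandbox"
```

Running pwd after every cd, as done here, is a good habit while you are still learning where each shortcut takes you.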
Tab Completion: Your Time-Saving Tool
Tab completion is one of the most valuable productivity features in the Unix shell. It helps you avoid typing long file and directory names by automatically completing them for you.
01
Type Partial Name
Start typing part of a filename or directory name
02
Press Tab
Press the Tab ↹ key once
03
Auto-Complete
If the name is unique, the shell completes it automatically
04
See Options
If multiple options exist, press Tab ↹ twice to see all possibilities
Practical Example
For example, if you are in /content and type:
ls sample_da
then press Tab ↹, the shell automatically completes to:
ls sample_data/
Tab completion not only saves time but also prevents typos and helps you discover what files and directories are available. Make it a habit to use Tab ↹ frequently—it will dramatically speed up your workflow!
Managing Files and Directories
mkdir <dir>
Create a new directory to organize your files
touch <file>
Create an empty file or update the timestamp of an existing file
cp <source> <dest>
Copy files or directories to a new location
mv <old> <new>
Move or rename files and directories
rm <file>
Remove files permanently (no recycle bin!)
Practical Example
mkdir my_project
cd my_project
touch report.txt
cp report.txt final_report.txt
mv final_report.txt /content/sample_data/bioinformatics_day1
rm report.txt
Mastering these fundamental commands is crucial for effective file management, enabling you to organize, move, copy, and delete your data efficiently within the Unix/Linux environment. These operations form the backbone of day-to-day file system interactions.
Creating Directories: Step by Step
We now know how to explore files and directories, but how do we create them in the first place? Let's walk through the process of creating and organizing directories effectively.
First, we should see where we are and what we already have. Let's go to the sample_data directory and use ls to see what it contains:
cd /content/sample_data/
ls
anscombe.json                 mnist_test.csv           mnist_train_small.csv
california_housing_test.csv   README.md
california_housing_train.csv
Now, let's create a new directory called bioinformatics_day1 using the mkdir ("make directory") command:
mkdir bioinformatics_day1
The new directory is created in the current working directory. We can verify this with ls:
ls
anscombe.json                 mnist_test.csv
bioinformatics_day1           mnist_train_small.csv
california_housing_test.csv   README.md
california_housing_train.csv
The mkdir command is your primary tool for creating organized directory structures. You can also create nested directories using the -p flag: mkdir -p parent/child/grandchild
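To see what the -p flag changes, the sketch below tries the same nested path with and without it. It runs in a temporary sandbox created with mktemp -d; the parent/child/grandchild names are the illustrative ones from the text above.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

# Without -p, mkdir fails because the directory "parent" does not exist yet.
if mkdir parent/child 2>/dev/null; then
    plain_result="created"
else
    plain_result="failed"
fi

# With -p, every missing directory along the path is created.
mkdir -p parent/child/grandchild

# Running it again is harmless: -p does not complain about existing directories.
mkdir -p parent/child/grandchild

if [ -d parent/child/grandchild ]; then nested="yes"; else nested="no"; fi
echo "mkdir without -p: $plain_result"
echo "nested directories exist: $nested"

# Leave the sandbox before deleting it.
cd /
rm -r "$sandbox"
```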
What's in a File Name? Understanding File Extensions
You may have noticed that all of the files in our data directory are named "something dot something". For example, the name README.txt indicates a plain text file.
The second part of such a name is called the filename extension, and it indicates what type of data the file holds. While Unix/Linux doesn't strictly require extensions to identify file types, they serve as helpful hints for both humans and programs.
.txt
Plain text file containing unformatted text
.csv
Text file with tabular data where columns are separated by commas
.tsv
Similar to CSV but values are separated by tabs
.log
Text file containing messages produced by software while it runs
.pdf
Portable Document Format for formatted documents
.png
Portable Network Graphics image file
Remember: In Unix/Linux, file extensions are conventions, not requirements. The system does not rely on the name to decide what a file is; tools such as the file command identify a file's type by examining its contents. However, using appropriate extensions makes your files more organized and easier to work with.
Moving and Renaming Files
In our data-shell directory we have a file called things.txt, which contains a note of books to read for our thesis. Let's move this file into a thesis_notes directory using the mv ("move") command. The first two commands below create a placeholder things.txt and the thesis_notes directory in case you do not have them yet:
Moving Files
touch things.txt
mkdir thesis_notes
mv things.txt thesis_notes/
The first argument tells mv what we're "moving", while the second is where it's to go. In this case, we're moving things.txt to thesis_notes/. We can check the file has moved there:
ls thesis_notes

things.txt
Renaming Files
This isn't a particularly informative name for our file, so let's change it! Interestingly, we also use the mv command to change a file's name. Here's how we would do it:
mv thesis_notes/things.txt thesis_notes/books.txt
In this case, we are "moving" the file to the same place but with a different name.
Important: Be careful when specifying the target file name, since mv will silently overwrite any existing file with the same name, which could lead to data loss.
The command mv also works with directories, and you can use it to move or rename an entire directory just as you use it to move an individual file.
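The overwrite risk is easy to reproduce in a sandbox. The sketch below also uses the -n ("no clobber") option, which the mv found on Linux and macOS accepts and which skips the move when the target already exists. The file names and contents are made up for illustration.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

echo "draft notes" > notes.txt
echo "important results" > results.txt

# -n refuses to replace an existing target, so results.txt survives.
# (Some mv versions print a message when they skip; it is hidden here.)
mv -n notes.txt results.txt 2>/dev/null
kept=$(cat results.txt)

# A plain mv replaces the target without asking.
mv notes.txt results.txt
overwritten=$(cat results.txt)

echo "after mv -n: $kept"
echo "after mv:    $overwritten"

cd /
rm -r "$sandbox"
```

If you would rather be asked each time, mv -i prompts before overwriting.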
Removing Files and Directories
The Unix command used to remove or delete files is rm ("remove"). For example, let's create a backup directory containing two files and then remove one of them:
mkdir backup
touch backup/ref1.txt backup/ref2.txt
rm backup/ref1.txt
We can confirm the file is gone using ls backup/.
Removing Directories
What if we try to remove the whole backup directory we created in the previous exercise?
rm backup

rm: cannot remove `backup': Is a directory
We get an error. This happens because rm by default only works on files, not directories.
The rm command can remove a directory and all its contents if we use the recursive option -r, and it will do so without any confirmation prompts:
rm -r backup
Deleting Is Forever
The Unix shell doesn't have a trash bin that we can recover deleted files from (though most graphical interfaces to Unix do). Instead, when we delete files, they are unlinked from the file system so that their storage space on disk can be recycled. Tools for finding and recovering deleted files do exist, but there's no guarantee they'll work in any particular situation, since the computer may recycle the file's disk space right away.
Because files deleted from the shell cannot be reliably recovered, rm -r should be used with great caution. Consider adding the interactive option, rm -r -i, which asks for confirmation before each deletion.
To remove empty directories, we can also use the rmdir command. This is a safer option than rm -r, because it will never delete the directory if it contains files, giving us a chance to check whether we really want to delete all its contents.
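The difference between rmdir and rm -r shows up as soon as the directory has something in it. This is a small sketch in a temporary sandbox; backup and ref1.txt are just the example names used above.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir backup
touch backup/ref1.txt

# rmdir only removes empty directories, so this attempt is refused.
if rmdir backup 2>/dev/null; then
    first_try="removed"
else
    first_try="refused"
fi

# Delete the contents deliberately, then rmdir succeeds.
rm backup/ref1.txt
rmdir backup

if [ -d backup ]; then still_there="yes"; else still_there="no"; fi
echo "rmdir on a non-empty directory: $first_try"
echo "backup still exists: $still_there"

cd /
rm -r "$sandbox"
```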
Wildcards: Working with Multiple Files
Wildcards are special characters that can be used to access multiple files at once, dramatically increasing your efficiency when working with many files. The most commonly-used wildcard is *, which is used to match zero or more characters.
1
*.pdb
Matches every file that ends with the '.pdb' extension
2
p*.pdb
Only matches pentane.pdb and propane.pdb, because the 'p' at the front only matches filenames that begin with the letter 'p'
The Question Mark Wildcard
Another common wildcard is ?, which matches any character exactly once. For example:
?ethane.pdb would only match methane.pdb (whereas *ethane.pdb matches both ethane.pdb and methane.pdb)
???ane.pdb matches three characters followed by ane.pdb, giving cubane.pdb ethane.pdb octane.pdb
When the shell sees a wildcard, it expands the wildcard to create a list of matching filenames before running the command that was asked for. As an exception, if a wildcard expression does not match any file, Bash will pass the expression as an argument to the command as it is. For example, typing ls *.pdf in the molecules directory (which does not contain any PDF files) results in an error message that there is no file called *.pdf.
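Because the shell expands wildcards before the command runs, echo is a safe way to preview what a pattern will match. The sketch below recreates six empty files with the molecule names mentioned above in a temporary sandbox; the real course files are assumed to have the same names but are not touched.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
touch cubane.pdb ethane.pdb methane.pdb octane.pdb pentane.pdb propane.pdb

# echo simply prints the list the shell produced after expanding each pattern.
all=$(echo *.pdb)
p_only=$(echo p*.pdb)
one_char=$(echo ?ethane.pdb)
three_chars=$(echo ???ane.pdb)

# No PDF files exist, so the pattern is passed through unchanged.
no_match=$(echo *.pdf)

echo "*.pdb       -> $all"
echo "p*.pdb      -> $p_only"
echo "?ethane.pdb -> $one_char"
echo "???ane.pdb  -> $three_chars"
echo "*.pdf       -> $no_match"

cd /
rm -r "$sandbox"
```

Previewing with echo is especially worthwhile before a destructive command such as rm *.log.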
Navigation Exercise
Starting from /home/amanda/data, which of the following commands could Amanda use to navigate to her home directory (/home/amanda)?
1
cd .
2
cd /
3
cd /home/amanda
4
cd ../..
5
cd ~
6
cd home
7
cd ~/data/..
8
cd
9
cd ..
Correct Options:
3. Yes: This is an example of using the full absolute path
5. Yes: ~ stands for the user's home directory, in this case /home/amanda
7. Yes: Unnecessarily complicated, but correct
8. Yes: Shortcut to go back to the user's home directory
9. Yes: Goes up one level
Viewing and Editing File Contents
Viewing Commands
cat <file> — Display entire file content
head <file> / tail <file> — Show first/last 10 lines by default
more <file> / less <file> — Paginate file content for easier reading
grep <pattern> <file> — Search for text patterns inside files
Editing Tools
For editing, common tools include nano (simple and beginner-friendly) and the more powerful, advanced options like vim and emacs, which are staples for experienced users.
Looking Inside Files
For example, let's take a look at the california_housing_train.csv file in the sample_data directory.
We will start by printing the whole content of the file with the cat command, which stands for "concatenate" (we will see why it's called this way in a little while):
cd sample_data
cat california_housing_train.csv
Sometimes it is useful to look at only the top few lines of a file (especially for very large files). We can do this with the head command:
head california_housing_train.csv
Customizing File Views with Options
By default, head prints the first 10 lines of the file. We can change this using the -n option, followed by a number, for example:
head -n 2 california_housing_train.csv
"longitude","latitude","housing_median_age","total_rooms","total_bedrooms","population","households","median_income","median_house_value"
-114.310000,34.190000,15.000000,5612.000000,1283.000000,1015.000000,472.000000,1.493600,66900.000000
This displays only the first 2 lines, giving you a quick peek at the file's beginning without overwhelming your screen.
Similarly, we can look at the bottom few lines of a file with the tail command:
tail -n 2 california_housing_train.csv
-124.300000,41.800000,19.000000,2672.000000,552.000000,1298.000000,478.000000,1.979700,85800.000000
-124.350000,40.540000,52.000000,1820.000000,300.000000,806.000000,270.000000,3.014700,94600.000000
This is particularly useful when monitoring log files or checking the most recent entries in a dataset.
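If you do not have the housing file at hand, the same behaviour can be checked on a small generated file. In this sketch numbers.txt is a made-up 12-line file written to a temporary directory.

```shell
sandbox=$(mktemp -d)

# Build a 12-line file containing "line 1" through "line 12".
i=1
while [ "$i" -le 12 ]; do
    echo "line $i"
    i=$((i + 1))
done > "$sandbox/numbers.txt"

# -n 2 limits the output to two lines from the start or the end of the file.
first_two=$(head -n 2 "$sandbox/numbers.txt")
last_two=$(tail -n 2 "$sandbox/numbers.txt")

echo "head -n 2:"
echo "$first_two"
echo "tail -n 2:"
echo "$last_two"

rm -r "$sandbox"
```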
Interactive File Browsing
Finally, if we want to open the file and browse through it interactively, we can use the less command:
less california_housing_train.csv
less will open the file in a viewer where you can use ↑ and ↓ to move line-by-line or the Page Up and Page Down keys to move page-by-page. You can exit less by pressing Q (for "quit"). This will bring you back to the console.
The name "less" is a play on the saying "less is more": it is an improved version of the older more command and adds backward navigation through files.
Counting Words, Lines, and Characters
The wc (word count) command is a powerful tool for analyzing text files. It can count lines, words, and characters in one or more files.
# Count lines, words, and characters in the sample file
!echo "Complete word count:"
!wc /content/sample_data/bioinformatics_day1/sample_data.txt
Complete word count:
 10  44 282 /content/sample_data/bioinformatics_day1/sample_data.txt
The output shows three numbers, in this order from left to right: lines, words, and characters, followed by the file name. If you pass several files (for example with the * wildcard), wc prints one row per file and a total at the bottom.
Customizing Word Count Output
Often, we only want to count one of these things, and wc has options for all of them:
-l
Counts lines only
-w
Counts words only
-c
Counts bytes only (the same as characters for plain ASCII text; use -m to count characters)
For example, the following counts only the number of lines in each file:
# Count lines only
!echo "Line count only:"
!wc -l /content/sample_data/bioinformatics_day1/sample_data.txt
Line count only:
10 /content/sample_data/bioinformatics_day1/sample_data.txt
This focused output makes it easy to quickly assess file sizes or compare the number of records across multiple files. The -l option is particularly useful when working with data files where each line represents a record or observation.
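The three counts are easy to verify by hand on a tiny file. The sketch below writes a made-up two-line sample.txt and reads it through standard input with <, which makes wc print only the number and not the file name.

```shell
sandbox=$(mktemp -d)

# Two lines, three words; every character (including each newline) is one byte.
printf 'alpha beta\ngamma\n' > "$sandbox/sample.txt"

lines=$(wc -l < "$sandbox/sample.txt")
words=$(wc -w < "$sandbox/sample.txt")
bytes=$(wc -c < "$sandbox/sample.txt")

# The variables are left unquoted so that padding spaces added by some
# versions of wc are dropped.
echo "lines:" $lines "words:" $words "bytes:" $bytes

rm -r "$sandbox"
```

Counting by hand gives the same result: "alpha beta" plus its newline is 11 bytes and "gamma" plus its newline is 6, for 17 in total.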
Combining and Redirecting Output
Combining Files
Earlier, we said that the cat command stands for "concatenate". This is because this command can be used to concatenate (combine) several files together. For example, if we wanted to combine all PDB files into one:
cat *.pdb
This displays the contents of all PDB files one after another in the terminal.
Redirecting Output
The commands we've been using so far print their output to the terminal. But what if we wanted to save it into a file? We can achieve this by redirecting the output of the command to a file using the > operator.
# List files and save to a file using >
!ls -l /content/sample_data/bioinformatics_day1/*.pdb > /content/sample_data/bioinformatics_day1/pdb_files.txt
!cat /content/sample_data/bioinformatics_day1/pdb_files.txt
Now, the output is not printed to the console, but instead sent to a new file. We can check that the file was created with ls.
The > operator will create a new file or overwrite an existing file. If you want to append to an existing file instead, use >>.
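The difference between > and >> is clearest when both are applied to the same file. A minimal sketch, assuming a scratch file log.txt in a temporary directory:

```shell
sandbox=$(mktemp -d)
out="$sandbox/log.txt"

echo "first run" > "$out"     # creates the file
echo "second run" > "$out"    # overwrites it: "first run" is gone
after_overwrite=$(cat "$out")

echo "third run" >> "$out"    # appends to the end
after_append=$(cat "$out")

echo "after >:"
echo "$after_overwrite"
echo "after >>:"
echo "$after_append"

rm -r "$sandbox"
```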
Sequencing Commands and Shell Scripting
Command Chaining
Unix provides multiple ways to sequence commands:
; — Sequential execution (run regardless of success)
&& — Conditional success (run next only if previous succeeds)
|| — Conditional failure (run next only if previous fails)
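The three operators can be compared side by side by chaining a command that fails (listing a directory that does not exist) and one that succeeds. This is a sketch in a temporary sandbox; missing_dir and results are illustrative names.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

# ; runs the second command no matter what happened to the first.
ls missing_dir 2>/dev/null ; after_semicolon="ran"

# && runs the second command only if the first succeeded.
after_success="skipped"
mkdir results && after_success="ran"

# Here the first command fails, so the part after && is skipped.
after_failure="skipped"
ls missing_dir 2>/dev/null && after_failure="ran"

# || runs the second command only if the first failed.
fallback="skipped"
ls missing_dir 2>/dev/null || fallback="ran"

echo "; after a failure:  $after_semicolon"
echo "&& after a success: $after_success"
echo "&& after a failure: $after_failure"
echo "|| after a failure: $fallback"

cd /
rm -r "$sandbox"
```

A common pattern built from these is mkdir results && cd results, which only changes directory if the directory was actually created.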
Pipes
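Use | to pass the output of one command as input to another. For example, ls -l | grep ".txt" lists files and keeps only the lines that contain .txt. Because an unescaped . in a grep pattern matches any character, grep "\.txt$" is the stricter way to match names ending in .txt.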
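Pipes can be chained, with each command reading what the previous one wrote. The sketch below uses three made-up files in a temporary sandbox and the stricter pattern '\.txt$', which matches only names that end in .txt.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
touch notes.txt todo.txt data.csv

# ls writes one name per line into the pipe; grep keeps names ending in .txt.
txt_files=$(ls | grep '\.txt$')

# A second pipe hands the filtered list to wc -l to count it.
txt_count=$(ls | grep '\.txt$' | wc -l)

echo "$txt_files"
echo "number of .txt files:" $txt_count

cd /
rm -r "$sandbox"
```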
Variables
Use $1, $@ for script inputs, allowing your scripts to accept arguments and become more flexible
Wildcards
Use * to match multiple characters in file names (e.g., rm *.log to delete all log files)
Shell Scripting
Save commands in .sh files and run them with bash script.sh for reproducible workflows
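As a rough illustration of script arguments, the sketch below writes a tiny script to a temporary directory and runs it with bash. The script name greet.sh and the arguments Ada and Linus are invented for the example; $# (the number of arguments) is used alongside $1 and $@.

```shell
sandbox=$(mktemp -d)

# Write the script to disk. Quoting 'EOF' keeps $1, $# and $@ literal in the
# file, so they are only filled in when the script itself runs.
cat > "$sandbox/greet.sh" <<'EOF'
#!/bin/bash
# $1 is the first argument, $# the number of arguments, $@ all of them.
echo "First argument: $1"
echo "Argument count: $#"
for name in "$@"; do
    echo "Hello, $name"
done
EOF

# Run the script with two arguments and capture what it prints.
output=$(bash "$sandbox/greet.sh" Ada Linus)
echo "$output"

rm -r "$sandbox"
```

Keeping "$@" in double quotes preserves arguments that contain spaces, such as file names with blanks in them.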
Summary and Next Steps
Powerful Environment
Unix/Linux offers a robust and flexible environment through its command line interface, providing unmatched control over your computing resources
Efficiency Unlocked
Mastering basic commands streamlines file and system management tasks, transforming hours of manual work into seconds of automated processing
Automation & Beyond
Understanding file formats and scripting opens doors to automation and advanced usage, enabling reproducible research and scalable data analysis
Continue Your Journey
Ready to dive deeper? Explore advanced shell scripting, system administration, and diverse Linux distributions. The command line skills you've learned today form the foundation for powerful computational work in bioinformatics, data science, and beyond.
Start Practicing
Open your terminal and try these commands today! The best way to learn is by doing. Practice these commands regularly, experiment with different options, and don't be afraid to make mistakes—that's how you learn!
Thank You
We hope this guide has empowered you to begin your journey with Unix/Linux and the command line. Remember, every expert was once a beginner—keep practicing, stay curious, and don't hesitate to explore!
Quiz: Practice Exercise 1
Question 1:
For this exercise, make sure you are in the course materials directory:
cd ~/Desktop/data-shell
Make a copy of the sequencing directory named backup.
Hint: Think about which command you use to copy files and directories. Remember that directories require a special option!
Quiz Answer: Exercise 1
Answer:
When copying an entire directory, you will need to use the option -r with the cp command (-r means "recursive").
What Doesn't Work
If we run the command without the -r option, this is what happens:
cp sequencing backup

cp: -r not specified; omitting directory 'sequencing'
This message is already indicating what the problem is. By default, directories (and their contents) are not copied unless we specify the option -r.
The Correct Solution
This would work:
cp -r sequencing backup
Running ls we can see a new folder called backup:
ls

README.txt  backup  books_copy.txt  coronavirus  molecules  sequencing  thesis_notes
The -r (recursive) flag is essential when working with directories. It tells cp to copy the directory and all of its contents, including subdirectories.
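You can reproduce both outcomes without touching the course materials. This sketch builds a small stand-in sequencing directory in a temporary sandbox; the run1 folder and the file name inside it are placeholders.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir -p sequencing/run1
touch sequencing/run1/sampleA_1.fq.gz

# Without -r, cp skips the directory and reports an error (hidden here).
if cp sequencing backup 2>/dev/null; then
    plain="copied"
else
    plain="refused"
fi

# With -r, the directory and everything inside it is copied.
cp -r sequencing backup
if [ -f backup/run1/sampleA_1.fq.gz ]; then nested="copied"; else nested="missing"; fi

echo "cp without -r: $plain"
echo "file inside the copied subdirectory: $nested"

cd /
rm -r "$sandbox"
```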
Quiz: Practice Exercise 2
Question 2: Oral Discussion
For this exercise, make sure you are in the course materials directory:
cd ~/Desktop/data-shell
Part A
What does cp do when given several filenames and a directory name?
mkdir -p backup
cp molecules/cubane.pdb molecules/ethane.pdb backup
Part B
In the example below, what does cp do when given three or more file names?
cp molecules/cubane.pdb molecules/ethane.pdb molecules/methane.pdb
Take a moment to think about these questions before moving to the next card with the answer. Try running these commands in your terminal to observe the behavior!
Quiz Answer: Exercise 2
Answer:
1
Part A: Multiple Files to Directory
If given more than one file name followed by a directory name (i.e., the destination directory must be the last argument), cp copies the files to the named directory. This is the standard way to copy multiple files at once.
2
Part B: Error Case
If given three file names, cp throws an error such as the one below, because it is expecting a directory name as the last argument:
cp: target 'molecules/methane.pdb' is not a directory
The command fails because cp interprets the last argument as the destination, and in this case, it's a file, not a directory.
Key Takeaway: When using cp with multiple source files, the last argument must be a directory where all the files will be copied.
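Both parts of the exercise can be checked in a sandbox. The sketch below creates empty stand-ins for the three molecule files, so the real molecules directory is not involved.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir molecules backup
touch molecules/cubane.pdb molecules/ethane.pdb molecules/methane.pdb

# Part A: the last argument is a directory, so both files are copied into it.
cp molecules/cubane.pdb molecules/ethane.pdb backup
copied=$(ls backup)

# Part B: the last argument is a regular file, so cp refuses (error hidden).
if cp molecules/cubane.pdb molecules/ethane.pdb molecules/methane.pdb 2>/dev/null; then
    three_files="copied"
else
    three_files="refused"
fi

echo "files now in backup:"
echo "$copied"
echo "three file names, no directory: $three_files"

cd /
rm -r "$sandbox"
```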
Quiz: Wildcard Challenge
Question 3:
Change into the molecules directory. Which ls command(s) will produce this output?
ethane.pdb   methane.pdb
1
ls *t*ane.pdb
2
ls *t?ne.*
3
ls *t??ne.pdb
4
ls ethane.*
Remember: The * wildcard matches zero or more characters, while ? matches exactly one character.
Quiz Answer: Wildcard Challenge
Answer:
1
No
This shows all files whose names contain zero or more characters (*) followed by the letter t, then zero or more characters (*) followed by ane.pdb. This gives ethane.pdb methane.pdb octane.pdb pentane.pdb.
2
No
This shows all files whose names start with zero or more characters (*) followed by the letter t, then a single character (?), then ne. followed by zero or more characters (*). This will give us octane.pdb and pentane.pdb but doesn't match anything which ends in thane.pdb.
3
Yes ✓
This fixes the problems of option 2 by matching two characters (??) between t and ne. This correctly matches ethane.pdb and methane.pdb.
4
No
This only shows files starting with ethane., which would only match ethane.pdb, missing methane.pdb.
Quiz: Output Redirection Exercise
Question 4:
Move to the directory sequencing and complete the following tasks:
01
List and Save
List the files in the run1/ directory. Save the output in a file called sequencing_files.txt.
02
Observe Replacement
What happens to the content of that file after you run the command ls run2 > sequencing_files.txt?
03
Append Instead
The operator >> can be used to append the output of a command to an existing file. Re-run both of the previous commands, but instead use the >> operator the second time. What happens now?
Hint: Remember the difference between > (overwrite) and >> (append)!
Quiz Answer: Task 1
Answer - Task 1:
To list the files in the directory we use ls, followed by > to save the output in a file:
ls run1 > sequencing_files.txt
We can check the content of the file:
cat sequencing_files.txt

sampleA_1.fq.gz
sampleA_2.fq.gz
sampleB_1.fq.gz
sampleB_2.fq.gz
sampleC_1.fq.gz
sampleC_2.fq.gz
sampleD_1.fq.gz
sampleD_2.fq.gz
The output shows all the FASTQ files from the run1 directory, neatly saved in our text file. This demonstrates how redirection with > creates a new file and writes the command output to it.
Quiz Answer: Task 2
Answer - Task 2:
If we run ls run2/ > sequencing_files.txt, we will replace the content of the file:
cat sequencing_files.txt

sampleE_1.fq.gz
sampleE_2.fq.gz
sampleF_1.fq.gz
sampleF_2.fq.gz
Notice that the original content from run1 is completely gone! The file now only contains the files from run2.
Important: The > operator overwrites the existing file content. All previous data is lost. This is why it's crucial to understand the difference between > and >>!
Quiz Answer: Task 3
Answer - Task 3:
If we start again from the beginning, but instead use the >> operator the second time we run the command, we will append the output to the file instead of replacing it:
ls run1/ > sequencing_files.txt
ls run2/ >> sequencing_files.txt

cat sequencing_files.txt

sampleA_1.fq.gz
sampleA_2.fq.gz
sampleB_1.fq.gz
sampleB_2.fq.gz
sampleC_1.fq.gz
sampleC_2.fq.gz
sampleD_1.fq.gz
sampleD_2.fq.gz
sampleE_1.fq.gz
sampleE_2.fq.gz
sampleF_1.fq.gz
sampleF_2.fq.gz
Perfect! Now we have all files from both directories in a single list. The >> operator preserved the original content and added the new content at the end.
Quiz: Coronavirus Variants Challenge
Question 5:
In the directory coronavirus/variants/, there are several CSV files with information about SARS-CoV-2 virus samples that were classified according to clades (these are also commonly known as coronavirus variants).
01
Combine Files
Combine all files into a new file called all_countries.csv
Hint: Use wildcards and output redirection
02
Filter for Alpha
Create another file called alpha.csv that contains only the Alpha variant samples
Hint: Think about pattern searching
03
Count Alpha Samples
How many Alpha samples are there in total?
Hint: Use the appropriate counting command
Quiz Answer: Coronavirus Variants
Task 1: Combine All Files
We can use cat to combine all the files into a single file:
cat *_variants.csv > all_countries.csv
The wildcard *_variants.csv matches all CSV files ending with "_variants.csv", and the > operator saves the combined output to a new file.
Task 2: Filter for Alpha Variant
We can use grep to find a pattern in our text file and use > to save the output in a new file:
grep "Alpha" all_countries.csv > alpha.csv
We could investigate the output of our command using less alpha.csv.
Task 3: Count Alpha Samples
We can use wc to count the lines of the newly created file:
wc -l alpha.csv
Giving us 38 as the result.
This exercise demonstrates how to chain together multiple commands to perform data analysis tasks: combining files, filtering by pattern, and counting results. These are common operations in bioinformatics workflows!
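The whole workflow can be rehearsed on miniature data. In the sketch below the two CSV files and their contents are hypothetical stand-ins, not the course data, so the count it prints is not the answer to the exercise.

```shell
sandbox=$(mktemp -d)
cd "$sandbox"

# Hypothetical miniature data; the real course files have different contents.
printf 'sample1,Alpha\nsample2,Delta\n' > uk_variants.csv
printf 'sample3,Alpha\nsample4,Alpha\n' > india_variants.csv

# Step 1: combine every file whose name ends in _variants.csv.
cat *_variants.csv > all_countries.csv

# Step 2: keep only the lines that mention Alpha.
grep "Alpha" all_countries.csv > alpha.csv

# Step 3: count those lines.
alpha_count=$(wc -l < alpha.csv)
echo "Alpha samples:" $alpha_count

cd /
rm -r "$sandbox"
```

Note that the output file all_countries.csv does not itself match *_variants.csv, so re-running step 1 will not feed the combined file back into itself.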
Keep Learning!
Congratulations on completing this comprehensive guide to Unix and Linux! You've taken your first steps into a powerful world of command-line computing. Remember, mastery comes with practice and experimentation.
The command line is a tool that grows with you—the more you use it, the more efficient and creative you'll become. Don't be discouraged by challenges; every error message is a learning opportunity.