Using Perl and Regular Expressions to Process HTML Files - Part 4
In Part 3 we wrote a script (script2.pl) that enabled us to enter filenames at the command prompt:
c:\>perl script2.pl file1.htm file2.htm file3.htm
Although this script enables us to process as many files as we want to, the drawback is that all the filenames need to be manually typed in. This is fine if you only want to process a few files, but if you've got hundreds or thousands to process, this approach would not be feasible.
Note: Due to display considerations, in the example code shown in this article, square brackets '[..]' are used in HTML/script tags instead of angle brackets '<..>'.
script2.pl
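1 foreach $file (@ARGV) {
2     rename $file, "$file.bak";    # keep the original as a backup
3     open (IN, "$file.bak");       # read from the backup
4     open (OUT, ">$file");         # write a new file under the original name
5     while ($line = [IN]) {
6         $line =~ s/[h1]/[h1 class="big"]/;
7         print OUT $line;
8     }
9     close IN;
10     close OUT;
11 }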
In script2.pl, it's line 1 that enables us to enter filenames at the command prompt. script3.pl, which is listed below, provides us with a way to process all the HTML files (that have a .htm extension) in the current directory/folder. This is the directory where all the files to be processed, and the script itself, are located.
script3.pl
1 opendir(DIR, ".") or die "can't opendir: $!";
2 @allfiles = grep (/\.htm$/i, readdir DIR);
3 closedir(DIR);
4 foreach $file (@allfiles) {
5     rename $file, "$file.bak";    # keep the original as a backup
6     open (IN, "$file.bak");       # read from the backup
7     open (OUT, ">$file");         # write a new file under the original name
8     while ($line = [IN]) {
9         $line =~ s/[h1]/[h1 class="big"]/;
10         print OUT $line;
11     }
12     close IN;
13     close OUT;
14 }
The only difference between script2.pl and script3.pl is the first few lines. Let's look at the new lines in script3.pl.
Line 1
Opens the current directory (signified by a dot ".") for processing. It is given a directory handle of DIR. If the directory cannot be opened, the script stops and displays an error message that includes the system's reason for the failure ($!).
Line 2
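This line reads the names of all the entries in the directory, keeps those that end in .htm, and puts them in an array called @allfiles. In Perl, a '@' indicates an array, and a '$' indicates a scalar variable. A scalar variable stores a single value, whereas an array stores a list of values.
grep is named after the search command from the UNIX world. In Perl it is a built-in function that returns only those items of a list that pass a test, in this case the names returned by readdir that end in .htm. The 'i' after the pattern makes the match case-insensitive, so .HTM files are included too.
Note that the pattern needs a backslash directly before the '.htm' (that is, /\.htm$/i). The backslash makes the dot match a literal dot rather than any character; if it is missing from the listing as displayed, add it when you type in the script.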
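To see why the backslash matters, here is a small sketch that runs both versions of the pattern over a made-up list of names. The list is purely illustrative and stands in for whatever readdir might return in your own directory.

```perl
use strict;
use warnings;

# Hypothetical names, standing in for the entries readdir might return.
my @names = ('index.htm', 'about.HTM', 'notes.txt', 'xhtm', 'page.html', 'old.htm.bak');

# Escaped dot: only names that end in a literal ".htm" (any case) match.
my @escaped = grep { /\.htm$/i } @names;

# Unescaped dot: "." matches any character, so "xhtm" slips through as well.
my @unescaped = grep { /.htm$/i } @names;

print "escaped:   @escaped\n";     # escaped:   index.htm about.HTM
print "unescaped: @unescaped\n";   # unescaped: index.htm about.HTM xhtm
```

Neither pattern matches page.html or old.htm.bak, because the '$' anchors the match to the end of the name.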
Line 3
This line closes the DIR directory handle.
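Lines 1 to 3 can also be wrapped in a small subroutine. The following is only a sketch, not the script from this article: list_htm_files is a hypothetical name, and it uses a lexical directory handle ($dh) and "use strict", which the listing above does not. The sort is an addition too, so that the names come back in a predictable order.

```perl
use strict;
use warnings;

# Return the sorted names of the .htm files in a directory.
# list_htm_files is a hypothetical helper, not part of the original script.
sub list_htm_files {
    my ($dir) = @_;
    opendir(my $dh, $dir) or die "can't opendir $dir: $!";
    my @files = sort grep { /\.htm$/i } readdir $dh;
    closedir($dh);
    return @files;
}

# List the .htm files in the current directory, one per line.
print "$_\n" for list_htm_files('.');
```

Passing the directory as an argument keeps the filtering logic the same while making it easy to point the helper at a directory other than ".".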
Running the script
c:\>perl script3.pl
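If you want the script to report problems instead of failing silently, the per-file processing shared by both scripts could be written along the following lines. This is a sketch under some assumptions: add_h1_class and process_file are hypothetical names, the code uses real angle brackets rather than the square brackets shown in the listings, and it uses lexical filehandles with the three-argument form of open. It takes its filenames from the command line, as script2.pl does.

```perl
use strict;
use warnings;

# Hypothetical helper: apply the article's substitution to one line of HTML.
sub add_h1_class {
    my ($line) = @_;
    $line =~ s/<h1>/<h1 class="big">/;
    return $line;
}

# Hypothetical helper: back up one file, then rewrite it line by line.
sub process_file {
    my ($file) = @_;
    rename($file, "$file.bak")      or die "can't rename $file: $!";
    open(my $in,  '<', "$file.bak") or die "can't read $file.bak: $!";
    open(my $out, '>', $file)       or die "can't write $file: $!";
    while (my $line = <$in>) {
        print {$out} add_h1_class($line);
    }
    close $in;
    close $out or die "can't close $file: $!";
}

# Process the files named on the command line, if any.
process_file($_) for @ARGV;
```

Each call leaves the untouched original in a .bak file next to the rewritten one, just as the listings do.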
In Part 5 we'll look at how to read in specific files from specific directories.
About the Author: John Dixon is a web developer and technical author. These days, John spends most of his time developing dynamic database-driven websites using PHP and MySQL.
Go to http://www.computernostalgia.net to view one of John's sites. This site contains articles and photos relating to the history of the computer.
To find out more about John's work, go to http://www.dixondevelopment.co.uk.