Creating bulk a href commands into one index.html file


Joeteck
04-21-10, 09:00 AM
Just as the title states.

I need an index.html file created with links to my tens of thousands of HTML files, for Google.

Do any of you know of a program that could do this?

I would show you my source, but I don't want my website to crash your browser. I currently have directory listing turned on for testing purposes, so it lists all of the HTML files; it will be turned off once an index.html has been created.

petteyg359
04-21-10, 11:27 AM
A Bash script can do it. This one writes a link to each file, with the filename as the link text.
#!/bin/bash
# Print one link per .html file in the current folder
for x in *.html; do
    echo "<a href=\"$x\">$x</a>"
done

Joeteck
04-21-10, 01:18 PM
X = index.html ??

Edit: How is it executed? Do all 180K HTML files need to be opened in a browser?

petteyg359
04-21-10, 02:33 PM
Copy that into a file (name it makeindex or something), and put the file in the same directory as your HTML files. If you're running Linux, run "chmod +x makeindex" (or whatever you named the file), then "./makeindex > index.html", and it will create index.html with a link to each file, one per line. If you're running Windows, get Cygwin, open a Cygwin prompt in the folder, and run the same chmod and ./ commands.

Joeteck
04-22-10, 10:51 AM
Dude, you lost me... I'm running Windows.

The code above doesn't mention index.html. How will it create that file?

petteyg359
04-22-10, 11:40 AM
If you're using Linux, you just cd to the directory with the HTML files. If you're on Windows, you need to install Cygwin and open a Cygwin Bash shell in the folder with the HTML files.


Put the code in a plain text file (extension doesn't matter, name doesn't matter, just as long as it is plain text, not Word or anything that adds formatting) in the same folder as the HTML files.

Run "chmod +x filename" where filename is the file you put the script in.

Run "./filename > index.html" where filename is the file you put the script in.

The window will look slightly different (colors and font) if you're using Cygwin. You'll have to manually remove the index.html link from index.html; I forgot to check for that in the script.
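
The steps above can be tried end to end in a throwaway folder first. What follows is only a sketch: the scratch directory, the sample file names (page1.html, page2.html) and the grep clean-up at the end are assumptions added for illustration, not part of the original instructions. It also shows why index.html ends up linking to itself: the shell creates index.html before the script starts, so the script sees it like any other .html file.

```shell
#!/bin/bash
# Sketch: run the link-generating script end to end in a scratch directory.
set -e
workdir=$(mktemp -d)            # hypothetical scratch folder standing in for the site folder
cd "$workdir"
touch page1.html page2.html     # sample files, invented for this demo

# Step 1: put the script in a plain text file next to the HTML files.
cat > makeindex <<'EOF'
#!/bin/bash
for x in *.html; do
    echo "<a href=\"$x\">$x</a>"
done
EOF

# Step 2: make it executable.
chmod +x makeindex

# Step 3: run it, sending its output to index.html.
./makeindex > index.html

# The shell creates index.html before the script starts, so the script
# lists it too. Filter that one line out instead of editing by hand.
grep -v 'href="index.html"' index.html > index.tmp && mv index.tmp index.html
cat index.html
```

With the two sample files, the final index.html should hold two lines, one link for each page.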

Quigsby
04-22-10, 12:13 PM
Do you have the HTML files in some folder hierarchy, or are they all in the same directory? Is index.html at the root directory? Do you want to run this just once, or have the index updated in real time?

petteyg359
04-22-10, 01:04 PM
A somewhat better script. Allows you to type "script somefile.html" to write to somefile.html. Displays the potential output and confirms you want to overwrite the file.

#!/bin/bash
OUTFILE=$1
OUTDATA=""
# Loop over the glob directly; skip the index itself
for x in *.html; do
    [[ $x != "index.html" ]] && OUTDATA=$OUTDATA"<a href=\"$x\">$x</a>\n"
done
# Show the output, then ask before writing
echo -e "$OUTDATA"
read -p "Write to $OUTFILE? (y/N) "
[[ $REPLY == "y" || $REPLY == "Y" ]] && echo -e "$OUTDATA" > "$OUTFILE"

EDIT: Improved again. The first parameter specifies the output file; if no parameter is given, the output goes to index.html. Any further parameters are names of files to exclude. To exclude files, you must also specify the output file.

#!/bin/bash
OUTFILE=${1:-"index.html"}
OUTDATA=""
shift
# Remaining parameters are the filenames to exclude
EXCLUDES=("$@")
for x in *.html; do
    EXCLUDE=false
    for y in "${EXCLUDES[@]}"; do
        [[ $x == "$y" ]] && EXCLUDE=true && break
    done
    [[ $EXCLUDE == false ]] && OUTDATA=$OUTDATA"<a href=\"$x\">$x</a>\n"
done
echo -e "$OUTDATA"
read -p "Write to $OUTFILE? (y/N) "
[[ $REPLY == "y" || $REPLY == "Y" ]] && echo -e "$OUTDATA" > "$OUTFILE"
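
To make the parameter order concrete (output file first, then the names to leave out), here is a small sketch. The function name make_index and the sample files are hypothetical, and unlike the script above it skips the preview and the y/N prompt and writes the file directly, so treat it as an illustration of the calling convention rather than a replacement.

```shell
#!/bin/bash
# make_index OUTFILE [EXCLUDE...]
# Hypothetical helper: same parameter order as the script above, no prompt.
make_index() {
    local outfile=$1
    shift
    local x y skip
    for x in *.html; do
        skip=false
        # Compare the current file against every name still in "$@".
        for y in "$@"; do
            if [[ $x == "$y" ]]; then skip=true; break; fi
        done
        if [[ $skip == false ]]; then
            printf '<a href="%s">%s</a>\n' "$x" "$x"
        fi
    done > "$outfile"
}

cd "$(mktemp -d)"
touch about.html draft.html news.html   # sample files, invented for this demo
# Output file first, then the names to leave out (the index itself and a draft).
make_index index.html index.html draft.html
cat index.html
# prints:
# <a href="about.html">about.html</a>
# <a href="news.html">news.html</a>
```

Note that index.html has to be named twice here: once as the output file and once as an exclusion, since nothing excludes the output file automatically.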

Joeteck
04-22-10, 01:57 PM
What programming language is this?

Joeteck
04-22-10, 02:11 PM
error message. Help

Am I doing something wrong?

petteyg359
04-22-10, 02:15 PM
They're all Bash scripts.

You're mixing up the first script's instructions with the updated ones. The first script is used like "script > index.html". The second and third scripts are used like "script index.html" to write to index.html.
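
The difference between the two calling styles can be sketched with two small functions. The names print_links and write_links and the sample files are made up for this example, and the confirmation prompt from the later scripts is left out to keep it short.

```shell
#!/bin/bash
# Style 1: the code only prints; the shell's ">" decides where output goes.
print_links() {
    local x
    for x in *.html; do
        echo "<a href=\"$x\">$x</a>"
    done
}

# Style 2: the code receives the output filename as its first parameter
# and does the writing itself, skipping that file in the listing.
write_links() {
    local outfile=${1:-index.html}
    local x
    for x in *.html; do
        if [[ $x != "$outfile" ]]; then
            echo "<a href=\"$x\">$x</a>"
        fi
    done > "$outfile"
}

cd "$(mktemp -d)"
touch a.html b.html          # sample files, invented for this demo
print_links > style1.txt     # like: script > index.html
write_links index.html       # like: script index.html
```

In both cases the same links are produced; what changes is whether the redirection is typed on the command line or handled inside the script.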

Joeteck
04-22-10, 02:19 PM
mixing up? Dude.. You're leading the blind! I have no freakin idea what I'm doing!

I'm just trying everything out, no idea if it will work or not..

From the looks of it, no.

I have no idea what the syntax is...

Edit: the code below is in a file named link-html.txt:

OUTFILE=${1:-"index.html"}
OUTDATA=""
shift
EXCLUDES=$@
for x in $(ls *.html); do
EXCLUDE=false
for y in $EXCLUDES; do
[[ $x == $y ]] && EXCLUDE=true && break
done
[[ $EXCLUDE == false ]] && OUTDATA=$OUTDATA"<a href=\"$x\">$x</a>\n"
done
echo -e $OUTDATA
read -p "Write to "$OUTFILE"? (y/N)"
[[ $REPLY == "y" || $REPLY == "Y" ]] && echo -e $OUTDATA > $OUTFILE

I really appreciate all the help and work you're doing, but I can't get it to fire off...

Joeteck
04-22-10, 02:45 PM
I found this site to help me along, but it's still not working: http://www.linuxconfig.org/Bash_scripting_Tutorial

petteyg359
04-22-10, 03:35 PM
It could be that Cygwin uses a really old version of Bash. I'm not sure what those errors are. It works for me on a "real" Linux install :) I don't know the equivalent Windows tools to manipulate parameters, redirect output to a file, etc. Maybe you can find somebody who can translate it to a .bat file for DOS.

kayson
04-22-10, 04:15 PM
If you're running a server and you have something like PHP or ASP installed, it would be very easy to write a script to do that.

Joeteck
04-22-10, 04:23 PM
Maybe it's too many HTML files...

There are over 180,000.
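
The file count is a plausible suspect, though nothing in the thread confirms it. A loop written as `for x in $(ls *.html)`, as in link-html.txt above, makes the shell hand every matching filename to the external ls program at once, and operating systems cap the total size of such an argument list (a failure typically reads "Argument list too long"). Looping over the glob itself stays inside the shell and avoids that cap. The sketch below prints the cap and counts files the safe way; the function name count_html and the sample files are assumptions for the demo.

```shell
#!/bin/bash
# Size limit, in bytes, on the arguments handed to one external program.
getconf ARG_MAX

# Count .html files with a builtin loop over the glob. No external
# program receives the filenames, so the limit above is not involved.
count_html() {
    local n=0 f
    for f in *.html; do
        # -e guards against the unexpanded pattern when nothing matches.
        if [[ -e $f ]]; then n=$((n + 1)); fi
    done
    echo "$n"
}

cd "$(mktemp -d)"
touch one.html two.html three.html   # sample files, invented for this demo
count_html
```

If the number printed by getconf is smaller than the combined length of all the filenames, the ls form is likely to fail while the glob loop keeps working.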

kayson
04-22-10, 06:54 PM
You could also do a quick C++ console program to do the same thing.

Easiest way would probably be a bat script:
http://www.computerhope.com/forhlp.htm

Quigsby
04-23-10, 08:01 AM
If you're running Windows and you can specify the folder path, you could use this VBScript code:

Option Explicit

Call Main()

Sub Main()

    ' Folder that holds the HTML files (keep the trailing backslash)
    Const strStartingFP = "C:\Temp\"

    Dim objFSO
    Set objFSO = CreateObject("Scripting.FileSystemObject")

    Dim objFolder

    If objFSO.FolderExists(strStartingFP) Then
        Set objFolder = objFSO.GetFolder(strStartingFP)
    Else
        Set objFSO = Nothing
        Exit Sub
    End If

    ' Create (or overwrite) index.html in that folder
    Dim objOutputFile
    Set objOutputFile = objFSO.CreateTextFile(strStartingFP & "index.html", True)

    Dim collFiles, File
    Set collFiles = objFolder.Files
    For Each File In collFiles
        ' Link every .html file except index.html itself; quote the href value
        If Right(File.Name, 5) = ".html" And File.Name <> "index.html" Then
            objOutputFile.WriteLine "<a href=""" & File.Name & """>" & File.Name & "</a>"
        End If
    Next

    objOutputFile.Close
    Set collFiles = Nothing
    Set objOutputFile = Nothing
    Set objFolder = Nothing
    Set objFSO = Nothing

End Sub

Just update strStartingFP with the path to your folder. Save the code to a file such as sometext.vbs and then run it.