Thursday, February 28, 2013

sed to the rescue...

I had one of those dreaded tasks at work today: update a bazillion (okay, actually just 24, but it felt like a lot) servers with a config file that has to reference the name of the server it lives on. I could either update each server in turn or write a script to do it for me.

I decided to write a script, and I will pretend that it took less time to write and run the script than doing it manually. It wasn't perfect: I had to copy/paste the password for each server because I didn't want to mess around with setting up SSH keys using ssh-keygen. I also didn't want to install software like expect. Expect is a utility that lets you script interactive programs by supplying the input they might...well...expect. In this case it would have been handy, since it could have supplied the user's password to scp. The script I wrote did what I needed it to, and I'm sure I'll be able to use something like it in the future.

Assuming that the original file lives on server01 and is named somefile.conf, this is what the script looked like:

#!/bin/sh

servers="server02 server03 server04 server05 server06 server07 server08 server09 server10 server11 server12 server13 server14 server15 server16 server17 server18 server19 server20 server21 server22 server23 server24"
filepath="/some/file/path/somefile.conf"

for server in $servers
do
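      echo "Working on $server."
      # Replace every occurrence of server01 with the target server's name.
      sed "s/server01/$server/g" "$filepath" > "$filepath.$server"
      # scp prompts for someuser's password on each server.
      scp "$filepath.$server" "someuser@$server:$filepath"
      rm "$filepath.$server"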
done
echo "Done."

Tuesday, February 19, 2013

Solr - HTMLStripCharFilter...

I am attempting to store a bit of data that I fetch from a website in Solr. The data sometimes has HTML markup, so I decided to use the HTMLStripCharFilterFactory in the field's analyzer.

Here is an example of the field type that I created:

<fieldType name="strippedHtml" class="solr.TextField">
   <analyzer>
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>      
      <filter class="solr.LowerCaseFilterFactory" />
   </analyzer>
</fieldType>


I used the field type of strippedHtml in a field called itemDescription, and when I do a search after indexing some data I can see that the itemDescription contains data that still has HTML markup.  I used the analyzer tab in Solr to see what would happen on index of HTML data, and I could see that none of the markup appears to be stripped out.

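It turns out that most of the HTML was already encoded: the angle brackets had been replaced with their escaped entities (&lt; and &gt;), so there were no literal tags for the char filter to strip. I will need to find a way to decode or remove the escaped values.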
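
One possible way to deal with the escaped values is to clean each description in the code that fetches it, before it is sent to Solr. The following is only a minimal sketch under that assumption: EscapedHtmlCleaner and clean are made-up names, the entity list is deliberately short, and the regex is a rough stand-in for a real HTML stripper rather than anything Solr provides.

```java
import java.util.regex.Pattern;

// Hypothetical helper, not part of Solr: cleans a value before it is indexed.
class EscapedHtmlCleaner {

    // Matches anything that looks like a tag, e.g. <p> or </a>.
    private static final Pattern TAG = Pattern.compile("<[^>]*>");

    static String clean(String raw) {
        // Decode only the handful of entities needed to recover the markup.
        // &amp; is decoded last so that "&amp;lt;" stays the text "&lt;"
        // instead of turning into the start of a tag.
        String decoded = raw
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
        // Replace each tag with a space, then collapse runs of whitespace.
        return TAG.matcher(decoded).replaceAll(" ").replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        // prints: A great item
        System.out.println(clean("&lt;p&gt;A &lt;b&gt;great&lt;/b&gt; item&lt;/p&gt;"));
    }
}
```

A regex like this will trip over things such as a ">" inside an attribute value, so it is only suitable for simple markup.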

Friday, February 8, 2013

Code puzzler on DZone and optimizations...

Something that I've really enjoyed about DZone.com is that they have recurring themes for certain posts. One recurring set of posts is the Thursday Code Puzzler. Now and then there will be a really interesting solution. For example, one puzzler was to count the number of 1s that occur in a set of integers, i.e., {1, 2, 11, 14} = 4 since the digit 1 occurs 4 times in that set. One of the solutions used map reduce to come up with the answer. I thought that was particularly neat.
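
To give a feel for the map/reduce idea, here is a minimal sketch of the 1s puzzler. It is not the solution posted on DZone; it assumes Java 8 streams are available, and OnesCounter and countOnes are names invented for the illustration.

```java
import java.util.stream.IntStream;

class OnesCounter {

    // Counts how many times the digit 1 appears across all of the numbers.
    static long countOnes(int... numbers) {
        return IntStream.of(numbers)
                // map: each number becomes the count of '1' characters in its decimal form
                .mapToLong(n -> String.valueOf(n).chars().filter(c -> c == '1').count())
                // reduce: add the per-number counts together
                .sum();
    }

    public static void main(String[] args) {
        System.out.println(countOnes(1, 2, 11, 14)); // prints 4
    }
}
```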

I decided to try out the recent code puzzler for finding the largest palindrome in a string. Here was my first method:

public static int getLargestPalindrome(String palindromes) {

        String[] palindromeArray = palindromes.split(" ");
        int largestPalindrome = -1;

        for(String potentialPalindrome : palindromeArray) {
            if (potentialPalindrome.equals(new StringBuffer(potentialPalindrome).reverse().toString()) && potentialPalindrome.length() > largestPalindrome)
                largestPalindrome = potentialPalindrome.length();
        }

        return largestPalindrome;

    }

It works, but it felt like cheating to use the StringBuffer.reverse() method. Here is my second method:
    public static int getLargestPalindrome(String palindromes) {

        int largestPalindrome = -1;

        for(String potentialPalindrome : palindromes.split(" ")) {
            int start = 0;
            int end = potentialPalindrome.length() - 1;
            boolean isPalindrome = true;

            while (start <= end) {
                if (potentialPalindrome.charAt(start++) != potentialPalindrome.charAt(end--)) {
                    isPalindrome = false;
                    break;
                }
            }

            if (isPalindrome && potentialPalindrome.length() > largestPalindrome) {
                largestPalindrome = potentialPalindrome.length();
            }

        }

        return largestPalindrome;

    }

The second method ran faster than the first. I suspect most of the improvement comes from the first method creating a new StringBuffer (and also a new String for the reversed value) for each potential palindrome. The second version also does less comparison work: in the worst case it walks only half the length of the word, comparing one character from each end, while the first version's equals() call compares every character of the word against the reversed copy.

Saturday, February 2, 2013

Thread pool example using Java and ExecutorService...

Thread pools are easy to implement using Java's ExecutorService. ExecutorService is an interface; the Executors factory class creates implementations of it, and newFixedThreadPool lets you specify how many threads will process the tasks you submit. Here is an example of an ExecutorService being created, where numThreads is an integer specifying the number of threads in the pool:
ExecutorService executorService = Executors.newFixedThreadPool(numThreads);
You pass a Runnable to the ExecutorService's execute method like this:
executorService.execute(someRunnableObject);
I created a sample method that uses an ExecutorService to simulate working on text files. It moves files from the source path to an archive path unless a "lock" file is found. The "lock" file is an empty file with the same name as the file being "worked" on, plus a .lock extension. It ensures that multiple threads don't try to work on the same file. I made this sample because I figured this might be a nice way to handle indexing data in csv files to a Solr server. Here is the method that does the work (which would be very poorly named if it weren't sample code):
public static void DoWorkOnFiles(String sourcePath, String archivePath, int numThreads) throws IOException {

    Random random = new Random();
    ExecutorService executorService = Executors.newFixedThreadPool(numThreads);

    File sourceFilePath = new File(sourcePath);
    if (sourceFilePath.exists()) {
        Collection<java.io.File> sourceFiles = FileUtils.listFiles(sourceFilePath, new String[]{"txt"}, false);

        for (File sourceFile : sourceFiles) {
            File lockFile = new File(sourceFile.getPath() + ".lock");
            if (!lockFile.exists()) {
                executorService.execute(new SampleFileWorker(sourceFile.getPath(), archivePath, random.nextInt(10000)));
            }
        }
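        // shutdown() makes the executor reject new tasks while the tasks
        // that were already submitted continue to run.
        executorService.shutdown();
        try {
            // Each task sleeps for up to 10 seconds, so this timeout can expire
            // before the queue drains; check the return value.
            if (executorService.awaitTermination(10000, TimeUnit.MILLISECONDS)) {
                System.out.printf("%nFinished all threads.%n");
            } else {
                System.out.printf("%nTimed out waiting for threads to finish.%n");
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }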
    }
    else {
        System.out.printf("%s doesn't exist. No work to do.%n", sourceFilePath);
    }
}
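
The shutdown sequence at the end of DoWorkOnFiles can be tried in isolation. The following is a minimal sketch, not the code from the sample above: PoolShutdownSketch and runTasks are made-up names, and the 50 ms sleep stands in for real work. It relies on awaitTermination returning false when the timeout elapses before every task has finished.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

class PoolShutdownSketch {

    // Runs taskCount short tasks on a fixed-size pool and returns how many completed.
    static int runTasks(int numThreads, int taskCount) {
        AtomicInteger completed = new AtomicInteger();
        ExecutorService executorService = Executors.newFixedThreadPool(numThreads);

        for (int i = 0; i < taskCount; i++) {
            executorService.execute(() -> {
                try {
                    Thread.sleep(50); // stand-in for real work
                    completed.incrementAndGet();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }

        // Stop accepting new tasks; already submitted tasks keep running.
        executorService.shutdown();
        try {
            // false means the timeout expired with tasks still pending.
            if (!executorService.awaitTermination(5, TimeUnit.SECONDS)) {
                executorService.shutdownNow();
            }
        } catch (InterruptedException e) {
            executorService.shutdownNow();
            Thread.currentThread().interrupt();
        }
        return completed.get();
    }

    public static void main(String[] args) {
        System.out.printf("Completed %d tasks.%n", runTasks(3, 9));
    }
}
```

Calling shutdownNow() after a timeout interrupts the workers, which is one reasonable choice for a sample; real code might prefer to keep waiting or log the stragglers.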
The SampleFileWorker class looks like this:
import org.apache.commons.io.FileUtils;
import java.io.File;
import java.io.IOException;

public class SampleFileWorker implements Runnable {

    private final String sourcePath;
    private final String archivePath;
    private final int testDelay;

    public SampleFileWorker(String sourcePath, String archivePath, int testDelay) {
        this.sourcePath = sourcePath;
        this.archivePath = archivePath;
        this.testDelay = testDelay;
    }

    @Override
    public void run() {

        try {
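            File lockFile = new File(sourcePath + ".lock");
            // createNewFile() atomically creates the file and returns false
            // if it already exists, so only one worker can claim the file.
            if (!lockFile.createNewFile()) {
                return;
            }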

            File sourceFile = new File(sourcePath);
            String archiveFilePath = archivePath.concat(File.separator + sourceFile.getName());
            File archiveFile = new File(archiveFilePath);

            System.out.printf("Simulating work on file %s.%n", sourcePath);
            System.out.printf("Starting: %s%n", sourcePath);
            System.out.printf("Delay:    %s%n", testDelay);

            try {
                Thread.sleep(testDelay);
            } catch (InterruptedException ignored) {
            }

            System.out.printf("Done with: %s%n", sourcePath);
            System.out.printf("Archiving %s to %s.%n", sourceFile, archivePath);

            FileUtils.moveFile(sourceFile, archiveFile);
            sourceFile.delete();
            lockFile.delete();

        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}
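
The lock file only protects a source file if checking for the lock and creating it happen as a single step. File.createNewFile() does both atomically and reports the outcome through its return value. Here is a minimal sketch of that idea; LockFileSketch, tryClaim and claimTwice are names invented for the illustration, and data.txt is a hypothetical file in a temporary directory.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

class LockFileSketch {

    // Returns true only for the caller that actually created the lock file.
    static boolean tryClaim(File sourceFile) {
        File lockFile = new File(sourceFile.getPath() + ".lock");
        try {
            return lockFile.createNewFile();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Claims the same (hypothetical) file twice and reports both outcomes.
    static String claimTwice() {
        try {
            File dir = Files.createTempDirectory("lock-sketch").toFile();
            File sourceFile = new File(dir, "data.txt");
            boolean first = tryClaim(sourceFile);
            boolean second = tryClaim(sourceFile);
            // Clean up the lock file and the temporary directory.
            new File(sourceFile.getPath() + ".lock").delete();
            dir.delete();
            return first + "," + second;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(claimTwice()); // prints true,false
    }
}
```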