My name is Chelsea Hicks, and I am a senior at the University of Texas at San Antonio. I am majoring in Computer Science, with concentrations in Software Engineering and Computer and Information Security. My research interests are Computer Security, Data Mining, and Information Retrieval.


At REUIR, I am working with Dr. Ngu and Matthew Scheffer on our project: Domain Specific Deep Web Discovery and Cataloging.

I will be working on this project from June 1st through August 3rd, 2011.

"Domain Specific Deep Web Discovery and Cataloging"


In this project, we investigate a new paradigm in the retrieval and discovery of Deep Web sources based on a rich service class description. This will lead to the development of an automated utility for integrating and accessing a large number of dynamically changing Deep Web sources. Today's search engines can index and retrieve surface Web sources, which are static HTML pages on the Web, but they cannot retrieve content in Deep Web sources, which are HTML pages generated dynamically from searchable databases or documents. The domains we are concentrating on are airfare and people searching.

- This description was taken from REU's website and edited to be a bit more specific.
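
To make the surface/deep distinction in the description a bit more concrete, here is a minimal sketch in Java. It is not our project code; the class name `DeepWebEntryCheck` and the heuristic itself (a page counts as a possible Deep Web entry point if it has a form with a text box or a drop-down) are assumptions I made up for illustration, and a real system would use a proper HTML parser instead of regular expressions.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: guesses whether a page is an entry point to a Deep Web source.
public class DeepWebEntryCheck {
    // A whole <form>...</form> element, case-insensitive, spanning lines.
    private static final Pattern FORM = Pattern.compile(
            "<form\\b[^>]*>(.*?)</form>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // An <input> whose type is text or search, i.e. something a user can query with.
    private static final Pattern TEXT_INPUT = Pattern.compile(
            "<input\\b[^>]*type\\s*=\\s*[\"']?(text|search)\\b",
            Pattern.CASE_INSENSITIVE);

    // A drop-down list, which search forms often use for things like airports.
    private static final Pattern SELECT = Pattern.compile(
            "<select\\b", Pattern.CASE_INSENSITIVE);

    // Returns true if any form on the page contains a queryable field.
    public static boolean looksLikeDeepWebEntry(String html) {
        Matcher form = FORM.matcher(html);
        while (form.find()) {
            String body = form.group(1);
            if (TEXT_INPUT.matcher(body).find() || SELECT.matcher(body).find()) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String staticPage = "<html><body><p>Cheap flights!</p></body></html>";
        String searchPage = "<form action='/search'>"
                + "<input type=\"text\" name=\"from\"><input type='submit'></form>";
        System.out.println(looksLikeDeepWebEntry(staticPage));
        System.out.println(looksLikeDeepWebEntry(searchPage));
    }
}
```

A form with only a submit button (a "subscribe" button, for example) is not flagged, since there is nothing to query with.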


We are currently in the final stages of our project. We have determined that we will not be able to fully complete it, and more work will need to be done by next year's REU, but we have developed a framework and a working prototype to show that the approach does indeed work.


Below you will find a blog of what I've learned in REUIR and my experiences. I will try to update daily, but this may not always be possible.


July 4-August 1, 2011

I apologize for the huge time span this entry covers, but the research at this point is mainly about making it work. It is currently working for some websites, and will need to be improved by next year's REU. However, I do believe we have enough working to show that the framework does work. So we are currently working on writing the paper, making the presentation, and testing. Final presentations are August 2nd!


Interesting things that have happened:

July 19, 2011: Graduate Panel - good advice on when/how to apply to graduate school.

July 22, 2011: Guest Speaker Kristen Grauman from UT Austin, who spoke about Computer Vision. I really enjoyed her lecture! I'm just not sure if it's research I'm interested in.

June 29-July 1, 2011

Back to work! Time to work on debugging more and getting it to work with more sites.

June 28, 2011

Today we visited Southwest Research Institute in San Antonio, Texas. I had an even better time today than I did yesterday, as there were a lot more tours than there were at IBM. We got to see a lot of things, including robots! Their research seems very interesting! They had a great cafeteria too.


June 27, 2011

Today we visited IBM's location in Austin, Texas, and I had a great time. I found the morning a bit boring, as it was mainly lectures about why we should work there, but in the afternoon we got to tour IBM and it was lovely! I got to see their new supercomputer that has now been announced, their servers, and some of their testing areas. It was pretty neat, and I'd love to go again!

June 20-24, 2011

This week we've been concentrating on extracting labels and forms in a semi-automated way. It's taking a while because I have to figure out the patterns that can exist, then code them so they can be extracted later on. After I have some of it done, Matt works on getting it to work with his code. This week has been slow, but some progress has been made.
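
To give an idea of what coding one of these patterns can look like, here is a minimal sketch in Java. It is not our actual extraction code: the class name `LabelFormSketch`, the method `extractLabels`, and the sample HTML are all made up for illustration. It only handles one pattern, a `<label for="...">` tag that names the form field it describes, and it assumes simple, well-formed HTML.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: pairs each form field id with the text of its <label>.
public class LabelFormSketch {
    // Group 1 is the value of the "for" attribute, group 2 is the label's contents.
    private static final Pattern LABEL = Pattern.compile(
            "<label\\b[^>]*\\bfor\\s*=\\s*[\"']([^\"']+)[\"'][^>]*>(.*?)</label>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    // Returns a map from field id to label text, in the order the labels appear.
    public static Map<String, String> extractLabels(String html) {
        Map<String, String> labels = new LinkedHashMap<>();
        Matcher m = LABEL.matcher(html);
        while (m.find()) {
            String fieldId = m.group(1);
            // Drop any nested tags such as <b> and trim surrounding whitespace.
            String text = m.group(2).replaceAll("<[^>]+>", "").trim();
            labels.put(fieldId, text);
        }
        return labels;
    }

    public static void main(String[] args) {
        String html = "<form>"
                + "<label for=\"from\">Leaving from</label><input type=\"text\" id=\"from\">"
                + "<label for='to'><b>Going to</b>:</label><input type=\"text\" id=\"to\">"
                + "</form>";
        // Prints each field id next to the label a user would see beside it.
        for (Map.Entry<String, String> e : extractLabels(html).entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
    }
}
```

Other patterns (a label sitting in the table cell next to its field, or plain text directly before an input) would each need their own rule, which is why this part is slow.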


June 18-19, 2011

It's the weekend! I relaxed, and Matt was able to get Selenium working for the most part. We'll work out the kinks on Monday.

June 17, 2011

Today, we just tried getting a driver.java that Dr. Ngu gave us working while trying to get HttpUnit working. After presentations, we decided at David's suggestion to change from HttpUnit to Selenium. Hopefully this will work a bit better.

June 16, 2011

The sample SCD (service class description) was working by the end of today! Matt's working on HttpUnit, but it's hating any website with any JavaScript. Maybe we'll have better luck tomorrow.

June 15, 2011

Today is halfway through the week. I am working on a test SCD, so we can test it to see what we need to change for our domain. Today will be a lot of research through websites to see what they have in common.


June 14, 2011

Today we yet again changed our domain, so we will be able to compare our results to previous research. We are now working on Airfare, and hopefully we can stick to this domain. It'll be important at the end of this to have data to compare to, and this way we should be good to go. Today is a day to catch up on some reading and research.


June 13, 2011

Today Matt and I re-evaluated our domain again, changing from video games to medical diseases, in hopes that it'll be easier to find deep web sources for this. We have decided to use Crawler4j for our crawler, and messed around with it today. Tomorrow it'll be time for reading, learning how to use HttpUnit in Java, and trying to figure out some XML!


June 11-12, 2011

Weekend time! I spent the weekend relaxing and having fun. As always, it felt too short!


June 10, 2011

Today is a bit more of a disorganized day. Two hours after arriving at work, Dr. David Buttler gave us a lecture regarding Large-Scale Text Analysis at LLNL. It was pretty interesting, and he does a good job presenting material. At 2pm today, all of us will have to give presentations on what we've learned so far. I'm dreading this a bit, but it'll be a nice break from research as I'm getting stuck. I'm looking forward to this weekend, that's for sure!

June 9, 2011

Today was more work regarding the crawlers. Crawler4j is up and going, and now Nutch and Tomcat are playing nicely. Now I have to see if I can get Nutch to cooperate with Solr, but we are doing test crawls and they seem to be going well. I'm at a bit of a loss as to what we'll be doing soon, but hopefully ideas will come to us. Also, earlier today at 11am, Seth defended his thesis, and I thought he did a really good job. I hope the professors agree too!

June 8, 2011

Today was another day full of bugs and problems. Ubuntu is finally up and running, and by the end of today we got the two crawlers to work. We're practicing and seeing how the crawlers work. Probably the rest of the week will be seeing how the crawlers work, and getting them to be more selective as we need. However, Nutch isn't working 100%. The crawler works, but Tomcat doesn't want to be nice! Hopefully this will be resolved tomorrow.


June 7, 2011

Today was a day full of bugs and errors. There was a lot of trial and error to try to get Ubuntu up and going, while trying to work on getting our two crawlers to work. Today is just a day of debugging.

June 6, 2011

Today has been a lot of basic research. Matthew and I met up with Dr. Ngu to get an idea of what we want to do. We decided we are going to try to improve DeepPeep in a few ways, so hopefully we'll be successful. We've run into a problem: the Linux servers here don't seem to have Ant, and we can't figure out how to install it. We've tried SSHing into our local university's servers, but the difference in Java and Ant versions may be contributing to the problem. We almost have some code working, but we need to figure out the order of arguments first. After we have this code working, we can begin working on merging some code Dr. Ngu made with DeepPeep to see if we can get better results. Hopefully things will work out.

So other than debugging today, there have been lots of research papers to read. I'm looking forward to when we're a bit further along, as this is a slow start!

June 4-5, 2011

This weekend was pretty fun; I just went shopping and caught up on my rest. Sunday night I headed back up to REUIR to get ready for Monday!

June 3, 2011

Today was filled with the four mentors discussing what exactly their research is, and what projects we could work on. Dr. Gao informed us that we could switch mentors if we decided to, and told us to make our decision after hearing the various topics today. Dr. McKenny discussed spatial databases, Dr. Gao discussed searching plain text pages such as Craigslist and compiling the results into clusters, Dr. Ngu discussed Deep Web Searching, and Dr. Lu discussed machine learning.

From these topics, I'm glad that I was assigned Dr. Ngu. I'm becoming more and more interested in deep web searching, and I'm going to try to implement machine learning with Matthew so we can publish a research paper. I think I'll be pretty happy researching this, but only time will tell.

At 3pm, Seth Orell, a graduate student, gave us a lecture on the Lucene engine, which seems pretty interesting. I personally don't see a use for it yet, but I can see how it's important in general. He gave a very brief overview, and I'll have to try to research it some more sometime.

After the lecture, we all met up at Rio Vista Park and had a Welcome Party for about two hours. The guys mainly played sports while I talked to people who were taking breaks. Afterwards, I drove back home to San Antonio and started my weekend!


June 2, 2011

Today things started at 10am, in the meeting room. The day started off with a lecture on data mining and information retrieval by Dr. Gao. This lecture was only supposed to last two hours, but it actually took three hours. We took an hour break at noon for lunch, and resumed afterwards. This lecture was a bit confusing just from the amount of material we covered, as it is two classes' worth. However, he e-mailed us the slideshow, so I'm sure that after some more reviewing I will have a better grasp of the material. I believe that I understand the very basics that he went over.

At 2pm, Dr. Lu gave us an hour-long presentation on Machine Learning. I didn't know what machine learning was before, but this lecture was brief, introductory, and very interesting. She gave us the four types of machine learning: learning associations, unsupervised learning, supervised learning, and reinforcement learning. She gave many visual examples, and for unsupervised and supervised learning she gave us YouTube examples. I hope I can find a way to apply one of these concepts to my research.

At 3pm, Dr. Ngu gave us a one hour lecture on how to conduct research, then a one hour lecture on research ethics. This was very informative, and for the ethics we have handouts to read. I am planning on reading them this weekend, as I would like to have a better idea of what to avoid doing once I start actually researching.

At 5pm, I met with Dr. Ngu and Matthew, where we decided we needed to brainstorm to figure out what exactly we were going to research. I hope that we have an "aha" moment soon, as I would like to begin researching next week!

I'm very excited, and tomorrow I'll hear overviews from all the professors on their topics and afterwards, there will be a welcome back party that I'm looking forward to!


June 1, 2011

Today was the first day of REUIR. I arrived at 11:45am or so, getting caught in traffic and a bit lost in San Marcos. Today was an orientation day of sorts.

When I arrived, Dr. Ngu was beginning her presentation on the REU's program overview, and what was expected of us. I will be working Monday-Friday from 9am-5pm, and may need to work some weekends in order to get the research complete. There are many different goals for us to achieve, such as switching from a different major to Computer Science, writing a research paper, becoming a good researcher, going to graduate school, and so on. Achieving any of these goals is considered a success in REU. My main goal at this time is to write a research paper within the 9 weeks at REU, which may be hard, but I believe it will be worth it.


We took lunch at 12:30pm for an hour. We arrived back in the meeting room at 1:30pm, where Phil Tracy gave us an overview of the technology available on campus, how to sign in, and all other relevant information. Then our graduate mentor, David Anastasiu, told us how to edit our student web page.

At 2:30pm we were guided to our offices, which we share with our groups, and made sure our office key worked. Then we worked on getting our logins to work, which took some work since we weren't quite entirely sure how to do it. However, we successfully changed all of our passwords by 3pm.

At 3pm David brought us to our dorms to make sure our keys worked, and we got a door key to get inside the building. I'm quite excited as I have a room to myself, as I am the only girl at REUIR. After everyone tested their keys, we went the long way to get our parking permits so David could show us around campus. After we got our parking permits, we were free to take off and enjoy the rest of the night.

Today was a good day as I learned a lot on how the program is going to go, and I'm excited for the rest of the program!