I checked out of my hotel this morning and drove to Phipps Conservatory and Botanical Gardens. I spent about 2 hours viewing all the plants. I took pictures of many pretty, colourful plants. I visited every room except “The Stove Room”; the heat was beyond my tolerance.
Afterwards, I drove home. It was a long journey. I took a total of five breaks; one per hour. It is good to be back home. I finally unloaded the car and unpacked everything.
Tomorrow, I shall spend the Labour Day holiday shopping and having lunch with my family. Then it is back to work on Tuesday.
The science centre is located next to Heinz Field and there were people attending the game. Getting to the stadium was a bit of an adventure. Many of the local streets were closed. I had to use the main roads and queue with the stadium traffic.
I spent most of the day at the Carnegie Science Center. I got to watch all the different planetarium shows and OMNIMAX shows. I also got to see their humongous model train display occupying an area of 83 feet by 30 feet. It is an impressive recreation of Forbes Field with thousands of animations.
Afterwards, I drove to the Phipps Conservatory and Botanical Gardens. Unfortunately, I got there after closing time. Next, I ice skated at PPG Place, then dined at a Chinese restaurant.
Tomorrow, I shall visit the Phipps Conservatory and Botanical Gardens again.
I toured Carnegie Mellon University today and found a few interesting things, such as:
- The university has a kilt marching band known as The Band without Pants. One of the band members is completing his major in bagpipe playing.
- In one building, there is a very long inclined floor. If you put two chairs together and slide down from the high end, you will reach a speed of 25 MPH by the time you reach the other end.
Afterwards, I attempted to drive to Denny’s for a quick dinner. One of the roads was closed. I tried to find an alternate route with my TomTom but it kept me driving around in circles. It was frustrating. Eventually I gave up on Denny’s and found a small family-operated Italian restaurant. Everything was homemade and it was delicious. So the day ended well.
Tomorrow, I visit the Carnegie Science Center.
Hello from Pittsburgh! :wave:
I took a leisurely six-hour drive across Pennsylvania. As I was driving west, a rainstorm was moving east. Visibility was terrible for about one hour but eventually I drove past the storm and arrived in Pittsburgh.
I arrived at the hotel just in time for their complimentary barbecue. The accommodation is rather nice. The room has a small kitchen. After the barbecue, hubby and I took a short walk into the centre of town. We found a small convenience store and a Chinese restaurant. Now we are having a late-night supper and watching a film. Tomorrow, we will tour Carnegie Mellon University.
I got started on another new project. My office has separate systems for each major product that is produced and delivered to clients. This system, just like the earlier one, has reached its peak processing capacity and will not be able to deliver reports in time unless additional hardware is added to distribute the database and the processing.
In the first system (grid computing), only the problem of additional processing capacity needed to be resolved. However, in this system, both additional storage and additional processing capacity need to be resolved. The solution used in grid computing is based on an architecture where one has dedicated data servers and dedicated processing servers. No data is physically landed on the grid. It passes through network memory into the grid for computation and is then passed back through network memory for storage on the data server.
In this system, the technology used requires collocated data and processing, so the earlier solution will not work here. To address this problem, additional hardware needs to be added to the system, and the existing database and processing must be distributed across the servers.
I will be working with a group of other experts to determine the best way to slice the data and processing so that the products may be produced as efficiently as possible.
I have started work on a project that deals with an enormous amount of data that contains personally identifiable information. To use this data properly, personally identifiable information must be removed appropriately before any reports may be released. For European data, the Safe Harbour requirements must be enforced. For the United States, each state legislates its own restrictions.
I have been asked to design a process that handles all these requirements and a reasonable number of legislative variations. It should produce a set of reusable data restriction filters that may be used in the data factory to restrict or reveal personally identifiable information as required to legally satisfy the requirements of various educational, commercial and government institutions.
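To make the idea of reusable restriction filters a little more concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the field names (`name`, `email`, `postcode`, `ssn`), the jurisdiction codes and the contents of the `RULES` table are all hypothetical, and the actual rules would have to come from the legal and compliance experts rather than from anything shown here.

```python
from typing import Callable, Dict, Iterable, Tuple

Record = Dict[str, str]
Filter = Callable[[Record], Record]


def drop_fields(*names: str) -> Filter:
    """Build a filter that removes the named fields from a record."""
    def apply(record: Record) -> Record:
        return {k: v for k, v in record.items() if k not in names}
    return apply


def mask_field(name: str, keep_last: int = 0) -> Filter:
    """Build a filter that masks a field, optionally keeping trailing characters."""
    def apply(record: Record) -> Record:
        if name not in record:
            return record
        value = record[name]
        visible = value[-keep_last:] if keep_last > 0 else ""
        masked = "*" * (len(value) - len(visible)) + visible
        return {**record, name: masked}
    return apply


def compose(filters: Iterable[Filter]) -> Filter:
    """Chain filters so that each one sees the previous filter's output."""
    chain = list(filters)

    def apply(record: Record) -> Record:
        for f in chain:
            record = f(record)
        return record
    return apply


# Hypothetical rule sets keyed by (jurisdiction, recipient type).
# These are placeholders for illustration, not real legal requirements.
RULES: Dict[Tuple[str, str], Filter] = {
    ("EU", "commercial"): compose([drop_fields("name", "email"), mask_field("postcode")]),
    ("US-XX", "government"): compose([mask_field("ssn", keep_last=4)]),
}


def restrict(record: Record, jurisdiction: str, recipient: str) -> Record:
    """Apply the rule set for a jurisdiction and recipient; refuse if none exists."""
    try:
        rule = RULES[(jurisdiction, recipient)]
    except KeyError:
        raise LookupError(f"no restriction rule for {jurisdiction}/{recipient}")
    return rule(record)
```

In this sketch a missing rule raises an error instead of passing the record through, so an unmapped jurisdiction cannot reveal data by accident. That default is a design assumption, not something the requirements above state.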
This week, I have attended 3 or 4 meetings daily. Experts from dozens of departments are giving me their assessment of how the regulations influence the processing requirements. I have already drawn three preliminary designs, all on the back of a paper napkin. Hopefully by the end of next week, I can turn my design over to a technical writer and produce a first draft architecture document.
Three attempts were made with QA to generate 22 billion transactions in the test environment. Unfortunately, the test environment did not have a sufficient base and only 80 million transactions were produced. I thought of submitting a request to copy live production data into test to get the desired volume. Unfortunately, there is an ongoing production cycle producing reports for clients and no one can afford the risk of interrupting delivery.
I suggested to my team that they extract a minimum number of valid key attribute values from production and use them to remap each key attribute in the 80 million transaction file to 300 new keys, thus expanding the transaction volume to 24 billion transactions (300 copies of each of the 80 million transactions) while maintaining sufficient data integrity to perform a valid test.
With the resources I had available, it took 25 hours to generate the data. Now I can run a performance benchmark from home during the evening (it is the only time I can get full use of the hardware without disrupting my team's other work).
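The remapping idea can be sketched in a few lines of Python. This is an illustration under assumptions, not the process the team actually ran: the field name `account_key`, the helper names and the choice to give each original key its own block of replacement keys are all hypothetical.

```python
from typing import Any, Dict, Iterable, Iterator, List, Sequence

Transaction = Dict[str, Any]


def build_key_map(original_keys: Iterable[str],
                  valid_keys: Sequence[str],
                  fan_out: int) -> Dict[str, List[str]]:
    """Give each original key its own block of `fan_out` replacement keys."""
    originals = sorted(set(original_keys))
    if len(valid_keys) < len(originals) * fan_out:
        raise ValueError("not enough valid keys for the requested fan-out")
    return {
        key: list(valid_keys[i * fan_out:(i + 1) * fan_out])
        for i, key in enumerate(originals)
    }


def expand(transactions: Iterable[Transaction],
           key_field: str,
           key_map: Dict[str, List[str]]) -> Iterator[Transaction]:
    """Yield one copy of each transaction per replacement key, lazily."""
    for txn in transactions:
        for new_key in key_map[txn[key_field]]:
            yield {**txn, key_field: new_key}


# Volume check for the figures in the post: 80 million rows, 300 keys each.
assert 80_000_000 * 300 == 24_000_000_000

# Tiny hypothetical sample: 2 source rows, 3 replacement keys per original key.
sample = [
    {"id": 1, "account_key": "A", "amount": 5},
    {"id": 2, "account_key": "B", "amount": 7},
]
key_map = build_key_map(["A", "B"], ["K1", "K2", "K3", "K4", "K5", "K6"], fan_out=3)
expanded = list(expand(sample, "account_key", key_map))
print(len(expanded))  # 6
```

`expand` is a generator, so the expanded rows are produced one at a time and can be streamed to a file; nothing at the 24 billion scale would fit in memory. The sketch copies every other field unchanged, so any field that must stay unique per transaction (the `id` here) would need its own treatment.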
The Linux grid has been working well in production. My development team has completed the conversion of all the legacy processes to utilize the grid, and the QA team has engaged the development team to assemble a regression test plan. I expect they will need about two months to complete all testing.
In addition, increases in transaction volume, new privacy regulations, and new product features have increased the processing requirement for production. To meet the increased demand, 16 additional Linux boxes will be added to the production grid (for a total of 20 servers each with 4 AMD Opteron CPUs). The UNIX admin group have spent most of this week building, configuring and testing these new boxes. A special scheduler add-on is also being developed and tested to dynamically manage the allocation of multiple Linux boxes to multiple distributed processes based on processing volume and priority.
Another lesson learnt by the team from this project is that the ability to improve performance can be severely limited when you cherry-pick hot spots to improve instead of taking the time to review the overall architecture and optimise the processing from the source. Although one can make these processes run faster by distributing them over more and higher-performance hardware, there is a significant amount of redundant processing that may be eliminated if the team can revamp the entire system and perform a regression test. However, the hot-spot approach allows me to meet production schedule requirements. I do have a plan for a total revamp of the system. This activity will take place in January through June, when production demand is lower.
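The post does not describe how the scheduler add-on decides who gets which boxes, so the following Python sketch shows just one possible policy, purely as an assumption. It weights each process by volume times priority, guarantees every process one box, and shares out the rest proportionally. The `Job` class, its fields and the job names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Job:
    name: str
    volume: int    # pending work, e.g. transactions waiting (hypothetical unit)
    priority: int  # higher is more urgent; used here as a plain multiplier


def allocate(jobs: List[Job], boxes: int) -> Dict[str, int]:
    """Split `boxes` among jobs in proportion to volume * priority.

    Every job gets at least one box; the rest are shared out with the
    largest-remainder method so the total always equals `boxes`.
    """
    if not jobs:
        return {}
    if boxes < len(jobs):
        raise ValueError("need at least one box per job")

    weights = {j.name: j.volume * j.priority for j in jobs}
    if sum(weights.values()) == 0:
        weights = {name: 1 for name in weights}  # nothing pending: share evenly
    total = sum(weights.values())
    spare = boxes - len(jobs)

    shares: Dict[str, int] = {}
    remainders: Dict[str, int] = {}
    for name, weight in weights.items():
        quota, remainders[name] = divmod(spare * weight, total)
        shares[name] = 1 + quota

    leftover = boxes - sum(shares.values())
    # Hand leftover boxes to the largest remainders; ties break by name.
    for name in sorted(remainders, key=lambda n: (-remainders[n], n))[:leftover]:
        shares[name] += 1
    return shares


jobs = [Job("reports", 100, 2), Job("extracts", 300, 1), Job("audit", 50, 1)]
print(allocate(jobs, 20))  # {'reports': 7, 'extracts': 10, 'audit': 3}
```

A real scheduler would re-run a calculation like this as volumes change and would also have to handle boxes that are busy or down, which this sketch ignores.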
This week, I was running a performance benchmark from home. Using a VPN connection, I accessed my Linux grid from home and started loading test data. For reasons unknown, the application timed out when it was distributing data over a fibre network. The UNIX admin could not resolve the problem and avoided it by switching to a copper network. For my test environment, there is no noticeable difference in performance. However, in the production system the fibre network will be needed to provide greater bandwidth for smoother movement of data among 20 servers.
This is my last holiday for the summer season. I am spending the last week of August in Washington D.C. and Baltimore, Maryland.
I have been driving on and off for about six hours today. I finally made it to the hotel. I am staying at a Holiday Inn in Washington D.C. The hotel is adjacent to Homeland Security.
For the first time, I decided to leave my laptop at home. Instead of a laptop, I brought my USB flash drive with me. It has Firefox U3 and other software installed on it. All I have to do is plug it into the hotel’s PC to access the Internet with all my saved bookmarks and cookies. This was my plan and it works! I am typing this blog through Firefox U3 running off my USB flash drive.
This week, I plan to do some sightseeing and take a campus tour of Georgetown University. Once that is done, I shall head for Baltimore, Maryland to visit the aquarium and take a campus tour of Johns Hopkins University.
I am currently leading a project to convert existing reporting applications from running on a single HP enterprise server (16 CPUs, upgradeable to 64 CPUs) to a grid of Linux boxes. It is a new and exciting effort for me, and I am primarily responsible for making technical decisions and coordinating activities across the organization.
The decision to budget money for a technical infrastructure change was justified in two ways. Benchmark performance testing showed senior management that the grid architecture offers scalability at a substantially lower cost/performance ratio, and production control/capacity planning analysis revealed that the allocated resources on the existing architecture will be exhausted before the start of the third quarter.
The UNIX admin group set up four Linux boxes (4 AMD Opteron CPUs each) and one small HP server (2 AMD Opteron CPUs) on an isolated fibre network segment. This isolates all the I/O between these five machines from the general network segment.
The small HP server acts as a control node. It partitions the data and distributes the work across the four Linux boxes, and forwards the computational results of the Linux boxes to a report formatting application. The initial production benchmarks were encouraging. The new grid architecture was able to process the same volume of data (100 billion transactions) in 25% of the time required by the old centralised architecture (16-hour processing window reduced to 4 hours).
To date, the most critical bottleneck in the production process has been converted over to the new grid architecture. My team successfully completed a rigorous production install validation this week. This is a significant first step because it means that manufacturing will have sufficient computational resources well into next year.
Starting next week, my team will double in size (from 5 to 10 people) and take on the remaining work. The new team members have been trained in the new technologies; now I need to review the conversion and transition plan with them.
There were many things learned during this project. I am only going to cover the top two.
On the technical side, my team discovered that data partitioning may be an implicit function of a data source such as an Oracle database table. Oracle partitions data using an algorithm that is independent of the external partitioning used against data from flat file sources and streams. This independent algorithm distributes the data differently. For any parallel data bridging to work successfully, data from all sources must be partitioned using the same algorithm. Repartitioning of data is needed to support utilization of a scalable Linux grid. To avoid being cornered into a big bang approach to deployment, my team will use run-time selectable conditional repartitioning to allow the converted processes to execute N-N-N ways (use of the grid without repartitioning) during the transition period, and then run N-M-N ways (use of the grid with repartitioning) after the transition period. My team will be validating this approach next week.
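The partitioning problem above can be illustrated with a small Python sketch. Everything in it is an assumption made for illustration: the row shape, the CRC-based hash partitioner and the `stage` helper are hypothetical and are not the team's tooling or Oracle's algorithm. It only shows why two sources partitioned differently cannot be matched partition by partition, and how a run-time flag can switch repartitioning on or off.

```python
import zlib
from typing import Callable, List, Tuple

Row = Tuple[str, int]          # (key, value); hypothetical record shape
Partitions = List[List[Row]]


def hash_partitioner(n: int) -> Callable[[str], int]:
    """Deterministic key -> partition index, identical for every source."""
    return lambda key: zlib.crc32(key.encode("utf-8")) % n


def repartition(parts: Partitions, m: int) -> Partitions:
    """Redistribute all rows into m partitions using the shared partitioner."""
    assign = hash_partitioner(m)
    out: Partitions = [[] for _ in range(m)]
    for part in parts:
        for key, value in part:
            out[assign(key)].append((key, value))
    return out


def stage(parts: Partitions, m: int, repartition_enabled: bool) -> Partitions:
    """Run-time switch: keep the incoming layout, or repartition M ways."""
    return repartition(parts, m) if repartition_enabled else parts


def co_located(a: Partitions, b: Partitions) -> bool:
    """True if every key shared by both sources sits at the same partition index."""
    where_a = {key: i for i, part in enumerate(a) for key, _ in part}
    where_b = {key: i for i, part in enumerate(b) for key, _ in part}
    return all(where_a[k] == where_b[k] for k in where_a.keys() & where_b.keys())


# Two sources holding the same keys, laid out by two different schemes.
db_parts: Partitions = [[("a", 1)], [("b", 2)]]
file_parts: Partitions = [[("b", 20)], [("a", 10)]]

print(co_located(db_parts, file_parts))                                  # False
print(co_located(stage(db_parts, 3, True), stage(file_parts, 3, True)))  # True
```

With the flag off, each source keeps its incoming layout, which loosely corresponds to the transition-period mode. With the flag on, both sources pass through the same partitioner, so matching keys land on the same worker whatever layout they arrived in.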
On the non-technical side, there are many people in support, operations, and production control who recognise the need to make these changes in order to survive. However, the changes bring new administrative processes, and these people do not yet see how they fit into the new production processes. They need that understanding before they can take custodianship of the new administrative processes. I think I shall ask a business architect to resolve this organizational change issue through a series of facilitated workshops.