Johan Louwers - Tech blog: September 2007

Monday, September 24, 2007

Bloom filter stuff

A Bloom filter is a space-efficient probabilistic data structure that is used to test whether an element is a member of a set. False positives are possible, but false negatives are not. The Bloom filter was conceived by Burton H. Bloom in 1970.

Comparisons of the form "where X is in Y" can be quite resource intensive: code that performs such lookups can run for days when you execute it against large datasets. Bloom filters can be a big relief and save you a lot of resources and time.

In Perl for example this is a lookup hash, a handy idiom for doing existence tests:

my %lookup;
foreach my $e ( @things ) { $lookup{$e}++ }

sub check {
    my ( $key ) = @_;
    print "Found $key!" if exists( $lookup{ $key } );
}

When running this against a small set of data, and in a situation where time is not a big issue, this will work fine. However, if one or both of these are against you, you might want to use a Bloom filter. In Perl this would look something like this:

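use Bloom::Filter;

# $SONG_COUNT is assumed to hold the expected number of titles.
my $filter = Bloom::Filter->new( error_rate => 0.01, capacity => $SONG_COUNT );
open my $fh, '<', "enormous_list_of_titles.txt" or die "Failed to open: $!";

while (<$fh>) {
    chomp;
    $filter->add( $_ );
}
close $fh;

sub lookup_song {
    my ( $title ) = @_;
    # A negative answer is definitive, so skip the expensive query.
    return unless $filter->check( $title );
    # A positive answer may be a false positive; confirm it in the database.
    return expensive_db_query( $title );
}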

An empty Bloom filter is a bit array of m bits, all set to 0. There must also be k different hash functions defined, each of which maps a key value to one of the m array positions.

For a good hash function with a wide output, there should be little if any correlation between different bit-fields of such a hash, so this type of hash can be used to generate multiple "different" hash functions by slicing its output into multiple bit fields. Alternatively, one can pass k different initial values (such as 0, 1, ..., k-1) to a hash function that takes an initial value; or add (or append) these values to the key.

For larger m and/or k, independence among the hash functions can be relaxed with negligible increase in false positive rate (Dillinger & Manolios (2004a), Kirsch & Mitzenmacher (2006)). Specifically, Dillinger & Manolios (2004b) show the effectiveness of using enhanced double hashing or triple hashing, variants of double hashing, to derive the k indices using simple arithmetic on two or three indices computed with independent hash functions.
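As a rough illustration of deriving k indices from a single wide hash, here is a minimal Python sketch. It slices one SHA-256 digest into two bit fields and combines them with plain double hashing (index i = h1 + i * h2 mod m). This is the simple variant, not the enhanced double hashing or triple hashing from the cited papers, and the function name and the choice of SHA-256 are my own assumptions.

```python
import hashlib

def double_hash_indices(key, k, m):
    """Derive k array positions in range(m) from one wide hash (plain double hashing)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    # Slice the wide hash output into two separate 64-bit fields.
    h1 = int.from_bytes(digest[:8], "big")
    # Force the stride to be odd so it is never zero.
    h2 = int.from_bytes(digest[8:16], "big") | 1
    return [(h1 + i * h2) % m for i in range(k)]

indices = double_hash_indices("some song title", k=5, m=1024)
```

Because m is a power of two here and the stride is odd, the five positions are guaranteed to be distinct; for other values of m they may occasionally coincide.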

To add an element, feed it to each of the k hash functions to get k array positions. Set the bits at all these positions to 1.
To query for an element (test whether it is in the set), feed it to each of the k hash functions to get k array positions. If any of the bits at these positions are 0, the element is not in the set – if it were, then all the bits would have been set to 1 when it was inserted. If all are 1, then either the element is in the set, or the bits have been set to 1 during the insertion of other elements.
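The add and query steps can be sketched in a few lines of Python. This is a minimal sketch under my own assumptions, not how Bloom::Filter is implemented: the class and method names are made up, the k hash functions are simulated by prefixing the values 0 to k-1 to the key before hashing with SHA-256, and one byte is used per bit to keep the code readable.

```python
import hashlib

class BloomFilter:
    def __init__(self, m, k):
        self.m = m                # number of bits in the array
        self.k = k                # number of hash functions
        self.bits = bytearray(m)  # all positions start at 0 (one byte per bit for clarity)

    def _positions(self, key):
        # Simulate k different hash functions by prefixing 0..k-1 to the key.
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.m

    def add(self, key):
        # Set the bits at all k positions to 1.
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        # Any 0 bit means "definitely not in the set";
        # all 1 bits means "in the set, or a false positive".
        return all(self.bits[pos] for pos in self._positions(key))

bf = BloomFilter(m=1024, k=3)
for title in ("Blue Moon", "Yesterday"):
    bf.add(title)
```

After this, bf.might_contain("Blue Moon") is always True. A title that was never added will usually give False, but it can give True when its k positions happen to have been set by other titles.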

Unfortunately, removing an element from this simple Bloom filter is impossible. The element maps to k bits, and although setting any one of these k bits to zero suffices to remove it, this has the side effect of removing any other elements that map onto that bit, and we have no way of determining whether any such elements have been added. The result is a possibility of false negatives, which are not allowed.
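To see why clearing a bit is dangerous, here is a tiny Python illustration. The positions are hard-coded, hypothetical hash outputs (k = 2, m = 8) chosen so that the two keys share one bit; a real filter would compute them with hash functions.

```python
# Hypothetical hash outputs for two keys; they share position 3.
POSITIONS = {"apple": (1, 3), "banana": (3, 5)}
bits = [0] * 8

def add(key):
    for pos in POSITIONS[key]:
        bits[pos] = 1

def might_contain(key):
    return all(bits[pos] for pos in POSITIONS[key])

add("apple")
add("banana")

# Naive "removal" of apple: clear one of its bits.
bits[3] = 0

apple_after = might_contain("apple")    # False, as intended
banana_after = might_contain("banana")  # also False: a false negative
```

The filter now denies containing "banana" even though it was added and never removed.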

Removal of an element from a Bloom filter can be simulated by having a second Bloom filter that contains items that have been removed. However, false positives in the second filter become false negatives in the composite filter, which are not permitted. This approach also limits the semantics of removal since adding a previously removed item is not possible.

In practice, though, it is often the case that all the keys are still available but are expensive to enumerate (for example, requiring many disk reads). When the false positive rate gets too high, the filter can be regenerated from those keys; this should be a relatively rare event.
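To decide when regenerating is worthwhile you need an estimate of the current false positive rate. A commonly used approximation for a filter with m bits, k hash functions and n inserted keys is (1 - e^(-kn/m))^k. The Python sketch below applies it; the function names and the 0.01 threshold (borrowed from the error_rate in the Perl example above) are assumptions for illustration only.

```python
import math

def false_positive_rate(m, k, n):
    """Approximate false positive rate for m bits, k hash functions, n inserted keys."""
    return (1.0 - math.exp(-k * n / m)) ** k

def needs_rebuild(m, k, n, max_rate=0.01):
    """True when the estimated rate exceeds the acceptable threshold."""
    return false_positive_rate(m, k, n) > max_rate
```

With m = 1024 and k = 3, for example, ten inserted keys stay far below the threshold, while a thousand keys push the estimate well above it.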

Bloom filter in:
Perl
C/C++
Ruby
Java

Monday, September 17, 2007

Create apps accounts from database.

When working as a consultant on different systems, you will from time to time find that you have an account on the Oracle apps schema but no applications login.

There are a couple of things you can do in this situation: (1) ask a DBA to create an account for you, or (2) create the account yourself by running a PL/SQL script. Running the following PL/SQL script is in most cases NOT a smart thing to do, because you circumvent the security protocol of the customer. However, if you are working on a test environment in your own company, the rules may be less tight, meaning you can run it without any problems.

Please note that before you run this script you might want to change some things to your own taste, so that the script generates a user account with the username you prefer. In any case, both the APPLICATION_DEVELOPER and the SYSTEM_ADMINISTRATOR responsibility are assigned.

Download oracle_create_applications_account.sql

Thursday, September 13, 2007

Tuning custom Oracle application.

In some cases it is possible that, after you have created your custom Oracle application, the performance is not what you and your user community had in mind. You could start debugging and reworking your code immediately. Most likely you have an idea of which operations take the most time, and of which operations take the most time in the opinion of the users.

It is, however, wise to do some research first. The first thing you will most likely want to know is which packages and tables are used the most in your application. For this you can use the following queries:

This query reports the top 10 tables on which you do an insert, select, update or delete, or possibly create a lock. This can be a good starting point: it could be wise to create, for example, an index, or to check whether the table contains a lot of “junk” data which could actually be placed in a history table. By keeping the data in your table limited you could increase the speed of your application, because a possible full table scan has less data to look into. oracle_top_10_tables.sql

After you have created an overview of the most used tables, and maybe after you have tuned some of them, it is wise to take a look at which procedures are used the most. By tuning those you might also gain some performance. This query looks for the most used functions, packages, package bodies, procedures and triggers. oracle_top_10_procedures.sql

With this you can start looking in your code for places where you might expect to gain some performance. Before you do so you might want to know which SQL statements are used intensively.

By using the following query you get the top 10 SQL statements by their buffer gets: oracle_top_statements_by_buffer_gets.sql
And the top 10 SQL statements by the number of disk reads: oracle_top_statements_by_disk_reads.sql

With this information you have a good starting point for tuning your custom-made Oracle application.

Wednesday, September 12, 2007

Oracle XML Publisher and XDODTEXE executable.

By using XML Publisher you can easily create reports and documents which can be started from a concurrent request within Oracle Applications. To enable your concurrent request to do so you need to have the executable XDODTEXE, a Java Concurrent Program, as an executable in your concurrent request.

It could, however, happen that this executable is not available; there is a thread on the Oracle forums where this is discussed. The reason is most likely that you have not applied the installation of XML Publisher 5.6.0. Even though you might be on version 5.6.3, you first have to have installed version 5.6.0. In an ideal situation Oracle would warn you to install 5.6.0 first, because only this version contains an installation of XDODTEXE; however, this is not happening.

If you are missing XDODTEXE as an executable you can check the installed version by using this query:

SELECT
    DECODE (bug_number
           , '3554613', '4.5.0'
           , '3263588', 'XDO.H'
           , '3822219', '5.0.0'
           , '4236958', '5.0.1'
           , '4206181', '5.5.0'
           , '4561451', '5.6.0'
           , '4905678', '5.6.1'
           , '5097966', '5.6.2'
           , '5472959', '5.6.3')
FROM
    ad_bugs
WHERE
    bug_number IN ( '3554613'
                  , '3263588'
                  , '3822219'
                  , '4236958'
                  , '4206181'
                  , '4561451'
                  , '4905678'
                  , '5097966'
                  , '5472959')

If 5.6.0 is not in the list, even if a higher version is, you need to install 5.6.0 and then apply all the later patches again on your system. This will make sure you have XDODTEXE available and that you are on the correct patch level again.

Tuesday, September 11, 2007

Spawn concurrent request from pl/sql

When developing PL/SQL code for Oracle Applications you will sometimes want to start concurrent requests directly from a PL/SQL package. To do so Oracle has provided fnd_request.submit_request. fnd_request is a package in the apps schema.

Please note the example below, where we use fnd_request.submit_request to start the concurrent request OEOIMP.

v_request_id := fnd_request.submit_request('ONT','OEOIMP','Order Import','' ,FALSE,'','','','N','1','4','','','','Y','N',CHR(0));
COMMIT;

IF v_request_id > 0
THEN
    FND_FILE.PUT_LINE(FND_FILE.OUTPUT,'DEBUG, Successfully submitted');
ELSE
    FND_FILE.PUT_LINE(FND_FILE.OUTPUT,'DEBUG, Not Submitted');
END IF;

The parameters needed to start submit_request are the following:

- Application (varchar2) (shortname of the application)
- Program (varchar2) (shortname of the concurrent request)
- Description (varchar2) (description of the concurrent request)
- Start_time (varchar2) (time to start, if null then immediate)
- Sub_request (Boolean) (is this a sub request true/false)

After these come argument1 through argument100. You could pass all 100 arguments even if they are null, or you could add the following after the last argument you need: ,CHR(0). With this, only the arguments you need are used, and you do not have to pad the remaining arguments with null values.

To find the short names of all the concurrent requests you can use this query:

SELECT
    conpro.CONCURRENT_PROGRAM_ID
   ,conpro.CONCURRENT_PROGRAM_NAME
   ,conpro.DESCRIPTION
   ,conpro.APPLICATION_ID
FROM
    FND_CONCURRENT_PROGRAMS_VL conpro

To find all the short names of the applications on your system you can use this query:

SELECT
    appview.APPLICATION_ID
   ,appview.APPLICATION_SHORT_NAME
   ,appview.APPLICATION_NAME
   ,appview.DESCRIPTION
FROM
    fnd_application_all_view appview

Unable to ship confirm

Problem description: you are unable to Ship Confirm in Oracle Order Management Super User. In some cases the ‘Ship Confirm’ button is grayed out.

This is in the ‘Shipping Transactions’ screen under the Delivery tab. The ‘Shipping Transactions’ screen can be found under ‘Order Management’: Shipping -> Transactions.

The reason for this is that your user account is not listed under the ‘Shipping Execution Grants’. To enable the button you have to list your user account there. You can find this under ‘Oracle Order Management Super User’. Menu: Setup -> Shipping -> Grants and Role Definitions -> Grants.

How America searches: Mobile

More and more people are using mobile devices to access the internet: getting directions, checking e-mail, reading files and websites. As of March 2007 the number of wireless subscribers has climbed to nearly 234 million, reaching more than 72 percent of the total population, according to industry tallies by CTIA The Wireless Association. With mobile devices on hand throughout the day and the number of mobile internet users topping 20 million, wireless is beginning to deliver on its long-held promise of becoming the “third screen”.

The digital marketing agency icrossing has delivered a comprehensive report with figures on mobile internet use and on the way people search in the American region.

Sunday, September 09, 2007

Supercomputing By Reservation

Supercomputers keep growing ever faster, racing along at the blazing speed of nearly one petaflops – 10 to the fifteenth, or one thousand trillion calculations per second – equivalent to around 250 thousand of today’s laptops.

In contrast, the experience of a computational scientist can be anything but fast -- waiting hours or days in a queue for a job to run and yield precious results needed for further steps. The unpredictability of queues can impede the course of research, slowing progress with unexpected periods of waiting.

To address this problem, the San Diego Supercomputer Center (SDSC) at UC San Diego has released version 1.0 of a new User Portal, featuring an innovative user-settable reservation system that gives researchers more control over when their jobs will run on the center’s supercomputers. The service, not previously offered in high performance computing centers, is debuting on SDSC’s DataStar and TeraGrid Cluster systems.

“We’ve had a lot of feedback in user surveys asking for faster turnaround time,” said Anke Kamrath, director of User Services at SDSC. “While we couldn’t eliminate the queue, especially on popular machines like DataStar, we realized that a service that lets users themselves schedule ‘windows’ of reserved time would let them complete jobs more reliably and get more done.”

The reservation system can make computing more efficient in various situations. For example, a user with a large allocation may start a full machine job that will run for a day, only to find a minor problem causes it to quickly fail. Instead of being able to simply fix the problem and restart, the user is faced with going to the end of the queue and again waiting hours or days for the job to run. With SDSC’s new User Portal, this user can now easily set a reservation for a full-machine job, ensuring that they can complete the job in a timely way, even if minor problems occur.

Another research group may be debugging a new code. To do this they need to run many short jobs in succession, working as a team to troubleshoot the results of each run, and then trying again.

But each time they want to restart the code, they have to sit in the queue, potentially wasting many hours as the group awaits the results of each run. Using the reservations feature in the portal, the researchers can now schedule several hours of machine time for multiple debugging runs, making efficient use of the team’s time.

Other researchers may need to be sure they run in conjunction with a scheduled event such as observing time on an electron microscope or other instrument. Efforts are also underway to use this capability to support the co-scheduling of jobs to run across TeraGrid-wide systems.

“SDSC’s User Portal offers a clean interface that shields users from the complexity of the underlying service,” said Diana Diehl, who leads SDSC’s Documentation and Portals group. “Just like an airline reservation system makes intricate arrangements in a few minutes for travelers at their computers, SDSC’s reservation system carries out complicated tasks to arrange the supercomputer reservation, making sure that it follows policies, doesn’t disrupt jobs currently in the queue, interfaces with the user’s account, and allows time for preventive maintenance.”

While users have always been able to reserve time manually, the process can be slow and cumbersome. SDSC’s new user-settable system democratizes access to reliable computing, letting any user log in with either their TeraGrid or SDSC account and easily reserve time themselves. Rather than carving up the machine among various pre-selected users, this approach allows users to reserve up to full machine runs, encouraging use of the power of the full supercomputer to advance science into new realms.

The new user-settable system has been carefully designed to provide reservations that are in balance with existing jobs in the queue, and reservations carry a premium cost over jobs run without a reservation.

Based on GridSphere, the portal offers a Web interface to accomplish tasks such as running jobs and moving data that would ordinarily require complex command-line scripts. In the future, more features will be added to the User Portal through portlets such as accessing the SDSC Storage Resource Broker (SRB) data management system, the HPSS archival tape storage system, and visualization tools.

“It was an enormous task to create such a complex system,” said Kamrath. “It required teamwork among groups from Documentation to Production Systems across the center, and couldn’t have been done without SDSC’s large pool of expertise in a number of areas.”

The large team required to create the SDSC User Portal and user-settable reservation system includes, in management and development, Anke Kamrath, Diana Diehl, Patricia Kovach, Nancy Wilkins-Diehr, as well as Fariba Fana, Mona Wong, Ken Yoshimoto, Martin Margo, Andy Sanderson, J.D. Bottorf, Bill Link, Doug Weimer, Mahidhar Tatineni, Eva Hocks, Leo Carson, Tiffany Duffield, Krishna Muriki, and Alex Wu; in testing, Subha Sivagnanam, Leon Hu, Cuong Phan, Nicole Wolter, Kyle Rollin, Ella Xiong, Jet Antonio, and Shanil Daya.

Note: This story has been adapted from a news release issued by University of California - San Diego.

Corporate blogging.

Blogging is becoming a more widely used marketing tool for companies. Not only are big companies starting to use corporate weblog solutions, it also enables smaller companies to enhance their online marketing with weblogs. In most cases people from management and the thought leaders of a company are asked to maintain the corporate weblog. In some cases the bloggers are completely free in what they place on their weblog; in most cases they are supported by marketing teams who help them create their posts and keep an eye on the corporate value of a post.

In general there are some things to keep in mind when starting a corporate or non-corporate weblog.

- Transparency is Key
- Develop a Community
- Be Consistent
- Make a Policy
- Be Committed
- Acknowledge Faults and Missteps
- Take the Good with the Bad
- Ensure Weblog Usability

Keeping these guidelines in mind will help you make a success of your corporate weblog. Companies can benefit from weblogs in several ways. They get more exposure, which in the opinion of a marketing department is always a good thing, and they can give thought leaders a platform to show the knowledge they have, and so show what knowledge the company itself has. There is an option to create a community of devoted readers and get feedback from this community. And those are just some of the advantages. Another big plus is that most bloggers get the hang of it and start to enjoy it, and after some time consider it not part of the job but part of the fun.

Some good examples of corporate weblogs can be found here:
MacroMedia corporate weblog: http://weblogs.macromedia.com
Sun microsystems corporate weblogs: http://blogs.sun.com/
Weblogs from people working at IBM development: http://www-128.ibm.com/developerworks/blogs/

As this trend gathers more and more speed, more research is also being done. Please find some interesting research papers here:

A research paper from Durbin Media Group: CORPORATE PRIMER ON BUSINESS BLOGGING
E-Business Consortium, University of Wisconsin-Madison : Corporate Weblogging Best Practices
Lewis 360 : the business value of blogging
Bloomberg Marketing : Blogs: Beyond A Corporate Handshake