Adam Porat's blog

Monday, August 18, 2014

how to organize a multi-type elasticsearch query

Suppose your search involves querying a parent type as well as one or more of its child types.

Well, it's pretty obvious that you should use a Bool Query to combine the queries of the different document types.

However, suppose each type involves both a query and a filter. How would you then combine all the queries and filters together?

My first attempt at this was to use a single Filtered Query for the entire query, and each type would contribute to the overall FilteredQuery/query and FilteredQuery/filter.
I later found out this model sometimes produced incorrect search results, was complicated and perhaps not so efficient.

The more logical way to do this is for each type to produce its own Filtered Query, with the top Bool Query merely joining those filtered queries together. This way, each type has its own independent clause, which results in a simple and clean overall query structure. And the search results are always correct, too.
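Here is a minimal sketch of that structure, written as nested Java maps that mirror the JSON request body. It assumes the Elasticsearch 1.x query DSL (filtered and has_child queries). The child type name "comment", all field names and values, and the choice of "must" to join the clauses are hypothetical, picked only for illustration.

```java
import java.util.List;
import java.util.Map;

// Sketch only: mirrors the JSON body of the search request as nested maps.
// Type name, field names and values are hypothetical.
class MultiTypeQuery {

    // One self-contained Filtered Query: a query part plus a filter part.
    static Map<String, Object> filtered(Map<String, Object> query, Map<String, Object> filter) {
        return Map.of("filtered", Map.of("query", query, "filter", filter));
    }

    static Map<String, Object> build() {
        // Clause for the parent type.
        Map<String, Object> parentClause = filtered(
                Map.of("match", Map.of("title", "elasticsearch")),
                Map.of("term", Map.of("status", "published")));

        // Clause for the child type: its own Filtered Query, wrapped in has_child
        // so that it selects parents by their children.
        Map<String, Object> childClause = Map.of("has_child", Map.of(
                "type", "comment",
                "query", filtered(
                        Map.of("match", Map.of("text", "query")),
                        Map.of("term", Map.of("approved", true)))));

        // The top Bool Query only joins the independent clauses.
        return Map.of("query", Map.of("bool", Map.of("must", List.of(parentClause, childClause))));
    }

    public static void main(String[] args) {
        System.out.println(build());
    }
}
```

Adding another child type then means adding one more clause to the list, without touching the existing ones.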

The writer is R&D team leader at Niloosoft Hunter HRMS

Thursday, August 14, 2014

Clarifying elasticsearch TopChildren, "factor" & "estimated hits size"

I found the TopChildren documentation to not be totally clear. So here is my clarification.

The "estimated hits size" (also referred to in the documentation as "hits expected") refers to the number of child document hits. That is to say - how many child documents will be looked for in the query on the child docs.

The set of child documents thus found is then aggregated into parents.

If you asked for 10 parents (query size=10), elasticsearch will use the default factor value of 5, and search for 50 child documents (the "hits expected" as mentioned above). The found documents will then be aggregated into parent documents.

In case several child docs belong to the same parent, the aggregation may result in fewer parents than asked for. In this case, if there are additional child documents to query, elasticsearch will expand the query to include more child docs, using the incremental_factor parameter.
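The flow above can be modelled in a few lines of Java. This is a simplified sketch, not elasticsearch's actual implementation: it ignores scoring, and it assumes that each widening step multiplies the number of child hits by incremental_factor. The class name, method names and sample data are all made up for the illustration.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Simplified model of the TopChildren flow. Scoring is ignored, and widening
// by multiplying with incremental_factor is an assumption.
class TopChildrenSketch {

    // The "estimated hits size": how many child docs the first pass looks for.
    static int estimatedHits(int size, int factor) {
        return size * factor;
    }

    // childHitParents holds the parent id of every matching child doc, best match first.
    static List<String> topParents(List<String> childHitParents, int size,
                                   int factor, int incrementalFactor) {
        if (size < 0 || factor < 1 || incrementalFactor < 2) {
            throw new IllegalArgumentException("expected size >= 0, factor >= 1, incrementalFactor >= 2");
        }
        int childHits = estimatedHits(size, factor);
        while (true) {
            int limit = Math.min(childHits, childHitParents.size());
            // Aggregate the child hits into distinct parents, keeping first-seen order.
            Set<String> parents = new LinkedHashSet<>(childHitParents.subList(0, limit));
            boolean exhausted = limit == childHitParents.size();
            if (parents.size() >= size || exhausted) {
                List<String> result = new ArrayList<>(parents);
                return result.subList(0, Math.min(size, result.size()));
            }
            // Too few parents and more child docs are available: widen the child search.
            childHits *= incrementalFactor;
        }
    }

    public static void main(String[] args) {
        System.out.println(estimatedHits(10, 5)); // prints 50
        List<String> hits = List.of("p1", "p1", "p1", "p2", "p2", "p3");
        System.out.println(topParents(hits, 2, 1, 2)); // prints [p1, p2]
    }
}
```

In the second call the factor is set to 1 on purpose: the first two child hits belong to the same parent, so only one parent is found and the search has to be widened to four child hits before two parents turn up.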

The total_hits in the response would not be accurate if the "estimated hits size" is less than the number of child documents which actually match the query. The larger the "estimated hits size" is (controlled by the factor parameter), the larger the potential total_hits. But this of course hurts performance.

An additional point to be aware of is that the requested number of parent documents is the number of docs returned by the TopChildren query itself. This number may be further reduced by adjacent or higher-level queries/filters.

If this short explanation clarified things for you, please leave a comment and let me know :)

Wednesday, April 17, 2013

Java static class Vs. C# static class

While static fields and methods in Java and C# have the same function and meaning, static classes are different.

In C#, a static class may contain only static members, and cannot be instantiated into an object. It is also sealed - another class cannot inherit from it.

In Java, a static class is called a static member class (or static nested class). It must be declared inside another class. It is actually a regular class - it has none of the restrictions mentioned above for C# static classes. It is similar to a regular nested class in C#. So what's "static" about it? It is not tied to an instance of its containing class, so it can directly reference only the static members of that class. This is in contrast to a Java non-static member class (an inner class), whose instances automatically hold a reference to an instance of the containing class, available through the qualified "this" (for example, Outer.this).

In Java, to make a static class behave more like a C# static class (namely to prevent its instantiation into an object), make its constructor private.
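A small sketch of the three cases discussed above: a static member class, a non-static member class, and a static member class made to behave like a C# static class. All class and member names here (Outer, StaticNested, Inner, MathUtil) are hypothetical.

```java
class Outer {
    private static int staticCounter = 0;
    private final String name;

    Outer(String name) {
        this.name = name;
    }

    // Static member class: a regular class with no implicit Outer instance.
    static class StaticNested {
        int next() {
            return ++staticCounter; // fine: a static member of Outer
        }
        // String outerName() { return name; } // would not compile: no Outer instance
    }

    // Non-static member (inner) class: every instance is tied to an Outer instance.
    class Inner {
        String outerName() {
            return Outer.this.name;
        }
    }

    // Closer to a C# static class: final (cannot be inherited from), private
    // constructor (cannot be instantiated from outside), only static members.
    static final class MathUtil {
        private MathUtil() {
            // The enclosing class can still reach a private constructor, so fail loudly.
            throw new AssertionError("no instances");
        }

        static int square(int x) {
            return x * x;
        }
    }

    public static void main(String[] args) {
        StaticNested nested = new Outer.StaticNested();  // no Outer instance needed
        Inner inner = new Outer("a").new Inner();        // needs an Outer instance
        System.out.println(nested.next());
        System.out.println(inner.outerName());
        System.out.println(MathUtil.square(3)); // prints 9
    }
}
```

Note that Java still lets the containing class call the private constructor, which is why the sketch also throws from inside it.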

Tuesday, August 28, 2012

How to create a certificate for Google Domain Registration

As a developer at Niloosoft HunterHRMS, I needed to register our domain with Google in order to start integrating with Google Docs. As part of the registration process, I had to submit a certificate file. Creating the file was an unpleasant process, as even Google's instructions are very insufficient. Therefore, for myself and others, I hereby summarize the exact step-by-step instructions on how to create this precious file.
Google currently requires a pem cert file.

1. Download OpenSSL to the computer
2. Create 2 environment variables on the computer:
A. Name: RANDFILE Value: .rnd
B. Name: OPENSSL_CONF Value: The full file name of the openssl.cnf file. For example: D:\Program Files\openssl-0.9.8k_X64\openssl.cnf (To reach the Environment Variables editing window: computer -> properties -> Advanced system settings -> Advanced tab -> Environment Variables button.)
3. Run openssl.exe (only after the environment variables have been created!)
4. Create private key file + certificate file:
(Note: you will be prompted to enter the certificate information. When you are asked for the Common Name, enter the domain - such as www.hrms.me)
OpenSSL> req -x509 -nodes -days 365 -newkey rsa:1024 -sha1 -keyout myrsakey.pem -out myrsacert.pem
5. Create a pfx certificate (used by .Net) based on the pem private key + certificate:
(Note: you will be prompted for a password. It may be left empty)
OpenSSL> pkcs12 -export -in myrsacert.pem -inkey myrsakey.pem -out CertForGoogle.pfx -name "Cert for Google"

Sunday, August 26, 2012

Why WCF always gives CommunicationObjectFaultedException?

I have been frustrated regarding WCF - it always seemed to give the same obscure exception:

CommunicationObjectFaultedException - The communication object cannot be used because it is in the Faulted state

Finally I found out why.

Actually WCF usually throws clear exceptions. You just have to use it right.

I have been using WCF proxies with a C# using statement. And here the trouble lies.

What the C# using statement does is this: it wraps up the using block in a try clause, and in the finally clause it disposes the object given in the using clause. In other words, it ensures the object gets disposed even if an exception occurred inside the block.

This is very good for working with files, etc. But with WCF it is a problem. Why? Because the WCF proxy's Dispose method actually calls the proxy's Close method. But this Close method has a catch: if the proxy is in a Faulted state, Close() throws the infamous exception CommunicationObjectFaultedException.

So what happens is this: you call a method on a service and get an exception. This puts the proxy in a faulted state. Then the using block calls Dispose on the proxy, and the CommunicationObjectFaultedException is thrown.

A WCF proxy in a faulted state still holds some resources. And to dispose of them, you need to call the proxy's Abort method. This method does not throw an exception if the proxy is in a faulted state.

So the recommended pattern to use a WCF proxy is this:

// Create the proxy object here

try
{
    proxy.MyMethod();
}
finally
{
    if (proxy != null)
    {
        if (proxy.State == System.ServiceModel.CommunicationState.Faulted)
        {
            // A faulted proxy cannot be closed; Abort releases its resources without throwing.
            proxy.Abort();
        }
        else
        {
            proxy.Close();
        }
    }
}

The writer is a .Net team leader at Niloosoft Hunter HRMS

Thursday, February 2, 2012

SQL Server Compact Edition vs. SQL Express Edition

I tried using SQL Server Compact Edition 4.0 for a certain Hunter HRMS application. The application was slow and I was getting "timed out waiting for a lock" exceptions. I did a lot to try to prevent any locks - closing any open connections/DataReaders etc., even using NOLOCK hints - but still these timeouts would appear.
I then changed to SQL Express edition with minimal code changes to accommodate the change. All the problems disappeared - response time was excellent, with no timeouts.
Thought I should share this with the world.

Thursday, January 12, 2012

Table and Table-Cell Style Properties

There is some confusion regarding the different styling possibilities of a table, and which properties should be set on a table element and which on a cell (td) element. So let's clear this up.
The confusion is partly caused by two deprecated HTML table attributes: cellspacing and cellpadding.
Let's look at the relevant ways to affect a table, and the ways to accomplish them (recommended and deprecated).
1. Control the padding inside each cell
The deprecated way has been to set the cellpadding attribute on the table.
The recommended way is to set the CSS padding property on the cell elements (td or th).
2. Control the space between the cell borders
The deprecated way has been to set the cellspacing attribute on the table.
The recommended way is to set the CSS property border-spacing on the table element.
3. Collapse the cell borders into a single border
This is done using the CSS property border-collapse on the table element.
Note: if you collapse the border, any border-spacing value will be ignored.
4. Set the style of the border of the cells
Style the border using CSS on the cells (td / th).

Additional note: you cannot set the margin on a table cell. It will be ignored. Use border-spacing on the table element instead.

Adam Porat is a team leader at Niloosoft on the Hunter HRMS product.