Corporate names sometimes include special characters as part of the corporate
name, e.g. Café. It seems some filing systems can handle them and others cannot.
What needs to be done for a filing system to allow special characters to be input,
saved, and searched on?
Dealing with accented characters (e.g. Café) in company information (such as company
name) and in searches can be a difficult problem for BOS and STS divisions in all
jurisdictions. Many current solutions cannot accept these characters as part of an
official filing, and cannot search for them properly even when they are present.
However, there are solutions.
First, let’s look at how one would generate a special character as part of typed input, e.g. Café, into a web page. Contrary to a common assumption, special keyboards are not required. While in the web page, I type the letters “Caf” and then launch the Windows Character Map to find and copy the “é” character. Then I paste that character after “Caf” to create “Café”. (In Windows XP, the Windows Character Map can be found at Start>All Programs>Accessories>System Tools>Character Map.) Of course, there are other ways of creating a word containing a special character: for example, you can build the word in MS Word 2007 using Insert>Symbol>More Symbols… and then simply copy and paste the entire word into the web page.
The next consideration is whether the application capturing your input will accept a word with a special character, capture it, and pass it to the database. For example, amazon.com accepts Café in its search function, whereas some other web sites do not. Caution is needed here: an application may let you type the special character, yet what it actually captures or stores in the database may be different, e.g. Cafe or Caf. A good indication of whether the special character has been captured is to search on it. Searching on Café in amazon.com returns items containing both Café and Cafe. So, to handle special characters, the application must be able to capture them and pass them to the database, and also retrieve them from the database and output them intact, for example on a display.
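The capture problem described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, and the "lossy" handler simply simulates an application that discards anything outside plain ASCII, which is one way a submitted Café could end up stored as Caf.

```python
import unicodedata


def lossy_ascii_capture(text: str) -> str:
    """Simulate a (hypothetical) form handler that silently drops non-ASCII characters."""
    return text.encode("ascii", errors="ignore").decode("ascii")


def unicode_capture(text: str) -> str:
    """Simulate a handler that carries the text as UTF-8 from end to end."""
    return text.encode("utf-8").decode("utf-8")


def survived(submitted: str, stored: str) -> bool:
    """Report whether the stored value still matches what the user submitted.

    Both sides are normalized to NFC so that a precomposed e-acute and an
    'e' followed by a combining accent are treated as the same text.
    """
    return unicodedata.normalize("NFC", submitted) == unicodedata.normalize("NFC", stored)


submitted = "Caf\u00e9"  # "Café" written with an escape so the source file encoding does not matter

lossy_result = lossy_ascii_capture(submitted)
unicode_result = unicode_capture(submitted)

print(repr(lossy_result), survived(submitted, lossy_result))
print(repr(unicode_result), survived(submitted, unicode_result))
```

A check of this kind, run against what actually comes back from the database rather than what was typed, is a more reliable test than looking at the input box.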
The database must also be able to handle and store special characters. Although an application may capture a special character, a stored procedure may strip or change that character as it is passed to the database, before it is actually stored. The database itself may also not be set up to store special characters. There are a couple of different ways to address this problem in Microsoft SQL Server 2005 or higher. (This can also be done in Oracle, but for illustrative purposes we will refer only to Microsoft SQL Server.)
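One reason a database may be unable to store a given character is that a non-Unicode text column is limited to a single code page. The Python sketch below is a rough illustration of that limit, not a SQL Server test: it assumes code page 1252 (the code page behind the SQL_Latin1_General_CP1 collations), and both the helper name and the second company name are made up for the example.

```python
def fits_code_page(text: str, code_page: str = "cp1252") -> bool:
    """Return True if every character in text can be encoded in the given code page."""
    try:
        text.encode(code_page)
    except UnicodeEncodeError:
        return False
    return True


# Hypothetical company names, written with escapes for "Café" and "Łódź Logistics".
names = ["Caf\u00e9", "\u0141\u00f3d\u017a Logistics"]

results = {name: fits_code_page(name) for name in names}
for name, ok in results.items():
    print(name, "fits cp1252" if ok else "needs a Unicode column")
```

Under these assumptions, é fits in code page 1252 but the Polish characters do not, which is why Unicode column types are the safer choice when names may come from many languages.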
The first approach involves creating your database with a collation that is declared to be accent-insensitive (AI). Many commonly used collations are accent-sensitive (AS), which means that “Café” and “Cafe” are treated as different values when compared or searched. (Accent sensitivity governs comparison only; whether an accented character can be stored at all depends on the column's data type and code page, with Unicode column types such as nvarchar being the safest choice.) If a database is created with an AI collation, and its columns can hold the characters, it stores special characters as entered and also considers them equivalent to the base character (the “e” without the accent, in this case) for the purposes of search. If the database were created in this way, the application could be written to allow the desired behaviors. (This appears to be how amazon.com behaves.)
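To make the difference between accent-sensitive and accent-insensitive comparison concrete, here is a minimal Python sketch. It is not how SQL Server implements its collations; it only mimics the visible behavior by decomposing each character and discarding the accent marks before comparing. All function names are hypothetical.

```python
import unicodedata


def accent_insensitive_key(text: str) -> str:
    """Build a comparison key that ignores both accents and letter case."""
    decomposed = unicodedata.normalize("NFD", text)
    # After NFD, accents are separate combining marks (category "Mn"); drop them.
    stripped = "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")
    return stripped.casefold()


def ai_equal(a: str, b: str) -> bool:
    """Case-insensitive, accent-insensitive comparison (roughly what CI_AI means)."""
    return accent_insensitive_key(a) == accent_insensitive_key(b)


def as_equal(a: str, b: str) -> bool:
    """Case-insensitive but accent-sensitive comparison (roughly what CI_AS means)."""
    return unicodedata.normalize("NFC", a).casefold() == unicodedata.normalize("NFC", b).casefold()


print(ai_equal("Caf\u00e9", "cafe"))  # accents and case ignored
print(as_equal("Caf\u00e9", "cafe"))  # accent still counts as a difference
```

Note that the stored value is untouched in both cases; only the comparison changes, which is why a search on Cafe can return Café without the record itself being altered.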
In the second approach, the database already exists, which is likely the case for most state jurisdictions. In this scenario, you can achieve the same results by phrasing your query to use a different collation. For example, the query SELECT * FROM Company WHERE (CompanyName COLLATE SQL_Latin1_General_CP1_CI_AI) LIKE 'Cafe' will match the Café case above. (Technical staff will recognize this syntax. Suffice it to say that an existing database can typically be queried in a way that accommodates special characters. Of course, the application source code would also have to be modified to take advantage of this behavior.) It should be noted that if you need full-text searching to work in the same manner, you will need to change the underlying collation to be AI and use the accent-insensitive full-text features.
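The idea of overriding the collation at query time can be tried without a SQL Server installation. The sketch below uses Python's built-in sqlite3 module as a stand-in, so it is an analogy rather than SQL Server behavior: SQLite has no built-in accent-insensitive collation, so a hypothetical one named CI_AI_DEMO is registered from Python. The Company table and CompanyName column follow the query in the text; the sample rows are invented.

```python
import sqlite3
import unicodedata


def fold(text: str) -> str:
    """Strip accents and case so that 'Café' and 'cafe' produce the same key."""
    decomposed = unicodedata.normalize("NFKD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch)).casefold()


def ci_ai_collation(a: str, b: str) -> int:
    """Collation callback: negative, zero, or positive, like a classic compare function."""
    ka, kb = fold(a), fold(b)
    return (ka > kb) - (ka < kb)


def find_company(conn: sqlite3.Connection, name: str, accent_insensitive: bool) -> list:
    """Look up a company name, optionally applying the collation in the query itself."""
    column = "(CompanyName COLLATE CI_AI_DEMO)" if accent_insensitive else "CompanyName"
    rows = conn.execute(
        f"SELECT CompanyName FROM Company WHERE {column} = ? ORDER BY CompanyName",
        (name,),
    )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.create_collation("CI_AI_DEMO", ci_ai_collation)

# The table is created with the default collation, as an existing database would be.
conn.execute("CREATE TABLE Company (CompanyName TEXT)")
conn.executemany(
    "INSERT INTO Company VALUES (?)",
    [("Caf\u00e9",), ("Cafe",), ("Cafeteria",)],
)

default_matches = find_company(conn, "Cafe", accent_insensitive=False)
ai_matches = find_company(conn, "Cafe", accent_insensitive=True)
print(default_matches)
print(ai_matches)
```

The table definition is never changed; only the query differs, which mirrors the point that an existing database can be searched this way once the application code is updated.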
If you need specific detail concerning getting your system to handle special characters, give us a call.
(David Moye brings over 15 years of professional software engineering and technical management experience to bear in the role of Chief Technology Officer at CCIS. As CTO, Mr. Moye is directly responsible for the company's technical direction. He provides strategic input regarding software and hardware technologies to investigate and adopt, for both production and business reasons. A recognized industry thought leader in software architecture, he has BS and MS degrees in Computer Engineering from Virginia Tech, and is finishing his PhD in Computer Science at North Carolina State University. His ability to blend academic theory with industry application helps keep CCIS at the forefront of technology. David has worked with various states on their technology platforms, ensuring they meet changing user needs in a cost‐effective way.)
Copyright © 2011 CC Intelligent Solutions Inc. All rights reserved.
