
How to program Hashtags


weAll.com Blog - Popular services explained. The first blog entry 3rd January 2016

The use of Hashtags explained

 

This first episode of Popular services explained reveals the core functionality of hashtags, and you can explore that functionality in depth at code level yourself. This blog is written for developers and for those who are learning development and wish to know more.

 

For example, the core functionality of many social applications is based on # hashtags. Users can put hashtags inside 140-character messages, and those hashtags are used to group, sort, search and categorize the messages. Applications also use hashtags to organize and search their own content. Conceptually, hashtags are words that are attached to a message and indexed in the database.

 

How are hashtags built, then?

 

The coding examples are written in Java, and you can familiarize yourself with them as we go.

 

We parse hashtags by tokenizing the message string. The following Java code uses the StringTokenizer class to split the text into words separated by whitespace.

 

Each word is then handled inside the while loop. The loop visits each word in turn and assigns it to the token variable.

 

StringTokenizer tokenizer = new StringTokenizer(this.text);

while (tokenizer.hasMoreTokens()) {

String token = tokenizer.nextToken();

 

So now we have each word available to handle. We check whether the word starts with the # mark. If it does, we remove the # mark. The trim call removes any surrounding whitespace from the word. If the word is not empty, we add it to the hashtags list.

 

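if (token.startsWith("#")) {

// Strip the # mark and clean the string

token = token.replaceAll("#", "").trim();

// Skip tokens that consisted only of # marks

if (!token.isEmpty()) {

hashtags.add(token);

}

} // closes if (token.startsWith("#"))

} // closes the while loop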

 

The hashtags variable is declared as follows:

 

private List<String> hashtags = new ArrayList<String>();
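The fragments above can be combined into one small class. This is a minimal sketch, not the blog's original source: the class name HashtagParser and the method name extractHashtags are assumptions made for illustration, and the text is passed in as a parameter instead of being read from this.text.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Hypothetical wrapper class; only the loop body comes from the article.
class HashtagParser {

    // Returns the hashtags found in the text, in order of appearance,
    // without the leading # mark.
    static List<String> extractHashtags(String text) {
        List<String> hashtags = new ArrayList<>();
        // StringTokenizer splits on whitespace by default.
        StringTokenizer tokenizer = new StringTokenizer(text);
        while (tokenizer.hasMoreTokens()) {
            String token = tokenizer.nextToken();
            if (token.startsWith("#")) {
                // Strip every # mark and clean the string.
                token = token.replaceAll("#", "").trim();
                // A lone "#" becomes an empty string and is skipped.
                if (!token.isEmpty()) {
                    hashtags.add(token);
                }
            }
        }
        return hashtags;
    }

    public static void main(String[] args) {
        // Prints [java, hibernate]
        System.out.println(extractHashtags("Learning #java and #hibernate today #"));
    }
}
```

Note that this approach keeps any punctuation attached to a word, so "#java," is stored as "java,". Lowercasing or stripping punctuation would be an extra step that the article's code does not perform.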

 

Now that you have all the hashtags parsed into a list, you can set them on a message object and save it to the database.

 

When a message is related to hashtags, the structure looks like this at the database level.

 

You have a Message class and a Hashtag class, and database tables corresponding to them.

 

Message has messageId and text columns.

Hashtag has hashtagId and hashtag columns.

 

The relation between these two is many-to-many in the database: one message can have many different hashtags, and one hashtag can be connected to many messages.

 

Database normalization requires us to break this kind of relation into simpler ones, which leads to the following solution.

 

We create a new database table called Message_hashtag, which has only two columns: messageId and hashtagId.

 

When we use Java and Hibernate, we can describe the relation in the Message class simply as follows:

 

@ManyToMany(fetch = FetchType.LAZY)

private List<Hashtag> hashtags = new ArrayList<Hashtag>();

 

 

When we save the message "#lorem #ipsum #parantesis" to the database, the tables will have the following values:

Message table

messageId    text
1            #lorem #ipsum #parantesis

 

Message_hashtag table

messageId    hashtagId
1            1
1            2
1            3

Hashtag table

hashtagId    hashtag
1            lorem
2            ipsum
3            parantesis
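To make the three tables more concrete, here is a rough in-memory sketch in plain Java. It is not how Hibernate stores anything: the class InMemoryHashtagStore, its fields and the saveMessage method are hypothetical names, and simple collections stand in for the Message, Hashtag and Message_hashtag tables. The point it illustrates is that each hashtag is stored once and messages only refer to it by id.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the three database tables.
class InMemoryHashtagStore {
    // Message table: messageId -> text
    final Map<Integer, String> messages = new LinkedHashMap<>();
    // Hashtag table: hashtag -> hashtagId (each hashtag is stored only once)
    final Map<String, Integer> hashtagIds = new LinkedHashMap<>();
    // Message_hashtag table: one {messageId, hashtagId} row per link
    final List<int[]> messageHashtag = new ArrayList<>();

    // Saves a message with its already parsed hashtags and returns the new messageId.
    int saveMessage(String text, List<String> hashtags) {
        int messageId = messages.size() + 1;
        messages.put(messageId, text);
        // LinkedHashSet drops repeated hashtags within the same message.
        for (String tag : new LinkedHashSet<>(hashtags)) {
            Integer hashtagId = hashtagIds.get(tag);
            if (hashtagId == null) {
                // First time this hashtag is seen: give it the next id.
                hashtagId = hashtagIds.size() + 1;
                hashtagIds.put(tag, hashtagId);
            }
            messageHashtag.add(new int[] {messageId, hashtagId});
        }
        return messageId;
    }

    public static void main(String[] args) {
        InMemoryHashtagStore store = new InMemoryHashtagStore();
        store.saveMessage("#lorem #ipsum #parantesis", List.of("lorem", "ipsum", "parantesis"));
        store.saveMessage("more #ipsum", List.of("ipsum"));
        // Prints the Message_hashtag rows: 1 1, 1 2, 1 3, 2 2 (one per line)
        for (int[] row : store.messageHashtag) {
            System.out.println(row[0] + " " + row[1]);
        }
    }
}
```

The second message reuses hashtagId 2 for ipsum instead of creating a new Hashtag row, which is the effect normalization is after.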

 

 

So what does this mean from the point of view of the service? It means that searches for messages by hashtag are fast, because the matching is done at the database level on the indexed keys messageId and hashtagId and not on the message text. Otherwise you would have to go through every message, tokenize its content every time and search the content for the correct keyword, which is very slow.
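The lookup path can be sketched in plain Java as follows. This is only an illustration under stated assumptions: the class HashtagSearch and its methods link and findMessageIds are hypothetical, and the two maps stand in for database indexes on the Hashtag and Message_hashtag tables. A real database would answer the same question with index lookups and a join.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical in-memory model of searching messages by hashtag.
class HashtagSearch {
    // hashtag text -> hashtagId (stands in for an index on the Hashtag table)
    final Map<String, Integer> hashtagIds = new HashMap<>();
    // hashtagId -> messageIds (stands in for an index on Message_hashtag.hashtagId)
    final Map<Integer, Set<Integer>> messagesByHashtag = new HashMap<>();

    // Records that the given message carries the given hashtag.
    void link(int messageId, String hashtag) {
        Integer hashtagId = hashtagIds.get(hashtag);
        if (hashtagId == null) {
            hashtagId = hashtagIds.size() + 1;
            hashtagIds.put(hashtag, hashtagId);
        }
        messagesByHashtag.computeIfAbsent(hashtagId, id -> new TreeSet<>()).add(messageId);
    }

    // Finds message ids by key lookups only; no message text is read or tokenized.
    Set<Integer> findMessageIds(String hashtag) {
        Integer hashtagId = hashtagIds.get(hashtag);
        if (hashtagId == null) {
            return Collections.emptySet();
        }
        return messagesByHashtag.getOrDefault(hashtagId, Collections.emptySet());
    }

    public static void main(String[] args) {
        HashtagSearch search = new HashtagSearch();
        search.link(1, "lorem");
        search.link(1, "ipsum");
        search.link(2, "ipsum");
        System.out.println(search.findMessageIds("ipsum"));   // prints [1, 2]
        System.out.println(search.findMessageIds("missing")); // prints []
    }
}
```

The cost of findMessageIds does not depend on how long the messages are, only on the key lookups, whereas the scanning alternative has to read and tokenize every stored message for every search.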

 

Search speed is the key benefit of this approach at the database level.

 

As a matter of fact, a similar structure is also used when a search engine indexes the contents of web pages and stores the words in a searchable form.

 

That is the topic of our next article: indexing and how search engines work. After that you can build your own search index.

 

What new ideas or thoughts did this chapter give you?